Spark and Data Tips for November 2018

Full Hadoop / HBase data platform for testing Spark. I found the following Docker image very handy for testing Hadoop: https://hub.docker.com/r/bigdatauniversity/spark2/
    docker pull bigdatauniversity/spark2
    docker run -it --name bdu_spark2 -P -p 4040:4040 -p 4041:4041 -p 8080:8080 -p 8081:8081 bigdatauniversity/spark2:latest /etc/bootstrap.sh -bash
Spark Notebooks. I found these sites useful: http://spark-notebook.io/, https://github.com/spark-notebook/spark-notebook, and https://github.com/IBM?language=jupyter+notebook
Version Mismatch […]

UCD: Application Processes Branching

UrbanCode Deploy (UCD) is a tool we use to manage the deployment of our healthcare platform. I needed to branch between two different processes, and the setup and steps for branching between application processes were not clearly documented. I built some custom bash logic to switch based on results: #!/bin/bash if [ […]

Maven Repository – Go Offline with dependencies

Maven Repository. My team uses the pom.xml to generate a repository, which is handed off to the secondary developers. For instance, I have a custom db2 jar.
Update your localRepository:
- Start a shell
- cd ~/.m2
- vim settings.xml
- Add `<localRepository>/Users/userid/git/client-app/documentation/repo/local_repo</localRepository>`
Note: the path is relative to the location of my repo […]
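One way to fill and then use such a repository is sketched below. This is not taken from the original post: it assumes the standard `dependency:go-offline` goal, the `maven.repo.local` property, and the `-o` offline flag, and the function names are hypothetical. The script only prints the commands, so it can be reviewed before anything is run from the directory holding the pom.xml.

```shell
#!/bin/bash
# REPO_DIR defaults to the localRepository path shown above; override it as needed.
REPO_DIR="${REPO_DIR:-/Users/userid/git/client-app/documentation/repo/local_repo}"

# Hypothetical helper: print the command that resolves the project's
# dependencies and plugins into REPO_DIR.
go_offline_cmd() {
  printf 'mvn -Dmaven.repo.local=%s dependency:go-offline\n' "$REPO_DIR"
}

# Hypothetical helper: print the command that builds in offline mode (-o),
# using only what is already present in REPO_DIR.
offline_build_cmd() {
  printf 'mvn -o -Dmaven.repo.local=%s package\n' "$REPO_DIR"
}

go_offline_cmd
offline_build_cmd
```

Passing `-Dmaven.repo.local` on the command line should have the same effect as the `<localRepository>` entry in settings.xml, which may be handy when the settings file cannot be changed.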

Hadoop KMS Ranger API – Tips and cURLs

I use Hadoop KMS Ranger in one environment. Some sample REST API calls are below, along with two tips:
- versionName is used in multiple queries.
- When not using Kerberos, set ?user.name=hdfs on the URL.
References:
https://hadoop.apache.org/docs/current/hadoop-kms/index.html#KMS_HTTP_REST_API
https://hadoop.apache.org/docs/current/hadoop-kms/index.html#Get_Key_Names
https://stackoverflow.com/questions/37601763/authentication-issue-with-kms-hadoop
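As a minimal sketch of the two tips, the helper below builds a KMS REST URL with `user.name` appended for the non-Kerberos case. The host, the port, and the `kms_url` function name are assumptions for illustration, not values from my environment. The `/kms/v1/keys/names` and `/kms/v1/keyversion/<versionName>` paths follow the Hadoop KMS REST API documentation referenced above.

```shell
#!/bin/bash
# Assumed placeholders: set these to match your own KMS instance.
KMS_HOST="${KMS_HOST:-localhost}"
KMS_PORT="${KMS_PORT:-9600}"

# Hypothetical helper: print the URL for a KMS resource, adding
# ?user.name=<user> (defaults to hdfs) for use without Kerberos.
kms_url() {
  local resource="$1"
  local user="${2:-hdfs}"
  printf 'http://%s:%s/kms/v1/%s?user.name=%s\n' "$KMS_HOST" "$KMS_PORT" "$resource" "$user"
}

# Print the URL that lists key names.
kms_url keys/names

# To send the request:          curl -s "$(kms_url keys/names)"
# To look up one key version:   curl -s "$(kms_url keyversion/<versionName>)"
```

Keeping the URL construction in one function means the `user.name` parameter is never forgotten, and moving to a Kerberos-enabled cluster only changes one place.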

Lightweight HBase Client

As many developers know, HBase’s default client includes everything, which adds up to tens of megabytes. Lilyproject reduces this to a more manageable and useful size. https://github.com/NGDATA/lilyproject/blob/master/global/hbase-client/pom.xml
    <dependency>
      <groupId>org.lilyproject</groupId>
      <artifactId>lily-hbase-client</artifactId>
      <version>2.6.1</version>
    </dependency>

Ambari All Sorts of Messed Up

My team and I run Ambari and Ambari agents, which control our HDFS/HBase and general Hadoop/Apache ecosystem machines. Our bare metal machines hung, and we could not get anything restarted. In the logs, we had: {'msg': 'Unable to read structured output from /var/lib/ambari-agent/data/structured-out-status.json'} We found the fix at https://community.hortonworks.com/content/supportkb/49517/services-are-running-but-ambari-reports-them-faile.html: Remove /var/lib/ambari-agent/data/structured-out-status.json Restart […]