Maven Thread Speed Up

Like many developers, I have tons of jobs running to compile, unit test, and integration test my code. These jobs take anywhere from 30 seconds to 30 minutes.

Some simple operations took a while, and I wondered why. Thanks to Oleg at ZeroTurnaround, I have an answer: "Your Maven build is slow. Speed it up!"

I applied the parallel build setting (-T 4 tells Maven to build with four threads), and my build dropped from 30 minutes to 10 minutes:

mvn clean package -T 4 -s local-m2/settings.xml
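
If you want to try different thread counts, a small wrapper helps. This is a minimal sketch, not part of my actual build: build_parallel and the MVN override are names I made up for illustration. The thread spec can be a fixed number such as 4, or 1C for one thread per CPU core. Note that the settings file option is lowercase -s.

```shell
# MVN can be overridden (for example MVN=echo) to preview the command
# without running a build. This override is a hypothetical convenience.
MVN="${MVN:-mvn}"

# build_parallel is a hypothetical helper name.
# $1: thread spec, "4" for four threads or "1C" for one thread per core.
build_parallel() {
  "$MVN" clean package -T "${1:-1C}" -s local-m2/settings.xml
}
```

Calling build_parallel 4 runs the same command as above, and build_parallel with no argument lets Maven size the thread pool from the core count.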

I hope this helps others.

Hadoop KMS Ranger API – Tips and cURLs

I use Hadoop KMS with Ranger in one environment. Some sample REST API calls are below, along with two tips.

The versionName of a key is used in multiple queries.

When not using Kerberos, set ?user.name=hdfs on the URL.
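
Here is a minimal sketch of how the two tips fit together with cURL. The host, port, and the key name mykey are placeholders I made up, and kms_url and kms_get are hypothetical helper names. The resource paths come from the KMS HTTP REST API documentation linked in the references.

```shell
# Hypothetical endpoint: replace host and port with your own KMS address.
KMS_BASE="${KMS_BASE:-http://kms.example.com:9292/kms/v1}"
# User passed as user.name, for clusters that do not use Kerberos.
KMS_USER="${KMS_USER:-hdfs}"

# Build the URL for a KMS resource path and append user.name.
kms_url() {
  printf '%s/%s?user.name=%s\n' "$KMS_BASE" "$1" "$KMS_USER"
}

# Issue a GET against a KMS resource path.
kms_get() {
  curl -s -H 'Accept: application/json' "$(kms_url "$1")"
}

# Usage (mykey and mykey@0 are hypothetical):
#   kms_get keys/names                  # list key names
#   kms_get key/mykey/_currentversion   # current version of a key
#   kms_get keyversion/mykey@0          # look up a version by its versionName
```

Keeping the URL construction in one function means the user.name parameter is never forgotten on an individual call.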

References
https://hadoop.apache.org/docs/current/hadoop-kms/index.html#KMS_HTTP_REST_API
https://hadoop.apache.org/docs/current/hadoop-kms/index.html#Get_Key_Names
https://stackoverflow.com/questions/37601763/authentication-issue-with-kms-hadoop

Ambari All Sorts of Messed Up

My team and I run Ambari and Ambari agents, which control our HDFS/HBase and general Hadoop/Apache ecosystem machines. Our bare metal machines hung, and we could not get anything restarted.

In the logs, we had:

{'msg': 'Unable to read structured output from /var/lib/ambari-agent/data/structured-out-status.json'}

We found the fix in a Hortonworks support article at https://community.hortonworks.com/content/supportkb/49517/services-are-running-but-ambari-reports-them-faile.html:

  1. Remove /var/lib/ambari-agent/data/structured-out-status.json
  2. Restart the Ambari agent.

Our Ambari setup now works.
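
The two steps can be sketched as a small script to run on each affected host. This is a sketch under assumptions: reset_agent_status and the two override variables are names I made up, and it moves the file aside to a .bak copy instead of deleting it, so the old content can still be inspected. It needs to run as a user allowed to restart the agent.

```shell
# Path from the log message; overridable for a dry run.
STATUS_FILE="${STATUS_FILE:-/var/lib/ambari-agent/data/structured-out-status.json}"
# Agent command; set AGENT_CMD=echo to preview without restarting.
AGENT_CMD="${AGENT_CMD:-ambari-agent}"

# reset_agent_status is a hypothetical helper name.
reset_agent_status() {
  # Step 1: get the unreadable status file out of the way
  # (moved aside here rather than removed).
  if [ -f "$STATUS_FILE" ]; then
    mv "$STATUS_FILE" "$STATUS_FILE.bak"
  fi
  # Step 2: restart the agent so it writes a fresh status file.
  "$AGENT_CMD" restart
}
```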

VIM – JOIN Conditions with Unicode and ASCII

I cannot stress enough the dangers of copying data from Excel or HTML and assuming that it’s plain ASCII. For example, @ is U+0040, a single byte 0x40 in both ASCII and UTF-8, yet the value we ingested was not byte-for-byte the same as the plain one, and we couldn’t see why a JOIN condition on the data table wasn’t working.

I looked at the source JSON (a FHIR DSTU2 Group), loaded it in Vim, and used the following trick, which makes Vim show each byte as its own character so that unexpected bytes stand out:

:set encoding=latin1

We ended up showing that our data table’s contents were different using:

SELECT HEX(RESOURCE_VALUE) FROM FHIR.DIM_GROUP
0A40 vs 40
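
The same byte-level comparison can be done outside the database. This is a minimal sketch: hex_bytes is a helper name I made up, and the sample strings are illustrations, not our actual data. In the output above, 0A is a line feed byte sitting in front of the 40 for @.

```shell
# hex_bytes is a hypothetical helper: print the bytes of a string
# as lowercase hex with no separators.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
  echo
}

hex_bytes '@'                          # prints 40
hex_bytes "$(printf '\n@')"            # prints 0a40 (line feed, then @)
hex_bytes "$(printf '\357\274\240')"   # prints efbca0 (U+FF20, fullwidth @ in UTF-8)
```

Two values that look alike on screen but print different hex will never match in a JOIN.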

References

https://unix.stackexchange.com/questions/108020/can-vim-display-ascii-characters-only-and-treat-other-bytes-as-binary-data

Remove Duplicates in DB2 Columnar Format

I had duplicate data in my OLAP table, where the columnar data can be duplicated based on event id (I loaded the data twice). I had to differentiate the rows and remove the duplicates, so I assigned row numbers with ROW_NUMBER() over a partition by event id.

I hope this helps you. First, I marked each row with its row number within its event id:
db2 "update (select OME.*, row_number() over(partition by IDN_EVENT_ID order by IDN_EVENT_ID) as rnk from X.OLAP OME) set APP_NM = rnk"

Then I removed the second copies with this:
db2 "DELETE FROM X.OLAP OME WHERE APP_NM = 2"

I recommend the two-phase approach: in theory you can run each step in batch or asynchronously, and you can double-check the marked rows before deleting, rather than hoping a single statement works.
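
For the double check between the two phases, a count per marker value is enough. This is a sketch under assumptions: count_marked and the DB2 override are names I made up, and it assumes the shell already has a database connection open (db2 connect). After a double load you would expect roughly equal counts for 1 and 2; any higher value means more than two copies exist and the delete needs adjusting.

```shell
# DB2 can be overridden (for example DB2=echo) to preview the statement.
DB2="${DB2:-db2}"

# count_marked is a hypothetical helper: count rows per marker value
# written into APP_NM by the update step.
count_marked() {
  "$DB2" "SELECT APP_NM, COUNT(*) AS ROW_CNT FROM X.OLAP GROUP BY APP_NM"
}
```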