Advanced HBase Shell

Recently, I’ve had to do some advanced HBase Shell scripting to check data consistency.

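Tip # 1 – You can easily establish min/max times in microseconds from Bash and feed them into your script.

  $(date +%s%6N)

gets you the epoch seconds followed by 6 fractional digits: 3 for millis and 3 for micros. (%N is a GNU date extension.) HBase's default cell timestamps are in milliseconds, so use %3N instead if you need to match those.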
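On the script side, the values still have to be read and sanity-checked. Here is a minimal sketch of one way to do that, assuming the two timestamps arrive as environment variables. The names MIN_TS and MAX_TS and both helper functions are made up for illustration, and the conversion assumes your cells carry HBase's default millisecond timestamps.

```ruby
# Convert a microsecond epoch string (e.g. from `date +%s%6N`) to milliseconds.
def micros_to_millis(micros_str)
  Integer(micros_str, 10) / 1000
end

# Read a [min, max] window in milliseconds from the environment.
# MIN_TS and MAX_TS are hypothetical variable names, not an HBase convention.
def time_window(env = ENV)
  min_ms = micros_to_millis(env.fetch('MIN_TS'))
  max_ms = micros_to_millis(env.fetch('MAX_TS'))
  raise ArgumentError, 'MIN_TS must not be later than MAX_TS' if min_ms > max_ms
  [min_ms, max_ms]
end
```

With MIN_TS and MAX_TS exported from Bash before launching the script, the pair returned by time_window could then be handed to something like Scan#setTimeRange. Skip the division if your table stores microsecond timestamps.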

Tip # 2 – Put the ‘include Java’ line at the top of your script to get access to the classes in all the JARs that HBase has access to.

Tip # 3 – Forget about printing raw bytes. Convert to Base64 to make row keys human-readable, and pass the DONT_BREAK_LINES option (the number 8) so the encoder does not wrap long output across lines. https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Base64.html#DONT_BREAK_LINES

import org.apache.hadoop.hbase.util.Base64
import org.apache.hadoop.hbase.util.Bytes
# r is a Result from the scan
content = Bytes.toString(r.getValue(Bytes.toBytes("m"), Bytes.toBytes("d")))
# 8 = Base64::DONT_BREAK_LINES
x = Base64.encodeBytes(r.getRow(), 8)
puts x
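To see why the no-line-breaks option matters, here is a small sketch you can run in any Ruby, without HBase. It uses Ruby's built-in Array#pack rather than the HBase Base64 class, and the 60-byte row key is a made-up stand-in for what r.getRow() would return.

```ruby
# Stand-in for a binary row key; a real one would come from the scan.
row_key = (0...60).to_a.pack('C*')

# 'm' wraps the output: a newline after every 60 encoded characters.
wrapped = [row_key].pack('m')

# 'm0' emits a single line, the same idea as DONT_BREAK_LINES.
one_line = [row_key].pack('m0')

puts one_line
```

With one key per line, the log can be sorted, diffed, and grepped safely. A wrapped key would be split across two lines and break any line-based comparison.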

Tip # 4 – Use Gson to parse JSON efficiently across a scan.

import com.google.gson.JsonParser
parser = JsonParser.new
# r is a Result from the scan
jsonBody = Bytes.toString(r.getValue(Bytes.toBytes("d"), Bytes.toBytes("b")))

json = parser.parse(jsonBody)
object = json.getAsJsonObject()
# get() returns a JsonElement, so fetch the nested object explicitly
metaObj = object.getAsJsonObject('mx')
objVer = metaObj.get('vid').getAsString()
objId = object.get('id').getAsString()
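In a consistency check, some rows will inevitably hold malformed or incomplete JSON, and one bad row should not abort the whole scan. The sketch below shows that guard logic in plain Ruby with the standard json library instead of Gson, so it can be tried outside the HBase shell. The helper name extract_id_and_version is hypothetical; the field names mx, vid, and id are the ones used above.

```ruby
require 'json'

# Return [id, vid] from a JSON body shaped like the one above,
# or nil if the body is not valid JSON or a field is missing.
def extract_id_and_version(json_body)
  doc = JSON.parse(json_body)
  return nil unless doc.is_a?(Hash) && doc['mx'].is_a?(Hash)

  id = doc['id']
  vid = doc['mx']['vid']
  return nil if id.nil? || vid.nil?

  [id.to_s, vid.to_s]
rescue JSON::ParserError, TypeError
  nil
end
```

Inside the scan loop, a nil result could be logged together with the Base64 row key so the bad rows can be inspected afterwards.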

Tip # 5 – Run it as a script.

time /usr/iop/current/hbase-client/bin/hbase org.jruby.Main /tmp/check.rb > check.log
