Fix DB_RUNRECOVERY

I ran into this issue today; here is a fast fix.

[root@xyz ~]# yum
mut_tas:172, pid: 16043, flag: 19
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 – (-30973)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:

Error: rpmdb open failed
[root@xyz ~]#

rpm --rebuilddb

[root@xyz ~]# rpm --rebuilddb
[root@xyz ~]# yum
Loaded plugins: search-disabled-repos
You need to give some command
Usage: yum [options] COMMAND

Kerberos and Java

I worked on a Kerberos smoke test for my team and learned a few tips in the process.

Setting useTicketCache=true is preferred, in case the Java process dies while the KDC is down; on restart, the process can reuse the cached ticket rather than requesting a new one.
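A minimal sketch of a JAAS login section with the ticket cache enabled (the keytab path and principal below are hypothetical placeholders, not values from my cluster):

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/security/keytabs/service.keytab"
  principal="service/host.example.com@EXAMPLE.COM"
  useTicketCache=true;
};
```

Point the JVM at the file with -Djava.security.auth.login.config=/path/to/jaas.conf.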

HBase canary testing runs on a Kerberos-enabled cluster using hbase canary: http://hbase.apache.org/book.html#trouble.client

If you are port forwarding over SSH, you’ll want to switch Kerberos to TCP using a trick in your krb5.conf file. Thanks to IBM’s site, it’s an easy fix… https://www.ibm.com/support/knowledgecenter/SSEQTP_8.5.5/com.ibm.websphere.base.doc/ae/tsec_kerb_create_conf.html
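The setting in question is udp_preference_limit; setting it to 1 tells the Kerberos libraries to always try TCP before UDP, which is what an SSH port forward can carry. A sketch (the realm is a placeholder):

```
[libdefaults]
  default_realm = EXAMPLE.COM
  # Force TCP so Kerberos traffic can ride over an SSH port forward:
  udp_preference_limit = 1
```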

The Kerberos Java site describes in detail how to build a Kerberos client.

Forwarding DGram in node.js

For a project I am working on, I needed to rewrite a DGram port. I moved the ports around and put together a few quick tests.

Testing with NC

my-machine:~$ echo -n "data-message" | nc -v -4u -w1 localhost 88
found 0 associations
found 1 connections:
1: flags=82<CONNECTED,PREFERRED>
outif (null)
src 127.0.0.1 port 53862
dst 127.0.0.1 port 88
rank info not available
Connection to localhost port 88 [udp/radan-http] succeeded!

Rewriting incoming datagrams to another port

You can run the sample and get results like the following:

server listening 0.0.0.0:88
server got: j��0����
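The project code itself is node.js and isn’t shown here; as an illustration of the idea, below is a minimal sketch in Python of a server that receives datagrams on one port and rewrites them to another. The port numbers and the single-packet loop are for illustration only.

```python
import socket

def forward_datagrams(listen_port, dest_port, host="127.0.0.1", max_packets=1):
    """Receive UDP datagrams on listen_port and re-send them to dest_port."""
    inbound = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    inbound.bind((host, listen_port))
    outbound = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for _ in range(max_packets):
            data, addr = inbound.recvfrom(65535)   # one datagram per recvfrom
            print("server got %d bytes from %s" % (len(data), addr))
            outbound.sendto(data, (host, dest_port))
    finally:
        inbound.close()
        outbound.close()
```

The nc test above works the same way against this sketch: point nc at listen_port and watch the datagram arrive on dest_port.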

COS and Hadoop FS issue

I ran into this issue with Python and IBM Cloud Object Storage.

Py4JJavaError: An error occurred while calling o34.parquet.
: java.io.IOException: No FileSystem for scheme: cos
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:547)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:643)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

I applied a quick fix: pyspark --packages com.ibm.stocator:stocator:1.0.24
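For a more permanent fix, the cos scheme can be registered with Hadoop so the Stocator filesystem is picked up without the command-line flag. A sketch of the relevant properties (key names as documented in the Stocator README; they may vary by version), set in core-site.xml or as spark.hadoop.* configuration:

```
fs.stocator.scheme.list = cos
fs.cos.impl = com.ibm.stocator.fs.ObjectStoreFileSystem
fs.stocator.cos.impl = com.ibm.stocator.fs.cos.COSAPIClient
fs.stocator.cos.scheme = cos
```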

https://github.com/ibm-watson-data-lab/ibmos2spark/tree/master/python

https://blog.sicara.com/get-started-pyspark-jupyter-guide-tutorial-ae2fe84f594f
https://stackoverflow.com/questions/46011671/no-filesystem-for-scheme-cos



Kafka, Zookeeper… and Kerberos

My team runs a Kafka service for data ingestion, and we ran across a rare timeout when our primary Key Distribution Center (KDC) went down. When the ZooKeeper service restarted, it worked flawlessly; I verified the services with the ZooKeeper four-letter commands. However, the Kafka brokers' startup authentication with ZooKeeper failed, and the brokers went down.

We checked each system with the following:

echo ruok | nc localhost 2181
imok

echo stat | nc localhost 2181
Zookeeper version: 3.4.6-1569965, built on XXXXXX
Clients:
/10.10.10.10:3888[1](queued=0,recved=95261,sent=95261)

Latency min/avg/max: 0/0/316
Received: 1
Sent: 1
Connections: 1
Outstanding: 0
Zxid: 0x2100000000
Mode: follower
Node count: 200

We checked ZooKeeper with Kerberos/JAAS from the shell.

export JVMFLAGS="-Djava.security.auth.login.config=/usr/iop/current/kafka-broker/config/kafka_jaas.conf"

zookeeper-client -server `hostname --long`:2181 ls /

You’ll see a failover to the secondary server after about 90 seconds, followed by the final listing. This indicates that the client eventually fails over to the secondary KDC.

[root@kafka-1 ~]# export JVMFLAGS="-Djava.security.auth.login.config=/usr/iop/current/kafka-broker/config/kafka_jaas.conf"
[root@kafka-1 ~]# time zookeeper-client -server `hostname --long`:2181 ls /
Connecting to kafka-1:2181
2019-03-12 17:31:44,765 - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-IBM_4--1, built on 06/17/2016 01:58 GMT
2019-03-12 17:31:44,767 - INFO [main:Environment@100] - Client environment:host.name=kafka-1
2019-03-12 17:31:44,767 - INFO [main:Environment@100] - Client environment:java.version=1.8.0_77
2019-03-12 17:31:44,769 - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2019-03-12 17:31:44,769 - INFO [main:Environment@100] - Client environment:java.home=/usr/jdk64/java-1.8.0-openjdk/jre
2019-03-12 17:31:44,769 - INFO [main:Environment@100] - Client environment:java.class.path=/usr/iop/4.2.0.0/zookeeper/bin/../build/classes:/usr/iop/4.2.0.0/zookeeper/bin/../build/lib/*.jar:/usr/iop/4.2.0.0/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/iop/4.2.0.0/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/iop/4.2.0.0/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/usr/iop/4.2.0.0/zookeeper/bin/../lib/log4j-1.2.17.jar:/usr/iop/4.2.0.0/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/iop/4.2.0.0/zookeeper/bin/../zookeeper-3.4.6_IBM_4.jar:/usr/iop/4.2.0.0/zookeeper/bin/../src/java/lib/*.jar:/usr/iop/4.2.0.0/zookeeper/conf::/usr/iop/4.2.0.0/zookeeper/conf:/usr/iop/4.2.0.0/zookeeper/zookeeper-3.4.6_IBM_4.jar:/usr/iop/4.2.0.0/zookeeper/zookeeper.jar:/usr/iop/4.2.0.0/zookeeper/lib/jline-0.9.94.jar:/usr/iop/4.2.0.0/zookeeper/lib/log4j-1.2.17.jar:/usr/iop/4.2.0.0/zookeeper/lib/netty-3.7.0.Final.jar:/usr/iop/4.2.0.0/zookeeper/lib/slf4j-api-1.6.1.jar:/usr/iop/4.2.0.0/zookeeper/lib/slf4j-log4j12-1.6.1.jar:/usr/share/zookeeper/*
2019-03-12 17:31:44,769 - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-03-12 17:31:44,769 - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2019-03-12 17:31:44,770 - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2019-03-12 17:31:44,770 - INFO [main:Environment@100] - Client environment:os.name=Linux
2019-03-12 17:31:44,770 - INFO [main:Environment@100] - Client environment:os.arch=amd64
2019-03-12 17:31:44,770 - INFO [main:Environment@100] - Client environment:os.version=3.10.0-514.21.1.el7.x86_64
2019-03-12 17:31:44,770 - INFO [main:Environment@100] - Client environment:user.name=root
2019-03-12 17:31:44,770 - INFO [main:Environment@100] - Client environment:user.home=/root
2019-03-12 17:31:44,770 - INFO [main:Environment@100] - Client environment:user.dir=/root
2019-03-12 17:31:44,771 - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=kafka-1:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@68de145
2019-03-12 17:31:44,875 - INFO [main-SendThread(kafka-1:2181):Login@327] - successfully logged in.
2019-03-12 17:31:44,878 - INFO [Thread-0:Login$1@156] - TGT refresh thread started.
2019-03-12 17:31:44,882 - INFO [main-SendThread(kafka-1:2181):ZooKeeperSaslClient$1@285] - Client will use GSSAPI as SASL mechanism.
2019-03-12 17:31:44,892 - INFO [Thread-0:Login@335] - TGT valid starting at: Tue Mar 12 17:31:44 UTC 2019
2019-03-12 17:31:44,892 - INFO [Thread-0:Login@336] - TGT expires: Wed Mar 13 17:31:44 UTC 2019
2019-03-12 17:31:44,892 - INFO [Thread-0:Login$1@210] - TGT refresh sleeping until: Wed Mar 13 12:45:17 UTC 2019
2019-03-12 17:31:44,894 - INFO [main-SendThread(kafka-1:2181):ClientCnxn$SendThread@975] - Opening socket connection to server kafka-1/192.168.1.1:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2019-03-12 17:31:44,952 - INFO [main-SendThread(kafka-1:2181):ClientCnxn$SendThread@852] - Socket connection established to kafka-1/192.168.1.1:2181, initiating session
2019-03-12 17:31:44,966 - INFO [main-SendThread(kafka-1:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server kafka-1/192.168.1.1:2181, sessionid = 0x16972ce2b3f002b, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

WATCHER::

WatchedEvent state:SaslAuthenticated type:None path:null
[controller_epoch, controller, brokers, zookeeper, kafka-acl, kafka-acl-changes, admin, isr_change_notification, consumers, config]

real 0m1.328s
user 0m0.573s
sys 0m0.102s

We removed the down KDC from our /etc/krb5.conf file. (We eventually added it back once that server was restarted.) When I executed the same command as above, the system returned to operation, with a much shorter time to get a ticket and authorize our services on startup.

2019-03-12 17:33:09,364 - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=kafka-1:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@68de145
2019-03-12 17:33:09,460 - INFO [main-SendThread(kafka-1:2181):Login@327] - successfully logged in.
2019-03-12 17:33:09,462 - INFO [Thread-0:Login$1@156] - TGT refresh thread started.
2019-03-12 17:33:09,465 - INFO [main-SendThread(kafka-1:2181):ZooKeeperSaslClient$1@285] - Client will use GSSAPI as SASL mechanism.
2019-03-12 17:33:09,477 - INFO [Thread-0:Login@335] - TGT valid starting at: Tue Mar 12 17:33:09 UTC 2019
2019-03-12 17:33:09,478 - INFO [Thread-0:Login@336] - TGT expires: Wed Mar 13 17:33:09 UTC 2019
2019-03-12 17:33:09,478 - INFO [Thread-0:Login$1@210] - TGT refresh sleeping until: Wed Mar 13 13:27:51 UTC 2019
2019-03-12 17:33:09,479 - INFO [main-SendThread(kafka-1:2181):ClientCnxn$SendThread@975] - Opening socket connection to server kafka-1/192.168.1.1:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2019-03-12 17:33:09,536 - INFO [main-SendThread(kafka-1:2181):ClientCnxn$SendThread@852] - Socket connection established to kafka-1/192.168.1.1:2181, initiating session
2019-03-12 17:33:09,554 - INFO [main-SendThread(kafka-1:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server kafka-1/192.168.1.1:2181, sessionid = 0x16972ce2b3f002c, negotiated timeout = 30000

real 0m0.718s
user 0m0.573s
sys 0m0.102s

I hope this helps you get your service back up and working.

Running a Long-Running Thread in the HBase Shell

I had to write a quick-and-dirty script to process data into HBase, and I knew there was a likelihood of disconnection. Here is a small tip in case you get disconnected: you can use nohup along with the hbase shell. I hope it helps you.

#constants
LOG_FILE_OUT=/var/log/logged-action.log
LOG_FILE_ERR=/var/log/logged-action.err

# Create a ruby file
cat << EOF > test.rb
include Java
print("Starting the Export")
import java.lang.Thread
Thread.sleep(10000)
print("\ndone waiting")
STDOUT.flush
EOF

nohup /usr/iop/current/hbase-client/bin/hbase shell -n \
test.rb "${@}" > ${LOG_FILE_OUT} 2> ${LOG_FILE_ERR} &

Tracking down RPM install dates/reasons/who

I had to find the date of an RPM install to track the lineage of an RPM, and I found two very helpful commands. The first, rpm -q basesystem --qf '%{installtime:date}\n' (found thanks to StackExchange), was exceptionally helpful for placing the date/time of an installed RPM. The second, rpm -qi basesystem, provided excellent additional details.

The RPM query returned the UTC time of the installed RPM:

[root@vm ~]# rpm -q basesystem --qf '%{installtime:date}\n'
Thu 03 Mar 2016 04:22:55 PM UTC

To get further details on install times and histories, I can see the ordering with yum history list all. I was able to pinpoint who installed what, and at what time. The lineage was critical to trace back to the automation that kicked off the installation. (Note: I use basesystem as an example, but it can also be used to estimate the time the OS was installed.)

[paul@vm ~]# yum history list all
Loaded plugins: search-disabled-repos
ID | Login user | Date and time | Action(s) | Altered
-------------------------------------------------------------------------------
363 | System <unset> | 2019-01-09 19:53 | Install | 2
362 | System <unset> | 2019-01-07 19:10 | Erase | 2
361 | System <unset> | 2019-01-07 19:04 | Install | 1
...
3 | cloud | 2018-05-14 23:45 | Install | 1
2 | cloud | 2018-05-14 23:45 | O, U | 8
1 | cloud | 2018-05-14 23:45 | Install | 1

Jupyter Notebook: Email Analysis to a Lotus Notes View

I wanted to do an analysis of my emails since I joined IBM and see the flow of messages in and out of my inbox.

With my preferences for Jupyter Notebooks, I built a small notebook for analysis.

Steps
Open IBM Lotus Notes Rich Client

Open the Notes Database with the View you want to analyze.

Select the view you are interested in, for instance the All Documents view. (My inbox is obfuscated on purpose.)

Click File > Export

Enter a file name – email.csv

Select Format “Comma Separated Value”

Click Export

Upload the Notebook to your Jupyter server

The notebook describes the flow through my process. If you encounter ValueError: (‘Unknown string format:’, ’12/10/2018 08:34 AM’), you can refer to https://stackoverflow.com/a/8562577/1873438. If non-ASCII characters break the import, clean the file first with iconv:

iconv -c -f utf-8 -t ascii email.csv > email.csv.clean

You can break the data into year-month-day analysis with the following, and peek at the results with df_emailA.head()
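The notebook cell itself isn’t shown here; a minimal sketch of the date-splitting step, assuming the exported CSV has a Date column (the column name and format are assumptions based on my export):

```python
import pandas as pd

def split_dates(df, date_col="Date"):
    """Parse the date column and add year/month/day columns for grouping."""
    parsed = pd.to_datetime(df[date_col])
    out = df.copy()
    out["year"] = parsed.dt.year
    out["month"] = parsed.dt.month
    out["day"] = parsed.dt.day
    return out

# e.g. df_emailA = split_dates(pd.read_csv("email.csv.clean"))
```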

When you run the final cell, the code generates a Year-Month-Day count as a bar graph.

    # Title: Volume of emails sent, by date.
    # Plots volume based on the year/month/day columns;
    # only years with data appear in the grouping.
    # A bar graph is used so the YYYY-MM-DD groups can be read.
    plt.rcParams['figure.figsize'] = [20, 200]
    y_m_df = df_emailA.groupby(['year','month','day']).year.count()
    y_m_df.plot(kind="bar")

    plt.title('Emails submitted by YYYY-MM-DD')
    plt.xlabel('Year-Month-Day')
    plt.ylabel('Email Count')
    plt.autoscale(enable=True, axis='both', tight=False)

You’ll see the trend of emails I receive over the years.

Trends of Email

LSLogins Quick Summary

I needed a summary of my DataStage admin ID to analyze a login problem. lslogins was super helpful: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/sec-displaying_comprehensive_user_information

[root@myserver ~]# lslogins dsadm --time-format=iso
Username: dsadm
UID: 1120
Gecos field:
Home directory: /home/dsadm
Shell: /bin/bash
No login: no
Password is locked: no
Password not required: no
Login by password disabled: no
Primary group: dstage
GID: 1119
Supplementary groups: db2iadm
Supplementary group IDs: 1117
Hushed: no
Password expiration warn interval: 7
Password changed: 2018-12-31
Minimum change time: 7
Maximum change time: 90
Running processes: 3

Last logs:
2018-12-11T16:51:03+0000 su[7327]: pam_unix(su-l:session): session opened for user dsadm by myuserid(uid=1120)
2018-12-11T16:51:06+0000 su[7327]: pam_unix(su-l:session): session closed for user dsadm
2018-12-21T18:30:02+0000 crontab[25267]: (dsadm) AUTH (crontab command not allowed)

DataStage Randomly Locked Out

Suddenly, my DataStage pipeline stopped working. I hit this error:

DB2_Connector_2: [Input link 0] SQLConnect reported: SQLSTATE = 42724: Native Error Code = -10,013: Msg = [IBM][CLI Driver] SQL10013N The specified library "GSKit Error: 408" could not be loaded. SQLSTATE=42724 (CC_DB2Connection::connect, file CC_DB2Connection.cpp, line 856)

The error was due to a permission change on our SSL credential files (p12/jks).
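A quick way to check for this kind of breakage is to inspect and restore permissions on the keystore files. Keystore locations vary by install, so the sketch below uses a temporary file as a stand-in for the real .p12/.jks path:

```shell
# Stand-in for the real .p12/.jks keystore; substitute your actual path.
KEYSTORE=$(mktemp)
chmod 000 "$KEYSTORE"   # simulate the permission change that broke the connector
ls -l "$KEYSTORE"       # diagnose: no read permission for the service user
chmod 640 "$KEYSTORE"   # restore owner read/write and group read
ls -l "$KEYSTORE"
rm -f "$KEYSTORE"
```

Make sure the user running the DataStage engine (or the DB2 client) owns or can read the keystore after any permission change.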