Spark and Data Tips for November 2018

Full Hadoop / HBase data platform for testing Spark

I found the following Docker image very handy for testing Hadoop.
docker pull bigdatauniversity/spark2
docker run -it --name bdu_spark2 -P -p 4040:4040 -p 4041:4041 -p 8080:8080 -p 8081:8081 bigdatauniversity/spark2:latest /etc/ -bash
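
Once the container is up, it can help to confirm that the published ports are actually reachable from the host. The following is a minimal sketch using only the Python standard library. The role labels are Spark's default port assignments and are assumed (not verified) to apply inside this image; `port_open` and `check_ports` are hypothetical helper names.

```python
import socket

# Ports published by the docker run command above. The labels are Spark's
# default port roles; whether this image uses them that way is an assumption.
SPARK_PORTS = {
    4040: "application UI (first SparkContext)",
    4041: "application UI (second SparkContext)",
    8080: "standalone master UI",
    8081: "standalone worker UI",
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host="localhost", ports=SPARK_PORTS):
    """Map each port number to whether something is listening on it."""
    return {port: port_open(host, port) for port in ports}


if __name__ == "__main__":
    for port, is_up in check_ports().items():
        state = "open" if is_up else "closed"
        print(f"{port} ({SPARK_PORTS[port]}): {state}")
```

A port reported as closed may simply mean that no Spark application or daemon has started yet; the 4040 UI, for example, only exists while a SparkContext is running.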

Spark Notebooks

I found these sites useful – and and

Version Mismatch

If you hit this error:

Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions

It’s a quick fix: set PYSPARK_PYTHON so the workers run the same Python interpreter as the driver.

export PYSPARK_PYTHON=/usr/local/bin/python3
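
The same setting can also be applied from inside a script or notebook, as long as it happens before the SparkContext or SparkSession is created. This is a minimal sketch, assuming local mode or a cluster where the same interpreter path exists on every worker; `pin_worker_python` is a hypothetical helper name, not part of PySpark.

```python
import os
import sys


def pin_worker_python(interpreter=None):
    """Set PYSPARK_PYTHON so PySpark workers use the given interpreter.

    Defaults to the interpreter running this script (the driver), which
    keeps driver and worker versions identical. Call this before any
    SparkContext or SparkSession is created.
    """
    path = interpreter or sys.executable
    os.environ["PYSPARK_PYTHON"] = path
    return path


worker_python = pin_worker_python()
print("Workers will use:", worker_python)
print("Driver is Python %d.%d" % sys.version_info[:2])
```

Passing an explicit path, such as `pin_worker_python("/usr/local/bin/python3")`, mirrors the export line above.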

Thanks to


A handy list of kernels

I used the Python 3 kernel.

pip3 install spylon-kernel

I also played with Toree.

pip3 install toree
jupyter toree install
