Few usability improvements:
- Added a proxy to enable viewing worker logs
- Removed seperate webui service
- Modified Zeppelin and spark-ui services to be Loadbalancers
- Changed pyspark example to be platform agnostic
- Improved kubectl context setup
- Minor grammar/flow fixes
This adds a very basic Zeppelin image that works with the existing
Spark example. As can be seen from the documentation, it has a couple
of warts:
* It requires kubectl port-forward (which is unstable across long
periods of time, at least for me, on this app, bug incoming). See
* I needed to roll my own container (none of the existing containers
exactly matched needs, or even built anymore against modern Zeppelin
master, and the rest of the example is Spark 1.5).
The image itself is *huge*. One of the further refinements we need to
look at is how to possibly strip the Maven build for this container
down to just the interpreters we care about, because the deps here
are frankly ridiculous.
This might be a case where, if possible, we might want to open an
upstream request to build things dynamically, then use something like
probably the cut the image down considerably. (This might already be
possible, need to poke at whether you can late-bind interpreters
later.)
* Pod -> ReplicationController, which also forced me to hack around
hostname issue on the master. (Spark master sees the incoming slave
request to spark-master and assumes it's not meant for it, since it's
name is spark-master-controller-abcdef.)
* Remove service env dependencies (depend on DNS instead).
* JSON -> YAML.
* Add GCS connector.
* Make example do something actually useful: A familiar example to
anyone at Google, implement wordcount of all of Shakespeare's works.
* Fix a minor service connection issue in the gluster example.
mention the spark cluster is standalone
add detailed master & worker instructions
add method to get master status
add links option for master status
add links option for worker status
add example use of cluster
add source location