This is the second attempt. The previous one was reverted while we figured
out the regional mirrors (oops).
New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest mirror. To
publish an image, push to k8s-staging.gcr.io and it will be synced to the
regional mirrors automatically (similar to today). For now, staging is an
alias for gcr.io/google_containers (the legacy URL).
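Day to day, the flow would look something like this sketch; the image name
and tag are hypothetical, and the commands assume a stock docker CLI:

    # Consumers pull through the facade, which resolves to the nearest mirror.
    docker pull k8s.gcr.io/pause:3.1

    # Publishers push to staging; the sync to the regional mirrors happens
    # automatically.
    docker tag pause:3.1 k8s-staging.gcr.io/pause:3.1
    docker push k8s-staging.gcr.io/pause:3.1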
When we move off of Google-owned projects (working on it), we do a one-time
sync and change the Google-internal config; nobody outside should notice.
We can, in parallel, change the auto-sync into a manual sync: send a PR
to "promote" something from staging, and a bot activates it. Nice and
visible, and easy to keep track of.
A few usability improvements (a usage sketch follows the list):
- Added a proxy to enable viewing worker logs
- Removed separate webui service
- Modified Zeppelin and spark-ui services to be LoadBalancers
- Changed pyspark example to be platform agnostic
- Improved kubectl context setup
- Minor grammar/flow fixes
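Roughly how these land for a user, as a non-authoritative sketch; the
context, pod, and service names are assumptions, and the proxy URL follows
the apiserver proxy path conventions of this era:

    # Point kubectl at the right cluster (context name assumed).
    kubectl config use-context my-spark-cluster

    # View a worker's logs through the apiserver proxy (pod name and UI
    # port are illustrative).
    kubectl proxy --port=8001 &
    curl http://localhost:8001/api/v1/proxy/namespaces/default/pods/spark-worker-abcde:8081/

    # With LoadBalancer services, the external IPs show up directly.
    kubectl get svc zeppelin spark-ui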
Search and replace for references to moved examples
Reverted find-and-replace paths on auto-gen docs
Reverting changes to changelog
Fix bugs in test-cmd.sh
Fixed path in examples README
Ran update-all successfully
Updated verify-flags exceptions to include renamed files
This adds a very basic Zeppelin image that works with the existing
Spark example. As can be seen from the documentation, it has a couple
of warts:
* It requires kubectl port-forward (which is unstable across long
periods of time, at least for me, on this app; bug incoming). See the
sketch after this list.
* I needed to roll my own container (none of the existing containers
exactly matched our needs, or even built anymore against modern Zeppelin
master, and the rest of the example is Spark 1.5).
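For reference, the workflow that first bullet describes is roughly the
following; the pod name and ports are illustrative:

    # Forward the Zeppelin UI to localhost (pod name is illustrative).
    kubectl port-forward zeppelin-controller-abcde 8080:8080
    # Then browse http://localhost:8080/ until the tunnel inevitably drops.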
The image itself is *huge*. One of the further refinements we need to
look at is stripping the Maven build for this container down to just the
interpreters we care about, because the deps here are frankly ridiculous.
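One possible shape for that refinement, purely as a sketch: Maven's
-pl/-am flags can limit a reactor build to selected modules, though the
Zeppelin module names below are assumptions and the real list would need
checking:

    # Build only the server and the Spark interpreter (module names assumed),
    # pulling their dependencies in with -am ("also make").
    mvn clean package -DskipTests -pl zeppelin-server,spark -am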
This might be a case where, if possible, we might want to open an
upstream request to build interpreters dynamically, which could cut the
image down considerably. (This might already be possible; need to poke
at whether you can late-bind interpreters.)
* Pod -> ReplicationController, which also forced me to hack around a
hostname issue on the master. (The Spark master sees the incoming slave
request addressed to spark-master and assumes it's not meant for it,
since its name is spark-master-controller-abcdef; see the sketch after
this list.)
* Remove service env dependencies (depend on DNS instead).
* JSON -> YAML.
* Add GCS connector.
* Make the example do something actually useful: implement wordcount
over all of Shakespeare's works (a familiar example to anyone at Google).
* Fix a minor service connection issue in the gluster example.
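For the record, the hostname workaround and the DNS-based wiring amount
to something like this sketch; the paths and flags are assumptions
against Spark 1.5 and may differ:

    # Master start script: alias the service name to this pod's IP so the
    # master accepts connections addressed to "spark-master".
    echo "$(hostname -i) spark-master" >> /etc/hosts
    /opt/spark/sbin/start-master.sh --ip spark-master

    # Workers find the master via the Service's DNS name instead of the
    # SPARK_MASTER_* service env vars.
    /opt/spark/sbin/start-slave.sh spark://spark-master:7077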