The process is actually quite simple, but the installation docs don't give you the right hints. I found posts online from others who had battled with this, so I'm sharing the steps here in case anyone else gets tripped up.
1. Install Docker. Create a docker group, add yourself to it, then log out and back in. Verify you can run the "hello-world" sample image as yourself.
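The group setup boils down to the following (a sketch of the standard post-install steps; assumes a stock Ubuntu system with sudo):

```shell
# One-time Docker group setup.
sudo groupadd docker 2>/dev/null || true   # group may already exist
sudo usermod -aG docker "$USER"            # add yourself to the docker group
# Log out and back in so the new group membership takes effect, then verify:
docker run hello-world
```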
2. Run the TensorFlow GPU image:
$ docker run -it -p 8888:8888 gcr.io/tensorflow/tensorflow:latest-gpu
3. Visit http://172.17.0.1:8888 and verify you can execute the "hello tensorflow" samples. These will run without the GPU; you'll see errors on the console about the CUDA libraries not being found. Stop the Docker container.
4. Install the latest NVIDIA binary display driver for your system. The simplest way is through the Software & Updates GUI, under Additional Drivers: select "Using NVIDIA binary driver," apply the changes, and restart. You can verify you're running the NVIDIA driver by launching nvidia-settings from the command line.
5. Install CUDA. On 16.04 the easiest way to do this is directly from apt:
$ sudo apt-get install nvidia-cuda-dev nvidia-cuda-toolkit
6. Install cuDNN v4, which must be installed manually. See instructions here. (You need to register for an NVIDIA developer account.)
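The manual install is essentially unpacking the archive and copying the header and libraries into your CUDA tree. A sketch, assuming a hypothetical archive name (yours will differ by version) and the usual /usr/local/cuda layout; note that the apt-installed toolkit from step 5 puts files under /usr instead, so adjust the destination paths to match:

```shell
# Hypothetical archive name: use the file you downloaded from NVIDIA.
CUDNN_TAR="cudnn-7.0-linux-x64-v4.0-prod.tgz"
if [ -f "$CUDNN_TAR" ]; then
  tar xzf "$CUDNN_TAR"
  # Copy the header and libraries into the CUDA install, then make them readable.
  sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
  sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
  sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
else
  echo "Download $CUDNN_TAR from the NVIDIA developer site first." >&2
fi
```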
7. Run the TensorFlow GPU image again, but this time give it access to the CUDA devices at /dev/nvidia*. The easiest way to do this is with a script. The one the TensorFlow docs reference doesn't work on 16.04, so use mine.
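The core of such a script is just enumerating the device nodes and turning each into a --device flag. A minimal sketch (it prints the assembled command as a dry run; drop the echo to actually launch the container):

```shell
#!/bin/sh
# Build a --device flag for each NVIDIA device node present
# (/dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm, ...).
devices=""
for d in /dev/nvidia*; do
  [ -e "$d" ] && devices="$devices --device $d:$d"
done

# Dry run: print the full command instead of executing it.
cmd="docker run -it$devices -p 8888:8888 gcr.io/tensorflow/tensorflow:latest-gpu"
echo "$cmd"
```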
8. Visit http://172.17.0.1:8888 again. This time when you run the samples you shouldn't see any CUDA errors.