Saturday, September 26, 2015

Docker for Bioinformatics and Genetics - Part 1

Docker Tutorial 1: Playing with Docker to deploy genetics software

This is a tutorial on using Docker to set up bioinformatics and genetics analysis tools.

1. First, let's install Docker

The installation procedure can be found in Docker's official user guide for Ubuntu.
I am running Ubuntu Server 14.04.3 LTS in VMware Fusion Pro. First confirm the Linux kernel version with uname -r; mine returns 3.19.0-25-generic, which is fine since Docker requires kernel 3.10 or higher.
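The kernel check above can also be scripted. The following is a minimal sketch in POSIX shell; kernel_at_least is a hypothetical helper name introduced here for illustration, and it only compares the major and minor numbers of the release string.

```shell
# kernel_at_least RELEASE MAJOR MINOR
# Hypothetical helper (not part of Docker): succeeds if the kernel release
# string is at least MAJOR.MINOR.
kernel_at_least() {
    release=$1                  # e.g. "3.19.0-25-generic", as printed by uname -r
    want_major=$2
    want_minor=$3
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of what follows
    [ "$major" -gt "$want_major" ] && return 0
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# Check the running kernel against the 3.10 requirement.
if kernel_at_least "$(uname -r)" 3 10; then
    echo "kernel is new enough for Docker"
else
    echo "kernel is older than 3.10"
fi
```

Passing the release string as an argument, rather than calling uname inside the function, makes it easy to try the check against values from other machines.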
curl is preinstalled on my system; if it is missing on yours, install it with:
sudo apt-get install curl
Docker itself is installed in one step:
curl -sSL https://get.docker.com/ | sh
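Piping a downloaded script straight into sh runs it before you have seen it. If you would rather read the script first, a more cautious variant might look like the sketch below. The helper names require_cmd and fetch_installer and the file name install-docker.sh are my own choices for illustration, not anything Docker provides.

```shell
# require_cmd NAME
# Hypothetical helper: succeeds if the named program is on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# fetch_installer URL DEST
# Hypothetical helper: saves the install script to a file instead of
# piping it to sh, so it can be inspected before it is run.
fetch_installer() {
    require_cmd curl || {
        echo "curl not found; run: sudo apt-get install curl" >&2
        return 1
    }
    # -f fails on HTTP errors, -sS is quiet but still shows errors, -L follows redirects
    curl -fsSL -o "$2" "$1"
}

# Usage (not executed here because it needs network access and root):
#   fetch_installer https://get.docker.com/ install-docker.sh
#   less install-docker.sh      # read what the script will do
#   sh install-docker.sh
```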
Verify the installation from the log messages; the tail of the output looks like this:
...
cgroup-lite start/running
Setting up docker-engine (1.8.2-0~trusty) ...
docker start/running, process 3512
Processing triggers for libc-bin (2.19-0ubuntu6.6) ...
Processing triggers for ureadahead (0.100.0-16) ...
+ sudo -E sh -c docker version
Client:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64

If you would like to use Docker as a non-root user, you should now consider
adding your user to the "docker" group with something like:

  sudo usermod -aG docker psytky03

Remember that you will have to log out and back in for this to take effect!
Add your user to the docker group so that Docker can be run without sudo:
sudo usermod -aG docker psytky03
Log out and back in again for the change to take effect.
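After logging back in, you can confirm that the group change took effect before trying any docker commands. A small sketch, assuming the standard id utility is available; in_group is a hypothetical helper name.

```shell
# in_group "GROUP LIST" GROUP
# Hypothetical helper: succeeds if GROUP appears as a whole word in a
# space-separated group list such as the output of id -nG.
in_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check the groups of the current login session.
if in_group "$(id -nG)" docker; then
    echo "current user is in the docker group"
else
    echo "not in the docker group yet (log out and back in after usermod)"
fi
```

Padding both sides with spaces means a group called dockerusers would not be mistaken for docker.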
Run the hello-world test image:
docker run hello-world
Here is the output:
Hello from Docker.
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker Hub account:
 https://hub.docker.com

For more examples and ideas, visit:
 https://docs.docker.com/userguide/
docker run -it ubuntu bash

2. Sign up for a Docker Hub account

Docker Hub is similar in concept to GitHub: it is a registry for pushing and pulling Docker images.
https://hub.docker.com/login/
My username here is "psytky03".

3. Make the first Dockerfile

A Dockerfile is the blueprint that tells Docker how to create an image. It contains the basic information such as which base image to start from, the full set of commands for installing the software, the system path, etc.
Here I am going to install two tools: the latest Eigensoft, ver 6.01, for principal component analysis (PCA), and Plink ver 1.9.
The contents of this Dockerfile look like this:
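# "ubuntu" with no tag means ubuntu:latest, which was 14.04 when this was written.
# Pin the tag (FROM ubuntu:14.04) if a rebuild must find the same packages.
FROM ubuntu

MAINTAINER Psytky03

# RUN commands already execute as root during the build, so sudo is not needed.
RUN apt-get update
RUN apt-get -y install wget git unzip python

# Eigensoft: runtime libraries, then the repository, whose bin/ directory goes on PATH below
RUN apt-get -y install libgsl0ldbl gfortran-4.4
RUN git clone https://github.com/DReichLab/EIG.git

# Plink 1.9: download the Linux binary and unpack it into /plinkbin
RUN wget https://www.cog-genomics.org/static/bin/plink150903/plink_linux_x86_64.zip
RUN unzip plink_linux_x86_64.zip -d plinkbin

# Make both tools callable by name and create /data as a mount point
ENV PATH $PATH:/EIG/bin:/plinkbin
RUN mkdir data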

4. Build the Docker image

mkdir my_first_docker_image
cd my_first_docker_image/

nano Dockerfile
# Copy and paste the contents listed in section 3

docker build -t psytky03/eigandplink .
This takes a while. When it finishes, the log reports that the image was built without error:
Successfully built 87f7aefa7968
Check the built image with the docker images command:
docker images 
------------------------------------------------------------------------------------------
REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
psytky03/eigandplink   latest              6fd94c3b6e6e        3 minutes ago       392.7 MB
ubuntu                 latest              91e54dfb1179        5 weeks ago         188.4 MB
hello-world            latest              af340544ed62        7 weeks ago         960 B
The size of this image is 392.7 MB.
Verify that Eigensoft and Plink run:
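If you want to pick the size out of that listing in a script, one option is to split the columns on runs of two or more spaces, since values such as "3 minutes ago" contain single spaces. This is only a sketch against the column layout shown above; image_size is a hypothetical helper name, and the layout may differ in other Docker versions.

```shell
# image_size REPOSITORY
# Hypothetical helper: reads docker images output on stdin and prints the
# fifth column (the size) of every row whose first column equals REPOSITORY.
image_size() {
    awk -F '  +' -v repo="$1" '$1 == repo { print $5 }'
}

# Usage (needs Docker):
#   docker images | image_size psytky03/eigandplink
```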
docker run psytky03/eigandplink plink --help
docker run psytky03/eigandplink eigenstrat

# To get a shell inside a container started from the image
docker run -it psytky03/eigandplink bash

# To list all containers, running and stopped
docker ps -a
Run Plink with data on the host machine
# First let's grab some test files shipped with Plink

wget https://www.cog-genomics.org/static/bin/plink150903/plink_linux_x86_64.zip
unzip plink_linux_x86_64.zip -d plink1.9
Now we use docker run -v to mount the folder on the host machine as the /data folder in the container:
docker run -v /home/psytky03/plink1.9:/data psytky03/eigandplink \
plink --file data/toy --make-bed --out data/test 
Check the plink1.9 folder and you should see the test.bed, test.bim and test.fam files.
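The mount-and-check step can be wrapped in two small functions. This is a sketch under a few assumptions: run_plink_toy and plink_outputs_present are hypothetical names, the image is the psytky03/eigandplink image built in this section, and I have added --rm so that the stopped container does not pile up in docker ps -a.

```shell
# plink_outputs_present DIR PREFIX
# Hypothetical helper: succeeds if PREFIX.bed, PREFIX.bim and PREFIX.fam
# all exist in DIR; names the first missing file on stderr otherwise.
plink_outputs_present() {
    for ext in bed bim fam; do
        [ -f "$1/$2.$ext" ] || { echo "missing: $1/$2.$ext" >&2; return 1; }
    done
}

# run_plink_toy DIR
# Hypothetical wrapper around the docker run command shown above.
# DIR is resolved to an absolute path because -v expects an absolute host
# path; --rm removes the container once plink exits.
run_plink_toy() {
    host_dir=$(cd "$1" && pwd) || return 1
    docker run --rm -v "$host_dir":/data psytky03/eigandplink \
        plink --file /data/toy --make-bed --out /data/test
}

# Usage (needs Docker and the image built in this section):
#   run_plink_toy ~/plink1.9 && plink_outputs_present ~/plink1.9 test
```

Because the files are written through the mount, they remain on the host after the container is removed.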

5. Push the image to Docker Hub

docker login
#Login Succeeded

docker push psytky03/eigandplink

6. Pull the image on another Linux machine

docker pull psytky03/eigandplink