22-Apr-2018
v1.3.0.b1
- Source: https://github.com/sbamin/gdc-client
- Docker: https://hub.docker.com/r/sbamin/gdc-client
- Based on the `gdc-client` Ubuntu x64 binary at https://gdc.cancer.gov/access-data/gdc-data-transfer-tool
- On the host machine (where you've installed `docker`): create, or preferably mount, an external disk to store sequence data, e.g., `/mnt/myscratch/`.
  To avoid potential issues with file permissions on the host machine, avoid mounting your entire home directory or a path to critical data as a docker shared volume. I usually prefer creating a new directory for the shared volume.
- Download the GDC manifest file in tsv/txt format for open-access files and keep it in the shared directory.
- Save the GDC token file in the same directory and run `chmod 600 gdc_token.key`.
- IMPORTANT: Make sure to pass valid user and group environment variables (identical to those on the host machine) in the `docker run` command; otherwise, stored files may inherit root or some other unexpected user ownership.
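The preparation steps above can be sketched as two small shell helpers. This is only an illustration: the function names `prepare_share` and `host_identity_flags` are hypothetical and not part of this image. The second helper derives all four values from `id`, an alternative to the `$USER`/`$UID` variables that does not depend on the bash-only `$UID` being set.

```shell
# Sketch only; the function names and the default token name are illustrative.

# Create a dedicated shared directory and restrict the token to its owner.
prepare_share() {
    share_dir="$1"
    token="${share_dir}/${2:-gdc_token.key}"

    mkdir -p "$share_dir" || return 1
    if [ ! -f "$token" ]; then
        echo "token not found: ${token}" >&2
        return 2
    fi
    chmod 600 "$token"
}

# Print the four identity flags for the current user, using id(1) throughout
# so the values do not depend on $USER or the bash-only $UID being set.
host_identity_flags() {
    flags="-e HOSTUSER=$(id -un) -e HOSTGROUP=$(id -gn)"
    flags="${flags} -e HOSTUSERID=$(id -u) -e HOSTGROUPID=$(id -g)"
    printf '%s\n' "$flags"
}
```

For example, `prepare_share /mnt/myscratch` could be run once after copying the token, and `$(host_identity_flags)` could stand in for the four `-e` options of `docker run`. The unquoted substitution relies on word splitting, so it assumes user and group names without spaces.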
```shell
## path where data will be stored on the host machine
export USERMOUNT="/mnt/myscratch"
cd "${USERMOUNT}"

## MAKE SURE TO GIVE PROPER USER AND GROUP IDs, matching to those of host machine
docker run -d \
  -e HOSTUSER=$USER -e HOSTGROUP=$(id -gn $USER) \
  -e HOSTUSERID=$UID -e HOSTGROUPID=$(id -g $USER) \
  -v "${USERMOUNT}":/scratch \
  sbamin/gdc-client \
  "gdc-client download --log-file=download.log -n 4 -t gdc_token.key -m controlled_manifest.tsv"
```
Here, we run docker in daemon mode and mount the `/mnt/myscratch` directory on the host machine (supply the full path, not a relative one) to the `/scratch` location within the docker container. Then we start `gdc-client download` with 4 threads and fetch controlled-access data listed in the downloaded manifest, using the download token. For logging, `-v` does not seem to work, so we use `--log-file=download.log` to save the log file in the mounted host volume.
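Because the container only sees the mounted directory, the command refers to the manifest by its bare file name rather than by a host path. A minimal sketch of a pre-flight check, assuming the manifest sits directly inside the mounted directory (`manifest_arg` is a hypothetical helper, not something shipped with this image):

```shell
# Hypothetical helper: turn a host-side manifest path into the bare file name
# that the container can resolve inside its mounted directory.
manifest_arg() {
    mount_dir="${1%/}"      # strip one trailing slash, if any
    manifest_path="$2"

    # The manifest must sit directly inside the mounted directory
    if [ "$(dirname "$manifest_path")" != "$mount_dir" ]; then
        echo "manifest must be directly inside ${mount_dir}" >&2
        return 1
    fi
    if [ ! -f "$manifest_path" ]; then
        echo "manifest not found: ${manifest_path}" >&2
        return 1
    fi

    # Print only the file name, which is what -m should receive
    basename "$manifest_path"
}
```

With the layout used here, `manifest_arg /mnt/myscratch /mnt/myscratch/controlled_manifest.tsv` would print `controlled_manifest.tsv`, provided that file exists; a path outside the mount makes the function return non-zero before any container is started.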
- For the `-m` flag, specify only the name of the manifest file, e.g., `controlled_manifest.tsv`, and not the whole path. The docker container starts with the working directory `/scratch`, which is mapped to `/mnt/myscratch` on the host machine. So, the container looks for `controlled_manifest.tsv` in the mounted `/scratch` directory, i.e., at `/mnt/myscratch/controlled_manifest.tsv` on the host machine.
- For controlled-access data, avoid using daemon mode (the `-d` flag) before testing that the API token, `gdc_token.key`, is working; otherwise you may end up sending too many login requests and get blocked from further data access.
- In case of authorization failure, kill the docker container using the `docker kill <container ID>` command. Find the container ID by listing running docker processes: `docker ps`.
- For open-access data, remove `-t gdc_token.key` and replace `-m controlled_manifest.tsv` with `-m open_manifest.tsv`.
```shell
## path where data will be stored on the host machine
export USERMOUNT="/fastscratch/foo/dump/gdc/test"
cd "${USERMOUNT}"

## MAKE SURE TO GIVE PROPER USER AND GROUP IDs, matching to those of host machine
docker run -d \
  -e HOSTUSER=$USER -e HOSTGROUP=$(id -gn $USER) \
  -e HOSTUSERID=$UID -e HOSTGROUPID=$(id -g $USER) \
  -v "${USERMOUNT}":/scratch \
  sbamin/gdc-client \
  "gdc-client download --log-file=download.log -n 4 -m open_manifest.tsv"
```
```shell
docker ps
docker logs <container NAME or ID>
```
- To debug a download failure, add the `--debug` flag. This is not recommended when `gdc-client download` is working properly, because it greatly increases write operations.
```shell
docker run -d \
  -e HOSTUSER=$USER -e HOSTGROUP=$(id -gn $USER) \
  -e HOSTUSERID=$UID -e HOSTGROUPID=$(id -g $USER) \
  -v "${USERMOUNT}":/scratch \
  sbamin/gdc-client \
  "gdc-client download --debug --log-file=download.log -n 4 -t gdc_token.key -m controlled_manifest.tsv"
```
Instead of supplying a download manifest, you can supply an analysis UUID, i.e., a first-column value of the manifest, and run multiple docker instances (preferably no more than 2-3 on one compute node) using one of the above two commands for open or controlled access data, respectively.
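As a rough sketch of the per-UUID approach, the helpers below pull the first column out of a manifest and print one open-access download command per UUID for review, without starting any container. Several things here are assumptions rather than part of this image: the function names, a tab-separated manifest whose header row starts with `id`, and the `<uuid>.log` log file name.

```shell
# Sketch: print the first-column IDs of a manifest, skipping the header row.
# Assumes a tab-separated file whose header's first field is "id".
manifest_uuids() {
    awk -F '\t' 'NR == 1 && $1 == "id" { next } $1 != "" { print $1 }' "$1"
}

# Dry run: print one open-access download command per UUID.
# $1 = manifest path on the host, $2 = host directory mounted at /scratch.
# The <uuid>.log log file name is an illustrative choice.
print_download_cmds() {
    manifest="$1"
    mount_dir="$2"
    manifest_uuids "$manifest" | while IFS= read -r uuid; do
        printf 'docker run -d --name %s -e HOSTUSER=%s -e HOSTGROUP=%s -e HOSTUSERID=%s -e HOSTGROUPID=%s -v "%s":/scratch sbamin/gdc-client "gdc-client download --log-file=%s.log -n 4 %s"\n' \
            "$uuid" "$(id -un)" "$(id -gn)" "$(id -u)" "$(id -g)" \
            "$mount_dir" "$uuid" "$uuid"
    done
}
```

Printing the commands first makes it easy to launch only two or three at a time on a node. For controlled-access data, `-t gdc_token.key` would need to be added to the quoted `gdc-client download` part, as in the commands above.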
`gdc-docker-dn-x64-el6` is a bash wrapper that starts a per-sample (analysis UUID) docker download instance. It's not ready for release yet, but it formats the following `docker run` command from a few user-supplied arguments.
```shell
docker run -d --name 1da7105a-f0ff-479d-9f82-6c1d94456c91 \
  -e HOSTUSER=foo -e HOSTGROUP=staff -e HOSTUSERID=1000 -e HOSTGROUPID=1001 \
  -v /fastscratch/foo/gdc:/scratch \
  sbamin/gdc-client:1.3.0.b1 \
  "gdc-client download --log-file=/scratch/docker_logs/docker_1da7105a-f0ff-479d-9f82-6c1d94456c91_22Apr18_124819EDT.log -n 8 -t gdc_token.key 1da7105a-f0ff-479d-9f82-6c1d94456c91"
```

- PS: Valid volume mounts are required (see below) before executing `gdc-client`.
```shell
docker run \
  -e HOSTUSER=$USER -e HOSTGROUP=$(id -gn $USER) \
  -e HOSTUSERID=$UID -e HOSTGROUPID=$(id -g $USER) \
  -v "${USERMOUNT}":/scratch \
  sbamin/gdc-client "gdc-client download --help"

docker run \
  -e HOSTUSER=$USER -e HOSTGROUP=$(id -gn $USER) \
  -e HOSTUSERID=$UID -e HOSTGROUPID=$(id -g $USER) \
  -v "${USERMOUNT}":/scratch \
  sbamin/gdc-client "gdc-client upload --help"
```
END