Docker-instructions
Thomas Rauter
2024-10-08
Docker-instructions.Rmd
Pulling the Docker Container
To pull the Docker container, use the following command. Make sure to check for the newest version or the specific version you need by visiting the Docker Hub repository.
If you face ‘permission denied’ issues, check out this vignette
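The pull command does not appear above. Based on the image name and tag used in the run commands later in this guide, it is presumably a plain docker pull. A minimal sketch, wrapped in a hypothetical helper function pull_splineomics so the tag can be changed in one place:

```shell
# Hypothetical helper: pull a given tag of the SplineOmics image.
# The default tag 0.1.0 is the one used in the run commands of this guide.
pull_splineomics() {
  docker pull "thomasrauter/splineomics:${1:-0.1.0}"
}

# Usage:
#   pull_splineomics          # pulls thomasrauter/splineomics:0.1.0
#   pull_splineomics <tag>    # pulls another tag listed on Docker Hub
```

Calling docker pull directly with the image name and tag does the same thing; the function only saves retyping.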
Running the Docker Container
To run the Docker container, use one of the following commands, depending on your operating system. Before running the command, make sure you are in a directory containing two subfolders: input and output. These will be used to transfer files between your local machine and the Docker container.
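If the two subfolders do not exist yet, they can be created first. A minimal sketch for Linux and macOS, to be run from the directory in which you intend to start the container:

```shell
# Create the two subfolders that will be mounted into the container.
# -p makes this a no-op if they already exist.
mkdir -p input output

# Confirm that both directories are present.
ls -d input output
```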
For Linux and macOS:
# Bash
docker run -it -d \
  -v "$(pwd)/input:/home/rstudio/input" \
  -v "$(pwd)/output:/home/rstudio/output" \
  -p 8888:8787 \
  -e PASSWORD=123 \
  --name splineomics \
  thomasrauter/splineomics:0.1.0
For Windows:
# PowerShell
docker run -it -d `
  -v "${PWD}\input:/home/rstudio/input" `
  -v "${PWD}\output:/home/rstudio/output" `
  -p 8888:8787 `
  -e PASSWORD=123 `
  --name splineomics `
  thomasrauter/splineomics:0.1.0
Once the container is running, open a web browser and navigate to http://localhost:8888. Log in using the following credentials:
Username: rstudio
Password: The one you set with the -e PASSWORD=123 option (123 in this case)
As long as the container is running, you can work in RStudio on that localhost page, where the SplineOmics package is also installed. /home/rstudio/ is the working directory of your R session.
Stop the container:
Start the container again:
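Assuming the container was created with --name splineomics as in the run commands above, stopping and restarting it can be sketched with the standard docker stop and docker start subcommands. The helper names stop_container and start_container are hypothetical; typing the docker commands directly works just as well:

```shell
# Container name taken from the --name option of the run command in this guide.
CONTAINER="splineomics"

# Hypothetical helpers around the standard Docker subcommands.
stop_container()  { docker stop "$CONTAINER"; }
start_container() { docker start "$CONTAINER"; }

# Usage:
#   stop_container     # same as: docker stop splineomics
#   start_container    # same as: docker start splineomics
```

docker start reuses the existing container, so packages installed and files saved inside it are still there after a restart.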
Input and Output File Management
The input and output directories on your local machine are mounted to corresponding directories inside the Docker container. This allows seamless file transfer between your local machine and the container.
Place your input files (e.g., data, metadata, annotation files) in the input directory on your local machine. These files will automatically appear in /home/rstudio/input inside the container.
Any files generated by RStudio within the container should be saved to /home/rstudio/output. These files will automatically appear in the output directory on your local machine.
Local Directory | Docker Container Directory | Description |
---|---|---|
(none) | /home/rstudio/ | Working directory in the RStudio session inside the container. |
input/ | /home/rstudio/input/ | Place your input files here on your local machine. |
output/ | /home/rstudio/output/ | Files generated in the container will appear here locally. |
Inspect Docker container installations
To see all the R packages and system installations that make up the Docker container, you can run the following command in the terminal of RStudio on your localhost browser page.
Because the /home/rstudio/output directory is mounted to your local filesystem, this will make the installation log files available there.
Installing additional R packages in the container
New R packages can be installed the normal way:
install.packages("package_name")
However, note that any packages installed in the running container will be lost if the container is deleted or rebuilt.
Permanent additions
If you want to permanently add R packages, R scripts, or other files to the SplineOmics Docker image, you can use it as a base image for building a new image. This ensures that all changes are saved into the new image, rather than being lost when the container is deleted.
For example:
# Use the SplineOmics image as the base image
FROM thomasrauter/splineomics:0.1.0
# Install the data.table package permanently
RUN R -e "install.packages('data.table')"
# Optionally, add custom R scripts to the image
COPY your_script.R /home/rstudio/your_script.R
# Set the working directory
WORKDIR /home/rstudio
# Expose RStudio Server port
EXPOSE 8787
# Start RStudio server
CMD ["/init"]
# Build new image:
# docker build -t your_new_image_name .
Run the container of the new image with the commands described above.
Creating a Reproducible Docker Container with Automated Analysis
When you have your final analysis script inside the Docker container of the SplineOmics package and you want other scientists to be able to reproduce your results by running just one line of code, follow the guide below. It shows how to create a new image based on your container, which you can store, for example, on Docker Hub. Others can download this image and run the container to get exactly the same results you got.
1. Prepare your analysis and scripts
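Ensure all analysis scripts and necessary files are saved in a dedicated directory inside the container (e.g., /home/rstudio/analysis/). Your analysis script should read its input files from a directory inside the container, such as /home/rstudio/input/, so that nothing needs to be mounted again when reproducing the analysis, and write all results to /home/rstudio/output/. Note that docker commit (step 3) only captures the container’s own filesystem: if /home/rstudio/input/ was bind-mounted when the container was started, its files live on the host and are not part of the committed image, so copy them into a directory that is not a mount point (for example /home/rstudio/analysis/) and read them from there. The /home/rstudio/output/ directory is mounted to a local directory on the user’s machine, making the results accessible outside the container. Example directory structure: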
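A layout consistent with the paths named in this guide can be sketched as follows. The file name final_analysis.R is taken from the entry point script in step 2; the variable ROOT and its scratch default are assumptions that let the layout be tried outside the container (inside the container it would be /home/rstudio):

```shell
# ROOT is hypothetical: /home/rstudio inside the container,
# a scratch directory by default so the sketch can be run anywhere.
ROOT="${ROOT:-./rstudio_home_example}"

# Analysis scripts, input files, and results each get their own directory.
mkdir -p "$ROOT/analysis" "$ROOT/input" "$ROOT/output"

# Placeholder for the real analysis script (touch does not overwrite content).
touch "$ROOT/analysis/final_analysis.R"

# Print the resulting tree, one path per line.
find "$ROOT" | sort
```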
2. Create an Entry Point Script
Create a bash script (run_analysis.sh) that runs your analysis automatically.
Example run_analysis.sh:
#!/bin/bash
Rscript /home/rstudio/analysis/final_analysis.R
tail -f /dev/null # Keep the container running after analysis
Save this script in /home/rstudio/.
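The script also has to be executable, otherwise a container that uses it as its entry point will fail to start with a permission error. A minimal sketch that writes the script shown above and sets the executable bit; TARGET_DIR is a hypothetical variable that would be /home/rstudio inside the container and defaults to the current directory here:

```shell
# TARGET_DIR is an assumption: use /home/rstudio inside the container.
TARGET_DIR="${TARGET_DIR:-.}"

# Write the entry point script; the quoted 'EOF' prevents any expansion.
cat > "$TARGET_DIR/run_analysis.sh" <<'EOF'
#!/bin/bash
Rscript /home/rstudio/analysis/final_analysis.R
tail -f /dev/null # Keep the container running after analysis
EOF

# Make the script executable so Docker can run it as the entry point.
chmod +x "$TARGET_DIR/run_analysis.sh"
```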
3. Commit the Container as a New Image with an Entry Point
Once your scripts are ready, commit the running container as a new image and set the new entry point to run the bash script automatically:
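One way to do this, sketched under assumptions: docker commit accepts a --change option that applies a Dockerfile instruction such as ENTRYPOINT to the new image. The container name splineomics comes from the run command earlier in this guide; the image name in NEW_IMAGE and the helper commit_with_entrypoint are hypothetical placeholders:

```shell
# Hypothetical target name: replace user, repository, and tag with your own.
NEW_IMAGE="your_dockerhub_user/splineomics_analysis:1.0"

# Commit the container named "splineomics" as a new image whose entry point
# is the analysis script saved in /home/rstudio/.
commit_with_entrypoint() {
  docker commit \
    --change 'ENTRYPOINT ["/home/rstudio/run_analysis.sh"]' \
    splineomics \
    "$NEW_IMAGE"
}

# Usage:
#   commit_with_entrypoint
```

Running docker images afterwards should list the new image under the chosen name.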
4. Push the New Image to Docker Hub
Push the new image to Docker Hub so others can easily pull and reproduce the analysis:
Others can pull (download) the container with this command:
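Both steps can be sketched with the standard docker push and docker pull subcommands. The image name in NEW_IMAGE and the helper names push_image and pull_image are hypothetical; the name must match the one chosen when committing the image:

```shell
# Hypothetical image name: must match the name given to docker commit.
NEW_IMAGE="your_dockerhub_user/splineomics_analysis:1.0"

# Author side: upload the image (run "docker login" first if not logged in).
push_image() { docker push "$NEW_IMAGE"; }

# User side: download the image.
pull_image() { docker pull "$NEW_IMAGE"; }
```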
5. Running the container to reproduce the results
To reproduce the results, you need to create a local directory where the results will be saved and then mount this directory to the container’s /home/rstudio/output/ directory.
Use the following command to run the container and ensure that the results are saved to the local output directory (see the commands in the section Running the Docker Container above for how to mount the output directory in the current working directory).
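A minimal sketch for Linux and macOS, assuming the image was published under the hypothetical name in NEW_IMAGE. The container name splineomics_reproduce and the helper reproduce_analysis are also placeholders:

```shell
# Hypothetical image name: use the name the author published.
NEW_IMAGE="your_dockerhub_user/splineomics_analysis:1.0"

# Create a local output directory and mount it to /home/rstudio/output,
# then start the container in the background; its entry point runs the analysis.
reproduce_analysis() {
  mkdir -p output &&
    docker run -d \
      -v "$(pwd)/output:/home/rstudio/output" \
      --name splineomics_reproduce \
      "$NEW_IMAGE"
}

# Usage:
#   reproduce_analysis
```

Once the analysis script has finished, the results should appear in the local output directory. The container keeps running afterwards because of the tail command in the entry point script, so it has to be stopped explicitly with docker stop.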