Docker is not supported by RCC on the Midway cluster because of security concerns that Singularity addresses. Users who wish to use Docker containers should convert them to Singularity containers. Instructions on how to do this are included below.
Usage: singularity [OPTIONS] run CONTAINERNAME
- Log in to Midway; you start in your home directory.
- Create a folder in your home or scratch directory to store the container. For larger images, the scratch directory is recommended.
mkdir containername
# Substitute containername with the name of the image
- Enter that folder:
cd containername
- Then run:
singularity pull containername.sif docker://repo/containername:tags
- Example:
singularity pull ubuntu.sif docker://ubuntu:latest
- This pulls the Docker image called "ubuntu" from the default registry (Docker Hub) and selects the "latest" tag. The Docker image is then converted to a Singularity image.
- From here, you should be able to run the container by running
singularity run containername.sif
where containername.sif is the name of the container created by the "singularity pull ..." command above.
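The conversion steps above can be sketched as a small shell helper. This is only an illustration: `sif_name` and `pull_image` are hypothetical function names, not part of Singularity, and `pull_image` assumes the `singularity` command is already on your PATH (for example after `module load singularity`).

```shell
#!/bin/sh
# Hypothetical helper: derive a .sif file name from a Docker URI,
# e.g. docker://ubuntu:latest -> ubuntu.sif
sif_name() {
    name="${1##*/}"      # strip everything up to the last "/"
    name="${name%%:*}"   # strip the ":tag" suffix, if any
    printf '%s.sif\n' "$name"
}

# Hypothetical wrapper: pull a Docker image into the current directory
# as a Singularity image named after the Docker image.
pull_image() {
    uri="$1"
    singularity pull "$(sif_name "$uri")" "$uri"
}

# Example (on Midway, inside the folder created for the container):
#   pull_image docker://ubuntu:latest
#   singularity run ubuntu.sif
```

Deriving the file name from the URI keeps the image name and the Docker source in sync, which is the same convention the `ubuntu.sif` example follows.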
Creating a Singularity image from a Docker image in the NGC container registry requires authentication with an API key. The key can normally be supplied through the --docker-login option. To create the API key, follow this user guide: NGC Container User Guide for NGC Catalog: NVIDIA GPU Cloud Documentation
From the login node
module load singularity
singularity pull --docker-login dli-nlp-nemo.sif docker://nvcr.io/nvidia/dli/dli-nlp-nemo:v3-nemo1.0.1
You will be prompted to enter your API key.
The dli-nlp-nemo image file is located at /project/rcc/pnsinha/dli-nlp-nemo.sif. To run it, start an interactive session on a GPU node:
sinteractive --account=rcc-staff --partition=gpu --gres=gpu:1 --mem=16gb
module load singularity
singularity run dli-nlp-nemo.sif
Because the Singularity image is read-only, it is useful to bind directories from the host into the container:
singularity exec --bind /path/outside/image/:/path/inside/image/ --bind $PWD:/run/user dli-nlp-nemo.sif <command>
where --bind $PWD:/run/user makes the current working directory accessible inside the container at /run/user, and <command> is the program to run inside the container.
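As a rough illustration of the bind syntax, the sketch below assembles `--bind` options from `host_dir:container_dir` pairs and skips any pair whose host directory does not exist. `bind_opts` is a hypothetical helper written for this page, and the paths in the example are placeholders. It assumes paths without spaces, because the result is meant to be word-split on the command line.

```shell
#!/bin/sh
# Hypothetical helper: turn "host_dir:container_dir" pairs into
# --bind options, skipping pairs whose host directory is missing.
bind_opts() {
    opts=""
    for pair in "$@"; do
        host="${pair%%:*}"   # part before the first ":"
        if [ -d "$host" ]; then
            opts="$opts --bind $pair"
        else
            echo "warning: $host does not exist, not binding" >&2
        fi
    done
    printf '%s\n' "${opts# }"   # drop the leading space
}

# Example (assumes the singularity module is loaded and the image exists):
#   singularity exec $(bind_opts "$PWD:/run/user") dli-nlp-nemo.sif ls /run/user
```

Listing the bound directory from inside the container, as in the commented example, is a quick way to confirm that the bind worked before starting a longer job.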
Opening Jupyter Lab
Option 1: Use ThinLinc, open Firefox, and navigate to http://<compute-node>:8888/lab/lab
where <compute-node> is the name of the node on which your interactive session is running.
Option 2: Create an SSH tunnel that forwards the Jupyter port to your local machine:
ssh -N -f -L 8888:<compute-node>:8888 <cnetid>@midway3.rcc.uchicago.edu
Then open any browser on your local machine and navigate to http://127.0.0.1:8888 or http://localhost:8888
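The tunnel command can also be generated rather than typed by hand. The following is a minimal sketch: `tunnel_cmd` is a hypothetical helper, and the login host `midway3.rcc.uchicago.edu` is taken from the batch script on this page. Run the printed command on your local machine, not on the cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the SSH tunnel command for a Jupyter session.
#   $1 = compute node name, $2 = Jupyter port, $3 = cluster user name
tunnel_cmd() {
    node="$1"
    port="$2"
    user="$3"
    # -N: no remote command, -f: go to background, -L: local port forward
    printf 'ssh -N -f -L %s:%s:%s %s@midway3.rcc.uchicago.edu\n' \
        "$port" "$node" "$port" "$user"
}

# Example: replace <compute-node> and <cnetid> with your own values
#   tunnel_cmd <compute-node> 8888 <cnetid>
```

Using the same port number on both ends of the tunnel means the URL printed by Jupyter can be reused unchanged with `localhost` as the host.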
#!/bin/bash
#SBATCH --job-name=jupyter_notebook
#SBATCH --time=05:00:00
#SBATCH --output=jupyter_notebook_%j.txt
#SBATCH --error=jupyter_notebook_%j.err
#SBATCH --account=pi-<group>
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=16gb
#assign random port between 8000 and 9000
PORT_NUM=$(shuf -i8000-9000 -n1)
node=$(hostname -s)
user=$(whoami)
cluster="midway3"
#As the singularity image is read only, it is desirable to make directories available inside the container
#/path/outside/image/ is path in your midway3 dir
#/path/inside/image/ is path inside the container
# print tunneling instructions jupyter-log
echo -e "
# Note: check jupyter_notebook_${SLURM_JOB_ID}.err to find the port.
# Command to create SSH tunnel:
ssh -N -f -L $PORT_NUM:${node}:$PORT_NUM ${user}@${cluster}.rcc.uchicago.edu
# Use a browser on your local machine to go to:
http://localhost:$PORT_NUM/
"
module load singularity
singularity exec --bind /path/outside/image/:/path/inside/image/ --bind $PWD:/run/user dli-nlp-nemo.sif jupyter lab --no-browser --ip=${node} --port=$PORT_NUM
# keep it alive
sleep 36000
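Once the batch script above is saved to a file, submitting it and reading the tunnelling instructions could look roughly like the sketch below. `job_id_from_sbatch` is a hypothetical helper and `jupyter.sbatch` is a hypothetical file name. The helper relies on the usual `Submitted batch job <id>` line that `sbatch` prints on success.

```shell
#!/bin/sh
# Hypothetical helper: extract the job ID from sbatch output of the
# form "Submitted batch job 12345".
job_id_from_sbatch() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Example (on a login node):
#   out=$(sbatch jupyter.sbatch)
#   jobid=$(job_id_from_sbatch "$out")
#   cat "jupyter_notebook_${jobid}.txt"   # tunnelling instructions
#   cat "jupyter_notebook_${jobid}.err"   # error log named in the script
```

The two file names follow the `--output` and `--error` patterns in the script, where `%j` is replaced by the job ID.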
When using Singularity, you may encounter issues related to temporary file storage, particularly when working with large containers. Below are some common troubleshooting steps to help you resolve these issues effectively.
By default, Singularity utilizes the /tmp directory on the local machine for temporary files and the ~/.singularity/cache directory for cache files generated during the build process. If you are working with containers that exceed a few gigabytes in size, you might run into the following error:
write /tmp/... : no space left on device
This error typically indicates that the /tmp directory has filled up on the machine you are using. Additionally, the cache could potentially consume all available space in your HOME directory.
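Before changing any settings, it can help to confirm that space really is the problem. The sketch below reports the free space on the file systems holding /tmp and your home directory, and the size of the default cache directory if it exists. `free_kb` is a hypothetical helper built on the standard `df` and `awk` tools.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in kilobytes, on the
# file system that holds the given path.
free_kb() {
    # -P forces one line per file system; column 4 is "Available"
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

echo "/tmp free: $(free_kb /tmp) KB"
echo "HOME free: $(free_kb "$HOME") KB"

# Size of the default Singularity cache, if it exists
if [ -d "$HOME/.singularity/cache" ]; then
    du -sk "$HOME/.singularity/cache"
fi
```

If the free space under /tmp is smaller than the image you are pulling or building, redirecting the temporary directory is the likely fix.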
To resolve these issues, you can override Singularity's default temporary and cache directories by setting specific environment variables before running your Singularity commands. Here’s how to do it:
- Create New Directories: Create a new directory in your scratch space to hold the temporary and cache files:
mkdir -p $SCRATCH/$USER/singularity/{cache,tmp}
- Set Environment Variables: Export the following environment variables to redirect the temporary and cache directories:
export SINGULARITY_CACHEDIR="$SCRATCH/$USER/singularity/cache"
export SINGULARITY_TMPDIR="$SCRATCH/$USER/singularity/tmp"
- SINGULARITY_CACHEDIR: This is the directory where Singularity will download and cache files.
- SINGULARITY_TMPDIR: This is the directory where Singularity will write temporary files, including during the building of (squashfs) images.
- Cleanup: After successfully building your container, you can delete everything in the tmp and cache directories to free up space. On Midway3, $SCRATCH/$USER refers to /scratch/midway3/<cnetid>.
- If you continue to experience issues, consider testing with a sandbox build. This can help determine if the problem is related to temporary storage space. You can build a sandbox using the following command:
sudo singularity build --sandbox [sandbox_name] Singularity
- If you are still encountering problems, you may want to specify a sanitized home directory using the -H option when executing your container. For example:
rm -rf /tmp/homie && mkdir -p /tmp/homie && singularity exec -H /tmp/homie analysis.img /bin/bash
This approach can help mitigate issues related to software modules installed in your HOME directory.
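The directory setup and cleanup steps above could be wrapped in two small functions, for example in a file that you source before pulling or building. This is a sketch under stated assumptions: `setup_singularity_dirs` and `clean_singularity_dirs` are hypothetical names, and the code assumes `SCRATCH` and `USER` are set as they are on Midway3. The two directories are spelled out instead of using `{cache,tmp}` so the code also works in shells without brace expansion.

```shell
#!/bin/sh
# Sketch: create scratch directories for Singularity and point
# SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR at them.
# Assumes SCRATCH and USER are set, as on Midway3.
setup_singularity_dirs() {
    base="$SCRATCH/$USER/singularity"
    mkdir -p "$base/cache" "$base/tmp" || return 1
    export SINGULARITY_CACHEDIR="$base/cache"
    export SINGULARITY_TMPDIR="$base/tmp"
}

# Sketch: empty both directories after a successful build.
clean_singularity_dirs() {
    # ${VAR:?} aborts if the variable is unset or empty, so this can
    # never expand to "rm -rf /*". Hidden files are not matched by "*".
    rm -rf "${SINGULARITY_CACHEDIR:?}"/* "${SINGULARITY_TMPDIR:?}"/*
}

# Example (after sourcing this file in your current shell):
#   setup_singularity_dirs
#   singularity pull ubuntu.sif docker://ubuntu:latest
#   clean_singularity_dirs
```

The functions must be sourced and called in the same shell that runs `singularity`. Running them in a separate script would export the variables only inside that script.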
By following these steps, you should be able to effectively troubleshoot and resolve issues related to temporary file storage in Singularity.
References:
- https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dli/containers/dli-nlp-nemo
- https://www.hpcwire.com/2017/05/04/singularity-hpc-container-technology-moves-lab/
- https://github.com/apptainer/singularity
- https://apptainer.org/
For any additional questions, please email [email protected]