This article is intended to help new users begin using the cluster quickly. It summarizes basic information and points to more detailed documentation. Users who understand and follow our policies and procedures will meet our expectations of being good citizens. Please take the time to read our documentation so you don’t inconvenience your fellow community members.
Documentation and Policies
All new “Mind” cluster users are expected to read and understand our Cluster Policies page(s). This should only take a few minutes to read over. If you have questions, ask David before proceeding.
Connecting to the Mind Cluster
You applied for a cluster account and have been notified that your account on the cluster has been created. This article explains how to connect to the MIND cluster.
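A typical way to connect is with ssh from a terminal on Linux or macOS (a minimal sketch; replace <username> with your cluster username):

ssh <username>@mind.cs.cmu.edu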
File and Data Storage
By default, users land in their home directory ( /home/<username> ) when logging onto the cluster. There are various storage areas on the cluster for specific uses. Understand which volumes you’re expected to use for code, configurations, data, and results. The Data Management article will help you with this.
Two critical rules our users must follow regarding their data storage:
- Users should only have data currently being used for processing and the results of recent processing saved on the cluster.
- Use these three storage areas correctly.
- /home/<username> is the user’s home directory and should only be used to store small files such as configuration files, documents, and source code [keep this to a maximum of 300GB]
- /user_data/<username> is a 1TB partition for users to save their work.
- /lab_data/<whateverlab> is a 10TB partition for lab members to share and save their work.
These volumes are accessible cluster-wide (on all compute nodes as well as on the head node).
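To check how much space you are currently using in these areas, the standard du and df commands work on the cluster (a quick sketch; the lab directory name is just an example placeholder):

du -sh /home/<username>
du -sh /user_data/<username>
df -h /lab_data/<whateverlab>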
This is an example of navigating to the various volumes.
[dpane@mind ~]$ cd /lab_data/fisherlab/
[dpane@mind fisherlab]$ ls
[dpane@mind fisherlab]$ pwd
/lab_data/fisherlab
[dpane@mind fisherlab]$ cd
[dpane@mind ~]$ cd /user_data/qianoum/
[dpane@mind qianoum]$ ls
[dpane@mind qianoum]$ ls -ltra
total 9
drwxr-xr-x   2 qianoum qianoum   2 Jan  9 09:28 .
drwxrwxrwx 172 root    root    172 Jan 10 10:51 ..
[dpane@mind qianoum]$
You may also find this article on file and directory permissions helpful.
Data transfer
Transferring files can be done using any secure transfer protocol. Samba is NOT a secure protocol and is NOT supported on the cluster. Various commands such as scp, sftp, rsync over ssh, etc. are supported.
An example command for transferring files from a Linux or macOS computer to the MIND cluster:
rsync -avz -e "ssh -l <username>" <src directory path> mind.cs.cmu.edu:<full destination path on mind>
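A similar transfer can be done with scp (a sketch using the same placeholders; -r copies a directory recursively):

scp -r <src directory path> <username>@mind.cs.cmu.edu:<full destination path on mind>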
Tectia File Transfer (Windows), Fetch (macOS), or other sftp software will also work.
https://www.cmu.edu/computing/software/
Software
As new versions of software packages become available, we can make them available by installing them and then changing the default user environment. An easy way to change an individual user’s environment is with the module command. The Modules package is a tool that simplifies shell initialization and lets users easily modify their environment during a session.
[dpane@mind-1-1 ~]$ module avail
--------------------------------------- /usr/share/Modules/modulefiles ----------------------------------------
anaconda3               dsi-studio         julia-1.2.0           python36-extras
cuda-10.0               freesurfer-5.3.0   matlab-9.11           qt-4.8.2
cuda-10.1               freesurfer-6.0.0   matlab-9.5            qt-4.8.5
cuda-10.2               freesurfer-7.1.0   matlab-9.7            R-3.6.1
cuda-11.1.1             fsl-6.0.3          module-git            rocks-openmpi
cuda-9.2                gcc-4.7.4          module-info           rocks-openmpi_ib
cudnn-10.0-7.3          gcc-4.9.2          modules               rstudio-1.2.5033
cudnn-10.1-v7.6.5.32    gcc-6.3.0          mrtrix3-3.0.0-git     singularity
cudnn-10.2-v7.6.5.32    git-2.23           null                  use.own
cudnn-11.1.1-v8.0.4.30  git-2.30           openmpi-1.10-x86_64
cudnn-9.2-7.6           glx-indirect       openmpi-1.8-x86_64
dot                     julia-0.3.6        openmpi-x86_64
[dpane@mind-1-1 ~]$
Use module load <module filename> to activate the software (e.g. module load anaconda3) and module unload <module filename> to deactivate the package.
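To see which module files are currently loaded in your session, the Modules package also provides the module list subcommand:

module list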
Example of changing versions of Python and pip using the anaconda3 module file.
[dpane@mind ~]$ which python
/usr/bin/python
[dpane@mind ~]$ python --version
Python 2.7.5
[dpane@mind ~]$ which pip
/usr/bin/pip
[dpane@mind ~]$ pip --version
pip 8.1.2 from /usr/lib/python2.7/site-packages (python 2.7)
[dpane@mind ~]$ module load anaconda3
[dpane@mind ~]$ which python
/opt/anaconda3/bin/python
[dpane@mind ~]$ which pip
/opt/anaconda3/bin/pip
[dpane@mind ~]$ pip --version
pip 18.1 from /opt/anaconda3/lib/python3.7/site-packages/pip (python 3.7)
[dpane@mind ~]$ python --version
Python 3.7.1
[dpane@mind ~]$
[dpane@mind ~]$ module unload anaconda3
[dpane@mind ~]$
When users are creating batch job scripts, they should load the appropriate module within the script so their software will run.
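As a rough sketch, a batch script that loads a module before running its program might look like the following (the resource values simply mirror the interactive examples below, and my_script.py is only a hypothetical placeholder):

#!/bin/bash
#SBATCH -p cpu
#SBATCH --cpus-per-task=1
#SBATCH --mem=10GB
#SBATCH --time=4:00:00

# load the software environment the job needs
module load anaconda3

# run the program (my_script.py is a hypothetical example)
python my_script.py

Submit the script from the head node with sbatch <script filename>.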
Running jobs
Users: avoid processing on the login node (or “head node”). This includes VSCode usage (see this article on VSCode setup). The head node is the machine that all users log into (hostname: mind.cs.cmu.edu). To run jobs, users need to schedule them using the cluster job scheduler, which assigns each job to a compute node. The Slurm Workload Manager (Slurm for short) is the only job scheduler we use on the MIND cluster. Users request resources (machines, CPUs, GPUs, memory, time) and Slurm will allocate appropriate resources (a compute node that meets the request). If the resources aren’t available, the scheduler keeps the job request in the queue until a machine with those resources becomes available. Requests that cannot be met because the cluster does not have the required resources will never get scheduled. The scheduler uses a number of factors and a complicated algorithm to allocate resources fairly.
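Once a job is submitted, you can check whether it is pending or running with the standard Slurm squeue command, for example:

squeue -u <username>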
There are two main job queues which all users are able to submit their jobs to.
- cpu – this encompasses all of the cpu nodes
- gpu – this encompasses all of the gpu nodes
You are expected to submit to the gpu queue only when your job(s) will be using GPU(s) for their processing.
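You can see both partitions and the current state of their nodes with the standard Slurm sinfo command (the second form limits the listing to the gpu partition):

sinfo
sinfo -p gpu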
Users are able to request batch or interactive jobs. An interactive job will assign the user a session on a compute node so they can work interactively.
Example interactive session request on a specific cpu node (you can omit the --nodelist option if you do not need a specific node):
srun -p cpu --cpus-per-task=1 --gres=gpu:0 --mem=10GB --time=4:00:00 --nodelist=mind-0-11 --pty bash
Example of an interactive session request with X11 display on a non-GPU node:
srun --x11 -p cpu --cpus-per-task=1 --gres=gpu:0 --mem=0 --time=4:00:00 --pty $SHELL
The following is an example of requesting an interactive session on a compute node. In the example, I simply ran the command `hostname` while on the head node, requested an interactive session on a node in the cpu queue where I ran the same `hostname` command, then exited the interactive session and got back onto the head node.
[dpane@mind ~]$ hostname
mind.cs.cmu.edu
[dpane@mind ~]$ srun -p cpu --cpus-per-task=1 --gres=gpu:0 --mem=10GB --time=4:00:00 --nodelist=mind-0-11 --pty bash
[dpane@mind-0-11 ~]$
[dpane@mind-0-11 ~]$
[dpane@mind-0-11 ~]$ hostname
mind-0-11.eth
[dpane@mind-0-11 ~]$ exit
exit
[dpane@mind ~]$
More information and examples can be found on the Slurm Scheduler page.