Preprocessing datasets on Beluga with fMRIPrep¶
The purpose of this document is to allow anyone to preprocess datasets stored on the lab's Alliance Canada allocation. We will specifically use fmriprep-slurm, a Python tool developed internally in the lab to automatically generate Slurm files from a BIDS dataset.
fMRIPrep is the pipeline that we use internally to preprocess BIDS-compatible datasets. It uses a combination of tools from well-known software packages, including FSL, ANTs, FreeSurfer and AFNI.
If this is your first time using our tool, go through the Preparation steps first.
Pre-requisites¶
- Basic knowledge of preprocessing pipelines
- Our tutorial on ../tutorials/hpc
- Understanding of containerized applications (Docker, Singularity)
What will you learn?¶
- Use fmriprep to preprocess a dataset
- Gain knowledge on HPC optimization for big data
Preparation steps¶
FreeSurfer license¶
As part of the fMRIPrep pipeline, FreeSurfer is mostly used for the reconstruction steps. It is free, but it requires a license, so you will need to obtain one before anything else.
To obtain a FreeSurfer license, register on the FreeSurfer website.
For the institution_name you should use “CRIUGM” and the institution_type is “nonprofit_education_research”.
Once downloaded, you can copy the file to Beluga:
scp ~/Downloads/license.txt beluga.computecanada.ca:~/.freesurfer.txt
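Before moving on, it can be worth confirming on Beluga that the copy succeeded. The following is a minimal sketch: the function name check_license is hypothetical (it is not part of fmriprep-slurm), and the path you pass to it is simply the destination used in the scp command above.

```shell
# check_license: hypothetical helper, not part of fmriprep-slurm.
# Succeeds only if the given file exists, is readable and is not empty.
check_license() {
    license_file="$1"
    if [ ! -r "$license_file" ]; then
        echo "License file not found or not readable: $license_file" >&2
        return 1
    fi
    if [ ! -s "$license_file" ]; then
        echo "License file is empty: $license_file" >&2
        return 1
    fi
    echo "License file looks fine: $license_file"
}
```

Calling check_license ~/.freesurfer.txt prints a confirmation, or an explanation on stderr if something is wrong. It only checks that the file is present and non-empty, not that the license itself is valid.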
Software environment¶
fmriprep-slurm depends on pybids (to manage a BIDS-compatible dataset) and templateflow (a repository holding multiple templates for neuroimaging).
We use Singularity to make sure that the Python dependencies are in agreement with the fMRIPrep container, and to manage the pybids caching mechanism.
Warning
You must check that Singularity, the fMRIPrep container and fmriprep-slurm are available on the system. This should already be the case on Béluga.
Templateflow¶
fMRIPrep uses common brain templates which are managed by Templateflow. Our utility script takes care of it, so you don’t have any additional setup to do for that.
BIDS validation¶
Like many other neuroimaging tools, fMRIPrep relies heavily on the BIDS layout. It is therefore natural to first check that your input dataset is indeed BIDS-compliant, with a tool called BIDS-validator. You don’t need to install it since it is already available on Beluga; just run the following command:
singularity exec -B PATH/TO/BIDS/DATASET:/DATA /lustre03/project/6003287/containers/fmriprep-20.2.1lts.sif bids-validator /DATA
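Because the full validation can be slow on a large dataset, a quick pre-check can catch obvious layout problems first. The sketch below is only an illustration and does not replace bids-validator: quick_bids_check is a hypothetical helper that looks for two things a BIDS dataset is expected to have, a dataset_description.json file at the root and at least one sub-* directory.

```shell
# quick_bids_check: hypothetical helper for a fast sanity check.
# It does NOT replace bids-validator; it only checks for
# dataset_description.json at the dataset root and for at least
# one sub-<label> directory.
quick_bids_check() {
    dataset="$1"
    if [ ! -f "$dataset/dataset_description.json" ]; then
        echo "Missing dataset_description.json in $dataset" >&2
        return 1
    fi
    n_subjects=0
    for d in "$dataset"/sub-*/; do
        # If the glob matches nothing it stays literal and fails this test.
        [ -d "$d" ] && n_subjects=$((n_subjects + 1))
    done
    if [ "$n_subjects" -eq 0 ]; then
        echo "No sub-* directory found in $dataset" >&2
        return 1
    fi
    echo "Found $n_subjects subject directories in $dataset"
}
```

Run it as quick_bids_check PATH/TO/BIDS/DATASET, and if it passes, continue with the full validation using the command above.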
Generating the slurm files¶
A convenience script is available to help you run the Singularity command with fmriprep-slurm. The following command runs the script inside a compute node:
/lustre03/project/6003287/fmriprep-slurm/singularity_run.bash PATH/TO/BIDS/DATASET fmriprep
Note
There are many other options; check the GitHub page for more information.
For example, you might want to add your email with the --email argument.
Warning
You might also want to pass additional fMRIPrep arguments, for example to enable ICA-AROMA and disable FreeSurfer reconstruction.
In this case, you should add the argument as --fmriprep-args=\"--use-aroma --fs-no-reconall\" (don’t forget the escape character \).
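To see what the backslashes change, the sketch below prints the arguments exactly as the shell hands them to a program, one per line. show_args is a hypothetical helper used only for illustration; how the convenience script then processes these arguments is not shown here.

```shell
# show_args: hypothetical helper that prints each argument it receives
# between square brackets, one per line.
show_args() {
    printf '[%s]\n' "$@"
}

# Escaped quotes: the quote characters are kept as literal text.
show_args --fmriprep-args=\"--use-aroma --fs-no-reconall\"
# [--fmriprep-args="--use-aroma]
# [--fs-no-reconall"]

# Unescaped quotes: the shell removes them before the program sees them.
show_args --fmriprep-args="--use-aroma --fs-no-reconall"
# [--fmriprep-args=--use-aroma --fs-no-reconall]
```

With the backslashes, the double quotes survive the first round of shell parsing, which is the form this tutorial asks for.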
This can take some time since the filesystem is slow, so grab a cup of coffee! If it takes too long, you should run it inside a compute node:
salloc --account=rrg-pbellec --mem-per-cpu=2G --time=4:00:0 --cpus-per-task=2
Submitting the preprocessing jobs¶
If everything worked as expected, all the Slurm files should be inside a new folder under your scratch space, ${SCRATCH}/DATASET_NAME/UNIX_TIME/.slurm.
There should be one Slurm script per subject (sub-*), allowing you to preprocess them in parallel.
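If you have generated the Slurm files several times, there will be several UNIX_TIME folders for the same dataset. The sketch below picks the most recent one. It assumes that these folders are named with plain integer timestamps; latest_run is a hypothetical helper, not part of fmriprep-slurm.

```shell
# latest_run: hypothetical helper, not part of fmriprep-slurm.
# Prints the most recent run folder of a dataset, ASSUMING the
# UNIX_TIME sub-folders are named with plain integer timestamps.
latest_run() {
    dataset_dir="$1"
    newest=""
    for d in "$dataset_dir"/*/; do
        [ -d "$d" ] || continue
        name="$(basename "$d")"
        # Skip any folder whose name is not purely numeric.
        case "$name" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ -z "$newest" ] || [ "$name" -gt "$newest" ]; then
            newest="$name"
        fi
    done
    # Returns a non-zero status when no numeric folder was found.
    [ -n "$newest" ] && echo "$dataset_dir/$newest"
}
```

For example, latest_run "${SCRATCH}/DATASET_NAME" prints the path of the newest run folder, whose .slurm sub-folder holds the scripts.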
Check the content of the Slurm scripts, and more specifically the time and hardware requests, since they count against our allocation usage even if the job fails.
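One way to review these requests without opening every file is to list the #SBATCH directives of all the scripts at once. This is a minimal sketch: show_requests is a hypothetical helper, and which directives actually appear depends on what fmriprep-slurm generated for your dataset.

```shell
# show_requests: hypothetical helper, not part of fmriprep-slurm.
# Prints every "#SBATCH" directive found in the subject scripts of a
# .slurm folder, each line prefixed with the file it comes from.
show_requests() {
    slurm_dir="$1"
    for script in "$slurm_dir"/smriprep_sub-*.sh; do
        [ -f "$script" ] || continue
        grep -H '^#SBATCH' "$script"
    done
}
```

For example, show_requests "${SCRATCH}/DATASET_NAME/UNIX_TIME/.slurm" prints one line per directive, which makes an unusually long time limit or a large memory request easy to spot.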
You are now ready to submit the jobs with sbatch:
find "${SCRATCH}"/DATASET_NAME/UNIX_TIME/.slurm/smriprep_sub-*.sh -type f | while read -r file; do sbatch "$file"; done
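Since every submitted job counts against the allocation, you may prefer to preview the submission first. The sketch below wraps the same loop in a hypothetical function, submit_all, with a DRY_RUN switch (also an assumption of this example, not an option of sbatch or fmriprep-slurm) that only prints the commands by default.

```shell
# submit_all: hypothetical wrapper around the submission loop.
# With DRY_RUN=1 (the default here) it only prints the sbatch commands;
# set DRY_RUN=0 to actually submit the jobs.
submit_all() {
    slurm_dir="$1"
    for script in "$slurm_dir"/smriprep_sub-*.sh; do
        [ -f "$script" ] || continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sbatch $script"
        else
            sbatch "$script"
        fi
    done
}
```

Run submit_all "${SCRATCH}/DATASET_NAME/UNIX_TIME/.slurm" to see what would be submitted, then repeat the call with DRY_RUN=0 in front of it once the list looks right.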
Checking the output¶
Output and error logs¶
Once the jobs are finished, the output logs (smriprep_sub-*.out) and error logs (smriprep_sub-*.err) should be in the same run folder as before, ${SCRATCH}/DATASET_NAME/UNIX_TIME.
If a job failed, double-check your input dataset first; if you have any further issues, contact one of the data admins.
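To find which subjects need attention, the error logs can be scanned in one pass. This is a rough sketch: logs_with_errors is a hypothetical helper, and matching on the word "error" is an assumption, since tools may also write harmless messages to these logs. Adapt the pattern to what your logs actually contain.

```shell
# logs_with_errors: hypothetical helper, not part of fmriprep-slurm.
# Lists the error logs of a run folder that mention "error" (any case).
# The search pattern is an assumption and may need adjusting.
logs_with_errors() {
    run_dir="$1"
    for log in "$run_dir"/smriprep_sub-*.err; do
        [ -f "$log" ] || continue
        if grep -qi 'error' "$log"; then
            echo "$log"
        fi
    done
}
```

For example, logs_with_errors "${SCRATCH}/DATASET_NAME/UNIX_TIME" prints the path of each log worth opening; no output means that no log matched the pattern, not that every job necessarily succeeded.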
Warning
You may encounter BIDS errors caused by bad pybids caching behaviour, because the filesystem is slow on Beluga.
In this case, you should re-run the tool as described in Generating the slurm files with the --force-reindex argument.
fMRIPrep outputs¶
The first file to look at is resource_monitor.json under ${SCRATCH}/DATASET_NAME/UNIX_TIME, which helps you track the resource usage for each subject.
All the preprocessing outputs should also be inside ${SCRATCH}/DATASET_NAME/UNIX_TIME/fmriprep.
Finally, if fMRIPrep crashes unexpectedly, you can check its working directory in ${SCRATCH}/DATASET_NAME/UNIX_TIME/smriprep_sub-XXXX.workdir.
To go further¶
Look at the fMRIPrep documentation, and more specifically the section on Singularity.
Questions?¶
If you have any issues using Alliance Canada, don’t hesitate to ask your questions in the #alliance_canada channel of the SIMEXP lab Slack!