Preprocessing datasets on Beluga with fMRIPrep

The purpose of this document is to allow anyone to preprocess datasets that are stored on the lab's Alliance Canada allocation. We will specifically use fmriprep-slurm, a Python tool developed internally in the lab to automatically generate Slurm files from a BIDS dataset.

fMRIPrep is the pipeline that we use internally to preprocess BIDS-compatible datasets. It uses a combination of tools from well-known software packages, including FSL, ANTs, FreeSurfer and AFNI.

If this is your first time using our tool, go through the Preparation steps first.

Pre-requisites

  • Basic knowledge of preprocessing pipelines
  • Our tutorial on ../tutorials/hpc
  • An understanding of containerized applications (Docker, Singularity)

What will you learn?

  • Use fMRIPrep to preprocess a dataset
  • Gain knowledge of HPC optimization for big data

Preparation steps

FreeSurfer license

As part of the fMRIPrep pipeline, FreeSurfer is mostly used for the reconstruction steps. It is free, but it requires a license, so you will need to obtain one before anything else.

To obtain a FreeSurfer license, register on the FreeSurfer website. For the institution_name you should use “CRIUGM” and for the institution_type “nonprofit_education_research”.

Once the license is downloaded, you can copy the file to Beluga:

scp ~/Downloads/license.txt beluga.computecanada.ca:~/.freesurfer.txt
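Before going further, it can be worth confirming that the license actually landed where expected. The following is a minimal sketch meant to be run on Beluga; the function name check_fs_license is hypothetical, and the default path simply mirrors the destination of the scp command above.

```shell
# Hypothetical helper: succeed only if the license file exists and is not empty.
# The default path mirrors the scp destination used above.
check_fs_license() (
    license="${1:-$HOME/.freesurfer.txt}"
    if [ -s "$license" ]; then
        echo "FreeSurfer license found: $license"
    else
        echo "FreeSurfer license missing or empty: $license" >&2
        exit 1
    fi
)

# Usage: report the result without aborting the current shell session.
check_fs_license || echo "Copy the license to Beluga before continuing."
```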

Software environment

fmriprep-slurm depends on pybids (to manage a BIDS-compatible dataset) and templateflow (a repository holding multiple templates for neuroimaging).

We use Singularity to make sure that the Python dependencies are in agreement with the fMRIPrep container, and to manage the pybids caching mechanism.

Warning

You must check that Singularity, the fMRIPrep container and fmriprep-slurm are available on the system. This should be the case on Beluga.
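The check described in this warning can be sketched as a small shell function. The two paths are the ones used by the commands in this document; the function name check_environment and its output format are hypothetical.

```shell
# Paths taken from the commands used elsewhere in this document.
CONTAINER=/lustre03/project/6003287/containers/fmriprep-20.2.1lts.sif
RUN_SCRIPT=/lustre03/project/6003287/fmriprep-slurm/singularity_run.bash

# Hypothetical helper: print one status line per requirement and
# return the number of missing ones (0 means everything was found).
check_environment() (
    container="${1:-$CONTAINER}"
    run_script="${2:-$RUN_SCRIPT}"
    missing=0
    if command -v singularity >/dev/null 2>&1; then
        echo "ok       singularity"
    else
        echo "MISSING  singularity"
        missing=$((missing + 1))
    fi
    for path in "$container" "$run_script"; do
        if [ -e "$path" ]; then
            echo "ok       $path"
        else
            echo "MISSING  $path"
            missing=$((missing + 1))
        fi
    done
    exit "$missing"
)

check_environment || echo "Some requirements are missing, contact one of the data admins."
```

On some clusters Singularity has to be loaded as a module before it appears on the PATH, so a MISSING line for singularity does not necessarily mean it is not installed.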

Templateflow

fMRIPrep uses common brain templates, which are managed by Templateflow. Our utility script takes care of it, so you don’t have any additional setup to do.

BIDS validation

Like many other neuroimaging tools, fMRIPrep relies heavily on the BIDS layout. It is therefore natural to check that your input dataset is indeed BIDS compliant, using a tool called BIDS-validator. You don’t need to install it since it is already available on Beluga; just run the following command:

singularity exec -B PATH/TO/BIDS/DATASET:/DATA /lustre03/project/6003287/containers/fmriprep-20.2.1lts.sif bids-validator /DATA
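Since the validator can be slow to start on this filesystem, a cheap pre-check can catch an obviously wrong dataset path first. This sketch only tests two things that BIDS expects at the dataset root: a dataset_description.json file and at least one sub-* directory. The function name looks_like_bids is hypothetical, and passing this check does not replace running bids-validator.

```shell
# Hypothetical pre-check: is this plausibly the root of a BIDS dataset?
looks_like_bids() (
    dataset="$1"
    if [ ! -f "$dataset/dataset_description.json" ]; then
        echo "No dataset_description.json in $dataset" >&2
        exit 1
    fi
    # Look for at least one subject directory (sub-<label>).
    for subject in "$dataset"/sub-*/; do
        if [ -d "$subject" ]; then
            echo "$dataset looks like a BIDS dataset"
            exit 0
        fi
    done
    echo "No sub-* directory in $dataset" >&2
    exit 1
)

# Usage: PATH/TO/BIDS/DATASET is the same placeholder as in the command above.
looks_like_bids PATH/TO/BIDS/DATASET || echo "Check the dataset path."
```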

Generating the slurm files

A convenience script is available to help you run the Singularity command with fmriprep-slurm. The following command runs the script inside a compute node:

/lustre03/project/6003287/fmriprep-slurm/singularity_run.bash PATH/TO/BIDS/DATASET fmriprep

Note

There are many different options; check the GitHub page for more information. For example, you might want to add your email with the --email argument.

Warning

You might also want to pass additional fMRIPrep arguments, for example to enable ICA-AROMA and disable FreeSurfer reconstruction. In this case, you should add them as --fmriprep-args=\"--use-aroma --fs-no-reconall\" (don’t forget the escaping character \).
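Putting the --email and --fmriprep-args options together, a full invocation might look like the sketch below. The email address is a placeholder, and the --email=ADDRESS form is an assumption (check the GitHub page for the exact syntax). The block only prints the command, so you can review it before pasting it into your terminal.

```shell
# Build the command as text so it can be reviewed before running it.
run_script=/lustre03/project/6003287/fmriprep-slurm/singularity_run.bash
dataset=PATH/TO/BIDS/DATASET     # placeholder, as in the command above
email=user@example.org           # hypothetical address

# Single quotes keep the backslashes literal in the printed command.
extra_args='--fmriprep-args=\"--use-aroma --fs-no-reconall\"'

# Hypothetical helper: print the full command line without executing it.
build_command() {
    printf '%s %s fmriprep --email=%s %s\n' \
        "$run_script" "$dataset" "$email" "$extra_args"
}

build_command
```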

This step can take some time since the filesystem is slow, so grab a cup of coffee! If it takes too long, request an interactive session on a compute node and run the script from there:

salloc --account=rrg-pbellec --mem-per-cpu=2G --time=4:00:00 --cpus-per-task=2

Submitting the preprocessing jobs

If everything worked as expected, all the Slurm files should be inside a new folder in your scratch space: ${SCRATCH}/DATASET_NAME/UNIX_TIME/.slurm. There should be one Slurm script per subject (sub-*), allowing you to preprocess them in parallel.

Check the content of the Slurm scripts, and more specifically the time and hardware requests, since they count against our allocation usage even if the job fails.
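One way to review the requests without opening every file is to print the #SBATCH directives of all generated scripts at once. This is a minimal sketch; the function name show_sbatch_requests is hypothetical, and the fallback to the current directory when SCRATCH is unset is only there so the snippet does not fail outside the cluster.

```shell
# Hypothetical helper: print every #SBATCH directive of the generated scripts,
# prefixed with the file name, so time and hardware requests can be compared.
show_sbatch_requests() (
    slurm_dir="$1"
    for script in "$slurm_dir"/smriprep_sub-*.sh; do
        [ -f "$script" ] || continue
        grep -H '^#SBATCH' "$script" || true
    done
)

# Usage: DATASET_NAME and UNIX_TIME are the same placeholders as above.
show_sbatch_requests "${SCRATCH:-.}/DATASET_NAME/UNIX_TIME/.slurm"
```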

You are now ready to submit the jobs with sbatch:

find "${SCRATCH}"/DATASET_NAME/UNIX_TIME/.slurm/smriprep_sub-*.sh -type f | while read -r file; do sbatch "$file"; done
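Because every submitted job counts against the allocation, it can help to preview the list of scripts before submitting them. The sketch below wraps the same loop in a function with a dry-run mode; the function name submit_all and the dry-run argument are hypothetical conveniences, not part of fmriprep-slurm.

```shell
# Hypothetical helper: submit every generated script, or only list what
# would be submitted when the second argument is "dry-run".
submit_all() (
    slurm_dir="$1"
    mode="${2:-submit}"
    for script in "$slurm_dir"/smriprep_sub-*.sh; do
        [ -f "$script" ] || continue
        if [ "$mode" = "dry-run" ]; then
            echo "would submit: $script"
        else
            sbatch "$script"
        fi
    done
)

# Preview first; drop the dry-run argument to actually submit.
submit_all "${SCRATCH:-.}/DATASET_NAME/UNIX_TIME/.slurm" dry-run
```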

Checking the output

Output and error logs

Once the jobs are finished, the output logs (smriprep_sub-*.out) and error logs (smriprep_sub-*.err) should be under the same run folder as before, ${SCRATCH}/DATASET_NAME/UNIX_TIME.

If the logs report errors, double-check your input dataset first. If you have any further issues, contact one of the data admins.
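With many subjects, scanning the error logs by hand gets tedious. The sketch below lists the error logs that mention an error or a Python traceback. It is only a heuristic: a log can contain the word "error" in a harmless warning, and the function name list_suspect_logs is hypothetical.

```shell
# Hypothetical helper: list the error logs that mention an error or a
# Python traceback. This is a heuristic, not a definitive failure test.
list_suspect_logs() (
    run_dir="$1"
    for log in "$run_dir"/smriprep_sub-*.err; do
        [ -f "$log" ] || continue
        if grep -q -i -E 'error|traceback' "$log"; then
            echo "$log"
        fi
    done
)

# Usage: DATASET_NAME and UNIX_TIME are the same placeholders as above.
list_suspect_logs "${SCRATCH:-.}/DATASET_NAME/UNIX_TIME"
```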

Warning

You may encounter BIDS errors caused by bad pybids caching behaviour, because the filesystem is slow on Beluga. In this case, re-run the tool as described in Generating the slurm files with the `--force-reindex` argument.

fMRIPrep outputs

The first file of interest is resource_monitor.json, under ${SCRATCH}/DATASET_NAME/UNIX_TIME, which helps you track resource usage for each subject.

All the preprocessing outputs should also be inside ${SCRATCH}/DATASET_NAME/UNIX_TIME/fmriprep.

Finally, if fMRIPrep crashes unexpectedly, you can check its working directory in ${SCRATCH}/DATASET_NAME/UNIX_TIME/smriprep_sub-XXXX.workdir.

To go further

Look at the fMRIPrep documentation, and more specifically the section on Singularity.

Questions?

If you have any issues using Alliance Canada, don’t hesitate to ask your questions in the #alliance_canada channel of the SIMEXP lab Slack!