Running the Pipeline
Getting Help
All steps in the MAVIS pipeline are run through the main mavis entry point. The usage menu can be viewed by running mavis without any arguments, or by giving the -h/--help option.
Example:
>>> mavis -h
Help sub-menus can be viewed by giving the pipeline step followed by no arguments, or followed by the -h option:
>>> mavis cluster -h
Running MAVIS using a Job Scheduler
By default, MAVIS and its main ‘pipeline’ step are set up to use a job scheduler on a compute cluster. Two schedulers are currently supported: SLURM and SGE. The pipeline step generates submission scripts and a wrapper bash script for the user to execute on the cluster head node.
Standard
The most common use case is to auto-generate a configuration file and then run the pipeline setup step, which runs clustering and creates scripts for running the remaining steps.
>>> mavis config .... -w config.cfg
>>> mavis pipeline config.cfg -o /path/to/top/output_dir
This creates submission scripts laid out as follows:
output_dir/
|-- library1/
| |-- validation/<jobdir>/submit.sh
| `-- annotation/<jobdir>/submit.sh
|-- library2/
| |-- validation/<jobdir>/submit.sh
| `-- annotation/<jobdir>/submit.sh
|-- pairing/submit.sh
|-- summary/submit.sh
`-- submit_pipeline_<batchid>.sh
The submit_pipeline_<batchid>.sh file is the wrapper script to execute on the cluster head node:
>>> ssh cluster_head_node
>>> cd /path/to/output_dir
>>> bash submit_pipeline_<batchid>.sh
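To check which scripts the pipeline step generated before submitting anything, a short Python sketch can walk the output directory. The function find_submit_scripts is hypothetical (it is not part of MAVIS) and assumes the layout shown above; the actual <jobdir> names and batch ids will differ between runs.

```python
from pathlib import Path


def find_submit_scripts(output_dir):
    """Group the submission scripts under output_dir by pipeline stage.

    Hypothetical helper, not part of MAVIS: it assumes the layout shown
    above (per-library validation/annotation job directories, plus the
    pairing/ and summary/ directories and the top-level wrapper script).
    """
    root = Path(output_dir)
    return {
        # one submit.sh per <jobdir>, for every library directory
        "validation": sorted(root.glob("*/validation/*/submit.sh")),
        "annotation": sorted(root.glob("*/annotation/*/submit.sh")),
        # single-job stages that combine all libraries
        "pairing": sorted(root.glob("pairing/submit.sh")),
        "summary": sorted(root.glob("summary/submit.sh")),
        # the wrapper script(s) to run on the head node
        "wrapper": sorted(root.glob("submit_pipeline_*.sh")),
    }
```

For example, find_submit_scripts("/path/to/output_dir")["wrapper"] should list the submit_pipeline_<batchid>.sh script(s) to run on the head node.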
Non-Standard
To set up a non-standard pipeline that skips one or more steps, use the --skip_stage option.
>>> mavis pipeline /path/to/config -o /path/to/output/dir --skip_stage cluster
>>> mavis pipeline /path/to/config -o /path/to/output/dir --skip_stage validate
To skip both clustering and validation, give the option twice.
>>> mavis pipeline /path/to/config -o /path/to/output/dir --skip_stage cluster --skip_stage validate
Note
Skipping clustering still produces an output directory and files, but no merging is done.
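When the pipeline is launched from a script, the repeated --skip_stage option can be assembled programmatically. The helper pipeline_command below is a hypothetical sketch that uses only the options shown above (-o and --skip_stage); it builds the argument list but does not run MAVIS.

```python
import shlex


def pipeline_command(config, output_dir, skip_stages=()):
    """Build the argument list for `mavis pipeline`.

    Hypothetical helper: --skip_stage is repeated once per skipped stage,
    matching the command-line examples above.
    """
    args = ["mavis", "pipeline", config, "-o", output_dir]
    for stage in skip_stages:
        args += ["--skip_stage", stage]
    return args


cmd = pipeline_command("/path/to/config", "/path/to/output/dir", ["cluster", "validate"])
# prints: mavis pipeline /path/to/config -o /path/to/output/dir --skip_stage cluster --skip_stage validate
print(shlex.join(cmd))
```

On a machine where mavis is installed, the list can be passed directly to subprocess.run(cmd, check=True).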
Configuring Scheduler Settings
There are multiple ways to configure the scheduler settings. Some of the configurable options, with their corresponding environment variables, are listed below:
- queue: MAVIS_QUEUE
- memory_limit: MAVIS_MEMORY_LIMIT
- time_limit: MAVIS_TIME_LIMIT
- import_env: MAVIS_IMPORT_ENV
- scheduler: MAVIS_SCHEDULER
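In the list above, each option's environment variable is the option name uppercased with a MAVIS_ prefix. The Python sketch below uses that pattern to report which scheduler defaults are currently set in the environment, which can help when checking a shell before running mavis config. The helpers env_var_name and scheduler_overrides are hypothetical, not MAVIS's own code, and they return raw strings only; how MAVIS parses values such as memory_limit or time_limit (units, formats) is not assumed here.

```python
import os

# Scheduler options from the list above; each has a MAVIS_<OPTION> variable.
SCHEDULER_OPTIONS = ["queue", "memory_limit", "time_limit", "import_env", "scheduler"]


def env_var_name(option):
    """Return the environment variable name for a scheduler option."""
    return "MAVIS_" + option.upper()


def scheduler_overrides(environ=None):
    """Collect the scheduler options that are set in the environment.

    Hypothetical helper: values are returned unparsed, as strings.
    """
    environ = os.environ if environ is None else environ
    return {
        option: environ[env_var_name(option)]
        for option in SCHEDULER_OPTIONS
        if env_var_name(option) in environ
    }
```

Calling scheduler_overrides() with no argument inspects the current process environment; passing a dict makes it easy to test.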
For example, to set the default job queue using an environment variable:
export MAVIS_QUEUE=QUEUENAME
Or to give it as an argument during config generation:
mavis config -w /path/to/config --queue QUEUENAME
Finally, it can be added to the config file manually:
[schedule]
queue = QUEUENAME
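If the config file has to be edited from a script rather than by hand, Python's standard configparser module can set the option. This is a sketch that assumes the config file is INI-style, as the [schedule] fragment above suggests; set_schedule_option is a hypothetical helper. Note that configparser drops comments when it rewrites a file and lowercases option names by default.

```python
import configparser


def set_schedule_option(path, option, value):
    """Set one option in the [schedule] section of a config file, in place.

    Hypothetical helper: assumes an INI-style config that configparser can
    round-trip. Comments in the original file are not preserved.
    """
    # interpolation=None keeps values containing '%' verbatim
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    if not parser.has_section("schedule"):
        parser.add_section("schedule")
    parser.set("schedule", option, value)
    with open(path, "w") as handle:
        parser.write(handle)
```

For example, set_schedule_option("/path/to/config", "queue", "QUEUENAME") writes the same [schedule] entry shown above.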