MPRAlib documentation

GitHub Repository DOI GitHub License GitHub Release PyPI version Bioconda Version Tests Coverage badge GitHub Issues GitHub Pull Requests

MPRAlib is a Python library and CLI for processing MPRA (Massively Parallel Reporter Assay) data.

Citation

If you use MPRAlib in your work, please cite our recent preprint:

Uniform processing and analysis of IGVF massively parallel reporter assay data with MPRAsnakeflow Jonathan D. Rosen, Arjun Devadas Vasanthakumari, Kilian Salomon, Nikola de Lange, Pyaree Mohan Dash, Pia Keukeleire, Ali Hassan, Alejandro Barrera, Martin Kircher, Michael I. Love, Max Schubach bioRxiv (2025). 2025.09.25.678548

Installation

PyPI

pip install mpralib

Conda

From the bioconda channel:

conda install -c bioconda mpralib

Usage

Command Line Interface

Use the mpralib command to access various functionalities.

Validate a file

MPRAlib provides a CLI tool for validating MPRA data files against supported schemas.

mpralib validate-file <schema> --input <input_file>
  • <schema>: One of reporter-sequence-design, reporter-barcode-to-element-mapping, reporter-experiment-barcode, reporter-experiment, reporter-element, reporter-variant, reporter-genomic-element, reporter-genomic-variant

  • <input_file>: Path to your data file (e.g., .tsv.gz, .bed.gz)

Example:

mpralib validate-file reporter-sequence-design --input data/reporter_sequence_design.example.tsv.gz

Functional analysis

Filter barcodes using multiple filters, like setting min/max counts or detect barcode outliers

mpralib functional <schema> <inputs>
  • <schema>: One of activities, compute-correlation, filter

  • <inputs>: Please use --help for more details on the schema.

Example:

mpralib functional filter --input data/reporter_experiment_barcode.example.tsv.gz --method max_count --method-values '{"rna_max_count": 500, "dna_max_count": 300}' --output-barcode data/reporter_experiment_barcode.filtered.tsv.gz

Plotting

Generate plots of your data.

mpralib plot <schema> --input <input_file> --bc-threshold <bc_threshold> --output <output_file> <other_inputs>
  • <schema>: One of barcodes-per-oligo, correlation, dna-vs-rna, outlier

  • <input_file>: MPRA experiment or experiment barcode.

  • <bc_threshold>: Minimum number of barcodes per oligo to include in the plot.

  • <output_file>: Path to save the plot (e.g., .png, .pdf)

  • <other_inputs>: Please use --help for more details on the schema.

Example:

mpralib plot correlation --input data/reporter_experiment_barcode.example.tsv.gz --oligos --bc-threshold 10 --modality activity --output data/test.png

Combine counts with other outputs

Combines counts data with MPRA sequence design and quantification from other tools like BCalm to create several other output data, like bed file tracks. Can also create a variant map from the MPRA design file as an input for mpralm and BCalm.

mpralib combine <schema> <inputs>
  • <schema>: One of get-counts, get-reporter-elements, get-reporter-genomic-elements, get-reporter-genomic-variants, get-reporter-variants, get-variant-counts, get-variant-map

  • <inputs>: Please use --help for more details on the schema.

Example:

mpralib combine get-variant-map --input data/reporter_experiment_barcode.example.tsv.gz --sequence-design data/mpra_sequence_design.example.tsv.gz --output data/variant_map_of_oligo.tsv.gz

Python API

MPRAlib is primarily intended to be used as a library. Please see our notebook mpralib.ipynb for a more detailed example.

License

MIT License