Enhancing Plasmodium Research through PlasmoRUtils: A one-stop R Package for Apicomplexan Biology


Author(s): Rohit Satyam

Affiliation(s): King Abdullah University of Science & Technology, Saudi Arabia



A major bottleneck in understanding Plasmodium biology is the relative lack of omics-driven experimental data, compounded by attention biases. In addition, half of the Plasmodium proteome consists of proteins whose functions remain unknown. The same is true for other apicomplexan parasites. Although the PlasmoDB database offers the latest annotations for all Plasmodium species, it does not integrate or reference information from numerous privately maintained databases, and some of these databases provide annotations using discontinued gene/protein IDs. Accessing these databases typically requires manual effort, which is cumbersome and time-consuming for wet-lab personnel and bioinformaticians alike. The PlasmoRUtils R package tackles these obstacles by providing single-line R functions that take gene IDs as input, scrape the relevant web resources, and summarise the results as data tables. It interfaces with databases such as HitPredict, ApicoTFDB, Malaria.tools, the MPMP database, MIIP, Phenoplasm, PlasmoBase, MalBoost, UniProt, and others. It also offers utility functions and vignettes for visualising UniProt annotations, conducting GO/pathway enrichment analysis using the MPMP dataset, and performing other routine bioinformatic analyses that are typically non-trivial for non-model organisms. The package further provides functions that simplify the creation of ready-to-use Seurat objects for multiple single-cell datasets from the Malaria Cell Atlas (MCA) and similar apicomplexan datasets. The annotations gathered from these databases, along with in-house analyses, will be made available through an easily accessible and queryable Shiny web server, designed to be user-friendly for those outside the bioinformatics community.
plasmoRUtils is accompanied by the plasmoRdata package, which holds essential datasets (single-cell datasets and bulk RNA-seq datasets uniformly reanalysed with our Nextflow pipeline RNAgrinder) routinely used by researchers for tasks such as single-cell annotation, hypothesis testing, and exploring gene expression patterns throughout the life cycle. It also includes gene annotations generated by our lab, annotations from published literature, and protein-protein interaction data. The development of PlasmoRUtils and PlasmoRdata fills a vital gap in the data resources available for apicomplexan parasite research in the R environment. These packages are designed to ease data retrieval, facilitate the integration of data from different sources, and improve the reproducibility of scientific research in combating one of the world's most severe infectious diseases.