Batch effect detection and visual quality control with CytoMDS, a Bioconductor package for low dimensional representation of distances between cytometry samples



Author(s): Philippe Hauchamps, Dan Lin, Laurent Gatto

Affiliation(s): Computational Biology and Bioinformatics, de Duve Institute, UCLouvain, Belgium



Quality Control (QC) of samples is an essential preliminary step in cytometry data analysis. In particular, identifying potential batch effects and outlier samples is paramount to avoid mistaking these effects for true biological signal in downstream analyses. However, this task can be delicate and tedious, especially for datasets with many samples. Here, we present *CytoMDS*, a Bioconductor package implementing a dedicated method for the low dimensional representation of cytometry samples, each composed of marker expressions for up to millions of single cells. The method combines the Earth Mover’s Distance (EMD) [1], to assess dissimilarities between multidimensional distributions, with Multidimensional Scaling (MDS) [2], to project these distances onto a low dimensional space. The package also provides additional visual tools, both for diagnosing the quality of the projection and for helping users interpret the projection axes. We demonstrate the strengths and advantages of *CytoMDS* for QC of cytometry data on real biological datasets, revealing low quality samples, batch effects, and biological signal between sample groups.

### References

[1] Haidong Yi and Natalie Stanley. 2022. “CytoEMD: Detecting and Visualizing between-Sample Variation in Relation to Phenotype with Earth Mover’s Distance.” In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 1–14. BCB ’22 28. New York, NY, USA: Association for Computing Machinery.

[2] Jan de Leeuw and Patrick Mair. 2009. “Multidimensional Scaling Using Majorization: SMACOF in R.” Journal of Statistical Software 31 (3): 1–30.
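The two-step idea described in the abstract (pairwise EMD between samples, then MDS of the resulting distance matrix) can be illustrated with a minimal Python sketch. This is not the *CytoMDS* implementation, which is an R/Bioconductor package; it rests on two assumptions. First, the multidimensional EMD is approximated here by summing one-dimensional EMDs over markers (via `scipy.stats.wasserstein_distance`), a simplification chosen for brevity. Second, the projection uses a basic unweighted metric SMACOF (the majorization approach of [2]) started from classical Torgerson scaling. The helper names `marginal_emd`, `pairwise_distances`, `torgerson`, `smacof` and `euclidean` are hypothetical.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def marginal_emd(sample_a, sample_b):
    """Hypothetical helper: sum of 1D EMDs over markers (columns).

    Assumption: this approximates the multivariate EMD between two
    samples (cells x markers); it is not necessarily what CytoMDS does.
    """
    return sum(
        wasserstein_distance(sample_a[:, k], sample_b[:, k])
        for k in range(sample_a.shape[1])
    )


def pairwise_distances(samples):
    """Symmetric matrix of (approximate) EMDs between all sample pairs."""
    n = len(samples)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = marginal_emd(samples[i], samples[j])
    return d


def euclidean(x):
    """Euclidean distances between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def torgerson(d, n_dim):
    """Classical (Torgerson) scaling, used here as the SMACOF start."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    vals, vecs = np.linalg.eigh(b)
    idx = np.argsort(vals)[::-1][:n_dim]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


def smacof(d, n_dim=2, n_iter=300, tol=1e-9):
    """Unweighted metric SMACOF: iterate the Guttman transform."""
    n = d.shape[0]
    x = torgerson(d, n_dim)
    stress_old = np.inf
    for _ in range(n_iter):
        dist = euclidean(x)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(dist > 0, d / dist, 0.0)
        # B(X): off-diagonal -d_ij / dist_ij, diagonal = minus row sums
        b = -ratio
        np.fill_diagonal(b, 0.0)
        np.fill_diagonal(b, -b.sum(axis=1))
        x = b @ x / n
        # Raw stress over pairs i < j; SMACOF never increases it
        stress = (np.triu(d - euclidean(x), 1) ** 2).sum()
        if stress_old - stress < tol:
            break
        stress_old = stress
    return x, stress


# Usage on simulated data: 6 samples of 2000 cells x 3 markers,
# where the last 3 samples carry a shift on marker 0 (a mock batch effect).
rng = np.random.default_rng(42)
samples = []
for batch_shift in (0.0, 0.0, 0.0, 2.0, 2.0, 2.0):
    cells = rng.normal(size=(2000, 3))
    cells[:, 0] += batch_shift
    samples.append(cells)

d = pairwise_distances(samples)
coords, stress = smacof(d)
print(coords.round(2))
```

In the resulting 2D map, the two mock batches should form two well-separated groups, which is the kind of pattern the abstract describes using for visual batch-effect detection. The per-marker sum keeps the example short; a true multivariate EMD would also capture differences in the joint distribution of markers.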