Comprehensive and standardised workflow for single-cell proteomics data analysis using scp and scplainer.


Author(s): Samuel Grégoire, Christophe Vanderaa, Laurent Gatto

Affiliation(s): UCLouvain



Single-cell proteomics (SCP) by mass spectrometry has become achievable thanks to technological advances from several research teams, resulting in a broad landscape of cutting-edge methodologies [1]. While this progress has enabled the measurement of thousands of proteins at single-cell resolution, it has also produced complex and divergent analysis workflows. To tackle biologically relevant questions efficiently, the field must confront the challenges inherent in SCP data, which are particularly prone to technical variation, batch effects, and missing values [2]. To address these challenges, our team has developed several tools distributed in the `scp` R/Bioconductor package. The latest addition is `scplainer`, a standardised approach grounded in linear modelling. `scplainer` provides key tools to extract meaningful insights from SCP data through variance analysis, differential abundance analysis and component analysis, while streamlining the visualisation of the results. Integrated into the `scp` package, `scplainer` builds on the `QFeatures` and `SingleCellExperiment` infrastructures, providing a comprehensive interface with numerous data processing functions. We also developed `scpdata`, a package containing standardised and annotated single-cell proteomics datasets, which we are still actively extending. In this work, we provide a comprehensive overview of SCP data analysis using the `scp` package, starting from the output table generated by the search engine software and proceeding through data processing, modelling and downstream analyses.

[1] Petrosius and Schoof (2023), "Recent advances in the field of single-cell proteomics".
[2] Vanderaa and Gatto (2021), "Replication of single-cell proteomics data reveals important computational challenges".