Fast trajectory-based differential expression and splicing analyses


Author(s): Alexandre Segers, Jeroen Gilis, Davide Risso, Koen Van den Berge, Lieven Clement

Affiliation(s): Ghent University



Trajectory inference methods have revolutionized the analysis of dynamic gene expression changes in single-cell RNA-sequencing (scRNA-seq) data. Methods have been developed to identify differentially expressed genes across lineages or conditions, often using generalized additive models (GAMs) to model expression along pseudotime. With the ever-increasing size of scRNA-seq datasets, particularly multi-patient data, the computational burden has exploded, forcing methods to resort to the log-normal distribution, although this ignores heteroscedasticity. We introduce a novel approach to parameter estimation for count-based GAMs that builds upon our recent work on fast fitting of generalized linear models. In particular, we extend our fast Newton-Raphson algorithm, which uses an OLS-like update in each iteration, by including a ridge penalty. We show that the update equations remain the same in all iterations and across all genes that share the same penalty. Next, we adopt this approach to obtain fast estimates of the mean model parameters of GAMs using their ridge regression representation. Unlike conventional GAM fitting, which scales quadratically with the number of covariates in high-throughput settings, our method scales linearly, offering significant computational gains because GAMs involve a large number of covariates. For statistical inference, we develop fast approximate likelihood ratio tests and address the hierarchical correlation in multi-patient data using subject-specific effects. Further, we extend tradeSeq, which currently only detects differential gene expression, to detect differential splicing across lineages or conditions, using quasi-binomial GAMs instead of the conventional negative binomial GAMs. Here, our fast parameter estimation is adapted to model proportions instead of counts, which again allows for fast and scalable computation.
We illustrate our fast tradeSeq framework in simulation studies and real case studies and show that it dramatically reduces computational complexity at the cost of a minimal reduction in performance. Moreover, our fast implementation outperforms methods that resort to the log-normal distribution.
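
The computational idea behind an OLS-like update with a shared ridge penalty can be illustrated with a minimal NumPy sketch. This is not the tradeSeq implementation, and it does not reproduce the Newton-Raphson algorithm described above: it only shows a single penalized least-squares solve, assuming a generic design matrix `X`, a matrix `Z` of per-gene working responses, a penalty matrix `S`, and a penalty weight `lam`. All of these names, and the function `shared_ridge_update`, are hypothetical. The point is that the matrix `X'X + lam * S` depends only on the design and the penalty, so it can be formed once and reused for every gene.

```python
import numpy as np

def shared_ridge_update(X, Z, S, lam):
    """Solve (X'X + lam * S) B = X'Z for all genes in one call.

    X   : (n_cells, p) design matrix, e.g. a spline basis (assumed)
    Z   : (n_cells, n_genes) working responses, one column per gene (assumed)
    S   : (p, p) symmetric positive semi-definite penalty matrix (assumed)
    lam : non-negative penalty weight shared by all genes (assumed)
    """
    # The left-hand side does not depend on the gene, so it is built once.
    A = X.T @ X + lam * S
    # np.linalg.solve accepts a matrix right-hand side: one column per gene.
    return np.linalg.solve(A, X.T @ Z)

# Toy data, purely illustrative.
rng = np.random.default_rng(0)
n_cells, p, n_genes = 200, 6, 50
X = rng.normal(size=(n_cells, p))
Z = rng.normal(size=(n_cells, n_genes))
S = np.eye(p)   # identity penalty, i.e. plain ridge regression
lam = 2.0

B = shared_ridge_update(X, Z, S, lam)   # (p, n_genes) coefficient matrix
```

Each column of `B` equals the ridge estimate that would be obtained by fitting that gene on its own, but the cross-product matrix is assembled only once instead of once per gene.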