Article: Khan, Elcheikhali et al., 2024

 

Methods used:

Model system: human testes

 

Summary

Single-cell tissue atlases commonly use RNA abundances as surrogates for protein abundances. Yet, protein abundance also depends on the regulation of protein synthesis and degradation rates. To estimate the contributions of such post-transcriptional regulation, we quantified the proteomes of 5,883 single cells from human testis using three distinct mass spectrometry methods (SCoPE2, pSCoPE, and plexDIA). To distinguish between biological and technical factors contributing to differences between protein and RNA levels, we developed BayesPG, a Bayesian model of transcript and protein abundance that systematically accounts for technical variation and infers biological differences. We use BayesPG to jointly model RNA and protein data collected from 29,709 single cells across different methods and datasets. BayesPG estimated consensus mRNA and protein levels for 3,861 gene products and quantified the relative protein-to-mRNA ratio (rPTR) for each gene across six distinct cell types in samples from human testis. About 28% of the gene products exhibited significant differences at protein and RNA levels and contributed to about 1,500 significant GO groups. We observe that specialized and context-specific functions, such as those related to spermatogenesis, are regulated after transcription. Among hundreds of detected post-translationally modified peptides, many show significant abundance differences across cell types. Furthermore, some phosphorylated peptides covary with kinases in a cell-type-dependent manner, suggesting cell-type-specific regulation. Our results demonstrate the potential of inferring protein regulation in complex tissues from single-cell proteogenomic data and provide a generalizable model, BayesPG, for performing such analyses.

 

Raw Data

Raw files with acquired mass spectra are available via MassIVE.

 

Processed Data

The processed data are reported according to the community guidelines and are available within a Google Drive folder.

The folder is organized as follows:

  • 001-MSData: search outputs from raw mass spectrometry data and the associated files needed to process them into single-cell matrices
    • 001-searchedFiles: output tables from MaxQuant (including the variable-modification PTM search) and DIA-NN
    • 002-AuxiliaryFiles: metadata and nPOP files for processing search data into single-cell matrices, configurations for DART-ID, inclusion lists and methods for pSCoPE, the FASTA files used, and the complex groupings used.
  • 002-SingleCellMatrices: processed single cell matrices
    • 001-mRNA: data from both studies used, one in the standard 10x output folder (Sohni et al.) and the other as a digital gene expression matrix (Shami et al.)
    • 002-Protein: cell x protein matrices for each sample preparation batch (unimputed, at both the protein and peptide level; BayesPG uses the peptide level) and a single batch-corrected matrix combining all sample preparation batches
    • 003-alignmentOutputs: tables containing the RNA and protein cell type labels assigned after dataset alignment via LIGER, as well as the list of gene products that form the feature space, as selected by correlation vector analysis.
  • 003-BayesPGOutput: output tables from testing for significant rPTR at the level of gene products, GO groups, and protein complexes, as well as similar tables for the correlation tests
    • 001-unfiltered: all the data
    • 002-filtered: important note: the filtered data will not be complete at the level of cell types, because the significance of gene products and groupings is cell type specific for the rPTR tests.
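
Because the filtered tables keep only significant gene product and cell type pairs, a gene product may appear for some cell types and not others. The following is a minimal sketch, not the authors' code, of one way to make that incompleteness explicit when reshaping such a table. It assumes the rows have already been read into Python dictionaries; the keys `gene`, `cell_type`, and `rptr`, the cell type labels, and the example values are all hypothetical placeholders and may not match the actual files.

```python
from collections import defaultdict

# Hypothetical cell type labels; the real tables cover six testis cell types.
CELL_TYPES = ["typeA", "typeB", "typeC"]

def to_wide(rows, cell_types):
    """Pivot long-format rows into {gene: {cell_type: value or None}}.

    Pairs absent from the filtered table stay None, so a missing entry is
    never mistaken for an rPTR of zero.
    """
    wide = defaultdict(lambda: dict.fromkeys(cell_types))
    for row in rows:
        wide[row["gene"]][row["cell_type"]] = row["rptr"]
    return dict(wide)

def missing_pairs(wide):
    """List the (gene, cell_type) pairs that did not pass the filter."""
    return sorted(
        (gene, cell_type)
        for gene, values in wide.items()
        for cell_type, value in values.items()
        if value is None
    )

# Made-up rows standing in for a filtered table (column names are assumptions).
filtered_rows = [
    {"gene": "GENE1", "cell_type": "typeA", "rptr": 1.2},
    {"gene": "GENE1", "cell_type": "typeC", "rptr": -0.8},
    {"gene": "GENE2", "cell_type": "typeB", "rptr": 0.5},
]

wide = to_wide(filtered_rows, CELL_TYPES)
gaps = missing_pairs(wide)
print(gaps)
```

For analyses that need a value for every cell type, the tables in 001-unfiltered are the more suitable starting point.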