Publicly available gene expression datasets from human skeletal muscle biopsies, generated on both RNA-seq and microarray platforms, were collected from the Gene Expression Omnibus (GEO). RNA-seq raw counts were processed with the edgeR package (gene filtering and TMM normalization) and then voom-transformed using the limma package. Microarray CEL files were normalized with the RMA method implemented in the oligo package and annotated using platform-specific annotation databases.
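The RNA-seq and microarray steps above reduce to a few matrix operations. The Python sketch below illustrates three of them under stated assumptions. It covers the log2-counts-per-million transform that voom applies, where library sizes can be scaled by TMM factors computed elsewhere (e.g. by edgeR) and default to 1 here. It includes a hypothetical CPM-based gene filter, which is not edgeR's `filterByExpr` rule, and the quantile-normalization step of RMA. RMA's background correction and median-polish summarization are not shown. Function names and thresholds are illustrative, not the authors' code.

```python
import numpy as np


def log_cpm(counts, norm_factors=None):
    """voom-style log2 counts per million.

    counts: genes x samples array of raw counts.
    norm_factors: per-sample scaling factors (e.g. TMM factors computed
    elsewhere); treated as 1 when not given.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    if norm_factors is not None:
        lib_size = lib_size * np.asarray(norm_factors, dtype=float)
    # voom adds 0.5 to each count and 1 to each library size before the log
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


def filter_low_expression(counts, min_cpm=1.0, min_samples=2):
    """Hypothetical filter: keep genes with CPM >= min_cpm in at least
    min_samples samples. Returns a boolean mask over genes (rows)."""
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=0) * 1e6
    return (cpm >= min_cpm).sum(axis=1) >= min_samples


def quantile_normalize(x):
    """Give every column (array) the same distribution: the row-wise mean
    of the sorted columns. Ties are broken by position, not averaged."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # the k-th smallest value in column j receives the k-th reference value
        out[order[:, j], j] = reference
    return out
```

For example, applying `quantile_normalize` to the four-gene, three-sample matrix [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]] gives every column the same sorted values 2, 3, 14/3 and 17/3.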
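Sample metadata were curated manually. Annotated sex was validated against the expression of XIST (expressed from the inactive X chromosome in females) and RPS4Y1 (Y-linked, expressed in males). Samples with inconsistent metadata, or with a low average Spearman correlation with the other samples, were excluded as outliers.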
All datasets were merged by gene symbol, log2-transformed, and median-centered per gene. Batch effects between studies were corrected with the removeBatchEffect function from the limma package. Genes with more than 10% missing values were excluded. Data quality was assessed using principal component analysis (PCA), heatmaps, boxplots, and expression histograms.
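A minimal Python sketch of the per-gene steps follows. It assumes a genes × samples matrix that is already merged by gene symbol, on the log2 scale, and uses NaN for missing values. `remove_batch_effect` is intended to mirror what limma's `removeBatchEffect` does when only a batch factor is supplied, with no `design` covariates. In that case limma uses sum-to-zero batch coding, so each gene's batch effect is the batch mean minus the unweighted mean of the batch means, and this effect is subtracted. This is not limma's implementation, and edge cases such as a batch that is entirely missing for a gene are not handled specially. Function names are hypothetical.

```python
import numpy as np


def center_on_median(x):
    """Subtract each gene's (row's) median, ignoring missing values."""
    x = np.asarray(x, dtype=float)
    return x - np.nanmedian(x, axis=1, keepdims=True)


def remove_batch_effect(x, batch):
    """Batch-only analogue of limma::removeBatchEffect (no covariates).

    Per gene: effect(batch) = batch mean - mean of all batch means,
    which is then subtracted from that batch's samples.
    """
    x = np.asarray(x, dtype=float)
    batch = np.asarray(batch)
    levels = np.unique(batch)
    batch_means = np.column_stack(
        [np.nanmean(x[:, batch == b], axis=1) for b in levels]
    )
    grand = np.nanmean(batch_means, axis=1, keepdims=True)
    out = x.copy()
    for k, b in enumerate(levels):
        out[:, batch == b] -= batch_means[:, [k]] - grand
    return out


def drop_genes_with_missing(x, max_fraction=0.10):
    """Keep genes (rows) with at most max_fraction missing values."""
    x = np.asarray(x, dtype=float)
    keep = np.isnan(x).mean(axis=1) <= max_fraction
    return x[keep], keep
```

For a single gene with values 1, 3 and 10 in batches a, a and b, the batch means are 2 and 10, their average is 6, and the corrected values are 5, 7 and 6.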