Publicly available transcriptomic datasets of human skeletal muscle were retrieved from the Gene Expression Omnibus (GEO). These covered both RNA sequencing (RNA-seq) and Affymetrix microarray platforms (Human Exon 1.0 ST and Human Genome U133 Plus 2.0). For each dataset, sample metadata (age, sex, diagnosis, and platform) were curated manually. Only datasets that reported age for individual samples were retained.
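The age-based inclusion rule amounts to a filter over the curated metadata. The sketch below is a minimal Python illustration, not the authors' code. The record layout (`Sample`, `Dataset`), the placeholder accessions `GSE_A` and `GSE_B`, and the reading that a dataset qualifies only when every sample has a numeric age are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    # Hypothetical per-sample record after manual curation.
    sample_id: str
    age: Optional[float]  # None when age was not reported
    sex: Optional[str]
    diagnosis: Optional[str]


@dataclass
class Dataset:
    # Hypothetical per-dataset record; accession and platform are free text here.
    accession: str
    platform: str
    samples: List[Sample] = field(default_factory=list)


def has_individual_ages(ds: Dataset) -> bool:
    """True if the dataset has samples and every sample has a numeric age (assumed rule)."""
    return bool(ds.samples) and all(s.age is not None for s in ds.samples)


def retain_age_annotated(datasets: List[Dataset]) -> List[Dataset]:
    """Keep only datasets that report age for every individual sample."""
    return [ds for ds in datasets if has_individual_ages(ds)]


# Placeholder accessions, not real GEO series.
datasets = [
    Dataset("GSE_A", "RNA-seq",
            [Sample("s1", 23.0, "F", "healthy"), Sample("s2", 71.0, "M", "healthy")]),
    Dataset("GSE_B", "HG-U133 Plus 2.0", [Sample("s3", None, "M", "healthy")]),
]
kept = retain_age_annotated(datasets)
print([ds.accession for ds in kept])  # ['GSE_A']
```

A stricter or looser rule (for example, keeping a dataset if most samples have ages) would only change `has_individual_ages`.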
Raw RNA-seq count data from the Genotype-Tissue Expression (GTEx) project and from multiple GEO datasets were aggregated at the gene level and annotated using the Homo.sapiens Bioconductor package. Lowly expressed genes were filtered out and counts were normalized with edgeR. The normalized values were then log2-transformed and median-centered. For microarray datasets, raw CEL files were processed with the oligo package, normalized using robust multi-array average (RMA), filtered to remove lowly expressed features, annotated, and collapsed to one value per gene symbol.
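The RNA-seq steps (low-expression filtering, log2 transformation, and median centering) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not edgeR itself. It uses raw library sizes rather than any normalization factors edgeR would estimate, a fixed pseudocount (`prior=1`) rather than edgeR's prior-count handling, and hypothetical thresholds (`min_cpm=1`, `min_samples=2`). It also centers each gene on its own median across samples, which is one possible reading of "median centering".

```python
import math
from statistics import median


def cpm(counts):
    """Counts per million from raw library sizes (column sums).

    counts: dict gene -> list of raw counts, one entry per sample.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {g: [counts[g][j] / lib_sizes[j] * 1e6 for j in range(n_samples)] for g in genes}


def filter_low_expression(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples (hypothetical thresholds)."""
    values = cpm(counts)
    return {g: counts[g] for g, v in values.items() if sum(x >= min_cpm for x in v) >= min_samples}


def log2_median_center(counts, prior=1.0):
    """Recompute CPM on retained genes, take log2(CPM + prior), then subtract each gene's median."""
    centered = {}
    for g, v in cpm(counts).items():
        logged = [math.log2(x + prior) for x in v]
        m = median(logged)
        centered[g] = [x - m for x in logged]
    return centered


raw = {"G1": [100, 200, 300], "G2": [0, 0, 1], "G3": [900, 800, 700]}
filtered = filter_low_expression(raw)  # G2 passes the CPM threshold in only one sample
expr = log2_median_center(filtered)
print(sorted(expr))  # ['G1', 'G3']
```

Library sizes are recomputed after filtering so that dropped genes no longer contribute to the CPM denominators.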
All expression matrices were then merged by gene symbol and corrected for batch effects using limma. Principal component analysis (PCA) was used to assess dataset integration and sample clustering. Reported sex was verified against expression of the Y-linked gene RPS4Y1 and the X-inactivation transcript XIST, and was corrected where reported and marker-inferred sex disagreed. Outliers were identified from pairwise sample correlations: for each sample, the mean correlation with all other samples was computed, and samples whose mean correlation fell below the 1st percentile of this distribution were excluded. The final metadata and normalized expression matrices were saved for downstream visualization and analysis.
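The two quality-control checks in this step can be sketched as follows. This is a minimal, hypothetical Python illustration. The use of Pearson correlation, the linear-interpolation percentile, and the simple rule that a sample is called male when RPS4Y1 exceeds XIST on the log scale are assumptions, not the authors' stated choices. The toy sample values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def correlation_outliers(samples, q=1.0):
    """Flag samples whose mean correlation with all others falls below the q-th percentile.

    samples: dict sample_id -> expression vector (same gene order for every sample).
    """
    ids = list(samples)
    mean_r = {}
    for i in ids:
        rs = [pearson(samples[i], samples[j]) for j in ids if j != i]
        mean_r[i] = sum(rs) / len(rs)
    cutoff = percentile(list(mean_r.values()), q)
    return {i for i, r in mean_r.items() if r < cutoff}


def check_sex(expr, reported):
    """Infer sex from RPS4Y1 vs XIST; return inferred labels and samples whose reported sex differs.

    expr: dict sample_id -> {"RPS4Y1": value, "XIST": value} on a log scale.
    Hypothetical rule: 'M' if RPS4Y1 exceeds XIST, otherwise 'F'.
    """
    corrected, mismatches = {}, []
    for sid, g in expr.items():
        inferred = "M" if g["RPS4Y1"] > g["XIST"] else "F"
        if reported.get(sid) != inferred:
            mismatches.append(sid)
        corrected[sid] = inferred
    return corrected, mismatches


expr_by_sample = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [1.1, 2, 2.9, 4.2, 5],
    "d": [5, 1, 4, 2, 3],  # poorly correlated with the others
}
print(correlation_outliers(expr_by_sample))  # {'d'}

corrected, mismatches = check_sex(
    {"s1": {"RPS4Y1": 8.0, "XIST": 1.0}, "s2": {"RPS4Y1": 0.5, "XIST": 9.0}},
    {"s1": "F", "s2": "F"},
)
print(corrected, mismatches)  # {'s1': 'M', 's2': 'F'} ['s1']
```

The percentile definition matters for small cohorts. With linear interpolation and 101 or fewer samples, the 1st-percentile cutoff lies no higher than the second-lowest mean correlation, so the rule removes exactly one sample (or none if the two lowest values are tied).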