Perform default preprocessing of raw RNA-seq counts
The `preprocess_rna_seq_data()` function performs the standard preprocessing steps for raw RNA-seq counts. It creates a `DGEList` object, computes normalization factors with the default TMM (Trimmed Mean of M-values) method via `edgeR::calcNormFactors()`, and applies the `voom` transformation from the `limma` package to obtain log-transformed counts per million (logCPM) with associated precision weights. To use a different normalization method, supply your own function through the `normalize_func` argument.
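The three default steps can be sketched with the underlying edgeR and limma calls. This is a minimal illustration on simulated counts, not the function's internal code. The toy data, the `Phase` column and the simple `~ Phase` design are assumptions. The section also does not say how `design` is converted into a model matrix or whether low-count genes are filtered first.

```r
library(edgeR)  # provides DGEList() and calcNormFactors()
library(limma)  # provides voom()

set.seed(1)
# Simulated toy data (assumption): 100 genes x 6 samples of Poisson counts
raw_counts <- matrix(rpois(600, lambda = 50), nrow = 100,
                     dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
meta <- data.frame(Phase = factor(rep(c("A", "B"), each = 3)))

y <- DGEList(counts = raw_counts)  # step 1: wrap the counts in a DGEList
y <- calcNormFactors(y)            # step 2: TMM normalization factors (default method)

design_matrix <- model.matrix(~ Phase, data = meta)
v <- voom(y, design_matrix)        # step 3: logCPM values plus precision weights

dim(v$E)        # logCPM matrix, genes x samples
dim(v$weights)  # matching matrix of observation-level precision weights
```

`calcNormFactors()` does not alter the counts themselves. It stores one scaling factor per sample in `y$samples$norm.factors`, and `voom()` uses these factors to compute effective library sizes.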
Arguments
- raw_counts
A matrix of raw RNA-seq counts (genes as rows, samples as columns).
- meta
A data frame containing the sample metadata, with one row per sample (that is, per column of `raw_counts`).
- spline_params
Parameters for the spline functions (optional). A list with the named elements `spline_type` (either `"n"` for natural cubic splines or `"b"` for B-splines), `degree` (a single integer giving the spline degree; required only for B-splines), and `dof` (a single integer giving the degrees of freedom; required for both natural and B-splines).
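Valid `spline_params` lists for both spline types might look like the ones below. The helper `make_spline_basis()` is hypothetical. It assumes the fields map onto `splines::ns()` and `splines::bs()` as shown, which this section does not confirm, and it only illustrates what `dof` and `degree` control. The toy time points are also an assumption.

```r
library(splines)  # ns() and bs() ship with base R

# Natural cubic splines: only spline_type and dof are needed
spline_params_ns <- list(spline_type = "n", dof = 3L)

# B-splines: degree is additionally required
spline_params_bs <- list(spline_type = "b", degree = 3L, dof = 4L)

# Hypothetical helper (not the package's implementation): builds a spline
# basis matrix for a numeric covariate from a spline_params list.
make_spline_basis <- function(x, params) {
  if (identical(params$spline_type, "n")) {
    ns(x, df = params$dof)
  } else if (identical(params$spline_type, "b")) {
    bs(x, df = params$dof, degree = params$degree)
  } else {
    stop("spline_type must be \"n\" or \"b\"")
  }
}

time <- rep(1:6, times = 2)  # toy time points for 12 samples (assumption)
basis_ns <- make_spline_basis(time, spline_params_ns)
basis_bs <- make_spline_basis(time, spline_params_bs)

ncol(basis_ns)  # one basis column per degree of freedom
ncol(basis_bs)
```

Without an intercept column, both `ns()` and `bs()` return `df` basis columns. For B-splines, the degree determines how many of those columns come from the polynomial piece and how many interior knots are placed.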
- design
A design formula for the limma analysis, such as `~ 1 + Phase*X + Reactor`.
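The following sketch shows how a design formula written as a string expands into a limma design matrix. The metadata below is invented: `Phase` and `Reactor` are treated as factors and `X` as a plain numeric column. This section does not document whether the package expands `X` with a spline basis built from `spline_params`.

```r
# Toy metadata (assumption): Phase, a numeric variable X, and Reactor
meta <- data.frame(
  Phase   = factor(rep(c("exp", "stat"), each = 4)),
  X       = rep(c(1, 2, 3, 4), times = 2),
  Reactor = factor(rep(c("R1", "R2"), times = 4))
)

design <- "~ 1 + Phase*X + Reactor"

# Phase*X expands to Phase + X + Phase:X; Reactor adds a blocking term
design_matrix <- model.matrix(as.formula(design), data = meta)
colnames(design_matrix)
```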
- normalize_func
An optional normalization function. If provided, it is used to normalize the `DGEList` object; otherwise, TMM normalization via `edgeR::calcNormFactors()` is applied by default. The function must take as input the `DGEList` object `y` created by `y <- edgeR::DGEList(counts = raw_counts)` and return the normalized `DGEList`.
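Here is a minimal custom `normalize_func` that swaps TMM for edgeR's upper-quartile method. The choice of upper-quartile normalization is only an example, and `upper_quartile_norm` is a hypothetical name. The function follows the documented contract: it takes the `DGEList` `y` and returns the normalized `DGEList`.

```r
library(edgeR)

# Hypothetical custom normalization: upper-quartile instead of TMM.
# Takes a DGEList and returns it with updated normalization factors.
upper_quartile_norm <- function(y) {
  calcNormFactors(y, method = "upperquartile")
}

set.seed(42)
raw_counts <- matrix(rpois(400, lambda = 30), nrow = 100)  # toy data (assumption)
y <- DGEList(counts = raw_counts)
y <- upper_quartile_norm(y)

y$samples$norm.factors  # one factor per sample, scaled to multiply to one
```

You would then pass the function itself, not its result, for example `normalize_func = upper_quartile_norm` in the call to `preprocess_rna_seq_data()`.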