
The `preprocess_rna_seq_data()` function performs the essential preprocessing steps for raw RNA-seq counts: it creates a `DGEList` object, normalizes the counts with TMM (Trimmed Mean of M-values) normalization by default via `edgeR::calcNormFactors`, and applies the `voom` transformation from the `limma` package to obtain log2-transformed counts per million (logCPM) with associated precision weights. To use a different normalization method, supply your own function through the `normalize_func` argument.
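The default pipeline described above corresponds roughly to the following edgeR/limma calls. This is a minimal sketch under assumptions, not the function's actual implementation: the counts are simulated, the metadata column `Phase` is made up, and passing the design matrix to `limma::voom()` in this way is assumed.

```r
set.seed(1)
# Simulated raw counts: 100 genes (rows) x 6 samples (columns), illustrative only
raw_counts <- matrix(
  rpois(600, lambda = rep(seq(10, 500, length.out = 100), times = 6)),
  nrow = 100,
  dimnames = list(paste0("gene", 1:100), paste0("s", 1:6))
)
# Hypothetical metadata: one row per sample
meta <- data.frame(Phase = factor(rep(c("A", "B"), each = 3)))

y <- edgeR::DGEList(counts = raw_counts)  # 1. wrap the counts in a DGEList
y <- edgeR::calcNormFactors(y)            # 2. TMM normalization (edgeR's default method)
design_matrix <- model.matrix(~ 1 + Phase, data = meta)
v <- limma::voom(y, design_matrix)        # 3. log2-CPM values plus precision weights
```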

Usage

preprocess_rna_seq_data(
  raw_counts,
  meta,
  spline_params,
  design,
  normalize_func = NULL
)

Arguments

raw_counts

A matrix of raw RNA-seq counts (genes as rows, samples as columns).

meta

A data frame containing the sample metadata (one row per sample), including the variables referenced in `design`.

spline_params

Parameters for the spline functions (optional). Must contain the named element spline_type, set to either the string "n" for natural cubic splines or "b" for B-splines, and the named element dof, an integer giving the degrees of freedom, which is required for both spline types. For B-splines, the named element degree, an integer giving the polynomial degree, is also required.
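For illustration, the two accepted shapes of spline_params might look like the lists below (writing them as named lists is an assumption; the page only requires named elements). The helper `make_spline_basis()` is hypothetical: it only sketches how these elements could map onto `splines::ns()` and `splines::bs()`, and is not the package's code.

```r
library(splines)

# Natural cubic splines: only `dof` is needed
params_ns <- list(spline_type = "n", dof = 3L)
# B-splines: both `degree` and `dof` are needed
params_bs <- list(spline_type = "b", degree = 2L, dof = 4L)

# Hypothetical helper: validate the elements described above and build a basis
make_spline_basis <- function(x, spline_params) {
  type <- spline_params$spline_type
  if (is.null(type) || !type %in% c("n", "b")) stop("spline_type must be \"n\" or \"b\"")
  if (is.null(spline_params$dof)) stop("dof is required for both spline types")
  if (type == "n") {
    ns(x, df = spline_params$dof)
  } else {
    if (is.null(spline_params$degree)) stop("degree is required for B-splines")
    bs(x, df = spline_params$dof, degree = spline_params$degree)
  }
}

time <- seq(0, 10, length.out = 12)
basis_ns <- make_spline_basis(time, params_ns)  # 12 x 3 basis matrix
basis_bs <- make_spline_basis(time, params_bs)  # 12 x 4 basis matrix
```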

design

A design formula for the limma analysis, such as '~ 1 + Phase*X + Reactor'.
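Assuming the design is passed as a character string, as the quoted example suggests, base R can turn it into a model matrix. The metadata values below are made up; only the column names `Phase`, `X`, and `Reactor` come from the example formula.

```r
# Made-up metadata containing the variables named in the example formula
meta <- data.frame(
  Phase   = factor(rep(c("Exponential", "Stationary"), each = 4)),
  X       = rep(c(0, 1, 2, 3), times = 2),
  Reactor = factor(rep(c("R1", "R2"), times = 4))
)

design <- "~ 1 + Phase*X + Reactor"
# Phase*X expands to Phase + X + Phase:X
design_matrix <- model.matrix(as.formula(design), data = meta)
colnames(design_matrix)
```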

normalize_func

An optional normalization function. If provided, it is used to normalize the `DGEList` object; otherwise, TMM normalization (via `edgeR::calcNormFactors`) is applied by default. The function must accept as input the object y created by y <- edgeR::DGEList(counts = raw_counts) and return that object with the normalization applied.
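A custom normalization function only needs to take the `DGEList` and return it. As a minimal sketch, the hypothetical `upper_quartile_norm()` below swaps edgeR's upper-quartile method in for TMM; the simulated counts are illustrative only.

```r
# Hypothetical custom normalization: upper-quartile factors instead of TMM
upper_quartile_norm <- function(y) {
  edgeR::calcNormFactors(y, method = "upperquartile")
}

set.seed(2)
raw_counts <- matrix(
  rpois(600, lambda = rep(seq(10, 500, length.out = 100), times = 6)),
  nrow = 100
)
y <- edgeR::DGEList(counts = raw_counts)
y_norm <- upper_quartile_norm(y)
y_norm$samples$norm.factors  # one normalization factor per sample
```

Such a function would then be supplied as `normalize_func = upper_quartile_norm`.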

Value

A `voom`-transformed `EList` object (as returned by `limma::voom`), which includes the log2-counts per million (logCPM) matrix (`E`) and the observation-specific precision weights (`weights`).
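Because the result is a standard limma `EList`, its components can be accessed directly. To keep the sketch self-contained, it builds a comparable object with `limma::voom()` on simulated counts rather than calling `preprocess_rna_seq_data()`; the `group` factor is made up.

```r
set.seed(3)
counts <- matrix(
  rpois(400, lambda = rep(seq(20, 400, length.out = 50), times = 8)),
  nrow = 50
)
group <- factor(rep(c("A", "B"), each = 4))
v <- limma::voom(edgeR::DGEList(counts = counts), model.matrix(~ group))

logcpm  <- v$E        # genes x samples matrix of log2-CPM values
weights <- v$weights  # matching genes x samples matrix of precision weights
```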