The `preprocess_rna_seq_data()` function preprocesses raw RNA-seq counts in three steps: it creates a `DGEList` object, normalizes the counts (by default with TMM, Trimmed Mean of M-values, via `edgeR::calcNormFactors`), and applies the `voom` transformation from the `limma` package to obtain log-transformed counts per million (logCPM) with associated precision weights. To use a different normalization method, pass your own function as `normalize_func`.
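The three steps described above can be sketched directly with `edgeR` and `limma`. This is a minimal illustration and not the function's actual implementation: the simulated counts, the `meta` table, and the simple `~ 1 + Time` design are assumptions made for the example, whereas the real function derives its design from the `SplineOmics` object.

```r
library(edgeR)
library(limma)

set.seed(1)

# Simulated raw counts: 200 genes x 6 samples (illustrative data only)
raw_counts <- matrix(
  rnbinom(200 * 6, mu = 100, size = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6))
)

# Hypothetical metadata and a simple linear design (an assumption for this sketch)
meta <- data.frame(Time = rep(c(0, 1, 2), times = 2))
design_matrix <- model.matrix(~ 1 + Time, data = meta)

# Step 1: wrap the counts in a DGEList
y <- edgeR::DGEList(counts = raw_counts)

# Step 2: compute normalization factors (TMM is the default method)
y <- edgeR::calcNormFactors(y)

# Step 3: voom transformation to logCPM with precision weights
v <- limma::voom(y, design = design_matrix)

dim(v$E)        # logCPM matrix, genes x samples
dim(v$weights)  # precision weights, same shape as v$E
```

The `E` and `weights` components hold the logCPM values and the observation-specific weights, respectively.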

Usage

preprocess_rna_seq_data(splineomics, normalize_func = NULL)

Arguments

splineomics

An S3 object of class `SplineOmics` that must contain the following elements:

  • data: A matrix of the omics dataset, with feature names optionally as row headers (genes as rows, samples as columns).

  • meta: A data frame containing the metadata for the samples in data. It must include a 'Time' column and the column that holds the condition.

  • design: A character string representing the design formula for the limma analysis (e.g., '~ 1 + Phase*X + Reactor').

  • spline_params: A list of spline parameters used in the analysis. This can include:

    • spline_type: A character string specifying the type of spline. Must be either 'n' for natural cubic splines or 'b' for B-splines.

    • dof: An integer specifying the degrees of freedom. Required for both natural cubic splines and B-splines.

    • degree: An integer specifying the degree of the spline (for B-splines only).

    • knots: Positions of the internal knots (for B-splines).

    • bknots: Boundary knots (for B-splines).

  • dream_params: A named list or NULL. When not NULL, it can contain:

    • dof: An integer greater than 1, specifying the degrees of freedom for the dream topTable.

    • KenwardRoger: A boolean indicating whether to use the Kenward-Roger method.
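As a rough illustration of the shapes these elements take, the sketch below builds each one as a plain R object. All values are hypothetical, the column names `Phase` and `Reactor` are borrowed from the design example above, and how the pieces are assembled into a `SplineOmics` object is deliberately not shown here.

```r
set.seed(1)

# data: genes as rows, samples as columns (toy counts, illustrative only)
data <- matrix(
  rpois(5 * 6, lambda = 50),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("s", 1:6))
)

# meta: one row per sample, with the required 'Time' column
meta <- data.frame(
  Time    = rep(c(0, 12, 24), times = 2),
  Phase   = rep(c("Exponential", "Stationary"), each = 3),
  Reactor = rep(c("R1", "R2", "R3"), times = 2)
)

# design: the formula as a character string
design <- "~ 1 + Phase*X + Reactor"

# spline_params: natural cubic splines with 2 degrees of freedom (hypothetical values)
spline_params <- list(spline_type = "n", dof = 2L)

# dream_params: may also be NULL (hypothetical values)
dream_params <- list(dof = 3L, KenwardRoger = FALSE)

# Basic consistency check: one metadata row per sample column
stopifnot(ncol(data) == nrow(meta), "Time" %in% names(meta))
```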

normalize_func

An optional normalization function. If provided, it is used to normalize the `DGEList` object; otherwise TMM normalization (via `edgeR::calcNormFactors`) is applied. The function must accept the object `y` created by `y <- edgeR::DGEList(counts = raw_counts)` as its input and return the normalized `y`.
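A custom normalizer only has to map a `DGEList` to a `DGEList`. The sketch below uses upper-quartile normalization as one possible choice; the function name `upper_quartile_norm` and the simulated counts are made up for the example.

```r
library(edgeR)

# Hypothetical custom normalizer: takes a DGEList and returns it
# with upper-quartile normalization factors instead of TMM
upper_quartile_norm <- function(y) {
  edgeR::calcNormFactors(y, method = "upperquartile")
}

# Try it on simulated counts (illustrative data only)
set.seed(1)
raw_counts <- matrix(rnbinom(100 * 4, mu = 50, size = 5), nrow = 100)
y <- edgeR::DGEList(counts = raw_counts)
y_norm <- upper_quartile_norm(y)
y_norm$samples$norm.factors

# Passing it to the function (assumes an existing `splineomics` object):
# voom_obj <- preprocess_rna_seq_data(
#   splineomics,
#   normalize_func = upper_quartile_norm
# )
```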

Value

The object returned by `limma::voom` (an `EList`), which includes the log2-counts per million (logCPM) matrix and the observation-specific precision weights.