cluster_hits()

Performs clustering on hits from top tables generated by differential expression analysis. This function filters hits based on adjusted p-value thresholds, extracts spline coefficients for significant features, normalizes these coefficients, and applies hierarchical clustering. The results, including clustering assignments and normalized spline curves, are saved in a specified directory and compiled into an HTML report.
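The filter, normalize, and cluster steps described above can be sketched in base R. This is a minimal illustration on toy data, not the package's internal code: the coefficient column names (X1 to X3), the time grid, the natural-spline basis, and the min-max normalization of the evaluated curves are all assumptions made for the example.

```r
# Toy stand-in for a limma top table: one row per feature, an adjusted
# p-value, and three spline coefficients (column names are hypothetical).
set.seed(1)
top_table <- data.frame(
  feature   = paste0("f", 1:8),
  adj.P.Val = c(0.001, 0.20, 0.01, 0.03, 0.60, 0.004, 0.04, 0.90),
  X1 = rnorm(8), X2 = rnorm(8), X3 = rnorm(8)
)

# 1. Keep only the hits below the adjusted p-value threshold.
hits <- top_table[top_table$adj.P.Val < 0.05, ]

# 2. Turn each hit's coefficients into a curve: coefficients times basis.
time_grid <- seq(0, 60, length.out = 25)
basis  <- splines::ns(time_grid, df = 3)            # 25 x 3 basis matrix
coefs  <- as.matrix(hits[, c("X1", "X2", "X3")])    # n_hits x 3
curves <- coefs %*% t(basis)                        # n_hits x 25

# 3. Min-max normalize every curve to [0, 1] so that clustering compares
#    curve shapes rather than magnitudes (assumed normalization).
curves_norm <- t(apply(curves, 1, function(v) (v - min(v)) / (max(v) - min(v))))
rownames(curves_norm) <- hits$feature

# 4. Hierarchical clustering, cut into a fixed number of clusters.
hc <- hclust(dist(curves_norm), method = "complete")
hits$cluster <- cutree(hc, k = 2)
```

Here `hits` plays the role of a top table with cluster assignments, and `curves_norm` the role of the normalized spline curves.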

Usage

cluster_hits(
  splineomics,
  clusters,
  adj_pthresholds = c(0.05),
  adj_pthresh_avrg_diff_conditions = 0,
  adj_pthresh_interaction_condition_time = 0,
  genes = NULL,
  plot_info = list(y_axis_label = "Value", time_unit = "min", treatment_labels = NA,
    treatment_timepoints = NA),
  plot_options = list(cluster_heatmap_columns = FALSE, meta_replicate_column = NULL),
  raw_data = NULL,
  report_dir = here::here(),
  report = TRUE
)

Arguments

splineomics

An S3 object of class `SplineOmics` that contains all the necessary data and parameters for the analysis, including:

data: The original expression dataset used for differential expression analysis.
meta: A data frame containing metadata corresponding to the data. It must include a 'Time' column and any columns specified by condition.
design: A character vector of length 1 representing the limma design formula.
condition: A character vector of length 1 specifying the column name in meta used to define groups for analysis.
spline_params: A list of spline parameters for the analysis.
meta_batch_column: A character string specifying the column name in the metadata used for batch effect removal.
meta_batch2_column: A character string specifying the second column name in the metadata used for batch effect removal.
limma_splines_result: A list of data frames, each representing a top table from differential expression analysis, containing at least 'adj.P.Val' and expression data columns.

clusters

Character or integer vector specifying the number of clusters.

adj_pthresholds

Numeric vector of adjusted p-value thresholds for filtering hits in each top table.

adj_pthresh_avrg_diff_conditions

Adjusted p-value threshold for the limma result on the average difference between conditions. Defaults to 0 (turned off).

adj_pthresh_interaction_condition_time

Adjusted p-value threshold for the limma result on the interaction of condition and time. Defaults to 0 (turned off).

genes

A character vector containing the gene names of the features to be analyzed.

plot_info

List containing the elements y_axis_label (string), time_unit (string), treatment_labels (character vector), and treatment_timepoints (integer vector). Any of them can also be NA. The list is used to annotate the spline plots: time_unit labels the x-axis, while treatment_labels and treatment_timepoints are used to draw vertical dashed lines marking the positions of the treatments (such as feeding or a temperature shift).
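A plot_info list with the four documented elements might be built as follows. The label text and timepoints are made-up values for illustration, and pairing each label with the timepoint at the same position is an assumption of this sketch.

```r
# Illustrative plot_info list; all values below are invented for the example.
plot_info <- list(
  y_axis_label         = "log2 intensity",
  time_unit            = "min",
  treatment_labels     = c("feeding", "temperature shift"),
  treatment_timepoints = c(0L, 60L)   # assumed: one timepoint per label
)

# Sanity check before plotting: labels and timepoints should pair up.
stopifnot(
  length(plot_info$treatment_labels) == length(plot_info$treatment_timepoints)
)
```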

plot_options

List of fields that customize plotting behavior, such as cluster_heatmap_columns (Boolean). The default list also sets meta_replicate_column = NULL.

raw_data

Optional. Data matrix with the raw (unimputed) data, still containing NA values. When provided, the datapoints that were originally NA and have been imputed are highlighted in the spline plots.

report_dir

Character string specifying the directory path where the HTML report and any other output files should be saved.

report

Boolean (TRUE or FALSE) specifying whether a report should be generated.

Value

A list where each element corresponds to a group factor and contains the clustering results: the `clustered_hits` data frame, the hierarchical clustering object `hc`, the `curve_values` data frame with normalized spline curves, and the `top_table` with cluster assignments.
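A call and a walk over the returned list could look like the sketch below. It assumes that the SplineOmics package is installed and that a prepared object named `splineomics` already exists in the session; the cluster count and the use of `tempdir()` are illustrative choices, and the block does nothing if either prerequisite is missing.

```r
# Argument names follow the documented signature; the values are examples.
args <- list(
  clusters        = c(3L),      # assumed cluster count, adjust to your data
  adj_pthresholds = c(0.05),
  report_dir      = tempdir(),  # illustrative output location
  report          = FALSE
)

# Only run when the package and a prepared `splineomics` object are available.
if (requireNamespace("SplineOmics", quietly = TRUE) && exists("splineomics")) {
  results <- do.call(
    SplineOmics::cluster_hits,
    c(list(splineomics = splineomics), args)
  )

  # Each element holds the documented pieces for one group factor.
  for (res in results) {
    print(head(res$clustered_hits))  # features with their cluster labels
    print(head(res$curve_values))    # normalized spline curves
    plot(res$hc)                     # dendrogram, assuming an hclust object
  }
}
```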
