Performs clustering on hits from top tables generated by differential expression analysis. This function filters hits based on adjusted p-value thresholds, extracts spline coefficients for significant features, normalizes these coefficients, and applies hierarchical clustering. The results, including clustering assignments and normalized spline curves, are saved in a specified directory and compiled into an HTML report.
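The steps described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy top table, the matrix of fitted curves, the min-max normalization, the Euclidean distance, and the complete linkage are all assumptions chosen for illustration.

```r
# Toy stand-ins (hypothetical): a top table with adjusted p-values and a
# matrix of fitted curves (one row per feature, one column per time point).
set.seed(1)
n_features <- 20
top_table <- data.frame(
  feature = paste0("f", seq_len(n_features)),
  adj.P.Val = c(rep(0.01, 8), rep(0.5, 12))
)
curves <- matrix(
  rnorm(n_features * 10),
  nrow = n_features,
  dimnames = list(top_table$feature, NULL)
)

# 1. Keep only the hits below the adjusted p-value threshold.
adj_pthreshold <- 0.05
hits <- top_table$feature[top_table$adj.P.Val < adj_pthreshold]

# 2. Normalize each hit's curve to [0, 1] (min-max scaling, an assumed choice).
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
norm_curves <- t(apply(curves[hits, , drop = FALSE], 1, normalize))

# 3. Hierarchical clustering, cut into a fixed number of clusters.
k <- 3
tree <- hclust(dist(norm_curves), method = "complete")
cluster_assignment <- cutree(tree, k = k)
```

Here `cluster_assignment` is a named integer vector that maps each retained feature to a cluster, which corresponds in spirit to the clustering assignments the function saves.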
Usage
cluster_hits(
  splineomics,
  clusters,
  adj_pthresholds = c(0.05),
  adj_pthresh_avrg_diff_conditions = 0,
  adj_pthresh_interaction_condition_time = 0,
  genes = NULL,
  plot_info = list(y_axis_label = "Value", time_unit = "min", treatment_labels = NA,
    treatment_timepoints = NA),
  plot_options = list(cluster_heatmap_columns = FALSE, meta_replicate_column = NULL),
  raw_data = NULL,
  report_dir = here::here(),
  report = TRUE
)
Arguments
- splineomics
An S3 object of class `SplineOmics` that contains all the necessary data and parameters for the analysis, including:
data
: The original expression dataset used for differential expression analysis.
meta
: A dataframe containing metadata corresponding to the data; must include a 'Time' column and any columns specified by conditions.
design
: A character of length 1 representing the limma design formula.
condition
: Character of length 1 specifying the column name in meta used to define groups for analysis.
spline_params
: A list of spline parameters for the analysis.
meta_batch_column
: A character string specifying the column name in the metadata used for batch effect removal.
meta_batch2_column
: A character string specifying the second column name in the metadata used for batch effect removal.
limma_splines_result
: A list of data frames, each representing a top table from differential expression analysis, containing at least 'adj.P.Val' and expression data columns.
- clusters
Character or integer vector specifying the number of clusters.
- adj_pthresholds
Numeric vector of adjusted p-value thresholds for filtering hits in each top table.
- adj_pthresh_avrg_diff_conditions
Adjusted p-value threshold for the limma result on the average difference between conditions. Defaults to 0 (turned off).
- adj_pthresh_interaction_condition_time
Adjusted p-value threshold for the limma result on the interaction of condition and time. Defaults to 0 (turned off).
- genes
A character vector containing the gene names of the features to be analyzed.
- plot_info
List containing the elements y_axis_label (string), time_unit (string), treatment_labels (character vector), and treatment_timepoints (integer vector). Any of them can also be NA. The list supplies annotations for the spline plots: time_unit labels the x-axis, and treatment_labels together with treatment_timepoints define vertical dashed lines that mark the positions of treatments (such as feeding or a temperature shift).
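As a small illustration, a plot_info list could be assembled as follows. Only the element names come from the description above; the labels, time points, and the length check are hypothetical and assume that each treatment label is paired with exactly one time point.

```r
# Hypothetical values; only the element names are taken from the argument
# description.
plot_info <- list(
  y_axis_label = "log2 intensity",
  time_unit = "min",
  treatment_labels = c("feeding", "temperature shift"),
  treatment_timepoints = c(0L, 60L)
)

# Assumed sanity check: one time point per treatment label, so that every
# dashed line has a position.
labels_match <- length(plot_info$treatment_labels) ==
  length(plot_info$treatment_timepoints)
```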
- plot_options
List of fields that customize plotting behavior, for example cluster_heatmap_columns (Boolean). The default shown in Usage also sets meta_replicate_column = NULL.
- raw_data
Optional. Data matrix with the raw (unimputed) data, still containing NA values. When provided, the datapoints that were originally NA and were later imputed are highlighted in the spline plots.
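To make the idea concrete, the positions of originally missing values can be read directly from such a matrix. This is a minimal sketch with a hypothetical toy matrix; how the function itself tracks imputed points is not stated here.

```r
# Hypothetical raw matrix: rows are features, columns are samples/time points.
raw_data <- matrix(
  c(1.2, NA, 3.1,
    2.0, 2.2, NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("f1", "f2"), c("t0", "t1", "t2"))
)

# TRUE wherever a value was missing before imputation.
imputed_mask <- is.na(raw_data)

# Row/column indices of the datapoints that would be highlighted.
imputed_positions <- which(imputed_mask, arr.ind = TRUE)
```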
- report_dir
Character string specifying the directory path where the HTML report and any other output files should be saved.
- report
Boolean (TRUE or FALSE) specifying whether a report should be generated.
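A call might look like the sketch below. The argument names follow the Usage section; the values are hypothetical, as is the assumption that clusters and adj_pthresholds each take one entry per top table. The sketch also assumes that the function is exported by a package named SplineOmics and that a prepared `splineomics` object already exists, so the call is guarded and skipped otherwise.

```r
# Hypothetical argument values; names are taken from the Usage section.
args <- list(
  clusters = c(6L, 3L),             # assumed: one entry per top table
  adj_pthresholds = c(0.05, 0.05),  # assumed: one threshold per top table
  report_dir = file.path(tempdir(), "clustering_reports"),
  report = TRUE
)

# Run only if the package is installed and a `splineomics` object is available.
if (requireNamespace("SplineOmics", quietly = TRUE) && exists("splineomics")) {
  clustering_results <- do.call(
    SplineOmics::cluster_hits,
    c(list(splineomics = splineomics), args)
  )
}
```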