This function generates an overrepresentation analysis (ORA) report based on clustered hit levels, gene data, and the specified databases, using the R package clusterProfiler. It returns a list of the plot objects it generated and writes an HTML report that embeds files containing the enrichment results, together with dotplots visualizing the enrichment.

Usage

run_ora(
  levels_clustered_hits,
  databases,
  report_info,
  cluster_hits_report_name,
  clusterProfiler_params = NA,
  mapping_cfg = list(method = "none", from_species = NULL, to_species = NULL),
  enrichGO_cfg = NULL,
  plot_titles = NA,
  universe = NULL,
  report_dir = here::here()
)

Arguments

levels_clustered_hits

A list of dataframes containing the clustered hits of the different levels. If clustering_results holds the output of the SplineOmics::cluster_hits() function, the easiest way to get this list is clustering_results$clustered_hits_levels. Every element of the list is a dataframe with the three columns feature, cluster, and gene. feature contains the index number of the feature (for example, a protein), cluster is an integer specifying the cluster the feature was placed in, and gene contains the gene name. The gene names must match the gene names used in the databases supplied for this enrichment.
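
The structure described above can be sketched with a small hand-built list. The level names (`exponential`, `stationary`), feature indices, and gene symbols are made up for illustration, and whether the list needs to be named is an assumption. In practice this object would come from clustering_results$clustered_hits_levels.

```r
# Hypothetical clustered hits for two levels (all values are illustrative).
levels_clustered_hits <- list(
  exponential = data.frame(
    feature = c(1L, 5L, 9L),          # index number of each feature
    cluster = c(1L, 1L, 2L),          # cluster assignment
    gene = c("TP53", "MYC", "EGFR"),  # gene name, must match the databases
    stringsAsFactors = FALSE
  ),
  stationary = data.frame(
    feature = c(2L, 5L),
    cluster = c(1L, 2L),
    gene = c("AKT1", "MYC"),
    stringsAsFactors = FALSE
  )
)

# Check that every level has the three expected columns.
required_cols <- c("feature", "cluster", "gene")
has_required_cols <- vapply(
  levels_clustered_hits,
  function(df) all(required_cols %in% colnames(df)),
  logical(1)
)
```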

databases

A dataframe with three columns: DB, containing the database name; Geneset, containing the name of the gene set; and Gene, containing the name of the gene. This dataframe can be obtained by specifying the desired Enrichr databases, downloading them to a file (for example a .tsv file) with the SplineOmics::download_enrichr_databases function, and then loading that file as a dataframe. In essence, this dataframe contains all the database information that clusterProfiler uses for the gene set enrichment analysis in this function.
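
As a rough illustration of the expected layout, the sketch below writes a tiny table with the three columns to a temporary .tsv file and reads it back. The database names, gene set names, and genes are placeholders, and the exact file layout produced by SplineOmics::download_enrichr_databases is assumed rather than confirmed here.

```r
# Hypothetical database table with the three required columns.
databases <- data.frame(
  DB = c("GO_Biological_Process_2023", "GO_Biological_Process_2023",
         "KEGG_2021_Human", "KEGG_2021_Human"),
  Geneset = c("example gene set A", "example gene set A",
              "example pathway B", "example pathway B"),
  Gene = c("TP53", "MYC", "EGFR", "AKT1"),
  stringsAsFactors = FALSE
)

# Round trip through a tab-separated file, as one would after a download.
tsv_path <- tempfile(fileext = ".tsv")
write.table(databases, tsv_path, sep = "\t", quote = FALSE, row.names = FALSE)
databases_loaded <- read.delim(tsv_path, stringsAsFactors = FALSE)
```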

report_info

A list containing information for the report generation, such as omics_data_type and data_description (this is the list used for all report generating functions of this package).

cluster_hits_report_name

Single character string specifying the name of the cluster_hits() function report that contains the results used for this overrepresentation analysis. It must be specified, because otherwise the connection between the two reports is not documented.

clusterProfiler_params

A list specifying the parameters for clusterProfiler, for example: clusterProfiler_params <- list(pvalueCutoff = 0.05, pAdjustMethod = "BH", minGSSize = 10, maxGSSize = 500, qvalueCutoff = 0.2). These are all the parameters that can be controlled here. The names are identical to the argument names of clusterProfiler, so see the clusterProfiler documentation for their descriptions. When this argument is not specified, it defaults to NA (see Usage), in which case default values are used for these parameters; they are the same values as in the example definition above.
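
One way to change a single setting while keeping the remaining defaults is sketched below. It always builds the complete five-element list, since it is not stated here whether run_ora() accepts a partial list. The stricter cutoff of 0.01 is an arbitrary illustration.

```r
# Default values as listed in the description above.
default_params <- list(
  pvalueCutoff = 0.05,
  pAdjustMethod = "BH",
  minGSSize = 10,
  maxGSSize = 500,
  qvalueCutoff = 0.2
)

# Override one entry; modifyList() keeps all other entries unchanged.
clusterProfiler_params <- modifyList(
  default_params,
  list(pvalueCutoff = 0.01)  # illustrative stricter cutoff
)
```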

mapping_cfg

A named list that controls the optional automatic mapping of gene symbols across species. This is useful when your input gene symbols (e.g., from CHO cells) do not match the species used by the enrichment databases (e.g., human or mouse). By default, no mapping is performed and gene symbols are used as-is. If mapping is desired, this list must contain the following three elements:

method

Mapping method to use. One of `"none"` (default; no mapping), `"gprofiler"` (online, via the g:Profiler API), or `"orthogene"` (offline, if installed).

from_species

Source species code (e.g., `"cgriseus"` for CHO). Must match the expected format for the selected tool.

to_species

Target species code (e.g., `"hsapiens"` for human). This must be the species used in your ORA database.
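
Put together, a mapping configuration for the CHO-to-human case mentioned above might look like the following sketch. It uses the example species codes from the descriptions. The small check at the end is an illustration only and is not part of the package.

```r
# Map CHO gene symbols to human symbols via g:Profiler (requires internet).
mapping_cfg <- list(
  method = "gprofiler",
  from_species = "cgriseus",
  to_species = "hsapiens"
)

# Illustrative sanity check: all three elements present, method is allowed.
allowed_methods <- c("none", "gprofiler", "orthogene")
required_elements <- c("method", "from_species", "to_species")
mapping_cfg_ok <- all(required_elements %in% names(mapping_cfg)) &&
  mapping_cfg$method %in% allowed_methods
```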

enrichGO_cfg

A named list specifying the configuration for running GO enrichment with Bioconductor's enrichGO. This is only needed when you want to perform GO Biological Process (BP), Molecular Function (MF), or Cellular Component (CC) enrichment using Bioconductor's organism databases (e.g., org.Mm.eg.db for mouse).

The list must be named according to the GO ontology, e.g., "GO_BP", "GO_MF", "GO_CC". Each entry must provide:

  • OrgDb: The organism database, e.g., org.Mm.eg.db.

  • keyType: The gene identifier type, e.g., "SYMBOL".

  • ontology: One of "BP", "MF", or "CC".

If enrichGO_cfg is NULL (default), no Bioconductor-based GO enrichment is performed. All enrichment runs through enricher with the provided TERM2GENE mappings.
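
A configuration covering all three ontologies for mouse could be sketched as follows. It assumes that OrgDb takes the database object itself (rather than its name as a string), which is not stated explicitly above. The block falls back to NULL when the org.Mm.eg.db package is not installed.

```r
# Build the config only if the mouse annotation package is available.
if (requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  org_db <- org.Mm.eg.db::org.Mm.eg.db  # assumption: pass the object itself
  enrichGO_cfg <- list(
    GO_BP = list(OrgDb = org_db, keyType = "SYMBOL", ontology = "BP"),
    GO_MF = list(OrgDb = org_db, keyType = "SYMBOL", ontology = "MF"),
    GO_CC = list(OrgDb = org_db, keyType = "SYMBOL", ontology = "CC")
  )
} else {
  enrichGO_cfg <- NULL  # default: no Bioconductor-based GO enrichment
}
```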

plot_titles

Titles for the enrichment dotplots generated in the HTML report, default is NA.

universe

Background gene set for the enrichment, default is NULL. This is a parameter of clusterProfiler; see the documentation of the clusterProfiler R package for details.
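
A background is often built from all genes that were measured in the experiment. The sketch below assumes the universe is supplied as a character vector of gene names, as in clusterProfiler, and the input vector `measured_genes` is hypothetical.

```r
# Hypothetical gene symbols of all measured features, including gaps.
measured_genes <- c("TP53", "MYC", "EGFR", "AKT1", "", NA, "MYC")

# Drop missing and empty entries, then remove duplicates.
universe <- unique(
  measured_genes[!is.na(measured_genes) & nzchar(measured_genes)]
)
```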

report_dir

Directory where the report will be saved, default is `here::here()`.

Value

A list of all plot objects generated for the ORA report.
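
A call using the arguments documented above might look like the sketch below. It is wrapped in a helper function (a hypothetical name, `run_ora_example`) so that nothing is executed until real inputs exist. The report name "report_clustered_hits" and the use of tempdir() as output directory are placeholders.

```r
# Sketch only: defines a wrapper, the ORA itself runs when it is called.
run_ora_example <- function(levels_clustered_hits, databases, report_info) {
  plots <- SplineOmics::run_ora(
    levels_clustered_hits = levels_clustered_hits,
    databases = databases,
    report_info = report_info,
    cluster_hits_report_name = "report_clustered_hits",  # placeholder name
    report_dir = tempdir()                               # placeholder location
  )
  plots  # list of plot objects generated for the report
}
```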