About this tutorial
This tutorial demonstrates the excursion-analysis
capabilities of the SplineOmics
package by walking through
a complete, real-world example from start to finish.
In the context of time series data, an excursion refers to a transient deviation from a baseline — a concept borrowed from signal processing. These short-term, local shifts can reveal biologically meaningful events that are not captured by global trends.
Details of how excursions are detected are described in the section on the PVC test, a method specifically designed to identify such local patterns.
While the core strength of SplineOmics
lies in modeling
global temporal patterns using smoothing splines, many
biologically important phenomena present as localized
events. These may not be well captured by standard spline fits
due to their smooth and global nature.
For example, in phosphoproteomics data, you might observe a sharp, temporary phosphorylation spike at a specific timepoint following a stimulus — a pattern that returns to baseline shortly afterward. Such a transient peak can indicate a key signaling activation event that is functionally critical, but would be smoothed out or diluted in traditional spline-based analyses.
To handle these scenarios, we developed the PVC test (described below), which is specifically designed to detect such sharp, local excursions in time-resolved omics data.
Note 1
The documentation of all the SplineOmics package functions can be viewed here
The PVC Test
PVC stands for Peak, Valley, and Cliff. The PVC test is a method to identify these distinct local patterns (called excursions) in time series data using a compound contrast approach within the limma framework.
In this context:
- A peak occurs when a timepoint is significantly higher than both its immediate neighbors.
- A valley occurs when a timepoint is significantly lower than both its neighbors.
- A cliff describes a sharp directional shift: when one neighbor is similar in value, but the other is significantly different (higher or lower).
Method
The PVC analysis is implemented as a rolling, compound
contrast applied across the time series. Specifically, a window
of three consecutive timepoints is evaluated at a time:(Tᵢ₋₁, Tᵢ, Tᵢ₊₁)
where Tᵢ
is the center of the window.
The first and last timepoints are excluded from this analysis, since they lack both left and right neighbors.
For each triplet, a contrast is defined as:
2*Tᵢ - Tᵢ₋₁ - Tᵢ₊₁
This contrast measures whether the center point Tᵢ
deviates from both its neighbors in the same direction:
- If
Tᵢ
is greater than both neighbors → large positive contrast → indicates a peak - If
Tᵢ
is lower than both neighbors → large negative contrast → indicates a valley - If
Tᵢ
lies between its neighbors → effects cancel → contrast is near zero → no signal
This contrast is tested using limma
, and the result is a
p-value that indicates whether the observed pattern is statistically
significant for each feature (e.g., gene, phosphosite).
Multiple Testing Correction
Because the test is performed independently at each center timepoint across all features, multiple testing correction is applied at the timepoint level. This ensures that significance is not inflated due to repeated testing across features.
Importantly, PVC does not assess whether an entire feature contains a full peak/valley/cliff pattern over time — it evaluates each timepoint independently for such local patterns.
Details about the dataset
The example dataset used here involves a time-series proteomics experiment, where CHO (chinese hamster ovary) cells were cultivated in three bioreactors (three biological replicates). The experiment includes the following setup:
- Samples were taken both during the exponential and stationary growth phases.
- Samples were collected in triplicates from each reactor at defined
timepoints relative to cell feeding:
- 60 minutes before feeding
- 15, 60, 90, 120, and 240 minutes after feeding
Load the packages
library(SplineOmics)
library(here) # For managing filepaths
#> here() starts at /home/thomas/Documents/PhD/projects/DGTX/SplineOmics_hub/SplineOmics
library(dplyr) # For data manipulation
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
Load the files
In this example, the proteomics_data.rds file contains the numeric values (the intensities) and also the feature descriptions, such as gene and protein name (= annotation part). Usually, you would load the data from for example an Excel file, but the .rds file is more compressed, which is the reason this format was chosen here to limit the size of the SplineOmics package.
The file meta.xlsx contains the meta information, which are the descriptions of the columns of the numeric values of data.
(These example files are part of the package and don’t have to be present on your system).
Please note that this dataset is an actual experimental dataset, but the annotation information, such as gene names, has been removed since it was not yet published at the time of making the SplineOmics package public. Instead, the dataset includes randomly generated gene symbols and gene names corresponding to Cricetulus griseus (Chinese Hamster) for each row. This is intended to demonstrate the functionality of the package.
The left part of data contains the numeric values, and the right part the annotation info, which can be copied in a separate dataframe, as shown below.
data <- readRDS(xzfile(system.file(
"extdata",
"proteomics_data.rds.xz",
package = "SplineOmics"
)))
meta <- read.csv(
system.file(
"extdata",
"proteomics_meta.csv",
package = "SplineOmics"
),
stringsAsFactors = FALSE
)
# Extract the annotation part from the dataframe.
first_na_col <- which(is.na(data[1, ]))[1]
annotation <- data |>
dplyr::select((first_na_col + 1):ncol(data)) |>
dplyr::slice(-c(1:3))
print(head(data))
#> # A tibble: 6 × 40
#> `Sample ID` `1` `2` `3` `4` `5` `6` `7` `8` `9` `10` `11`
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Reactor E09 E10 E12 E09 E10 E12 E09 E10 E12 E09 E10
#> 2 Time Point TP01 TP01 TP01 TP02 TP02 TP02 TP03 TP03 TP03 TP04 TP04
#> 3 Phase of Fe… Expo… Expo… Expo… Expo… Expo… Expo… Expo… Expo… Expo… Expo… Expo…
#> 4 NA 15.2… 15.2… 15.3… 15.1… 15.2… 15.0… 15.2… 15.2… 15.2… 15.1… 15.2…
#> 5 NA 15.0… 15.1… 15.2… 15.1… 15.1… 15.2… 15.2… 15.3… 15.2… 15.1… 15.1…
#> 6 NA 14.5… 14.7… 14.6… 14.5… 14.6… 14.6… 14.5… 14.6… 14.7… 14.5… 14.6…
#> # ℹ 28 more variables: `12` <chr>, `13` <chr>, `14` <chr>, `15` <chr>,
#> # `16` <chr>, `17` <chr>, `18` <chr>, `19` <chr>, `20` <chr>, `21` <chr>,
#> # `22` <chr>, `23` <chr>, `24` <chr>, `25` <chr>, `26` <chr>, `27` <chr>,
#> # `28` <chr>, `29` <chr>, `30` <chr>, `31` <chr>, `32` <chr>, `33` <chr>,
#> # `34` <chr>, `35` <chr>, `36` <chr>, ...38 <lgl>, Gene_symbol <chr>,
#> # Gene_name <chr>
print(meta)
#> Sample.ID Reactor Time.Point Phase Time
#> 1 E09_TP01_Exponential E09 TP01 Exponential -60
#> 2 E10_TP01_Exponential E10 TP01 Exponential -60
#> 3 E12_TP01_Exponential E12 TP01 Exponential -60
#> 4 E09_TP02_Exponential E09 TP02 Exponential 15
#> 5 E10_TP02_Exponential E10 TP02 Exponential 15
#> 6 E12_TP02_Exponential E12 TP02 Exponential 15
#> 7 E09_TP03_Exponential E09 TP03 Exponential 60
#> 8 E10_TP03_Exponential E10 TP03 Exponential 60
#> 9 E12_TP03_Exponential E12 TP03 Exponential 60
#> 10 E09_TP04_Exponential E09 TP04 Exponential 90
#> 11 E10_TP04_Exponential E10 TP04 Exponential 90
#> 12 E12_TP04_Exponential E12 TP04 Exponential 90
#> 13 E09_TP05_Exponential E09 TP05 Exponential 120
#> 14 E10_TP05_Exponential E10 TP05 Exponential 120
#> 15 E12_TP05_Exponential E12 TP05 Exponential 120
#> 16 E09_TP06_Exponential E09 TP06 Exponential 240
#> 17 E10_TP06_Exponential E10 TP06 Exponential 240
#> 18 E12_TP06_Exponential E12 TP06 Exponential 240
#> 19 E09_TP07_Stationary E09 TP07 Stationary -60
#> 20 E10_TP07_Stationary E10 TP07 Stationary -60
#> 21 E12_TP07_Stationary E12 TP07 Stationary -60
#> 22 E09_TP08_Stationary E09 TP08 Stationary 15
#> 23 E10_TP08_Stationary E10 TP08 Stationary 15
#> 24 E12_TP08_Stationary E12 TP08 Stationary 15
#> 25 E09_TP09_Stationary E09 TP09 Stationary 60
#> 26 E10_TP09_Stationary E10 TP09 Stationary 60
#> 27 E12_TP09_Stationary E12 TP09 Stationary 60
#> 28 E09_TP10_Stationary E09 TP10 Stationary 90
#> 29 E10_TP10_Stationary E10 TP10 Stationary 90
#> 30 E12_TP10_Stationary E12 TP10 Stationary 90
#> 31 E09_TP11_Stationary E09 TP11 Stationary 120
#> 32 E10_TP11_Stationary E10 TP11 Stationary 120
#> 33 E12_TP11_Stationary E12 TP11 Stationary 120
#> 34 E09_TP12_Stationary E09 TP12 Stationary 240
#> 35 E10_TP12_Stationary E10 TP12 Stationary 240
#> 36 E12_TP12_Stationary E12 TP12 Stationary 240
print(annotation)
#> # A tibble: 4,162 × 2
#> Gene_symbol Gene_name
#> <chr> <chr>
#> 1 LOC113838844 cone-rod homeobox protein-like
#> 2 Wdr83os WD repeat domain 83 opposite strand
#> 3 Cubn cubilin
#> 4 Dynlt1 dynein light chain Tctex-type 1
#> 5 Ostc oligosaccharyltransferase complex non-catalytic subunit
#> 6 Unc5cl unc-5 family C-terminal like
#> 7 Cfl1 cofilin 1
#> 8 LOC100752202 HEN methyltransferase 1
#> 9 LOC100755162 acyl-coenzyme A synthetase ACSM5, mitochondrial
#> 10 LOC100768921 40S ribosomal protein S21
#> # ℹ 4,152 more rows
Bring the Inputs into the Standardized Format
data <- SplineOmics::extract_data(
# The dataframe with the numbers on the left and info on the right.
data = data,
# Use this annotation column for the feature names.
feature_name_columns = c("Gene_name"),
top_row = 4,
bottom_row = 4165,
right_col = 37,
left_col = 2
)
SplineOmics Object
# Those fields are mandatory, because we believe that when such a report is
# opened after half a year, those infos can be very helpful.
report_info <- list(
omics_data_type = "PTX",
data_description = "Proteomics data of CHO cells",
data_collection_date = "February 2024",
analyst_name = "Thomas Rauter",
contact_info = "thomas.rauter@plus.ac.at",
project_name = "DGTX"
)
# splineomics now contains the SplineOmics object.
splineomics <- SplineOmics::create_splineomics(
data = data,
meta = meta,
annotation = annotation,
report_info = report_info,
condition = "Phase", # Column of meta that contains the levels.
meta_batch_column = "Reactor", # For batch effect removal
)
Run the PVC-analysis
plot_info = list(
y_axis_label = "log2 intensity",
time_unit = "min",
treatment_labels = list("feeding"),
treatment_timepoints = list(0)
)
#> Column 'Reactor' of meta will be used to remove the batch effect for the plotting
#>
#> Detected 0 total pattern hits for condition level: Exponential
#>
#> Summary by pattern type:
#> p: 0, v: 0, b: 0, t: 0
#>
#> Breakdown by timepoint:
#> -60: p=0; v=0; b=0; t=0
#> 15: p=0; v=0; b=0; t=0
#> 60: p=0; v=0; b=0; t=0
#> 90: p=0; v=0; b=0; t=0
#> 120: p=0; v=0; b=0; t=0
#> 240: p=0; v=0; b=0; t=0
#>
#> Detected 0 total pattern hits for condition level: Stationary
#>
#> Summary by pattern type:
#> p: 0, v: 0, b: 0, t: 0
#>
#> Breakdown by timepoint:
#> -60: p=0; v=0; b=0; t=0
#> 15: p=0; v=0; b=0; t=0
#> 60: p=0; v=0; b=0; t=0
#> 90: p=0; v=0; b=0; t=0
#> 120: p=0; v=0; b=0; t=0
#> 240: p=0; v=0; b=0; t=0
#> Info PVC report generation completed successfully.
#> Your HTML reports are located in the directory: /home/thomas/Documents/PhD/projects/DGTX/SplineOmics_hub/SplineOmics/results/pvc_report .
#> Please note that due to embedded files, the reports might be flagged as
#> harmful by other software. Rest assured that they provide no harm.
The terminal output of the function gives information about the total amount of excursions found in each level, and also a breakdown by timepoint and type of excursion (p = peak, v = valley, b = bottom of cliff, t = top of cliff, always for the respective timepoint that is mentioned in that row). Here you can see the resulting HTML report.
Session Info
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/local/R-4.3.3/lib/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=de_AT.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=de_AT.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_AT.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Vienna
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices datasets utils methods base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 here_1.0.1 SplineOmics_0.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rdpack_2.6.2 bitops_1.0-9 gridExtra_2.3
#> [4] rlang_1.1.5 magrittr_2.0.3 clue_0.3-66
#> [7] GetoptLong_1.0.5 matrixStats_1.5.0 compiler_4.3.3
#> [10] reshape2_1.4.4 png_0.1-8 systemfonts_1.2.1
#> [13] vctrs_0.6.5 stringr_1.5.1 pkgconfig_2.0.3
#> [16] shape_1.4.6.1 crayon_1.5.3 fastmap_1.2.0
#> [19] backports_1.5.0 utf8_1.2.4 caTools_1.18.3
#> [22] rmarkdown_2.29 nloptr_2.1.1 ragg_1.3.3
#> [25] purrr_1.0.4 xfun_0.50 cachem_1.1.0
#> [28] jsonlite_1.8.9 progress_1.2.3 EnvStats_3.0.0
#> [31] remaCor_0.0.18 BiocParallel_1.36.0 broom_1.0.7
#> [34] parallel_4.3.3 prettyunits_1.2.0 cluster_2.1.6
#> [37] R6_2.6.1 bslib_0.9.0 stringi_1.8.4
#> [40] RColorBrewer_1.1-3 limma_3.58.1 boot_1.3-29
#> [43] numDeriv_2016.8-1.1 jquerylib_0.1.4 Rcpp_1.0.14
#> [46] iterators_1.0.14 knitr_1.49 base64enc_0.1-3
#> [49] IRanges_2.36.0 Matrix_1.6-5 splines_4.3.3
#> [52] tidyselect_1.2.1 rstudioapi_0.17.1 yaml_2.3.10
#> [55] viridis_0.6.5 doParallel_1.0.17 gplots_3.2.0
#> [58] codetools_0.2-19 plyr_1.8.9 lmerTest_3.1-3
#> [61] lattice_0.22-5 tibble_3.2.1 withr_3.0.2
#> [64] Biobase_2.62.0 evaluate_1.0.3 desc_1.4.3
#> [67] zip_2.3.2 circlize_0.4.16 pillar_1.10.1
#> [70] BiocManager_1.30.25 KernSmooth_2.23-22 renv_1.1.1
#> [73] foreach_1.5.2 stats4_4.3.3 reformulas_0.4.0
#> [76] generics_0.1.3 rprojroot_2.0.4 S4Vectors_0.40.2
#> [79] hms_1.1.3 ggplot2_3.5.1 munsell_0.5.1
#> [82] scales_1.3.0 aod_1.3.3 minqa_1.2.8
#> [85] gtools_3.9.5 RhpcBLASctl_0.23-42 glue_1.8.0
#> [88] pheatmap_1.0.12 tools_4.3.3 fANCOVA_0.6-1
#> [91] dendextend_1.19.0 variancePartition_1.32.5 lme4_1.1-36
#> [94] openxlsx_4.2.8 mvtnorm_1.3-3 fs_1.6.5
#> [97] grid_4.3.3 tidyr_1.3.1 rbibutils_2.3
#> [100] colorspace_2.1-1 nlme_3.1-164 patchwork_1.3.0
#> [103] cli_3.6.4 textshaping_1.0.0 viridisLite_0.4.2
#> [106] svglite_2.1.3 ComplexHeatmap_2.18.0 corpcor_1.6.10
#> [109] gtable_0.3.6 sass_0.4.9 digest_0.6.37
#> [112] BiocGenerics_0.48.1 pbkrtest_0.5.3 ggrepel_0.9.6
#> [115] rjson_0.2.23 htmlwidgets_1.6.4 farver_2.1.2
#> [118] htmltools_0.5.8.1 pkgdown_2.1.1 lifecycle_1.0.4
#> [121] GlobalOptions_0.1.2 statmod_1.5.0 MASS_7.3-60.0.1