About this tutorial
This tutorial demonstrates the excursion-analysis
capabilities of the SplineOmics
package by walking through
a complete, real-world phosphoproteomics example from start to
finish.
In the context of time series data, an excursion refers to a transient deviation from a baseline — a concept borrowed from signal processing. These short-term, local shifts can reveal biologically meaningful events that are not captured by global trends.
Details of how excursions are detected are described in the section on the PVC test, a method specifically designed to identify such local patterns.
While the core strength of SplineOmics
lies in modeling
global temporal patterns using smoothing splines, many
biologically important phenomena present as localized
events. These may not be well captured by standard spline fits
due to their smooth and global nature.
For example, in phosphoproteomics data, you might observe a sharp, temporary phosphorylation spike at a specific timepoint following a stimulus — a pattern that returns to baseline shortly afterward. Such a transient peak can indicate a key signaling activation event that is functionally critical, but would be smoothed out or diluted in traditional spline-based analyses.
To handle these scenarios, we developed the PVC test (described below), which is specifically designed to detect such sharp, local excursions in time-resolved omics data.
Note 1
The documentation of all the SplineOmics package functions can be viewed here
The PVC Test
PVC stands for Peak, Valley, and Cliff. The PVC test is a method to identify these distinct local patterns (called excursions) in time series data using a compound contrast approach within the limma framework.
In this context:
- A peak occurs when a timepoint is significantly higher than both its immediate neighbors.
- A valley occurs when a timepoint is significantly lower than both its neighbors.
- A cliff describes a sharp directional shift: when one neighbor is similar in value, but the other is significantly different (higher or lower).
Method
The PVC analysis is implemented as a rolling, compound
contrast applied across the time series. Specifically, a window
of three consecutive timepoints is evaluated at a time:(Tᵢ₋₁, Tᵢ, Tᵢ₊₁)
where Tᵢ
is the center of the window.
The first and last timepoints are excluded from this analysis, since they lack both left and right neighbors.
For each triplet, a contrast is defined as:
2*Tᵢ - Tᵢ₋₁ - Tᵢ₊₁
This contrast measures whether the center point Tᵢ
deviates from both its neighbors in the same direction:
- If
Tᵢ
is greater than both neighbors → large positive contrast → indicates a peak - If
Tᵢ
is lower than both neighbors → large negative contrast → indicates a valley - If
Tᵢ
lies between its neighbors → effects cancel → contrast is near zero → no signal
This contrast is tested using limma
, and the result is a
p-value that indicates whether the observed pattern is statistically
significant for each feature (e.g., gene, phosphosite).
Multiple Testing Correction
Because the test is performed independently at each center timepoint across all features, multiple testing correction is applied at the timepoint level. This ensures that significance is not inflated due to repeated testing across features.
Importantly, PVC does not assess whether an entire feature contains a full peak/valley/cliff pattern over time — it evaluates each timepoint independently for such local patterns.
Details about the dataset
The example dataset used here involves a time-series phosphoproteomics experiment, where CHO (chinese hamster ovary) cells were cultivated in three bioreactors (three biological replicates). The experiment includes the following setup:
- Samples were taken both during the exponential and stationary growth phases.
- Samples were collected in triplicates from each reactor at defined
timepoints relative to cell feeding:
- 60 minutes before feeding
- 15, 60, 90, 120, and 240 minutes after feeding
Note that the dataset was truncated to 1000 rows for file size reasons and the annotation info, such as gene name, was randomly shuffled.
Load the packages
library(SplineOmics)
library(here) # For managing filepaths
#> here() starts at /home/thomas/Documents/PhD/projects/DGTX/SplineOmics_hub/SplineOmics
library(dplyr) # For data manipulation
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(knitr) # For Showing the head of the data and the meta tables.
Load the files
In this example, the proteomics_data.rds file contains the numeric values (the intensities) and also the feature descriptions, such as gene and protein name (= annotation part). Usually, you would load the data from for example an Excel file, but the .rds file is more compressed, which is the reason this format was chosen here to limit the size of the SplineOmics package.
The file meta.xlsx contains the meta information, which are the descriptions of the columns of the numeric values of data.
(These example files are part of the package and don’t have to be present on your system).
Please note that this dataset is an actual experimental dataset, but the annotation information, such as gene names, has been removed since it was not yet published at the time of making the SplineOmics package public. Instead, the dataset includes randomly generated gene symbols and gene names corresponding to Cricetulus griseus (Chinese Hamster) for each row. This is intended to demonstrate the functionality of the package.
The left part of data contains the numeric values, and the right part the annotation info, which can be copied in a separate dataframe, as shown below.
data <- readRDS(xzfile(system.file(
"extdata",
"phosphoproteomics_data.rds.xz",
package = "SplineOmics"
)))
meta <- read.csv(
system.file(
"extdata",
"phosphoproteomics_meta.csv",
package = "SplineOmics"
),
stringsAsFactors = FALSE
)
# Extract the annotation part from the dataframe.
first_na_col <- which(is.na(data[1, ]))[1]
annotation <- data |> dplyr::select((first_na_col + 1):ncol(data))
Show top rows of data
TP01_10 MaxLFQ Intensity | TP01_12 MaxLFQ Intensity | TP01_9 MaxLFQ Intensity | TP02_10 MaxLFQ Intensity | TP02_12 MaxLFQ Intensity | TP02_9 MaxLFQ Intensity | TP03_10 MaxLFQ Intensity | TP03_12 MaxLFQ Intensity | TP03_9 MaxLFQ Intensity | TP04_10 MaxLFQ Intensity | TP04_12 MaxLFQ Intensity | TP04_9 MaxLFQ Intensity | TP05_10 MaxLFQ Intensity | TP05_12 MaxLFQ Intensity | TP05_9 MaxLFQ Intensity | TP06_10 MaxLFQ Intensity | TP06_12 MaxLFQ Intensity | TP06_9 MaxLFQ Intensity | TP07_10 MaxLFQ Intensity | TP07_12 MaxLFQ Intensity | TP07_9 MaxLFQ Intensity | TP08_10 MaxLFQ Intensity | TP08_12 MaxLFQ Intensity | TP08_9 MaxLFQ Intensity | TP09_10 MaxLFQ Intensity | TP09_12 MaxLFQ Intensity | TP09_9 MaxLFQ Intensity | TP10_10 MaxLFQ Intensity | TP10_12 MaxLFQ Intensity | TP10_9 MaxLFQ Intensity | TP11_10 MaxLFQ Intensity | TP11_12 MaxLFQ Intensity | TP11_9 MaxLFQ Intensity | TP12_10 MaxLFQ Intensity | TP12_12 MaxLFQ Intensity | TP12_9 MaxLFQ Intensity | …37 | N..Best.Localization.Probability | T..Protein | T..Index | T..Gene | T..Protein.ID | T..Peptide |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
18.8809 | 18.7333 | 18.736 | 18.6562 | 18.767 | 18.7793 | 18.6554 | 18.7401 | 18.5699 | 18.5877 | 18.7569 | 18.6083 | 18.7291 | 18.7104 | 18.5587 | 18.7192 | 18.7356 | 18.5746 | 18.7935 | 18.6219 | 18.2898 | 18.324 | 18.8198 | 18.4836 | 16.39 | 16.5426 | 18.3209 | 18.395 | 19.0248 | 18.4475 | 18.4724 | 18.7529 | 18.3047 | 18.5518 | 18.6099 | 18.3153 | NA | 0.9186 | tr|A0A3L7HQ14|A0A3L7HQ14_CRIGR | A0A3L7HUN2_S117 | Top2b | A0A3L7IN31 | NsPMAQTPPCHLPNIPGVTSPGTLIEDSK;NsPMAQtPPCHLPNIPGVTSPGTLIEDSK;RNsPMAQTPPCHLPNIPGVTSPGTLIEDSK;RNsPMAQTPPCHLPNIPGVTSPGtLIEDSK;RNsPMAQtPPCHLPNIPGVTSPGTLIEDSK |
15.3527 | NaN | NaN | NaN | NaN | NaN | 15.7263 | NaN | NaN | 15.4816 | 15.5787 | 15.598 | NaN | 14.9219 | NaN | 15.5459 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15.6507 | 15.3509 | NaN | NaN | NaN | NaN | NaN | 15.3698 | NaN | NaN | NaN | NaN | NA | 1.0000 | tr|A0A3L7HYQ7|A0A3L7HYQ7_CRIGR | A0A061I228_S641 | NA | A0A098KXG8 | NFQDsPK |
16.3484 | NaN | NaN | NaN | NaN | 16.5751 | 16.6528 | NaN | 16.0512 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 16.4891 | 16.9102 | 17.0032 | 16.7853 | NaN | NaN | NaN | 17.1439 | 17.163 | 16.2732 | 16.5507 | 17.2955 | 16.9251 | 16.5331 | 17.6113 | NaN | 16.3595 | NaN | 17.1642 | NA | 1.0000 | tr|A0A061IGN9|A0A061IGN9_CRIGR | A0A3L7GUZ0_S522 | Snip1 | A0A3L7HFU7 | QHLENDPGsNEDTDIPK |
16.2122 | 15.7589 | 15.8026 | 15.9001 | 15.4823 | 15.7732 | 16.452 | 15.884 | 15.8153 | 16.1033 | 16.2624 | 16.5246 | 15.773 | 15.9233 | 16.1753 | 15.6983 | 15.5666 | 16.251 | 15.7702 | 16.2542 | 16.307 | 15.5049 | 15.7808 | 16.5767 | 15.6377 | 16.0753 | 15.7783 | 15.6542 | 15.9858 | 16.0209 | 15.6547 | 16.1474 | 16.4154 | 16.1201 | 16.0911 | 16.1712 | NA | 0.9601 | tr|A0A061HXV6|A0A061HXV6_CRIGR | A0A3L7HQF8_S50 | Tomm34 | A0A3L7HPC6 | ASFITDEEQHAsPR;ASFITDEEQHAsPRPAPQLSR |
17.0699 | NaN | 17.3155 | NaN | 16.9908 | 16.9393 | 17.6292 | NaN | 16.9251 | 17.3094 | 17.4428 | 17.6153 | 17.301 | NaN | NaN | 17.2152 | 16.9523 | NaN | 16.1536 | NaN | 16.2584 | 15.6612 | 16.263 | 16.5353 | NaN | NaN | 16.0288 | NaN | NaN | 16.3069 | NaN | NaN | NaN | 16.3798 | 16.2436 | NaN | NA | 0.9561 | tr|A0A8C2L7B4|A0A8C2L7B4_CRIGR | A0A061IGN9_S52 | Pus3 | A0A098KXG8 | sRSPLHPTNEVPR;sRsPLHPTNEVPR |
15.0928 | NaN | 15.3313 | NaN | NaN | 15.3163 | NaN | 15.3458 | 15.8878 | NaN | NaN | NaN | NaN | NaN | 15.9505 | 15.6368 | NaN | 15.683 | 15.467 | 15.0704 | 15.6614 | 15.2963 | 15.7213 | 15.7003 | 15.0605 | 15.4058 | 16.3211 | 15.7134 | NaN | 15.9043 | 16.0438 | 15.7761 | 15.8954 | 15.5926 | 15.5007 | 16.1923 | NA | 0.9562 | tr|A0A061HY55|A0A061HY55_CRIGR | A0A061HV22_S13 | Ints1 | A0A8C2L7B4 | GGYLQGNVsGR |
Show top rows of meta
Sample.ID | Corrected.Sample.ID | Reactor | Time.Point | Phase | Time |
---|---|---|---|---|---|
TP01_9 MaxLFQ Intensity | 1 | E09 | TP01 | Exponential | -60 |
TP01_10 MaxLFQ Intensity | 2 | E10 | TP01 | Exponential | -60 |
TP01_12 MaxLFQ Intensity | 3 | E12 | TP01 | Exponential | -60 |
TP02_9 MaxLFQ Intensity | 4 | E09 | TP02 | Exponential | 15 |
TP02_10 MaxLFQ Intensity | 5 | E10 | TP02 | Exponential | 15 |
TP02_12 MaxLFQ Intensity | 6 | E12 | TP02 | Exponential | 15 |
Show top rows of annotation
N..Best.Localization.Probability | T..Protein | T..Index | T..Gene | T..Protein.ID | T..Peptide |
---|---|---|---|---|---|
0.9186 | tr|A0A3L7HQ14|A0A3L7HQ14_CRIGR | A0A3L7HUN2_S117 | Top2b | A0A3L7IN31 | NsPMAQTPPCHLPNIPGVTSPGTLIEDSK;NsPMAQtPPCHLPNIPGVTSPGTLIEDSK;RNsPMAQTPPCHLPNIPGVTSPGTLIEDSK;RNsPMAQTPPCHLPNIPGVTSPGtLIEDSK;RNsPMAQtPPCHLPNIPGVTSPGTLIEDSK |
1.0000 | tr|A0A3L7HYQ7|A0A3L7HYQ7_CRIGR | A0A061I228_S641 | NA | A0A098KXG8 | NFQDsPK |
1.0000 | tr|A0A061IGN9|A0A061IGN9_CRIGR | A0A3L7GUZ0_S522 | Snip1 | A0A3L7HFU7 | QHLENDPGsNEDTDIPK |
0.9601 | tr|A0A061HXV6|A0A061HXV6_CRIGR | A0A3L7HQF8_S50 | Tomm34 | A0A3L7HPC6 | ASFITDEEQHAsPR;ASFITDEEQHAsPRPAPQLSR |
0.9561 | tr|A0A8C2L7B4|A0A8C2L7B4_CRIGR | A0A061IGN9_S52 | Pus3 | A0A098KXG8 | sRSPLHPTNEVPR;sRsPLHPTNEVPR |
0.9562 | tr|A0A061HY55|A0A061HY55_CRIGR | A0A061HV22_S13 | Ints1 | A0A8C2L7B4 | GGYLQGNVsGR |
Bring the Inputs into the Standardized Format
data <- SplineOmics::extract_data(
# The dataframe with the numbers on the left and info on the right.
data = data,
# Use this annotation column for the feature names.
feature_name_columns = c("T..Gene"),
use_row_index = TRUE, # makes the feature names unique with row index
top_row = 1,
bottom_row = 1000,
right_col = 36,
left_col = 1
)
SplineOmics Object
# Those fields are mandatory, because we believe that when such a report is
# opened after half a year, those infos can be very helpful.
report_info <- list(
omics_data_type = "PPTX",
data_description = "Phosphoproteomics data of CHO cells",
data_collection_date = "February 2024",
analyst_name = "Thomas Rauter",
contact_info = "thomas.rauter@plus.ac.at",
project_name = "DGTX"
)
# splineomics now contains the SplineOmics object.
splineomics <- SplineOmics::create_splineomics(
data = data,
meta = meta,
annotation = annotation,
report_info = report_info,
condition = "Phase", # Column of meta that contains the levels.
meta_batch_column = "Reactor", # For batch effect removal
)
Run the PVC-analysis
plot_info = list(
y_axis_label = "log2 intensity",
time_unit = "min",
treatment_labels = list("feeding"),
treatment_timepoints = list(0)
)
# Check out the documentation of the function under the Reference tab.
excursion_plots <- SplineOmics::find_pvc(
splineomics = splineomics,
alphas = 0.025,
plot_info = plot_info,
report_dir = here::here("results", "pvc_report")
)
#> Warning: The data contains missing values (NA). Ensure that imputation or handling of missing values aligns with your analysis requirements. Note that limma can handle missing values (it just ignores them), and therefore also SplineOmics can handle them.
#> Column 'Reactor' of meta will be used to remove the batch effect for the plotting
#> Warning: Partial NA coefficients for 200 probe(s)
#>
#> Detected 1 total pattern hits for condition level: Exponential
#>
#> Summary by pattern type:
#> p: 0, v: 1, b: 0, t: 0
#>
#> Breakdown by timepoint:
#> -60: p=0; v=0; b=0; t=0
#> 15: p=0; v=0; b=0; t=0
#> 60: p=0; v=0; b=0; t=0
#> 90: p=0; v=1; b=0; t=0
#> 120: p=0; v=0; b=0; t=0
#> 240: p=0; v=0; b=0; t=0
#> Warning: Partial NA coefficients for 205 probe(s)
#>
#> Detected 10 total pattern hits for condition level: Stationary
#>
#> Summary by pattern type:
#> p: 8, v: 2, b: 0, t: 0
#>
#> Breakdown by timepoint:
#> -60: p=0; v=0; b=0; t=0
#> 15: p=0; v=0; b=0; t=0
#> 60: p=8; v=2; b=0; t=0
#> 90: p=0; v=0; b=0; t=0
#> 120: p=0; v=0; b=0; t=0
#> 240: p=0; v=0; b=0; t=0
#> Info PVC report generation completed successfully.
#> Your HTML reports are located in the directory: /home/thomas/Documents/PhD/projects/DGTX/SplineOmics_hub/SplineOmics/results/pvc_report .
#> Please note that due to embedded files, the reports might be flagged as
#> harmful by other software. Rest assured that they provide no harm.
The terminal output of the function gives information about the total amount of excursions found in each level, and also a breakdown by timepoint and type of excursion (p = peak, v = valley, b = bottom of cliff, t = top of cliff, always for the respective timepoint that is mentioned in that row). Here you can see the resulting HTML report.
Session Info
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/local/R-4.3.3/lib/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=de_AT.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=de_AT.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_AT.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Vienna
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices datasets utils methods base
#>
#> other attached packages:
#> [1] knitr_1.50 dplyr_1.1.4 here_1.0.1 SplineOmics_0.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rdpack_2.6.4 bitops_1.0-9 gridExtra_2.3
#> [4] rlang_1.1.6 magrittr_2.0.3 clue_0.3-66
#> [7] GetoptLong_1.0.5 matrixStats_1.5.0 compiler_4.3.3
#> [10] reshape2_1.4.4 png_0.1-8 systemfonts_1.2.3
#> [13] vctrs_0.6.5 stringr_1.5.1 pkgconfig_2.0.3
#> [16] shape_1.4.6.1 crayon_1.5.3 fastmap_1.2.0
#> [19] backports_1.5.0 labeling_0.4.3 caTools_1.18.3
#> [22] rmarkdown_2.29 nloptr_2.2.1 ragg_1.4.0
#> [25] purrr_1.0.4 xfun_0.52 cachem_1.1.0
#> [28] jsonlite_2.0.0 progress_1.2.3 EnvStats_3.1.0
#> [31] remaCor_0.0.18 BiocParallel_1.36.0 broom_1.0.8
#> [34] parallel_4.3.3 prettyunits_1.2.0 cluster_2.1.8.1
#> [37] R6_2.6.1 bslib_0.9.0 stringi_1.8.7
#> [40] RColorBrewer_1.1-3 limma_3.58.1 boot_1.3-31
#> [43] car_3.1-3 numDeriv_2016.8-1.1 jquerylib_0.1.4
#> [46] Rcpp_1.0.14 iterators_1.0.14 base64enc_0.1-3
#> [49] IRanges_2.36.0 Matrix_1.6-5 splines_4.3.3
#> [52] tidyselect_1.2.1 rstudioapi_0.17.1 abind_1.4-8
#> [55] yaml_2.3.10 viridis_0.6.5 doParallel_1.0.17
#> [58] gplots_3.2.0 codetools_0.2-20 plyr_1.8.9
#> [61] lmerTest_3.1-3 lattice_0.22-7 tibble_3.2.1
#> [64] withr_3.0.2 Biobase_2.62.0 evaluate_1.0.3
#> [67] desc_1.4.3 zip_2.3.3 circlize_0.4.16
#> [70] pillar_1.10.2 BiocManager_1.30.25 carData_3.0-5
#> [73] KernSmooth_2.23-26 renv_1.1.4 foreach_1.5.2
#> [76] stats4_4.3.3 reformulas_0.4.1 generics_0.1.4
#> [79] rprojroot_2.0.4 S4Vectors_0.40.2 hms_1.1.3
#> [82] ggplot2_3.5.2 scales_1.4.0 aod_1.3.3
#> [85] minqa_1.2.8 gtools_3.9.5 RhpcBLASctl_0.23-42
#> [88] glue_1.8.0 pheatmap_1.0.12 tools_4.3.3
#> [91] fANCOVA_0.6-1 dendextend_1.19.0 variancePartition_1.32.5
#> [94] lme4_1.1-37 openxlsx_4.2.8 mvtnorm_1.3-3
#> [97] fs_1.6.6 grid_4.3.3 tidyr_1.3.1
#> [100] rbibutils_2.3 colorspace_2.1-1 nlme_3.1-168
#> [103] patchwork_1.3.0 Formula_1.2-5 cli_3.6.5
#> [106] textshaping_1.0.1 viridisLite_0.4.2 svglite_2.2.1
#> [109] ComplexHeatmap_2.18.0 corpcor_1.6.10 gtable_0.3.6
#> [112] sass_0.4.10 digest_0.6.37 BiocGenerics_0.48.1
#> [115] pbkrtest_0.5.4 ggrepel_0.9.6 rjson_0.2.23
#> [118] htmlwidgets_1.6.4 farver_2.1.2 htmltools_0.5.8.1
#> [121] pkgdown_2.1.3 lifecycle_1.0.4 GlobalOptions_0.1.2
#> [124] statmod_1.5.0 MASS_7.3-60.0.1