Frequently Asked Questions
FAQ on data pre-processing
Does SplineOmics handle missing values?
Yes. SplineOmics can handle missing values gracefully because it uses
limma as the statistical engine for model fitting and
differential analysis. The limma package is designed to
work with incomplete expression matrices — missing values are simply
ignored when estimating model parameters and computing contrasts,
without causing errors.
This means you do not necessarily need to impute missing values or remove features with a few missing entries before running SplineOmics. However, if a feature is missing in most samples, consider removing it, because too few observations remain to estimate its spline coefficients reliably.
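As a rough illustration of such a filter, the following base R sketch drops features whose fraction of missing values exceeds a threshold. The matrix `expr`, the feature and sample names, and the 50% cut-off are assumptions made for this example; they are not SplineOmics defaults.

```r
# Simulated expression matrix: features in rows, samples in columns.
set.seed(42)
expr <- matrix(
  rnorm(5 * 8),
  nrow = 5,
  dimnames = list(paste0("feature_", 1:5), paste0("sample_", 1:8))
)
expr["feature_2", 1:6] <- NA  # missing in most samples
expr["feature_4", 3] <- NA    # a single missing entry

# Assumed threshold: keep features with at most 50% missing values.
max_missing_fraction <- 0.5

# Fraction of missing entries per feature.
missing_fraction <- rowMeans(is.na(expr))

# Keep only features at or below the threshold.
expr_filtered <- expr[missing_fraction <= max_missing_fraction, , drop = FALSE]
rownames(expr_filtered)
```

Features with only a few missing entries, such as `feature_4` here, stay in the matrix and can be passed on unchanged.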
FAQ on parameter selection
Which design formula should I use?
The choice of design formula depends on your experimental setup and on whether you want to model your conditions independently or jointly.
Isolated designs (no interaction between condition and time)
Use this when you want to analyse each condition completely separately, for example, to get independent spline fits for each treatment.
In this case, SplineOmics will simply process each dataset sequentially for convenience, so you do not need to run the code twice.
This approach is ideal when you do not expect conditions to share information or influence one another.
Integrated designs (conditions modelled jointly with a time interaction)
Use this when you want to fit all conditions in a single model and allow their time trends to differ through an interaction term between condition and time.
With this approach, datasets are modelled jointly and can borrow statistical power from each other, leading to more stable variance estimates and more sensitive detection of shared temporal patterns.
This also enables access to the full range of limma result categories, including category 2 and category 3 results (see the vignette on limma result categories).
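The difference between the two design styles can be sketched with plain `model.matrix()` and a natural cubic spline basis from the `splines` package. This is only an illustration of the underlying design matrices. The column names `condition` and `time`, the time points, and the choice of 2 degrees of freedom are hypothetical, and the exact formula syntax SplineOmics expects is described in the package documentation, not here.

```r
library(splines)

# Hypothetical sample annotation: two conditions, six time points each.
meta <- data.frame(
  condition = factor(rep(c("control", "treated"), each = 6)),
  time = rep(c(0, 2, 4, 8, 12, 24), times = 2)
)

# Isolated style: one design matrix per condition, no shared terms.
isolated <- lapply(
  split(meta, meta$condition),
  function(m) model.matrix(~ ns(time, df = 2), data = m)
)

# Integrated style: a single design matrix with a condition-by-time
# interaction, so each condition gets its own time trend in one model.
integrated <- model.matrix(~ condition * ns(time, df = 2), data = meta)

sapply(isolated, ncol)  # columns per isolated design
colnames(integrated)    # main effects plus interaction columns
```

In the integrated matrix, the interaction columns carry the difference in time trend between the two conditions, which is what a joint model can test directly.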
Should I use B-splines or natural cubic splines?
Both spline types can model smooth trends across time or another continuous variable, but they differ in how local their flexibility is.
B-splines are locally adaptive: changing one knot affects the fitted curve only in a small neighbourhood around that knot.
This makes them ideal when you expect local variations (for example, short-term biological responses) and want the rest of the curve to remain stable. The trade-off is that B-splines typically use more degrees of freedom, so the model can become more complex.
Natural cubic splines enforce global smoothness: each basis function extends across the entire range, so adjusting one part of the curve slightly influences the whole fit. They use fewer degrees of freedom and can be more stable when you expect overall smooth behaviour, but they are less suitable if local detail is important.
In short:
> Choose B-splines when local flexibility matters, and natural cubic
> splines when you want a smoother, more global trend.
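The two properties discussed above, the number of basis columns and the local support of B-splines, can be checked with `bs()` and `ns()` from the `splines` package. This is a minimal sketch on an artificial grid `x`; the knot positions are chosen only for illustration.

```r
library(splines)

x <- seq(0, 1, length.out = 101)

# Cubic B-spline basis with three interior knots: six basis columns.
b_basis <- bs(x, knots = c(0.25, 0.5, 0.75))

# Smallest cubic B-spline basis (no interior knots) needs 3 columns,
# whereas a natural cubic spline basis can be built with only 2.
b_min <- bs(x, df = 3)
n_basis <- ns(x, df = 2)
c(bs_columns = ncol(b_min), ns_columns = ncol(n_basis))

# Local support: the last B-spline basis function is zero everywhere
# to the left of the last interior knot (0.75).
left_of_last_knot <- max(abs(b_basis[x < 0.75, 6]))
left_of_last_knot
```

Because that basis function vanishes outside its own knot span, changing its coefficient cannot move the fitted curve elsewhere, which is the locality property described above.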
How many degrees of freedom should I use for the splines?
There is no single optimal choice — the best number of degrees of
freedom (dof) depends on the smoothness and complexity of your
data.
In practice, 2 or 3 degrees of freedom work well in most cases.
Using more degrees of freedom makes the spline wigglier and increases the risk of overfitting, while using only 1 degree of freedom is usually too restrictive to capture meaningful trends.
If you set the degrees of freedom to 0, SplineOmics will
automatically determine the optimal value using leave-one-out
cross-validation, selecting the dof that provides the best predictive
performance for your dataset.
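To make the idea of selecting the dof by leave-one-out cross-validation concrete, here is a small sketch for a single simulated feature. It is not the SplineOmics implementation: the data, the candidate range `1:5`, and the use of `lm()` with `ns()` are assumptions for illustration. For linear models, the leave-one-out residual can be obtained without refitting, as the ordinary residual divided by one minus the leverage.

```r
library(splines)

# Simulated time course for one feature: 8 time points, 3 replicates each.
set.seed(1)
time <- rep(c(0, 1, 2, 4, 8, 12, 24, 48), each = 3)
y <- sin(time / 10) + rnorm(length(time), sd = 0.2)

# Assumed candidate degrees of freedom.
candidate_dof <- 1:5

# Leave-one-out mean squared prediction error for each candidate.
loocv_mse <- sapply(candidate_dof, function(dof) {
  fit <- lm(y ~ ns(time, df = dof))
  # Leave-one-out residual = residual / (1 - leverage) for linear models.
  mean((residuals(fit) / (1 - hatvalues(fit)))^2)
})

# The candidate with the smallest prediction error is selected.
best_dof <- candidate_dof[which.min(loocv_mse)]
data.frame(dof = candidate_dof, loocv_mse = loocv_mse)
best_dof
```

Comparing the errors across candidates also shows how flat the curve is near the minimum, which helps judge whether a smaller dof would do almost as well.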
Should I use array weights?
Yes, in most cases this is recommended. Since
SplineOmics builds on limma, it inherits
support for array weights, which help to correct for heteroskedasticity
(unequal variances) often present in time-series data.
By estimating a quality weight for each sample,
SplineOmics can give less influence to noisier samples and
more weight to consistent ones, resulting in smoother spline fits and
higher statistical power.
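The weighting step can be sketched directly with limma, using `arrayWeights()` to estimate one weight per sample and passing the result to `lmFit()`. The simulated matrix `expr`, the design, and the artificially noisy last sample are assumptions for this example, and the sketch does not show how SplineOmics exposes this option. The limma calls are guarded so that the block still runs where limma is not installed.

```r
library(splines)

# Simulated data: 200 features, 12 samples (6 time points, 2 replicates).
set.seed(1)
time <- rep(1:6, each = 2)
design <- model.matrix(~ ns(time, df = 2))
expr <- matrix(rnorm(200 * 12), nrow = 200)

# Make the last sample deliberately noisier than the others.
expr[, 12] <- expr[, 12] + rnorm(200, sd = 3)

if (requireNamespace("limma", quietly = TRUE)) {
  # One quality weight per sample; noisier samples get lower weights.
  aw <- limma::arrayWeights(expr, design)

  # Weighted model fit followed by empirical Bayes moderation.
  fit <- limma::lmFit(expr, design, weights = aw)
  fit <- limma::eBayes(fit)

  print(round(aw, 2))
}
```

Inspecting the printed weights is a quick way to spot samples that contribute little to the fit.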
How many clusters should I use?
There is no universal rule for choosing the optimal number of
clusters.
Using more clusters makes each cluster purer — its centroid represents
its members more precisely — but it also makes downstream interpretation
more complex and fragmented.
To help with this choice, SplineOmics provides several aids:
- For each cluster, it reports the variance explained by the cluster centroid, both as a mean value and as a distribution histogram, with guidance on how to interpret these values.
- For each individual feature, it shows how well it is represented by its cluster centroid.
- You can specify a range or custom set of cluster numbers (for example, 2–10 or {2, 5, 6}), and SplineOmics automatically selects the best one using the Bayesian Information Criterion (BIC): the model with the lowest BIC is chosen as the optimal clustering configuration.
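The selection logic can be illustrated with base R `kmeans()` on simulated curves. This is a sketch under stated assumptions, not the clustering code used by SplineOmics: the curves, the candidate set `2:6`, and the BIC formula (a common spherical-Gaussian approximation for k-means) are all chosen for the example, and the package may compute BIC and variance explained differently.

```r
# Simulated fitted curves: 60 features, 6 time points, 3 underlying shapes.
set.seed(7)
time <- 1:6
shapes <- rbind(
  up = time / 6,
  down = 1 - time / 6,
  peak = c(0, 0.5, 1, 1, 0.5, 0)
)
curves <- shapes[rep(1:3, each = 20), ] +
  matrix(rnorm(60 * 6, sd = 0.1), nrow = 60)

candidate_k <- 2:6  # assumed candidate cluster numbers
n <- nrow(curves)
d <- ncol(curves)

# One k-means fit per candidate number of clusters.
fits <- lapply(candidate_k, function(k) kmeans(curves, centers = k, nstart = 20))

# Approximate BIC: goodness-of-fit term plus a penalty on the centroids.
bic <- sapply(seq_along(fits), function(i) {
  rss <- fits[[i]]$tot.withinss
  n * d * log(rss / (n * d)) + candidate_k[i] * d * log(n)
})

# The lowest BIC wins.
best_k <- candidate_k[which.min(bic)]
best_fit <- fits[[which.min(bic)]]

# Per-feature fraction of variance explained by the assigned centroid.
centroids <- best_fit$centers[best_fit$cluster, , drop = FALSE]
r2 <- 1 - rowSums((curves - centroids)^2) /
  rowSums((curves - rowMeans(curves))^2)

best_k
tapply(r2, best_fit$cluster, mean)  # mean variance explained per cluster
```

The per-feature values in `r2` play the role of the "how well is this feature represented by its centroid" diagnostic, and their per-cluster means summarise cluster purity.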
General FAQ
Can SplineOmics handle datasets with more than two conditions?
Not directly. SplineOmics is designed for pairwise
comparisons between conditions. You can, however, analyse experiments
with more than two conditions by either:
- using an isolated design, where each condition is analysed
separately, or
- performing all pairwise comparisons among the conditions of interest.
These approaches allow you to explore multiple conditions, but they do not integrate them into a single joint model. Integrated designs are currently limited to comparisons between two conditions at a time.
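Enumerating all pairwise comparisons is straightforward with base R `combn()`. The condition names below are hypothetical, and the sketch only lists the pairs; each pair would then be analysed in its own SplineOmics run.

```r
# Hypothetical condition labels for an experiment with three conditions.
conditions <- c("control", "drug_A", "drug_B")

# All unordered pairs of conditions, one pair per column.
pairs <- combn(conditions, 2)

# Readable labels for each comparison.
pair_labels <- apply(pairs, 2, paste, collapse = "_vs_")
pair_labels
```

With k conditions this yields k(k-1)/2 comparisons, so the number of runs grows quickly as conditions are added.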
Session Info
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0 LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_AT.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_AT.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_AT.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Vienna
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 desc_1.4.3 R6_2.6.1
## [4] fastmap_1.2.0 xfun_0.52 cachem_1.1.0
## [7] knitr_1.50 htmltools_0.5.8.1 rmarkdown_2.29
## [10] lifecycle_1.0.4 cli_3.6.5 sass_0.4.10
## [13] pkgdown_2.1.3 textshaping_1.0.1 jquerylib_0.1.4
## [16] renv_1.1.5 systemfonts_1.2.3 compiler_4.5.1
## [19] rstudioapi_0.17.1 tools_4.5.1 ragg_1.4.0
## [22] bslib_0.9.0 evaluate_1.0.4 yaml_2.3.10
## [25] BiocManager_1.30.26 jsonlite_2.0.0 htmlwidgets_1.6.4
## [28] rlang_1.1.6 fs_1.6.6