
Frequently Asked Questions


FAQ on data pre-processing

Does SplineOmics handle missing values?

Yes. SplineOmics can handle missing values gracefully because it uses limma as the statistical engine for model fitting and differential analysis. limma is designed to work with incomplete expression matrices: each feature's model is fitted using only that feature's observed values, so missing entries are simply ignored when estimating model parameters and computing contrasts, without causing errors.

This means you do not necessarily need to impute or remove features with a few missing entries before running SplineOmics. However, if a feature is missing in most samples, consider removing it: with only a handful of observations left, its model has few (or no) residual degrees of freedom and its estimates become unreliable.
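The idea of fitting each feature on its observed values only can be illustrated with a minimal Python sketch. It is not limma's implementation: it fits an ordinary least-squares model per feature after dropping that feature's missing samples, and the `min_obs` cut-off for skipping sparse features is a hypothetical parameter chosen for illustration.

```python
import numpy as np

def fit_features_ignoring_na(Y, X, min_obs=None):
    """Fit Y[i] ~ X @ beta separately for each feature i, using only the
    samples in which that feature was observed (non-NaN).

    Y       : features x samples array, may contain NaN
    X       : samples x coefficients design matrix
    min_obs : hypothetical cut-off; features with fewer observed samples are
              left unfitted (NaN coefficients). Defaults to one more than the
              number of coefficients, i.e. at least one residual df.
    """
    n_coef = X.shape[1]
    if min_obs is None:
        min_obs = n_coef + 1
    coefs = np.full((Y.shape[0], n_coef), np.nan)
    for i, y in enumerate(Y):
        observed = ~np.isnan(y)
        if observed.sum() < min_obs:
            continue  # too many missing values to fit this feature reliably
        beta, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
        coefs[i] = beta
    return coefs

# Intercept + linear time trend over 6 samples
time = np.arange(6, dtype=float)
X = np.column_stack([np.ones_like(time), time])
Y = np.array([
    1.0 + 2.0 * time,                            # complete feature
    [1.0, np.nan, 5.0, 7.0, np.nan, 11.0],       # two missing values
    [np.nan, np.nan, np.nan, np.nan, 3.0, 4.0],  # missing in most samples
])
print(fit_features_ignoring_na(Y, X))
```

The first two features both recover an intercept of about 1 and a slope of about 2, while the third is skipped because only two of its six samples are observed.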


FAQ on parameter selection

Which design formula should I use?

The choice of design formula depends on your experimental setup and on whether you want to model your conditions independently or jointly.

  • Isolated designs (no interaction between condition and time)
    Use this when you want to analyse each condition completely separately, for example, to get independent spline fits for each treatment.
    In this case, SplineOmics will simply process each dataset sequentially for convenience, so you do not need to run the code twice.
    This approach is ideal when you do not expect conditions to share information or influence one another.

  • Integrated designs (conditions modelled jointly with a time interaction)
    Use this when you want to fit all conditions in a single model and allow their time trends to differ through an interaction term between condition and time.
    With this approach, datasets are modelled jointly and can borrow statistical power from each other, leading to more stable variance estimates and more sensitive detection of shared temporal patterns.
    This also enables access to the full range of limma result categories, including category 2 and category 3 results (see the vignette on limma result categories).
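The difference between the two designs shows up in the columns of the design matrix. The minimal Python sketch below builds both for a two-condition experiment; it is an illustration of the general structure (comparable to an R formula such as `~ condition * time_basis`), not SplineOmics' internal code, and `time_basis` is a hypothetical polynomial stand-in for a real spline basis.

```python
import numpy as np

def time_basis(t, dof):
    """Hypothetical stand-in for a spline basis: `dof` columns of
    standardised polynomial terms (a real analysis would use B-splines
    or natural cubic splines)."""
    tc = (t - t.mean()) / t.std()
    return np.column_stack([tc ** d for d in range(1, dof + 1)])

def isolated_design(t, dof):
    """One condition on its own: intercept + time basis."""
    return np.column_stack([np.ones_like(t), time_basis(t, dof)])

def integrated_design(t, condition, dof):
    """Two conditions in one model: intercept, condition offset,
    shared time basis, and condition x time interaction columns."""
    levels = np.unique(condition)
    if len(levels) != 2:
        raise ValueError("this sketch expects exactly two conditions")
    second = (condition == levels[1]).astype(float)  # 0 = reference level
    B = time_basis(t, dof)
    return np.column_stack([
        np.ones_like(t),       # intercept of the reference condition
        second,                # offset of the second condition
        B,                     # time trend of the reference condition
        B * second[:, None],   # deviation of the second condition's trend
    ])

t = np.tile([0.0, 1.0, 2.0, 4.0, 8.0], 2)
condition = np.repeat(["ctrl", "treat"], 5)
print(isolated_design(t[:5], dof=2).shape)           # (5, 3)
print(integrated_design(t, condition, dof=2).shape)  # (10, 6)
```

In the integrated design, testing the interaction columns asks whether the two conditions' time trends differ. An isolated design cannot answer that question directly, because each condition gets its own, unrelated model.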

Should I use B-splines or natural cubic splines?

Both spline types can model smooth trends across time or another continuous variable, but they differ in how local their flexibility is.

  • B-splines are locally adaptive: each basis function is non-zero only over a few adjacent knot intervals, so changing one coefficient or knot affects the fitted curve only in a small neighbourhood around the corresponding knots.
    This makes them ideal when you expect local variations (for example, short-term biological responses) and want the rest of the curve to remain stable. The trade-off is that, for the same set of knots, B-splines typically use more degrees of freedom, so the model can become more complex.

  • Natural cubic splines enforce global smoothness: their basis functions extend across much of the range, so adjusting one part of the curve slightly influences the whole fit. Because they are constrained to be linear beyond the boundary knots, they use fewer degrees of freedom and can be more stable when you expect overall smooth behaviour, but they are less suitable if local detail is important.

In short:
> Choose B-splines when local flexibility matters, and natural cubic splines when you want a smoother, more global trend.
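The local support that makes B-splines "locally adaptive" can be checked numerically. The minimal Python sketch below evaluates a cubic B-spline basis with the standard Cox-de Boor recursion (not necessarily how SplineOmics or R's `splines::bs()` compute it) and shows that at any point at most degree + 1 basis functions are non-zero. The knot positions are arbitrary illustrative values.

```python
def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of `degree` at a point x with
    the Cox-de Boor recursion. `knots` is a full, non-decreasing knot vector
    (boundary knots repeated degree + 1 times); valid for
    knots[0] <= x < knots[-1]."""
    # Degree 0: indicator of the knot interval [knots[i], knots[i + 1])
    B = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        new = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero denominator are defined as 0
            left = (x - knots[i]) / left_den * B[i] if left_den else 0.0
            right = (knots[i + d + 1] - x) / right_den * B[i + 1] if right_den else 0.0
            new.append(left + right)
        B = new
    return B  # len(knots) - degree - 1 values

# Cubic basis on [0, 10] with interior knots at 2.5, 5 and 7.5
knots = [0, 0, 0, 0, 2.5, 5, 7.5, 10, 10, 10, 10]
values = bspline_basis(6.0, knots, degree=3)
print(len(values))                                 # 7
print([i for i, v in enumerate(values) if v > 0])  # [2, 3, 4, 5]
print(round(sum(values), 12))                      # 1.0
```

Only 4 of the 7 basis functions are active at x = 6, so the coefficients of the other three cannot change the curve there. The values also sum to 1 (partition of unity), which is why the fitted curve is a local weighted average of nearby coefficients.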

How many degrees of freedom should I use for the splines?

There is no single optimal choice: the best number of degrees of freedom (dof) depends on the smoothness and complexity of your data.
In practice, 2 or 3 degrees of freedom are a good starting point and work well in most cases.

Using more degrees of freedom makes the spline wigglier and increases the risk of overfitting, while using only 1 degree of freedom is usually too restrictive to capture meaningful trends.

If you set the degrees of freedom to 0, SplineOmics determines the value automatically using leave-one-out cross-validation, selecting the dof with the lowest cross-validated prediction error for your dataset.
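Leave-one-out cross-validation can be sketched in a few lines of Python. This is a generic illustration of the technique, not SplineOmics' code: each candidate dof is scored by refitting without one sample at a time and predicting the held-out value, and `design` is a hypothetical polynomial stand-in for a spline basis with `dof` degrees of freedom.

```python
import numpy as np

def design(t, dof):
    """Intercept + `dof` standardised polynomial columns: a hypothetical
    stand-in for a spline basis with `dof` degrees of freedom."""
    tc = (t - t.mean()) / t.std()
    return np.column_stack([tc ** d for d in range(dof + 1)])

def loocv_error(t, y, dof):
    """Mean squared leave-one-out prediction error for one dof value."""
    X = design(t, dof)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i             # drop sample i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors.append((y[i] - X[i] @ beta) ** 2)  # predict the held-out sample
    return float(np.mean(errors))

def choose_dof(t, y, candidates=(1, 2, 3, 4)):
    """Return the candidate with the lowest LOOCV error, plus all scores."""
    scores = {dof: loocv_error(t, y, dof) for dof in candidates}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 12)
y = 0.5 * (t - 5) ** 2 + rng.normal(0, 1, t.size)  # curved trend + noise
best, scores = choose_dof(t, y)
print(best, scores)
```

A straight line (dof = 1) scores badly on this curved trend, while extra dof beyond what the data need bring little or no improvement, which is the trade-off described above. Keep the candidate set small relative to the number of time points, since every extra dof costs a residual degree of freedom.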

Should I use array weights?

Yes, in most cases this is recommended. Since SplineOmics builds on limma, it inherits support for array weights. These help correct for heteroskedasticity, i.e. unequal variances between samples, which is often present in time-series data.

By estimating a quality weight for each sample, SplineOmics can give less influence to noisier samples and more weight to consistent ones, resulting in smoother spline fits and higher statistical power.
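The effect of sample weights can be illustrated with a deliberately simplified Python sketch. It does not reproduce limma's `arrayWeights()`, which uses a model-based estimator: here each sample's weight is simply the inverse of its mean squared residual across all features (rescaled to average 1, a choice made for this sketch), and the weights are then used in a weighted least-squares fit.

```python
import numpy as np

def sample_weights(Y, X):
    """Heuristic per-sample quality weights: inverse of each sample's mean
    squared residual across all features, rescaled to average 1.
    (A simplification for illustration, not limma's arrayWeights().)"""
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)  # coefficients x features
    residuals = Y.T - X @ beta                      # samples x features
    noise = np.mean(residuals ** 2, axis=1)         # one value per sample
    w = 1.0 / noise
    return w / w.mean()

def weighted_fit(Y, X, w):
    """Weighted least squares for all features: scaling rows of X and Y by
    sqrt(w) makes ordinary least squares minimise sum(w * residual**2)."""
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(X * sw, Y.T * sw, rcond=None)
    return beta.T                                   # features x coefficients

# 200 features, 8 time points; sample 3 is much noisier than the others
rng = np.random.default_rng(1)
time = np.arange(8.0)
X = np.column_stack([np.ones_like(time), time])
sd = np.full(8, 0.3)
sd[3] = 3.0
Y = 1.0 + 0.5 * time + rng.normal(0.0, sd, size=(200, 8))
w = sample_weights(Y, X)
print(np.round(w, 2))                      # sample 3 gets by far the smallest weight
print(weighted_fit(Y, X, w).mean(axis=0))  # average intercept and slope
```

The noisy sample still contributes to every fit, just with much less influence than the consistent samples.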

How many clusters should I use?

There is no universal rule for choosing the optimal number of clusters.
Using more clusters makes each cluster purer — its centroid represents its members more precisely — but it also makes downstream interpretation more complex and fragmented.

To help with this choice, SplineOmics provides several aids:

  • For each cluster, it reports the variance explained by the cluster centroid, both as a mean value and as a distribution histogram, with guidance on how to interpret these values.

  • For each individual feature, it shows how well it is represented by its cluster centroid.

  • You can specify a range or custom set of cluster numbers (for example, 2–10 or {2, 5, 6}), and SplineOmics automatically selects the best one using the Bayesian Information Criterion (BIC):

    $$\text{BIC} = n_{\text{obs}} \log\left(\frac{\text{tot\_within}}{n_{\text{obs}}}\right) + k \log(n_{\text{obs}}) \times p$$

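Here n_obs is the number of features being clustered, tot_within is the total within-cluster sum of squares, k is the number of clusters, and p is the number of dimensions each feature is clustered on. The model with the lowest BIC is chosen as the optimal clustering configuration.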
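To make the BIC criterion concrete, here is a minimal Python sketch that clusters feature profiles with plain k-means (k-means++ initialisation, a few restarts), scores each candidate number of clusters with the BIC formula above, and reports how well each feature is represented by its centroid. Assumptions: Euclidean k-means is used as the clustering method, and "variance explained" is defined as 1 - SS_res/SS_tot per feature; SplineOmics' own clustering method and its exact definitions may differ.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means with k-means++ initialisation."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = assign(X, centroids)
    tot_within = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, tot_within

def bic(tot_within, n_obs, k, p):
    """BIC = n_obs * log(tot_within / n_obs) + k * log(n_obs) * p"""
    return float(n_obs * np.log(tot_within / n_obs) + k * np.log(n_obs) * p)

def choose_k(X, candidates, n_restarts=5, seed=0):
    """Best of several k-means runs per k; return the k with the lowest BIC."""
    rng = np.random.default_rng(seed)
    n_obs, p = X.shape
    fits, scores = {}, {}
    for k in candidates:
        fits[k] = min((kmeans(X, k, rng) for _ in range(n_restarts)),
                      key=lambda fit: fit[2])
        scores[k] = bic(fits[k][2], n_obs, k, p)
    best = min(scores, key=scores.get)
    return best, fits[best], scores

def variance_explained(X, labels, centroids):
    """Per-feature R^2 of each profile by its centroid (assumed definition;
    can be negative if the centroid fits worse than the feature's own mean)."""
    ss_res = ((X - centroids[labels]) ** 2).sum(axis=1)
    ss_tot = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    return 1.0 - ss_res / ss_tot

# 90 features: three temporal patterns over 6 time points, 30 noisy copies each
rng = np.random.default_rng(42)
t = np.linspace(0, 1, 6)
patterns = np.array([2 * t, 2 * (1 - t), 2 * np.sin(np.pi * t)])
X = np.repeat(patterns, 30, axis=0) + rng.normal(0, 0.1, (90, 6))

best, (labels, centroids, _), scores = choose_k(X, range(2, 7))
r2 = variance_explained(X, labels, centroids)
print(best, {k: round(s, 1) for k, s in scores.items()})
print("mean variance explained:", round(float(r2.mean()), 3))
```

The penalty term grows by log(n_obs) x p for every extra cluster, so splitting a genuine cluster further only pays off if it reduces tot_within substantially.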


General FAQ

Can SplineOmics handle datasets with more than two conditions?

Not directly. SplineOmics is designed for pairwise comparisons between conditions. You can, however, analyse experiments with more than two conditions by either:

  • using an isolated design, where each condition is analysed separately, or
  • performing all pairwise comparisons among the conditions of interest.

These approaches allow you to explore multiple conditions, but they do not integrate them into a single joint model. Integrated designs are currently limited to comparisons between two conditions at a time.
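Enumerating the pairwise comparisons is straightforward; with n conditions there are n(n - 1)/2 of them, so the number of separate analyses grows quickly. The Python sketch below only plans the comparisons; the condition names are hypothetical, and the actual analysis of each pair is left as a placeholder rather than a call into SplineOmics.

```python
from itertools import combinations

def pairwise_comparisons(conditions):
    """All unordered pairs of conditions: n * (n - 1) / 2 two-condition analyses."""
    return list(combinations(conditions, 2))

# Hypothetical condition names
conditions = ["control", "drug_A", "drug_B", "drug_C"]
pairs = pairwise_comparisons(conditions)
for a, b in pairs:
    # Placeholder: each pair would be analysed as its own two-condition
    # integrated design (subset the data to these two conditions first).
    print(f"{a} vs {b}")
print(len(pairs))  # 6
```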

Session Info

## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_AT.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_AT.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_AT.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Vienna
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37       desc_1.4.3          R6_2.6.1           
##  [4] fastmap_1.2.0       xfun_0.52           cachem_1.1.0       
##  [7] knitr_1.50          htmltools_0.5.8.1   rmarkdown_2.29     
## [10] lifecycle_1.0.4     cli_3.6.5           sass_0.4.10        
## [13] pkgdown_2.1.3       textshaping_1.0.1   jquerylib_0.1.4    
## [16] renv_1.1.5          systemfonts_1.2.3   compiler_4.5.1     
## [19] rstudioapi_0.17.1   tools_4.5.1         ragg_1.4.0         
## [22] bslib_0.9.0         evaluate_1.0.4      yaml_2.3.10        
## [25] BiocManager_1.30.26 jsonlite_2.0.0      htmlwidgets_1.6.4  
## [28] rlang_1.1.6         fs_1.6.6