Skip to contents

This function extracts a rectangular block from a dataframe using user-specified top/bottom row indices and left/right column identifiers (numeric or Excel-style letters). It ensures the block contains only numeric values or NAs, and returns a cleaned matrix.

Usage

extract_data(
  data,
  bottom_row,
  right_col,
  top_row = 1,
  left_col = 1,
  feature_name_columns = NA,
  use_row_index = FALSE
)

Arguments

data

A dataframe containing the full input, including annotation columns and the numeric block to extract.

bottom_row

Integer. Specifies the last (bottom) row of the numeric data block. Must be >= top_row.

right_col

Same format as left_col. Specifies the right-most column of the numeric block. Must be >= left_col after conversion.

top_row

Integer. Specifies the first (top) row of the numeric data block. Row indexing is 1-based.

left_col

Column specifier for the left-most column of the data block. Can be either:

  • An integer index (e.g., 2 for the second column), or

  • A character string using Excel-style letters (e.g., "A", "AB").

Column names (e.g., "age") are not allowed here. Only letters or numeric indices are accepted.

feature_name_columns

Optional character vector specifying columns in data to be used as row (feature) names in the output. If NA, generic feature names are used.

use_row_index

Logical. If TRUE, prepend the row index to each combined feature name to ensure uniqueness. Defaults to FALSE.

Value

A numeric matrix with cleaned values and appropriate column names.

Examples

# Tiny demo table with two header rows, feature columns, and numeric block
df <- data.frame(
  feat_id = c(NA, NA, "g1", "g2", "g3"),
  feat_sym = c(NA, NA, "TP53", "EGFR", "BAX"),
  A = c("cond", "t0", 1, 2, 3),
  B = c("cond", "t1", 4, 5, 6),
  C = c("ctrl", "t0", 7, 8, 9),
  D = c("ctrl", "t1", 10, 11, 12),
  check.names = FALSE
)

# Example 1: extract numeric block using Excel letters, build headers from
# the two rows above (they get collapsed like "cond_t0", "ctrl_t1", ...)
m1 <- extract_data(
  data = df,
  top_row = 3,
  bottom_row = 5,
  left_col = "A",
  right_col = "D",
  feature_name_columns = c("feat_id", "feat_sym"),
  use_row_index = FALSE
)
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
m1
#>         X1 X2 cond_t0 cond_t1
#> g1_TP53 NA NA       1       4
#> g2_EGFR NA NA       2       5
#> g3_BAX  NA NA       3       6

# Example 2: same extraction but with numeric column indices and row index
# prepended to ensure uniqueness of feature names
m2 <- extract_data(
  data = df,
  top_row = 3,
  bottom_row = 5,
  left_col = 3,
  right_col = 6,
  feature_name_columns = c("feat_id", "feat_sym"),
  use_row_index = TRUE
)
m2
#>           cond_t0 cond_t1 ctrl_t0 ctrl_t1
#> 1_g1_TP53       1       4       7      10
#> 2_g2_EGFR       2       5       8      11
#> 3_g3_BAX        3       6       9      12