| Title: | Estimate Survival Data with Data Integration |
|---|---|
| Description: | Provides flexible and efficient tools for integrating external risk scores into Cox proportional hazards models while accounting for population heterogeneity. Enables robust estimation, improved predictive accuracy, and user-friendly workflows for modern survival analysis. For more information, see Wang et al. (2023) <doi:10.48550/arXiv.2302.11123>. |
| Authors: | Yubo Shao [aut, cre], Lingfeng Luo [aut], Xiaohan Liu [aut], Junyi Qiu [aut], Di Wang [aut], Kevin He [aut] |
| Maintainer: | Yubo Shao <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-06-06 05:23:36 UTC |
| Source: | https://github.com/um-kevinhe/survkl |
Computes individual survival probabilities from a fitted linear predictor
z %*% beta using a stratified Breslow-type baseline hazard estimate.
cal_surv_prob(z, delta, time, beta, stratum)

| Argument | Description |
|---|---|
| z | A numeric matrix (or data frame coercible to matrix) of covariates. Each row is an observation and each column a predictor. |
| delta | A numeric vector of event indicators (1 = event, 0 = censored). |
| time | A numeric vector of observed times (event or censoring). |
| beta | A numeric vector of regression coefficients with length equal to the number of columns in |
| stratum | An optional vector specifying the stratum for each observation. If missing, a single-stratum model is assumed. |

Inputs are internally sorted by stratum and time. Within each
stratum, a baseline hazard increment is computed as delta/S0, where
S0 is the risk set sum returned by ddloglik_S0. The stratified
baseline cumulative hazard Lambda0 is then formed by a cumulative sum
within stratum, and individual survival curves are computed as
S(t) = exp(-Lambda0(t) * exp(z %*% beta)).
A numeric matrix of survival probabilities with nrow(z) rows and
length(time) columns. Rows correspond to observations; columns are in
the internal sorted order of (stratum, time) (i.e., not collapsed to
unique event times). Entry S[i, j] is the estimated survival
probability for subject i evaluated at the j-th sorted time
point.
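The computation described above can be sketched in base R for a single stratum. This is a minimal illustration on made-up toy data with no tied times; it does not call the package's internal `ddloglik_S0`, and the variable names are assumptions chosen for readability.

```r
# Toy data: 5 subjects, 2 covariates (all values are made up for illustration).
z     <- matrix(c(0.5, -1.0, 0.3,  1.2, -0.7,
                  1.0,  0.0, 0.5, -0.5,  0.2), ncol = 2)
time  <- c(4, 2, 5, 1, 3)
delta <- c(1, 0, 1, 1, 0)
beta  <- c(0.4, -0.2)

# Relative risks exp(z %*% beta), in the original observation order.
risk <- as.vector(exp(z %*% beta))

# Sort by time (a single stratum is assumed in this sketch).
ord <- order(time)

# Risk-set sums: S0[j] is the sum of relative risks over subjects still at
# risk at the j-th sorted time (a reverse cumulative sum).
S0 <- rev(cumsum(rev(risk[ord])))

# Breslow-type hazard increments delta / S0, accumulated over sorted times.
Lambda0 <- cumsum(delta[ord] / S0)

# S[i, j] = exp(-Lambda0[j] * exp(z_i' beta)); rows follow the original
# subject order, columns follow the sorted time order.
S <- exp(-outer(risk, Lambda0))
dim(S)
```

Each row of `S` is non-increasing from left to right because `Lambda0` is a cumulative sum of non-negative increments.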
Extracts the estimated regression coefficients (beta) from a fitted
coxkl object. Optionally, a value (or vector) of eta can be
supplied. If the requested eta values are not in the fitted sequence,
linear interpolation is performed between the nearest neighboring eta
values; out-of-range requests raise an error.
## S3 method for class 'coxkl'
coef(object, eta = NULL, ...)

| Argument | Description |
|---|---|
| object | An object of class |
| eta | Optional numeric value or vector specifying the |
| ... | Additional arguments (currently ignored). |

A numeric matrix of regression coefficients.
Each column corresponds to one value of eta, sorted in ascending order.
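The interpolation rule described above can be illustrated with base R's `approx()`. This is a sketch under assumptions: `eta_grid`, `beta_mat`, and `interp_coef()` are hypothetical names with made-up values, and the code is not the package's `coef` method.

```r
# Hypothetical fitted grid and coefficient matrix (3 predictors x 3 eta values).
eta_grid <- c(0, 1, 4)
beta_mat <- cbind(c(0.10, -0.30, 0.00),
                  c(0.20, -0.10, 0.40),
                  c(0.50,  0.20, 1.00))

# Linearly interpolate every coefficient at the requested eta values.
interp_coef <- function(eta_new, eta_grid, beta_mat) {
  if (any(eta_new < min(eta_grid) | eta_new > max(eta_grid))) {
    stop("requested eta is outside the fitted range")
  }
  eta_new <- sort(eta_new)  # columns are returned in ascending eta order
  out <- apply(beta_mat, 1, function(b) approx(eta_grid, b, xout = eta_new)$y)
  # apply() gives length(eta_new) x p (or a plain vector when length is 1);
  # reshape and transpose to p x length(eta_new).
  t(matrix(out, nrow = length(eta_new)))
}

res <- interp_coef(c(2, 0.5), eta_grid, beta_mat)
res
```

Requesting a value outside `[0, 4]`, for example `interp_coef(5, eta_grid, beta_mat)`, stops with an error, mirroring the out-of-range behavior described above.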
data(ExampleData_lowdim)
train_dat_lowdim <- ExampleData_lowdim$train
beta_external_good_lowdim <- ExampleData_lowdim$beta_external_good
eta_list <- generate_eta(method = "exponential", n = 5, max_eta = 5)
model <- coxkl(z = train_dat_lowdim$z,
               delta = train_dat_lowdim$status,
               time = train_dat_lowdim$time,
               stratum = train_dat_lowdim$stratum,
               beta = beta_external_good_lowdim,
               etas = eta_list)
coef(model)
Extracts the estimated regression coefficients (beta) from a fitted
coxkl_enet object. Optionally, one or more lambda values can be
supplied. If requested lambda values are not in the fitted sequence,
linear interpolation is performed between nearest neighbors; out-of-range
requests raise an error.
## S3 method for class 'coxkl_enet'
coef(object, lambda = NULL, ...)

| Argument | Description |
|---|---|
| object | An object of class |
| lambda | Optional numeric value or vector specifying the regularization parameter(s) for which to extract (or interpolate) coefficients. If |
| ... | Additional arguments (currently ignored). |

A numeric matrix of regression coefficients; each column corresponds to one
value of lambda, sorted in descending order.
data(ExampleData_highdim)
train_dat_highdim <- ExampleData_highdim$train
beta_external_highdim <- ExampleData_highdim$beta_external
enet_model <- coxkl_enet(z = train_dat_highdim$z,
                         delta = train_dat_highdim$status,
                         time = train_dat_highdim$time,
                         beta = beta_external_highdim,
                         eta = 1,
                         alpha = 1.0)
coef(enet_model)[1:5, 1:10]
Extracts the estimated regression coefficients (beta) from a fitted
coxkl_ridge object. Optionally, one or more lambda values can be
supplied. If requested lambda values are not in the fitted sequence,
linear interpolation is performed between nearest neighbors; out-of-range
requests raise an error.
## S3 method for class 'coxkl_ridge'
coef(object, lambda = NULL, ...)

| Argument | Description |
|---|---|
| object | An object of class |
| lambda | Optional numeric value or vector specifying the regularization parameter(s) for which to extract (or interpolate) coefficients. If |
| ... | Additional arguments (currently ignored). |

A numeric matrix of regression coefficients.
Each column corresponds to one value of lambda, sorted in descending order.
data(ExampleData_highdim)
train_dat_highdim <- ExampleData_highdim$train
beta_external_highdim <- ExampleData_highdim$beta_external
model_ridge <- coxkl_ridge(z = train_dat_highdim$z,
                           delta = train_dat_highdim$status,
                           time = train_dat_highdim$time,
                           beta = beta_external_highdim,
                           eta = 1)
coef(model_ridge)[1:5, 1:10]
Fits a Cox proportional hazards model that incorporates external information
via a Kullback–Leibler (KL) divergence penalty. External information can be
supplied either as external risk scores (RS) or as external coefficients
(beta). The tuning parameter(s) etas control the strength of integration.
coxkl(
  z, delta, time, stratum = NULL, RS = NULL, beta = NULL, etas,
  tol = 1e-04, Mstop = 100, backtrack = FALSE, message = FALSE,
  data_sorted = FALSE, beta_initial = NULL
)

| Argument | Description |
|---|---|
| z | Numeric matrix of covariates with rows representing observations and columns representing predictor variables. All covariates must be numeric. |
| delta | Numeric vector of event indicators (1 = event, 0 = censored). |
| time | Numeric vector of observed event or censoring times. No sorting required. |
| stratum | Optional numeric or factor vector defining strata. |
| RS | Optional numeric vector or matrix of external risk scores. Length (or number of rows) must equal the number of observations. If not supplied, |
| beta | Optional numeric vector of external coefficients (e.g., from prior studies). Length must equal the number of columns in |
| etas | Numeric vector of tuning parameters controlling the reliance on external information. Larger values place more weight on the external source. |
| tol | Convergence tolerance for the optimization algorithm. Default is |
| Mstop | Maximum number of iterations for the optimization algorithm. Default is |
| backtrack | Logical; if |
| message | Logical; if |
| data_sorted | Logical; if |
| beta_initial | Optional numeric vector of length |

If beta is supplied (length ncol(z)), external risk scores are computed
internally as RS = z %*% beta. If RS is supplied, it is used directly.
When data_sorted = FALSE, data are sorted by stratum (treated as a single
stratum if stratum is NULL) and by increasing time. Estimation proceeds over the
sorted data, and the returned linear.predictors are mapped back to the
original order. Optimization uses warm starts across the (ascending) etas
grid and supports backtracking line search when backtrack = TRUE.
Internally, the routine computes a stratum-wise adjusted event indicator
(delta_tilde) and maximizes a KL-regularized partial likelihood. The current
implementation fixes lambda = 0 in the low-level optimizer and exposes
etas as the primary tuning control.
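The risk-score construction and the sort-then-restore bookkeeping described above can be sketched in base R. This is an illustration on simulated toy data, not the package's internal code; `beta_ext` and the other names are assumptions, and the fitting step is replaced by a placeholder.

```r
set.seed(1)
n        <- 6
z        <- matrix(rnorm(n * 2), ncol = 2)
time     <- c(5, 3, 8, 1, 4, 2)
stratum  <- c(2, 1, 2, 1, 1, 2)
beta_ext <- c(0.5, -0.25)   # illustrative external coefficients

# External risk score implied by the external coefficients: RS = z %*% beta.
RS <- as.vector(z %*% beta_ext)

# Sort by stratum, then by increasing time within stratum.
ord       <- order(stratum, time)
RS_sorted <- RS[ord]

# Placeholder for fitting on the sorted data: the sorted risk score stands in
# for a fitted linear predictor here.
lp_sorted <- RS_sorted

# Map the result back to the original observation order.
lp_original      <- numeric(n)
lp_original[ord] <- lp_sorted
```

Assigning through `lp_original[ord] <- lp_sorted` inverts the permutation without computing `order(ord)` explicitly.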
An object of class "coxkl" containing:
eta: the fitted sequence.
beta: estimated coefficient matrix ().
linear.predictors: matrix of linear predictors.
likelihood: vector of partial likelihoods.
data: a list containing the input data used in fitting
(z, time, delta, stratum, data_sorted).
data(ExampleData_lowdim)
train_dat_lowdim <- ExampleData_lowdim$train
beta_external_good_lowdim <- ExampleData_lowdim$beta_external_good
eta_list <- generate_eta(method = "exponential", n = 10, max_eta = 5)
model <- coxkl(z = train_dat_lowdim$z,
               delta = train_dat_lowdim$status,
               time = train_dat_lowdim$time,
               stratum = train_dat_lowdim$stratum,
               beta = beta_external_good_lowdim,
               etas = eta_list)
Fits a Cox proportional hazards model that incorporates external information
using Kullback–Leibler (KL) divergence, with an optional L1 (Lasso) or elastic net penalty on
the coefficients. External information can be supplied either as precomputed external
risk scores (RS) or as externally derived coefficients (beta). The integration
strength is controlled by the tuning parameter eta.
coxkl_enet(
  z, delta, time, stratum = NULL, RS = NULL, beta = NULL,
  eta = NULL, alpha = NULL, lambda = NULL, nlambda = 100,
  lambda.min.ratio = ifelse(n < p, 0.05, 0.001),
  lambda.early.stop = FALSE, tol = 1e-04, Mstop = 1000,
  max.total.iter = (Mstop * nlambda),
  group = 1:ncol(z), group.multiplier = NULL, standardize = TRUE,
  nvar.max = ncol(z), group.max = length(unique(group)),
  stop.loss.ratio = 0.001, actSet = TRUE, actIter = Mstop,
  actGroupNum = sum(unique(group) != 0), actSetRemove = FALSE,
  returnX = FALSE, trace.lambda = FALSE, message = FALSE,
  data_sorted = FALSE, ...
)

| Argument | Description |
|---|---|
| z | Numeric matrix of covariates with rows representing observations and columns representing predictor variables. All covariates must be numeric. |
| delta | Numeric vector of event indicators (1 = event, 0 = censored). |
| time | Numeric vector of observed event or censoring times. No sorting required. |
| stratum | Optional numeric or factor vector defining strata. |
| RS | Optional numeric vector or matrix of external risk scores. Length (or number of rows) must equal the number of observations. If not supplied, |
| beta | Optional numeric vector of external coefficients (e.g., from prior studies). Length must equal the number of columns in |
| eta | Numeric tuning parameter controlling the reliance on external information. Larger values place more weight on the external source. |
| alpha | Elastic-net mixing parameter in |
| lambda | Optional nonnegative penalty parameter(s). If a numeric vector is supplied, the path is taken as-is. If |
| nlambda | Integer number of lambda values to generate when |
| lambda.min.ratio | Ratio of the smallest to the largest lambda when generating a sequence (when |
| lambda.early.stop | Logical; if |
| tol | Convergence tolerance for the optimization algorithm. Default is |
| Mstop | Maximum number of iterations for the inner optimization at a given lambda. Default is |
| max.total.iter | Maximum total iterations across the entire lambda path. Default is |
| group | Integer vector of group indices defining group membership of predictors for grouped penalties; use |
| group.multiplier | A vector of multiplicative factors applied to each covariate's penalty. Default is a vector of 1's. |
| standardize | Logical; if |
| nvar.max | Integer cap on the number of active variables allowed during fitting. Defaults to the number of predictors. |
| group.max | Integer cap on the number of active groups allowed during fitting. Defaults to the total number of groups. |
| stop.loss.ratio | Relative improvement threshold for early stopping along the path; optimization may stop if objective gain falls below this value. Default |
| actSet | Logical; if |
| actIter | Maximum number of active-set refinement iterations per lambda. Default |
| actGroupNum | Maximum number of active groups allowed under the active-set scheme. |
| actSetRemove | Logical; if |
| returnX | Logical; if |
| trace.lambda | Logical; if |
| message | Logical; if |
| data_sorted | Logical; if |
| ... | Additional arguments. |

Setting lambda = 0 reduces to the unpenalized coxkl model.
When lambda > 0, the model fits a KL-regularized Cox objective with an
elastic-net penalty governed by alpha, where alpha = 1 gives lasso and
0 < alpha < 1 gives elastic net. Grouped
penalties are supported via group (use 0 for unpenalized variables), with optional
per-group scaling through group.multiplier. If lambda is NULL, a decreasing path
of length nlambda is generated using lambda.min.ratio; early stopping can prune the
path (lambda.early.stop, stop.loss.ratio). When standardize = TRUE, predictors are
standardized for fitting and coefficients are rescaled on output. If data_sorted = FALSE,
data are sorted by stratum then time for optimization and predictions are returned in
the original order (reported via W = exp(linear predictors)). An active-set scheme
(actSet, actIter, nvar.max, group.max, actGroupNum, actSetRemove) is used to
accelerate the solution along the lambda path.
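To make the role of alpha concrete, the sketch below evaluates an elastic-net penalty in the common glmnet-style parameterization. That parameterization is an assumption made here for illustration; the package's exact scaling, and its treatment of groups and group.multiplier, are not shown in this section. The function name `enet_penalty()` is hypothetical.

```r
# Assumed glmnet-style elastic-net penalty (illustration only):
#   lambda * ( alpha * sum(|beta|) + (1 - alpha) / 2 * sum(beta^2) )
enet_penalty <- function(beta, lambda, alpha) {
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

b <- c(0.5, -1, 0)

p_lasso <- enet_penalty(b, lambda = 0.1, alpha = 1)    # L1 term only
p_mix   <- enet_penalty(b, lambda = 0.1, alpha = 0.5)  # L1 and L2 mixed
p_none  <- enet_penalty(b, lambda = 0,   alpha = 1)    # no penalty at lambda = 0
c(p_lasso, p_mix, p_none)
```

With lambda = 0 the penalty vanishes for any alpha, which is consistent with the statement that lambda = 0 reduces to the unpenalized model.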
An object of class "coxkl_enet", a list with components:
beta: Coefficient estimates (vector or matrix across the path).
group: A factor of the original group assignments.
lambda: The lambda value(s) used or generated.
alpha: The elastic-net mixing parameter used.
likelihood: Vector of log-partial likelihoods for each lambda.
n: Number of observations.
df: Effective degrees of freedom (e.g., number of nonzero coefficients or group-adjusted count) along the path.
iter: Number of iterations taken (per lambda and/or total).
W: Exponentiated linear predictors on the original scale.
group.multiplier: Group-specific penalty multipliers used.
returnX: Only when returnX = TRUE: a list with elements
XX (standardization/orthogonalization info from std.Z),
time, delta, stratum, and RS.
data(ExampleData_highdim)
train_dat_highdim <- ExampleData_highdim$train
beta_external_highdim <- ExampleData_highdim$beta_external
model_enet <- coxkl_enet(z = train_dat_highdim$z,
                         delta = train_dat_highdim$status,
                         time = train_dat_highdim$time,
                         beta = beta_external_highdim,
                         eta = 0,
                         alpha = 1.0)
Fits a Cox proportional hazards model using a ridge-type penalty (L2) on all covariates.
The model can integrate external information either as precomputed risk scores (RS)
or externally supplied coefficients (beta). A tuning parameter eta controls the
relative weight of the external information. If lambda is not provided, a lambda
sequence is automatically generated.
coxkl_ridge(
  z, delta, time, stratum = NULL, RS = NULL, beta = NULL,
  eta = NULL, lambda = NULL, nlambda = 100,
  lambda.min.ratio = ifelse(n_obs < n_vars, 0.01, 1e-04),
  penalty.factor = 0.999, tol = 1e-04, Mstop = 50,
  backtrack = FALSE, message = FALSE, data_sorted = FALSE,
  beta_initial = NULL, ...
)

| Argument | Description |
|---|---|
| z | Numeric matrix of covariates (observations in rows, predictors in columns). |
| delta | Numeric vector of event indicators (1 = event, 0 = censored). |
| time | Numeric vector of observed times. |
| stratum | Optional numeric or factor vector specifying strata. |
| RS | Optional numeric vector or matrix of external risk scores. |
| beta | Optional numeric vector of externally derived coefficients. |
| eta | Non-negative scalar controlling the strength of external information. |
| lambda | Optional numeric scalar or vector of penalty parameters. If |
| nlambda | Number of lambda values to generate if |
| lambda.min.ratio | Ratio defining the minimum lambda relative to |
| penalty.factor | Numeric scalar in |
| tol | Convergence tolerance for the iterative estimation algorithm. |
| Mstop | Maximum number of iterations for estimation. |
| backtrack | Logical; if |
| message | Logical; if |
| data_sorted | Logical; if |
| beta_initial | Optional; default NULL. When NULL, the algorithm initializes beta_initial to a zero vector as a warm start. |
| ... | Additional arguments. |

The estimator maximizes a KL-regularized Cox partial log-likelihood with a ridge (L2) penalty on all coefficients.
External information is incorporated via a KL term weighted by eta: if beta is supplied (length ncol(z)),
external risk scores are computed internally as RS = z %*% beta; otherwise RS must be provided.
If lambda is NULL, a decreasing lambda path of length nlambda is generated using lambda.min.ratio
(its overall scale is influenced by penalty.factor). Optimization proceeds along the lambda path with warm starts
(re-using the previous solution as beta_initial); when beta_initial = NULL, the first step uses zeros.
If data_sorted = FALSE, data are sorted by stratum and time for fitting and the returned linear predictors are
mapped back to the original observation order. tol, Mstop, and backtrack control convergence and line search.
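A decreasing lambda path of the kind described above is commonly built as a log-spaced grid between a largest value and `lambda.min.ratio` times that value. The sketch below assumes this construction; `lambda_max` and `make_lambda_path()` are hypothetical, and how the package derives its largest lambda (including the effect of penalty.factor) is not shown in this section.

```r
# Log-spaced decreasing path from lambda_max down to lambda_max * lambda.min.ratio.
# lambda_max is a hypothetical input chosen by the caller in this sketch.
make_lambda_path <- function(lambda_max, nlambda = 100, lambda.min.ratio = 1e-4) {
  exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
          length.out = nlambda))
}

path <- make_lambda_path(lambda_max = 2, nlambda = 5, lambda.min.ratio = 0.01)
path
```

Because consecutive values differ by a constant ratio, neighboring solutions along the path tend to be close, which is what makes warm starts effective.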
An object of class "coxkl_ridge" containing:
lambda: The lambda sequence used for estimation.
beta: Matrix of estimated coefficients for each lambda.
linear.predictors: Matrix of linear predictors.
likelihood: Vector of log-partial likelihoods.
data: A list containing the input data used in fitting
(z, time, delta, stratum, data_sorted).
data(ExampleData_highdim)
train_dat_highdim <- ExampleData_highdim$train
beta_external_highdim <- ExampleData_highdim$beta_external
model_ridge <- coxkl_ridge(z = train_dat_highdim$z,
                           delta = train_dat_highdim$status,
                           time = train_dat_highdim$time,
                           beta = beta_external_highdim)
Performs K-fold cross-validation to select the integration parameter eta
for the Cox–KL model. Each fold fits the model on a training split and
evaluates on the held-out split using the specified performance criterion.
cv.coxkl(
  z, delta, time, stratum = NULL, RS = NULL, beta = NULL, etas = NULL,
  tol = 1e-04, Mstop = 100, backtrack = FALSE, nfolds = 5,
  criteria = c("V&VH", "LinPred", "CIndex_pooled", "CIndex_foldaverage"),
  c_index_stratum = NULL, message = FALSE, seed = NULL, ...
)

| Argument | Description |
|---|---|
| z | Numeric matrix of covariates (rows = observations, columns = variables). |
| delta | Numeric vector of event indicators (1 = event, 0 = censored). |
| time | Numeric vector of observed event or censoring times. |
| stratum | Optional numeric or factor vector defining strata. If |
| RS | Optional numeric vector or matrix of external risk scores. If omitted, |
| beta | Optional numeric vector of external coefficients. If omitted, |
| etas | Numeric vector of candidate tuning values to be cross-validated (required). Values are internally sorted in ascending order. |
| tol | Convergence tolerance for the optimizer used inside |
| Mstop | Maximum number of Newton iterations used inside |
| backtrack | Logical; if |
| nfolds | Number of cross-validation folds. Default |
| criteria | Character string specifying the performance criterion. Choices are |
| c_index_stratum | Optional stratum vector. Only required when |
| message | Logical; if |
| seed | Optional integer seed for reproducible fold assignment. Default |
| ... | Additional arguments passed to |

External information is required: supply either RS or beta (if beta is given,
RS is computed as z %*% beta). Folds are created with stratification by
stratum and censoring status. Within each fold and each candidate eta,
the function fits coxkl on the training split with warm-starts initialized to zero
and evaluates on the test split:
"V&VH": uses the difference of partial log-likelihoods between full and
training fits; reported as -2 times the aggregated quantity.
"LinPred": aggregates the test-split linear predictors across folds and
evaluates -2 times the partial log-likelihood on the full data.
"CIndex_pooled": pools pairwise comparable counts across folds (numerator/denominator).
"CIndex_foldaverage": averages the per-fold stratified C-index.
The function also computes an external baseline statistic from RS using the
same criterion for comparison.
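The difference between the two C-index criteria can be seen on a small example. The sketch below uses a simplified, unstratified Harrell-type pair count on made-up fold data; `cindex_counts()` is a hypothetical helper written for this illustration and is not the package's implementation.

```r
# Count concordant and comparable pairs within one fold. A pair (i, j) is
# comparable when subject i has an event and an earlier time than subject j.
# Ties in the linear predictor count as 1/2.
cindex_counts <- function(time, delta, lp) {
  num <- 0
  den <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    if (delta[i] != 1) next
    for (j in seq_len(n)) {
      if (time[i] < time[j]) {
        den <- den + 1
        if (lp[i] > lp[j]) {
          num <- num + 1
        } else if (lp[i] == lp[j]) {
          num <- num + 0.5
        }
      }
    }
  }
  c(num = num, den = den)
}

fold1 <- cindex_counts(time = c(1, 2, 3), delta = c(1, 1, 0),
                       lp = c(2.0, 1.0, 0.5))
fold2 <- cindex_counts(time = c(1, 2, 3, 4), delta = c(1, 0, 1, 1),
                       lp = c(0.1, 0.9, 0.5, 0.3))

# Pooled: add numerators and denominators across folds, then divide once.
pooled <- (fold1[["num"]] + fold2[["num"]]) / (fold1[["den"]] + fold2[["den"]])

# Fold average: compute the C-index per fold, then average.
foldavg <- mean(c(fold1[["num"]] / fold1[["den"]],
                  fold2[["num"]] / fold2[["den"]]))
c(pooled = pooled, foldavg = foldavg)
```

Pooling weights each fold by its number of comparable pairs, whereas fold averaging weights every fold equally, so the two summaries differ whenever folds contribute different pair counts.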
An object of class "cv.coxkl" with components:
internal_stat: A data.frame with one row per eta containing eta and the
cross-validated measure named according to criteria (one of
VVH_Loss, LinPred_Loss, CIndex_pooled, CIndex_foldaverage).
external_stat: Scalar baseline statistic computed from RS under the same criteria.
criteria: The evaluation criterion used.
nfolds: Number of folds.
data(ExampleData_lowdim)
train_dat_lowdim <- ExampleData_lowdim$train
beta_external_good_lowdim <- ExampleData_lowdim$beta_external_good
etas <- generate_eta(method = "exponential", n = 10, max_eta = 5)
cv_res <- cv.coxkl(z = train_dat_lowdim$z,
                   delta = train_dat_lowdim$status,
                   time = train_dat_lowdim$time,
                   beta = beta_external_good_lowdim,
                   etas = etas)
This function performs cross-validation on the high-dimensional Cox model with
Kullback–Leibler (KL) penalty.
It tunes the parameter eta (external information weight) using user-specified
cross-validation criteria, while also evaluating a lambda path
(either provided or generated) and selecting the best lambda per eta.
cv.coxkl_enet(
  z, delta, time, stratum = NULL, RS = NULL, beta = NULL, etas,
  alpha = 1, lambda = NULL, nlambda = 100,
  lambda.min.ratio = ifelse(n < p, 0.05, 0.001), nfolds = 5,
  cv.criteria = c("V&VH", "LinPred", "CIndex_pooled", "CIndex_foldaverage"),
  c_index_stratum = NULL, message = FALSE, seed = NULL, ...
)

| Argument | Description |
|---|---|
| z | Numeric matrix of covariates with rows representing individuals and columns representing predictors. |
| delta | Numeric vector of event indicators (1 = event, 0 = censored). |
| time | Numeric vector of observed times (event or censoring). |
| stratum | Optional factor or numeric vector indicating strata. |
| RS | Optional numeric vector or matrix of external risk scores. If not provided, |
| beta | Optional numeric vector of external coefficients (length equal to |
| etas | Numeric vector of candidate |
| alpha | Elastic-net mixing parameter in |
| lambda | Optional numeric scalar or vector of penalty parameters. If |
| nlambda | Integer number of lambda values to generate when |
| lambda.min.ratio | Ratio of the smallest to the largest lambda when generating a sequence (when |
| nfolds | Integer; number of cross-validation folds. Default = |
| cv.criteria | Character string specifying the cross-validation criterion. Choices are: |
| c_index_stratum | Optional stratum vector. Used only when |
| message | Logical; whether to print progress messages. Default = |
| seed | Optional integer random seed for fold assignment. |
| ... | Additional arguments passed to |

Data are sorted by stratum and time. External info must be from RS or
beta (if beta given with length ncol(z), RS = z %*% beta); alpha .
For each candidate eta, a decreasing lambda path is used (generated from nlambda/lambda.min.ratio
if lambda = NULL); CV folds are created by get_fold. Each fold fits coxkl_enet
on the training split (full lambda path) and evaluates the chosen criterion on the test split.
Aggregation follows the code paths for "V&VH", "LinPred", "CIndex_pooled", or "CIndex_foldaverage":
"V&VH": sums pl(full) - pl(train) across folds (reported as loss via Loss = -2 * score).
"LinPred": aggregates test-fold linear predictors and evaluates partial log-likelihood on full data (reported as Loss = -2 * score).
"CIndex_pooled": pools comparable-pair numerators/denominators across folds to compute one C-index.
"CIndex_foldaverage": averages the per-fold stratified C-index.
The best lambda is selected per eta (min loss / max C-index), and the function returns full results,
the per-eta optimum, corresponding coefficients, and an external baseline from RS.
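The per-eta selection step can be sketched on a small results table. The data frame below is made up and only mimics the described columns (eta, lambda, and a loss); it is an assumption about the layout, not output from `cv.coxkl_enet`.

```r
# Made-up CV results: 2 eta values x 3 lambda values, with a loss column.
full_results <- data.frame(
  eta    = rep(c(0, 1), each = 3),
  lambda = rep(c(0.5, 0.1, 0.02), times = 2),
  Loss   = c(210, 204, 207, 205, 201, 199)
)

# For a loss criterion, keep the row with the smallest loss within each eta.
# (For a C-index criterion, which.max() on the C-index column would be used.)
best_per_eta <- do.call(
  rbind,
  lapply(split(full_results, full_results$eta), function(d) {
    d[which.min(d$Loss), , drop = FALSE]
  })
)
rownames(best_per_eta) <- NULL
best_per_eta
```

The selected lambda can differ across eta values, which is why the best lambda is reported per eta rather than once for the whole grid.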
An object of class "cv.coxkl_enet":
integrated_stat.full_results: Data frame with columns
eta, lambda, and the aggregated CV score for each lambda
under the chosen cv.criteria. For loss criteria, an additional
column with the transformed loss (Loss = -2 * score); for C-index criteria,
a column named CIndex_pooled or CIndex_foldaverage.
integrated_stat.best_per_eta: Data frame with the best
lambda (per eta) according to the chosen cv.criteria
(minimizing loss or maximizing C-index).
integrated_stat.betahat_best: Matrix of coefficient vectors
(columns) corresponding to the best lambda for each eta.
external_stat: Scalar baseline statistic computed from the external
risk score RS under the same cv.criteria.
criteria: The evaluation criterion used (as provided in cv.criteria).
alpha: The elastic-net mixing parameter used.
nfolds: Number of folds.
data(ExampleData_highdim)
train_dat_highdim <- ExampleData_highdim$train
beta_external_highdim <- ExampleData_highdim$beta_external
etas <- generate_eta(method = "exponential", n = 10, max_eta = 100)
cv_res <- cv.coxkl_enet(z = train_dat_highdim$z,
                        delta = train_dat_highdim$status,
                        time = train_dat_highdim$time,
                        stratum = NULL,
                        RS = NULL,
                        beta = beta_external_highdim,
                        etas = etas,
                        alpha = 1.0)
This function performs cross-validation on the Cox model with Kullback–Leibler (KL)
penalty and ridge (L2) regularization. It tunes the parameter eta
(external information weight) using user-specified cross-validation criteria,
while internally evaluating a lambda path (provided or generated) and
selecting the best lambda per eta.
cv.coxkl_ridge(
  z, delta, time, stratum = NULL, RS = NULL, beta = NULL, etas,
  lambda = NULL, nlambda = 100,
  lambda.min.ratio = ifelse(n_obs < n_vars, 0.01, 1e-04), nfolds = 5,
  cv.criteria = c("V&VH", "LinPred", "CIndex_pooled", "CIndex_foldaverage"),
  c_index_stratum = NULL, message = FALSE, seed = NULL, ...
)

| Argument | Description |
|---|---|
| z | Numeric matrix of covariates with rows representing individuals and columns representing predictors. |
| delta | Numeric vector of event indicators (1 = event, 0 = censored). |
| time | Numeric vector of observed times (event or censoring). |
| stratum | Optional factor or numeric vector indicating strata. |
| RS | Optional numeric vector or matrix of external risk scores. If not provided, |
| beta | Optional numeric vector of external coefficients (length equal to |
| etas | Numeric vector of candidate |
| lambda | Optional numeric scalar or vector of penalty parameters. If |
| nlambda | Integer number of lambda values to generate when |
| lambda.min.ratio | Ratio of the smallest to the largest lambda when generating a sequence (when |
| nfolds | Integer; number of cross-validation folds. Default |
| cv.criteria | Character string specifying the cross-validation criterion. Choices are: |
| c_index_stratum | Optional stratum vector. Used only when |
| message | Logical; whether to print progress messages. Default |
| seed | Optional integer random seed for fold assignment. |
| ... | Additional arguments passed to |

Data are sorted by stratum and time. External information must be given via RS
or beta (if beta has length ncol(z), the function computes RS = z %*% beta).
For each candidate eta, a lambda path is determined (generated if lambda = NULL, otherwise
the supplied lambda values are sorted decreasingly). Cross-validation folds are created by get_fold.
In each fold, coxkl_ridge is fit on the training split across the full lambda path
with data_sorted = TRUE, and the chosen criterion is evaluated on the test split and aggregated:
"V&VH": sums pl(full) - pl(train) across folds (reported as loss via Loss = -2 * score).
"LinPred": aggregates test-fold linear predictors and evaluates partial log-likelihood on full data (reported as Loss = -2 * score).
"CIndex_pooled": pools comparable-pair numerators/denominators across folds to compute one C-index.
"CIndex_foldaverage": averages the per-fold stratified C-index.
The best lambda is chosen per eta (minimizing loss or maximizing C-index). The function
also computes an external baseline statistic from RS under the same criterion.
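Fold creation that balances strata and censoring status can be sketched as follows. `assign_folds()` is a hypothetical stand-in written for this illustration; it is not the package's `get_fold`, whose internals are not shown in this section.

```r
# Assign fold labels separately within each (stratum, event status) cell so
# that every fold receives a similar mix of strata and events.
assign_folds <- function(delta, stratum, nfolds = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  fold  <- integer(length(delta))
  cells <- split(seq_along(delta), list(stratum, delta), drop = TRUE)
  for (idx in cells) {
    # Shuffle the cell, then deal fold labels round-robin.
    idx <- idx[sample.int(length(idx))]
    fold[idx] <- rep_len(seq_len(nfolds), length(idx))
  }
  fold
}

# Toy data: 2 strata of 20 subjects each, half of them with events.
delta   <- rep(c(1, 0), times = 20)
stratum <- rep(c("a", "b"), each = 20)

fold <- assign_folds(delta, stratum, nfolds = 5, seed = 42)
table(fold, delta)
```

Passing the same `seed` reproduces the same assignment, which corresponds to the purpose of the seed argument described above.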
An object of class "cv.coxkl_ridge":
integrated_stat.full_results: Data frame with columns eta, lambda, and the aggregated CV score per lambda;
for loss criteria an additional column Loss = -2 * score; for C-index criteria a column named CIndex_pooled or CIndex_foldaverage.
integrated_stat.best_per_eta: Data frame with the best lambda (per eta) according to the chosen criterion.
external_stat: Scalar baseline statistic computed from RS under the same cv.criteria.
criteria: The evaluation criterion used.
nfolds: Number of folds.
data(ExampleData_highdim)
train_dat_highdim <- ExampleData_highdim$train
beta_external_highdim <- ExampleData_highdim$beta_external
etas <- generate_eta(method = "exponential", n = 10, max_eta = 100)
cv_res <- cv.coxkl_ridge(z = train_dat_highdim$z,
                         delta = train_dat_highdim$status,
                         time = train_dat_highdim$time,
                         beta = beta_external_highdim,
                         etas = etas)
Plots cross-validated performance across eta for
cv.coxkl, cv.coxkl_ridge, or cv.coxkl_enet results.
The main CV curve is drawn as a solid purple line; a green dotted horizontal
reference line is placed at the value corresponding to eta = 0
(or the closest available eta), with a solid green point marking that
reference level.
cv.plot(object, line_color = "#7570B3", baseline_color = "#1B9E77", ...)
object |
A fitted cross-validation result of class |
line_color |
Color for the CV performance curve. Default |
baseline_color |
Color for the horizontal reference line and point.
Default |
... |
Additional arguments (currently ignored). |
The function reads the performance metric from the object:
For "cv.coxkl": uses object$internal_stat (one row per eta).
For "cv.coxkl_ridge" and "cv.coxkl_enet":
uses object$integrated_stat.best_per_eta (best lambda per eta).
The y-axis label is set to “Loss” if criteria in the object is
“V&VH” or “LinPred”; otherwise it is “C Index”.
The horizontal reference (“baseline”) is taken from the plotted series at
eta = 0 (or the nearest eta present in the results).
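The "nearest eta" rule for the reference level can be illustrated with a few lines of base R. The data frame `per_eta` and its values are hypothetical stand-ins for the per-eta series that the function plots.

```r
# Hypothetical per-eta CV summary (values invented for illustration).
per_eta <- data.frame(
  eta  = c(0.5, 1, 2, 4),
  Loss = c(101.3, 100.4, 99.9, 100.8)
)

# Index of the eta closest to zero; this is the eta = 0 row whenever
# that value is present in the grid.
baseline_idx   <- which.min(abs(per_eta$eta))
baseline_value <- per_eta$Loss[baseline_idx]

baseline_value
```

Here the grid does not contain eta = 0, so the reference level falls back to the value at the smallest available eta.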
A ggplot object showing cross-validation performance versus eta.
cv.coxkl, cv.coxkl_ridge, cv.coxkl_enet
data(ExampleData_lowdim)
train_dat_lowdim <- ExampleData_lowdim$train
beta_external_good_lowdim <- ExampleData_lowdim$beta_external_good
etas <- generate_eta(method = "exponential", n = 100, max_eta = 30)
cv_res <- cv.coxkl(z = train_dat_lowdim$z,
                   delta = train_dat_lowdim$status,
                   time = train_dat_lowdim$time,
                   stratum = train_dat_lowdim$stratum,
                   beta = beta_external_good_lowdim,
                   etas = etas,
                   nfolds = 5,
                   criteria = c("V&VH"),
                   seed = 1)
cv.plot(cv_res)
A simulated survival dataset in a high-dimensional linear setting with 50 covariates (6 signals + 44 AR(1) noise), Weibull baseline hazard, and controlled censoring. Includes internal train/test sets, and an external-data–estimated coefficient vector.
data(ExampleData_highdim)
A list containing the following elements:
train |
A list with components: |
z |
Data frame of size with covariates Z1–Z50. |
status |
Vector of event indicators (1=event, 0=censored). |
time |
Numeric vector of observed times. |
stratum |
Vector of stratum labels (here all 1). |
test |
A list with the same structure as train, with size for z. |
beta_external |
Numeric vector (length 50, named Z1–Z50) of Cox coefficients estimated on an external dataset using only Z1–Z6 and expanded to length 50 (zeros for Z7–Z50). |
Data-generating mechanism:
Covariates: 50 variables with signals Z1–Z6 and noise Z7–Z50.
Z1, Z2 ~ bivariate normal with AR(1) correlation .
Z3, Z4 ~ independent Bernoulli(0.5).
Z5 ~ , Z6 ~ (group indicator fixed at 1).
Z7–Z50 ~ multivariate normal with AR(1) correlation .
True coefficients: (length 50).
Event times: Weibull baseline hazard
with , .
Given linear predictor , draw and set
Censoring: with ub tuned iteratively to
achieve the target censoring rate (internal: 0.70; external: 0.50).
Observed time is , status is .
External coefficients: Fit a Cox model
Surv(time, status) ~ Z1 + ... + Z6 on the external data (Breslow ties),
then place the estimated coefficients into a length-50 vector (zeros elsewhere).
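The event-time and censoring steps can be sketched with inverse-transform sampling. Every numeric setting below (`n`, `lambda0`, `nu`, `beta`, the two toy covariates) is a hypothetical placeholder rather than a value used to build the dataset, and both the Weibull parameterization h0(t) = lambda0 * nu * t^(nu - 1) and the Uniform(0, ub) censoring distribution are assumptions made for this sketch.

```r
set.seed(1)

# All numeric settings are hypothetical placeholders.
n       <- 200
lambda0 <- 0.5   # assumed Weibull scale in h0(t) = lambda0 * nu * t^(nu - 1)
nu      <- 2     # assumed Weibull shape
z       <- cbind(rnorm(n), rbinom(n, 1, 0.5))
beta    <- c(0.3, -0.3)
lp      <- drop(z %*% beta)

# Inverse-transform draw: solve exp(-lambda0 * t^nu * exp(lp)) = U for t.
u          <- runif(n)
event_time <- (-log(u) / (lambda0 * exp(lp)))^(1 / nu)

# Censoring rate implied by C = ub * Uniform(0, 1), evaluated on one fixed
# uniform draw so that the rate is a monotone step function of ub.
c_unit    <- runif(n)
cens_rate <- function(ub) mean(ub * c_unit < event_time)

# Bisection on ub: a larger ub gives later censoring times and a lower rate.
target <- 0.70
lo <- 1e-6
hi <- 100
for (i in 1:60) {
  mid <- (lo + hi) / 2
  if (cens_rate(mid) > target) lo <- mid else hi <- mid
}
ub <- (lo + hi) / 2

cens_time <- ub * c_unit
obs_time  <- pmin(event_time, cens_time)
status    <- as.integer(event_time <= cens_time)
mean(status == 0)  # empirical censoring rate, close to the 0.70 target
```

Bisection is one simple way to do the iterative tuning; the original tuning procedure may differ.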
data(ExampleData_highdim)
head(ExampleData_highdim$train$z)
table(ExampleData_highdim$train$status)
summary(ExampleData_highdim$train$time)
head(ExampleData_highdim$test$z)
table(ExampleData_highdim$test$status)
summary(ExampleData_highdim$test$time)
A simulated survival dataset in a low-dimensional linear setting with 6 covariates (2 correlated continuous, 2 binary, 2 mean-shifted normals), Weibull baseline hazard, and controlled censoring. Includes internal train/test sets, and three external-quality coefficient vectors.
data(ExampleData_lowdim)
A list containing the following elements:
train |
A list with components: |
z |
Data frame of size with covariates Z1–Z6. |
status |
Vector of event indicators (1=event, 0=censored). |
time |
Numeric vector of observed times. |
stratum |
Vector of stratum labels (here all 1). |
test |
A list with the same structure as train, with size for z. |
beta_external_good |
Numeric vector (length 6; named Z1–Z6) of Cox coefficients estimated on a "Good" external dataset using all Z1–Z6. |
Numeric vector (length 6; names Z1–Z6) of Cox coefficients estimated on a
"Fair" external dataset using a reduced subset Z1, Z3, Z5, Z6;
coefficients for variables not used are 0.
Numeric vector (length 6; names Z1–Z6) of Cox coefficients estimated on a
"Poor" external dataset using Z1 and Z5 only; remaining entries are 0.
Data-generating mechanism:
Covariates: 6 variables Z1–Z6.
Z1, Z2 ~ bivariate normal with AR(1) correlation .
Z3, Z4 ~ independent Bernoulli(0.5).
Z5 ~ , Z6 ~ (group indicator fixed at 1 for internal train/test).
True coefficients: (length 6).
Event times: Weibull baseline hazard
with , .
Given linear predictor , draw and set
Censoring: with ub tuned iteratively to
achieve the target censoring rate (internal: 0.70; external: 0.50).
Observed time is , status is .
External coefficients: For each quality level ("Good", "Fair", "Poor"), fit a Cox model
Surv(time, status) ~ Z1 + ... on the corresponding external data (Breslow ties)
using the specified covariate subset; place estimates into a length-6 vector named Z1–Z6
with zeros for variables not included.
data(ExampleData_lowdim)
head(ExampleData_lowdim$train$z)
table(ExampleData_lowdim$train$status)
summary(ExampleData_lowdim$train$time)
head(ExampleData_lowdim$test$z)
table(ExampleData_lowdim$test$status)
summary(ExampleData_lowdim$test$time)
Produces a numeric vector of eta values to be used in Cox–KL models.
generate_eta(method = "exponential", n = 10, max_eta = 5, min_eta = 0)
method |
Character string selecting how to generate |
n |
Integer, the number of |
max_eta |
Numeric, the maximum value of |
min_eta |
Numeric, the minimum value of |
Exponential: values are formed by exponentiating a grid from
log(1) to log(100), then linearly rescaling to the interval
[0, max_eta]. Thus the smallest value equals 0 and the largest
equals max_eta.
Linear: the current implementation calls
seq(min_eta, max_eta, length.out = n), taking min_eta from the
function argument of the same name (default 0).
Only the exact strings “linear” and “exponential” are supported;
other values for method will result in an error because eta_values
is never created.
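The exponential grid can be reproduced from its description in a few lines of base R. The helper `exp_eta_grid` is a hypothetical re-implementation written from the text above, not the package source, and it ignores `min_eta`.

```r
# Hypothetical re-implementation of the "exponential" grid as described;
# not the package source.
exp_eta_grid <- function(n = 10, max_eta = 5) {
  raw <- exp(seq(log(1), log(100), length.out = n))   # values from 1 to 100
  (raw - min(raw)) / (max(raw) - min(raw)) * max_eta  # rescale to [0, max_eta]
}

etas <- exp_eta_grid(n = 5, max_eta = 5)
round(etas, 3)
```

Because the raw values grow geometrically, the rescaled grid is dense near 0 and sparse near `max_eta`, which places most candidates where small changes in eta tend to matter most. The sketch requires `n >= 2`, since a single point cannot be rescaled.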
Numeric vector of length n containing the generated eta values.
# Generate 10 exponentially spaced eta values up to 5
generate_eta(method = "exponential", n = 10, max_eta = 5)
# Generate 5 linearly spaced eta values up to 3
generate_eta(method = "linear", n = 5, max_eta = 3)
Computes the stratified Cox partial log-likelihood for given covariates, event indicators, times, and coefficients.
loss_fn(z, delta, time, stratum, beta)
z |
A numeric matrix (or data frame coercible to matrix) of covariates. Each row is an observation and each column a predictor. |
delta |
A numeric vector of event indicators (1 = event, 0 = censored). |
time |
A numeric vector of observed times (event or censoring). |
stratum |
An optional vector specifying the stratum for each observation (factor/character/numeric). If missing, a single-stratum model is assumed. |
beta |
A numeric vector of regression coefficients with length equal to
the number of columns in |
Inputs are internally sorted by stratum and time. The function
evaluates the stratified Cox partial log-likelihood using the supplied z,
delta, beta, and the stratum sizes.
A single numeric value giving the stratified Cox partial log-likelihood.
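As a point of reference, the stratified partial log-likelihood can be written directly from its definition. The helper `cox_pl` below is a hypothetical, unoptimized illustration that assumes the Breslow convention for ties; it is not the package's `loss_fn` and makes no claim about how that function handles ties or sorting internally.

```r
# Stratified Cox partial log-likelihood (Breslow convention for ties).
# Hypothetical helper for illustration; not the package's loss_fn.
cox_pl <- function(z, delta, time, beta, stratum = rep(1, length(time))) {
  lp <- drop(as.matrix(z) %*% beta)
  total <- 0
  for (s in unique(stratum)) {
    idx <- which(stratum == s)
    for (i in idx[delta[idx] == 1]) {
      # Risk set: subjects in the same stratum still under observation.
      at_risk <- idx[time[idx] >= time[i]]
      total <- total + lp[i] - log(sum(exp(lp[at_risk])))
    }
  }
  total
}

z     <- matrix(c(1, 0, 1), ncol = 1)
time  <- c(1, 2, 3)
delta <- c(1, 1, 0)

# With beta = 0 each event contributes -log(size of its risk set).
cox_pl(z, delta, time, beta = 0)
cox_pl(z, delta, time, beta = 0, stratum = c(1, 1, 2))
```

In the unstratified call the two events have risk sets of size 3 and 2, giving -log(6); placing the third subject in its own stratum shrinks the risk sets to sizes 2 and 1, giving -log(2).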
coxkl
Plots model performance across the eta sequence. Performance is either
loss (-2 times partial log-likelihood) or concordance index (C-index).
If no test data are provided, the curve is computed on the training data stored
in x$data.
## S3 method for class 'coxkl'
plot(
  x,
  test_z = NULL,
  test_time = NULL,
  test_delta = NULL,
  test_stratum = NULL,
  criteria = c("loss", "CIndex"),
  ...
)
x |
A fitted model object of class |
test_z |
Optional numeric matrix of test covariates. |
test_time |
Optional numeric vector of test survival times. |
test_delta |
Optional numeric vector of test event indicators. |
test_stratum |
Optional vector of test stratum membership. |
criteria |
Character string: |
... |
Additional arguments (ignored). |
When criteria = "loss" and no test data are supplied, the plotted values are
(-2 * x$likelihood) / n, where n is the number of rows in the
(training) data. When test data are provided, performance is computed via
test_eval(..., criteria = "loss") and divided by the test sample size.
For criteria = "CIndex", performance is computed via
test_eval(..., criteria = "CIndex") on the chosen dataset. The plot adds a
dotted horizontal reference line at the value corresponding to eta = 0
(closest point on the eta grid).
A ggplot object showing the performance curve.
data(ExampleData_lowdim)
train_dat_lowdim <- ExampleData_lowdim$train
test_dat_lowdim <- ExampleData_lowdim$test
beta_external_good_lowdim <- ExampleData_lowdim$beta_external_good
eta_grid <- generate_eta(method = "exponential", n = 100, max_eta = 30)
model <- coxkl(z = train_dat_lowdim$z,
               delta = train_dat_lowdim$status,
               time = train_dat_lowdim$time,
               stratum = train_dat_lowdim$stratum,
               beta = beta_external_good_lowdim,
               etas = eta_grid)
plot(model,
     test_z = test_dat_lowdim$z,
     test_time = test_dat_lowdim$time,
     test_delta = test_dat_lowdim$status,
     test_stratum = test_dat_lowdim$stratum,
     criteria = "loss")
coxkl_enet
Plots model performance across the lambda sequence. Performance is
loss (-2 times partial log-likelihood) or concordance index (C-index).
If no test data are provided, the curve uses the training data stored in x$data.
## S3 method for class 'coxkl_enet'
plot(
  x,
  test_z = NULL,
  test_time = NULL,
  test_delta = NULL,
  test_stratum = NULL,
  criteria = c("loss", "CIndex"),
  ...
)
x |
A fitted model object of class |
test_z |
Optional numeric matrix of test covariates. |
test_time |
Optional numeric vector of test survival times. |
test_delta |
Optional numeric vector of test event indicators. |
test_stratum |
Optional vector of test stratum membership. |
criteria |
Character string: |
... |
Additional arguments (ignored). |
When criteria = "loss" and no test data are supplied, the plotted values are
-2 * x$likelihood (no normalization). When test data are provided,
performance is computed via test_eval(..., criteria). The x-axis is shown
in decreasing lambda with a reversed log10 scale.
A ggplot object showing the performance curve.
data(ExampleData_highdim)
train_dat_highdim <- ExampleData_highdim$train
test_dat_highdim <- ExampleData_highdim$test
beta_external_highdim <- ExampleData_highdim$beta_external
model_enet <- coxkl_enet(z = train_dat_highdim$z,
                         delta = train_dat_highdim$status,
                         time = train_dat_highdim$time,
                         beta = beta_external_highdim,
                         eta = 1,
                         alpha = 1.0)
plot(model_enet,
     test_z = test_dat_highdim$z,
     test_time = test_dat_highdim$time,
     test_delta = test_dat_highdim$status,
     test_stratum = test_dat_highdim$stratum,
     criteria = "loss")
coxkl_ridge
Plots model performance across the lambda sequence. Performance is
loss (-2 times partial log-likelihood) or concordance index (C-index).
If no test data are provided, the curve uses the training data stored in x$data.
## S3 method for class 'coxkl_ridge'
plot(
  x,
  test_z = NULL,
  test_time = NULL,
  test_delta = NULL,
  test_stratum = NULL,
  criteria = c("loss", "CIndex"),
  ...
)
x |
A fitted model object of class |
test_z |
Optional numeric matrix of test covariates. |
test_time |
Optional numeric vector of test survival times. |
test_delta |
Optional numeric vector of test event indicators. |
test_stratum |
Optional vector of test stratum membership. |
criteria |
Character string: |
... |
Additional arguments (ignored). |
When criteria = "loss" and no test data are supplied, the plotted values are
-2 * x$likelihood (no normalization). When test data are provided,
performance is computed via test_eval(..., criteria). The x-axis is shown
in decreasing lambda with a reversed log10 scale.
A ggplot object showing the performance curve.
data(ExampleData_highdim)
train_dat_highdim <- ExampleData_highdim$train
test_dat_highdim <- ExampleData_highdim$test
beta_external_highdim <- ExampleData_highdim$beta_external
model_ridge <- coxkl_ridge(z = train_dat_highdim$z,
                           delta = train_dat_highdim$status,
                           time = train_dat_highdim$time,
                           beta = beta_external_highdim,
                           eta = 1)
plot(
  model_ridge,
  test_z = test_dat_highdim$z,
  test_time = test_dat_highdim$time,
  test_delta = test_dat_highdim$status,
  test_stratum = test_dat_highdim$stratum,
  criteria = "CIndex"
)
coxkl Object
Computes linear predictors for new data based on a fitted coxkl model.
If eta is supplied, predictions are returned for those eta values;
otherwise predictions are returned for all fitted etas. Linear interpolation
is applied if an intermediate eta value is requested.
## S3 method for class 'coxkl'
predict(object, newz, eta = NULL, ...)
object |
A fitted model object of class |
newz |
A numeric matrix or data frame of new covariates (must match the dimension of the training design matrix used to fit the model). |
eta |
Optional numeric vector of |
... |
Additional arguments. |
The linear predictors are computed as as.matrix(newz) %*% beta.
A numeric matrix of linear predictors with one column per eta (sorted ascending).
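The interpolation step can be sketched with `stats::approx`. The fitted path below (`fitted_etas`, `beta_path`) and the helper `interp_beta` are invented for illustration; how the fitted object actually stores its coefficients is not shown in this manual, so the layout used here is an assumption.

```r
# Hypothetical fitted path: rows are covariates, columns are fitted etas.
fitted_etas <- c(0, 1, 4)
beta_path <- rbind(
  Z1 = c(0.20, 0.30, 0.60),
  Z2 = c(-0.10, -0.20, -0.50)
)

# Linearly interpolate every coefficient at the requested eta values.
interp_beta <- function(eta_new) {
  out <- apply(beta_path, 1, function(b) approx(fitted_etas, b, xout = eta_new)$y)
  out <- matrix(out, nrow = length(eta_new))  # rows: eta, columns: covariates
  t(out)                                      # back to covariates x eta
}

newz <- matrix(c(1, 2,
                 0, 1), nrow = 2, byrow = TRUE)
eta_req <- sort(c(2.5, 0.5))                  # columns in ascending eta order
lin_pred <- newz %*% interp_beta(eta_req)
colnames(lin_pred) <- paste0("eta=", eta_req)
lin_pred
```

Requests outside the fitted range return `NA` from `approx` with its default settings; whether the package extrapolates, clamps, or errors in that case is not stated here.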
Computes linear predictors for new data using a fitted coxkl_enet model.
If lambda is supplied, predictions are returned for those lambda
values; otherwise predictions are returned for all fitted lambdas. When a
requested lambda lies between fitted values, coefficients are linearly
interpolated.
## S3 method for class 'coxkl_enet'
predict(object, newz, lambda = NULL, ...)
object |
A fitted model object of class |
newz |
A numeric matrix or data frame of new covariates (same columns as in training data). |
lambda |
Optional numeric value(s) specifying the regularization parameter(s)
for which to predict. If |
... |
Additional arguments. |
The linear predictors are computed as as.matrix(newz) %*% beta.
A numeric matrix of linear predictors.
Each column corresponds to one lambda, sorted in descending order.
Computes linear predictors for new data using a fitted coxkl_ridge model.
If lambda is supplied, predictions are returned for those lambda
values; otherwise predictions are returned for all fitted lambdas. When a
requested lambda lies between fitted values, coefficients are linearly
interpolated.
## S3 method for class 'coxkl_ridge'
predict(object, newz, lambda = NULL, ...)
object |
A fitted model object of class |
newz |
A numeric matrix or data frame of new covariates (same columns as in training data). |
lambda |
Optional numeric value(s) specifying the regularization parameter(s)
for which to predict. If |
... |
Additional arguments. |
The linear predictors are computed as as.matrix(newz) %*% beta.
A numeric matrix of linear predictors.
Each column corresponds to one lambda, sorted in descending order.
The support dataset tracks five response variables: hospital
death, severe functional disability, hospital costs, time until death,
and death itself. Patients are followed for up to 5.56 years. See Bhatnagar et al. (2020) for details.
data(support)
A data frame with 9,104 observations and 34 variables after imputation and the removal of response variables such as hospital charges, patient ratio of costs to charges, and micro-costs, following Bhatnagar et al. (2020). Ordinal variables, namely functional disability and income, were also removed. Finally, surrogate activities of daily living were removed due to sparsity. Six other model scores in the dataset were removed; only aps and sps were kept.
stores a double representing age.
death at any time up to NDI (National Death Index) date: 12/31/1994.
0=female, 1=male.
days from study entry to discharge.
days of follow-up.
each level of dzgroup: ARF/MOSF w/Sepsis, COPD, CHF, Cirrhosis, Coma, Colon Cancer, Lung Cancer, MOSF with malignancy.
ARF/MOSF, COPD/CHF/Cirrhosis, Coma and cancer disease classes.
the number of comorbidities.
years of education of patients.
the SUPPORT coma score based on Glasgow D3.
average TISS, days 3-25.
indicates race: White, Black, Asian, Hispanic or other.
day in Hospital at Study Admit.
diabetes (Com27-28, Dx 73).
dementia (Comorbidity 6).
cancer state.
mean arterial blood pressure day 3.
white blood cell count on day 3.
heart rate day 3.
respiration rate day 3.
temperature, in Celsius, on day 3.
PaO2/(0.01*FiO2) day 3.
serum albumin day 3.
bilirubin day 3.
serum creatinine day 3.
serum sodium day 3.
serum pH (in arteries) day 3.
serum glucose day 3.
bun day 3.
urine output day 3.
adl patient day 3.
imputed adl calibrated to surrogate, if a surrogate was used for a follow up.
SUPPORT physiology score.
apache III physiology score.
Some of the original data was missing. Before imputation, there were
a total of 9,104 individuals and 47 variables. Following Bhatnagar et al. (2020), a few variables
were removed. Three response variables were removed:
hospital charges, patient ratio of costs to charges and patient
micro-costs. Hospital death was also removed as it was directly informative
of the event of interest, namely death. Additionally, functional disability and
income were removed as they are ordinal covariates. Finally, 8
covariates related to the results of previous findings were removed: SUPPORT
day 3 physiology score (sps), APACHE III day 3 physiology score
(aps), SUPPORT model 2-month survival estimate, SUPPORT model
6-month survival estimate, Physician's 2-month survival estimate for pt.,
Physician's 6-month survival estimate for pt., Patient had Do Not
Resuscitate (DNR) order, and Day of DNR order (<0 if before study). Of
these, sps and aps were added back after imputation, as they
were missing only 1 observation. Imputation proceeded in two steps. First, values were
imputed manually using the normal values for physiological measures
recommended by Knaus et al. (1995). Next,
a single dataset was imputed using mice with default settings. The
covariate for surrogate activities of daily living was not imputed in this
step, owing to collinearity between the other two
covariates for activities of daily living; it was therefore
removed. For details, see the casebase R package by Bhatnagar et al. (2020).
Available at the following website: https://archive.ics.uci.edu/dataset/880/support2.
Bhatnagar, S., Turgeon, M., Islam, J., Hanley, J. A., and Saarela, O. (2020) casebase: Fitting Flexible Smooth-in-Time Hazards and Risk Functions via Logistic and Multinomial Regression. R package version 0.9.0, https://CRAN.R-project.org/package=casebase.
Knaus, W. A., Harrell, F. E., Lynn, J., Goldman, L., Phillips, R. S., Connors, A. F., et al. (1995)
The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults.
Annals of Internal Medicine, 122(3): 191-203.
if (requireNamespace("survival", quietly = TRUE)) {
  data(support)
  set.seed(123)
  support <- support[support$ca %in% "metastatic", ]
  time <- support$d.time
  death <- support$death
  diabetes <- model.matrix(~ factor(support$diabetes))[, -1]
  # sex: female as the reference group
  sex <- model.matrix(~ support$sex)[, -1]
  # age: continuous variable, binned into four groups whose cut points
  # match the labels
  age <- support$age
  age[support$age < 50] <- "<50"
  age[support$age >= 50 & support$age < 60] <- "50-59"
  age[support$age >= 60 & support$age < 70] <- "60-69"
  age[support$age >= 70] <- "70+"
  # "60-69" is listed first and therefore serves as the reference level
  age <- factor(age, levels = c("60-69", "<50", "50-59", "70+"))
  z_age <- model.matrix(~ age)[, -1]
  z <- data.frame(z_age, sex, diabetes)
  # names follow the column order: three age dummies, then sex, then diabetes
  colnames(z) <- c("age_50", "age_50_59", "age_70", "male", "diabetes")
  dat <- data.frame(time, death, z)
  # split into external (87%), internal (3%), and test (remainder) sets
  n <- nrow(dat)
  n_ext <- floor(0.87 * n)
  n_int <- floor(0.03 * n)
  n_test <- n - n_ext - n_int
  idx <- sample(seq_len(n))
  idx_ext <- idx[1:n_ext]
  idx_int <- idx[(n_ext + 1):(n_ext + n_int)]
  idx_test <- idx[(n_ext + n_int + 1):n]
  external_data <- dat[idx_ext, ]
  internal_data <- dat[idx_int, ]
  test_data <- dat[idx_test, ]
  # external coefficients come from a Cox model fit on the external split
  ext_cox <- survival::coxph(
    survival::Surv(time, death) ~ age_50 + age_50_59 + age_70 + diabetes + male,
    data = external_data
  )
  beta_external <- coef(ext_cox)
  result1 <- cv.coxkl(
    z = internal_data[, c("age_50", "age_50_59", "age_70", "diabetes", "male")],
    delta = internal_data$death,
    time = internal_data$time,
    beta = beta_external,
    stratum = NULL,
    etas = generate_eta(method = "exponential", n = 50, max_eta = 50)
  )
  cv.plot(result1)
}
Evaluates model performance on a test dataset using either the log-partial-likelihood loss or the concordance index (C-index).
This function accepts either:
test_z and betahat, which will be multiplied to obtain risk scores; or
test_RS, a pre-computed numeric vector of risk scores.
test_eval(
  test_z = NULL,
  test_RS = NULL,
  test_delta,
  test_time,
  test_stratum = NULL,
  betahat = NULL,
  criteria = c("loss", "CIndex")
)
test_z |
Optional numeric matrix or data frame of covariates for the test dataset.
Required if |
test_RS |
Optional numeric vector of pre-computed risk scores (e.g., linear predictors).
If provided, |
test_delta |
Numeric vector of event indicators (1 = event, 0 = censored). |
test_time |
Numeric vector of survival times for the test dataset. |
test_stratum |
Optional vector indicating stratum membership for each test observation.
If |
betahat |
Optional numeric vector of estimated regression coefficients.
Required if |
criteria |
Character string specifying the evaluation criterion; one of:
|
Prior to evaluation, observations are sorted by (stratum, time) to ensure correct
risk-set construction. For stratified C-index computation, the provided test_stratum
is used; otherwise all test data are treated as a single stratum.
You may supply either covariates and coefficients (test_z with betahat)
or a precomputed risk score vector (test_RS). When test_RS is provided,
test_z and betahat are ignored.
A numeric value representing either:
if criteria = "loss": negative two times the log partial likelihood (-2 * log partial likelihood) on the test data.
if criteria = "CIndex": the concordance index on the test data.
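For intuition about the C-index criterion, a Harrell-type concordance with optional strata can be sketched as follows. The helper `c_index` is hypothetical and is not the package's `test_eval`; in particular, counting tied risk scores as half-concordant and skipping pairs with tied times are assumptions of this sketch. It also shows the two input routes: risk scores formed from `test_z` and `betahat`, or supplied directly.

```r
# Harrell-type concordance with optional strata; hypothetical helper,
# not the package's test_eval.
c_index <- function(rs, delta, time, stratum = rep(1, length(time))) {
  conc <- 0
  comparable <- 0
  for (s in unique(stratum)) {
    idx <- which(stratum == s)
    for (i in idx[delta[idx] == 1]) {
      # Comparable pairs: subject i has an event before subject j's time.
      later <- idx[time[idx] > time[i]]
      comparable <- comparable + length(later)
      # Concordant when the earlier failure has the higher risk score.
      conc <- conc + sum(rs[i] > rs[later]) + 0.5 * sum(rs[i] == rs[later])
    }
  }
  conc / comparable
}

test_z  <- matrix(c(2, 1, 0, -1), ncol = 1)
betahat <- 1
rs      <- drop(test_z %*% betahat)  # same role as a precomputed test_RS

delta <- c(1, 1, 0, 1)
time  <- c(1, 2, 3, 4)
c_index(rs, delta, time)    # risk decreases with time: perfectly concordant
c_index(-rs, delta, time)   # reversed scores: perfectly discordant
```

Pairs are only compared within a stratum, so passing a stratum vector reduces the number of comparable pairs relative to the single-stratum default.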