---
title: "Methods for Transfer-learning Based Integrated Cox Models"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Methods for Transfer-learning Based Integrated Cox Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
editor_options:
  markdown:
    wrap: sentence
---

The `survkl` package implements a transfer-learning procedure that integrates external summary information with newly collected time-to-event data under a Cox proportional hazards model.
This vignette summarizes the underlying methodology: the internal Cox model, the external summary information, the partial likelihood-based Kullback--Leibler (KL) transfer-learning objective, and the regularized extension for high-dimensional data.

## Cox Proportional Hazards Model for the Target Cohort

Let $D_i$ denote the death time and $C_i$ the censoring time for patient $i$, $i = 1, \ldots, n$, where $n$ is the total sample size of the target (internal) cohort.
The observed survival time is $T_i = \min\{D_i, C_i\}$, and the death indicator is $\delta_i = \mathbb{I}(D_i \le C_i)$.
Let $Z_i = (Z_{i1}, \ldots, Z_{ip})^\top$ be a $p$-dimensional covariate vector for the $i$-th patient.
We assume that, conditional on $Z_i$, $D_i$ is independently censored by $C_i$.
Consider the Cox proportional hazards model

$$
\lambda(t \mid Z_i) = \lambda_0(t)\,\exp\{g(Z_i, \beta)\},
$$

where $\lambda_0(t)$ is an unspecified baseline hazard function, $g(Z_i, \beta)$ specifies the log-relative-risk relationship between the covariates $Z_i$ and the hazard function, and $\beta \in \mathbb{R}^p$ is a vector of regression parameters.
Under the standard linear specification, $g(Z_i, \beta) = Z_i^\top \beta$.
The log-partial likelihood is given by

$$
\ell(\beta) = \sum_{i=1}^{n} \delta_i \left[ g(Z_i, \beta) - \log\left\{ \sum_{l=1}^{n} Y_l(T_i)\,\exp\{g(Z_l, \beta)\} \right\} \right],
$$

where $Y_l(T_i) = \mathbb{I}(T_l \ge T_i)$ is the at-risk indicator.

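
As a minimal sketch of the log-partial likelihood above, the following base R code evaluates $\ell(\beta)$ under the linear specification $g(Z_i, \beta) = Z_i^\top \beta$.
The function name `cox_loglik` and the toy data are illustrative assumptions, not part of `survkl`, and the sketch assumes there are no tied event times.

```r
# Log-partial likelihood for a linear Cox model, g(Z, beta) = Z %*% beta.
# Illustrative helper (not a survkl function); assumes no tied event times.
cox_loglik <- function(beta, time, status, Z) {
  lp <- drop(Z %*% beta)                       # g(Z_i, beta) for each subject
  # at_risk[i, l] = Y_l(T_i): 1 if subject l is still at risk at time T_i
  at_risk <- outer(time, time, function(ti, tl) as.numeric(tl >= ti))
  log_denom <- log(drop(at_risk %*% exp(lp)))  # log sum_l Y_l(T_i) exp(g_l)
  sum(status * (lp - log_denom))
}

# Toy data, made up for illustration
time   <- c(2, 5, 3, 8, 6)
status <- c(1, 0, 1, 1, 0)
Z      <- cbind(z1 = c(0.5, -1.0, 0.3, 1.2, -0.4))

# At beta = 0 every at-risk subject is equally likely to fail, so the value
# is minus the sum of log risk-set sizes: -(log 5 + log 4 + log 1)
cox_loglik(beta = 0, time, status, Z)
```

At $\beta = 0$ the result reduces to $-\log 20$, which gives a quick check that the risk sets are built as intended.
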
## External Summary Information

To account for privacy constraints, we consider scenarios where only external summary information is available, rather than individual-level external data.
For example, suppose the estimated coefficients $\tilde{\beta}$ are available from a published Cox model; a risk score can then be computed as $\tilde{g}(Z_i) = Z_i^\top \tilde{\beta}$ for the $i$-th subject in the target cohort.
The proposed transfer-learning procedure is flexible and can incorporate various forms of external summary information, including estimated risk scores from machine-learning algorithms and clinically derived risk groupings.

## Partial Likelihood-Based Transfer Learning

To extract information from external risk scores, we formulate the censored time-to-event data as a dynamic ranking problem.
Specifically, suppose the internal cohort comprises $K$ unique failure times $t_1 < \cdots < t_K$.
Let $A_k$ denote the event that individual $k$ (the subject who fails at $t_k$) fails in $[t_k, t_k + dt_k)$, and let $B_k$ denote all the censoring and failure information up to time $t_k^{-}$, together with the information that one failure occurs in $[t_k, t_k + dt_k)$.
Based on the external risk scores, the conditional density of $A_k$ given $B_k$ is

$$
\tilde{f}(A_k \mid B_k) = \frac{\tilde{\lambda}_0(t_k)\,\exp\{\tilde{g}(Z_k)\}\,dt_k} {\sum_{i=1}^{n} Y_i(t_k)\,\tilde{\lambda}_0(t_k)\,\exp\{\tilde{g}(Z_i)\}\,dt_k} = \frac{\exp\{\tilde{g}(Z_k)\}} {\sum_{i=1}^{n} Y_i(t_k)\,\exp\{\tilde{g}(Z_i)\}},
$$

where the second equality follows from canceling $\tilde{\lambda}_0(t_k)\,dt_k$ in the numerator and denominator.
Following Wang et al. (2023), the partial likelihood-based KL divergence at $t_k$ between the conditional density of $A_k \mid B_k$ implied by the external risk scores and the one implied by the internal Cox model is given by

$$
d_{\mathrm{KL}}(\tilde{f} \parallel f;\, t_k) = \mathbb{E}_{\tilde{f}} \left[ \log\left\{ \frac{\tilde{f}(A_k \mid B_k)}{f(A_k \mid B_k)} \right\} \right],
$$

where the expectation is taken with respect to the external conditional density $\tilde{f}(A_k \mid B_k)$, and $f(A_k \mid B_k)$ is the conditional density based on the internal Cox model,

$$
f(A_k \mid B_k) = \frac{\exp\{g(Z_k, \beta)\}} {\sum_{i=1}^{n} Y_i(t_k)\,\exp\{g(Z_i, \beta)\}}.
$$

When $\tilde{g}(Z_k)$ is generated from clinically derived risk groupings, $\tilde{f}(A_k \mid B_k)$ does not represent a formal conditional density; instead, it can be viewed as a Plackett--Luce ranking metric, and $d_{\mathrm{KL}}(\tilde{f} \parallel f;\, t_k)$ can be interpreted as a generalized KL divergence.
The accumulated KL divergence across the sequence of conditional experiments $A_1 \mid B_1, \ldots, A_K \mid B_K$ is

$$
D_{\mathrm{KL}}(\tilde{f} \parallel f) = \sum_{k=1}^{K} d_{\mathrm{KL}}(\tilde{f} \parallel f;\, t_k),
$$

which measures the discrepancy between the external risk scores and the internal Cox model.
To integrate external information while accounting for potential disparities, we combine the internal log-partial likelihood with the accumulated KL divergence by constructing the penalized objective function

$$
\ell_{\eta}(\beta) = \ell(\beta) - \eta\, D_{\mathrm{KL}}(\tilde{f} \parallel f),
$$

where $\eta \ge 0$ is a tuning parameter that controls the trade-off between the internal model and the external risk scores.
Setting $\eta = 0$ recovers the internal-only Cox fit, whereas larger values of $\eta$ place more weight on the external information.

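
The construction above can be sketched in base R for a single covariate.
Everything named here (`beta_ext`, `risk_ext`, `cox_loglik`, `kl_accumulated`, `integrated_obj`, and the toy data) is a hypothetical illustration rather than the `survkl` implementation, and the sketch assumes a linear $g$ and no tied event times.
At each failure time the expectation under $\tilde{f}$ is taken over which at-risk subject fails.

```r
# Toy data, made up for illustration; no tied event times.
time   <- c(2, 5, 3, 8, 6)
status <- c(1, 0, 1, 1, 0)
Z      <- cbind(z1 = c(0.5, -1.0, 0.3, 1.2, -0.4))

# Hypothetical external coefficient standing in for a published Cox model.
beta_ext <- 0.8
risk_ext <- drop(Z %*% beta_ext)      # external risk scores g~(Z_i)

# Internal log-partial likelihood l(beta) with g(Z, beta) = Z %*% beta.
cox_loglik <- function(beta) {
  lp <- drop(Z %*% beta)
  log_denom <- vapply(time, function(ti) log(sum(exp(lp[time >= ti]))),
                      numeric(1))
  sum(status * (lp - log_denom))
}

# Accumulated KL divergence D_KL(f~ || f) over the unique failure times.
kl_accumulated <- function(beta) {
  lp <- drop(Z %*% beta)
  event_times <- sort(unique(time[status == 1]))
  d_k <- vapply(event_times, function(tk) {
    at_risk <- time >= tk
    f_ext <- exp(risk_ext[at_risk]) / sum(exp(risk_ext[at_risk]))
    f_int <- exp(lp[at_risk]) / sum(exp(lp[at_risk]))
    sum(f_ext * log(f_ext / f_int))   # E_{f~}[log(f~ / f)] at t_k
  }, numeric(1))
  sum(d_k)
}

# Integrated objective l_eta(beta) = l(beta) - eta * D_KL.
integrated_obj <- function(beta, eta) {
  cox_loglik(beta) - eta * kl_accumulated(beta)
}

kl_accumulated(beta_ext)   # zero: internal and external scores coincide
kl_accumulated(0)          # positive: the two rankings disagree

# Maximizers over a bounded interval for a few values of eta
sapply(c(0, 1, 10), function(eta)
  optimize(integrated_obj, c(-5, 5), eta = eta, maximum = TRUE)$maximum)
```

Both terms of the objective are concave in $\beta$, so a one-dimensional search is adequate for this toy case; the bounded interval guards against a diverging internal-only estimate in such a small sample.
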
**Equivalent weighted form.** Substituting the Cox-model expressions and noting that the unique failure times $t_1 < \cdots < t_K$ coincide with the observed internal event times, the integrated objective is equivalent, up to the positive factor $1 + \eta$ and an additive constant that does not depend on $\beta$, to the weighted partial-likelihood form

$$
\ell_{\eta}(\beta) \;\propto\; \sum_{i=1}^{n} \left\{ \frac{\delta_i + \eta\, \tilde{\delta}_i}{1 + \eta}\, g(Z_i, \beta) - \delta_i \log\left[ \sum_{l=1}^{n} Y_l(T_i)\,\exp\{g(Z_l, \beta)\} \right] \right\},
$$

where the externally induced pseudo-event weight is defined as

$$
\tilde{\delta}_i = \sum_{k=1}^{K} \frac{Y_i(t_k)\,\exp\{\tilde{g}(Z_i)\}} {\sum_{j=1}^{n} Y_j(t_k)\,\exp\{\tilde{g}(Z_j)\}}.
$$

Both forms therefore share the same maximizer in $\beta$.
This representation shows that the external information enters the internal partial likelihood by augmenting each subject's observed event indicator $\delta_i$ with a fractional pseudo-event weight $\tilde{\delta}_i$ derived from the external risk scores, with $\eta$ governing the relative contribution of the two sources.

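
The equivalence can be checked numerically.
The sketch below, in base R with made-up data and hypothetical names (`beta_ext`, `delta_ext`, `weighted_obj`, `direct_obj`), computes the pseudo-event weights and compares the weighted form with the direct form $\ell(\beta) - \eta\, D_{\mathrm{KL}}(\tilde{f} \parallel f)$, assuming a linear $g$ and no tied event times.
Because the two forms differ by a constant and the factor $1 + \eta$, the comparison uses differences between two values of $\beta$.

```r
# Toy data, made up for illustration; no tied event times.
time   <- c(2, 5, 3, 8, 6)
status <- c(1, 0, 1, 1, 0)
Z      <- cbind(z1 = c(0.5, -1.0, 0.3, 1.2, -0.4),
                z2 = c(1, 0, 0, 1, 1))
beta_ext <- c(0.8, -0.2)             # hypothetical published coefficients
risk_ext <- drop(Z %*% beta_ext)     # external risk scores g~(Z_i)
event_times <- sort(unique(time[status == 1]))

# w[i, k] = Y_i(t_k) exp(g~_i) / sum_j Y_j(t_k) exp(g~_j)
w <- sapply(event_times, function(tk) {
  num <- (time >= tk) * exp(risk_ext)
  num / sum(num)
})
delta_ext <- rowSums(w)              # pseudo-event weights delta~_i

# log sum_l Y_l(T_i) exp(g_l) for every subject i
log_risk_sum <- function(lp) {
  vapply(time, function(ti) log(sum(exp(lp[time >= ti]))), numeric(1))
}

# Weighted partial-likelihood form of the integrated objective
weighted_obj <- function(beta, eta) {
  lp <- drop(Z %*% beta)
  sum((status + eta * delta_ext) / (1 + eta) * lp - status * log_risk_sum(lp))
}

# Direct form: l(beta) - eta * D_KL(f~ || f)
direct_obj <- function(beta, eta) {
  lp <- drop(Z %*% beta)
  loglik <- sum(status * (lp - log_risk_sum(lp)))
  kl <- sum(vapply(seq_along(event_times), function(k) {
    at_risk <- time >= event_times[k]
    f_int <- exp(lp[at_risk]) / sum(exp(lp[at_risk]))
    sum(w[at_risk, k] * log(w[at_risk, k] / f_int))
  }, numeric(1)))
  loglik - eta * kl
}

b1 <- c(0.3, -0.5); b2 <- c(-1, 0.4); eta <- 2
(1 + eta) * (weighted_obj(b1, eta) - weighted_obj(b2, eta))
direct_obj(b1, eta) - direct_obj(b2, eta)   # same value as the line above
```

Each column of `w` sums to one, so the pseudo-event weights add up to the number of failure times $K$, mirroring how the observed indicators $\delta_i$ add up to the number of events.
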
## Regularization for High-Dimensional Data

For high-dimensional applications, where the number of covariates $p$ may be large relative to the sample size $n$, we extend the integrated objective by adding a regularization term.
The resulting objective function enables simultaneous variable selection and parameter estimation:

$$
\ell_{\eta, \lambda}(\beta) = \ell_{\eta}(\beta) - \lambda\, P(\beta),
$$

where $P(\beta)$ is a penalty function and $\lambda \ge 0$ is a tuning parameter controlling its strength.
The package supports the following choices of $P(\beta)$:

- **Ridge** (Hoerl and Kennard, 1970):
  $$
  P(\beta) = \tfrac{1}{2}\,\|\beta\|_2^2 = \tfrac{1}{2}\sum_{j=1}^{p} \beta_j^2,
  $$
  which shrinks coefficients toward zero and stabilizes estimation under collinearity.

- **LASSO** (Tibshirani, 1997):
  $$
  P(\beta) = \|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|,
  $$
  which produces sparse solutions by setting some coefficients exactly to zero.

- **Elastic Net** (Simon et al., 2011):
  $$
  P(\beta) = \alpha\,\|\beta\|_1 + \tfrac{1}{2}(1 - \alpha)\,\|\beta\|_2^2 = \sum_{j=1}^{p}\left[ \alpha\,|\beta_j| + \tfrac{1}{2}(1 - \alpha)\,\beta_j^2 \right],
  $$
  where $\alpha \in [0, 1]$ is a mixing parameter that blends the LASSO and ridge penalties; $\alpha = 1$ reduces to the LASSO and $\alpha = 0$ to ridge.

In `survkl`, ridge-penalized estimation is provided by `coxkl_ridge`, while the elastic-net family (including the LASSO as the special case $\alpha = 1$) is provided by `coxkl_enet`.
The companion cross-validation routines `cv.coxkl`, `cv.coxkl_ridge`, and `cv.coxkl_enet` perform $K$-fold cross-validation (with $K$ here denoting the number of folds, not the number of failure times) to select the integration weight $\eta$ and the regularization parameter $\lambda$.
Candidate values are compared using Harrell's C-index for discrimination and the Verweij and Van Houwelingen (V&VH) cross-validated partial-likelihood loss for overall model fit.
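
As a small illustration of how the three penalties above relate, the following base R helper evaluates the elastic-net penalty and recovers the LASSO and ridge as its endpoints.
The function name `enet_penalty` and the example coefficients are assumptions made for this sketch; it is not how `coxkl_enet` is implemented.

```r
# Elastic-net penalty P(beta) with mixing parameter alpha in [0, 1].
# alpha = 1 gives the LASSO; alpha = 0 gives ridge with the 1/2 scaling.
# Illustrative helper only, not a survkl function.
enet_penalty <- function(beta, alpha = 1) {
  stopifnot(alpha >= 0, alpha <= 1)
  alpha * sum(abs(beta)) + 0.5 * (1 - alpha) * sum(beta^2)
}

beta <- c(1.5, 0, -0.5)           # made-up coefficient vector

enet_penalty(beta, alpha = 1)     # LASSO: |1.5| + |0| + |-0.5| = 2
enet_penalty(beta, alpha = 0)     # ridge: 0.5 * (2.25 + 0 + 0.25) = 1.25
enet_penalty(beta, alpha = 0.5)   # 0.5 * 2 + 0.25 * 2.5 = 1.625
```

The regularized objective $\ell_{\eta, \lambda}(\beta)$ subtracts $\lambda$ times this quantity from the integrated objective $\ell_{\eta}(\beta)$.
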