‘reslife’ Package Demo

Load in the Package

library(reslife)
#>
#> Attaching package: 'gsl'
#> The following objects are masked from 'package:pracma':
#>
#>     Ci, Si, erf, erfc, eta, expint_E1, expint_Ei, fact, hypot, psi,
#>     sinc, zeta

Demonstration of the ‘reslifefsr’ function

This function exclusively uses flexsurvreg() objects as the source of the data. We will show how to use the function below. First, we’ll start off with a simple example and show how the mean produced is equal to the integration.

We run a flexsurvreg() with a single covariate for the ‘gengamma.orig’ distribution. We then run the reslifefsr() function, using the flexsurvreg() object. We have a life of 1, which is a scalar value at which we calculate the residual life, and specify that we only want the mean.

library(flexsurv)
fs1 <- flexsurvreg(Surv(recyrs, censrec) ~ 1, data = bc,dist = "gengamma.orig")
r <- reslifefsr(obj = fs1, life = 1, type = 'mean')
r
#> [1] 7.194142

We can verify the accuracy of our derivation using the integrate function.


b <- exp(as.numeric(fs1$coefficients[1])) a <- exp(as.numeric(fs1$coefficients[2]))
k <- exp(as.numeric(fs1$coefficients[3])) # numerical integration f <- function(x) { return(pgengamma.orig(x, shape = b, scale = a, k = k, lower.tail = FALSE)) } mx_interal <- integrate(f, 1, Inf)$value/pgengamma.orig(1, shape = b, scale = a, k = k, lower.tail = FALSE)
mx_interal
#> [1] 7.194142

As you can see, the mean values are the same.

Demonstration with multiple covariates

What if the flexsurvreg() object has multiple covariates? We will see this below. We will use the ‘bc’ dataset contained within ‘flexsurv’ with an added covariate ‘age’.

data(bc)
newbc <- bc
newbc$age <- rnorm(dim(bc)[1], mean = 65-scale(newbc$recyrs, scale=FALSE),sd = 5)

Generating a flexsurvreg object:

fsr2 =  flexsurvreg(Surv(recyrs, censrec) ~ group+age, data=newbc, dist = 'weibull')

We can now use the reslifefsr() function to generate residual life values. There are 686 values, since the length of the input is 686. We are only going to show 10, in order to save space.

head(reslifefsr(obj = fsr2, life= 4), 10)
#>  [1] 10.991160  7.445017 11.888460  8.855456  7.320557  5.101370  7.224315
#>  [8]  8.983472  4.498345 10.005028

If we want to look at the median, we can specify that.

head(reslifefsr(obj = fsr2,life= 4, type = 'median'),10)
#>  [1] 8.893677 5.931759 9.646690 7.106529 5.828371 3.995473 5.748458 7.213403
#>  [9] 3.502050 8.067468

We can look at all 3 types now (mean, median, and percentile).

head(reslifefsr(obj = fsr2, life= 4, p = .8, type = 'all'),10)
#>         mean   median percentile
#> 1  10.991160 8.893677  17.531507
#> 2   7.445017 5.931759  11.945311
#> 3  11.888460 9.646690  18.939333
#> 4   8.855456 7.106529  14.172096
#> 5   7.320557 5.828371  11.748439
#> 6   5.101370 3.995473   8.225509
#> 7   7.224315 5.748458  11.596157
#> 8   8.983472 7.213403  14.373850
#> 9   4.498345 3.502050   7.263311
#> 10 10.005028 8.067468  15.981979

Notice how when ‘all’ is used, the output is a dataframe with the name of each value as the column names.

But what if we want to supply our own data and make predictions? Since the ‘flexsurv’ regression object generates parameters, we can add our own data points and get the residual life values for each.

Let’s start by generating some data. This data has the same two levels as ‘fsr2’ above (group, age). Note that the data is placed in a data frame. The new data must be inputted as a data frame for the function to work.

group = c("Medium", 'Good', "Poor")
age = c(43, 35, 39)
newdata = data.frame(age, group)

We run that through the reslifefsr() function and get a 3 row data frame

reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata)
#>        mean    median percentile
#> 1 12.300164  9.992529   12.53526
#> 2 31.905635 26.564496   32.84741
#> 3  7.674191  6.122261    7.74986

Now we can show off some features of the function, specifically how it handles different input data.

Lets change the order of the variables in the input data and rerun. It will show the exact same results as above.

newdata = data.frame(group, age)
reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata)
#>        mean    median percentile
#> 1 12.300164  9.992529   12.53526
#> 2 31.905635 26.564496   32.84741
#> 3  7.674191  6.122261    7.74986

This is a handy feature of the function that can enhance the user experience. Another feature of this is the ability to handle data with extra columns. Let’s show that feature below.

extra = c(100, 100, 100)
newdata2 = data.frame(age, group, extra)
reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata2)
#>        mean    median percentile
#> 1 12.300164  9.992529   12.53526
#> 2 31.905635 26.564496   32.84741
#> 3  7.674191  6.122261    7.74986

The extra column ‘extra’ is ignored by the function.

Are there scenarios where the package would return an error? Yes, in fact, there are a few.

The first is if not enough data is provided, which is shown below.

newdata3 = data.frame(group)
reslifefsr(obj = fsr2, life = 4,p = .6, type = 'all', newdata= newdata3)
#> Warning in if (fsoutput$covdata$covnames != colnames(newdata)) {: the condition
#> has length > 1 and only the first element will be used
#>        mean    median percentile
#> 1  87.22487  73.51266   90.10934
#> 2 150.23904 127.02587  155.29444
#> 3  47.56366  39.84435   49.06227

Since only ‘group’ is provided, predictions cannot be generated. The function needs values for ‘age’ in order to make predictions.

Another scenario could happen with factor data. The levels for ‘group’ are ‘Good’, ‘Medium’, and ‘Poor’. But what if a level was inputted that isn’t one of those three? Let’s see.

group = c("Medium", 'Good', "Terrible")
newdata = data.frame(age, group)
reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata)
#> Error in weibull_mlr(obj, life, p, type, newdata): Incorrect Level Entered

Since ‘Terrible’ is not a valid level, the function errors and returns a message notifying the user of this error.

It is also possible to plot the mean residual life for a set of values from a ‘flexsurv’ object.

We take the first line from newbc and create a blank vector of length 10. We then loop the function to create the MRLs for the first 10 integers, which gives us the expected survival time given the patient has survived y years.

We can then plot these values.

newdata4 = newbc[1,]
mrl = rep(0,10)
for (i in 1:10){
mrl[i] = reslifefsr(obj = fsr2,life= i,p = .6, type = 'mean', newdata= newdata4)
}
plot(x=c(1:10), y=mrl,type="l", xlab = 'Years Survived', ylab = 'Mean Residual Life',  main = 'MRL over Time')

That is an overview of how to use the flexsurvreg() dependent function. Next, we will show how to get residual life values by inputting your own parameters.

Demonstration of the ‘residlife’ function

This function works similarly to the reslifefsr() function, except it doesn’t rely on a flexsurvreg() input. Instead, the user inputs the name of the distribution and the parameters. The distribution must be one contained within ‘flexsurv’ (‘gamma’, ‘gengamma.orig’, ‘weibull’, ‘gompertz’, ‘exponential’, ‘lnorm’, ‘llogis’, ‘gengamma’, ‘genf’, ‘genf.orig’), and the names of the parameters must match those specified for each distribution.

We will show a demonstration below.

The inputs for the function are simple. Values is the first argument, which are the values for which the residual life will be calculated. Values is usually a vector or a seq function. The next argument is the distribution name, ‘weibull’. The last argument are the parameters, inputted as a vector. The names, shape and scale, are specified in the vector.

For the examples in this vignette, we will use the output from a flexsurvreg() using the bc dataset.

fsr3 = flexsurvreg(Surv(recyrs, censrec) ~1, data=newbc, dist = 'weibull')
fsr3\$res
#>            est     L95%     U95%         se
#> shape 1.271519 1.153372 1.401770 0.06326773
#> scale 6.191377 5.604203 6.840071 0.31475730

We use a sequence of values from 1 to 10, counting up by .5, for this example.

residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, scale = 6.191))
#>  [1] 5.280618 5.125825 4.994025 4.878801 4.776287 4.683907 4.599837 4.522725
#>  [9] 4.451537 4.385457 4.323832 4.266128 4.211904 4.160787 4.112464 4.066666
#> [17] 4.023161 3.981746 3.942246

What if the parameter names are incorrect? The function will return an error.

residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, not_scale = 6.191))
#> Error in residlife(values = seq(1, 10, 0.5), distribution = "weibull", : incorrect parameters entered. Parameters for weibull are shape and scale

For distributions where there are different parameterizations, the function can be flexible.

For example, ‘gamma’ has both shape/rate and shape/scale parameterizations. The function works for both, as long as the parameters are correctly specified in the input.

residlife(values = seq(1,10,.5), distribution = 'gamma', parameters= c(shape= 1.272, scale = 6.191))
#>  [1] 7.498217 7.389908 7.302911 7.230487 7.168736 7.115159 7.068047 7.026172
#>  [9] 6.988622 6.954700 6.923859 6.895665 6.869766 6.845874 6.823747 6.803186
#> [17] 6.784020 6.766103 6.749311
residlife(values = seq(1,10,.5), distribution = 'gamma', parameters= c(shape= 1.272, rate = 1/6.191))
#>  [1] 7.498217 7.389908 7.302911 7.230487 7.168736 7.115159 7.068047 7.026172
#>  [9] 6.988622 6.954700 6.923859 6.895665 6.869766 6.845874 6.823747 6.803186
#> [17] 6.784020 6.766103 6.749311

Much like the reslifefsr() function, the reslife() function has the ability to show mean, median, percentile, or all three.

residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, scale = 6.191), p = .7, type = 'all')
#>        mean   median percentile
#> 1  5.280618 4.151524   6.619995
#> 2  5.125825 3.988283   6.423757
#> 3  4.994025 3.851217   6.253251
#> 4  4.878801 3.733262   6.102230
#> 5  4.776287 3.629997   5.966703
#> 6  4.683907 3.538406   5.843883
#> 7  4.599837 3.456318   5.731711
#> 8  4.522725 3.382114   5.628606
#> 9  4.451537 3.314545   5.533322
#> 10 4.385457 3.252635   5.444853
#> 11 4.323832 3.195598   5.362376
#> 12 4.266128 3.142799   5.285206
#> 13 4.211904 3.093714   5.212767
#> 14 4.160787 3.047908   5.144570
#> 15 4.112464 3.005015   5.080196
#> 16 4.066666 2.964725   5.019284
#> 17 4.023161 2.926773   4.961518
#> 18 3.981746 2.890929   4.906624
#> 19 3.942246 2.856997   4.854361

The outputs of this function can then easily be taken and used in a plot or any other type of analysis tool.

Let’s look at an example of such a plot.

Say we want to plot the mean residual life over time. First, we would create a vector for time, run it through the residlife() function with type = ‘mean’, and plot the output against time.

After extracting the parameters, we can run residlife() and generate an output vector. Note that since residlife() can take a vector input, we don’t have to use a loop.

time= seq(1,10,.5)
life = residlife(values = time, distribution = 'weibull',parameters= c(shape= 1.272, scale = 6.191))
plot(time, life, xlab = 'Years Survived', ylab = 'Mean Residual Life', main = 'MRL over Time', type="l")