Presentation
This program is designed to compute smoothed hazard and survival functions using a
non-parametric method based on the penalized likelihood. The optimum smoothing parameter
is estimated and confidence intervals of the hazard and survival functions are given.
Explanatory variables can be included in a proportional hazards model. This program
handles right-, left- and interval-censored data and left-truncated data.
To obtain a non-parametric estimator of the hazard function, one option is to penalize
the likelihood by a term that takes large values for rough functions. Here the log-likelihood
is penalized by the squared norm of the second derivative of the hazard function.
The estimator is defined non-parametrically as the function that maximizes the penalized
likelihood. The solution is then approximated on a basis of splines. The smoothing
parameter controls the balance between the fit to the data and the smoothness of the
function. The smoothing parameter can be chosen automatically by an approximation of the
cross-validation method. The spline approximation uses cubic M-splines, which are
piecewise polynomial functions combined linearly to approximate a function on an
interval. The interval is defined by knots. The penalized likelihood can also be applied
to a proportional hazards model to estimate the effects of risk factors. Time-varying
covariates cannot yet be handled.
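In symbols, the criterion described above can be sketched as follows. The notation is ours and is not taken from the program: \ell is the log-likelihood, h the hazard function, \kappa the smoothing parameter, M_j the cubic M-spline basis functions and \theta_j their coefficients. The last line is the standard relation between the hazard and the survival function.

```latex
\[
  pl(h) \;=\; \ell(h) \;-\; \kappa \int \bigl( h''(u) \bigr)^{2} \, du ,
  \qquad \kappa > 0
\]
\[
  \hat{h}(t) \;\approx\; \sum_{j} \theta_j \, M_j(t)
\]
\[
  S(t) \;=\; \exp\!\Bigl( - \int_{0}^{t} h(u) \, du \Bigr)
\]
```

A large \kappa favours smooth hazard estimates, while a small \kappa lets the estimate follow the data more closely.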
Compilation
The computer program was written in FORTRAN 77. The file phmpl.f has to be compiled
(for the MS-DOS version, an executable program is provided).
Use of the program
- Build the file phmpl.inf that contains the parameters of the
analysis (see the description below).
- Build an ASCII file <data> that contains the data (see the description
below). The name of the data file has to be specified in the parameter file phmpl.inf.
- Run the program phmpl (the files phmpl, phmpl.inf and the <data>
file have to be in the same directory).
Results
Depending on the analysis, the results are given in the following files:
- regr.res: contains the coefficient estimates of the explanatory variables. This file is
not created if explanatory variables are not included in the model.
- <fic1.ext>: contains the coordinates to plot the survival function and its
confidence intervals. The name of this file has to be given by the user.
- <fic2.ext>: contains the coordinates to plot the hazard function and its
confidence intervals. The name of this file has to be given by the user.
Parameter file phmpl.inf
This file contains the parameters of the analysis.
It consists of the following lines, in this order:
- On the first line, give the name of the data file <data>.
- On the second line, give the number of subjects in the data file (that is
the number of lines of the data file). This number is an integer (maximum 5000)
- On the third line, give the number of explanatory variables
included in the data file. It is an integer between 0 and 25.
- On the fourth and subsequent lines, give the names of the
explanatory variables (10 characters max) and a pointer that indicates if this
variable is included in the proportional hazards model:
1 = variable included, 0 = variable not included.
- On the next line, give the number of knots. It is an integer
that varies from 5 to 25.
- On the next line, indicate whether the smoothing parameter is
estimated automatically or fixed:
0 = automatic, 1 = fixed.
- On the next line, give the initial value of the smoothing
parameter. This number is strictly positive. (It must be given even if the search
is automatic.)
- On the next line, indicate whether the coordinates of the survival
and hazard functions and their confidence intervals have to be saved:
0 = results not saved, 1 = results saved.
- On the next line, give the name of the file <fic1.ext> where the
coordinates of the survival function and the confidence intervals have to be saved.
Example: surv.gr. When explanatory variables are included, the baseline survival
function is given (i.e. the function when all explanatory variables are equal to 0).
- On the last line, give the name of the file <fic2.ext> where
the coordinates of the hazard function and the confidence intervals have to be
saved. Example: hazard.gr. When explanatory variables are included, the baseline hazard
function is given (i.e. the function when all explanatory variables are equal to 0).
Example of phmpl.inf file
exdata
250
1
var1 1
12
0
1000
1
surv.gr
hazard.gr
Comments:
The main program (phmpl) reads the data from the file exdata,
which contains 250 lines (and therefore 250 subjects). One explanatory variable
is stored in the file and is included in the proportional hazards model.
There are 12 knots and the smoothing parameter is estimated automatically.
The initial value of the smoothing parameter is 1000.
The coordinates of the hazard function and its confidence intervals are saved to the file
hazard.gr, and those of the survival function and its confidence intervals to the file surv.gr.
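As an illustration, a parameter file like the example above could be generated with a short Python script. This is only a sketch: the function name build_phmpl_inf and its arguments are hypothetical, the exact spacing expected by the FORTRAN reader is not documented here (a single space between a variable name and its flag is assumed, as in the example), and the lower bound of one subject is our assumption.

```python
def build_phmpl_inf(data_file, n_subjects, variables, n_knots,
                    fixed_smoothing, smoothing_init, save_curves=True,
                    surv_file="surv.gr", hazard_file="hazard.gr"):
    """Return the text of a phmpl.inf file (hypothetical helper).

    variables is a list of (name, included) pairs, one pair per
    explanatory variable column of the data file.
    """
    # Limits below are the ones stated in the parameter file description.
    if not 1 <= n_subjects <= 5000:  # lower bound of 1 is an assumption
        raise ValueError("number of subjects must be at most 5000")
    if len(variables) > 25:
        raise ValueError("at most 25 explanatory variables")
    for name, _ in variables:
        if not 1 <= len(name) <= 10 or " " in name:
            raise ValueError(f"bad variable name: {name!r}")
    if not 5 <= n_knots <= 25:
        raise ValueError("number of knots must be between 5 and 25")
    if smoothing_init <= 0:
        raise ValueError("smoothing parameter must be strictly positive")

    lines = [data_file, str(n_subjects), str(len(variables))]
    # One line per variable: name, then 1 (included) or 0 (not included).
    lines += [f"{name} {1 if included else 0}" for name, included in variables]
    lines += [
        str(n_knots),
        "1" if fixed_smoothing else "0",   # 0 = automatic, 1 = fixed
        f"{smoothing_init:g}",             # required even in automatic mode
        "1" if save_curves else "0",       # 0 = not saved, 1 = saved
        surv_file,                         # <fic1.ext>
        hazard_file,                       # <fic2.ext>
    ]
    return "\n".join(lines) + "\n"


example = build_phmpl_inf("exdata", 250, [("var1", True)], 12,
                          fixed_smoothing=False, smoothing_init=1000)
print(example, end="")
```

Writing the returned text to a file named phmpl.inf in the same directory as the program and the data file reproduces the example shown above.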
Data file ( <data> )
This ASCII file contains the data and has the following format:
The first 3 columns define the times of truncation and censoring:
- The first column is the time at entry in the study (left truncation). It is equal to 0
if there is no truncation (the subject entered at time 0).
- The second column is the left boundary of the interval in which the outcome occurred.
- The third column is the right boundary of the interval in which the outcome occurred
(give the value -1 if the outcome is not observed).
All time values are real numbers.
Examples:
- If a subject did not experience the outcome at the end of the follow-up (case
of right censoring), the value in the second column will be the time of censoring, and the
value in the third column will be -1.
- If the subject is left censored, the value in the second column will be 0, and the
value in the third column will be the time of left censoring.
- If the time of outcome is known precisely, the same time will be given in the
second and third column.
The next columns contain the values of the explanatory variables (maximum 25).
Remarks:
- There are as many lines in the file as subjects.
- Values are separated by one or more blanks/spaces.
- The 3 times that define truncation and censoring cannot be missing.
- Missing values in explanatory variables are coded -32768.
- If a specific variable is included in the model, subjects with a missing value in this
variable will be excluded from the analysis.
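The coding rules for the three time columns and for missing covariate values can be sketched in Python as follows. The helper data_row is hypothetical, and writing the times with a decimal point is our choice (the text only says that times are real values); the values -1 and -32768 are the codes given above.

```python
MISSING = -32768  # code for a missing explanatory variable


def data_row(entry, left, right=None, covariates=()):
    """Format one subject as a line of the <data> file (hypothetical helper).

    entry      : time at entry (0 if there is no left truncation)
    left       : left boundary of the interval containing the outcome,
                 or the censoring time for a right-censored subject
    right      : right boundary, or None if the outcome is not observed
    covariates : explanatory variables, with None for a missing value
    """
    if entry is None or left is None:
        raise ValueError("the time columns cannot be missing")
    if right is not None and right < left:
        raise ValueError("right boundary must not precede left boundary")
    # An unobserved outcome (right censoring) is coded -1 in column 3.
    times = [entry, left, -1 if right is None else right]
    fields = [str(float(t)) for t in times]
    fields += [str(MISSING if x is None else x) for x in covariates]
    return " ".join(fields)


# Right-censored at 72.5, entered the study at 65, one covariate equal to 1.
print(data_row(65, 72.5, None, [1]))
# Left-censored at 3.2, covariate missing.
print(data_row(0, 0, 3.2, [None]))
# Outcome time known exactly (5.0): same value in columns 2 and 3.
print(data_row(0, 5.0, 5.0, [0]))
```

Each call returns one line of the data file, so a complete file is obtained by writing one such line per subject.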
Output files
At the user's request, two files are created whose names have to be written in the
parameter file phmpl.inf.
The first file contains the coordinates for plotting the survival function and its
confidence bands between the first and the last knots; the second contains the
coordinates for plotting the hazard function and its confidence bands. Note that if
explanatory variables are included, the functions saved are the baseline functions.
regr.res file
This file is created automatically and contains the coefficient estimates of each
explanatory variable. It is not created if explanatory variables are not included
in the model. It contains the value of the log-likelihood, the number of regression
parameters, the number of subjects and the number of events. For each variable, its name,
the value of the coefficient and its standard error, the value of the Wald test, the value
of the relative risk and its confidence interval are given.
Caution: as this file is overwritten at each execution, it is necessary to rename it
before running the program again if one wishes to keep the results.
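For readers who want to relate the quantities listed in regr.res to one another, the usual Wald construction is sketched below in Python. This is an assumption about how such quantities are conventionally computed, not a description of the program's code: the confidence level (95% here) and the form of the Wald statistic (coefficient divided by its standard error) are not stated in this documentation, and the function name wald_summary is hypothetical.

```python
import math


def wald_summary(coef, se, z_crit=1.959964):
    """Conventional Wald quantities for one regression coefficient.

    z_crit defaults to the normal quantile for a two-sided 95% interval;
    the level actually used by the program is not documented here.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    return {
        "wald_z": coef / se,                         # Wald statistic (z form)
        "relative_risk": math.exp(coef),             # exp(coefficient)
        "ci_lower": math.exp(coef - z_crit * se),    # lower limit of the RR
        "ci_upper": math.exp(coef + z_crit * se),    # upper limit of the RR
    }


summary = wald_summary(0.7, 0.25)
print(summary)
```

If the values in regr.res differ from these, the program is likely using another convention, for example the squared statistic or a different confidence level.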
<fic1.ext> file
The file contains 99 lines and 4 columns:
- The first column is the abscissa (time).
- The second one is the value of the survival function.
- The last two are the bounds of the confidence interval.
<fic2.ext> file
The file contains 99 lines and 4 columns:
- The first column is the abscissa (time).
- The second one is the value of the hazard function.
- The last two are the bounds of the confidence interval.
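The two curve files share the same four-column layout, so they can be read with the same small routine. The Python sketch below assumes whitespace-separated columns and accepts FORTRAN-style D exponents; both points are assumptions about the output format, and the name parse_curve is hypothetical. The order of the two confidence bounds is not specified above, so they are returned in file order without being labelled lower or upper.

```python
def parse_curve(text):
    """Parse the contents of a <fic1.ext> or <fic2.ext> file.

    Returns a list of (time, value, bound_1, bound_2) tuples, where value
    is the survival or hazard function and the bounds are in file order.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns")
        # FORTRAN may print exponents as 1.0D-03; Python expects 1.0E-03.
        rows.append(tuple(float(f.replace("D", "E").replace("d", "e"))
                          for f in fields))
    return rows


sample = "0.5 0.99 0.97 1.0\n1.0D0 0.95 0.90 0.98\n"
rows = parse_curve(sample)
print(rows)
```

For a real output file, pass the file contents instead of the sample string, for example `parse_curve(open("surv.gr").read())`; the result should then contain 99 rows.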
References
Joly, P, Letenneur, L, Alioum, A, Commenges, D.
PHMPL: a computer program for hazard estimation using a penalized likelihood method with
interval-censored and left-truncated data. Computer Methods and Programs in Biomedicine, 1999
Joly, P, Commenges, D, Letenneur, L.
A penalized likelihood approach for arbitrarily censored and truncated data: application to
age-specific incidence of dementia. Biometrics, 1998, 54 : 185-94.
Commenges, D, Letenneur, L, Joly, P, Alioum, A, Dartigues, JF.
Modelling age-specific risk: application to dementia. Stat Med, 1998, 17 : 1973-88.
Authors
Pierre Joly
Ahmadou Alioum
Daniel Commenges
Luc Letenneur
Inserm U897 146 rue Léo Saignat 33076 Bordeaux Cedex
France
Contact
E-mail: Daniel.Commenges@isped.u-bordeaux2.fr
We are interested in feedback but cannot guarantee support.
Licence
This program is free software; you can
redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.