Institut de Santé Publique,
d'Épidémiologie et de Développement
 

Centre Inserm U897
Equipe Biostatistique

PHMPL
A computer program for hazard estimation using a penalized likelihood method.

A program for computing smooth hazard and survival functions by a non-parametric method based on the maximization of a penalized likelihood.

Presentation

This program computes smoothed hazard and survival functions using a non-parametric method based on the penalized likelihood. The optimal smoothing parameter is estimated, and confidence intervals for the hazard and survival functions are given. Explanatory variables can be included in a proportional hazards model. The program handles right-, left- and interval-censored data as well as left-truncated data.

One way to obtain a non-parametric estimator of the hazard function is to penalize the likelihood by a term that takes large values for rough functions. Here the log-likelihood is penalized by the squared norm of the second derivative of the hazard function.
The estimator is defined non-parametrically as the function that maximizes the penalized likelihood. The solution is then approximated on a basis of splines. The smoothing parameter controls the balance between fit to the data and smoothness of the function. An approximate cross-validation criterion is used to choose the smoothing parameter automatically. The approximation uses cubic M-splines, which are piecewise polynomial functions combined linearly to approximate a function on an interval; the interval is divided by knots. The penalized likelihood can also be applied to a proportional hazards model to estimate the effects of risk factors. Time-varying covariates cannot yet be handled.
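
In symbols, the criterion described above can be written as follows. The notation is chosen here for illustration and is not taken from the program itself: λ is the hazard function, l the log-likelihood, κ the smoothing parameter, M_j the cubic M-spline basis functions and θ_j their coefficients.

```latex
pl(\lambda) = l(\lambda) - \kappa \int \bigl(\lambda''(u)\bigr)^{2}\,du ,
\qquad
\tilde{\lambda}(t) = \sum_{j=1}^{m} \theta_j \, M_j(t)
```

A larger value of κ gives more weight to the roughness penalty and therefore a smoother estimate; a smaller value lets the estimate follow the data more closely.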


Downloadable files

Note: whenever a file name can be chosen by the user, it is written as <name-of-file>.

The following files are downloadable:

  • Unix version: the compressed archive phmpl.tar.gz contains the following files:
    doc phpml.txt documentation file (ASCII text)
    phmpl.f FORTRAN source file
    phmpl.inf example parameter file
    exdata example data file
  • MS-DOS version: the compressed archive PHMPL.zip contains the same files as the Unix version, plus:
    phmpl.exe executable version of the FORTRAN program (MS-DOS program)

Compilation

The program is written in FORTRAN 77. The source file phmpl.f must be compiled before use (the MS-DOS version includes a ready-made executable).

Use of the program

  • Build the file phmpl.inf, which contains the parameters of the analysis (see the description below).
  • Build an ASCII file <data> that contains the data (see the description below). The name of the data file must be specified in the parameter file phmpl.inf.
  • Run the program phmpl (the program phmpl, the file phmpl.inf and the <data> file must be in the same directory).
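
Because the three files must sit in the same directory, a quick check before launching the program can save a failed run. The following Python sketch is not part of PHMPL; the function name `missing_run_files` is hypothetical, and the executable is assumed to be called `phmpl` (pass `program="phmpl.exe"` for the MS-DOS version).

```python
from pathlib import Path


def missing_run_files(run_dir, program="phmpl", param_file="phmpl.inf"):
    """Return the names of the files PHMPL needs that are absent from run_dir."""
    run_dir = Path(run_dir)
    missing = []

    # The executable and the parameter file must both be present.
    for name in (program, param_file):
        if not (run_dir / name).is_file():
            missing.append(name)

    # The first line of phmpl.inf holds the name of the data file.
    param_path = run_dir / param_file
    if param_path.is_file():
        lines = param_path.read_text().splitlines()
        data_name = lines[0].strip() if lines else ""
        if not data_name:
            missing.append("<data file name on line 1 of %s>" % param_file)
        elif not (run_dir / data_name).is_file():
            missing.append(data_name)

    return missing
```

An empty list means the directory contains everything the program expects to find.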

Results

Depending on the analysis, the results are written to the following files:

  • regr.res: contains the coefficient estimates of the explanatory variables. This file is not created if explanatory variables are not included in the model.
  • <fic1.ext>: contains the coordinates to plot the  survival function and its confidence intervals. The name of this file has to be given by the user.
  • <fic2.ext>: contains the coordinates to plot the hazard function and its confidence intervals. The name of this file has to be given by the user.

Parameter file phmpl.inf

This file contains the parameters of the analysis. It consists of the following lines:

  • The first line gives the name of the data file <data>.
  • The second line gives the number of subjects in the data file (that is, the number of lines in the data file). This number is an integer (maximum 5000).
  • The third line gives the number of explanatory variables in the data file. It is an integer between 0 and 25.
  • The fourth and subsequent lines give the names of the explanatory variables (10 characters max), each followed by a flag indicating whether the variable is included in the proportional hazards model:
    1 = variable included, 0 = variable not included.
  • The next line gives the number of knots, an integer between 5 and 25.
  • The next line indicates whether the smoothing parameter is estimated automatically or fixed:
    0 = automatic, 1 = fixed.
  • The next line gives the initial value of the smoothing parameter. This number must be strictly positive and must be supplied even if the search is automatic.
  • The next line indicates whether the coordinates of the survival and hazard functions and their confidence intervals are to be saved:
    0 = results not saved, 1 = results saved.
  • The next line gives the name of the file <fic1.ext> in which the coordinates of the survival function and its confidence intervals are saved. Example: surv.gr. When explanatory variables are included, the baseline survival function is given (i.e. the function when all explanatory variables are equal to 0).
  • The last line gives the name of the file <fic2.ext> in which the coordinates of the hazard function and its confidence intervals are saved. Example: hazard.gr. When explanatory variables are included, the baseline hazard function is given (i.e. the function when all explanatory variables are equal to 0).

Example of phmpl.inf file

exdata
250
1
var1 1
12
0
1000
1
surv.gr
hazard.gr

Comments: the main program (phmpl) reads the data from the file exdata, which contains 250 lines (and therefore 250 subjects). One explanatory variable is stored in the file and is included in the proportional hazards model.
There are 12 knots and the smoothing parameter is estimated automatically.
The initial value of the smoothing parameter is 1000.
The coordinates of the hazard function and its confidence intervals are saved to the file hazard.gr, and those of the survival function and its confidence intervals to the file surv.gr.
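
The parameter file can also be generated by a script, which makes it easier to respect the line order and the limits listed above. The Python sketch below is not part of PHMPL: the function `build_phmpl_inf` and its argument names are hypothetical, and it assumes that a variable name and its flag are separated by a single space, as in the example.

```python
def build_phmpl_inf(data_file, n_subjects, variables, n_knots,
                    fixed_smoothing, smoothing_init, save_curves,
                    surv_file, hazard_file):
    """Return the text of a phmpl.inf parameter file.

    variables is a list of (name, included) pairs, one per explanatory
    variable stored in the data file.
    """
    # Limits taken from the description of the parameter file.
    if not 1 <= n_subjects <= 5000:
        raise ValueError("number of subjects must be between 1 and 5000")
    if len(variables) > 25:
        raise ValueError("at most 25 explanatory variables are allowed")
    if not 5 <= n_knots <= 25:
        raise ValueError("number of knots must be between 5 and 25")
    if smoothing_init <= 0:
        raise ValueError("smoothing parameter must be strictly positive")

    lines = [data_file, str(n_subjects), str(len(variables))]
    for name, included in variables:
        if len(name) > 10:
            raise ValueError("variable name longer than 10 characters: " + name)
        lines.append("%s %d" % (name, 1 if included else 0))
    lines.append(str(n_knots))
    lines.append("1" if fixed_smoothing else "0")   # 0 = automatic, 1 = fixed
    lines.append(str(smoothing_init))               # needed even if automatic
    lines.append("1" if save_curves else "0")       # 0 = not saved, 1 = saved
    lines.append(surv_file)
    lines.append(hazard_file)
    return "\n".join(lines) + "\n"
```

Called as `build_phmpl_inf("exdata", 250, [("var1", True)], 12, False, 1000, True, "surv.gr", "hazard.gr")`, it returns a ten-line parameter file for the analysis described in the comments above.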

Data file ( <data> )

This ASCII file contains the data and has the following format:

- The first 3 columns define the times of truncation and censoring:
  • The first column is the time at entry in the study (left truncation). It is equal to 0 if there is no truncation (the subject entered at time 0).
  • The second column is the left boundary of the interval in which the outcome occurred.
  • The third column is the right boundary of the interval in which the outcome occurred (give the value -1 if the outcome is not observed).

All time values are real numbers.

Examples: 

  • If a subject has not experienced the outcome by the end of follow-up (right censoring), the value in the second column is the time of censoring and the value in the third column is -1.
  • If the subject is left censored, the value in the second column is 0 and the value in the third column is the time of left censoring.
  • If the time of the outcome is known exactly, the same time is given in the second and third columns.
- The next columns contain the values of the explanatory variables (maximum 25).
- Remarks: 
  • There are as many lines in the file as subjects.
  • Variables are separated by one or more blanks/spaces.
  • The 3 times that define truncation and censoring cannot be missing. Missing values of explanatory variables are coded -32768.
  • If a variable is included in the model, subjects with a missing value for this variable are excluded from the analysis.
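
The coding rules above can be summarized in a short helper that turns one subject into one line of the data file. This Python sketch is only an illustration: the function `data_line` is hypothetical, and it assumes the program accepts times written with a decimal point and values separated by single spaces.

```python
MISSING = -32768  # code for a missing explanatory variable


def data_line(entry, left, right, covariates=()):
    """Format one subject as a line of the PHMPL data file.

    entry      -- time at entry in the study (0 if no left truncation)
    left       -- left boundary of the interval containing the outcome
    right      -- right boundary, or None if the outcome was not observed
    covariates -- explanatory variable values, None for a missing value
    """
    if entry is None or left is None:
        raise ValueError("truncation and censoring times cannot be missing")
    third = -1 if right is None else right       # -1 marks right censoring
    values = [float(entry), float(left), float(third)]
    values += [MISSING if v is None else v for v in covariates]
    return " ".join(str(v) for v in values)


# Right censoring at time 5.2, one covariate equal to 1.
right_censored = data_line(0, 5.2, None, [1])
# Left censoring at time 70: the interval is (0, 70).
left_censored = data_line(0, 0, 70, [None, 2.5])
# Outcome observed exactly at time 3: both boundaries are equal.
exact = data_line(0, 3, 3)
```

Writing one such line per subject, in order, yields a file with as many lines as subjects.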

Output files

At the user's request, two files are created; their names must be given in the parameter file phmpl.inf. The first file contains the coordinates for plotting the survival function and its confidence bands between the first and last knots; the second contains the coordinates for plotting the hazard function and its confidence bands. Note that if explanatory variables are included, the functions saved are the baseline functions.

- regr.res file

This file is created automatically and contains the coefficient estimates of the explanatory variables. It is not created if no explanatory variables are included in the model. It gives the value of the log-likelihood, the number of regression parameters, the number of subjects and the number of events. For each variable, it gives the name, the coefficient and its standard error, the value of the Wald test, and the relative risk with its confidence interval.
Caution: this file is overwritten at each execution, so rename it before running the program again if you wish to keep the results.
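
For readers who want to check the quantities reported for each variable, the sketch below applies the usual formulas for a proportional hazards model. These are assumptions rather than a description of the program's internals: the Wald statistic is taken as the coefficient divided by its standard error (the program may report its square instead), and the confidence level is assumed to be 95%. The function name `summarize_coefficient` is hypothetical.

```python
import math


def summarize_coefficient(beta, se, z_crit=1.96):
    """Derive the usual summary quantities from a coefficient and its standard error.

    z_crit=1.96 corresponds to an (assumed) 95% confidence level.
    """
    return {
        "wald_z": beta / se,                       # assumed form of the Wald test
        "relative_risk": math.exp(beta),           # hazard ratio for a one-unit increase
        "ci_lower": math.exp(beta - z_crit * se),  # lower bound of the relative risk
        "ci_upper": math.exp(beta + z_crit * se),  # upper bound of the relative risk
    }
```

A coefficient of 0 gives a relative risk of 1, that is, no estimated effect of the variable on the hazard.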
 

- <fic1.ext> file

The file contains 99 lines and 4 columns:

  • The first column is the abscissa (time).
  • The second one is the value of the survival function.
  • The third and fourth columns give the bounds of the confidence interval.

- <fic2.ext> file

It contains 99 lines and 4 columns:

  • The first column is the abscissa (time).
  • The second one is the value of the hazard function.
  • The third and fourth columns give the bounds of the confidence interval.
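
Both output files share the same four-column layout, so a single reader is enough to load them for plotting. The Python sketch below is not part of PHMPL; the function `read_curve` is hypothetical. It assumes the columns are separated by whitespace and accepts FORTRAN-style exponents written with the letter D (for example 1.0D-03), in case the program writes them that way.

```python
def read_curve(path):
    """Read a PHMPL curve file into a list of (time, value, bound1, bound2) tuples."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 4:
                raise ValueError("expected 4 columns, got %d" % len(fields))
            # Convert FORTRAN 'D' exponents to the 'E' form Python understands.
            rows.append(tuple(float(f.replace("D", "E").replace("d", "e"))
                              for f in fields))
    return rows
```

The first element of each tuple is the time and the second the survival or hazard value; the order of the two confidence bounds should be checked against an actual output file.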

 


References

Joly, P, Letenneur, L, Alioum, A, Commenges, D.
PHMPL: a computer program for hazard estimation using a penalized likelihood method with interval-censored and left-truncated data.
Computer Methods and Programs in Biomedicine, 1999

Joly, P, Commenges, D, Letenneur, L.
A penalized likelihood approach for arbitrarily censored and truncated data: application to age-specific incidence of dementia.
Biometrics, 1998, 54 : 185-94.

Commenges, D, Letenneur, L, Joly, P, Alioum, A, Dartigues, JF.
Modelling age-specific risk: application to dementia.
Stat Med, 1998, 17 : 1973-88.

Authors

Pierre Joly
Ahmadou Alioum
Daniel Commenges
Luc Letenneur
Inserm U897
146 rue Léo Saignat
33076 Bordeaux Cedex
France

Contact

E-mail: Daniel.Commenges@isped.u-bordeaux2.fr
We are interested in feedback but cannot guarantee support.

Licence

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

