Pooling and Selection of Linear Regression Models

Martijn W Heymans

2023-06-16

Introduction

With the psfmi_lm function you can pool linear regression models using
the following pooling methods: RR (Rubin’s Rules), D1, D2 and MPR (Median P Rule).
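
As a rough illustration of what pooling with Rubin's Rules involves, the base R sketch below combines one coefficient across five imputed datasets. The input numbers are made up for illustration, and the classical degrees-of-freedom formula is used here; psfmi may apply a different (adjusted) formula, so this is not a reproduction of the package internals.

```r
# Toy inputs (made up for illustration): one coefficient estimated in
# m = 5 imputed datasets, with its standard error in each.
est <- c(-0.12, -0.14, -0.13, -0.11, -0.15)
se  <- c(0.040, 0.042, 0.041, 0.039, 0.043)
m   <- length(est)

pooled_est  <- mean(est)      # pooled estimate: average over imputations
within_var  <- mean(se^2)     # average within-imputation variance
between_var <- var(est)       # between-imputation variance
total_var   <- within_var + (1 + 1 / m) * between_var
pooled_se   <- sqrt(total_var)

# Classical Rubin degrees of freedom; many packages use a small-sample
# adjustment instead, so values can differ from package output.
r  <- (1 + 1 / m) * between_var / within_var
df <- (m - 1) * (1 + 1 / r)^2

statistic <- pooled_est / pooled_se
p_value   <- 2 * pt(-abs(statistic), df)

round(c(estimate = pooled_est, std.error = pooled_se,
        df = df, p.value = p_value), 4)
```

The pooled standard error is always at least as large as the average within-imputation standard error, because the between-imputation variance adds the uncertainty caused by the missing data.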

You can also use forward or backward selection from the pooled model.
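
To make the idea of backward selection with forced predictors concrete, here is a minimal sketch on a single complete dataset (the built-in `mtcars`). The helper `backward_select` is hypothetical and uses ordinary F-tests from `drop1()`; psfmi instead bases each removal on a pooled test across the imputed datasets, so this only mirrors the control flow, not the package's implementation.

```r
# Hypothetical helper: backward selection on ONE dataset with plain F-tests.
backward_select <- function(data, outcome, predictors,
                            keep = character(0), p_crit = 0.05) {
  current <- predictors
  repeat {
    if (length(current) == 0) break
    fit   <- lm(reformulate(current, response = outcome), data = data)
    tests <- drop1(fit, test = "F")
    pvals <- setNames(tests[["Pr(>F)"]], rownames(tests))
    # Never consider the "<none>" row or predictors that are forced to stay
    pvals <- pvals[setdiff(names(pvals), c("<none>", keep))]
    if (length(pvals) == 0 || max(pvals) <= p_crit) break
    worst <- names(pvals)[which.max(pvals)]
    message("Removed: ", worst)
    current <- setdiff(current, worst)
  }
  current
}

selected <- backward_select(mtcars, outcome = "mpg",
                            predictors = c("wt", "hp", "qsec", "drat"),
                            keep = "hp", p_crit = 0.05)
selected
```

A predictor listed in `keep` stays in the model even when its own p-value exceeds the criterion, which is the same behaviour that `keep.predictors` shows in the examples of this vignette.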

This vignette shows you examples of how to apply these procedures.

Examples

Pooling without backward selection (BS) and method D1

 
 library(psfmi)
 pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
 formula = Pain ~ Gender + Smoking + 
 Function + JobControl + JobDemands + SocialSupport, 
 method="D1")
 
 pool_lm$RR_model
 #> $`Step 1 - no variables removed -`
 #>            term     estimate  std.error   statistic        df     p.value
 #> 1   (Intercept)  7.626750501 2.37470136  3.21166721 103.21605 0.001760151
 #> 2        Gender -0.549897436 0.41763180 -1.31670395  97.10997 0.191036859
 #> 3       Smoking -0.184822738 0.35459284 -0.52122524  60.23783 0.604120893
 #> 4      Function -0.126983721 0.04264394 -2.97776686  46.48759 0.004600709
 #> 5    JobControl -0.018201443 0.01884372 -0.96591573 117.54453 0.336069460
 #> 6    JobDemands  0.015351105 0.03590006  0.42760673 121.85071 0.669692207
 #> 7 SocialSupport -0.003435975 0.05621115 -0.06112622  96.21255 0.951385488
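
The columns of the pooled table are linked in the usual way: the statistic is the estimate divided by its standard error, and the p-value comes from a t distribution with the pooled degrees of freedom. The sketch below recomputes these for the Function row, using the printed values as input, and adds a 95% confidence interval. It assumes a two-sided t-test, which is consistent with the printed numbers but is not taken from the psfmi source code.

```r
# Values copied from the Function row of pool_lm$RR_model above
estimate  <- -0.126983721
std_error <- 0.04264394
df        <- 46.48759

statistic <- estimate / std_error
p_value   <- 2 * pt(-abs(statistic), df = df)

# 95% confidence interval from the same t reference distribution
ci <- estimate + c(-1, 1) * qt(0.975, df = df) * std_error

c(statistic = statistic, p.value = p_value, lower = ci[1], upper = ci[2])
```

Up to rounding of the printed inputs, the statistic and p-value should agree with the Function row of the table.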

Pooling with BS and method D1

Pooling linear regression models over 5 imputed datasets with backward selection, using a p-value criterion of 0.05 and method D1. The predictor "Smoking" is forced to stay in the model during backward selection.

 
 library(psfmi)
 pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
 formula = Pain ~ Gender + Smoking + 
 Function + JobControl + JobDemands + SocialSupport, 
 keep.predictors = "Smoking", method="D1", p.crit=0.05, 
 direction="BW")
 #> Removed at Step 1 is - SocialSupport
 #> Removed at Step 2 is - JobDemands
 #> Removed at Step 3 is - JobControl
 #> Removed at Step 4 is - Gender
 #> 
 #> Selection correctly terminated, 
 #> No more variables removed from the model
 
 pool_lm$RR_model_final
 #> $`Step 5`
 #>          term   estimate  std.error  statistic       df      p.value
 #> 1 (Intercept)  6.7504947 0.47607990 14.1793314 78.48419 2.368256e-23
 #> 2     Smoking -0.1998222 0.35556369 -0.5619871 57.20990 5.763201e-01
 #> 3    Function -0.1403048 0.04077998 -3.4405314 51.97198 1.153144e-03
 pool_lm$multiparm_final
 #> $`Step 5`
 #>           p-values D1 F-statistic
 #> Smoking  0.5753099186   0.3158295
 #> Function 0.0008794238  11.8372561
 pool_lm$predictors_out
 #>         Gender Smoking Function JobControl JobDemands SocialSupport
 #> Step 1       0       0        0          0          0             1
 #> Step 2       0       0        0          0          1             0
 #> Step 3       0       0        0          1          0             0
 #> Step 4       1       0        0          0          0             0
 #> Removed      1       0        0          1          1             1
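
The predictors_out table can be read as an indicator matrix: each step row marks the single predictor removed at that step, and the Removed row is the column sum. The sketch below rebuilds the matrix by hand from the output above (it does not call psfmi) and derives the removal order and the retained predictors from it.

```r
# Step rows copied from pool_lm$predictors_out above
steps <- matrix(c(0, 0, 0, 0, 0, 1,
                  0, 0, 0, 0, 1, 0,
                  0, 0, 0, 1, 0, 0,
                  1, 0, 0, 0, 0, 0),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste("Step", 1:4),
                                c("Gender", "Smoking", "Function",
                                  "JobControl", "JobDemands", "SocialSupport")))

removed  <- colSums(steps)                  # reproduces the "Removed" row
retained <- names(removed)[removed == 0]    # predictors left in the final model

# Name of the predictor dropped at each step, in order
removal_order <- unname(apply(steps, 1, function(row) colnames(steps)[row == 1]))

removed
retained
removal_order
```

The retained predictors are the ones that appear in pool_lm$RR_model_final, here the forced predictor Smoking plus Function.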

Pooling with BS and method MPR

Pooling linear regression models over 5 imputed datasets with backward selection, using a p-value criterion of 0.05 and method MPR. The predictor "Smoking" is forced to stay in the model during backward selection.

 
 library(psfmi)
 pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
 formula = Pain ~ Gender + Smoking + 
 Function + JobControl + JobDemands + SocialSupport, 
 keep.predictors = "Smoking", method="MPR", p.crit=0.05, 
 direction="BW")
 #> Removed at Step 1 is - SocialSupport
 #> Removed at Step 2 is - JobDemands
 #> Removed at Step 3 is - JobControl
 #> Removed at Step 4 is - Gender
 #> 
 #> Selection correctly terminated, 
 #> No more variables removed from the model
 
 pool_lm$RR_model_final
 #> $`Step 5`
 #>          term   estimate  std.error  statistic       df      p.value
 #> 1 (Intercept)  6.7504947 0.47607990 14.1793314 78.48419 2.368256e-23
 #> 2     Smoking -0.1998222 0.35556369 -0.5619871 57.20990 5.763201e-01
 #> 3    Function -0.1403048 0.04077998 -3.4405314 51.97198 1.153144e-03
 pool_lm$multiparm_final
 #> $`Step 5`
 #>           p-value MPR
 #> Smoking  0.6019832504
 #> Function 0.0001268997
 pool_lm$predictors_out
 #>         Gender Smoking Function JobControl JobDemands SocialSupport
 #> Step 1       0       0        0          0          0             1
 #> Step 2       0       0        0          0          1             0
 #> Step 3       0       0        0          1          0             0
 #> Step 4       1       0        0          0          0             0
 #> Removed      1       0        0          1          1             1
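
The Median P Rule itself is simple: test the predictor in each imputed dataset separately and take the median of the resulting p-values. The sketch below shows that step with hypothetical p-values; which per-dataset test psfmi uses to obtain them is not shown in this vignette and is left open here.

```r
# Hypothetical p-values for one predictor, one per imputed dataset
p_per_imputation <- c(0.58, 0.61, 0.60, 0.63, 0.55)

p_mpr  <- median(p_per_imputation)   # Median P Rule: pooled p-value
p_crit <- 0.05

# During backward selection a predictor is a removal candidate
# when its pooled p-value exceeds the criterion
removal_candidate <- p_mpr > p_crit

c(p_mpr = p_mpr, removal_candidate = removal_candidate)
```

Because only the p-values are combined, MPR yields no pooled test statistic, which is why the multiparm_final table above has a single p-value column for this method.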

Pooling with BS including interaction terms and method D2

Pooling linear regression models over 5 imputed datasets with BS, using a p-value criterion of 0.05 and method D2. Two interaction terms, one of which involves a categorical predictor, are part of the selection procedure.

 
 library(psfmi)
 pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
 formula = Pain ~ Gender + Smoking + 
 Function + JobControl + factor(Carrying) + 
 factor(Satisfaction) +
 factor(Carrying):Smoking + Gender:Smoking, 
 method="D2", p.crit=0.05, 
 direction="BW")
 #> Removed at Step 1 is - Function
 #> Removed at Step 2 is - Gender*Smoking
 #> Removed at Step 3 is - Smoking*factor(Carrying)
 #> Removed at Step 4 is - Smoking
 #> Removed at Step 5 is - JobControl
 #> Removed at Step 6 is - Gender
 #> 
 #> Selection correctly terminated, 
 #> No more variables removed from the model
 
 pool_lm$RR_model_final
 #> $`Step 7`
 #>                    term  estimate std.error  statistic        df      p.value
 #> 1           (Intercept) 3.8156476 0.3621860 10.5350490 129.49014 4.010553e-19
 #> 2     factor(Carrying)2 0.8759161 0.3761904  2.3283850 113.24274 2.166656e-02
 #> 3     factor(Carrying)3 1.8001704 0.3746799  4.8045553 145.87249 3.811778e-06
 #> 4 factor(Satisfaction)2 0.1385358 0.3729809  0.3714288 108.41273 7.110431e-01
 #> 5 factor(Satisfaction)3 1.4420012 0.4685986  3.0772635  74.55846 2.921715e-03
 pool_lm$multiparm_final
 #> $`Step 7`
 #>                       p-values D2 F-statistic
 #> factor(Carrying)     1.653888e-05   11.150999
 #> factor(Satisfaction) 7.789587e-03    5.372204
 pool_lm$predictors_out
 #>         Gender Smoking Function JobControl factor(Carrying)
 #> Step 1       0       0        1          0                0
 #> Step 2       0       0        0          0                0
 #> Step 3       0       0        0          0                0
 #> Step 4       0       1        0          0                0
 #> Step 5       0       0        0          1                0
 #> Step 6       1       0        0          0                0
 #> Removed      1       1        1          1                0
 #>         factor(Satisfaction) Smoking*factor(Carrying) Gender*Smoking
 #> Step 1                     0                        0              0
 #> Step 2                     0                        0              1
 #> Step 3                     0                        1              0
 #> Step 4                     0                        0              0
 #> Step 5                     0                        0              0
 #> Step 6                     0                        0              0
 #> Removed                    0                        1              1
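
Method D2 pools the Wald chi-square statistics obtained in each imputed dataset into one F-type statistic. The sketch below follows the D2 formula as it is commonly given in the multiple imputation literature; `pool_d2` is a hypothetical helper, the chi-square values are made up, and psfmi's own computation may differ in detail.

```r
# Hypothetical helper: pool m chi-square statistics, each testing k parameters
pool_d2 <- function(chisq, k) {
  m <- length(chisq)
  # Relative increase in variance, estimated from the spread of sqrt(chisq)
  r   <- (1 + 1 / m) * var(sqrt(chisq))
  d2  <- (mean(chisq) / k - (m + 1) / (m - 1) * r) / (1 + r)
  df2 <- k^(-3 / m) * (m - 1) * (1 + 1 / r)^2
  p   <- pf(d2, df1 = k, df2 = df2, lower.tail = FALSE)
  c(F = d2, df1 = k, df2 = df2, p.value = p)
}

# Made-up Wald chi-square values for a 3-level factor (k = 2 parameters),
# one value per imputed dataset
chisq <- c(21.4, 24.9, 19.8, 23.1, 22.6)
result <- pool_d2(chisq, k = 2)
result
```

D2 needs only the test statistics, not the full covariance matrices that D1 uses, which makes it easy to apply but generally less precise.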

Pooling with BS and forcing interaction terms and method D1

The same model as above, now pooled with method D1 and with two predictors, JobControl and the interaction between Smoking and Carrying, forced to stay in the model during BS.

 
 library(psfmi)
 pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
 formula = Pain ~ Gender + Smoking + 
 Function + JobControl + factor(Carrying) + factor(Satisfaction) +
 factor(Carrying):Smoking + Gender:Smoking, 
 keep.predictors = c("Smoking*Carrying", "JobControl"), method="D1", 
 p.crit=0.05, direction="BW")
 #> Removed at Step 1 is - Function
 #> Removed at Step 2 is - Gender*Smoking
 #> Removed at Step 3 is - Gender
 #> 
 #> Selection correctly terminated, 
 #> No more variables removed from the model
 
 pool_lm$RR_model_final
 #> $`Step 4`
 #>                        term    estimate  std.error  statistic        df
 #> 1               (Intercept)  5.05673749 1.11537162  4.5336796  87.35469
 #> 2                   Smoking -0.75879295 0.59455328 -1.2762405  50.60796
 #> 3                JobControl -0.01558801 0.01737846 -0.8969733  87.93658
 #> 4         factor(Carrying)2  0.51735642 0.51915658  0.9965325 132.99359
 #> 5         factor(Carrying)3  1.31863192 0.50113424  2.6312948 126.77358
 #> 6     factor(Satisfaction)2  0.11077123 0.37320587  0.2968100 117.98206
 #> 7     factor(Satisfaction)3  1.44590689 0.48154484  3.0026423  64.65768
 #> 8 Smoking:factor(Carrying)2  0.81312389 0.77812973  1.0449721  87.32029
 #> 9 Smoking:factor(Carrying)3  1.13073244 0.79050622  1.4303903 104.46161
 #>        p.value
 #> 1 1.832877e-05
 #> 2 2.076965e-01
 #> 3 3.721823e-01
 #> 4 3.208012e-01
 #> 5 9.561386e-03
 #> 6 7.671335e-01
 #> 7 3.802284e-03
 #> 8 2.989200e-01
 #> 9 1.555895e-01
 pool_lm$multiparm_final
 #> $`Step 4`
 #>                           p-values D1 F-statistic
 #> Smoking                  0.5399398352   0.7214117
 #> JobControl               0.3705279273   0.8045610
 #> factor(Carrying)         0.0001017566   5.9155402
 #> factor(Satisfaction)     0.0025183119   6.2392581
 #> Smoking*factor(Carrying) 0.3318368885   1.1068101
 pool_lm$predictors_out
 #>         Gender Smoking Function JobControl factor(Carrying)
 #> Step 1       0       0        1          0                0
 #> Step 2       0       0        0          0                0
 #> Step 3       1       0        0          0                0
 #> Removed      1       0        1          0                0
 #>         factor(Satisfaction) Smoking*factor(Carrying) Gender*Smoking
 #> Step 1                     0                        0              0
 #> Step 2                     0                        0              1
 #> Step 3                     0                        0              0
 #> Removed                    0                        0              1
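
The formula writes the interaction as factor(Carrying):Smoking, while the coefficient table lists it as Smoking:factor(Carrying). That reordering comes from base R, which labels an interaction by the order in which its variables first appear in the formula. The sketch below shows this with `terms()` only; no data and no psfmi call are needed.

```r
f <- Pain ~ Gender + Smoking + Function + JobControl +
  factor(Carrying) + factor(Satisfaction) +
  factor(Carrying):Smoking + Gender:Smoking

tt         <- terms(f)
labels     <- attr(tt, "term.labels")   # term names as R stores them
term_order <- attr(tt, "order")         # 1 = main effect, 2 = two-way interaction

data.frame(term = labels, order = term_order)
```

In the psfmi output the same interaction is printed with `*` (Smoking*factor(Carrying)), so it helps to check the printed term names before deciding what to pass to keep.predictors.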

Pooling with BS including spline coefficient and method D1

Pooling linear regression models over 5 imputed datasets with BS, using a p-value criterion of 0.05 and method D1. A restricted cubic spline of Function with 3 knots and an interaction term are part of the selection procedure.

 
 library(psfmi)
 pool_lm <- psfmi_lm(data=lbpmilr, nimp=5, impvar="Impnr", 
 formula = Pain ~ Gender + Smoking + 
 JobControl + factor(Carrying) + factor(Satisfaction) +
 factor(Carrying):Smoking + rcs(Function, 3), 
 method="D1", 
 p.crit=0.05, direction="BW")
 #> Removed at Step 1 is - rcs(Function,3)
 #> Removed at Step 2 is - Smoking*factor(Carrying)
 #> Removed at Step 3 is - Smoking
 #> Removed at Step 4 is - JobControl
 #> Removed at Step 5 is - Gender
 #> 
 #> Selection correctly terminated, 
 #> No more variables removed from the model
 
 pool_lm$RR_model_final
 #> $`Step 6`
 #>                    term  estimate std.error  statistic        df      p.value
 #> 1           (Intercept) 3.8156476 0.3621860 10.5350490 129.49014 4.010553e-19
 #> 2     factor(Carrying)2 0.8759161 0.3761904  2.3283850 113.24274 2.166656e-02
 #> 3     factor(Carrying)3 1.8001704 0.3746799  4.8045553 145.87249 3.811778e-06
 #> 4 factor(Satisfaction)2 0.1385358 0.3729809  0.3714288 108.41273 7.110431e-01
 #> 5 factor(Satisfaction)3 1.4420012 0.4685986  3.0772635  74.55846 2.921715e-03
 pool_lm$multiparm_final
 #> $`Step 6`
 #>                       p-values D1 F-statistic
 #> factor(Carrying)     1.752967e-05   11.125118
 #> factor(Satisfaction) 2.477744e-03    6.275617
 pool_lm$predictors_out
 #>         Gender Smoking JobControl factor(Carrying) factor(Satisfaction)
 #> Step 1       0       0          0                0                    0
 #> Step 2       0       0          0                0                    0
 #> Step 3       0       1          0                0                    0
 #> Step 4       0       0          1                0                    0
 #> Step 5       1       0          0                0                    0
 #> Removed      1       1          1                0                    0
 #>         rcs(Function,3) Smoking*factor(Carrying)
 #> Step 1                1                        0
 #> Step 2                0                        1
 #> Step 3                0                        0
 #> Step 4                0                        0
 #> Step 5                0                        0
 #> Removed               1                        1
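
A spline term enters the model as several coefficients that are tested and removed together, which is why rcs(Function,3) appears as a single column in predictors_out. The sketch below illustrates such a joint test on one simulated dataset. It uses `splines::ns()` with 2 degrees of freedom as a stand-in for `rcs(x, 3)`: both give two basis columns, but the bases differ and `ns()` is not what psfmi uses. All data are simulated.

```r
library(splines)  # ships with R

set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 2 + sin(x / 2) + rnorm(n, sd = 0.5)   # simulated outcome, non-linear in x

basis <- ns(x, df = 2)   # natural cubic spline: 2 basis columns
ncol(basis)

fit_without <- lm(y ~ 1)
fit_with    <- lm(y ~ ns(x, df = 2))

# One F-test on 2 degrees of freedom for the whole spline term
joint_test <- anova(fit_without, fit_with)
joint_test
```

With multiply imputed data the same grouped test has to be pooled across datasets, which is the role of the D1, D2 or MPR method in psfmi_lm.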
