Comparison of propensity score (PS) weighting and trimming strategies to reduce variance and bias of treatment effect estimates: a simulation study

Stürmer T, Rothman KJ, Ellis AR, Wyss R, Conover M, Lunt M, Glynn RJ. Comparison of propensity score (PS) weighting and trimming strategies to reduce variance and bias of treatment effect estimates: a simulation study. Poster presented at the 34th ICPE International Conference on Pharmacoepidemiology & Therapeutic Risk Management; August 24, 2018. Prague, Czech Republic. [abstract] Pharmacoepidemiol Drug Saf. 2018 Aug; 27(S2):53. doi: 10.1002/pds.4629

BACKGROUND: To reduce the variance of PS‐based treatment effect estimates, biostatisticians have proposed various covariate‐balancing weights and trimming the tails of the PS distribution. In parallel, pharmacoepidemiologists have proposed PS trimming to reduce confounding in the tails of the PS distribution where unmeasured factors can cause patients to be treated contrary to prediction.

OBJECTIVES: To compare the performance of PS weighting and trimming methods with respect to both variance and bias.

METHODS: We simulated cohort studies with a binary intended treatment T as a function of 4 measured covariates (confounders and instruments, dichotomous and continuous). We mimicked withheld treatment and last‐resort treatment by adding two “unmeasured” dichotomous factors that caused treatment assignment to change for some patients in both tails of the PS distribution. The number of outcomes Y was simulated as a Poisson function of T, all confounders (including unmeasured), and 2 risk factors. We estimated the PS based on measured covariates using logistic regression and trimmed the tails of the PS distribution using 3 strategies (Crump et al., Biometrika 2009; Stürmer et al., AJE 2010; and Walker et al., Comp Eff Res 2013). After trimming, we re‐estimated the PS and then implemented various PS weights to estimate the treatment effect (rate ratio) in the population (ATE), the treated (ATT), the untreated (ATU), the overlap (ATO), and the matched (ATM) (Li et al., JASA 2017).
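
The trimming and weighting steps can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`trim_mask`, `ps_weights`) are hypothetical, and the cut-points are assumptions, since the abstract does not report the values used: 0.1 and 0.9 for Crump trimming, the 5th percentile of the PS among the treated and the 95th percentile among the untreated for Stürmer trimming, and a preference score between 0.3 and 0.7 for Walker trimming. The weights use the balancing-weight form in which a tilting function h(PS) is divided by PS for the treated and by 1 − PS for the untreated.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def trim_mask(ps, treated, method, prevalence=None):
    """Return a boolean mask of patients kept after trimming.

    All cut-points below are illustrative assumptions, not values
    taken from the abstract.
    """
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated).astype(bool)
    if method == "crump":
        # Symmetric trimming on the PS scale (assumed 0.1 / 0.9 rule of thumb).
        return (ps >= 0.1) & (ps <= 0.9)
    if method == "sturmer":
        # Asymmetric trimming: lower bound from the treated,
        # upper bound from the untreated (assumed 5th / 95th percentiles).
        lower = np.percentile(ps[treated], 5)
        upper = np.percentile(ps[~treated], 95)
        return (ps >= lower) & (ps <= upper)
    if method == "walker":
        # Preference score: PS rescaled for treatment prevalence on the logit scale.
        prev = treated.mean() if prevalence is None else prevalence
        pref = 1.0 / (1.0 + np.exp(-(logit(ps) - logit(prev))))
        return (pref >= 0.3) & (pref <= 0.7)
    raise ValueError(f"unknown trimming method: {method}")

def ps_weights(ps, treated, estimand):
    """Return balancing weights for the requested target population."""
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated).astype(bool)
    # Tilting function h(PS) that defines each target population.
    tilt = {
        "ATE": np.ones_like(ps),
        "ATT": ps,
        "ATU": 1.0 - ps,
        "ATO": ps * (1.0 - ps),
        "ATM": np.minimum(ps, 1.0 - ps),
    }[estimand]
    # Treated patients get h/PS, untreated patients get h/(1 - PS).
    return np.where(treated, tilt / ps, tilt / (1.0 - ps))
```

In a workflow like the one described, the mask would be applied first, the PS re-estimated in the trimmed cohort, and `ps_weights` then computed from the re-estimated PS.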

RESULTS: With no unmeasured confounding and 20% treatment prevalence, relative efficiency (RE) versus the untrimmed ATE ranged from 61% (ATU, Walker) to 123% (ATO, untrimmed). Crump trimming improved efficiency for ATE (RE = 112%), but not for ATO (RE = 120% vs 123%). With unmeasured confounding leading to treatment withheld, ATO and ATM were more biased than ATE, and only Stürmer and Walker trimming reduced bias for all estimates. Mean squared error (MSE) was lowest for Stürmer trimming. With unmeasured confounding leading to last‐resort treatment, ATT, ATO, and ATM were less biased than ATE, and all trimming methods reduced bias. MSE was lowest for Crump trimming.
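
For readers less familiar with these performance measures, the sketch below shows one common way to compute them from simulation replicates. The abstract does not give the definitions used, so the following are assumptions: estimates are on the log rate ratio scale, RE is the variance of the reference estimator (here, the untrimmed ATE) divided by the variance of the estimator of interest, expressed as a percentage, and MSE is the mean squared deviation from the true value. The function name `performance` is hypothetical.

```python
import numpy as np

def performance(log_rr_hat, log_rr_true, log_rr_reference):
    """Summarize simulation replicates of a log rate ratio estimator.

    log_rr_hat:       estimates from the estimator of interest
    log_rr_true:      true log rate ratio for the target population
    log_rr_reference: estimates from the reference estimator
                      (assumed here to be the untrimmed ATE)
    """
    est = np.asarray(log_rr_hat, dtype=float)
    ref = np.asarray(log_rr_reference, dtype=float)
    bias = est.mean() - log_rr_true
    variance = est.var(ddof=1)
    # MSE combines squared bias and variance in a single measure.
    mse = np.mean((est - log_rr_true) ** 2)
    # RE above 100% means lower variance than the reference estimator.
    re_percent = 100.0 * ref.var(ddof=1) / variance
    return {"bias": bias, "variance": variance, "mse": mse, "re_percent": re_percent}
```

Because each weighting scheme targets a different population, the true value passed as `log_rr_true` would need to match the estimand being evaluated.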

CONCLUSIONS: ATO and Crump trimming reduce the variance of PS‐weighted treatment effect estimates compared with ATE but change the causal interpretation. In settings where unmeasured confounding (eg, frailty) may lead physicians to withhold treatment, only Stürmer and Walker trimming reduce bias consistently in the scenarios assessed.
