Rasouliyan L, Odom D. WORTH the weight: evaluation of weighting strategies in unanchored matching-adjusted indirect comparison. Poster to be given at the Virtual ISPOR 2021 Conference; May 2021.

OBJECTIVES: The objective of this research is to evaluate weighting strategies in unanchored matching-adjusted indirect comparison (MAIC).

METHODS: Patient-level example datasets for two single-arm trials with time-to-death (TTD) endpoints were simulated. Trial 1 comprised individual patient-level data; trial 2 comprised aggregate-level data and was assumed to have presented digitizable Kaplan-Meier plots. Mean prevalences of four dichotomous patient characteristics predictive of TTD were held constant in trial 1 and varied in trial 2 to reflect differing degrees of cross-trial similarity. Trial 1 patients were reweighted using MAIC, and adjusted hazard ratios were computed using weighted Cox regression. Unweighted (UN) methods and the following weighting strategies were implemented: Signorovitch weights (SV), inverse probability weights (IP), stabilized weights (ST), overlap weights (OV), and matching weights (MA). Each scenario was simulated 1,000 times. Performance metrics used to assess the weighting strategies were the mean percentage error, the mean absolute percentage error, and the coverage probability of the 95% confidence interval.
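
As an illustration of the reweighting step, the following is a minimal sketch of Signorovitch-style MAIC weights as they are commonly estimated by the method of moments: trial 1 covariates are centered on the trial 2 aggregate means, and the weights exp(a·z) are found by minimizing the sum of exp(a·z) over patients. It is not the authors' implementation. The function names (`solve_linear`, `maic_weights`), the Newton solver, the toy data, and the target prevalences are all assumptions made for this example, and only two covariates are used for brevity where the study used four.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def maic_weights(ipd, target_means, tol=1e-10, max_iter=100):
    """Method-of-moments MAIC weights (hypothetical helper, not from the abstract).

    ipd: list of covariate rows for trial 1 patients.
    target_means: aggregate covariate means reported for trial 2.
    """
    p = len(target_means)
    # Center trial 1 covariates on the trial 2 means.
    z = [[row[j] - target_means[j] for j in range(p)] for row in ipd]
    a = [0.0] * p
    for _ in range(max_iter):
        w = [math.exp(sum(a[j] * zi[j] for j in range(p))) for zi in z]
        # Gradient of Q(a) = sum_i exp(a . z_i); it is zero when the
        # weighted trial 1 means equal the trial 2 means.
        grad = [sum(wi * zi[j] for wi, zi in zip(w, z)) for j in range(p)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(wi * zi[j] * zi[k] for wi, zi in zip(w, z))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(hess, [-g for g in grad])  # Newton step
        a = [aj + sj for aj, sj in zip(a, step)]
    return [math.exp(sum(a[j] * zi[j] for j in range(p))) for zi in z]

# Toy trial 1 data: two dichotomous characteristics, 8 patients (made up).
ipd = [[1, 1], [1, 1], [1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
target = [0.7, 0.4]  # assumed trial 2 prevalences, for illustration only

weights = maic_weights(ipd, target)
# Effective sample size after weighting.
ess = sum(weights) ** 2 / sum(w * w for w in weights)
```

After weighting, the weighted prevalence of each characteristic in trial 1 matches the trial 2 target, and the effective sample size `ess` indicates how much information is lost as the trials become less similar. The resulting weights would then be passed to a weighted Cox regression, which is not shown here.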

RESULTS: In the scenario of identical cross-trial patient characteristics, UN and weighting strategies performed similarly, all exhibiting very little bias. UN estimates showed increasing bias as cross-trial patient characteristics became more dissimilar. SV exhibited the most favorable performance metrics across all scenarios. OV and MA performed similarly to SV when trial 1 characteristics were more predictive of TTD but were biased when they were less predictive. IP and ST estimates were biased in scenarios with moderate to large cross-trial differences.

CONCLUSIONS: In these simulations, SV demonstrated superior properties across all scenarios compared with the other weighting methods. The lackluster performance of OV and MA in some scenarios, despite both tending to perform well in traditional propensity score analyses, may be due to the inability to trim patients in the aggregate-level data trial. Further research is needed to understand the behavior of these weighting strategies in different scenarios.
