Journal of Minimally Invasive Surgery 2024; 27(2): 55-71
Published online June 15, 2024
https://doi.org/10.7602/jmis.2024.27.2.55
© The Korean Society of Endo-Laparoscopic & Robotic Surgery
Correspondence to : Jungsoo Gim
Department of Biomedical Science, Chosun University, 309 Pilmun-daero, Dong-gu, Gwangju 61452, Korea
E-mail: jgim@chosun.ac.kr
https://orcid.org/0000-0002-6454-9751
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Recently, there has been considerable progress in developing new technologies and equipment for the medical field, including minimally invasive surgeries. Evaluating the effectiveness of these treatments requires study designs like randomized controlled trials. However, due to the nature of certain treatments, randomization is not always feasible, leading to the use of observational studies. The effect size estimated from observational studies is subject to selection bias caused by confounders. One method to reduce this bias is propensity scoring. This study aimed to introduce a propensity score matching process between two groups using a practical example with R. Additionally, Rex, an Excel add-in graphical user interface statistical program, is provided for researchers unfamiliar with R programming. Further techniques, such as matching with three or more groups, propensity score weighting and stratification, and imputation of missing values, are summarized to offer approaches for more complex studies not covered in this tutorial.
Keywords Propensity score, Matching, Selection bias, R, Rex
The demand for minimally invasive surgeries has grown since they minimize the surgical site and bleeding, leading to faster recovery times. Consequently, new equipment and technologies incorporating minimally invasive treatments are being developed for various conditions, such as degenerative brain disorders and spinal diseases in older adults and brain tumor removal in infants [1–4]. Accurate evaluations of these new treatments are crucial for several reasons: (1) they clarify the benefits of the new technologies and the potential side effects; (2) they update the guidelines to offer better strategies in clinical practice; and (3) they ensure efficient allocation of medical resources [5,6]. Additionally, future medical research can focus on effective technologies and encourage investment in promising ones [7].
Selecting appropriate outcome indicators and comparable controls is essential to obtain precise treatment effects. If unrelated outcomes are chosen, or if the control group differs from the treatment group, it becomes challenging to determine whether the effects are due to the treatment or to other factors. Selecting appropriate outcomes is relatively straightforward, but ensuring that the characteristics of the treatment and control groups are well balanced is generally problematic. In observational studies, the characteristics of the treatment and control groups usually differ. Fortunately, various statistical approaches can strengthen causal inference by controlling for confounders (Fig. 1) [8]. One such method is the propensity score.
Propensity scores can be applied in several ways, such as matching, weighting, and stratification. Propensity score-based analyses perform efficient evaluations even with a small sample size [9,10]. Notably, they ensure the validity and generalizability of the result in causal inference, whereas conventional analyses infer only association, not causality [11].
This study aimed to provide a practical guide to propensity score analyses, focusing on matching. Unlike previous tutorials on propensity score matching (PSM), this study covers traditional and machine learning-based matching with detailed descriptions using the recently updated R packages. It also offers an easy way of performing PSM with an Excel add-in statistical program called Rex, providing point-and-click analysis for researchers unfamiliar with R programming. To this end, we explained the concept of propensity score and outlined the step-by-step process of PSM, along with available statistical programs. As a practical example, we used PSM to compare the surgical outcomes between patients who underwent noninvasive high-intensity focused ultrasound (HIFU) brain surgery and conventional brain resection surgery. Finally, we summarize additional considerations not covered in this study.
When comparing the outcomes of treatment and control groups, differing characteristics, aside from the treatment itself, can make it difficult to determine whether the differences in outcomes are due to the treatment or other factors. This leads to a biased estimate of the treatment effect, with the other influencing factors referred to as ‘confounders.’
One way to obtain an unbiased effect size is through randomized controlled trials (RCTs), which randomly assign subjects to the treatment or control groups. However, when random assignment is not feasible due to the nature of the study, observational studies are necessary. In this case, propensity scores are used to control for confounders in a manner similar to random assignment.
To better understand propensity scores, consider a balanced two-arm RCT. The concept of random assignment in RCTs can be expressed mathematically as P(Xi = 1 | C1i, C2i, ···, Cki) = 0.5, where Xi = 1 means that the ith subject is assigned to the treatment group. This equation indicates that the probability of a subject with k characteristics (C1i, C2i, ···, Cki) being assigned to the treatment group is 0.5, leaving a probability of 0.5 of being assigned to the control group. In other words, the assignment probabilities are equal between the two groups regardless of a subject's characteristics. It has been shown that, under such assignment, the k characteristics (C1i, C2i, ···, Cki) of subjects in the two groups are similar.
When adopting this concept to observational studies, the propensity score estimates the probability that the ith subject is assigned to the treatment group using covariates. PSM pairs subjects from both groups with similar scores to emulate the effects of random assignment.
Propensity score analyses can be broadly classified into the following six steps (Fig. 2).
1. Set variables: Define the causal variable (treatment), outcome variable, and confounding variables.
2. Estimate propensity scores: Choose the optimal model to estimate the propensity scores based on the defined variables.
3. Perform analyses: Use the estimated propensity scores to conduct analyses, such as matching, weighting, and stratification.
4. Check balance: Assess the balance of propensity scores and covariates between the treatment and control groups.
5. Evaluate treatment effect: From the balanced data, evaluate the treatment effect and conduct a sensitivity analysis on the effect size.
6. Iterate if necessary: If covariate imbalance is detected after propensity score analysis, or if the treatment effect estimates are not robust on the sensitivity analysis, redo the process starting from variable selection or propensity score estimation (PSE).
The techniques for propensity score analyses are as follows: (1) PSM, which matches subjects with similar propensity scores from the treatment and control groups; (2) propensity score weighting (PSW), which calculates the inverse-treatment probability weights based on the propensity scores to adjust the distribution of covariates into two groups for comparability; and (3) propensity score stratification (PSS), which stratifies subjects from both groups into similar propensity score strata, then calculates and integrates the treatment effects estimated from each stratum [11].
This study focused on PSM, the most frequently used technique among the various propensity score analysis techniques. Generalized linear models (GLMs) and their extension with regularization terms were introduced for propensity score estimation. Machine learning models were also added, including a generalized boosted model (GBM), random forest, Bayesian additive regression trees (BART), and a neural network model. In total, four algorithms were used for matching: nearest neighbor matching, optimal matching, full matching, and genetic matching.
Treatment effects can be estimated if covariate balance is achieved based on the propensity scores. The fundamental approach in causal inference is to estimate the average treatment effect using weighted means. However, due to a lack of consensus on computing the standard error of the estimates, bootstrapping or non-parametric tests are used to estimate the 95% confidence intervals or determine statistical significance [11,12]. Alternatively, a weighted generalized linear regression model can be fitted to estimate the treatment effect [13].
In practice, the most widely used method is to treat the matched data as a regular dataset and apply conventional statistical analyses. In this case, the matched data can be regarded as pooled data, and the analyses can be conducted under the independence assumption. Alternatively, pairwise or conditional analyses can be performed based on the matched pairs [14]. Since there is still debate about the suitable approach, researchers are advised to follow the methods commonly used in their field. The available methods for each case are shown in Fig. 3.
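As a minimal sketch of the pooled-data approach in R, assuming a fitted 'matchit()' object named 'matching1' (as in the worked example later in this tutorial) and the outcome and treatment variable names used there; the use of cluster-robust standard errors on the matched pairs is one common choice, not the only valid one:

```r
library(MatchIt)
library(lmtest)     # coeftest()
library(sandwich)   # vcovCL()

md <- match.data(matching1)   # matched data with 'weights' and 'subclass'

# Pooled-data analysis: weighted regression of the outcome on treatment,
# with standard errors clustered on the matched pairs (subclass)
fit <- lm(post_sx ~ treat, data = md, weights = weights)
coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
```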
Various statistical programs, including R, STATA, SAS, and Python, can analyze propensity scores. Each program possesses unique features and strengths. The selection of a specific program hinges on the characteristics of the data and the requirements of the analysis. This study introduces practical applications using R and Rex. R is an open-source statistical software freely available and continually enhanced by its users. R packages for propensity score analyses include MatchIt, cobalt, WeightIt, twang, and others [15–18]. This study employed the MatchIt and cobalt packages for PSM. In contrast, Rex is an Excel add-in program that offers various R-based statistical analysis modules in a graphical user interface (GUI) format [19]. Rex includes complimentary modules for descriptive statistics, group comparison, categorical data analysis, and survival analysis. However, certain modules, such as PSM and structural equation modeling, require purchase (a 2-week free trial is available).
This section describes an example of propensity score using R. Implementation using Rex is presented in Supplementary Material 1.
HIFU is a noninvasive surgical technique used to treat target areas in the brain without the need for an incision, offering shorter recovery times and reduced infection risks compared with conventional deep brain surgery (DBS). In this example, PSM was conducted between the HIFU treatment group (‘treat’ = 1) and the DBS control group (‘treat’ = 0). Covariates, including age (in years), sex (1 = male, 2 = female), presence of comorbidities (0 = no, 1 = yes; referred to as “diz”), and the preoperative symptom score (in points; referred to as “pre_sx”), were utilized. The objective was to estimate the effect of HIFU treatment on the postoperative symptom score. Data for this analysis were generated using the R code provided in Supplementary Material 2.
Categorical covariates were converted to factors after importing the data saved as a CSV file. Note that entering categorical covariates as dummy variables (one fewer than the number of levels) can yield different propensity scores or matching outcomes, so converting them to factors in R is recommended instead. The treatment variable is an exception: the machine learning models require it to be a numeric variable coded as 0 and 1, so it should not be converted to a factor even though it is categorical.
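A minimal sketch of this preprocessing step (the CSV file name is hypothetical; variable names follow the worked example):

```r
# Hypothetical file name; replace with the actual path to the example data
dat <- read.csv("hifu_example.csv")

# Convert categorical covariates to factors rather than dummy variables
dat$sex <- factor(dat$sex)   # 1 = male, 2 = female
dat$diz <- factor(dat$diz)   # 0 = no, 1 = yes

# The treatment variable 'treat' stays numeric (0/1) so that machine
# learning distance models can be fitted later
str(dat)
```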
The ‘matchit()’ function available in the MatchIt package (version 4.5.5) facilitates PSE and PSM. Table 1 outlines the primary function arguments for PSM and PSE for two treatment groups.
Table 1 . Primary arguments in MatchIt::matchit() for PSM
For use | Argument | Format (with default values) | Details |
---|---|---|---|
PSE or PSM | Formulaa) | formula = treatment ~ covariate1 + covariate2 + … | The model representing the relationship between the treatment and covariates for propensity score estimation |
Only the treatment variable can be entered for matching based on the pre-calculated propensity score vector | |||
PSE or PSM | data | data = dataframe | A data frame containing the variables specified in the formula |
PSE or PSM | distance | distance = ‘glm’ | The method for calculating the propensity score used for matching |
PSE or PSM | link | link = ‘logit’ | Specify the link function when the distance is set to ‘glm’, ‘ridge’, ‘lasso’, ‘elasticnet’, or ‘gbm’ |
PSE or PSM | distance.options | distance.options=list() | Set the specific options for each model to compute the propensity score |
PSM | method | method = ‘nearest’ | The method for matching |
PSM | exact | exact = NULL | Exact matching for specific variables (this format matches subjects with the exact same value of covariate1 and covariate2) |
PSM | ratio | ratio = 1 | The number of control subjects to be matched with one treated subject |
PSM | replace | replace = FALSE | Matching without replacement (restricts each control subject to be matched with one treated subject) |
PSM | discard | discard = ‘none’ | The way to manage subjects that fall outside the common support region of the propensity scores before matching |
PSM | reestimate | reestimate = FALSE | Determines whether to reestimate the propensity scores after some subjects have been discarded based on the ‘discard’ option |
PSM | caliper | caliper = NULL | The width of the caliper to be used in the matching |
PSM | std.caliper | std.caliper = TRUE | The way to measure the caliper |
PSM | m.order | m.order = NULL | The order that the matching is performed |
PSE, propensity score estimation; PSM, propensity score matching.
a)The argument ‘formula’ must be specified in any case.
The basic structure of the analysis code is as follows. To conduct only PSE, utilize the ‘formula,’ ‘data,’ ‘distance,’ ‘link,’ and ‘distance.options’ parameters. For PSM based on pre-estimated propensity scores, input the relevant propensity score vector in ‘distance’ and specify the treatment variable in ‘formula.’ When performing both PSE and PSM simultaneously, all arguments are applied. Default values are used for arguments that are not explicitly designated.
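The basic structure can be sketched as follows (covariate and object names follow the worked example; any unspecified arguments keep their defaults):

```r
library(MatchIt)

# PSE only: method = NULL estimates propensity scores without matching
ps_only <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                   method = NULL, distance = "glm", link = "logit")

# PSM from pre-estimated scores: pass the score vector to 'distance';
# the formula covariates are then used only for balance assessment
m_pre <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                 distance = ps_only$distance)

# PSE + PSM in one call
m_both <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                  method = "nearest", distance = "glm", link = "logit")
```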
For PSE, the causal variable (treatment), the outcome variable, and the covariates expected to be associated with the treatment and outcome variables need to be set [12]. Among the covariates that precede or co-occur with the treatment, a variable correlated with both the treatment and outcome is called a ‘confounder.’ In contrast, if a treatment affects an intermediate indicator, which affects the outcome, then that intermediate indicator is called a ‘mediator.’ Controlling for mediators in the model can lead to underestimation of the treatment effect. Therefore, researchers should review relevant factors from previous studies on the causal relationship when estimating propensity scores. Mediators should be excluded, and confounders should be included in the model for PSE.
In this example, the surgery type (‘treat’) is the causal variable, and the postoperative symptom score (‘post_sx’) is the outcome. Confounders related to the treatment assignment and outcome are age, sex, presence of underlying disease (‘diz’), and preoperative symptom score (‘pre_sx’). These were selected as covariates for the PSE.
Methods for estimating propensity scores can be divided into parametric and non-parametric methods. The most commonly used parametric method is the GLM, including logistic regression. GLMs, by specifying probability distributions and link functions, can be applied to binary, multinomial, and continuous treatment variables. These models can also be extended to regularized regression models like Ridge, Lasso, and ElasticNet, which add penalty terms to the sum of the squared errors.
Recently, machine learning techniques have been widely used for estimating propensity scores. These non-parametric methods do not assume a specific distribution and can model complex patterns inherent in data. However, it is necessary to adjust the hyperparameters to prevent overfitting.
In the ‘matchit()’ function, ‘method=NULL’ allows for PSE without matching. Table 2 lists the available models for estimation in ‘matchit().’ To build various PSE models, refer to the model-specific options for each method.
Table 2 . Specifications of each model in matchit() for propensity score estimation
Argument | Details | Utilized function | Specific options |
---|---|---|---|
distance = ‘glm’ (default) or distance = ‘ps’ | A generalized linear model (e.g. logistic regression) | glm() | family, link |
distance = ‘ridge’ | A generalized linear model with L2 regularization (Ridge model) | glmnet::cv.glmnet() | family, link |
distance = ‘lasso’ | A generalized linear model with L1 regularization (Lasso model) | glmnet::cv.glmnet() | family, link |
distance = ‘elasticnet’ | A generalized linear model with L1 and L2 regularizations (Elastic Net model) | glmnet::cv.glmnet() | family, link, alpha |
distance = ‘gbm’ | A generalized boosted model | gbm::gbm() | n.trees, interaction.depth, shrinkage, bag.fraction, n.minobsinnode |
distance = ‘randomforest’ | A random forest | randomForest::randomForest() | mtry, ntree, nodesize, sampsize, maxnodes |
distance = ‘bart’ | The Bayesian additive regression trees (BART) | dbarts::bart2() | num.trees, num.burn, num.iter, alpha, beta, sigma |
distance = ‘nnet’ | A single-hidden-layer neural network | nnet::nnet() | size, decay, maxit, linout, trace |
Generalized linear model with various link functions
In GLMs, various models can be specified by the link function. The logistic regression model, which is most widely used for a binary treatment group, uses the link function of logit (link = ‘logit’). Features of each link function are listed in Table 3.
Table 3 . Link functions in matchit()
Argument | Model | Suitable case |
---|---|---|
link = ‘logit’ | Logistic regression | Commonly used for binary outcomes in practice |
link = ‘probit’ | Probit regression | Used for binary outcomes with more extreme values close to 0 or 1 |
link = ‘cloglog’ | Complementary log-log regression | Used for binary outcomes of rare events |
link = ‘cauchit’ | Cauchit regression | Used for binary outcomes with heavy tails; very sensitive to outliers |
Here is example code for fitting a GLM with a logit link function. With the settings ‘distance = ‘glm’’ and ‘link = ‘logit’’, the default probability distribution is binomial (family = ‘binomial’), so no additional ‘distance.options’ were specified.
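A minimal sketch of this step (covariate and object names follow the worked example):

```r
library(MatchIt)

# PSE only (method = NULL): logistic regression propensity score model
glm1 <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                method = NULL, distance = "glm", link = "logit")

head(glm1$distance)   # estimated propensity scores
```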
Generalized linear model with regularization
Propensity scores can also be estimated using regularized regression models, which add a regularization term to the sum of squared errors in the generalized linear model discussed in the previous section. Specifying ‘ridge,’ ‘lasso,’ or ‘elasticnet’ for the ‘distance’ parameter in the ‘matchit()’ function performs Ridge, Lasso, or Elastic Net regression, respectively. Ridge regression handles multicollinearity, Lasso regression performs variable selection by shrinking some weights to zero, and Elastic Net combines the strengths of both models.
Special options for these models include the parameter lambda, which adjusts the regularization strength. The ‘cv.glmnet()’ function in the glmnet package, used internally by ‘matchit(),’ automatically selects the optimal lambda value, so it does not need to be specified separately. However, for Elastic Net, an additional parameter alpha is required to adjust the proportion of L1 regularization, with a default value of 0.5. Below is example code for fitting Ridge and Elastic Net models based on logistic regression; the ‘glmr2’ model corresponds to an Elastic Net model with 25% L1 (lasso) regularization.
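A sketch of this step (the object name ‘glmr1’ for the Ridge model is an assumption; ‘glmr2’ follows the text):

```r
library(MatchIt)

# Ridge (L2-regularized) logistic regression
glmr1 <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                 method = NULL, distance = "ridge", link = "logit")

# Elastic Net with 25% L1 (lasso) regularization: alpha = 0.25
glmr2 <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                 method = NULL, distance = "elasticnet", link = "logit",
                 distance.options = list(alpha = 0.25))
```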
Generalized boosted model
The GBM is one of the boosting techniques that combine multiple decision trees by iterative updates with weights to reduce the errors in the previous tree. The final estimated probability is the aggregate of the predicted probabilities of all the generated trees, which is used as the propensity score.
The GBM effectively captures nonlinear relationships and interaction effects. However, to avoid overfitting, it is necessary to tune the parameters, such as the number of trees (n.trees), depth (interaction.depth), and learning rate (shrinkage). The hyperparameters available in this model are shown in Table 4.
Table 4 . Distance options for generalized boosted model in matchit()
Argument | Default value | Details | Impact |
---|---|---|---|
n.trees | 5,000 | The number of trees to be generated | More trees can improve model fitting in training set, but increase the risk of overfitting. |
interaction.depth | 3 | The maximum depth of each tree or allowable interactions | Greater depth can capture complex patterns better but may lead to overfitting. |
shrinkage | 0.01 | Learning rate, which is a scalar coefficient multiplied to each tree | Lower values make the model learn more slowly and can improve generalization, but too low can make training very slow. |
n.minobsinnode | 10 | The minimum number of observations required in the terminal node of each tree | Larger values limit tree growth, helping to prevent overfitting. |
The code below is an example of PSE based on a GBM. Explaining the details of the search strategies for the optimal hyperparameters is beyond the scope of this study. Please refer to the relevant code in ‘(1) Generalized boosted model’ section of Supplementary Material 3 as a reference.
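A minimal sketch of GBM-based estimation (the hyperparameter values shown are the defaults from Table 4, passed explicitly for illustration):

```r
library(MatchIt)

# PSE with a generalized boosted model; options are passed through
# distance.options to gbm::gbm()
gbm1 <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                method = NULL, distance = "gbm",
                distance.options = list(n.trees = 5000,
                                        interaction.depth = 3,
                                        shrinkage = 0.01,
                                        n.minobsinnode = 10))
```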
Random forest
Random forest models generate multiple bootstrap samples and grow a decision tree on each, selecting a random subset of variables at each split. The model then combines the predictions of the individual trees to produce a final estimated probability, which is used as the propensity score. This method is robust against overfitting and is useful for evaluating the relative importance of variables, especially when there are many covariates whose importance is unknown. Relevant options are summarized in Table 5.
Table 5 . Distance options for random forest in matchit()
Argument | Default value | Details | Impact |
---|---|---|---|
ntree | 500 | The number of trees to be generated | More trees can improve model stability and accuracy, but increase computation. |
mtry | sqrt(p) | The number of covariates to consider at each split, where p is the total number of covariates | Larger values can improve model accuracy but may increase the risk of overfitting and computation. |
nodesize | 1 | The minimum number of observations required in the terminal node of each tree | Larger values make the trees less complex, which can help prevent overfitting but may reduce accuracy. |
maxnodes | NULL | The maximum number of terminal nodes that trees can have | Limiting the number of nodes can reduce overfitting, but too small a value may result in underfitting. |
importance | FALSE | Whether to compute variable importance | Variable importance can provide insights into the model but increases computation time. |
The code below provides an example of PSE based on a random forest model. Explaining the details of the search strategies for the optimal hyperparameters is beyond the scope of this study. Please refer to the relevant code in ‘(2) Random forest’ section of Supplementary Material 3 as a reference.
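A minimal sketch of random forest-based estimation (options are passed through ‘distance.options’ to randomForest::randomForest()):

```r
library(MatchIt)

# PSE with a random forest; ntree and nodesize shown at their defaults
rf1 <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
               method = NULL, distance = "randomforest",
               distance.options = list(ntree = 500, nodesize = 1))
```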
Bayesian additive regression trees
BART is an ensemble model based on the Bayesian approach combining multiple regression trees. It employs the Markov chain Monte Carlo (MCMC) algorithm to sample parameters from Bayesian prior distributions, aiming to reduce errors from prior trees, thereby iteratively producing new regression trees.
For each generated tree, the final estimated probability is calculated by weighting predicted probabilities according to the contribution from the Bayesian posterior distribution. Relevant options are outlined in Table 6.
Table 6 . Distance options for Bayesian additive regression trees (BART) in matchit()
Argument | Default value | Details | Impact |
---|---|---|---|
num_trees | 200 | The number of trees to be generated | More trees can improve model stability and accuracy but increase computation. |
k | 2 | A hyperparameter controlling the variance of the tree predictions | Larger values reduce variance but can make the model more conservative, potentially underfitting. |
alpha | 0.95 | The base terminal node probability in the tree-building process | Higher values increase the probability of smaller trees, reducing complexity and overfitting. |
beta | 2 | A hyperparameter influencing the depth of the trees | Higher values encourage deeper trees, which make the model more complex, potentially overfitting. |
n_min | 5 | The minimum number of observations required in the terminal node of each tree | Larger values limit tree growth, helping to prevent overfitting. |
n_iter | 1,000 | The number of iterations for the MCMC algorithm | More iterations can improve model convergence and stability but increase computation time. |
burn_in | 100 | The number of initial MCMC iterations to be discarded | Larger values ensure more stable models but increase computation time. |
MCMC, Markov chain Monte Carlo.
This method effectively estimates propensity scores when there is a significant variance in the covariates between the treatment and control groups. Notably, the prediction from each tree not only provides estimates but also captures the uncertainty of these estimates via MCMC sampling. Consequently, the final estimated probabilities are computed to reflect this uncertainty. However, this comes with a substantial computational burden.
The code below is an example of a PSE based on BART. Explaining the details of the search strategies for the optimal hyperparameters is beyond the scope of this study. Please refer to the relevant code in ‘(3) Bayesian additive regression trees’ section of Supplementary Material 3 as a reference.
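A minimal sketch of BART-based estimation (the defaults of dbarts::bart2() are used here; hyperparameters would be passed through ‘distance.options’):

```r
library(MatchIt)

# PSE with Bayesian additive regression trees; note the substantial
# computational burden of MCMC sampling on larger datasets
bart1 <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                 method = NULL, distance = "bart")
```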
Neural network model
A neural network is a network of interconnected neurons organized into layers. A neural network model (NNET) mimics the structure of neural networks, with covariates as the input layer and treatment variables as the output layer. A basic neural network with only these two layers is equivalent to a linear regression model.
Adding a layer with hidden neurons between these two layers creates a multilayer structure capable of capturing nonlinear relationships. Neurons in each hidden layer, often termed nodes, receive inputs from the previous layer, compute a weighted linear combination, and transport these outputs to the subsequent layer. To prevent overfitting, regularization parameters (decay) control the magnitude of the weights, similar to a Ridge regression. Increasing the number of layers and nodes per layer allows for structuring more complex patterns in the data.
A single-hidden-layer neural network model is applied by setting ‘distance = ‘nnet’’ in ‘matchit().’ The number of nodes in the hidden layer (size) and the regularization parameter (decay) are tuned to optimize the model (refer to Table 7). Due to longer training times, this approach is recommended when computing resources are sufficient.
Table 7 . Distance options for a single-hidden-layer neural network in matchit()
Argument | Default value | Details | Impact |
---|---|---|---|
size | 5 | The number of nodes in the hidden layer | More nodes can capture more complex patterns but increase the risk of overfitting and computation time. |
decay | 0.1 | The weight decay parameter for regularization | Higher values help prevent overfitting but may lead to underfitting. |
maxit | 100 | The maximum number of iterations for training the neural network | More iterations improve convergence but increase the risk of overfitting and computation time. |
linout | FALSE | Specifies if the activation function should be linear | For propensity score of binary treatment, a nonlinear activation function is more appropriate. |
trace | TRUE | Whether to produce output during training | Outputs provide insight into the training process, which can be useful for debugging and monitoring, but also increase computation. |
The code below is an example of PSE based on a neural network model. Explaining the details of the search strategies for the optimal hyperparameters is beyond the scope of this study. Please refer to the relevant code in ‘(4) Neural network model with a single-hidden-layer’ section of Supplementary Material 3.
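A minimal sketch of neural network-based estimation (options are passed through ‘distance.options’ to nnet::nnet(); the values shown follow Table 7, with ‘trace = FALSE’ to suppress training output):

```r
library(MatchIt)

# PSE with a single-hidden-layer neural network
nnet1 <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                 method = NULL, distance = "nnet",
                 distance.options = list(size = 5, decay = 0.1,
                                         maxit = 100, trace = FALSE))
```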
Model selection for propensity score estimation
The criteria for the optimal model for PSE can be divided into model fitness and the distribution of propensity scores. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are goodness-of-fit measures for generalized linear models, but they are not computed for machine learning models. Therefore, the propensity score distribution is used to select the most appropriate model.
The propensity score distribution between treatment and control groups is assessed by the common support region (CSR). When there is no overlap in propensity scores between the two groups, known as ‘separation,’ the groups cannot be compared, making it impossible to calculate treatment effects. In this case, adjusting the model and its parameters or redefining the covariate settings is crucial. Here are the code and results for inspecting the CSR of estimated propensity scores from glm1 and rf1 objects. Since glm1 demonstrates a broader and more uniform CSR compared with rf1, it is a better model (Fig. 4).
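One way to draw these distribution plots is with the cobalt package (a sketch; ‘glm1’ and ‘rf1’ are the estimation objects from the previous sections):

```r
library(cobalt)

# Mirrored histograms of propensity scores by treatment group,
# used to inspect the common support region of each model
bal.plot(glm1, var.name = "distance", which = "unadjusted",
         type = "histogram", mirror = TRUE)
bal.plot(rf1,  var.name = "distance", which = "unadjusted",
         type = "histogram", mirror = TRUE)
```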
Once optimal propensity scores are calculated, different matching methods are applied. Popular algorithms are nearest neighbor matching, optimal matching, full matching, and genetic matching. Table 8 summarizes the features of each algorithm and the scenarios in which they are useful.
Table 8 . Matching algorithms in matchit()
Argument | Matching algorithm | Details | Suitable case |
---|---|---|---|
method = NULL | None | Estimate the propensity score without matching | None |
method = ‘nearest’ (default) | Nearest neighbor matching | Match each subject to the closest propensity score within the designated caliper | Evenly distributed propensity scores |
method = ‘optimal’ | Optimal pair matching | Select the optimal pairing from all possible matching combinations | High-dimensional data with many covariates, or high imbalance between treated and control groups |
method = ‘full’ | Optimal full matching | Find the overall optimal matching from all subjects for matching | Useful when desired to use all subjects |
method = ‘genetic’ | Matching by genetic algorithm | Optimize covariate balance across groups based on a genetic algorithm | High-dimensional data with many covariates, high imbalance between treated and control groups, or small sample size |
Additionally, options used for matching include exact, ratio, replace, discard, reestimate, caliper, std.caliper, and m.order, as presented in Table 1. Table 9 outlines the detailed descriptions of each option.
Table 9 . Options for propensity score matching in matchit()
Argument | Format | Details |
---|---|---|
exact | exact = ~ covariate1 + covariate2 | Exact matching on specific variables (in this format, match subjects with the exactly same value of covariate1 and covariate2) |
ratio | 1 (default) | The number of control subjects to be matched with one treated subject |
replace | FALSE (default) | Matching without replacement (restrict each control subject to be matched with one treated subject) |
TRUE | Matching with replacement (allow control subjects to be matched with several treated subjects) | |
discard | ‘none’ (default) | No units are discarded |
‘control’ | Discard only control subjects with propensity scores outside the common support region | |
‘treated’ | Discard only treated subjects with propensity scores outside the common support region | |
‘both’ | Discard both treated and control subjects with propensity scores outside the common support region | |
reestimate | FALSE (default) | Use the original propensity score for matching, even if some subjects are discarded |
TRUE | Reestimate the propensity score using only the subjects remaining after the discard process | |
caliper | NULL (default) (for no caliper) | The width of the caliper to be used in matching |
std.caliper | TRUE (default) | The caliper is defined as a fraction or multiple of the standard deviation of the logit of the propensity score |
FALSE | The caliper is defined in the raw units of the propensity score | |
m.order | NULL (default) | Corresponds to ‘largest’ |
‘random’ | Match in a random order | |
‘smallest’ | Match subjects with the smallest propensity scores first | |
‘largest’ | Match subjects with the largest propensity scores first
Using the propensity scores estimated from the glm1 model, nearest neighbor matching was conducted within a caliper corresponding to 10% of the standard deviation of the logit of the propensity scores, with exact matching on age and sex. The code was set up for a 1:2 matching ratio without replacement. After matching, the results were examined using the ‘summary(matching1)’ function. Due to the exact matching on age and sex, the final matched sample includes 11 subjects in the treatment group and 13 subjects in the control group, totaling 24 subjects, which is relatively small (Fig. 5).
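A call along these lines would reproduce the matching described above; the variable names (treat, age, sex, diz, pre_sx) follow the example dataset, and with the default std.caliper = TRUE the caliper of 0.1 is in standard deviation units of the logit of the propensity score:

```r
library(MatchIt)

# 1:2 nearest neighbor matching without replacement,
# exact on age and sex, within a 0.1 SD caliper (illustrative sketch)
matching1 <- matchit(treat ~ age + sex + diz + pre_sx,
                     data = dat,
                     distance = "glm", link = "logit",
                     method = "nearest",
                     exact = ~ age + sex,  # require identical age and sex
                     ratio = 2,            # two controls per treated subject
                     replace = FALSE,
                     caliper = 0.1)        # std.caliper = TRUE by default
summary(matching1)
```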
After PSM, it is essential to verify whether a balance has been achieved between the covariates of the treatment and control groups. This usually involves comparing the standardized covariates between the two groups and evaluating whether the means and variances of the propensity scores are similar. Histograms or other visualizations can be used to compare the propensity score distributions before and after matching.
When comparing group means and variances, since the units of each covariate are different, the propensity scores and covariates are standardized. The group difference in means and the ratio of variances are then calculated. These are respectively referred to as the standardized mean difference (SMD) or absolute SMD (ASMD) and the variance ratio (VR). Typically, if the absolute value of the SMD is less than 0.1 and the VR falls between 0.5 and 2.0, it is considered that there is no imbalance in the covariates and propensity scores [20].
Checking the covariate balance before and after matching can also be done with histograms or density plots. However, it is common to use a Love plot to visualize changes in the SMD before and after matching at a glance. If an imbalance in the covariates remains, it is recommended that the propensity scores be reestimated.
The following code was used to examine the SMD and VR for the matching1 object created in step 3. Histograms, density plots, and a Love plot were generated to visualize the covariate balance. Since all covariates had an SMD under 0.1 and a VR between 0.5 and 2.0, covariate balance was considered achieved (Fig. 6). The Love plot in Fig. 6C illustrates the changes in the ASMD for each covariate before and after matching. Initially, as depicted by the blue line, each covariate had an ASMD greater than 0.1, sorted in descending order of ASMD starting with comorbidity. After matching, represented by the red line, the ASMD for all four covariates fell below 0.1, indicating effective covariate balancing.
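The balance checks described here can be sketched with the cobalt package; the thresholds below encode the conventional cutoffs of 0.1 for the SMD and 2.0 for the VR:

```r
library(cobalt)

# SMDs and variance ratios before and after matching
bal.tab(matching1,
        stats = c("mean.diffs", "variance.ratios"),
        thresholds = c(m = 0.1, v = 2))

# Love plot of absolute SMDs, sorted by the pre-matching values
love.plot(matching1, stats = "mean.diffs", abs = TRUE,
          thresholds = 0.1, var.order = "unadjusted")
```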
Once covariate balance has been achieved after PSM, the matched data are used in analyses to evaluate the treatment effect.
Matched data extraction
The process for extracting the matched dataset is outlined below. Since the matched data were obtained from the matching1 object generated in step 3, it was evident that age and sex were identical within the nine 1:2-matched sets (Fig. 7). Within this dataset, the variable ‘ps’ represents the estimated propensity score, ‘matching weight’ denotes the sample weight resulting from the 1:2 matching, and ‘match_id’ indicates the pair number of the matching.
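A sketch of the extraction step, renaming the stored columns to match the names used in the text (‘ps’, ‘matching weight’, ‘match_id’); the renaming arguments are illustrative:

```r
library(MatchIt)

# Extract the matched subjects as a regular data frame
matched_df <- match.data(matching1,
                         distance = "ps",              # propensity score column
                         weights  = "matching_weight", # 1:2 matching weights
                         subclass = "match_id")        # matched-set identifier
head(matched_df)
```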
Treatment effect analysis
Postoperative symptom scores were dichotomized as improvement (post_imp = 1) for scores less than 40, or no improvement (post_imp = 0) otherwise. Two approaches were used to evaluate this variable: (1) a standard logistic regression assuming independence of the two groups in the matched data, and (2) a conditional logistic regression on the matched pairs. The following code and results illustrate these analyses. Since this involves 1:k matching with k greater than or equal to 2, the sample weights are incorporated as weighting variables.
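The two approaches might be coded as below; the outcome and column names follow the text, and because conventions for weighting conditional models differ across fields, the second call is only a sketch without weights:

```r
# (1) Standard logistic regression on the matched data,
#     incorporating the 1:2 matching weights
fit_glm <- glm(post_imp ~ treat,
               family = binomial,
               data = matched_df,
               weights = matching_weight)
exp(coef(fit_glm))  # odds ratio for the treatment effect

# (2) Conditional logistic regression stratified by matched set
library(survival)
fit_clogit <- clogit(post_imp ~ treat + strata(match_id),
                     data = matched_df)
summary(fit_clogit)
```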
In propensity score analyses, a sensitivity analysis evaluates the robustness of the estimated effect size. Various methods have been proposed to assess the impact of omitted variable bias when all relevant covariates are not included. However, implementing all the methods presented in Fig. 3 may not be practical [21–23].
A commonly used sensitivity analysis involves applying different models during the PSM and conducting the same analysis on multiple matched datasets derived from various matching methods [24]. When the effect size of a specific dataset is not robust, further investigation is necessary to identify the cause. This may involve verifying the error in the PSM, checking for omitted covariates, or ensuring that the parameters are correctly included. If the estimated treatment effects remain valid across multiple datasets, it strengthens the validity of causal inference. However, in this study, a separate sensitivity analysis was not conducted.
While this study primarily focused on matching between two groups, alternative techniques such as PSW and stratification also deserve attention. PSW uses the inverse probability of treatment weights (IPTW), calculated from the propensity score of each subject, to analyze the data. This method strives to equalize covariate distributions across treatment and control groups, improving their comparability. Unlike matching, it includes all available data, enhancing sample efficiency. However, extremely large or small weights can skew the results. To minimize this issue, truncation or stabilization techniques should be applied [25,26].
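As a minimal sketch of IPTW with stabilization and truncation, assuming a data frame dat with a 0/1 treat column and an estimated propensity score column ps:

```r
# ATE-type inverse probability of treatment weights
iptw <- ifelse(dat$treat == 1, 1 / dat$ps, 1 / (1 - dat$ps))

# Stabilized weights: multiply by the marginal probability of the observed group
p_treat <- mean(dat$treat)
sw <- ifelse(dat$treat == 1, p_treat / dat$ps, (1 - p_treat) / (1 - dat$ps))

# Truncate extreme weights at the 1st and 99th percentiles
cuts <- quantile(iptw, probs = c(0.01, 0.99))
iptw_trunc <- pmin(pmax(iptw, cuts[1]), cuts[2])
```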
Instead of directly applying weighted analysis, another approach has been introduced to construct matched sub-controls based on the IPTW in studies with asymmetric sample sizes, particularly for those with larger control groups. Using these matched sub-controls, a post-genome-wide association study (GWAS) analysis was conducted, identifying additional candidate variants associated with the phenotypes [27].
PSS includes partitioning the dataset into multiple strata based on propensity scores, calculating treatment effects within each stratum, and aggregating these effects. This method helps reduce bias stemming from unobserved confounding variables, leading to more precise and stable estimates. Typically, PSS involves creating strata of equal size based on the propensity score distribution.
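A minimal PSS sketch, again assuming dat with columns ps, treat, and the outcome post_sx; quintiles give five equal-size strata, and the stratum effects are averaged with equal weights for simplicity:

```r
# Five strata from propensity score quintiles
dat$stratum <- cut(dat$ps,
                   breaks = quantile(dat$ps, probs = seq(0, 1, by = 0.2)),
                   include.lowest = TRUE, labels = FALSE)

# Stratum-specific mean differences, then a simple average
strata_effects <- sapply(split(dat, dat$stratum), function(d) {
  mean(d$post_sx[d$treat == 1]) - mean(d$post_sx[d$treat == 0])
})
mean(strata_effects)
```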
Propensity score analysis can be extended to treatment variables with three or more levels or continuous causal variables. However, no established methodology or tool developed explicitly for PSM exists for these cases. Typically, for treatment variables with three or more levels, a link function based on a multinomial distribution is specified. For continuous variables, a generalized linear regression model is fitted using distributions such as Gaussian or Gamma, depending on the data distribution. Propensity scores are then estimated using these models, and PSW analysis is applied accordingly.
In observational studies, treatment effects are often categorized into ‘average treatment effect for the treated (ATT),’ ‘average treatment effect for the control (ATC),’ or ‘average treatment effect (ATE)’ [28,29]. ATT refers to the average effect of the treatment on the treated group compared with the matched control group. ATC represents the average effect of the treatment on the control group compared with the matched treated group. ATE calculates the average effect of the treatment for the entire population. In this study, PSM primarily aimed to estimate the treatment effect using control groups similar to the treatment group, thus corresponding to ATT. Conversely, PSW analysis estimates the treatment effect based on the entire population, aligning with ATE.
In observational studies, the rate of missing values is generally higher than in experimental studies, and the probability of missing covariates commonly differs between the treatment and control groups. Removing missing values on a pairwise basis can significantly distort the treatment effect. Therefore, when analyzing data with missing values, it is recommended to consider the assumptions underlying missing data mechanisms, such as missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), as well as missing data imputation techniques, such as maximum likelihood and multiple imputation. After addressing missing data, conducting propensity score analysis is advisable [30]. In R, packages such as ‘mi,’ ‘mice,’ and ‘missForest’ can help perform missing data imputation [31,32].
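For example, multiple imputation with the mice package might precede the propensity score analysis; the settings below (five imputations, default imputation methods, fixed seed) are illustrative:

```r
library(mice)

# Multiple imputation before propensity score estimation
imp <- mice(dat, m = 5, seed = 123)  # five imputed datasets, default methods
dat_complete <- complete(imp, 1)     # e.g., extract the first completed dataset
```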
This study introduced PSM to reduce bias in observational studies, providing a comprehensive overview of each step involved in the process. It demonstrated the practical application using an example in R and introduced the GUI program Rex. However, this tutorial is limited to matching a treatment variable between two groups, indicating a need for future research to expand PSM to treatment variables with three or more levels or to continuous causal variables. Additionally, exploring other analysis techniques, such as PSW, stratification, and handling missing data through imputation, is a promising direction for further work.
Conceptualization, Investigation: BL, SW, JG
Funding acquisition: BL, JG
Project administration: JG
Visualization: BL, NK
Writing–original draft: BL, JG
Writing–review & editing: All authors
All authors read and approved the final manuscript.
All authors have no conflicts of interest to declare.
This research was supported by the Global-Learning & Academic Research Institution for Master’s·PhD students, and Postdocs (LAMP) Program of the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (No. RS-2023-00285353).
All the R code in this paper is available at http://www.gimlab.online.
Supplementary materials can be found via https://doi.org/10.7602/jmis.2024.27.2.55.
Bora Lee1,2 , Nam-eun Kim3 , Sungho Won2,3,4 , Jungsoo Gim1,5,6
1Institute of Well-Aging Medicare & CSU G-LAMP Project Group, Chosun University, Gwangju, Korea
2Department of Public Health Sciences, Seoul National University, Seoul, Korea
3Institute of Health and Environment, Seoul National University, Seoul, Korea
4RexSoft Inc., Seoul, Korea
5Department of Biomedical Science, Chosun University, Gwangju, Korea
6Gwangju Alzheimer’s & Related Dementia Cohort Research Center, Chosun University, Gwangju, Korea
Selecting appropriate outcome indicators and comparable controls is essential to obtaining precise treatment effects. If unrelated outcomes are chosen, or if the control group differs from the treatment group, it becomes challenging to determine whether the effects are due to the treatment or other factors. Selecting appropriate outcomes is relatively straightforward, but ensuring well-balanced characteristics between the treatment and control groups is generally problematic. In observational studies, the characteristics of the treatment and control groups usually differ. Fortunately, there are various statistical approaches to strengthen causal inference by controlling for confounders (Fig. 1) [8]. One such method is the propensity score.
Propensity scores can be applied in several ways, such as matching, weighting, and stratification. Propensity score-based analyses perform efficient evaluations even with a small sample size [9,10]. Notably, they ensure the validity and generalizability of the result in causal inference, whereas conventional analyses infer only association, not causality [11].
This study aimed to provide a practical guide to propensity score analyses, focusing on matching. Unlike previous tutorials on propensity score matching (PSM), this study covers traditional and machine learning-based matching with detailed descriptions using the recently updated R packages. It also offers an easy way of performing PSM with an Excel add-in statistical program called Rex, providing point-and-click analysis for researchers unfamiliar with R programming. To this end, we explained the concept of propensity score and outlined the step-by-step process of PSM, along with available statistical programs. As a practical example, we used PSM to compare the surgical outcomes between patients who underwent noninvasive high-intensity focused ultrasound (HIFU) brain surgery and conventional brain resection surgery. Finally, we summarize additional considerations not covered in this study.
When comparing the outcomes of treatment and control groups, differing characteristics, aside from the treatment itself, can make it difficult to determine whether the differences in outcomes are due to the treatment or other factors. This leads to a biased estimate of the treatment effect, with the other influencing factors referred to as ‘confounders.’
One way to obtain an unbiased effect size is through randomized controlled trials (RCTs), which randomly assign subjects to the treatment or control groups. However, when random assignment is not feasible due to the nature of the study, observational studies are necessary. In this case, propensity scores are used to control for confounders in a manner similar to random assignment.
To better understand propensity scores, consider a balanced two-arm RCT. The concept of random assignment in RCTs can be expressed mathematically as P(Xi = 1 | C1i, C2i, ···, Cki) = 0.5, where Xi = 1 means that the ith subject is assigned to the treatment group. This equation indicates that the probability of a subject with k characteristics (C1i, C2i, ···, Cki) being assigned to the treatment group is 0.5, resulting in a 0.5 probability of being assigned to the control group. In other words, it indicates equal assignment probabilities between the two groups. It has been shown that the k characteristics (C1i, C2i, ···, Cki) of subjects assigned to the groups are then similar.
When adopting this concept to observational studies, the propensity score estimates the probability that the ith subject is assigned to the treatment group using covariates. PSM pairs subjects from both groups with similar scores to emulate the effects of random assignment.
Propensity score analyses can be broadly classified into the following six steps (Fig. 2).
1. Set variables: Define the causal variable (treatment), outcome variable, and confounding variables.
2. Estimate propensity scores: Choose the optimal model to estimate the propensity scores based on the defined variables.
3. Perform analyses: Use the estimated propensity scores to conduct analyses, such as matching, weighting, and stratification.
4. Check balance: Assess the balance of propensity scores and covariates between the treatment and control groups.
5. Evaluate treatment effect: From the balanced data, evaluate the treatment effect and conduct a sensitivity analysis on the effect size.
6. Iterate if necessary: If covariate imbalance is detected after propensity score analysis, or if the treatment effect estimates are not robust on the sensitivity analysis, redo the process starting from variable selection or propensity score estimation (PSE).
The techniques for propensity score analyses are as follows: (1) PSM, which matches subjects with similar propensity scores from the treatment and control groups; (2) propensity score weighting (PSW), which calculates the inverse-treatment probability weights based on the propensity scores to adjust the distribution of covariates into two groups for comparability; and (3) propensity score stratification (PSS), which stratifies subjects from both groups into similar propensity score strata, then calculates and integrates the treatment effects estimated from each stratum [11].
This study focused on PSM, the most frequently used technique among the various propensity score analysis techniques. Generalized linear models (GLMs) and their extension with regularization terms were introduced for propensity score estimation. Machine learning models were also added, including a generalized boosted model (GBM), random forest, Bayesian additive regression trees (BART), and a neural network model. In total, four algorithms were used for matching: nearest neighbor matching, optimal matching, full matching, and genetic matching.
Treatment effects can be estimated if covariate balance is achieved based on the propensity scores. The fundamental approach in causal inference is to estimate the average treatment effect using weighted means. However, due to a lack of consensus on computing the standard error of the estimates, bootstrapping or non-parametric tests are used to estimate the 95% confidence intervals or determine statistical significance [11,12]. Alternatively, a weighted generalized linear regression model can be fitted to estimate the treatment effect [13].
In practice, the most widely used method is to treat the matched data as a regular dataset and apply conventional statistical analyses. In this case, the matched data can be regarded as pooled data, and the analyses can be conducted under the independence assumption. Alternatively, pairwise or conditional analyses can be performed based on the matched pairs [14]. Since there is still debate about the suitable approach, researchers are advised to follow the methods commonly used in their field. The available methods for each case are shown in Fig. 3.
Various statistical programs, including R, STATA, SAS, and Python, can analyze propensity scores. Each program possesses unique features and strengths. The selection of a specific program hinges on the characteristics of the data and the requirements of the analysis. This study introduces practical applications using R and Rex. R is an open-source statistical software freely available and continually enhanced by its users. R packages for propensity score analyses include MatchIt, cobalt, WeightIt, twang, and others [15–18]. This study employed the MatchIt and cobalt packages for PSM. In contrast, Rex is an Excel add-in program that offers various R-based statistical analysis modules in a graphical user interface (GUI) format [19]. Rex includes complimentary modules for descriptive statistics, group comparison, categorical data analysis, and survival analysis. However, certain modules, such as PSM and structural equation modeling, require purchase (a 2-week free trial is available).
This section describes an example of propensity score using R. Implementation using Rex is presented in Supplementary Material 1.
HIFU is a noninvasive surgical technique used to treat target areas in the brain without the need for an incision, offering shorter recovery times and reduced infection risks compared with conventional deep brain surgery (DBS). In this example, PSM was conducted between the HIFU treatment group (treatment = 1) and the DBS control group (treatment = 0). Covariates, including age (in years), sex (1 = male, 2 = female), presence of comorbidities (0 = no, 1 = yes; referred to as “comor”), and the preoperative symptom score (in points; referred to as “pre_sx”), were utilized. The objective was to estimate the effect of the HIFU treatment on the postoperative symptom score. Data for this analysis were generated using the R code provided in Supplementary Material 2.
Categorical covariates were converted to factors after importing the data saved as a CSV file. It is important to note that including categorical covariates as dummy variables (one less than the number of levels) in the model can yield different propensity scores or matching outcomes. Therefore, converting them to factors in R is recommended instead of using dummy variables. However, if treatment variables are not numeric variables coded as 0 and 1, they cannot be fitted to machine learning models. Thus, they should not be converted to factors even though they are categorical variables.
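A sketch of this preparation step; the file name is hypothetical, and the variable names follow the example data:

```r
# Import the data and convert categorical covariates to factors
dat <- read.csv("hifu_example.csv")  # hypothetical file name
dat$sex <- factor(dat$sex)           # 1 = male, 2 = female
dat$diz <- factor(dat$diz)           # 0 = no, 1 = yes
# Leave 'treat' as numeric 0/1 so machine learning models can be fitted
```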
The ‘matchit()’ function available in the MatchIt package (version 4.5.5) facilitates PSE and PSM. Table 1 outlines the primary function arguments for PSM and PSE for two treatment groups.
Table 1 . Primary arguments in MatchIt::matchit() for PSM.
For use | Argument | Format (with default values) | Details |
---|---|---|---|
PSE or PSM | Formulaa) | formula = treatment ~ covariate1 + covariate2 + … | The model representing the relationship between the treatment and covariates for propensity score estimation |
Only the treatment variable can be entered for matching based on the pre-calculated propensity score vector | |||
PSE or PSM | data | data = dataframe | A data frame containing the variables specified in the formula |
PSE or PSM | distance | distance = ‘glm’ | The method for calculating the propensity score used for matching |
PSE or PSM | link | link = ‘logit’ | Specify the link function when the distance is set to ‘glm’, ‘ridge’, ‘lasso’, ‘elasticnet’, or ‘gbm’ |
PSE or PSM | distance.options | distance.options=list() | Set the specific options for each model to compute the propensity score |
PSM | method | method = ‘nearest’ | The method for matching |
PSM | exact | exact = NULL | Exact matching for specific variables (this format matches subjects with the exact same value of covariate1 and covariate2) |
PSM | ratio | ratio = 1 | The number of control subjects to be matched with one treated subject |
PSM | replace | replace = FALSE | Matching without replacement (restricts each control subject to be matched with one treated subject) |
PSM | discard | discard = ‘none’ | The way to manage subjects that fall outside the common support region of the propensity scores before matching |
PSM | reestimate | reestimate = FALSE | Determines whether to reestimate the propensity scores after some subjects have been discarded based on the ‘discard’ option |
PSM | caliper | caliper = NULL | The width of the caliper to be used in the matching |
PSM | std.caliper | std.caliper = TRUE | The way to measure the caliper |
PSM | m.order | m.order = NULL | The order that the matching is performed |
PSE, propensity score estimation; PSM, propensity score matching.
a)The argument ‘formula’ must be specified in any case.
The basic structure of the analysis code is as follows. To conduct only PSE, utilize the ‘formula,’ ‘data,’ ‘distance,’ ‘link,’ and ‘distance.options’ parameters. For PSM based on pre-estimated propensity scores, input the relevant propensity score vector in ‘distance’ and specify the treatment variable in ‘formula.’ When performing both PSE and PSM simultaneously, all arguments are applied. Default values are used for arguments that are not explicitly designated.
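The three usage patterns described above might look as follows; the variable names follow the example data, the vector name ps_vec is hypothetical, and unspecified arguments keep their defaults:

```r
library(MatchIt)

# (a) PSE only, no matching
m_pse <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                 method = NULL, distance = "glm", link = "logit")

# (b) PSM from a pre-estimated propensity score vector 'ps_vec';
#     only the treatment variable is entered in the formula
m_pre <- matchit(treat ~ 1, data = dat, distance = ps_vec)

# (c) PSE and PSM in one call
m_all <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                 method = "nearest", distance = "glm", link = "logit")
```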
For PSE, the causal variable (treatment), the outcome variable, and the covariates expected to be associated with the treatment and outcome variables need to be set [12]. Among the covariates that precede or co-occur with the treatment, a variable correlated with both the treatment and outcome is called a ‘confounder.’ In contrast, if a treatment affects an intermediate indicator, which affects the outcome, then that intermediate indicator is called a ‘mediator.’ Controlling for mediators in the model can lead to underestimation of the treatment effect. Therefore, researchers should review relevant factors from previous studies on the causal relationship when estimating propensity scores. Mediators should be excluded, and confounders should be included in the model for PSE.
In this example, the surgery type (‘treat’) is the causal variable, and the postoperative symptom score (‘post_sx’) is the outcome. Confounders related to the treatment assignment and outcome are age, sex, presence of underlying disease (‘diz’), and preoperative symptom score (‘pre_sx’). These were selected as covariates for the PSE.
Methods for estimating propensity scores can be divided into parametric and non-parametric methods. The most commonly used parametric method is the GLM, including logistic regression. GLMs, by specifying probability distributions and link functions, can be applied to binary, multinomial, and continuous treatment variables. These models can also be extended to regularized regression models like Ridge, Lasso, and ElasticNet, which add penalty terms to the sum of the squared errors.
Recently, machine learning techniques have been widely used for estimating propensity scores. These non-parametric methods do not assume a specific distribution and can model complex patterns inherent in data. However, it is necessary to adjust the hyperparameters to prevent overfitting.
In the ‘matchit()’ function, ‘method=NULL’ allows for PSE without matching. Table 2 lists the available models for estimation in ‘matchit().’ To build various PSE models, refer to the model-specific options for each method.
Table 2 . Specifications of each model in matchit() for propensity score estimation.
Argument | Details | Utilized function | Specific options |
---|---|---|---|
distance = ‘glm’ (default) or distance = ‘ps’ | A generalized linear model (e.g. logistic regression) | glm() | family, link |
distance = ‘ridge’ | A generalized linear model with L2 regularization (Ridge model) | glmnet::cv.glmnet() | family, link |
distance = ‘lasso’ | A generalized linear model with L1 regularization (Lasso model) | glmnet::cv.glmnet() | family, link |
distance = ‘elasticnet’ | A generalized linear model with L1 and L2 regularizations (Elastic Net model) | glmnet::cv.glmnet() | family, link, alpha |
distance = ‘gbm’ | A generalized boosted model | gbm::gbm() | n.trees, interaction.depth, shrinkage, bag.fraction, n.minobsinnode |
distance = ‘randomforest’ | A random forest | randomForest::randomForest() | mtry, ntree, nodesize, sampsize, maxnodes |
distance = ‘bart’ | The Bayesian additive regression trees (BART) | dbarts::bart2() | num.trees, num.burn, num.iter, alpha, beta, sigma |
distance = ‘nnet’ | A single-hidden-layer neural network | nnet::nnet() | size, decay, maxit, linout, trace |
Generalized linear model with various link functions
In GLMs, various models can be specified by the link function. The logistic regression model, which is most widely used for a binary treatment group, uses the link function of logit (link = ‘logit’). Features of each link function are listed in Table 3.
Table 3 . Link functions in matchit().
Argument | Model | Suitable case |
---|---|---|
link = ‘logit’ | Logistic regression | Commonly used for binary outcome in practice | |
link = ‘probit’ | Probit regression | Used for binary outcome which has more extreme values close to 0 or 1 | |
link = ‘cloglog’ | Complementary log-log regression | Used for binary outcome of rare event | |
link = ‘cauchit’ | Cauchit regression | Used for binary outcome which has heavy tails and is very sensitive to outliers |
Here is example code for fitting a GLM with a logit link function. With the settings ‘distance = ‘glm’’ and ‘link = ‘logit’’, the default probability distribution is binomial (family = ‘binomial’), so no additional ‘distance.options’ were specified.
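Based on that description, the call might look like this (object and variable names follow the example):

```r
library(MatchIt)

# GLM (logistic regression) propensity score estimation only
glm1 <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                method = NULL,            # no matching, PSE only
                distance = "glm", link = "logit")
head(glm1$distance)  # estimated propensity scores
```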
Generalized linear model with regularization
Propensity scores can also be estimated using regularized regression models, which add a regularization term to the sum of squared errors in the generalized linear model discussed in the previous section. Specifying ‘ridge,’ ‘lasso,’ or ‘elasticnet’ for the ‘distance’ parameter in the ‘matchit()’ function performs Ridge, Lasso, or Elastic Net regression, respectively. Ridge regression handles multicollinearity, and Lasso regression performs variable selection by shrinking some weights to zero. Elastic Net combines the strengths of both models.
Special options for these models include the parameter lambda, which adjusts the regularization strength. The ‘cv.glmnet()’ function in the glmnet package, used internally by ‘matchit(),’ automatically calculates the optimal lambda value, so it does not need to be specified separately. However, Elastic Net requires an additional parameter, alpha, to adjust the proportion of L1 regularization, with a default value of 0.5. Below is example code for fitting Ridge and Elastic Net models based on logistic regression. In this code, the ‘glmr2’ model corresponds to an Elastic Net model with 25% L1 (lasso) regularization.
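A sketch of those two fits; alpha = 0.25 encodes the 25% L1 share for glmr2, while lambda is tuned internally by cv.glmnet():

```r
library(MatchIt)

# Ridge-regularized logistic regression (glmr1)
glmr1 <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                 method = NULL, distance = "ridge", link = "logit")

# Elastic Net with 25% L1 (lasso) regularization (glmr2)
glmr2 <- matchit(treat ~ age + sex + diz + pre_sx, data = dat,
                 method = NULL, distance = "elasticnet", link = "logit",
                 distance.options = list(alpha = 0.25))
```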
Generalized boosted model
The GBM is a boosting technique that combines multiple decision trees through iterative, weighted updates that reduce the errors of the preceding tree. The final estimated probability is the aggregate of the predicted probabilities of all the generated trees, which is used as the propensity score.
The GBM effectively captures nonlinear relationships and interaction effects. However, to avoid overfitting, it is necessary to tune the parameters, such as the number of trees (n.trees), depth (interaction.depth), and learning rate (shrinkage). The hyperparameters available in this model are shown in Table 4.
Table 4 . Distance options for generalized boosted model in matchit().
Argument | Default value | Details | Impact |
---|---|---|---|
n.trees | 5,000 | The number of trees to be generated | More trees can improve model fit on the training set but increase the risk of overfitting. |
interaction.depth | 3 | The maximum depth of each tree or allowable interactions | Greater depth can capture complex patterns better but may lead to overfitting. |
shrinkage | 0.01 | Learning rate, which is a scalar coefficient multiplied to each tree | Lower values make the model learn more slowly and can improve generalization, but too low can make training very slow. |
n.minobsinnode | 10 | The minimum number of observations required in the terminal node of each tree | Larger values limit tree growth, helping to prevent overfitting. |
The code below is an example of PSE based on a GBM. Explaining the details of the search strategies for the optimal hyperparameters is beyond the scope of this study. Please refer to the relevant code in ‘(1) Generalized boosted model’ section of Supplementary Material 3 as a reference.
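A hedged sketch of such a fit (the hyperparameter values simply restate the defaults in Table 4; the data frame and covariates are assumed as in the earlier examples):

```r
# GBM-based propensity score estimation; options are passed through
# distance.options to gbm::gbm().
gbm1 <- matchit(treatment ~ age + sex + bmi + comorbidity, data = dat,
                method = NULL, distance = "gbm",
                distance.options = list(n.trees = 5000,
                                        interaction.depth = 3,
                                        shrinkage = 0.01,
                                        n.minobsinnode = 10))
```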
Random forest
Random forest models generate multiple bootstrap samples and build several decision trees, each from a random subset of the covariates. The model then combines the predictions of the individual trees to produce a final estimated probability, known as the propensity score. This method is robust against overfitting and is useful for evaluating the relative importance of variables, especially in cases with many covariates whose individual importance is unknown. Relevant options are summarized in Table 5.
Table 5 . Distance options for random forest in matchit().
Argument | Default value | Details | Impact |
---|---|---|---|
n.trees | 500 | The number of trees to be generated | More trees can improve model stability and accuracy, but increase computation. |
mtry | sqrt(p) | The number of covariates to consider at each split where p is the total number of covariates | Larger values can improve model accuracy but may increase the risk of overfitting and computation. |
nodesize | 1 | The minimum number of observations required in the terminal node of each tree | Larger values make the trees less complex, which can help prevent overfitting but may reduce accuracy. |
maxnodes | NULL | The maximum number of terminal nodes that trees can have | Limiting the number of nodes can reduce overfitting, but too small value may result in underfitting. |
importance | FALSE | Whether to compute variable importance | Variable importance can provide insights into the model but increases computation time. |
The code below provides an example of PSE based on a random forest model. Explaining the details of the search strategies for the optimal hyperparameters is beyond the scope of this study. Please refer to the relevant code in ‘(2) Random forest’ section of Supplementary Material 3 as a reference.
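A minimal sketch (note that randomForest() itself names the tree-count argument ‘ntree’; the data frame and covariates are assumed as before):

```r
# Random-forest propensity score estimation; options are passed through
# distance.options to randomForest::randomForest().
rf1 <- matchit(treatment ~ age + sex + bmi + comorbidity, data = dat,
               method = NULL, distance = "randomforest",
               distance.options = list(ntree = 500,
                                       importance = TRUE))
```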
Bayesian additive regression trees
BART is an ensemble model based on the Bayesian approach combining multiple regression trees. It employs the Markov chain Monte Carlo (MCMC) algorithm to sample parameters from Bayesian prior distributions, aiming to reduce errors from prior trees, thereby iteratively producing new regression trees.
For each generated tree, the final estimated probability is calculated by weighting predicted probabilities according to the contribution from the Bayesian posterior distribution. Relevant options are outlined in Table 6.
Table 6 . Distance options for Bayesian additive regression trees (BART) in matchit().
Argument | Default value | Details | Impact |
---|---|---|---|
num_trees | 200 | The number of trees to be generated | More trees can improve model stability and accuracy but increase computation. |
k | 2 | A hyperparameter controlling the variance of the tree predictions | Larger values reduce variance but can make the model more conservative, potentially underfitting. |
alpha | 0.95 | The base terminal node probability in the tree-building process | Higher values increase the probability of smaller trees, reducing complexity and overfitting. |
beta | 2 | A hyperparameter influencing the depth of the trees | Higher values encourage deeper trees, which make model more complex, potentially overfitting. |
n_min | 5 | The minimum number of observations required in the terminal node of each tree | Larger values limit tree growth, helping to prevent overfitting. |
n_iter | 1,000 | The number of iterations for the MCMC algorithm | More iterations can improve model convergence and stability but increase computation time. |
burn_in | 100 | The number of initial MCMC iterations to be discarded | Larger values ensure more stable models but increase computation time. |
MCMC, Markov chain Monte Carlo.
This method effectively estimates propensity scores when there is a significant variance in the covariates between the treatment and control groups. Notably, the prediction from each tree not only provides estimates but also captures the uncertainty of these estimates via MCMC sampling. Consequently, the final estimated probabilities are computed to reflect this uncertainty. However, this comes with a substantial computational burden.
The code below is an example of a PSE based on BART. Explaining the details of the search strategies for the optimal hyperparameters is beyond the scope of this study. Please refer to the relevant code in ‘(3) Bayesian additive regression trees’ section of Supplementary Material 3 as a reference.
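A hedged sketch of a BART fit; the argument names below follow dbarts::bart2(), which matchit() calls internally, and may differ from the labels in Table 6 (the data frame and covariates are assumed as before):

```r
# BART-based propensity score estimation via dbarts::bart2().
# n.samples / n.burn control the MCMC iterations kept and discarded.
bart1 <- matchit(treatment ~ age + sex + bmi + comorbidity, data = dat,
                 method = NULL, distance = "bart",
                 distance.options = list(n.trees = 200L,
                                         k = 2,
                                         n.samples = 1000L,
                                         n.burn = 100L))
```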
Neural network model
A neural network is a network of interconnected neurons organized into layers. A neural network model (NNET) mimics the structure of neural networks, with covariates as the input layer and treatment variables as the output layer. A basic neural network with only these two layers is equivalent to a linear regression model.
Adding a layer with hidden neurons between these two layers creates a multilayer structure capable of capturing nonlinear relationships. Neurons in each hidden layer, often termed nodes, receive inputs from the previous layer, compute a weighted linear combination, and pass these outputs to the subsequent layer. To prevent overfitting, a regularization parameter (decay) controls the magnitude of the weights, similar to Ridge regression. Increasing the number of layers and nodes per layer allows more complex patterns in the data to be captured.
A single-hidden-layer neural network model is applied by setting distance = ‘nnet’ in ‘matchit().’ The number of nodes (size) in the hidden layer and the regularization parameter (decay) are tuned to optimize the model (refer to Table 7). Due to longer training times, this approach is recommended when computing resources are sufficient.
Table 7 . Distance options for a single-hidden-layer neural network in matchit().
Argument | Default value | Details | Impact |
---|---|---|---|
size | 5 | The number of nodes in the hidden layer | More nodes can capture more complex patterns but increase the risk of overfitting and computation time. |
decay | 0.1 | The weight decay parameter for regularization | Higher values help prevent overfitting but may lead to underfitting. |
maxit | 100 | The maximum number of iterations for training the neural network | More iterations improve convergence but increase the risk of overfitting and computation time. |
linout | FALSE | Specifies whether the activation function should be linear | For the propensity score of a binary treatment, a nonlinear activation function is more appropriate. |
trace | TRUE | Whether to produce output during training | Outputs provide insight into the training process, which can be useful for debugging and monitoring, but also increase computation. |
The code below is an example of PSE based on a neural network model. Explaining the details of the search strategies for the optimal hyperparameters is beyond the scope of this study. Please refer to the relevant code in ‘(4) Neural network model with a single-hidden-layer’ section of Supplementary Material 3.
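A minimal sketch (nnet() requires ‘size’ to be supplied; the other values restate the Table 7 defaults, and the data frame and covariates are assumed as before):

```r
# Single-hidden-layer neural network propensity score estimation;
# options are passed through distance.options to nnet::nnet().
nn1 <- matchit(treatment ~ age + sex + bmi + comorbidity, data = dat,
               method = NULL, distance = "nnet",
               distance.options = list(size = 5,
                                       decay = 0.1,
                                       maxit = 100,
                                       trace = FALSE))
```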
Model selection for propensity score estimation
The criteria for selecting the optimal model for PSE can be divided into model fitness and the distribution of the propensity scores. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are goodness-of-fit measures for generalized linear models, but they cannot be computed for the machine learning models. Therefore, the propensity score distribution is used to select the most appropriate model.
The propensity score distribution between treatment and control groups is assessed through the common support region (CSR). When there is no overlap in propensity scores between the two groups, known as ‘separation,’ the groups cannot be compared, making it impossible to calculate treatment effects. In this case, adjusting the model and its parameters or redefining the covariate settings is crucial. Below are the code and results for inspecting the CSR of the propensity scores estimated by the glm1 and rf1 objects. Since glm1 demonstrates a broader and more uniform CSR than rf1, it is the better model (Fig. 4).
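One way to inspect the CSR is to overlay the group-wise propensity score distributions for each model; the base-graphics sketch below assumes the glm1 and rf1 objects and the ‘dat’ data frame from the earlier examples:

```r
# Overlaid histograms of propensity scores by treatment group,
# one panel per model, to compare common support regions.
op <- par(mfrow = c(1, 2))
for (nm in c("glm1", "rf1")) {
  ps <- get(nm)$distance
  hist(ps[dat$treatment == 0], breaks = 20, xlim = c(0, 1),
       col = rgb(0, 0, 1, 0.4), main = nm,
       xlab = "Propensity score")               # control group (blue)
  hist(ps[dat$treatment == 1], breaks = 20,
       col = rgb(1, 0, 0, 0.4), add = TRUE)     # treatment group (red)
}
par(op)
```

Regions where only one color appears indicate scores outside the common support, where the groups cannot be compared.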
Once optimal propensity scores are calculated, different matching methods are applied. Popular algorithms are nearest neighbor matching, optimal matching, full matching, and genetic matching. Table 8 summarizes the features of each algorithm and the scenarios in which they are useful.
Table 8 . Matching algorithms in matchit().
Argument | Matching algorithm | Details | Suitable case |
---|---|---|---|
method = NULL | None | Estimate the propensity score without matching | None |
method = ‘nearest’ (default) | Nearest neighbor matching | Match each subject to the closest propensity score within the designated caliper | Evenly distributed propensity scores |
method = ‘optimal’ | Optimal pair matching | Select the optimal one from all possible matching combinations | High dimensional data with many covariates, high imbalance between treated and control groups |
method = ‘full’ | Optimal full matching | Find the overall optimal matching across all subjects | Useful when it is desired to use all subjects |
method = ‘genetic’ | Matching by genetic algorithm | Optimize covariate balance across groups based on a genetic algorithm | High dimensional data with many covariates, high imbalance between treated and control groups, or small sample size |
Additionally, options used for matching include exact, ratio, replace, discard, reestimate, caliper, std.caliper, and m.order, as presented in Table 1. Table 9 outlines the detailed descriptions of each option.
Table 9 . Options for propensity score matching in matchit().
Argument | Format | Details |
---|---|---|
exact | exact = ~ covariate1 + covariate2 | Exact matching on specific variables (in this format, match subjects with the exactly same value of covariate1 and covariate2) |
ratio | 1 (default) | The number of control subjects to be matched with one treated subject |
replace | FALSE (default) | Matching without replacement (restrict each control subject to be matched with one treated subject) |
TRUE | Matching with replacement (allow control subjects to be matched with several treated subjects) | |
discard | ‘none’ (default) | No units are discarded |
‘control’ | Discard only control subjects with propensity scores outside the common support region | |
‘treated’ | Discard only treated subjects with propensity scores outside the common support region | |
‘both’ | Discard both treated and control subjects with propensity scores outside the common support region | |
reestimate | FALSE (default) | Use the original propensity scores for matching, even if some subjects are discarded |
TRUE | Reestimate the propensity scores using only the subjects remaining after the discard process | |
caliper | NULL (default) (for no caliper) | The width of the caliper to be used in matching |
std.caliper | TRUE (default) | The caliper is defined as a fraction or multiple of the standard deviation of the logit of the propensity score |
FALSE | The caliper is defined in the raw units of the propensity score | |
m.order | NULL (default) | Corresponds to ‘largest’ |
‘random’ | Match in a random order | |
‘smallest’ | Match subjects with the smallest propensity scores first | |
‘largest’ | Match subjects with the largest propensity scores first
Using the propensity scores estimated from the glm1 model, nearest neighbor matching was conducted within a caliper corresponding to 10% of the standard deviation of the logit of the propensity scores, with identical age and sex ensured. The code was set up for a 1:2 matching ratio without replacement. After matching, the results were examined using the ‘summary(matching1)’ function. Due to the exact matching of age and sex, the final matching includes 11 subjects in the treatment group and 13 subjects in the control group, totaling 24 subjects, which is relatively small (Fig. 5).
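The matching step described above can be sketched as follows (the data frame and covariate names are assumed as in the earlier examples):

```r
# 1:2 nearest neighbor matching without replacement, exact on age and sex,
# with a caliper of 0.1 SD of the logit of the propensity score
# (std.caliper = TRUE interprets the caliper in SD units).
matching1 <- matchit(treatment ~ age + sex + bmi + comorbidity, data = dat,
                     method = "nearest",
                     distance = "glm", link = "logit",
                     exact = ~ age + sex,
                     ratio = 2,
                     replace = FALSE,
                     caliper = 0.1,
                     std.caliper = TRUE)

summary(matching1)  # sample sizes and balance before/after matching
```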
After PSM, it is essential to verify whether a balance has been achieved between the covariates of the treatment and control groups. This usually involves comparing the standardized covariates between the two groups and evaluating whether the means and variances of the propensity scores are similar. Histograms or other visualizations can be used to compare the propensity score distributions before and after matching.
When comparing group means and variances, since the units of each covariate are different, the propensity scores and covariates are standardized. The group difference in means and the ratio of variances are then calculated. These are respectively referred to as the standardized mean difference (SMD) or absolute SMD (ASMD) and the variance ratio (VR). Typically, if the absolute value of the SMD is less than 0.1 and the VR falls between 0.5 and 2.0, it is considered that there is no imbalance in the covariates and propensity scores [20].
Checking the covariate balance before and after matching can also be done through histograms or density plots. However, it is common to use a plot called a Love plot to visualize changes in the SMD before and after matching easily. If there is an imbalance in the covariates, it is recommended the propensity scores be reestimated.
The following code was used to examine the SMD and VR for the matching1 object created in step 3. Histograms, density plots, and a Love plot were generated to visualize the covariate balance. Since all covariates had an SMD under 0.1 and a VR between 0.5 and 2.0, it was determined that a covariate balance was achieved (Fig. 6). The Love plot in Fig. 6C illustrates the changes in the ASMD for each covariate before and after matching. Initially, as depicted by the blue line, each covariate had an ASMD greater than 0.1, sorted from the highest ASMD starting with comorbidity. After matching, represented by the red line, the ASMD for all four covariates fell below 0.1, indicating an effective covariate balancing.
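One convenient way to produce these diagnostics is the cobalt package, which works directly on matchit objects; a sketch assuming the matching1 object from step 3:

```r
library(cobalt)

# SMD and variance ratio for each covariate, with the usual thresholds
# (|SMD| < 0.1, 0.5 < VR < 2).
bal.tab(matching1, stats = c("mean.diffs", "variance.ratios"),
        thresholds = c(m = 0.1, v = 2))

# Density plot of the propensity score before and after matching.
bal.plot(matching1, var.name = "distance", which = "both")

# Love plot of absolute SMDs before/after matching.
love.plot(matching1, stats = "mean.diffs", abs = TRUE,
          thresholds = c(m = 0.1))
```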
Once the post-PSM has achieved covariate balance, the matched data are utilized to perform analyses to evaluate the treatment effect.
Matched data extraction
The process for extracting the matched dataset is outlined below. Since the matching data were obtained from the matching1 object generated in step 3, it was evident that age and sex were identical in the 1:2 matched nine pairs (Fig. 7). Within this dataset, the variable ‘ps’ represents the estimated propensity score, ‘matching weight’ denotes the sample weight resulting from the 1:2 matching, and ‘match_id’ indicates the pair number of the matching.
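A sketch of the extraction, renaming the added columns to the names used in the text (‘ps,’ the matching weight, and ‘match_id’):

```r
# match.data() returns only the matched subjects, with the propensity
# score, sample weight, and pair identifier appended as new columns.
md1 <- match.data(matching1,
                  distance = "ps",
                  weights  = "matching_weight",
                  subclass = "match_id")

head(md1)
```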
Treatment effect analysis
Postoperative symptom scores are dichotomized into two cases: improvement (post_imp = 1) for scores less than 40, and no improvement (post_imp = 0) otherwise. Two approaches were used to evaluate this outcome: (1) a standard logistic regression assuming independence of the two groups in the matched data, and (2) a conditional logistic regression on the matched pairs. The following code and results illustrate these analyses. Since this involves 1:k matching with k greater than or equal to 2, the sample weights are incorporated as weighting variables.
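Both analyses can be sketched as below; the raw score variable ‘post_score’ is an assumed name, and the weighted conditional model uses the approximate method because clogit() does not accept non-integer weights with the exact method:

```r
# Dichotomize the postoperative symptom score ('post_score' is assumed).
md1$post_imp <- ifelse(md1$post_score < 40, 1, 0)

# (1) Standard logistic regression with the matching weights;
# quasibinomial avoids warnings about non-integer weighted outcomes.
fit1 <- glm(post_imp ~ treatment, data = md1,
            family = quasibinomial(), weights = matching_weight)
summary(fit1)

# (2) Conditional logistic regression stratified by matched set.
library(survival)
fit2 <- clogit(post_imp ~ treatment + strata(match_id),
               data = md1, weights = matching_weight,
               method = "approximate")
summary(fit2)
```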
In propensity score analyses, a sensitivity analysis evaluates the robustness of the estimated effect size. Various methods have been proposed to assess the impact of omitted variable bias when all relevant covariates are not included. However, implementing all the methods presented in Fig. 3 may not be practical [21–23].
A commonly used sensitivity analysis involves applying different models during the PSM and conducting the same analysis on multiple matched datasets derived from various matching methods [24]. When the effect size of a specific dataset is not robust, further investigation is necessary to identify the cause. This may involve verifying the error in the PSM, checking for omitted covariates, or ensuring that the parameters are correctly included. If the estimated treatment effects remain valid across multiple datasets, it strengthens the validity of causal inference. However, in this study, a separate sensitivity analysis was not conducted.
While this study primarily focused on matching between two groups, alternative techniques such as PSW and stratification also deserve attention. PSW uses the inverse probability of treatment weights (IPTW), calculated from the propensity score of each subject, to analyze the data. This method strives to equalize covariate distributions across treatment and control groups, improving their comparability. Unlike matching, it includes all available data, enhancing sample efficiency. However, extremely large or small weights can skew the results. To minimize this issue, truncation or stabilization techniques should be applied [25,26].
Instead of directly applying weighted analysis, another approach has been introduced to construct matched sub-controls based on the IPTW in studies with asymmetric sample sizes, particularly for those with larger control groups. Using these matched sub-controls, a post-genome-wide association study (GWAS) analysis was conducted, identifying additional candidate variants associated with the phenotypes [27].
PSS includes partitioning the dataset into multiple strata based on propensity scores, calculating treatment effects within each stratum, and aggregating these effects. This method helps reduce bias stemming from unobserved confounding variables, leading to more precise and stable estimates. Typically, PSS involves creating strata of equal size based on the propensity score distribution.
Propensity score analysis can be extended to treatment variables with three or more levels or continuous causal variables. However, no established methodology or tool developed explicitly for PSM exists for these cases. Typically, for treatment variables with three or more levels, a link function based on a multinomial distribution is specified. For continuous variables, a generalized linear regression model is fitted using distributions such as Gaussian or Gamma, depending on the data distribution. Propensity scores are then estimated using these models, and PSW analysis is applied accordingly.
In observational studies, treatment effects are often categorized into ‘average treatment effect for the treated (ATT),’ ‘average treatment effect for the control (ATC),’ or ‘average treatment effect (ATE)’ [28,29]. ATT refers to the average effect of the treatment on the treated group compared with the matched control group. ATC represents the average effect of the treatment on the control group compared with the matched treated group. ATE calculates the average effect of the treatment for the entire population. In this study, PSM primarily aimed to estimate the treatment effect using control groups similar to the treatment group, thus corresponding to ATT. Conversely, PSW analysis estimates the treatment effect based on the entire population, aligning with ATE.
In observational studies, the rate of missing values is generally higher than in experimental studies, and it is common for the probability of missing covariates to differ between the treatment and control groups. Removing missing values on a pairwise basis can significantly distort the treatment effect. Therefore, when analyzing data with missing values, it is recommended to consider the assumptions related to missing data mechanisms, such as missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), as well as missing data imputation techniques, such as maximum likelihood and multiple imputation. After addressing missing data, conducting propensity score analysis is advisable [30]. In R, packages like ‘mi,’ ‘mice,’ and ‘missForest’ can help perform missing data imputation [31,32].
This study introduced PSM to reduce bias in observational studies, providing a comprehensive overview of each step involved in the process. It demonstrated the practical application using an example in R and introduced the GUI program Rex. However, this tutorial is limited to matching treatment variables between two groups, indicating a need for future research to expand PSM to include treatment variables with three or more levels or continuous causal variables. Additionally, exploring other analysis techniques, such as PSW, stratification, and handling missing data through imputation methods, is a beneficial area for further investigation.
Conceptualization, Investigation: BL, SW, JG
Funding acquisition: BL, JG
Project administration: JG
Visualization: BL, NK
Writing–original draft: BL, JG
Writing–review & editing: All authors
All authors read and approved the final manuscript.
All authors have no conflicts of interest to declare.
This research was supported by the Global-Learning & Academic Research Institution for Master’s·PhD students, and Postdocs (LAMP) Program of the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (No. RS-2023-00285353).
All the R code in this paper is available at http://www.gimlab.online.
Supplementary materials can be found via https://doi.org/10.7602/jmis.2024.27.2.55.
Table 1 . Primary arguments in MatchIt::matchit() for PSM.
For use | Argument | Format (with default values) | Details |
---|---|---|---|
PSE or PSM | formula a) | formula = treatment ~ covariate1 + covariate2 + … | The model representing the relationship between the treatment and covariates for propensity score estimation |
Only the treatment variable can be entered for matching based on the pre-calculated propensity score vector | |||
PSE or PSM | data | data = dataframe | A data frame containing the variables specified in the formula |
PSE or PSM | distance | distance = ‘glm’ | The method for calculating the propensity score used for matching |
PSE or PSM | link | link = ‘logit’ | Specify the link function when the distance is set to ‘glm’, ‘ridge’, ‘lasso’, ‘elasticnet’, or ‘gbm’ |
PSE or PSM | distance.options | distance.options=list() | Set the specific options for each model to compute the propensity score |
PSM | method | method = ‘nearest’ | The method for matching |
PSM | exact | exact = NULL | Exact matching for specific variables (this format matches subjects with the exact same value of covariate1 and covariate2) |
PSM | ratio | ratio = 1 | The number of control subjects to be matched with one treated subject |
PSM | replace | replace = FALSE | Matching without replacement (restricts each control subject to be matched with one treated subject) |
PSM | discard | discard = ‘none’ | The way to manage subjects that fall outside the common support region of the propensity scores before matching |
PSM | reestimate | reestimate = FALSE | Determines whether to reestimate the propensity scores after some subjects have been discarded based on the ‘discard’ option |
PSM | caliper | caliper = NULL | The width of the caliper to be used in the matching |
PSM | std.caliper | std.caliper = TRUE | The way to measure the caliper |
PSM | m.order | m.order = NULL | The order that the matching is performed |
PSE, propensity score estimation; PSM, propensity score matching.
a)The argument ‘formula’ must be specified in any case.
Table 2 . Specifications of each model in matchit() for propensity score estimation.
Argument | Details | Utilized function | Specific options |
---|---|---|---|
distance = ‘glm’ (default) or distance = ‘ps’ | A generalized linear model (e.g. logistic regression) | glm() | family, link |
distance = ‘ridge’ | A generalized linear model with L2 regularization (Ridge model) | glmnet::cv.glmnet() | family, link |
distance = ‘lasso’ | A generalized linear model with L1 regularization (Lasso model) | glmnet::cv.glmnet() | family, link |
distance = ‘elasticnet’ | A generalized linear model with L1 and L2 regularizations (Elastic Net model) | glmnet::cv.glmnet() | family, link, alpha |
distance = ‘gbm’ | A generalized boosted model | gbm::gbm() | n.trees, interaction.depth, shrinkage, bag.fraction, n.minobsinnode |
distance = ‘randomforest’ | A random forest | randomForest::randomForest() | mtry, ntree, nodesize, sampsize, maxnodes |
distance = ‘bart’ | The Bayesian additive regression trees (BART) | dbarts::bart2() | num.trees, num.burn, num.iter, alpha, beta, sigma |
distance = ‘nnet’ | A single-hidden-layer neural network | nnet::nnet() | size, decay, maxit, linout, trace |
Table 3 . Link functions in matchit().
Argument | Model | Suitable case
---|---|---
link = ‘logit’ | Logistic regression | Commonly used for binary outcomes in practice
link = ‘probit’ | Probit regression | Used for binary outcomes with more extreme values close to 0 or 1
link = ‘cloglog’ | Complementary log-log regression | Used for binary outcomes of rare events
link = ‘cauchit’ | Cauchit regression | Used for binary outcomes with heavy tails; very sensitive to outliers