Journal of Minimally Invasive Surgery 2023; 26(1): 9-18
Published online March 15, 2023
https://doi.org/10.7602/jmis.2023.26.1.9
© The Korean Society of Endo-Laparoscopic & Robotic Surgery
Correspondence to: Hae In Bang
Department of Laboratory Medicine, Soonchunhyang University Seoul Hospital, 59 Daesagwan-ro, Yongsan-gu, Seoul 04401, Korea
E-mail: genuine43@schmc.ac.kr
ORCID:
https://orcid.org/0000-0001-7854-3011
Youngho Park
Department of Big Data Application, College of Smart Interdisciplinary Engineering, Hannam University, 70 Hannamro, Daedeok-gu, Daejeon 34430, Korea
E-mail: yhpark@hnu.kr
ORCID:
https://orcid.org/0000-0002-7096-3967
Hae In Bang and Youngho Park contributed equally to this study as co-corresponding authors.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Since the advent of evidence-based medicine, using statistics to generate objective evidence has become standard practice in clinical research. By extension, it has become essential to calculate the correct sample size needed to demonstrate a clinically significant difference before starting a study. Because sample size calculation methods vary with study design, no single formula applies to all designs, and it is important to understand this. In this review, we introduce sample size calculation methods suitable for various study designs using the R program (R Foundation for Statistical Computing). So that clinical researchers can apply them directly to their own future research, we present practice code, output, and interpretation of the results for each situation.
Keywords Sample size, Effect size, Continuous outcome, Categorical outcome
This article will cover the following topics: (1) Why is sample size calculation important?; (2) Components of sample size calculation; and (3) How to calculate the required sample size?
The main purpose of sample size calculation is to determine the minimum number of subjects required to detect a clinically relevant treatment effect. The fundamental reason for calculating the number of subjects in the study can be divided into the following three categories [1,2].
In clinical studies, if the sample is not large enough, statistical significance may not be reached even when an important relationship or difference exists; the study may simply lack the power to detect the effect. Conversely, when a study is based on a very large sample, trivially small differences may reach statistical significance and lead to clinical misjudgment. Either way, the study fails to serve its purpose, and money, time, and resources are wasted.
Oversized studies are likely to include more subjects than actually needed, needlessly exposing subjects to potentially harmful or futile treatments. Similarly, undersized studies raise ethical issues because subjects are exposed to the burdens of a study that has little chance of success.
If a negative result is obtained after conducting a study, it is necessary to consider whether the sample size of the study was sufficient or insufficient. First, if the study was conducted with sufficient sample size, it can be interpreted that there is no clinically significant effect. However, if the study is conducted with insufficient sample size, meaningful clinical results with statistically significant differences in practice may be missed. Notice that not being able to reject the null hypothesis does not mean that it is true; it means that we do not have enough evidence to reject it.
Additionally, calculating sample size at the study design stage, when receiving ethics committee approval, has become a requirement rather than an option. As a result, calculating the optimal sample size is an important process that must be done at the design stage before a study is conducted in order to ensure the validity, accuracy, reliability, and scientific and ethical integrity of the study.
Appropriate sample size usually depends on the statistical hypotheses made about the study's primary outcome and on the study design parameters. The six basic statistical concepts essential for estimating the sample size are as follows.
There are various research designs [3] in clinical research; among them, the most commonly used design is the randomized controlled trial.
When establishing statistical hypotheses in research, two hypotheses are always required, which we call the null hypothesis (H0) and the alternative hypothesis (H1).
The
The hypothesis testing process is as follows: (1) assume that the null hypothesis is true and calculate the test statistic from the sample data, and (2) decide whether or not to reject the null hypothesis according to the result. That is, we always choose one of the four decisions shown in Table 2, and two types of errors inevitably occur: the type I error (α) and the type II error (β).
Several variables may be candidates for demonstrating a clinically significant difference, but the most important among them should be selected. This is called the primary outcome, and the other measurements are referred to as secondary outcomes. The sample size is calculated using the primary outcome. The parameter information of the primary outcome needed for this calculation can be obtained from prior studies or pilot studies. Both continuous and categorical data can be used as primary outcomes, and the parameters used to calculate the effect size depend on the data type.
The sample size estimation formula yields the minimum number of subjects required to achieve statistical significance for a given hypothesis. In an actual study, however, subjects may drop out during the study, so the total enrollment must be inflated to ensure the required number of evaluable subjects. If 'n' is the sample size calculated by the formula and 'dr' is the dropout rate, the adjusted sample size 'N' is given by: N = n / (1 – dr).
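This adjustment can be sketched in base R (the helper name is ours, not from the article):

```r
# Inflate a calculated sample size n for an expected dropout rate dr:
# N = n / (1 - dr), rounded up to a whole subject.
adjust_for_dropout <- function(n, dr) {
  stopifnot(dr >= 0, dr < 1)
  ceiling(n / (1 - dr))
}

adjust_for_dropout(64, 0.20)  # 64 evaluable subjects with 20% dropout -> enroll 80
```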
Depending on the study design, there are many more considerations in addition to the six concepts mentioned above. Although not considered in the practice below, we would like to mention three points that are frequently mentioned and used in actual clinical research to help researchers.
In clinical trials, available patients, treatment costs, and treatment resources may influence the allocation ratio (k) decision. According to Lachin [8] and van Belle [9], it can be applied as follows.
(1) Calculate the per-group sample size n assuming equal (1:1) allocation. (2) For an allocation ratio k = n2/n1, adjust the group sizes as n1 = (n/2)(1 + 1/k) and n2 = (n/2)(1 + k).
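These steps can be sketched in base R; the adjustment follows the standard unequal-allocation rule of thumb described by van Belle [9], and the function name is ours:

```r
# Adjust a per-group size n (computed for 1:1 allocation) to an
# allocation ratio k = n2/n1: n1 = (n/2)(1 + 1/k), n2 = (n/2)(1 + k).
adjust_allocation <- function(n, k) {
  c(n1 = ceiling(n / 2 * (1 + 1 / k)),
    n2 = ceiling(n / 2 * (1 + k)))
}

adjust_allocation(64, 2)  # 2:1 allocation -> n1 = 48, n2 = 96
```

Note that the total (144 here) exceeds the equal-allocation total (128): unequal allocation always costs some extra subjects for the same power.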
In the confirmatory trials, there are cases in which interim analysis, whether planned or unplanned, is performed at the research planning stage. When calculating the number of subjects taking this into account, the false positive rate increases with the number of interim analyses, so type I error should be considered.
In survival analysis, the outcome is the time until a specific event, such as death, occurs; both whether the event occurred for each subject and the time from the start of the clinical trial to the event (or to censoring) are used as outcome variables. In particular, the power of a survival analysis is a function of the number of observed events rather than the total number of subjects.
However, for studies with relatively low event rates and heavy censoring, a sample size formula based only on event rates can be used: for example, Schoenfeld's approximation requires d = 4(z1−α/2 + z1−β)² / (log HR)² events under 1:1 allocation, and the total sample size is then n = d / pE, where pE is the overall probability of observing an event.
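As an illustration, the event-driven calculation can be sketched in base R using Schoenfeld's approximation (an assumption on our part; the article's exact formula was not preserved in the text):

```r
# Number of events needed to detect a hazard ratio HR with 1:1 allocation
# (Schoenfeld's approximation). Total n = events / overall event probability.
events_needed <- function(hr, alpha = 0.05, power = 0.80) {
  ceiling(4 * (qnorm(1 - alpha / 2) + qnorm(power))^2 / log(hr)^2)
}

events_needed(0.7)                  # events required for HR = 0.7 -> 247
ceiling(events_needed(0.7) / 0.6)   # total n if 60% of subjects have an event
```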
Using the 17 tests in Table 3, which are widely used in research, we present examples using R version 4.1.2 (R Foundation for Statistical Computing), a free program, together with the 'pwr', 'exact2x2', and 'WebPower' [11] packages. When using R, you first install the package that contains the function you need, then load it with the 'library()' function before use. More details are explained through the examples below.
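For example, the three packages used in this article can be installed once and then loaded in each session:

```r
# Install once (downloads from CRAN), then load per session.
install.packages(c("pwr", "exact2x2", "WebPower"))

library(pwr)       # parametric tests: t tests, ANOVA, proportions, chi-square, ...
library(exact2x2)  # exact tests: Fisher exact test, McNemar test
library(WebPower)  # logistic and Poisson regression
```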
All studies are assumed to use a parallel-group design, with a two-tailed test at a significance level of 0.05 and a power of 80%. The dropout rate differs across research fields, but here we fix it at 20%. For nonparametric tests on continuous variables, as a rule of thumb [12], calculate the sample size required for the corresponding parametric test and add 15%. Effect size can be defined as 'a standardized measure of the magnitude of the mean difference or relationship between study groups' [13]. In other words, an index that divides the mean difference (or relationship) by its dispersion (e.g., standard deviation) is unaffected by the unit of measurement and can be used regardless of the unit; it is called an 'effect size index' or 'standardized effect size.' Cohen introduced the intuitive labels small, medium, and large for effect sizes [14]. However, since Cohen's benchmark values may vary with the population or the distribution of the variable, they should not be treated as absolute. When estimating the number of subjects, effect sizes (such as Cohen's d, r, or a relative ratio) should be calculated from the parameter information (mean difference and SD) reported in the literature relevant to each primary outcome and entered as arguments to the function. Additionally, whether an effect size counts as small, medium, or large may depend on the analysis method. In the examples below we follow the guidelines of Cohen [14] and Sawilowsky [15] and use the medium effect size for each test.
When the primary outcome considered in the study is continuous data, the sample size can be calculated using the 'pwr' package. You can compare the mean of a single group, two groups, or three or more groups; Cohen's d and f are used as the effect sizes. When applying this to your own study, the parameters can be taken from a previous or pilot study and the effect size computed with the calculation formula below.
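As an illustration, Cohen's d for two independent groups can be computed from summary statistics reported in a prior or pilot study (the numbers below are invented for illustration):

```r
# Cohen's d = (m1 - m2) / pooled SD, from group means, SDs, and sizes.
cohens_d <- function(m1, m2, sd1, sd2, n1, n2) {
  sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / sd_pooled
}

cohens_d(m1 = 12, m2 = 10, sd1 = 4, sd2 = 4, n1 = 30, n2 = 30)  # 0.5
```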
The pwr.t.test() function (Supplementary data 1, Table 1) can be utilized with the 'type' argument for (1) the one-sample t test ('one.sample'), (2) the two-sample t test ('two.sample'), and (3) the paired t test ('paired'). Cohen's d is used as the effect size; Cohen suggests that d values of 0.2, 0.5, and 0.8 indicate small, medium, and large effect sizes, respectively.
Assuming a medium effect size (d = 0.5), the sample size for the one-sample t test was calculated.
Assuming a medium effect size (d = 0.5), the sample size for the two-sample t test was calculated.
In the case of paired samples, if the correlation coefficient (r) between the before and after measurements is known, the standard deviation of the differences can be calculated as SDdiff = SD√(2(1 − r)), and the effect size is the mean of the differences divided by SDdiff.
A total of 43 subjects was calculated by the one-sample t test.
By the two-sample t test, the required sample size per group was calculated in the same way.
The 43 pairs were calculated by the paired t test.
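As a sketch of the two-sample case with the medium effect size (d = 0.5) and the defaults fixed earlier (two-sided α = 0.05, power 80%):

```r
library(pwr)

# Two-sample t test: d = 0.5, two-sided alpha = 0.05, power = 80%
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")
# n comes out to about 63.77 per group; round up to 64,
# then inflate for 20% dropout: 64 / 0.8 = 80 per group
```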
The pwr.anova.test() function (Supplementary data 1, Table 2) can be used in studies that compare the means of three or more groups. In this function, 'k' is the number of groups being compared and 'f' is the effect size; Cohen's f is used here, and the detailed calculation formula can be found below. Cohen suggests that f values of 0.1, 0.25, and 0.4 indicate small, medium, and large effect sizes, respectively; we will use the medium effect size, f = 0.25.
Assume that the
By one-way ANOVA, 66 people were calculated for each group, and if 15% of each group is additionally considered, a total of 297 people are calculated.
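With the medium effect size f = 0.25, the call looks like this (a sketch; the worked example above reports 66 per group, so its inputs, lost in the truncated text, evidently differed):

```r
library(pwr)

# One-way ANOVA: k = 3 groups, f = 0.25, alpha = 0.05, power = 80%
pwr.anova.test(k = 3, f = 0.25, sig.level = 0.05, power = 0.80)
# n comes out to about 52.4 per group; round up to 53 per group
```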
If the primary outcome considered in your study is categorical data, you can use the ‘pwr’ package for parametric tests and the ‘exact2x2’ package for nonparametric tests to calculate the number of samples.
The pwr.p.test() and pwr.2p.test() functions (Supplementary data 1, Table 3) are used when comparing one-sample and two-sample proportions, respectively. Cohen's h is used as the effect size; the calculation formula is h = 2·arcsin(√p1) − 2·arcsin(√p2), and Cohen suggests that h values of 0.2, 0.5, and 0.8 indicate small, medium, and large effect sizes, respectively.
In the one-sample proportion test, the proportion expected under the null hypothesis is compared with the proportion expected in the study group.
Assuming a medium effect size (h = 0.5) with a two-sided test at a significance level of 0.05 and a power of 80%, the required sample size was calculated.
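A sketch of the two-sample proportion calculation; p1 = 0.5 and p2 = 0.3 are invented illustrative proportions, and ES.h() is the 'pwr' helper that computes Cohen's h:

```r
library(pwr)

# Cohen's h = 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2))
h <- ES.h(0.5, 0.3)

# Two-sample proportion test at two-sided alpha = 0.05, power = 80%
pwr.2p.test(h = h, sig.level = 0.05, power = 0.80)
# round n up per group, then divide by 0.8 to allow 20% dropout
```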
In the chi-square test, a commonly used method for measuring the association between categorical variables, Cohen's w is used as the measure of effect size. The pwr.chisq.test() function (Supplementary data 1, Table 3) takes 'w' as the effect size argument and 'df' for the degrees of freedom. Assuming that the two categorical variables have r and c categories, respectively, the degrees of freedom are df = (r − 1)(c − 1).
Similarly, we assumed a two-sided test with a significance level of 0.05, a power of 80%, and a medium effect size (w = 0.3).
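For a 2 × 2 table (df = 1) with the medium effect size w = 0.3, a sketch:

```r
library(pwr)

# Chi-square test: w = 0.3, df = (2-1)*(2-1) = 1, alpha = 0.05, power = 80%
pwr.chisq.test(w = 0.3, df = 1, sig.level = 0.05, power = 0.80)
# N (total) comes out to about 87.2; round up to 88 in total
```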
For nonparametric testing of categorical data, the sample size calculation can be performed using the ss2x2() function [17] (Supplementary data 2, Table 1); the Fisher exact test and the McNemar test correspond to this case (Table 3).
Assuming that the event rate of the control group was 0.2 and that of the treatment group was 0.8, the allocation ratio of each group was set at 1:1. If a two-sided test is performed with a significance level of 0.05 and a power of 80%, 12 samples are calculated for each group. Considering a dropout rate of 20%, 15 subjects are required for each group, for a total of 30 subjects.
Assuming that the event rate of the matched control group was 0.2 and that of the matched case (or treatment) group was 0.8, the allocation ratio of each group was set at 1:1. If a two-sided test is performed with a significance level of 0.05 and a power of 80%, 13 samples are calculated for each group. Considering a dropout rate of 20%, 16 subjects are required for each group, for a total of 32 subjects.
Correlation analysis determines whether there is a linear relationship between two continuous variables. The ‘pwr’ package will be used for this test.
The pwr.r.test() function (Supplementary data 1, Table 5) can be used in correlation analysis. The correlation coefficient (r) is used as a measure of effect size. Cohen suggests that r values of 0.1, 0.3, and 0.5 represent small, medium, and large effect sizes respectively. We will use a medium effect size of 0.3.
Assuming a medium effect size (r = 0.3) with a two-sided test at a significance level of 0.05 and a power of 80%, the required sample size was calculated.
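A sketch with the medium effect size r = 0.3:

```r
library(pwr)

# Correlation analysis: r = 0.3, two-sided alpha = 0.05, power = 80%
pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.80)
# round n up, then divide by 0.8 to allow 20% dropout
```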
Generalized linear models [18] were formulated as a unifying framework that includes a variety of statistical models, such as linear regression, logistic regression, and Poisson regression. We will use the 'pwr' package for linear regression and the 'WebPower' package for logistic/Poisson regression.
The pwr.f2.test() function (Supplementary data 1, Table 6) can be used for multiple linear regression analysis. We will use Cohen's f² as the effect size, computed from the R² value used as a measure of goodness of fit in regression analysis (Cohen's f² = R² / (1 − R²)). Cohen suggests that f² values of 0.02, 0.15, and 0.35 indicate small, medium, and large effect sizes, respectively.
Similarly, we assumed a significance level of 0.05, a power of 80%, and a medium effect size (f² = 0.15).
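A sketch assuming, for illustration, a model with three predictors (u = 3 is our choice, not from the article) and the medium effect size f² = 0.15:

```r
library(pwr)

# Multiple linear regression: u = numerator df (number of predictors),
# v = denominator df (n - u - 1) is solved for.
pwr.f2.test(u = 3, f2 = 0.15, sig.level = 0.05, power = 0.80)
# required n = ceiling(v) + u + 1
```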
The wp.logistic() and wp.poisson() functions (Supplementary data 3, Tables 1 and 2) can be used for logistic and Poisson regression analysis, respectively. The two arguments 'family' and 'parameter' carry information about the distribution of the predictor (or risk factor). Default values are used when the distribution of the predictor is unknown; you can change the parameter value if it is known.
If the predictor (X) is a continuous variable, family = "normal" can be used with the default 'parameter'. p0 and p1 can be derived from the 1-SD range of X: set p1 to the event probability within the range and p0 to the probability outside it. In this example, p0 = 0.15 and p1 = 0.1 were used. Similarly, we assumed a significance level of 0.05 and a power of 80%.
If the predictor (X) is a binary variable, family = "bernoulli" can be used with the default 'parameter'. For exp0, a base rate of 1 under the null hypothesis was used, and for exp1, an expected relative risk of 1.2 was set as the relative increase in the event rate. Similarly, we assumed a significance level of 0.05 and a power of 80%.
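Sketches of both calls under the assumptions above; argument names follow the 'WebPower' package, and the parameter values are those stated in the text:

```r
library(WebPower)

# Logistic regression with a normally distributed predictor:
# p0/p1 derived from the 1-SD range of X as described above
wp.logistic(p0 = 0.15, p1 = 0.10, alpha = 0.05, power = 0.80,
            family = "normal")

# Poisson regression with a Bernoulli-distributed predictor:
# base rate exp0 = 1, expected relative risk exp1 = 1.2
wp.poisson(exp0 = 1, exp1 = 1.2, alpha = 0.05, power = 0.80,
           family = "bernoulli")
```

Each call solves for the missing argument (here, n); as elsewhere, round the result up and divide by 0.8 to allow for 20% dropout.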
In conclusion, sample size calculation plays a critical role in the research design process before a study begins. In particular, since randomized controlled trials, which are frequently conducted in clinical settings, are directly tied to cost, the sample size must be calculated carefully. Although there are various references on sample size calculation, it can be difficult to choose and correctly apply a method suitable for one's own study. For more complex designs it is better to seek expert advice, but we hope that this article helps researchers calculate the right number of subjects for their own research.
Conceptualization: YHK, HIB, YP
Data curation: SP, YHK
Formal analysis: SP, HIB
Investigation: SP, HIB
Methodology: SP, YHK
Project administration: YHK, YP
Visualization: HIB
Writing–Original Draft: SP, HIB
Writing–Review & Editing: All authors
All authors have no conflicts of interest to declare.
This work was supported by the Soonchunhyang University Research Fund.
Supplementary data 1–3 can be found via https://doi.org/10.7602/jmis.2023.26.1.9.
Table 1. Types of hypothesis testing

| Test for | Null hypothesis (H0) | Alternative hypothesis (H1) |
|---|---|---|
| Equality | μT − μC = 0 | μT − μC ≠ 0 |
| Equivalence | \|μT − μC\| ≥ δ | \|μT − μC\| < δ |
| Superiority | μT − μC ≤ δ | μT − μC > δ |
| Non-inferiority | μT − μC ≤ −δ | μT − μC > −δ |

μT, treatment group mean; μC, control group mean; δ, margin.
Table 2. Type I and type II error

| True status | Fail to reject H0 | Reject H0 |
|---|---|---|
| H0 is true | Correct decision | Type I error (α) |
| H1 is true | Type II error (β) | Correct decision |
Table 3. Tests for calculating sample size

| No. | Type | No. of groups | Test | R package | Function |
|---|---|---|---|---|---|
| 1 | Continuous/parametric | 1 | One-sample t test | pwr | pwr.t.test |
| 2 | Continuous/parametric | 2 | Two-sample t test | pwr | pwr.t.test |
| 3 | Continuous/parametric | 2 | Paired t test | pwr | pwr.t.test |
| 4 | Continuous/parametric | ≥3 | One-way ANOVA | pwr | pwr.anova.test |
| 5 | Continuous/nonparametric | 1 | One-sample Wilcoxon test | pwr | pwr.t.test |
| 6 | Continuous/nonparametric | 2 | Mann-Whitney U test | pwr | pwr.t.test |
| 7 | Continuous/nonparametric | 2 | Paired Wilcoxon test | pwr | pwr.t.test |
| 8 | Continuous/nonparametric | ≥3 | Kruskal-Wallis test | pwr | pwr.anova.test |
| 9 | Categorical/parametric | 1 | One-sample proportion test | pwr | pwr.p.test |
| 10 | Categorical/parametric | 2 | Two-sample proportion test | pwr | pwr.2p.test |
| 11 | Categorical/parametric | - | Chi-square test | pwr | pwr.chisq.test |
| 12 | Categorical/nonparametric | 2 | Fisher exact test | exact2x2 | ss2x2 |
| 13 | Categorical/nonparametric | 2 | McNemar test | exact2x2 | ss2x2 |
| 14 | - | - | Correlation analysis | pwr | pwr.r.test |
| 15 | - | - | Linear regression | pwr | pwr.f2.test |
| 16 | - | - | Logistic regression | WebPower | wp.logistic |
| 17 | - | - | Poisson regression | WebPower | wp.poisson |
ANOVA, analysis of variance.
Suyeon Park1,2,3, Yeong-Haw Kim3, Hae In Bang4, Youngho Park5
1Department of Biostatistics, Academic Research Office, Soonchunhyang University Seoul Hospital, Seoul, Korea
2International Development and Cooperation, Graduate School of Multidisciplinary Studies Toward Future, Soonchunhyang University, Asan, Korea
3Department of Applied Statistics, Chung-Ang University, Seoul, Korea
4Department of Laboratory Medicine, Soonchunhyang University Seoul Hospital, Seoul, Korea
5Department of Big Data Application, College of Smart Interdisciplinary Engineering, Hannam University, Daejeon, Korea
Correspondence to:Hae In Bang
Department of Laboratory Medicine, Soonchunhyang University Seoul Hospital, 59 Daesagwan-ro, Yongsan-gu, Seoul 04401, Korea
E-mail: genuine43@schmc.ac.kr
ORCID:
https://orcid.org/0000-0001-7854-3011
Youngho Park
Department of Big Data Application, College of Smart Interdisciplinary Engineering, Hannam University, 70 Hannamro, Daedeok-gu, Daejeon 34430, Korea
E-mail: yhpark@hnu.kr
ORCID:
https://orcid.org/0000-0002-7096-3967
Hae In Bang and Youngho Park contributed equally to this study as co-corresponding authors.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Since the era of evidence-based medicine, it has become a matter of course to use statistics to create objective evidence in clinical research. As an extension of this, it has become essential in clinical research to calculate the correct sample size to demonstrate a clinically significant difference before starting the study. Also, because sample size calculation methods vary from study design to study design, there is no formula for sample size calculation that applies to all designs. It is very important for us to understand this. In this review, each sample size calculation method suitable for various study designs was introduced using the R program (R Foundation for Statistical Computing). In order for clinical researchers to directly utilize it according to future research, we presented practice codes, output results, and interpretation of results for each situation.
Keywords: Sample size, Effect size, Continuous outcome, Categorical outcome
This article will cover the following topics: (1) Why is sample size calculation important?; (2) Components of sample size calculation; and (3) How to calculate the required sample size?
The main purpose of sample size calculation is to determine the minimum number of subjects required to detect a clinically relevant treatment effect. The fundamental reason for calculating the number of subjects in the study can be divided into the following three categories [1,2].
In clinical studies, if the sample is not large enough, statistical significance may not be found even if an important relationship or difference exists. In other words, it may not be possible to successfully conclude the study because the study may lack the power to detect the effect. Conversely, when a study is based on a very large sample, small effect differences may be considered statistically significant and lead to clinical misjudgment. Either way, your research may not be successful for other reasons and the conclusion is a waste of money, time, and resources.
Oversized studies are likely to include more subjects than the study actually needs, exposing unnecessarily many subjects to potentially harmful or futile treatments. Similarly, in undersized studies, ethical issues may arise in that subjects are exposed to unnecessary situations in studies that may have low success rates.
If a negative result is obtained after conducting a study, it is necessary to consider whether the sample size of the study was sufficient or insufficient. First, if the study was conducted with sufficient sample size, it can be interpreted that there is no clinically significant effect. However, if the study is conducted with insufficient sample size, meaningful clinical results with statistically significant differences in practice may be missed. Notice that not being able to reject the null hypothesis does not mean that it is true; it means that we do not have enough evidence to reject it.
Additionally, calculating sample size at the study design stage, when receiving ethics committee approval, has become a requirement rather than an option. As a result, calculating the optimal sample size is an important process that must be done at the design stage before a study is conducted in order to ensure the validity, accuracy, reliability, and scientific and ethical integrity of the study.
Appropriate sample size usually depends on the statistical hypotheses made with the study’s primary outcome and the study design parameters. The basic statistical six concepts that must be considered essential for estimating the sample size are as follows.
There are various research designs [3] in clinical research, but among them, the most commonly used design is the
When establishing statistical hypotheses in research, two hypotheses are always required, which we call the null hypothesis (
The
The hypothesis testing process is as follows: (1) assume that the null hypothesis is true, calculate the test statistic using sample data and (2) decide whether or not to reject the null hypothesis according to the result. That is, we always choose one of the four decisions shown in Table 2 and two types of errors inevitably occur:
Variables to see clinically significant differences may vary, but the most important factor among them should be selected. This is called the primary outcome, and the other measurements are referred to as secondary outcomes. The number of samples is calculated using the primary outcome. At this time, the parameter information of the primary outcome for calculating the sample size can be obtained from prior studies or pilot studies. Both continuous and categorical data can be used as primary outcomes, and the parameters used to calculate
The sample size estimation formula yields the minimum number of subjects required to meet statistical significance for a given hypothesis. However, in an actual study, subject dropout may occur during the study, so in order to satisfy all the number of subjects desired by the researcher, the total number of subjects considering the dropout rate must be calculated so that more subjects can be enrolled. If ‘n’ is the number of samples calculated according to the formula and ‘dr’ is the dropout rate, then the adjusted sample size ‘N’ is given by: N = n / (1 – dr).
Depending on the study design, there are many more considerations in addition to the six concepts mentioned above. Although not considered in the practice below, we would like to mention three points that are frequently mentioned and used in actual clinical research to help researchers.
In clinical trials, available patients, treatment costs, and treatment resources may influence the allocation ratio (k) decision. According to Lachin [8] and van Belle [9], it can be applied as follows.
(1) Calculate the sample size
In the confirmatory trials, there are cases in which interim analysis, whether planned or unplanned, is performed at the research planning stage. When calculating the number of subjects taking this into account, the false positive rate increases with the number of interim analyses, so type I error should be considered.
In survival analysis, the outcome variable is the time until a specific event such as death occurs, and whether or not an event occurs for each subject and the time from the start of the clinical trial to the occurrence of the event (or censoring) are used as outcome variables. In particular, the power of survival analysis is a function of the number of events and generally increases with a shorter period (
However, for studies with relatively low event rates and high censoring, the following sample size formula using only event rates can be used:
Using the 17 tests in Table 3, which are widely used in research, we would like to show an example using an R program version 4.1.2 (R Foundation for Statistical Computing; ‘pwr’, ‘exact2x2’ and ‘WebPower’ [11] packages), one of the free programs. Basically, when using R, you need to install a package that includes the function you want to analyze and then use it. After that, you can use the function you want to use after calling package using the ‘library()’ function. More details will be explained through the example below.
All studies intend to use a parallel group design. A two-tailed test with a significance of 0.05 and a power of 80% was established. The dropout rate is different for each research field, but here we will unify it at 20%. For nonparametric tests on continuous variables, as a rule of thumb [12], calculate the sample size required for parametric tests and add 15%. Effect size can be defined as ‘a standardized measure of the magnitude of the mean difference or relationship between study groups’ [13]. In other words, an index that divides the effect size by its dispersion (standard deviation, etc.) is not affected by the measurement unit and can be used regardless of the unit, and is called an ‘effect size index’ or ‘standardized effect size.’ Cohen intuitively introduced effect sizes as small, medium, and large for easy understanding [14]. However, since the value presented by Cohen may vary depending on the population or distribution of the variable, there may be limitations in using it as an absolute value. When estimating the number of subjects, effect sizes (such as Cohen’s d, r, or the relative ratio, etc.) should be calculated using parameter information (MD and SD) found in the literature relevant to each primary outcome and entered as arguments to the function. Additionally, whether an effect size should be interpreted as small, medium, or large may depend on the analysis method. We use the guidelines mentioned by Cohen [14] and Sawilowsky [15] and use the medium effect size considered for each test in the examples below.
When the primary outcome considered in the study is continuous data, the number of samples can be calculated using the ‘pwr’ package. At this time, you can consider comparing the mean of a single group, two groups, or more than three groups, and Cohen’s d and f will be used for the effect size. When applied to your study, parameters can be taken from a previous or pilot study and calculated using the effect size calculation formula below.
The pwr.t.test() function (Supplementary data 1, Table 1) can be utilized with the ‘type’ argument for (1) one-sample
Assuming a
Assuming a
In the case of paired samples, if there is a correlation coefficient (r) between the variables before and after, it can be calculated as the
A total of 43 was calculated by one-sample
By two-sample
The 43 pairs were calculated by paired
The pwr.anova.test() function (Supplementary data 1, Table 2) can be used in studies that compare averages of three or more groups. In this function, ‘k’ means the number of comparison groups and ‘f’ means the effect size, and Cohen’s f is used here. The detailed calculation formula can be found below. Cohen suggests that f values of 0.1, 0.25, and 0.4 indicate small, medium, and large effect sizes respectively. Also, we will use medium effect size = 0.25.
Assume that the
By one-way ANOVA, 66 people were calculated for each group, and if 15% of each group is additionally considered, a total of 297 people are calculated.
If the primary outcome considered in your study is categorical data, you can use the ‘pwr’ package for parametric tests and the ‘exact2x2’ package for nonparametric tests to calculate the number of samples.
The pwr.p.test() and pwr.2p.test() functions (Supplementary data 1, Table 3) are used when comparing one-sample and two-sample proportions, respectively. Cohen’s h is used here as the effect size. The calculation formula is: h = 2arcsin(
In the one-sample proportion test,
Assuming a
In the chi-square test, which is a commonly used method for measuring the association between categorical data, Cohen’s w is used as a measure of effect size. The pwr.chisq.test() function (Supplementary data 1, Table 3) takes ‘w’ as an argument for effect size and ‘df’ as an argument for degrees of freedom. Assuming that two categorical variables have
Similarly, we assumed a
For nonparametric testing of categorical data, power calculation can be performed using the ss2x2() function [17] (Supplementary data 2, Table 1). Fisher exac
Assuming that the event rate of the control group was 0.2 and that of the treatment group was 0.8, the allocation ratio of each group was set at 1:1. If a two-sided test is performed with a significance level of 0.05 and a power of 80%, 12 samples are calculated for each group. Considering a dropout rate of 20%, 15 subjects are required for each group, for a total of 30 subjects.
Assuming that the event rate of the matched control group was 0.2 and that of the matched case (or treatment) group was 0.8, the allocation ratio of each group was set at 1:1. If a two-sided test is performed with a significance level of 0.05 and a power of 80%, 13 samples are calculated for each group. Considering a dropout rate of 20%, 16 subjects are required for each group, for a total of 32 subjects.
Correlation analysis determines whether there is a linear relationship between two continuous variables. The ‘pwr’ package will be used for this test.
The pwr.r.test() function (Supplementary data 1, Table 5) can be used in correlation analysis. The correlation coefficient (r) is used as a measure of effect size. Cohen suggests that r values of 0.1, 0.3, and 0.5 represent small, medium, and large effect sizes respectively. We will use a medium effect size of 0.3.
Assuming a
Generalized linear models [18] have been formulated as a way to incorporate a variety of other statistical models, including linear regression, logistic regression, and Poisson regression. We will use the ‘pwr’ package for linear regression and the ‘We’ package for logistic/Poisson regression.
The pwr.f2.test() function (Supplementary data 1, Table 6) can be used for multiple linear regression analysis. We will use Cohen’s f2 as the effect size, computed from the R2 value that serves as the measure of goodness of fit in regression analysis (Cohen’s f2 = R2 / (1 − R2)).
Similarly, we assumed a significance level of 0.05 and a power of 80% when calculating the required sample size.
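The R2-to-f2 conversion used for pwr.f2.test’s effect size argument is simple enough to sketch directly (the example value R2 = 0.13 is our own, chosen because it corresponds roughly to Cohen’s medium effect f2 = 0.15):

```python
def cohens_f2(r_squared):
    """Convert a model's R^2 into Cohen's f2 effect size:
    f2 = R^2 / (1 - R^2)."""
    return r_squared / (1 - r_squared)

# An R^2 of about 0.13 corresponds to Cohen's medium effect f2 ~ 0.15
print(round(cohens_f2(0.13), 3))  # 0.149
```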
The wp.logistic() and wp.poisson() functions (Supplementary data 3, Tables 1 and 2) can be used for logistic and Poisson regression analysis, respectively. The two arguments ‘family’ and ‘parameter’ should contain information about the distribution of the predictor (or risk factor). Default values are used when the distribution of the predictor is unknown; if it is known, the parameter values can be changed accordingly.
If the predictor (X) is a continuous variable, family = “normal” can be used and the ‘parameter’ is left at its default. p0 and p1 can be calculated from the 1-SD range of X: p1 is set to the event probability when X is within the range and p0 to the probability when X is outside it. In this example, p0 = 0.15 and p1 = 0.1 were used. Similarly, we assumed a two-sided test with a significance level of 0.05 and a power of 80%.
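As an illustration of what these two probabilities imply for the logistic model, the corresponding odds ratio and log-odds coefficient can be computed by hand (the function name implied_or is ours, not part of WebPower):

```python
import math

def implied_or(p0, p1):
    """Odds ratio implied by event probabilities p0 and p1,
    i.e., odds(p1) / odds(p0)."""
    odds0 = p0 / (1 - p0)
    odds1 = p1 / (1 - p1)
    return odds1 / odds0

or_ = implied_or(0.15, 0.10)
print(round(or_, 2))            # 0.63
print(round(math.log(or_), 2))  # -0.46 (slope on the log-odds scale)
```

An odds ratio below 1 here simply reflects that p1 < p0 in this example, i.e., the predictor is protective.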
If the predictor (X) is a binary variable, family = “bernoulli” can be used and the ‘parameter’ is left at its default. For exp0, a base rate of 1 under the null hypothesis was used, and exp1 was set to an expected relative risk of 1.2 as the relative increase in the event rate. Similarly, we assumed a two-sided test with a significance level of 0.05 and a power of 80%.
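The exp0/exp1 inputs translate into Poisson regression coefficients on the log-rate scale; a minimal sketch (our own illustration, not WebPower code):

```python
import math

# exp0 is the base rate under the null; exp1 the relative risk associated
# with a unit increase in the binary predictor.
exp0, exp1 = 1.0, 1.2
beta0 = math.log(exp0)  # intercept on the log-rate scale
beta1 = math.log(exp1)  # predictor effect on the log-rate scale
print(round(beta0, 3), round(beta1, 3))  # 0.0 0.182
```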
In conclusion, sample size calculation plays the most important role in the study design process before a study begins. In particular, because randomized controlled trials, which are frequently conducted in clinical settings, are directly tied to cost, the number of subjects must be calculated carefully. Although there are many references on sample size calculation, it can be difficult to choose the method appropriate for one’s own study. For more complex designs it is of course better to seek expert advice, but we hope that this article will help researchers calculate the right number of subjects for their own research.
Conceptualization: YHK, HIB, YP
Data curation: SP, YHK
Formal analysis: SP, HIB
Investigation: SP, HIB
Methodology: SP, YHK
Project administration: YHK, YP
Visualization: HIB
Writing–Original Draft: SP, HIB
Writing–Review & Editing: All authors
All authors have no conflicts of interest to declare.
This work was supported by the Soonchunhyang University Research Fund.
Supplementary data 1–3 can be found via https://doi.org/10.7602/jmis.2023.26.1.9.
Table 1. Types of hypothesis testing.
| Test for | Null hypothesis (H0) | Alternative hypothesis (H1) |
| --- | --- | --- |
| Equality | μ1 − μ2 = 0 | μ1 − μ2 ≠ 0 |
| Equivalence | \|μ1 − μ2\| ≥ δ | \|μ1 − μ2\| < δ |
| Superiority | μ1 − μ2 ≤ δ | μ1 − μ2 > δ |
| Non-inferiority | μ1 − μ2 ≤ −δ | μ1 − μ2 > −δ |
Table 2. Type I and type II error.

| True status | Do not reject H0 | Reject H0 |
| --- | --- | --- |
| H0 is true | Correct decision | Type I error (α) |
| H0 is false | Type II error (β) | Correct decision |
Table 3. Tests for calculating sample size.

| No. | Type | No. of groups | Name | R package | Function |
| --- | --- | --- | --- | --- | --- |
| 1 | Continuous/Parametric | 1 | One-sample | pwr | pwr.t.test |
| 2 | | 2 | Two-sample | pwr | pwr.t.test |
| 3 | | 2 | Paired | pwr | pwr.t.test |
| 4 | | ≥3 | One-way ANOVA | pwr | pwr.anova.test |
| 5 | Continuous/Nonparametric | 1 | One-sample Wilcoxon test | pwr | pwr.t.test |
| 6 | | 2 | Mann-Whitney test | pwr | pwr.t.test |
| 7 | | 2 | Paired Wilcoxon test | pwr | pwr.t.test |
| 8 | | ≥3 | Kruskal-Wallis test | pwr | pwr.anova.test |
| 9 | Categorical/Parametric | 1 | One-sample proportion test | pwr | pwr.p.test |
| 10 | | 2 | Two-sample proportion test | pwr | pwr.2p.test |
| 11 | | - | Chi-square test | pwr | pwr.chisq.test |
| 12 | Categorical/Nonparametric | 2 | Fisher exact test | exact2x2 | ss2x2 |
| 13 | | 2 | McNemar test | exact2x2 | ss2x2 |
| 14 | Correlation analysis | - | - | pwr | pwr.r.test |
| 15 | Linear regression | - | - | pwr | pwr.f2.test |
| 16 | Logistic regression | - | - | WebPower | wp.logistic |
| 17 | Poisson regression | - | - | WebPower | wp.poisson |

ANOVA, analysis of variance.