Sample Size Calculation in Health Sciences: Recommendations and Practical Guide
Abstract
Sample size calculation is one of the most important aspects of planning most research studies, since an insufficient sample can render the study itself useless. Sample size calculations have traditionally been based on power, but precision-based calculations are now beginning to be adopted. This paper presents recommendations for sample size calculations in randomised clinical trials, multiple linear and logistic regression models, reproducibility analyses, and multivariable predictive models, together with practical examples of their implementation. It also offers some considerations on conducting pilot studies and using their data when planning a sample size calculation.
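To make the distinction between the two approaches concrete, the following is a minimal sketch, not taken from the paper, for a two-arm trial with a continuous outcome. It uses the standard normal-approximation formulas and assumes equal group sizes and a common standard deviation. The function names and the example values (a standard deviation of 10 and a difference or half-width of 5) are illustrative assumptions only.

```python
from math import ceil
from statistics import NormalDist

# Illustrative sketch (normal approximation, equal allocation, common SD).
# Function names and example inputs are assumptions, not the paper's method.


def n_per_group_power(delta, sd, alpha=0.05, power=0.80):
    """Power-based: n per arm to detect a true mean difference `delta`
    with a two-sided test at level `alpha` and the requested power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)


def n_per_group_precision(half_width, sd, conf_level=0.95):
    """Precision-based: n per arm so that the confidence interval for the
    difference in means has (approximately) the requested half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil(2 * (sd * z / half_width) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: SD = 10, target difference or half-width = 5.
    print(n_per_group_power(delta=5, sd=10))           # 63
    print(n_per_group_precision(half_width=5, sd=10))  # 31
```

The power-based function asks how many participants are needed to detect a given difference, while the precision-based one asks how many are needed for the confidence interval to be acceptably narrow. The two answers differ even for the same numeric inputs because they answer different questions. Exact methods based on the t distribution would give slightly larger values.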