Research Methodology
9 August 2023

Sample Size Calculation in Health Sciences: Recommendations and a Practical Guide

Ruben Fernandez-Matias
Research Unit, Hospital Universitario Fundación Alcorcón, Madrid, Spain
Keywords: Sample Size; Statistics; Methodology
Vol. 5 No. 1 (2023): June


Abstract

Sample size calculation is one of the most important steps in planning most research studies, because an insufficient sample can render the research itself useless. Power-based sample size calculations have traditionally been used, but precision-based calculations are now beginning to be adopted. This paper presents recommendations for sample size calculations for randomised clinical trials, multiple linear and logistic regression models, reproducibility analyses, and multivariable prediction models. It also gives practical examples of their implementation and offers some considerations on conducting pilot studies and using their data when planning a sample size calculation.
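The following minimal Python sketch illustrates the contrast between power-based and precision-based calculations in the simplest case: a two-arm parallel randomised trial with a continuous outcome. It assumes a common, known standard deviation and uses the normal approximation. The function names and example numbers are hypothetical illustrations and are not the calculations or values reported in the article.

```python
import math
from statistics import NormalDist


def n_per_group_power(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Power-based: participants per arm needed to detect a true mean
    difference `delta` with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)


def n_per_group_precision(half_width: float, sd: float, conf: float = 0.95) -> int:
    """Precision-based: participants per arm so that the confidence interval
    for the mean difference has (approximately) the requested half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n = 2 * z ** 2 * sd ** 2 / half_width ** 2
    return math.ceil(n)


def inflate_for_dropout(n: int, dropout: float) -> int:
    """Inflate a per-arm sample size for an expected proportion of dropouts."""
    return math.ceil(n / (1 - dropout))


if __name__ == "__main__":
    # Hypothetical scenario: SD = 10 points, clinically relevant difference = 5 points.
    print("power-based, per arm:", n_per_group_power(delta=5, sd=10))
    # Same SD, but the aim is to estimate the difference to within +/- 3 points.
    print("precision-based, per arm:", n_per_group_precision(half_width=3, sd=10))
    print("with 15% dropout:", inflate_for_dropout(n_per_group_power(5, 10), 0.15))
```

The power-based function answers "how many participants do I need to reject the null hypothesis if the true difference is `delta`?". The precision-based function answers "how many do I need so the estimated difference is bracketed tightly enough to be informative?". Two caveats apply. First, t-based software usually returns slightly larger numbers than this normal approximation. Second, the precision formula only targets the expected interval width; it does not guarantee that the interval observed in a given trial will be that narrow.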


How to cite

Fernandez-Matias R. El Cálculo del Tamaño Muestral en Ciencias de la Salud: Recomendaciones y Guía Práctica [Sample size calculation in health sciences: recommendations and practical guide]. MOVE [Internet]. 2023 Aug 9 [cited 2023 Sep 22];5(1):481-503. Available from: https://jomts.com/index.php/MOVE/article/view/915
