Understanding between-cluster variation in prevalence, and limits for how much variation is plausible

Chatfield, Mark D. and Farewell, Daniel

2021. Understanding between-cluster variation in prevalence, and limits for how much variation is plausible. Statistical Methods in Medical Research 30 (1), pp. 286-298. 10.1177/0962280220951831

PDF - Accepted Post-Print Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Official URL: http://dx.doi.org/10.1177/0962280220951831

Abstract

In clinical trials and observational studies of clustered binary data, understanding between-cluster variation is essential: in sample size and power calculations of cluster randomised trials, for example, the intra-cluster correlation coefficient is often specified. However, quantifications of between-cluster variation can be unintuitive, and an intra-cluster correlation coefficient as low as 0.04 may correspond to surprisingly large between-cluster differences. We suggest that understanding is improved through visualising the implied distribution of true cluster prevalences – possibly by assuming they follow a beta distribution – or by calculating their standard deviation, which is more readily interpretable than the intra-cluster correlation coefficient. Even so, the bounded nature of binary data complicates the interpretation of variances as primary measures of uncertainty, and entropy offers an attractive alternative. Appealing to maximum entropy theory, we propose the following rule of thumb: that plausible intra-cluster correlation coefficients and standard deviations of true cluster prevalences are both bounded above by the overall prevalence, its complement, and one third. We also provide corresponding bounds for the coefficient of variation, and for a different standard deviation and intra-cluster correlation defined on the log odds scale. Using previously published data, we observe the quantities defined on the log odds scale to be more transportable between studies with different outcomes with different prevalences than the intra-cluster correlation and coefficient of variation. The latter increase and decrease, respectively, as prevalence increases from 0% to 50%, and the same is true for our bounds. Our work will help clinical trialists better understand between-cluster variation and avoid specifying implausibly high values for the intra-cluster correlation in sample size and power calculations.
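The quantities named in the abstract can be related with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: it assumes the common definition of the intra-cluster correlation coefficient for binary data, ICC = Var(p) / (π(1 − π)), where p is a true cluster prevalence and π is the overall prevalence, and it reads the rule of thumb as an upper bound of min(π, 1 − π, 1/3). The function names are hypothetical and chosen only for illustration.

```python
import math


def sd_from_icc(icc: float, prevalence: float) -> float:
    """SD of true cluster prevalences implied by an ICC.

    Assumes ICC = Var(p) / (prevalence * (1 - prevalence)).
    """
    return math.sqrt(icc * prevalence * (1.0 - prevalence))


def beta_params(icc: float, prevalence: float) -> tuple[float, float]:
    """Shape parameters (alpha, beta) of a beta distribution whose mean is
    `prevalence` and whose variance matches the ICC under the assumption above.

    For a beta distribution, variance = mean * (1 - mean) / (alpha + beta + 1),
    so alpha + beta = (1 - icc) / icc.
    """
    total = (1.0 - icc) / icc
    return prevalence * total, (1.0 - prevalence) * total


def rule_of_thumb_bound(prevalence: float) -> float:
    """Upper bound on plausible ICC and SD values, read from the abstract as
    the smallest of the prevalence, its complement, and one third."""
    return min(prevalence, 1.0 - prevalence, 1.0 / 3.0)


if __name__ == "__main__":
    icc, prevalence = 0.04, 0.5
    sd = sd_from_icc(icc, prevalence)
    alpha, beta = beta_params(icc, prevalence)
    print(f"SD of true cluster prevalences: {sd:.3f}")
    print(f"Implied beta distribution: alpha={alpha:.1f}, beta={beta:.1f}")
    print(f"Rule-of-thumb bound: {rule_of_thumb_bound(prevalence):.3f}")
```

Under these assumptions, an ICC of 0.04 at an overall prevalence of 50% implies a standard deviation of 0.1, so true cluster prevalences ranging from roughly 30% to 70% would not be unusual. This illustrates why a seemingly small ICC can correspond to large between-cluster differences.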

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Schools > Medicine
Publisher:	SAGE
ISSN:	0962-2802
Date of First Compliant Deposit:	7 August 2020
Date of Acceptance:	26 July 2020
Last Modified:	29 Nov 2024 18:15
URI:	https://orca.cardiff.ac.uk/id/eprint/134043