Data-driven methods distort optimal cutoffs and accuracy estimates of depression screening tools: a simulation study using individual participant data is a research paper published in Journal of Clinical Epidemiology (2021). On theSindex it has a DataRank of 0.889. It has been cited 20 times, with 13 citing works in its 1-hop citation network.
Objective: To evaluate, across multiple sample sizes, the degree to which data-driven methods result in (1) optimal cutoffs that differ from the population optimal cutoff and (2) bias in accuracy estimates.

Study design and setting: A total of 1,000 samples each of sample size 100, 200, 500, and 1,000 were randomly drawn to simulate studies of different sample sizes from a database (n = 13,255) synthesized to assess Edinburgh Postnatal Depression Scale (EPDS) screening accuracy. Optimal cutoffs were selected by maximizing Youden's J (sensitivity + specificity - 1). Optimal cutoffs and accuracy estimates in simulated samples were compared to population values.

Results: Optimal cutoffs in simulated samples ranged from ≥ 5 to ≥ 17 for n = 100, ≥ 6 to ≥ 16 for n = 200, ≥ 6 to ≥ 14 for n = 500, and ≥ 8 to ≥ 13 for n = 1,000. Percentage of simulated samples identifying the population optimal cutoff (≥ 11) was 30% for n = 100, 35% for n = 200, 53% for n = 500, and 71% for n = 1,000. Mean overestimation of sensitivity and underestimation of specificity were 6.5 percentage points (pp) and -1.3 pp for n = 100, 4.2 pp and -1.1 pp for n = 200, 1.8 pp and -1.0 pp for n = 500, and 1.4 pp and -1.0 pp for n = 1,000.

Conclusions: Small accuracy studies may identify an inaccurate optimal cutoff and overstate accuracy estimates with data-driven methods.
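The resampling design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the population below is synthetic (the score distributions, prevalence, and the helper names `make_population`, `sens_spec`, `youden_optimal_cutoff`, and `simulate` are all assumptions made for illustration), and the sensitivity bias is measured against the population sensitivity at the same cutoff, which is one plausible reading of "compared to population values".

```python
import random

def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity of the rule 'score >= cutoff' (label 1 = case)."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        if y == 1:
            if s >= cutoff:
                tp += 1
            else:
                fn += 1
        else:
            if s >= cutoff:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def youden_optimal_cutoff(scores, labels, cutoffs=range(0, 31)):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1.
    Ties are resolved in favor of the lowest cutoff (an arbitrary choice)."""
    return max(cutoffs, key=lambda c: sum(sens_spec(scores, labels, c)) - 1)

def make_population(size, rng, prevalence=0.12):
    """Synthetic stand-in for a screening database (all parameters hypothetical)."""
    scores, labels = [], []
    for _ in range(size):
        y = 1 if rng.random() < prevalence else 0
        mean = 14 if y else 6
        scores.append(min(30, max(0, round(rng.gauss(mean, 4)))))
        labels.append(y)
    return scores, labels

def simulate(pop_scores, pop_labels, n, reps, rng):
    """Draw `reps` samples of size n; return the population optimal cutoff,
    the share of samples recovering it, and the mean sensitivity bias."""
    pop_cut = youden_optimal_cutoff(pop_scores, pop_labels)
    hits, sens_bias = 0, 0.0
    for _ in range(reps):
        idx = rng.sample(range(len(pop_scores)), n)
        s = [pop_scores[i] for i in idx]
        y = [pop_labels[i] for i in idx]
        cut = youden_optimal_cutoff(s, y)
        hits += cut == pop_cut
        # Sample sensitivity at the data-driven cutoff minus the
        # population sensitivity at that same cutoff.
        sens_bias += sens_spec(s, y, cut)[0] - sens_spec(pop_scores, pop_labels, cut)[0]
    return pop_cut, hits / reps, sens_bias / reps

if __name__ == "__main__":
    rng = random.Random(0)
    scores, labels = make_population(5000, rng)
    for n in (100, 500):
        print(n, simulate(scores, labels, n, reps=100, rng=rng))
```

Because the cutoff is chosen on the same sample that is then used to estimate accuracy, small samples would be expected to show more scatter in the selected cutoff and a more optimistic sensitivity, which is the pattern the abstract reports.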
FAIR checklist signals are shown for context only and do not affect DataRank scoring.
Base Score Contribution
0.457
From this paper's citation signal
Citation Network Contribution
0.432
From 10 citing papers with measurable signal
Ranked by citation count — the same ordering the engine uses when summing log1p(Cq) over citers.
DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 51% comes from its base citations and 49% from the citation network (10 citing papers contributed measurable signal).
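The percentage split quoted above can be checked with a few lines of Python. This sketch assumes that the DataRank shown on this page is simply the sum of the two listed contributions (0.457 + 0.432 = 0.889, which matches the reported score); the function name `contribution_shares` is hypothetical and not part of any DataRank API.

```python
def contribution_shares(base, network):
    """Return each contribution's share of the combined score.
    Assumes the total score is the plain sum of the two parts."""
    total = base + network
    return base / total, network / total

base_share, network_share = contribution_shares(0.457, 0.432)
print(f"base: {base_share:.1%}, network: {network_share:.1%}")
```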
Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
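The retrieval and summation steps described here might look roughly like the following Python sketch. The OpenAlex `cites` filter, `sort=cited_by_count:desc`, and `per-page` parameters are part of the public OpenAlex works API, but the work ID, the cap value, and the helper names `build_citers_url` and `capped_log1p_sum` are placeholders; how the engine normalizes the resulting sum into a score is not stated on this page and is not modeled.

```python
import math
from urllib.parse import urlencode

def build_citers_url(work_id, cap):
    """URL listing works that cite `work_id`, most-cited first, capped at `cap`.
    `work_id` is an OpenAlex work ID; the value used below is a placeholder."""
    params = {
        "filter": f"cites:{work_id}",
        "sort": "cited_by_count:desc",
        "per-page": cap,
    }
    # Keep ':' unescaped so the query string stays readable.
    return "https://api.openalex.org/works?" + urlencode(params, safe=":")

def capped_log1p_sum(citer_counts, cap):
    """Sum log1p(Cq) over the `cap` most-cited citers.
    Sorting before truncating makes the result independent of input order."""
    top = sorted(citer_counts, reverse=True)[:cap]
    return sum(math.log1p(c) for c in top)

if __name__ == "__main__":
    print(build_citers_url("W0000000000", cap=50))  # placeholder ID and cap
    print(capped_log1p_sum([12, 0, 3, 40], cap=3))  # made-up citation counts
```

Sorting deterministically before applying the cap is what would make the score reproducible: the same citers are kept on every run, whatever order the records arrive in.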