Applied usage and performance of statistical matching in bibliometrics: The comparison of milestone and regular papers with multiple measurements of disruptiveness as an empirical example is a research paper published in Quantitative Science Studies (2021). On theSindex it has a DataRank of 0.434. It has been cited 17 times.
Abstract Controlling for confounding factors is one of the central aspects of quantitative research. Although methods such as linear regression models are common, their results can be misleading under certain conditions. We demonstrate how statistical matching can be utilized as an alternative that enables the inspection of post-matching balancing. This contribution serves as an empirical demonstration of matching in bibliometrics and discusses the advantages and potential pitfalls. We propose matching as an easy-to-use approach in bibliometrics to estimate effects and remove bias. To exemplify matching, we use data about papers published in Physical Review E and a selection classified as milestone papers. We analyze whether milestone papers score higher in terms of a proposed class of indicators for measuring disruptiveness than nonmilestone papers. We consider disruption indicators DI1, DI5, DI1n, DI5n, and DEP and test which of the disruption indicators performs best, based on the assumption that milestone papers should have higher disruption indicator values than nonmilestone papers. Four matching algorithms (propensity score matching (PSM), coarsened exact matching (CEM), entropy balancing (EB), and inverse probability weighting (IPTW)) are compared. We find that CEM and EB perform best regarding covariate balancing and DI5 and DEP performing well to evaluate disruptiveness of published papers.
FAIR checklist signals are shown for context only and do not affect DataRank scoring.
Base Score Contribution
0.434
From this paper's citation signal
Citation Network Contribution
0
Citation network not refreshed for this result
This paper's DataRank is currently driven only by its base citation score. Citation network data was not refreshed for this result.
Learn more about DataRank methodology →DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 100% comes from its base citations and 0% from the citation network.
Citers are pulled from OpenAlex sorted by cited_by_count:descand capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.