Star Wars at Central Banks
Researchers in economics often write stars (*) next to their results, to highlight which results pass conventional thresholds of certainty, or ‘statistical significance’. Researchers typically like showing starry results because they can improve perceptions about the value of a piece of work, broadening its influence. But are these stars always what they seem?
In our paper we investigate whether researchers use improper methods to produce more starry results, something that we worry would foster exaggeration. For example, researchers often have to decide whether to delete suspicious-looking data points. Each deletion can change results, and the right choice is often subjective. So researchers have some freedom to shape their results. If researchers use that freedom to favour starry results, their work will suffer from exaggeration.
Others have also investigated this problem, focusing mostly on research published in academic journals. Their findings do suggest exaggeration, and there is now growing support for lifting research standards. Still, our work is important because it is unclear whether the findings about journals apply to central banks.
To investigate, we compile 2 decades of research results from the Federal Reserve Bank of Minneapolis, the Reserve Bank of Australia and the Reserve Bank of New Zealand. We then apply 2 popular methods for detecting exaggeration to the dataset. Both build on the observation that researchers start assigning stars at a human-made threshold of significance, whereas nature, which should dictate the true pattern of results, is indifferent to that threshold. So if the observed pattern of results shows anomalies at the starry threshold, we can be confident that the anomalies come from researchers. The most complex step is the last one: figuring out whether the anomalies come from exaggeration or something else.
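To make the idea of an anomaly at the starry threshold concrete, here is a minimal sketch of one generic check for bunching, a so-called caliper test. It is our illustrative choice and not necessarily either of the 2 methods used in the paper. It assumes that a result's z-score is its estimated coefficient divided by its standard error, that the key star threshold is the common 5 per cent two-sided cut-off of z = 1.96, and that, if nature alone set the pattern, a result in a narrow band around the threshold would be about equally likely to fall just above it as just below it. The function names, the band width of 0.2 and the example z-scores are all hypothetical.

```python
from math import comb

# Hypothetical illustration only; not the paper's code or data.

def z_score(coef: float, se: float) -> float:
    """Absolute z-score: estimated coefficient divided by its standard error."""
    return abs(coef) / se

def stars(z: float) -> str:
    """Assumed common convention for two-sided tests: * 10%, ** 5%, *** 1%."""
    if z >= 2.576:
        return "***"
    if z >= 1.960:
        return "**"
    if z >= 1.645:
        return "*"
    return ""

def caliper_counts(zs, threshold=1.96, width=0.2):
    """Count z-scores in equal-width bands just below and just above the threshold."""
    below = sum(1 for z in zs if threshold - width <= z < threshold)
    above = sum(1 for z in zs if threshold <= z < threshold + width)
    return below, above

def binomial_p_value(above: int, n: int) -> float:
    """One-sided P(X >= above) for X ~ Binomial(n, 0.5): the chance of at least
    this many results landing above the threshold if each side were equally likely."""
    return sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n

# Hypothetical z-scores, invented purely for illustration.
zs = [1.80, 1.85, 1.90, 1.97, 2.00, 2.05, 2.08, 2.10]
below, above = caliper_counts(zs)
print(below, above)  # 3 5
print(round(binomial_p_value(above, below + above), 3))  # 0.363
```

A small p-value would suggest that more results just clear the threshold than chance would explain. As with any such check, an anomaly flagged this way still leaves open whether it comes from exaggeration or something else.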
Our findings are mixed. The first method finds no evidence of exaggeration, but it often misses exaggeration when it occurs. The second method finds some evidence of exaggeration, but it relies on strong assumptions. We test those assumptions and challenge their merit. At this point, all that is clear is that central banks produce results with patterns different from those in journals: there is less bunching at the starry threshold (see the figure below). The source of this difference remains a mystery.
Notes: We position results on the horizontal axis using a measure of statistical significance called the z-score. The academic journals are The American Economic Review, Journal of Political Economy and The Quarterly Journal of Economics.
Source: Authors’ calculations; Brodeur et al (2016); Federal Reserve Bank of Minneapolis; Reserve Bank of Australia; Reserve Bank of New Zealand.