Citing is Easy, Reading is Hard

Errors in published work are easily propagated, and senior people make mistakes as well. A recent paper in Advances in Methods and Practices in Psychological Science (AMPPS) by someone who ought to know better (Gigerenzer, 2018) cited a seminal study on statistical misconceptions, which itself contained a statistical error. Worse, this error has been copied uncritically, verbatim, through many subsequent citations. This is very common: it happens much more often than it should.

In the first part of this series, Marije aan het Rot mentioned an often-cited but fictional paper: “The art of writing a scientific article.” Not only does the article not exist, but the authors that didn’t write it don’t exist and the journal in which it wasn’t published doesn’t exist. Despite this, it has been cited 400 times. (It was given as an example in Elsevier’s Guide for Authors as part of an illustration of house style, then took on a life of its own.) Here, we present other examples of incorrect or haphazard citations, and we argue for the careful reading of original sources, rather than the parroted recitation of works that were cited somewhere else.

In the seminal study illustrating the problem we seek to take on, Oakes (1986) presented academic psychologists with the following scenario:

Suppose you have a treatment which you suspect may alter performance on a certain task. You compare the means of your control and experimental groups (say 20 subjects in each sample). Further, suppose you use a simple independent means t test and your result is significant (t = 2.7, df = 18, p = 0.01). Please mark each of the statements below as ‘true’ or ‘false’. (p. 80)

The scenario was followed by six statements about the interpretation of the p-value in the result. For each statement, participants were asked whether it followed from the scenario above. Crucially, all statements were false, but in practice almost everyone incorrectly endorsed one or more of them.

The book in which the study appeared has been widely cited (892 times).[1] The study itself has been replicated multiple times, and has been featured alongside materials included in Haller and Krauss (2002; 346 citations), Gigerenzer (2004; 793 citations), Hoekstra et al. (2014; 238 citations), and most recently in Gigerenzer (2018; 46 citations). In each of these four instances, the material was copied (almost) verbatim.[2] All of these publications use the materials to illustrate the misconceptions many researchers have about methods of statistical inference that are considered to be relatively basic.

It is striking that in all these resources, no-one seems to have noticed the statistical error in the original scenario. (The degrees of freedom for an independent samples t-test with 20 participants in each group should be 38, not 18.) This is salient because this technique is taught early in most introductory courses on statistical inference, and yet a vast array of authors citing these papers seem to have overlooked the error. These are the literal authorities: the people who, out of everyone, ought to be counted upon to know better.
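The arithmetic behind that correction can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard pooled-variance version of the independent-samples t test; the function name `pooled_df` is ours and does not come from Oakes (1986) or from any library.

```python
def pooled_df(n1: int, n2: int) -> int:
    """Degrees of freedom for a pooled-variance independent-samples t test.

    One degree of freedom is lost for each estimated group mean,
    so df = n1 + n2 - 2.
    """
    return n1 + n2 - 2


# The scenario describes 20 subjects in each of two groups.
print(pooled_df(20, 20))  # 38

# For comparison: df = 18 is what two groups of 10 would give.
print(pooled_df(10, 10))  # 18
```

Seen this way, the reported df = 18 matches a design with 20 subjects in total rather than 20 per group, although we can only guess at how the error arose.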

We hasten to say that we do not wish to make the point that these authors do not know their statistics. Nor do we think this makes the results of these studies in any way less valid: the different value for the degrees of freedom does not affect the conclusions that can be drawn from any of these studies. We do, however, believe this highlights a wider issue in science: citing is easy, and reading is hard.

A related thing seems to have happened to the seminal publication by Rosenthal (1979; 6403 citations) called “The file drawer problem and tolerance for null results.” Its title was altered to “An introduction to the file drawer problem” in an editorial by Pashler and Wagenmakers (2012; 955 citations), which, ironically, was about the crisis of confidence in the social sciences due to the large number of errors. This incorrect citation was subsequently copied in identical form by 21 other papers.

These examples are not isolated incidents. Other examples of lazy, incorrect citation practices leading to the propagation of error and myths have been described by Harzing (2002), Wetterer (2006), and Rekdal (2014). The problem seems ubiquitous.

Based on the extent to which misprints in the original bibliography of 13 papers were copied in citing papers, Simkin and Roychowdhury (2002) have estimated that only about 20% of citers had actually read the original text. Klitzing, Hoekstra, and Strijbos (2018) showed that more than three quarters of authors admitted to partial readings of the papers that they cited. Moreover, many errors are made even when the original has been read (see Harzing, 2002, for an inventory). The result is then the creation and perpetuation of “academic urban legends” (Rekdal, 2014), which sometimes slowly mutate due to a “Chinese whispers” effect (Harzing, 2002, p. 141).

So what are we to learn from this? First and foremost, there is no substitute for careful reading. Although automated procedures exist for checking reported statistics (Epskamp & Nuijten, 2014), these check whether the reported numbers are consistent with one another and cannot catch every error. (Indeed, this one would not have been flagged: the reported p-value is consistent with the reported t and df, and the error lies in the df itself.) And while it is tempting to trust the work of experts, the self-correcting nature of science can only be safeguarded if we scrutinize every paper we build upon, no matter how many times it has been cited.
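To see why an automated consistency check would pass over the Oakes scenario, here is a rough Python sketch of that kind of check. It is not statcheck's actual implementation: the function names (`t_pdf`, `two_sided_p`, `is_consistent`) and the simple rounding rule are our own assumptions, and the p-value is obtained by numerically integrating the Student t density using only the standard library.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 10_000) -> float:
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    return 1 - 2 * area           # P(|T| > |t|)


def is_consistent(t: float, df: int, reported_p: float, decimals: int = 2) -> bool:
    """Hypothetical rule: recomputed and reported p agree after rounding."""
    recomputed = two_sided_p(t, df)
    return f"{recomputed:.{decimals}f}" == f"{reported_p:.{decimals}f}"


# The scenario as printed (df = 18) and as it should read (df = 38).
print(is_consistent(2.7, 18, 0.01))  # True
print(is_consistent(2.7, 38, 0.01))  # True
```

Under this rule both versions of the scenario pass, because each recomputed p-value rounds to .01. A check of this kind compares the reported numbers only with each other; noticing that df = 18 does not fit two groups of 20 requires reading the scenario itself.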

[1] Throughout this letter, we count citations according to Google Scholar, as of 4 November 2019.

[2] Four small textual changes were made between the first and last versions of the scenario: ‘which’ in the first sentence was replaced by ‘that’, ‘further’ in the second sentence was replaced by ‘furthermore’, ‘p=0.01’ was replaced by ‘p=.01’, and the italics and quotation marks were typographically altered. Crucially, the incorrect degrees of freedom remained in every version.

 

References

Epskamp, S., & Nuijten, M. B. (2014). statcheck: Extract statistics from articles and recompute p values (R package version 1.0.0). Retrieved from https://cran.r-project.org/web/packages/statcheck.

Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33, 587-606. doi:10.1016/j.socec.2004.09.033

Gigerenzer, G. (2018). Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science, 1(2), 198-218. doi:10.1177/2515245918771329

Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers. Methods of Psychological Research, 7(1), 1-20. Retrieved from http://www.metheval.uni-jena.de/lehre/0405-ws/evaluationuebung/haller.pdf.

Harzing, A. W. (2002). Are our referencing errors undermining our scholarship and credibility? The case of expatriate failure rates. Journal of Organizational Behavior, 23(1), 127-148. doi:10.1002/job.125

Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E. J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157-1164. doi:10.3758/s13423-013-0572-3

Klitzing, N., Hoekstra, R., & Strijbos, J. W. (2018). Literature practices: Processes leading up to a citation. Journal of Documentation. doi:10.1108/JD-03-2018-0047

Oakes, M. (1986). Statistical inference: A commentary for the social and behavioral sciences. New York, NY: Wiley.

Pashler, H., & Wagenmakers, E.-J. (2012). Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science, 7(6), 528-530. doi:10.1177/1745691612465253

Rekdal, O. B. (2014). Academic urban legends. Social Studies of Science, 44(4), 638-654. doi:10.1177/0306312714535679

Rosenthal, R. (1979). The “file drawer problem” and tolerance for null results. Psychological Bulletin, 86(3), 638-641. doi:10.1037/0033-2909.86.3.638

Simkin, M. V., & Roychowdhury, V. P. (2003). Copied citations create renowned papers? arXiv preprint cond-mat/0305150.

Wetterer, J. K. (2006). Quotation error, citation copying, and ant extinctions in Madeira. Scientometrics, 67(3), 351-372. doi:10.1556/Scient.67.2006.3.2

Don van Ravenzwaaij is an Associate Professor at the Psychometrics and Statistics department. In May 2018, he was awarded an NWO Vidi grant (a 5-year fellowship) for improving the evaluation of statistical evidence in the field of biomedicine. The first pillar of this research is about the proper use of statistical inference in science, and the second pillar is about the advancement and application of response time models to speeded decision making.


Casper Albers is a professor in Applied Statistics and Data Visualisation at the Psychometrics & Statistics department. One of his main research lines concerns the development of psychological dynamic models and their application to behavioural data from different contexts, with emphasis on clinical and environmental psychology. He also writes columns (mostly in Dutch) for various outlets, including the national newspaper De Volkskrant and the local university newspaper UKrant.


Maarten Derksen is the acting chair of the Theory and History of Psychology department. He is interested in the rhetoric with which psychologists demarcate their discipline from common sense (boundary-work), in social technology (as sources and sites of control and resistance), and in the replication discussion in psychology (including crisis-talk). He teaches theory and history of psychology, as well as research ethics and scientific integrity. He also serves as chair of the Board of Examiners (examencommissie) for psychology at RUG.


Rink Hoekstra is an assistant professor at the Groningen Institute for Educational Research. His research focuses on the use and misuse of statistical techniques, inferential statistics, confidence intervals, and significance testing.

