Science needs better rules, judgments and a lot more
Science is in crisis, and there appears to be a broad consensus (on Twitter at least) that the way research is being conducted and reported has to improve. But how? To reflect on this, Maarten Derksen [see also his post on codes] recently organised a symposium focusing on the question of whether we need better rules, or whether researchers need to be taught to make better judgments. In short: “Do we need better rules or better judgments?” In the symposium, Arie Dijkstra and Richard Morey gave their perspectives, and a lively debate ensued.
My purpose in this blog post is modest. I simply want to point out some alternative questions one could (and I believe should) be asking. In this way, I hope to add some nuance and depth to the debate and point out the directions in which a consensus about these points might be reached.
The question of the symposium is a good one because we can debate it forever. It is a bad one because we cannot agree until we clarify it first. I believe the question “Do we need better rules or better judgments?” needs at least four clarifications. We should (a) question the word “or”, which implies that rules and judgments are antagonistic, (b) ask who is “we”, (c) clarify what we need these rules/judgments for, and (d) examine if rules/judgments are all that “we” might need.
I address clarification (a) first because I think readers can easily agree with me about this. Whoever “we” is, whatever purposes “we” pursue and no matter how good the rules, we will always need to make judgments to decide which rules to apply, and how. Moreover, in many situations rules do not exist: particularly in science, new and unexpected problems crop up all the time that may evade or question existing rules. Thus, for many reasons, judgments will always be required.
But at the same time, I cannot imagine any situation in which humans act as if there are no rules. After all, even in situations in which “everything seems to be possible” there are still some behavioural “rules” (e.g., at some point you must eat and drink) that people will end up following. Some rules are embodied, or are imposed by the situation. Living one’s life without any rules may be possible in theory, but in any realistic situation a human being encounters, it is practically unthinkable.
And so we must logically conclude that rules and judgments are not antagonistic. We need some of both. In fact, a more useful question might be something like this:
“If we want to do good science, what kind of rules can help us make better judgments?”
Now let’s look at clarification (d): Are rules and judgments all we need, a bit like the Beatles song “All You Need Is Love”? Again we will easily reach consensus about this: of course rules and judgments (and love) are not all we need. Researchers need information and resources such as energy, time and technology. Rules and judgments within the sciences more generally are informed by principles and ideals. So no: rules and judgments are not all we need.
Having established this, we can take a look at clarifications (b) and (c): who is this “we” that needs rules and/or judgments, and what do we need them for? The word “we” is a slippery term. It can refer to the collective level and to the individual level. Each level requires different rules and judgments, in order to achieve different outcomes.
At the collective level, ministries, research foundations and executive boards of universities have interpreted this to mean that “we” need rules to keep science healthy (a top-down interpretation). The chosen approach was to formulate rules to stop particular individuals from misbehaving. In the Netherlands, the Association of Universities VSNU has formulated a code of conduct (see www.vsnu.nl).
On closer inspection, this code consists not of behavioural rules but of abstract principles that academics should adhere to. The VSNU has basically summarized contemporary beliefs about what it means to “be good.” How these abstract ideals should be executed in practical settings is left almost entirely to the interpretation of individual academics, their departments or “the field” (even though the employer is the only one with any power to keep employees in check). Apparently, there is one unstated overarching rule: make good judgments!
Of course, others have interpreted the question very differently. Science in Transition (www.scienceintransition.nl) points out that problems in science are not just due to flawed top-down rules that give “bad apples” the liberty to misbehave: it adopts a bottom-up approach and points to systemic flaws that encourage misbehaviour. For example, there is criticism of the overly competitive system with excessive rewards for a very small club of “excellent” researchers. According to this perspective, “we” are the ministries, research foundations and executive boards! These institutions are the ones who need to change their ideas of quality and merit: the basic rules by which the powers that be determine what is good and bad research.
And finally, at the level of individual researchers, “we” need rules and guidelines to help us make better judgments in everyday research matters. It should be clear that collective-level solutions (e.g., the code of conduct) as well as systemic change (e.g., who deserves funding) are almost entirely useless as concrete guides for such day-to-day judgments. It follows that abstract discussions about general principles can be fruitful for identifying certain principles and valid perspectives, but are useless as foundations for practice.
Looking forward: How to deal with the crisis?
In social psychology, there was already a crisis of confidence in the value of research towards the end of the 1960s. The conclusion many drew then was that quantitative empirical research itself had so many shortcomings that it would be good to try other things (I recommend reading Gergen, 1973; Moscovici, 1972; Ring, 1967; Tajfel, 1972). In some circles, collecting quantitative data became taboo. The response to the current crisis is in some ways the exact opposite. Most people seem to be of the opinion that “better data” and “harder facts” are the answer to all our woes (Ioannidis, 2005; Open Science Collaboration, 2012; Nosek & Lakens, 2014).
As a result, this discipline is responding to legitimate doubts about the validity of a body of research by developing all kinds of rules for collecting or analysing data (e.g., pre-registration, sample sizes, replication, see Ioannidis et al., 2014, for an overview). For many research projects, this might do some good. But will it advance knowledge and understanding? I am afraid not. It promotes the idea that we need data, and that one kind of data is superior to all others.
An undesirable side-effect might be that these new rules enshrine how and when scientists can make claims about “the truth”. This of course assumes that the truth exists, that we can know it, and that powerful scientists can show it. It reinforces the idea that, in certain cases, there are “facts” that outstanding scientists can demonstrate.
The problem is that all this energy is devoted to devising better hypothesis tests: quantitative tests of a priori hypotheses. The neglect of all other kinds of research, which do not test hypotheses, implies they are somehow second rate: “merely explorative” or (even worse perhaps) mere theoretical conjecture. This fetishization of hypothesis testing risks devaluing explorative and qualitative research, as if it were just tinkering around before doing the real stuff of science. But this would be a very grave error: for the researchers themselves, it is this tinkering around which produces the insights and learning. The actual test of the hypothesis is, at best, just another step in a learning process. There is not one superior form of data: we need all we can get.
There is one further problem: having more rules and sharper judges risks creating a climate that discourages people from exploring without sufficient safety precautions. Will you embark on a journey on rocky seas if you must, at all times, be able to convince an inspector that your ship is seaworthy? When the standards of seaworthiness are raised and when anxieties about the consequences of sailing mount, most vessels will never leave the coastal waters.
This is not an imaginary scenario: if we must, at all times, ensure that the power of our tests is .80 or more (a very stringent demand), it would simply become impossible to continue investigating many of the topics that we currently believe are important to know about. Our colleague André Aleman would have to stop researching schizophrenia because there are not enough schizophrenics for his research, and I would have to stop doing research on violence in crowds because there are not enough riots in the Netherlands.
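To give a rough sense of what a power of .80 demands, the sketch below uses the standard normal approximation for a two-sided comparison of two group means. The function name, the alpha of .05 and the example effect sizes are assumptions made for illustration; they are not taken from the research mentioned above.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per group (normal approximation).

    effect_size_d is the standardized mean difference (Cohen's d).
    An exact t-test calculation gives slightly larger numbers.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes: the smaller the effect, the larger the sample.
    for d in (0.8, 0.5, 0.2):
        print(f"d = {d}: about {required_n_per_group(d)} participants per group")
```

Under these assumptions, a small effect (d = 0.2) already calls for several hundred participants per group, which is out of reach for research on rare populations or rare events.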
It has been argued that to apply these sharp and strict criteria to individual studies is just as nonsensical as demanding that p < .05 (Hunter & Schmidt, 2004, pp. 11-12). We have to determine how we can accumulate knowledge the most effectively as a discipline. For that, it is not necessary for individual studies to have a certain power that we deem to be “enough”. We also need to think about the accumulation of knowledge across many different studies: a large enough number of them, but also a broad enough spectrum. We also need to think logically about our theories and critically reflect on the assumptions of our science.
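To illustrate how knowledge can accumulate across studies that are individually underpowered, here is a minimal sketch of inverse-variance (fixed-effect) pooling. This is a generic textbook technique, not the specific procedure of Hunter and Schmidt, and the effect sizes and standard errors below are made-up numbers.

```python
from math import sqrt


def pool_fixed_effect(estimates, std_errors):
    """Combine study estimates by weighting each with 1 / SE^2.

    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Four hypothetical small studies, each too imprecise on its own.
    estimates = [0.3, 0.5, 0.4, 0.2]
    std_errors = [0.25, 0.25, 0.25, 0.25]
    pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
    print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

With four equally precise studies, the pooled standard error is half that of any single study, which is the sense in which a set of modest studies can jointly support a conclusion that none of them supports alone.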
Developing rules that increase our trust in data ignores that data are just one of many aspects that need to be taken into account when evaluating evidence. Far too often “the data” (no matter how narrow the paradigm) have silenced legitimate concerns and doubts about the wild and broad claims that were being made. Often, I find, the problem with research is not the data per se but the operationalization or conceptualization: a lack of ecological or other forms of validity.
It is for this reason that I dread any “solutions” that involve more or better data, that specify what data is beyond reproach or criticism. I believe that any new rules should not just empower positivistic researchers who are keen to produce irrefutable data, but also critical researchers who are willing to debate research no matter how “strong” the data appear to be.
As much as we enjoy having abstract debates about this, we are unlikely to ever reach consensus. Is this problematic? I don’t think so. I know from personal experience that, at the concrete level of doing research and learning, “we” (individual social psychologists, clinical psychologists, methodologists, etc.) will often reach consensus quite easily about how best to collect data, interpret them and report the results.
This consensus can be reached within research groups but also in inter-group collaboration. So even though we will inevitably disagree about the answers to the question whether we need better rules or better judgments, as soon as we conduct our joint research we find that we can learn new things together. It is at this concrete level of collaborating that the present crisis can be resolved.
To summarize the main points of the above: the question whether we need better rules or better judgments is a nonsensical one. The academic community as a whole needs certain general rules, but we need to equip individual researchers with the means to make good judgments. In addition, the community and individual researchers within it will need a lot more than just rules and judgments.
I believe the efforts to create this optimal infrastructure are best spent not on abstract debate. The basic criticisms of our current research practices were formulated decades ago or longer. If we focus our energy on having abstract debates, we will repeat things we knew already. But we can make a real difference if we build research communities that adopt and develop better working practices.
Many things are required for this, including better theory development, better means of exploring data and (perhaps, at the end of a journey) better confirmations of predictions. To achieve this, tighter collaboration across disciplinary boundaries would be a major step forward. The current crisis offers a wonderful opportunity for methodologists, philosophers of science and researchers to collaborate on research. It is by learning new things about interesting subjects that we can develop our field at multiple levels.
On Friday March 20, 15.00-17.00, Prof. dr. Tom Postmes, Prof. dr. Henk Kiers, and Prof. dr. Marieke Timmerman will elaborate on this topic in the colloquium “Promoting healthy research practices”.
Gergen, K. J. (1973). Social psychology as history. Journal of Personality and Social Psychology, 26(2), 309-320.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. London: Sage.
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Ioannidis, J. P., Munafò, M. R., Fusar-Poli, P., Nosek, B. A., & David, S. P. (2014). Publication and other reporting biases in cognitive sciences: Detection, prevalence, and prevention. Trends in Cognitive Sciences, 18(5), 235-241.
Moscovici, S. (1972). Society and theory in social psychology. In J. Israel, & H. Tajfel (Eds.), The context of social psychology (pp. 17-68). London: Academic Press.
Nosek, B. A., & Lakens, D. (2014). Registered reports. Social Psychology, 45(3), 137-141.
Open Science Collaboration. (2012). An open, large-scale, collaborative effort to estimate the reproducibility of psychological science. Perspectives on Psychological Science, 7(6), 657-660.
Ring, K. (1967). Experimental social psychology: Some sober questions about some frivolous values. Journal of Experimental Social Psychology, 3(2), 113-123.
Tajfel, H. (1972). Experiments in a vacuum. In J. Israel, & H. Tajfel (Eds.), The context of social psychology (pp. 69-119). London: Academic Press.
This is a very thoughtful post (although I might not agree with everything you said). One thing that struck me was the part about fetishization of hypothesis testing. To me, this describes the ‘old’ and of course still very much current way of doing research (i.e., before the crisis), where you are basically forced to describe your research as if you tested an a priori hypothesis. If you were to write your exploratory study up the way you actually did it, you won’t get published. This way, we don’t know which hypothesis was in fact a priori, i.e., which research was in fact hypothesis-testing and which was exploratory. But we need to know that to interpret the results appropriately. All we need is a clear label, and that’s what preregistration wants to accomplish. It may turn out that only a minority of studies are in fact hypothesis tests. Perhaps many important studies are exploratory, but we will not find out until we can separate both kinds. I do not think preregistration leads to fetishization of hypothesis testing. On the contrary, it might help reestablish the value of exploratory studies.
Thanks Christoph. You argue pre-registration is a good thing and is not the same as fetishising hypothesis testing. I agree that pre-registration would or could be a good thing in a different reality than the one we currently inhabit.
But my concern is this: If, in our current reality, we would register our research as explorative (or register the many contradictory hypotheses we hold at the start of a project) then we would have a very serious problem when we write up the results. In your own words: “you won’t get published”. The reason for this is that, as you also note, editors and journals all too often demand (or even force) you to specify a priori hypotheses.
So: in the current world we inhabit, if we demand pre-registration we will be in effect confronting researchers with the need to specify a priori what hypothesis they are testing. This is problematic because not every research project is a randomized controlled drug trial, and not all research topics lend themselves to that method of accumulating knowledge.
Interestingly, towards the end of your response you write “Perhaps many important studies are exploratory, but we will not find out until we can separate both kinds.” I think it is a bit more complex than this: the only study I can think of that is not exploratory at some level is a direct replication. A distinction between kinds is problematic.
Hope this makes some sense.
Oh, what a nice surprise you have chosen Raphael’s School of Athens for this inspiring post.
Did you know that this painting was not only made to represent ancient Greek philosophy, the roots of Enlightenment (as the Renaissance period in which it was made connects to these ancient roots), but in particular the principle of “Causarum Cognitio”, that is, seek knowledge of causes?
Unfortunately I cannot join you on March 20th due to other duties abroad, but I hope that you will be able to identify as many causes of the present problems and their possible solutions as possible!
Thanks for your kind words Stephan. I too was pleasantly surprised with this picture even though I did not know the link to seeking the knowledge of causes. I’m not sure who selected it but they’ve clearly done well. I’m happy to share the slides of the 20th with anyone who is interested and there might be a recording too. Best, Tom