The blessing and the curse of classifying neuroimaging data

Machine learning in cognitive neuroscience

In modern cognitive neuroscience, it has become common practice to apply machine learning techniques to data obtained through neuroimaging. Despite this widespread use, however, there is something amazingly enigmatic about it. On the one hand, there is this organ that for millennia has eluded scholars: billions of neurons connected in myriads of ways too complex to comprehend, firing in intricate patterns to communicate and give rise to thought. This firing is then in some capacity picked up by machines that are as advanced as they are noisy. Methods such as the electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI) try their best at capturing neural firing. But the former is too far away to reliably say where activity is coming from, and the latter is too sluggish to say much about when (or in what order) neural processes were happening.

And yet, it seems that researchers nowadays just throw these patterns of activity into some mystical machine learning algorithm, and infer what a participant has been thinking about.

But what are we really learning from classifying neural data?

Classical neuroimaging research

Classically, cognitive psychology and cognitive neuroscience have followed more or less a similar logic in how they conduct experiments and treat their data. The cognitive psychologist will hypothesize that, say, the mind has more difficulty processing images of green apples than of red apples. They will devise an experiment to test this, and use reaction times and accuracy to infer whether the hypothesis is correct.

Next, the cognitive neuroscientist steps in, and repeats the same experiment in participants who are wearing an EEG cap. By averaging across different trials with red or green apples, the scientist may observe that certain ERP components are consistently less strong with green than with red apples. As such, this difference in neural activity is very likely to be involved in the difference in performance – a result worthy of a scientific publication!

It is worth stressing that this conventional method hinges on two criteria. That is, the differences in neural activity between green and red apples should be somewhat consistent across trials, but also across participants: if in different participants, the neural activity between green and red apples manifests in completely different ways, then there would probably not be an observable, statistically reliable difference in the grand average. The conventional method, and in fact most statistical tests of neuroimaging data, assumes that given different conditions, we will observe systematic differences in the data that are somewhat comparable across different participants.

Machine learning: the great switch-up

One of the greatest tricks employed by most machine learning methods is that they flip the logic of the conventional method on its head. Instead of using the conditions (red or green apples) to say something about the data (a high or a low amplitude measured at the scalp), classifiers do the opposite: they take the neural data, and try to determine which condition was presented. If they can do so successfully, the classifier is able to tell from a pattern of neural data whether a participant saw a red or a green apple. In more popular jargon: the algorithm has read your mind.
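As a minimal sketch of what such a decoding analysis looks like in practice, consider the following simulated example (no real EEG here – the trial counts, channel counts, and the size of the "neural" effect are all purely illustrative):

```python
# A decoding sketch on simulated "neural" data: instead of comparing average
# activity between conditions, we predict the condition (red vs. green apple)
# from the activity pattern. Everything here is simulated and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels = 200, 32

# Condition labels: 0 = red apple, 1 = green apple
y = rng.integers(0, 2, size=n_trials)

# Simulated scalp activity: noise, plus a small condition-dependent shift
# on the first five channels (the "neural difference").
X = rng.normal(size=(n_trials, n_channels))
X[:, :5] += 0.8 * y[:, None]

# The great switch-up: use the data to predict the condition.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
accuracy = scores.mean()
print(f"decoding accuracy: {accuracy:.2f}")  # well above the 0.5 chance level
```

If the cross-validated accuracy is reliably above the 50% chance level, the classifier can tell the conditions apart from the neural data – the "mind reading" that the popular jargon refers to.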

A core reason for any new technique to become popular in a branch of science is that it yields results. Indeed, classifiers have allowed researchers to point out differences between conditions that had previously proven difficult to find. As it turns out, merely looking at averages and average differences is sometimes too short-sighted. Classifiers allowed researchers to identify whether the brain treats red and green apples differently “in any measurable way”.

Trick or treat?

However, there is something misleading about how many classification results are interpreted. It is quite an intricate issue, but I am sometimes afraid this misconception might have been key to the success and popularity of neural classifiers as an analysis tool. That issue is that classifiers are developed, trained and evaluated separately for data from different participants.

To see why this is relevant, recall the phrase from above about the classical averaging method: if in different participants, the neural activity between green and red apples manifests in completely different ways, then there would probably not be an observable, statistically reliable difference. In the case of separate classifiers per participant, this would be different: now, regardless of the (vast) individual differences in neural data, the classifiers could all consistently say that green and red apples are different. As a result, the average classification performance might be very high across participants – even if they share hardly any commonalities in neural data.
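To make this concrete, here is a simulated sketch (all numbers illustrative) in which the "neural" effect has the opposite sign in every other participant. Each individual classifier decodes well, yet the grand-average difference between conditions is close to zero:

```python
# Per-participant classifiers on simulated data where the condition effect
# flips sign across participants: decoding succeeds for every individual,
# while the grand average across participants shows (almost) no difference.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_participants, n_trials, n_channels = 10, 200, 32

accuracies, effect_sizes = [], []
for p in range(n_participants):
    y = rng.integers(0, 2, size=n_trials)
    X = rng.normal(size=(n_trials, n_channels))
    sign = 1 if p % 2 == 0 else -1   # effect direction differs per person
    X[:, 0] += sign * 1.5 * y        # same channel, opposite directions
    clf = LogisticRegression(max_iter=1000)
    accuracies.append(cross_val_score(clf, X, y, cv=5).mean())
    # Conventional measure: condition difference on the affected channel
    effect_sizes.append(X[y == 1, 0].mean() - X[y == 0, 0].mean())

mean_accuracy = float(np.mean(accuracies))
grand_average_effect = float(np.mean(effect_sizes))
print(f"mean decoding accuracy:   {mean_accuracy:.2f}")          # high
print(f"grand-average difference: {grand_average_effect:+.2f}")  # near zero
```

The per-participant classifiers report near-identical, well-above-chance accuracy, while averaging the condition differences across participants cancels the effect out – exactly the dissociation described above.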

In fact, this feature of classifiers is often hailed as one of the strengths of the technique. Classifiers are able to mark that there is a difference, without pinpointing what this difference is. Interesting as that may be, it is in a way at odds with the goals of the cognitive neuroscientist – to look for reliable neural correlates of behavior. In fact, a popular topic of discussion at the moment is how to interpret the results of classifiers that are able to accurately decode neural data. What is it they are actually basing this decoding on? And – interestingly – are there any commonalities across classifiers fit to different participants? However, these developments are still in their infancy, and are often specific to the exact type of classifier used. Moreover, they are only now slowly gaining popularity, over ten years after classifiers burst on the scene of cognitive neuroscience.

What to learn from all of this?

Does this make classification analyses worthless for our field? Of course not. There are many inferences about neural processing that have been made possible by means of classification analyses: many of these inferences are valuable and would have been impossible with conventional methods. Furthermore, the growing insight that predicting conditions from data can be just as valuable as predicting data from conditions has led to a new appreciation of what can be done with the data we collect. However, when a new technique garners widespread popularity in a short time span, it is worth critically evaluating where this popularity comes from. This is especially the case when a new technique is promoted as being ‘more sensitive’ – which may sometimes only mean that it produces more (false?) positives.

At the very least, it is worth asking ourselves what it is a classifier really does. If we don’t, we simply end up using tools we don’t understand, to make inferences about an organ we don’t understand.

Assistant Professor at Experimental Psychology

Wouter is an Assistant Professor in the department of Experimental Psychology. His research covers topics including visual attention, time perception, temporal preparation, and episodic memory. He studies these using computational modeling, psychophysics, and EEG.

  • Stephan Schleim December 14, 2022  

    Thanks for your interesting post. But whether one trains an individual classifier per individual subject is a methodological choice, isn’t it? And it also depends on the aim.

    If you want to allow, say, a paralyzed patient to select letters on a computer screen (Brain-Computer Interface), then you really want to optimize performance for that individual.

    But if you want to understand “how cognition works” more broadly, then it may be better to have a general classifier. Individuals’ neuroimaging data used to be transformed into a standardized 3D space with the same coordinates when I was active in that field. Feeding all such data into the classifier would make the individual “invisible”. And isn’t that what you want to get in the end? A classifier to detect, for example, perception of green apples in all subjects?

    Such classifiers can do amazing things. But does that already provide an explanation of the information processing that would satisfy the cognitive (neuro-)scientist? What do you think? As far as I know, it’s not trivial to link the classification system back to the brain.

  • Wouter Kruijne December 14, 2022  

    You’re absolutely correct that it depends on the goal of the analysis. In particular for BCI applications, relying on machine learning (ML) has led to immense boosts in performance, precisely because this approach is tailored to the individual. In these settings, group-level data is simply less valuable; it might serve as a starting point, but exploiting individual differences is clearly better than ignoring them.

    My point is that if the goal is inference, however, the merits of these approaches will in a way shoot you in the foot: by exploiting any individual differences, a researcher is by definition making it harder to generalize the findings.

    Issues like these are not harmful in themselves, but there is nowadays a certain level of ‘hype’ associated with applying ML that bothers me and muddles the scientific discourse. For example, studies that boast about ‘high classifier scores’ as an achievement, or about being the ‘first team to be able to decode information about dimension X’. In numerous articles, the emphasis of the narrative is on the observation _that_ scores were high, while the details of the ML approach used to get there are tucked away in supplementary methods. For inferences about the neural signals (and by extension, about the brain) it would be much more informative to highlight what exactly it is about the new approach that led to higher classifier performance. For example: if a linear classifier is unable to reliably decode red/green apples, but a nonlinear classifier can, then the interesting finding is that apparently these neural signals manifest in a nonlinear fashion; not that the nonlinear classifier is an objectively better method that leads to a very high score.

    This is what I tried to hint at in the concluding sentence: the fact that an ML algorithm is able to ‘read your mind’ is a beautiful feat, but is in itself not that exciting. Once we get a handle on /how/, though, we can make inferences about the signal (neural activity) itself.
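The linear-versus-nonlinear point from the exchange above can be sketched with a toy simulation (an XOR-like signal; every number here is illustrative). The condition is carried by a nonlinear combination of two features, so a linear classifier stays near chance while a nonlinear one decodes well – and the interesting inference is about how the signal is structured, not about the score itself:

```python
# Toy illustration: the condition is carried by an XOR-like (nonlinear)
# combination of two simulated "neural" features. A linear classifier stays
# near chance; a nonlinear (RBF kernel) classifier decodes well.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials = 400

# Two features, each clustered around -1.5 or +1.5; the condition is the
# XOR of the two cluster signs - invisible to any single linear boundary.
s1 = rng.choice([-1.5, 1.5], size=n_trials)
s2 = rng.choice([-1.5, 1.5], size=n_trials)
X = np.column_stack([s1, s2]) + rng.normal(scale=0.5, size=(n_trials, 2))
y = (s1 * s2 > 0).astype(int)

linear_acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
nonlinear_acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
print(f"linear: {linear_acc:.2f}  nonlinear (RBF): {nonlinear_acc:.2f}")
```

Here the gap between the two scores is itself the finding: it tells us something about how the (simulated) signal is organized, which is the kind of inference the discussion above asks for.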

  • Stephan Schleim December 16, 2022  

    Thanks, Wouter. I’ve recently seen a paper where they tried to apply “lesions” to artificial neural networks to better understand HOW they work, thus using methods from brain research/neuropsychology. (Which is somehow ironic in that NNs were also developed to better understand how the brain works.)

    In terms of the power of ML, I can recommend this new demonstration by Neuralink:

    Although I’m not in favor of the business approach of some of these people, the demonstration looks realistic to me.
