Envisioning time: Why seeing at max speed isn’t what we want
Our visual system is mesmerising. It is our predominant means of connecting with the outside world, allowing us to navigate safely through traffic, spot the mosquito bugging us at night, or read blogs like this on our phone, tablet or computer. To make things visible, the very first process going on inside our bodies is the transformation of light into neuronal signals on the retinas of our eyes. For this, our retina is equipped with so-called rods and cones, photoreceptors that turn light photons into electrical signals. Rods are good at sensing light-dark differences, while cones come in three different types, each responding best to different wavelengths, or colours. Fun side note: Some women have superhuman vision because they carry different versions of a cone-pigment gene on their two X chromosomes, resulting in the expression of a fourth type of cone and thus enabling them to perceive additional colours! Overall, each of us has about 90 million rods (mostly in the periphery of our eyes) and 5 million cones (mostly in the fovea, or centre, of our eyes). Put simply, our eyes have an initial resolution of about 95 million “pixels”, far higher than the full HD resolution of 1920×1080 (~2 million) pixels on our TVs.
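To put those numbers side by side, here is a small back-of-the-envelope calculation. It is only a toy sketch using the approximate figures quoted above, nothing more:

```python
# Rough comparison of photoreceptor counts with a Full HD screen,
# using the approximate figures mentioned in the text.
rods = 90_000_000            # rods, mostly in the periphery of the retina
cones = 5_000_000            # cones, mostly in the fovea (centre) of the retina
retina_pixels = rods + cones

full_hd_pixels = 1920 * 1080  # pixels on a Full HD television

print(f"Retina: ~{retina_pixels / 1e6:.0f} million 'pixels'")
print(f"Full HD TV: ~{full_hd_pixels / 1e6:.1f} million pixels")
print(f"Roughly {retina_pixels / full_hd_pixels:.0f} times more 'pixels' than the TV")
```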
Despite this high-end resolution at the very beginning of our visual system, it is important to realise that we are not (and couldn’t possibly be) thinking about every single one of these pixels at the same time. Rather, our brains need to make sense of what we see, aggregating information from single pixels into meaningful combinations; for example, we aggregate the pixels forming the letters of words in this blog while separating out those that don’t belong, like the background pixels between letters. This process of aggregating something small (like pixels) into a larger, meaningful construct has been referred to as chunking in the scientific literature.
Chunking as a memory aid
We have all used chunking many times in our lives due to its mnemonic (meaning memory-aiding) benefits. Just picture going to the supermarket with eggs, flour, and sugar on your grocery list: You could either memorise all three of these items, thus occupying 3 slots in your brain’s memory system, or simply “chunk” them together into “pancakes” to trim the list down to a single item and memory slot. This kind of simplification through chunking is what memory experts use when they weave to-be-remembered words into a story, and what expert chess players do when they recall a game position as a whole rather than individual pieces at a time. These examples show that chunking can be either a deliberate, conscious process (like the pancake example) or can draw on previous knowledge and expertise (like the expert chess player example). Research has linked these so-called higher-level forms of chunking to activity in the lateral frontal cortex of our brains. Yet, lower-level forms of chunking also exist.
Going back to the pixels of our retina: naturally, light from a single object will fall onto, and activate, adjacent photoreceptors in our eyes. After the photons have been transformed into electrical signals, these signals are funnelled through so-called ganglion cells to neurons in the optic nerve (reducing the number of fibres from about 95 million to 1 million!). From the optic nerve, the signals speed through the optic chiasm (a crossroads at the bottom of our brains) to an area at the back of our brain called the occipital lobe, which holds our visual processing areas. In these visual processing areas, we see something very similar to the pattern on our retinas: neurons close to each other receive input from photoreceptors close to each other on the retina (so ultimately from points in space close to each other). This adjacency in space is referred to as a retinotopic map because the “map” on the retina is matched in our brains. So if we see, for example, a stone on the road, the stone’s shape is projected both onto our retinas and onto our visual processing areas in the brain. Ultimately, this adjacency helps our brain to chunk and integrate single pixels into coherent objects and make sense of the world in an efficient way.
It is important to note at this point that chunking together such adjacent items only refers to chunking in space, since the aggregated pixels stem from a single image, or a single moment in time. Much research has been conducted on this form of chunking, which proves very helpful for our brain’s memory system as we don’t need to hold information on every possible pixel but only on a few meaningful objects. Even though we know a lot about chunking in space, we sadly know much less about what happens across time, which is how we see the world on a day-to-day basis. We just published results in the Journal of Cognitive Neuroscience from an experiment that investigated the notion of chunking in time, which might shed some more light on how chunking works from moment to moment.
Vision in time
To understand chunking in time, it is important to understand the temporal boundaries of our visual perception. In 1933, two researchers called Selig Hecht and Cornelis Verrijp discovered what they referred to as the critical fusion frequency. The critical fusion frequency is the speed at which a light can be switched on and off without us noticing any flicker; instead, we perceive a continuously shining light (this rapid switching on and off is actually how fluorescent and LED lights work!). The necessary speed for this fusion lies at about 60 Hertz, that is, one full on-off cycle roughly every 16 milliseconds (a switch about every 8 milliseconds). Following from this, the neurons of our visual system need to fire only about every 16 milliseconds, since by then our rods and cones will have received a new light signal just about every time they are ready to fire. Summing this up, our maximum visual speed is about 60 images per second.
Despite this maximum speed, however, most things in our lives will not require us to react every 16 milliseconds. As lead researcher Dr Elkan Akyürek from the University of Groningen has put it in our article: “Even when things move rapidly in the visual field, such as when a ball flies toward a catcher, the perceptual analysis of the situation need not necessarily revolve around the briefest time intervals that one could possibly distinguish”. In fact, researchers have estimated that the actual perceptual speed is closer to one update every 100-250 milliseconds, meaning that we integrate information across these larger time windows instead of sampling at the maximum possible rate. So why is it that we lose out on that much information? Why is it that we don’t perceive at max speed? Isn’t optimum performance what we would want, and what should be favoured evolutionarily? In our experiment, we wanted to address these questions, since we suspected that reducing the perceptual speed by chunking together information across time would lead to memory benefits, just as chunking together information across space does!
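To make these numbers a bit more concrete, here is a small, purely illustrative sketch (not part of the study) converting between frequencies and time windows:

```python
def period_ms(frequency_hz: float) -> float:
    """Duration of one full on-off cycle, in milliseconds."""
    return 1000.0 / frequency_hz

def snapshots_per_second(window_ms: float) -> float:
    """How many integration windows of a given length fit into one second."""
    return 1000.0 / window_ms

# The critical fusion frequency of ~60 Hz corresponds to ~16.7 ms per cycle
print(f"60 Hz -> one cycle every {period_ms(60):.1f} ms")

# The estimated perceptual integration windows of ~100-250 ms
for window in (100, 250):
    print(f"{window} ms windows -> ~{snapshots_per_second(window):.0f} 'snapshots' per second")
```

So instead of the roughly 60 images per second that our photoreceptors could in principle deliver, perception seems to settle for something more like 4-10 integrated snapshots per second.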
Chunking across time
So how did we try to answer these questions? Like so many psychologists and neuroscientists before us, we locked up 39 undergraduate psychology students in the University of Groningen’s basement and had them perform a task on the computer. During the computer task, the students were presented with two streams of rapidly appearing letters (one every 80 milliseconds), one of which needed attending to (cued by < or > at the beginning). Importantly, one or two target stimuli would appear within the cued stream of letters. These targets consisted of one or more corners of a window. Here is a gif I created as a rough example of how the experiment looked:
If two target stimuli were present, they would always appear right after each other (as in the gif example, if you spotted them!). At the end of the stream, we asked participants to indicate which corners they saw for the first target and, if they saw a second target, which corners they saw for that one as well. Crucially, the corners of the window stimuli could be integrated into a combined target. For example, if target 1 was “┌” and target 2 was “┐”, students could either have perceived the targets as such OR they could have perceived the combined “┌ ┐” percept as target 1 and nothing as target 2. This setup allowed us to distinguish three types of trials (illustrated in the short sketch after the list):
- Trials with 1 target that was correctly identified
- Trials with 2 targets that were correctly identified and reported separately
- Trials with 2 targets that were correctly identified but integrated into one percept
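For readers who like things explicit, here is a toy sketch of how a single trial could be sorted into these three categories. The function and stimulus names are made up for this post and this is only an illustration of the logic, not our actual analysis code:

```python
def classify_trial(presented, reported):
    """Sort one trial into the three categories described above.

    presented / reported: pairs of corner sets for target 1 and target 2,
    e.g. ({"top-left"}, {"top-right"}); None or an empty set means 'nothing'.
    """
    t1, t2 = presented
    r1, r2 = reported

    if not t2:                                # only one target was shown
        return "1 target, correct" if r1 == t1 else "incorrect"
    if r1 == t1 and r2 == t2:                 # both targets reported separately
        return "2 targets, separate"
    if r1 == (t1 | t2) and not r2:            # both corners merged into one percept
        return "2 targets, integrated"
    return "incorrect"

# Example: target 1 = top-left corner, target 2 = top-right corner
presented = ({"top-left"}, {"top-right"})
print(classify_trial(presented, ({"top-left"}, {"top-right"})))       # -> 2 targets, separate
print(classify_trial(presented, ({"top-left", "top-right"}, set())))  # -> 2 targets, integrated
```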
One important addition to this computer task was that we measured students’ brain waves using electroencephalography (EEG) to record what are called event-related potentials. Event-related potentials are EEG waves time-locked to (meaning starting at) a specific event, which in our case was the appearance of the second target (or of the corresponding letter in trials with only one target). This time-locking of the EEG waves allowed us to study target-dependent processes in the brain, visualised as deflections in the EEG signal. In particular, three event-related potentials (also called components) were of interest to us, lovingly referred to as N2pc, CDA, and P300. More important than their names is the fact that a greater amplitude (meaning a bigger wave) of a component means that we are using more of our attention in the case of the N2pc, and more of our brain’s memory system in the case of the CDA and P300.
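To give a feel for what “time-locking” means in practice, here is a minimal sketch of the idea: cut the EEG into segments around each event and average them. The sampling details, event indices and function name are invented for illustration; this is not our actual EEG pipeline:

```python
import numpy as np

def event_related_potential(eeg, event_samples, pre=50, post=300):
    """Average EEG segments running from `pre` samples before to `post`
    samples after each event; the average is the event-related potential."""
    epochs = [eeg[s - pre : s + post] for s in event_samples
              if s - pre >= 0 and s + post <= len(eeg)]
    return np.mean(epochs, axis=0)

# Fake single-channel EEG and some event onsets, purely for demonstration
rng = np.random.default_rng(seed=0)
eeg = rng.normal(size=10_000)           # 10,000 samples of noise standing in for EEG
events = [1_200, 3_400, 5_600, 7_800]   # sample indices where the second target appeared
erp = event_related_potential(eeg, events)
print(erp.shape)                         # (350,) -> 50 samples before + 300 after the event
```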
Looking at these brain waves for the three possible types of trials, we got the following results: For the attention-related N2pc component, we saw that students used the same amount of attention to detect two targets whether they reported them in integrated or separated form, but more attention than when only one target was present. This is what we might expect; after all, two is more than one. Yet, things were different for the memory-related CDA and P300. Here, we saw a greater deflection, that is, more memory capacity being used, for two targets perceived separately than for two targets integrated into one. On top of this, two targets integrated into one seemed to occupy about as much space in our brain’s memory system as a single, correctly identified target did. So, in line with the title of our scientific article, “what we see [truly] is what we remember”, independent of what is physically out there!
To us, these results point back to the notion of chunking: Since integrating images (or objects) across time takes up less space in our brain’s memory system, there might be an energetic benefit to this form of chunking as well. Memorising items costs energy because neurons need to be kept active, so by chunking objects together across time we can actually save energy. So while this means we don’t function at max speed, we might still function at a speed fast enough to cope with our external world, yet slow enough for our brain not to use up all of our body’s resources (the brain already consumes about 20% of our calories while accounting for only 2% of our body weight!).
References
- Akyürek, E. G., Kappelmann, N., Volkert, M., & van Rijn, H. What You See Is What You Remember: Visual Chunking by Temporal Integration Enhances Working Memory. Journal of Cognitive Neuroscience, 0(0), 1-12. doi:10.1162/jocn_a_01175
- Bor, D., Duncan, J., Wiseman, R. J., & Owen, A. M. (2003). Encoding strategies dissociate prefrontal activity from working memory demand. Neuron, 37(2), 361-367.
- Eriksen, C. W., & Collins, J. F. (1967). Some temporal characteristics of visual pattern perception. Journal of Experimental Psychology, 74, 476–484.
- Hecht, S., & Verrijp, C. D. (1933). Intermittent stimulation by light. The Journal of general physiology, 17(2), 251-268.