A note on the binding problem
Edit: never mind, this post is redundant with this much more comprehensive review.
There was, a few years ago, some debate on "the binding problem". This problem stems from the fact that distinct areas of the brain are specialized for extracting certain visual features. For instance, the brain regions that represent the location and motion of objects are far away from the brain regions that identify objects. Nevertheless, a running cat is not perceived as, disjointly, a cat and a moving thing. Somehow, even though the parts of the brain responsible for semantically identifying objects know nothing of location, and the parts of the brain responsible for localizing objects know nothing of semantic identity, we experience an integrated reality where specific things have specific locations.
To simplify, say you are presented with a spoon on the right and a fork on the left, and asked to retrieve the fork. So, somewhere in the brain is the notion "there are two things here, one on the left and one on the right" and somewhere else in the brain is the notion "there is a spoon and a fork here, but I'm not sure where". How the brain combines these two representations has been the subject of much speculation.
Some have proposed that populations of neurons responding to the same object become synchronized, such that neurons firing for "thing on left" and neurons firing for "fork somewhere" tend to fire at the same time, and this somehow unifies the two areas. I am skeptical of this "binding by synchrony" hypothesis.
I am skeptical because, when I am not paying attention, I am very likely to pick up the wrong utensil, and I suspect that attention is critical for binding. This argument hinges on some assumptions about how the visual system works and what attention is.
The visual system is hierarchical. At first, the brain extracts small pieces of lines and fragments of color. These features are well localized, and "low level". Then, the brain begins to extract more complex features. These may be corners, curves, textures, pieces of form. This information is not as well localized. The combining of features into more complex features is repeated a few times, until you get to "high level" representations complex enough to identify whole objects, like "forks" and "spoons". As features get more complex, they lose spatial precision, until the neurons that can identify objects really have no idea where that object is.
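To make the loss of spatial precision concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the feature names ("tines", "bowl", "handle"), the two-location scene, and the use of max pooling as a stand-in for combining features across space. It is not a model of real cortex, only a way to see why a pooled "what" representation cannot say "where".

```python
# Toy scene: each location holds low-level feature activity.
# Feature names and values are invented for illustration.
scene = {
    "left":  {"tines": 1.0, "bowl": 0.0, "handle": 1.0},  # a fork
    "right": {"tines": 0.0, "bowl": 1.0, "handle": 1.0},  # a spoon
}

# Hypothetical templates: the low-level features each object is built from.
templates = {
    "fork":  {"tines", "handle"},
    "spoon": {"bowl", "handle"},
}

def pool_over_locations(scene):
    """Keep the strongest response per feature, discarding where it occurred."""
    pooled = {}
    for features in scene.values():
        for name, value in features.items():
            pooled[name] = max(pooled.get(name, 0.0), value)
    return pooled

def identify(pooled):
    """Score each object as the mean activity of its template features."""
    return {
        obj: sum(pooled[f] for f in feats) / len(feats)
        for obj, feats in templates.items()
    }

# Swapping the two objects changes the scene but not the high-level report.
swapped = {"left": scene["right"], "right": scene["left"]}
what = identify(pool_over_locations(scene))
what_swapped = identify(pool_over_locations(swapped))
```

In this sketch the high-level report is identical for the original and the swapped scene: it knows a fork and a spoon are present, but not which side either is on.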
In the visual system, there is feedback from higher level to lower level representations. Activity in high level representations can bias activity in lower level representations. You may be most familiar with this phenomenon when you are day-dreaming. We are able to control, to some extent, the activity in most visual areas, and we think that this control constitutes imagination. We have more control over "high level" visual areas. This control weakens toward lower level visual areas. For instance, primary visual cortex appears to be inactive in dreaming and visualization.
When we are awake, this top-down control is used for attention. Attending to an object will make said object "pop out" (become more salient). This enhanced salience may propagate from higher to lower level visual areas. For instance, if I focus on "fork!", the neurons that know there is a fork somewhere will enhance all fork-like mid-level features, which will enhance fork-like low-level features, and so on.
The key point here is that, by focusing on the identity of an object, I can increase the salience of low- and mid-level visual features representing that object. Although the semantic part of my brain may have no idea where the "fork" is, it can make the low-level fork features pop out. And these features are well localized. Thus, the part of my brain that knows "there is something on the left, and there is something on the right" will find that the item on the left suddenly seems more salient. This seems sufficient to let the brain know where it needs to reach to pick up the fork.
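The argument in this paragraph can be sketched as a toy computation. The scene, the feature names, the object templates, and the multiplicative gain of 1.5 are all assumptions I made up for illustration. The "what" side only knows a template; the "where" side only sees total salience per location.

```python
# Toy scene: low-level features are tied to locations (values are invented).
scene = {
    "left":  {"tines": 1.0, "bowl": 0.0, "handle": 1.0},  # a fork
    "right": {"tines": 0.0, "bowl": 1.0, "handle": 1.0},  # a spoon
}

# Hypothetical templates held by the object-identifying areas.
templates = {
    "fork":  {"tines", "handle"},
    "spoon": {"bowl", "handle"},
}

def attend_to_object(scene, template, gain=1.5):
    """Top-down feedback: boost template features wherever they occur."""
    return {
        loc: {f: v * (gain if f in template else 1.0) for f, v in feats.items()}
        for loc, feats in scene.items()
    }

def location_salience(scene):
    """The 'where' side sees only total activity per location, not identity."""
    return {loc: sum(feats.values()) for loc, feats in scene.items()}

def where_is(scene, template, gain=1.5):
    """Return the location that becomes most salient under feature attention."""
    salience = location_salience(attend_to_object(scene, template, gain))
    return max(salience, key=salience.get)

unattended = location_salience(scene)
fork_side = where_is(scene, templates["fork"])
spoon_side = where_is(scene, templates["spoon"])
```

Without attention the two locations are equally salient in this toy scene. Attending to "fork" tips the balance toward the left, and attending to "spoon" tips it toward the right, even though no unit in the sketch ever represents "fork on the left" explicitly.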
This effect works both ways. If I ask "what is the object on the left", the neurons that know where the thing on the left is will make the features of the left object more salient, which will enhance the representation of the "fork" features in the part of the brain that can identify what objects are. Note that this effect doesn't need to be large, or make the "fork" dominate over all other objects in the scene. You simply need a brief increase in the salience of "fork" over background objects to know that the thing on the left is a "fork".
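The reverse direction can be sketched the same way. Again, the scene, feature names, templates, gain value, and the choice of summing features across locations are illustrative assumptions, not claims about real neurons. Here spatial attention boosts everything at one location, and the "what" side reads out pooled evidence with no location information.

```python
# Toy scene: low-level features are tied to locations (values are invented).
scene = {
    "left":  {"tines": 1.0, "bowl": 0.0, "handle": 1.0},  # a fork
    "right": {"tines": 0.0, "bowl": 1.0, "handle": 1.0},  # a spoon
}

# Hypothetical templates held by the object-identifying areas.
templates = {
    "fork":  {"tines", "handle"},
    "spoon": {"bowl", "handle"},
}

def attend_to_location(scene, target, gain=1.5):
    """Spatial attention: boost every feature at the attended location."""
    return {
        loc: {f: v * (gain if loc == target else 1.0) for f, v in feats.items()}
        for loc, feats in scene.items()
    }

def object_evidence(scene):
    """The 'what' side sums each feature over all locations, then scores templates."""
    pooled = {}
    for feats in scene.values():
        for name, value in feats.items():
            pooled[name] = pooled.get(name, 0.0) + value
    return {obj: sum(pooled[f] for f in feats) for obj, feats in templates.items()}

def what_is_at(scene, target, gain=1.5):
    """Return the object whose evidence is highest under spatial attention."""
    evidence = object_evidence(attend_to_location(scene, target, gain))
    return max(evidence, key=evidence.get)

baseline = object_evidence(scene)
left_object = what_is_at(scene, "left")
right_object = what_is_at(scene, "right")
```

Note that the winning object only edges out the other one (4.0 against 3.5 with these numbers). That small, temporary advantage is all the sketch needs to answer "what is on the left".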
All of this happens rapidly and automatically. Binding is achieved by attending to high-level properties of objects, thereby gating which objects get processed in other, distant, high-level areas. Attention ensures that at a given time only one unified object is most salient.