An interesting talk, but the main idea is not new. Gibson rejected, more than 20 years ago, the idea that vision is simplest when the eyes are fixed, as if they were a camera taking a static snapshot to be transmitted to the brain. Visual information does not arrive at the eyes in the form of discrete packets of impoverished stimuli that must be further processed. Instead, motion reveals persistent and transient structures in the ambient light - information that can be directly picked up by the senses acting as a perceptual system. “The activity being what occurs in the brain when the inputs get there. That was not what I meant by a perceptual system. I meant the activities of looking, listening, touching, tasting, or sniffing.” (Gibson, 1979, p. 244)
This formulation is extended by the enactive approach to visual perception (Noë, 2004), which argues that perception is not something that happens in us, but something we do. The end product of perception, if there is one, is our perceptual experience, determined by what we do - not a computational output in the form of a linguistic token. This does not, of course, mean the brain is irrelevant, but it firmly rejects the idea that there is something in the brain that constitutes a percept, an image, or a 2D/3D model of the world.