Newsy Tuesday

In lieu of actual content (maybe later), I'd like to post something cool from Singularity Hub.

vOICe is an innovative augmented-reality system that lets people see with sound. Images from a camera are translated into tones, which are played through headphones. The system has surprisingly good fidelity, and implications for brain research.

What’s amazing about the system – and what makes it a hot item for neuroscience research – is that it appears to restore the actual subjective experience of vision (visual qualia) to blind users, rather than just teaching them to correlate objects and sounds. Users have reported the return of experiences like depth and the sense of empty space in their environment. The restored vision is not the same as normal visual experience – one user described it as being comparable to an old black and white film, while others report vague impressions of objects as shades of grey. Research is now underway to understand how the vOICe system might be rewiring the brain to achieve this effect.

Holy crap, we can see with sound. No implants required, no electrode arrays on the tongue, just off-the-shelf tech and clever algorithms.
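For concreteness, the core trick (pitch for vertical position, loudness for brightness, a left-to-right sweep through the image) can be sketched in a few lines. This is my own toy reconstruction, not the actual vOICe code, and all the parameter values here are invented:

```python
import numpy as np

def column_sweep(image, duration=1.0, sr=8000, f_lo=500.0, f_hi=5000.0):
    """Toy vOICe-style encoding: scan columns left to right;
    row index -> sine frequency, pixel brightness -> amplitude."""
    rows, cols = image.shape
    freqs = np.geomspace(f_hi, f_lo, rows)       # top of image = highest pitch
    samples_per_col = int(duration * sr / cols)  # equal sweep time per column
    t = np.arange(samples_per_col) / sr
    out = []
    for c in range(cols):
        col = image[:, c]  # brightness values in [0, 1]
        # Sum one sine per row, weighted by that row's brightness.
        tone = (col[:, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(axis=0)
        out.append(tone)
    return np.concatenate(out)

img = np.zeros((16, 8))
img[4, 3] = 1.0          # one bright pixel in column 3
audio = column_sweep(img)
```

A single bright pixel comes out as a short pure tone whose pitch tells you the row and whose timing within the sweep tells you the column.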


  1. Do you need to be blind to get the effect?

  2. My guess would be no, but as with learning a language without immersion, it will take longer and require much more focus to become fluent.

  3. I believe I saw a downloadable game out there somewhere that was actually designed to train you on this specific interface.

    I feel like we could build an even better algorithm, though. It would be good to reproduce the notion of a fovea in the sound-vision: represent the periphery at low resolution and devote a central area to particularly high resolution, which makes better use of the limited audio bandwidth. If the individual's eyes are still intact, we could even add sensors that track eye movement and focal depth, so that the auditory fovea behaves the way you would expect in normal vision. The question, though, is how to do the encoding. A linear sweep doesn't really work, since it devotes the same amount of information to every point.

    I suggest testing a polar sweep, with the density of resolution increased near the center to mimic the magnification factor measured in the retinotopic map. A log-polar approximation is what we're going for, with some fudging near the center to remove the singularity.

    If this works well, perhaps it would even facilitate reading?

    Other interesting algorithms for converting images might be possible. Perhaps there is some way to get past the inherently scan-line nature of this device.
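The log-polar sampling grid I have in mind would look something like this sketch (the parameter choices are arbitrary, and r_min does the fudging at the center):

```python
import numpy as np

def log_polar_samples(image, n_rings=16, n_angles=32, r_min=1.0):
    """Sample an image on a log-polar grid about its center.
    Ring radii grow geometrically, mimicking the cortical
    magnification factor; r_min removes the singularity at r=0."""
    h, w = image.shape
    cy, cx = h / 2.0, w / 2.0
    radii = np.geomspace(r_min, min(cy, cx) - 1.0, n_rings)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    out = np.empty((n_rings, n_angles))
    for i, r in enumerate(radii):
        # Nearest-pixel lookup along each ring, clipped to the image bounds.
        ys = np.clip(np.round(cy + r * np.sin(angles)), 0, h - 1).astype(int)
        xs = np.clip(np.round(cx + r * np.cos(angles)), 0, w - 1).astype(int)
        out[i] = image[ys, xs]
    return out, radii

img = np.zeros((64, 64))
img[32, 33] = 1.0  # a pixel just right of center
samples, radii = log_polar_samples(img)
```

Because the radii are a geometric sequence, the rings crowd together near the fovea and spread apart toward the periphery, which is exactly the resolution gradient we want.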

  4. So, to elaborate on "polar sweep", I mean either:

    a radial line that sweeps around like a radar screen, but with more of the frequency space given to the part closer to the center; or

    a circle that starts infinitesimally small and grows to the periphery, then restarts, with the radius of the circle growing geometrically.
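For the radar-screen variant, "more of the frequency space near the center" could mean a logarithmic map from radius to pitch. A sketch, with the band edges invented:

```python
import numpy as np

def radius_to_freq(r, r_min=1.0, r_max=32.0, f_lo=200.0, f_hi=8000.0):
    """Map radius along the sweeping line to pitch, logarithmic in both
    axes, so each octave of radius gets an equal slice of the frequency
    band and the center ends up with the finest pitch resolution."""
    u = np.log(r / r_min) / np.log(r_max / r_min)  # 0 at center, 1 at edge
    return f_hi * (f_lo / f_hi) ** u               # high pitch at the center
```

With this mapping, the radius at the geometric mean of r_min and r_max lands at the geometric mean of f_lo and f_hi (about 1265 Hz here), so the inner half of the field, in log terms, gets the entire upper half of the band.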

  5. Hmm, I guess this also has some bearing on my color-qualia speculation from a couple of days ago.

    I essentially asserted that some component of the "Red" quality was hard wired.

    If we could re-wire color vision through some other sense, or through some part of the cortex other than V1, we could test whether "redness" is a property of the information itself or hard-coded somehow.

    Of course, you'd need a subject already familiar with standard "redness", who could also report the qualitative (as opposed to functional) equivalence of the normal "red" with the "red" from another input stream.

    Not exactly tractable.

  6. Heh, well, I sent them an e-mail about using log-polar coordinates. Let's see if they respond and whether they've tried the idea.

  7. So, the inventor was kind enough to answer some of my questions by e-mail. I didn't get permission to paste his response verbatim, but the gist is:

    They already have support for a distortion-free foveal magnification map,

    and they have been working on an eye-tracking system to automatically reposition the gaze.

    My proposal of "OMG USE LOG-POLAR RADIAL SWEEP THEN IT'S LIKE THE VISUAL SYSTEM" was met with the following very logical engineering counterpoints:
    -- They already implement foveal magnification in the orthogonal coordinate system.
    -- A log-polar representation loses translational symmetry: a circle will not always sound like a circle if you move it around.
    -- Eye tracking and the scan rate may not be fast enough to mimic gaze redirection in the human visual system, which may obviate the usefulness of log-polar coordinates. Log-polar coordinates work in vision because rapid saccades keep objects of interest centered on the fovea, and this system simply would not support that. So their built-in method of stabilizing objects using gaze tracking, which would normally correct for the drawbacks of a log-polar representation, won't work in sound-vision.
    -- Eye tracking and the scan-rate may not be fast enough to mimic gaze redirection in the human visual system, which may obviate the usefulness of log-polar coordinates. Basically, log polar coordinates work in vision because you can do rapid saccades to track objects, which this system simply would not support. Thus, our built-in method of stabilizing objects in vision using gaze tracking, which would normally correct for the drawbacks of a log-polar representation, won't work with sound-vision.