People who are into sound AND code (Kevin, eMax, Paul, I’m looking at you), I’m making this mandatory reading. Or at least pretty please reading. With a cherry on top.

Once, while I was in the process of falling asleep in a hotel room somewhere in Appalachia, my coworker/temporary roommate Dan and I had a weird conversation about how programming would work if we were blind. We came to the rough conclusion that the hardest part would really be digging through the massive amount of code to find the section you need to work on. We chatted briefly about alternative solutions. Only one exact quote survives from that night, and I retained it for the sole purpose of mockery/humiliation:

Dan: Python is like IDM if C is classical. I don’t think I can admit that I just said that.

But anyway, the idea surfaces once in a while (usually during other falling-asleep-times) and I think it’s interesting/ephemeral enough that it’s high time to share it before I forget about it. While the specific idea focuses on the case study of helping a visually-impaired person program, it is also applicable to the problem of skimming large amounts of auditory information in general. Interested? Even if I’ve actually done no research on any of this? Read on…

I enjoy listening to people talk (more than most nerds, anyway), but I damn near NEVER listen to talk radio or podcasts or, really, anything I can get a visual equivalent of. Why? I like being able to skim (not enough time to read everything fully), to jump back and forth between things I’m absorbing (curse you, tabbed browsing), and to find an interesting section 3 days from now without too much trouble (it’s true: all of my techtail party talk comes from Reddit). I guess in a way, the latter two are actually part of the same problem: if something is clearly modularized/easily navigable, it’s easier to find your place again after a pause.

But there’s really no reason this stuff is limited to the visual realm.

I’m pretty sure we’re better at quickly deducing information from A LOT of noisy complexity with our vision, but this might just be because we’re better trained at it. And even if not, let’s not sell our auditory abilities short. We’re pretty good at picking out relevant noises (someone calling your name, a fire alarm) from the mess of noise that we deal with every day. We can pick out layers (bass, melody, words, sound effects) from a song while appreciating it as one cohesive sound. Most of us can recognize isolated beats and riffs from songs we’ve heard, and we have an incredible aptitude for (sometimes subconsciously) recognizing songs played softly in the background of noisy bars, even ones we haven’t heard for years.

Back to the case study at hand. We look through programs in two ways: searching (when you know exactly what you’re looking for and what keyword will get you there) and browsing (to get an overview of the program, or to find something whose name/location you’ve forgotten). The first is easy even without sight: ctrl-f and examine the context until you find what you’re looking for. Browsing is what we have to work on. Programs have structure to them, no matter what language they’re written in. Functions and variables are defined and called, iterators are used, etc. Programmers are already good at recognizing these patterns visually: with a properly indented program, I bet you could print out an abstract representation of the code (just the shape of the indentation, no actual text) and programmers would do a decent job of figuring out which shape is which. But (as far as I know) there’s no way to do this with sound.

So what if the code were abbreviated into a soundscape? If basslines represented different functions and trills marked certain operations, couldn’t we skim over the code the way we skim over a mixtape to locate the parts that we like? We could just hit fast forward and scrub through the code-as-song, familiar to us since we built it up from the ground up, Fruity Loops style. You could have different default “themes” for your code-bit-sounds so that your code could sound like Bach or Boards of Canada or Dem Franchise Boyz (I feel like this option will not be popular). Code that was functionally elegant could become aesthetically pleasing, too.
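For the sound-AND-code people, here’s a minimal sketch of what I mean, in plain stdlib Python, in case that makes it more concrete. Everything specific in it is invented for illustration: the node-type-to-pitch table, the tone lengths, the output filename. A real version would want proper synthesis and much smarter instrumentation, but the skeleton (parse the code, walk its structure, play a note per construct) is the whole trick:

```python
import ast
import math
import struct
import wave

SAMPLE_RATE = 22050

# Invented mapping: each construct gets its own pitch in Hz.
PITCH = {
    ast.FunctionDef: 110.0,  # function definitions: the "bassline"
    ast.For: 220.0,          # loops
    ast.While: 220.0,
    ast.If: 330.0,           # branches
    ast.Call: 440.0,         # calls: the "trills"
    ast.Return: 554.0,
}

def tone(freq, seconds=0.15, volume=0.4):
    """Render one sine tone as 16-bit mono PCM bytes."""
    n = int(SAMPLE_RATE * seconds)
    return b"".join(
        struct.pack("<h", int(volume * 32767 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n)
    )

class Sonifier(ast.NodeVisitor):
    """Walk the tree depth-first (roughly source order), queuing a tone
    for every construct we have a sound for."""
    def __init__(self):
        self.notes = []

    def generic_visit(self, node):
        freq = PITCH.get(type(node))
        if freq:
            self.notes.append(tone(freq))
        super().generic_visit(node)

def sonify(source, path="code_song.wav"):
    visitor = Sonifier()
    visitor.visit(ast.parse(source))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b"".join(visitor.notes))

sonify("""
def greet(names):
    for n in names:
        if n:
            print(n)
    return len(names)
""")
```

Fast-forwarding then falls out for free: play the WAV back faster, the hope being that the bassline/trill texture stays recognizable even when the individual notes blur together.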

Basically, this would exploit the unique ability of sound (which I may or may not have just made up) to contain many layers of information while staying easily “summarizable.” There are other applications of this that I can think of off the top of my head. For example, what if we encoded hierarchy into audiobooks by messing with the playback speed, for easy chapter location? Say an audiobook is normally played at a speed suitable for spoken word: what if there were another layer of sound that, played at triple the speed, contained information about where in the book you were? E.g. at the slow speed, you would get “In the beginning was the Word…” and at the faster speed, “The Book of John”. At the slower speed, this heading information would be so slow that it would just be an incomprehensible rumble of bass. At the higher speed, the normal words would just be chipmunk chatter dancing around the navigation information.
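Same disclaimer as before: here’s a toy sketch of the layering trick, with sine tones standing in for the two voices (real speech plus a pitch-preserving time-stretch would need actual DSP, not sample-repeating). The triple-speed factor, the frequencies, and the filename are all made up. The point is just the mechanism: slow the navigation layer down, mix it under the main track, and one file carries both speeds:

```python
import math
import struct
import wave

SAMPLE_RATE = 22050

def tone(freq, seconds):
    """A sine 'voice' as a list of float samples in [-0.5, 0.5]."""
    n = int(SAMPLE_RATE * seconds)
    return [0.5 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]

def stretch(samples, factor=3):
    """Naive time-stretch: repeat each sample `factor` times. At normal
    playback this is a slow, low-pitched rumble; played back at `factor`x
    speed it recovers the original."""
    return [s for s in samples for _ in range(factor)]

def mix(a, b):
    """Overlay two tracks, padding the shorter one with silence."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [(x + y) / 2 for x, y in zip(a, b)]

# Stand-ins: the narration ("In the beginning was the Word...") and the
# heading track ("The Book of John"), stretched to one-third speed.
narration = tone(440, 3.0)
headings = stretch(tone(660, 1.0))

with wave.open("layered_audiobook.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(SAMPLE_RATE)
    w.writeframes(b"".join(
        struct.pack("<h", int(s * 32767)) for s in mix(narration, headings)
    ))
```

Play the result at 3x in any player with a speed control and the layers swap roles: the rumble becomes the intelligible one and the narration becomes the chipmunk chatter.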

I think this(these?) idea(s) (is|are) pretty damn cool. Maybe someone has done something like it already? Maybe you’re interested in working on it with me at some point? Maybe I have seriously flawed ideas about cognitive science? Any/all feedback/advice/suggestions appreciated!