Making Things See Available for Early Release

I’m proud to announce that my book, Making Things See: 3D Vision with Kinect, Processing, and Arduino, is now available from O’Reilly. You can buy the book through O’Reilly’s Early Release program here. The Early Release program lets us get the book out to you while O’Reilly’s still editing and designing it and I’m still finishing up the last chapters. If you buy it now, you’ll get the preface and the first two chapters immediately. You’ll then be notified as additional chapters are finished, and you’ll be able to download them for free until you have the final book. This way you get immediate access to the book and I get your early feedback to help me find mistakes and improve it before final publication.

So, what’s in these first two chapters? Chapter One provides an in-depth explanation of how the Kinect works and where it came from. It covers how the Kinect records the distance of the objects and people in front of it using an infrared projector and camera. It also explains the history of the open source efforts that made it possible to work with the Kinect in creative coding environments like Processing. After this technical introduction, the chapter includes interviews with seven artists and technologists who do inspiring work with the Kinect: Kyle McDonald, Robert Hodgin, Elliot Woods, blablablab, Nicolas Burrus, Oliver Kreylos, Alejandro Crawford, and Phil Torrone and Limor Fried of Adafruit. The idea for this section of the book was suggested to me by Zach Lieberman and it’s ended up being one of my favorites. Each one of the people I interviewed had a different set of interests and abilities that led them to the Kinect and they’ve each used it in a radically different way. From Adafruit’s work initiating the project to create open drivers to Oliver Kreylos’s integration of the Kinect into his cutting edge virtual reality research to Alejandro Crawford’s use of the Kinect to create live visuals for the band MGMT, they each explore a different aspect of the creative possibilities unlocked by this new technology. Their diversity shows just how broad an impact affordable depth cameras will potentially have going forward.

Chapter Two begins the real work of learning to make interactive programs with the Kinect. It walks you through installing the SimpleOpenNI library for Processing and then shows you how to use that to access the depth image from the Kinect. We explore all kinds of aspects of the depth image and then use it to create a series of projects ranging from a virtual tape measure to a Minority Report-style app that lets you move photos around by waving your hands. Since the book as a whole is designed to be accessible to beginner programmers (and to help them “level up” to more advanced graphical skills), the examples in this chapter are all covered clearly and thoroughly to make sure that you understand fundamentals like how to loop through the pixels in an image.
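To give a taste of that pixel loop, here is a minimal sketch in plain Python rather than Processing. It is not code from the book. It assumes the depth image arrives as a flat, row-major list of distances in millimeters, with 0 standing for "no reading"; the function name `closest_pixel` and the tiny sample image are invented for illustration.

```python
def closest_pixel(depth_values, width):
    """Return (x, y, depth) for the nearest valid reading, or None if there is none.

    Assumption: depth_values is a flat, row-major list and 0 means "no reading".
    """
    closest = None
    for i, depth in enumerate(depth_values):
        if depth == 0:
            continue  # skip pixels where the sensor returned nothing
        if closest is None or depth < closest[2]:
            x = i % width   # column within the row
            y = i // width  # row index
            closest = (x, y, depth)
    return closest


# A made-up 3x2 "depth image" (values in millimeters).
sample = [
    0,    2100, 1900,
    1500, 800,  0,
]
print(closest_pixel(sample, width=3))  # (1, 1, 800)
```

The same loop, pointed at two chosen pixels instead of the closest one, is roughly the starting point for something like a virtual tape measure.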

I’m looking forward to more chapters coming out in the coming weeks, including the next two on working with point clouds and using the skeleton data. I’m currently working closely with Brian Jepson, my editor at O’Reilly, as well as Dan Shiffman (an ITP professor and the author of the first Kinect library for Processing) and Max Rheiner (an artist and lecturer at Zurich University and the author of SimpleOpenNI) to prepare them for publication. I can’t thank Brian, Dan, and Max enough for their help on this project.

I’m also excited to see what O’Reilly’s design team comes up with for a cover. The one pictured above is temporary. As soon as these new chapters (or the new cover) are available, I’ll announce it here.

Enjoy the book! And please let me know your thoughts and comments so I can improve it during this Early Release period.

Posted in kinect

Techniques of the Observer

Last night at ITP’s Theory Club (a group that meets bi-weekly to discourse on abstract topics of interest), I gave a presentation on Jonathan Crary’s Techniques of the Observer. I called the talk Techniques of the Observer: Vision and Technology from the Camera Obscura to OpenGL. It was based on one portion of the proposal for a Platform Studies book on OpenGL I wrote over the summer. In Techniques of the Observer, Crary proposes a technique for characterizing a historical period’s ideas about vision by looking at its optical technologies and the metaphors they embody. The Camera Obscura tells you a lot about the Renaissance’s objective and universal geometric world view. Stereographs, phenakistoscopes, and film, all from the Modern era, couldn’t be more different from the Camera Obscura: they build the image inside the user’s mind using tricks of perception, hacks of the user’s sensorium. These resonate with a Modern world view of a series of independent subjectivities bound together into a consensual democracy.

In that earlier blog post and in this talk I set out to extend this way of thinking to cover contemporary computer-generated imagery. For the last 20 or so years, our most contemporary images have been the product of computer simulations designed to emulate an objective Renaissance perspective, but convert it into something fungible enough to become interactive and, when we want it, fantastical. And now, right now, we’re beginning to connect a new set of powerful artificial eyes to this simulation. We’re introducing something like the Reality Effect but to an inhuman mind’s eye. I think this combination explains some of the new Sensor Vernacular aesthetic that many of us have been struggling to put our fingers on. It is comprised of the first works of a new regime of vision struggling to be born.

I’ve uploaded my slides to Speaker Deck, a great new service that actually makes the process of uploading and viewing slide decks online simple and pleasurable. Here they are:

Posted in Opinion

Announcing Drift

I’m proud to announce the launch of Drift, a text editor for iPad I’ve made with Devin Chalmers. Drift is a simple text editor that stores your documents as GitHub Gists so that they’re always backed up and easily shared. Drift also makes it simple to collaborate with other GitHub users on anything from a TODO list to an essay. You can search for gists created by other GitHub users, create your own copy with your own changes, and then share a link to your version. Plus, since all gists are git repos, you can always browse your history of changes to see and even restore old versions. Or you can use the app completely anonymously without ever creating a GitHub account and Drift will remember all of your gist URLs for you. We hope that it’s great for active GitHub users and simple for everyone.

Drift is available in the App Store now for $1.99. We’d love to hear your feedback on it.

I built the original prototype for a desktop version of Drift using MacRuby back in the summer of 2009. Then, last summer, Devin and I started talking about reviving the project. In a startlingly short time, Devin had built an iPad version and we’d won an honorable mention at iPadDevCamp 2010. Around the start of this year we began working on cleaning Drift up for submission to the App Store. We commissioned a logo from the excellent Rune Madsen and we did a few rounds of UI polishing. Over the summer we deemed Drift ready to go and submitted it to the App Store.

It was rejected. Repeatedly. Through five rounds of resubmission over more than two months. Over the course of these months we exchanged extensive phone calls and repeated emails with Apple’s App Store appeals board to try to discover why they were rejecting Drift. We eventually understood their full reasoning and were able to alter our app to gain approval. What we learned in the process will be relevant to many other app developers, especially those interested in building on top of existing web service APIs. Devin has written up a great comprehensive post explaining the situation: Stuck in the Middle with Users: Apple, Apps, Appeals, & Appeasement: the Story of Drift. I highly recommend you read the whole thing, but here’s the gist:

App Store guideline 11.13 requires that any purchases initiated from within an app go through Apple’s in-app purchasing mechanism. Apps are forbidden from linking out to an external website for the user to make a purchase or buy a subscription. Drift was rejected because Apple interpreted our links to Github.com as an upsell to the paying GitHub service. This is true even though a free GitHub account is perfectly adequate for all of the API features used in Drift.

And here’s the disturbing takeaway for other app developers: any link out to an external site that has a pay service can potentially be rejected under the rubric of 11.13. This could prevent the creation of a lot of apps. Mashing up APIs and building clever clients is at the heart of contemporary programming culture. As Devin puts it:

One of the great App Store gold rushes was Twitter clients. What makes our undertaking, a Gist client, different from theirs? Well, largely it’s that GitHub has a business model, while Twitter doesn’t. Think of that: if Twitter charged a buck a month for their service, instead of aggregating your sentiments to sell to OmniCom and co. to turn into failed viral marketing campaigns, Loren Brichter might still be quietly churning out the most polished iPhone apps in the world; people might meet Craig Hockenberry and exclaim “oh, the Icon Factory! It’s so cool to meet a graphic designer at WWDC.”

Posted in Permanent Maintenance

Today

Today.

Today I saw for the first time a video (embedding disabled) that demonstrates a new technique that uses Functional Magnetic Resonance Imaging to detect the brain activity of a person and from that reproduce the visual image that person is seeing in real time. If you watch the video, the images on the left represent what the person was seeing, the images on the right represent what the system was able to reconstruct from the live brain scan. This could realistically be expected to work with dreams as well. Technology that (it must accurately be said) reads minds. (You’ll note that our brains really like faces.)

Today CERN announced that it may have detected a particle moving faster than the speed of light. The OPERA particle detector (Oscillation Project with Emulsion-tRacking Apparatus) at the Gran Sasso laboratory in Italy clocked neutrinos sent from CERN arriving sooner than light would have. If the discovery is verified it would be direct evidence that contradicts Einstein’s Special Theory of Relativity and would throw much of the understanding of the universe painstakingly built by 20th century physics into doubt.

Today I spent most of my day printing out plastic objects using a small 3D printer that sits on my desk at school. I created many of these objects using a cheap toy 3D scanner.

Frequently, in my field, I have experiences that feel “futuristic”. New gadgets, gizmos and experiences that come my way that will one day be ubiquitous. But today was different. Everything that happened felt part of some new world. Not alien bits protruding in, but a whole new fabric. Still struggling mightily to make sense, even to itself, but having little relationship to the 20th century other than as history.

Posted in Opinion

On Being Seen by the Machine

The light edged into view through my tightly shut eyelids. Waves of stippled white pulsed down towards a corona of laser pink before spilling into a liquid bokeh of soft hexagons. I tried to relax my eyelids against the involuntary movements of my pupils, briefly seeing strands of eyelash catch the light when my eyes threatened to open. I could feel the light on my skin as heat even though I knew the LEDs were dead cool. It swept across the center of my vision to settle somewhere low and right, crystalline geometries floating where my sight settled back to black.

The intensity wavered momentarily. I felt the invisible presence of the tech moving soundlessly somewhere in front of me. I listened to the chorus of humming coming from the printer room, a cacophony of mistuned exhaust fans and above it someone whistling semi-tunelessly.

How long had the scan been underway? The tech took seemingly endless pauses, saying nothing, the light of the scanner off or at least off my face. Just as I’d gird myself to open my eyes, the light would flare up and I’d squeeze them shut feeling the dots on my face shift subtly, their edges peeling away from the peaks of folding flesh. I straightened my back and willed my facial muscles to stay still. The flickering of the light made my eyes jitter involuntarily beneath their lids in a parody of REM sleep.

The tech had explained that the scanner could find the calibration dots, knew where they should be. So even if I shifted position it would be able to reassemble my face. If I moved too much, he said, the scanner would fail to put my face back together, leaving it a blend between the two expressions. I fought the urge to grimace, arch my eyebrow, or feel the itch tickling the side of my mouth.

I listened to the muffled sound of the shop manager berating a junior employee about links on a website. The light shut off again with finality. I relaxed and felt the inside of my face unstick from my skull as the muscles slackened.

I opened my eyes.

The tech was standing with his back to me, hunched over a laptop suspended on a chest-high shelf, his broad back a vivid swath of red across my vision as I blinked my eyes clear. His short sleeves revealed faded tattoos on his right wrist as it hovered over the computer.

“It will take a few minutes to process,” he said. “I wasn’t set up for you and this is the first time we’ve tried the full resolution.”

I stood up and peered past his shoulder at the screen. Thin strokes outlined a black cube over a pale blue background. The cube enclosed what looked like a picture of my face in three-quarter profile. My skin was rubbery and dead, shiny with flat reflections from a light that seemed to press in from all sides. It was mottled with what looked like lesions where the system had failed to fully fold the tracking dots back into the surrounding texture. A death mask.

The tech tilted my head down, bringing my forehead into closer view. At the hairline my forehead dissolved into a skerry of varicolored slashes, the lines of data from the laser forming a riotous coastline against the calm blue below. The tech toggled the texture on and off. Without it, my head was reduced to a dull gray, revealing the raw geometry. He rotated and probed as he ran various operations designed to close the thousands of small holes in my neck, whose lower reaches sprouted enough extending tendrils to make the whole assemblage resemble the lost android head of Philip K. Dick. My cheeks revealed a distorting ripple that the tech smoothed with a virtual brush.

Finishing cleanup operations, the tech demonstrated how his new software could shell my head, expanding it from a volumeless surface into a thin solid. With some of the most fragile fringes pared away, this could even be printed in powder or resin by one of the humming behemoths in the next room. Samples of prints littered the shelves around the office: intricate assemblages in honey-colored translucent resin, a planetary gear and a working wrench in a strangely heavy dark gray plastic, and a few small heads and insects in the gleaming white of the powder printer.

Imagining my shelled face printed in this last material completed its imaginary transformation into a death mask. I could see the object like something out of Greek ritual, the third theatrical mask after comedy and tragedy: mundanity.

After a few more minutes at the controls, which for the first time in the session he punctuated with polite chatter about the sudden storm outside, the tech finished his work. He handed me back my key drive with the results of the scan. The full geometry both cooked and raw. The colored texture file which un-peeled the sides of my face to lay flat, like the skin of a Yakuza gangster mounted to display its tattoos in the Tokyo University museum.

I took the drive, thanked him, and turned to leave. A few steps on, I stopped at the door and looked back, suddenly confused, possessed by a strong sensation that I was somehow shoplifting, that I’d forgotten to pay some unasked cost. The tech disappeared around a sterile mounting table and back into his office, out of view. I turned and continued out and down towards the street. I pushed through a throng of students and out towards the rotating doors. I felt light, sure I’d lost something, but clueless as to what.

Note: this brief story attempts to describe an experience I had earlier this week having my face scanned with the new ZScanner 700 CX 24-bit color laser scanner at NYU’s Advanced Media Studio. Below I present a number of pictures captured in the course of that experience. While I think these images have a powerful visual impact, part of what I wanted to capture in the story was the profound non-visual nature of the experience. It was an experience not of seeing, but of being seen. Hence my decision to present these images at the bottom of this post rather than interspersed throughout. There’s an increasing body of writing that looks at these New Aesthetic or Robot-Readable World images in order to analyze their visual qualities in an attempt to understand how these new seeing technologies picture us. Here instead I wanted to capture an intimate experience of being the conscious subject of this new form of vision — in this case even when our eyes are literally closed in the process. I found the experience simultaneously meditative, reflective, and mildly alienating. I hope this piece of writing communicates some sense of that. Enjoy the pictures. I do.

Head scan with photo texture

Posted in Art

Presenting at MakerFaire NYC This Weekend

MakerFaire NYC 2011 is this weekend. It’s a breathtakingly big event that’s equal parts science fair, renn fayre, and trade show from the near future. In a recent interview with Dale Dougherty, Anil Dash spoke eloquently about the positive political meaning embodied in the “Maker Movement”, how it focuses us away from the conflict that’s so common in our culture towards the shared desire to figure out “what our country’s going to be when we grow up”.

From that sublime sentiment to the gloriously geeky opportunity to get my hands on the new Makerbot MK7 Stepstruder and the brand new 1.0 version of the Arduino IDE I’ve seen Massimo, David, and Tom working on frantically around ITP this week, I’m very proud to be participating in the Faire.

I’ll be giving two talks at MakerFaire this weekend. I’ll be presenting the Kinect Abnormal Movement Assessment System at the Health 2.0 tent. I’ll explain how the project came about, do my best to describe some of the science behind how it might be able to help, and announce some progress and plans for the near future (we have an intern!). This session will be Saturday morning at 11am.

I’ll also be teaching a tutorial session at the ITP Cafe later in the day, starting at 4:30pm. I’ll cover an introduction to using the Kinect for skeleton tracking in Processing and give a little background about how it works. It’s a mini preview of some of the topics in my book.

Whether you’re geeking out or glorying in democratic optimism, I hope to see you there!

Posted in Opinion

Back to Work No Matter What: 10 Things I’ve Learned While Writing a Technical Book for O’Reilly

I’m rapidly approaching the midway point in writing my book. Writing a book is hard. I love to write and am excited about the topic. Some days I wake excited and can barely wait to get to work. I reach my target word count without feeling the effort. But other days it’s a battle to even get started and every paragraph requires a conscious act of will to not stop and check twitter or go for a walk outside. And either way when the day is done the next one still starts from zero with 1500 words to write and none written.

Somewhere in the last month I hit a stride that has given me the beginnings of a sense of confidence that I will be able to finish on time and with a text that I am proud of. I’m currently preparing for the digital Early Release of the book, which should happen by the end of the month. It’s a big landmark that I find both exciting and terrifying. I thought I’d mark the occasion by writing down a little bit of what I’ve learned about the process of writing.

I make no claim that these ten tips will apply to anyone else, but identifying them and trying to stick by them has helped me. And obviously my tips here are somewhat tied in with writing the kind of technical book that I’m working on and would be much less relevant for a novel or other more creative project.

  1. Write every day. It gets easier and it makes the spreadsheet happy. (I’ve been using a spreadsheet to track my progress and project my completion date based on work done so far.)
  2. Every day starts as pulling teeth and then goes downhill after 500 words or so. Each 500 words is easier than the last.
  3. Outlining is easier than writing. If you’re stuck, outline what comes next.
  4. Writing code is easier than outlining. If you don’t know the structure, write the code.
  5. Making illustrations is easier than writing code. If you don’t know what code to write, make illustrations or screen caps from existing code.
  6. Don’t start from a dead stop. Read, edit, and refine the previous few paragraphs to get a running start.
  7. If you’re writing sucky sentences, keep going. You can fix them later. Also, they’ll get better as you warm up.
  8. When in doubt, make sentences shorter. They will be easier to write and read.
  9. Reading good writers makes me write better. This includes writers in radically different genres from my own (DFW) and similar ones (Shiffman).
  10. Give yourself regular positive feedback. I count words as I go to see how much I’ve accomplished.
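
The spreadsheet from tip 1 boils down to a little arithmetic: average pace so far, divided into the words remaining. Here is a rough sketch of that projection in Python. The function name `projected_finish`, the dates, and the word counts are all made up for illustration; they are not my real numbers.

```python
import math
from datetime import date, timedelta


def projected_finish(start, today, words_written, target_words):
    """Project a completion date from the average daily pace so far.

    Returns None when there is not yet enough data to estimate a pace.
    """
    days_elapsed = (today - start).days
    if days_elapsed <= 0 or words_written <= 0:
        return None
    pace = words_written / days_elapsed              # average words per day
    remaining = max(0, target_words - words_written)
    days_left = math.ceil(remaining / pace)          # round up to whole days
    return today + timedelta(days=days_left)


# Hypothetical figures: 30,000 words in the first 30 days of a 90,000-word book.
print(projected_finish(date(2011, 6, 1), date(2011, 7, 1), 30000, 90000))  # 2011-08-30
```

Because the pace is an average over every day since the start, a missed day pushes the projected date out a little rather than wrecking it, which fits the spirit of the list.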

A note of thanks: throughout this process I’ve found the Back to Work podcast with Merlin Mann and Dan Benjamin to be…I want to say “inspiring”, but that’s exactly the wrong word. What I’ve found useful about the show is how it knocks down the process of working towards your goals from the pedestal of inspiration to the ground level of actually working every day, going from having dreams of writing a book to being a guy who types in a text file five hours a day no matter what. I especially recommend Episode 21: Assistant to the Regional Monkey and the recent Episode 23: Failure is ALWAYS an Option. The first of those does a great job talking about how every day you have to start from scratch, forgiving yourself when you miss a day and not getting too full of yourself when you have a solid week of productivity. The second one speaks eloquently of the dangers of taking on a big project (like writing a book) as a “side project”. Dan and Merlin talked about the danger of not fully committing to a project like this. For my part I found these two topics to be closely related. I’ve found that a big part of being fully committed to the project is to forgive myself for failures — days I don’t write at all, days I don’t write as much as I want, sections of the book I don’t write as well as I know I could. The commitment has to be a commitment to keep going despite these failures along the way.

And I’m sure I’ll have plenty more of those failures in the second half of writing this book. But I will write it regardless.

Posted in kinect

Physical GIF Launches on Kickstarter

I’m proud to announce the launch of Physical GIF on Kickstarter. Physical GIF is a collaboration with Scott Wayne Indiana to turn animated GIFs into table top toys. We use a laser cutter and a strobe light to produce a kind of zoetrope from each animated GIF so you can watch it on your coffee table. Here’s our Kickstarter video which explains the whole process and shows you what they look like in action:

For our Kickstarter campaign we have four main pledge levels. At $50 you get a Physical GIF along with everything you need to play it at home: the strobe, the plastic GIF disc and frames, and the hardware. You can choose from three designs that Scott created, BMX Biker:

Elephant-Rabbit Costume Party:

and New York Fourth of July:

For a $100 pledge, we’ll send you a kit with all three of these Physical GIFs.

We’ve also recruited four amazing animated GIF artists to design special limited edition Physical GIFs: Ryder Ripps, Nullsleep, Sara Ludy, and Sterling Crispin. There’s more info about these artists on our project page. At $250, you can reserve one of the Physical GIFs from any of these artists. We’re going to be working with them to explore materials and techniques for turning their designs into Physical GIFs. We’re hoping that they explore some of the limitations and possibilities of this new medium. Each of the Physical GIFs they produce will come in a limited numbered edition with documentation from the artist.

And at the top pledge level, we’ll work with you directly to manufacture your own custom Physical GIF from your design. We’ve only made five of this reward available because we want to be able to spend as much time as it takes working with you to turn your animated GIF ideas into physical reality.

We’re incredibly excited about this project and can’t wait to see how people react to it. Head over to Kickstarter right now to give us some help: Physical GIF on Kickstarter. Thanks!

Posted in Art, Business, Physical GIF

Two Kinect talks: Open Source Bridge and ITP Camp

In the last couple of weeks, I’ve given a couple of public presentations about the Kinect. This post will be a collection of relevant links, media, and follow-up to those talks. The first talk, last week, was in Portland, Oregon at Open Source Bridge. It was a collaboration with Devin Chalmers, my longtime co-conspirator. We designed our talk to be as much like a circus as possible. We titled it Control Emacs with Your Beard: the All-Singing All-Dancing Intro to Hacking the Kinect.

Devin demonstrates controlling Emacs with his “beard”.

Our first demo was, as promised in our talk title, an app that let you control Emacs with your “beard”. This app included the ability to launch Emacs by putting on a fake beard, to generate all kinds of very impressive looking C code by waving your hands in front of you (demonstrated above), and to quit Emacs by removing your fake beard. Our second app sent your browser tabs to the gladiator arena. It let you spare or execute (close) each one by giving a Caesar-esque thumbs up or thumbs down gesture. To get you in the mood for killing, it also played a clip from Gladiator each time you executed a tab.

Both of these apps used the Java Robot library to issue key strokes and fire off terminal commands. It’s an incredibly helpful library for controlling any GUI app on your machine. All our code (and Keynote) is available here: github/osb-kinect. Anyone working on assistive tech (or other kinds of alternative input to the computer) with gestural interfaces should get to know Robot well.

In addition to these live demos, we also covered other things you can do with the Kinect like 3D printing. I passed around the Makerbot-printed head of Kevin Kelly that I made at FOO camp:

Kevin Kelly with a tiny 3D printed version of his own head.

We also showed Nicolas Burrus’s Kinect RGB Demo app which does all kinds of neat things like scene reconstruction:

Me making absurd gestures in front of a reconstructed image of the room

Tonight I taught a class at ITP Camp about building gestural interfaces with the Kinect in Processing. It had some overlap with the Open Source Bridge talk. In addition to telling the story of the Kinect’s evolution, I showed some of the details of working with SimpleOpenNI’s skeleton API. I wrote two apps based on measuring the distance between the user’s hands. The first one simply displayed the distance between the hands in pixels on the screen. The second one used that distance to scale an image up and down and the location of one of the hands to position that image: a typical Minority Report-style interaction.
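
For anyone who wasn’t there, the core of the second app can be sketched in a few lines. This is a simplified illustration in plain Python, not the Processing code from the class: it assumes the hand positions have already been converted to 2D screen coordinates, and the function names and the distance and scale ranges are hypothetical values picked for the example.

```python
import math


def hand_distance(left, right):
    """Straight-line distance in pixels between two (x, y) hand positions."""
    return math.hypot(right[0] - left[0], right[1] - left[1])


def scale_for_distance(distance, min_dist=50.0, max_dist=600.0,
                       min_scale=0.25, max_scale=2.0):
    """Linearly map a hand distance onto an image scale factor, clamped at both ends."""
    clamped = max(min_dist, min(max_dist, distance))
    t = (clamped - min_dist) / (max_dist - min_dist)  # 0.0 .. 1.0
    return min_scale + t * (max_scale - min_scale)


# Made-up hand positions in screen pixels.
left_hand = (200.0, 240.0)
right_hand = (500.0, 640.0)

distance = hand_distance(left_hand, right_hand)
scale = scale_for_distance(distance)
image_position = right_hand  # the image follows one hand

print(distance)  # 500.0
```

Clamping the distance keeps the image from vanishing or exploding when the tracking data jumps, which happens often enough with skeleton data to be worth guarding against.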

The key point was: all you really need to make something interactive in a way that the user can viscerally understand is a single number that tightly corresponds to what you’re doing as the user. With just that, ITP-types can make all kinds of cool interactive apps.

The class was full of clever people who asked all kinds of interesting questions and had interesting ideas for ways to apply this stuff. I came away with a bunch of ideas for the book, which is helpful because I’m going to be starting the skeleton tracking chapter soon.

Of course, all of the code for this project is online in the ITP Camp Kinect repo on GitHub. That repo includes all of the code I showed as well as a copy of my Keynote presentation.

Posted in kinect

Into The Matrix: Proposal for a Platform Studies Approach to OpenGL

In the last few years, new media professors Ian Bogost (Georgia Tech) and Nick Montfort (MIT) have set out to advance a new approach to the study of computing. Bogost and Montfort call this approach Platform Studies:

“Platform Studies investigates the relationships between the hardware and software design of computing systems and the creative works produced on those systems.”

The goal of Platform Studies is to close the distance between the thirty thousand foot view of cultural studies and the ant’s eye view of much existing computer history. Scholars from a cultural studies background tend to stay remote from the technical details of computing systems while much computer history tends to get lost in those details, missing the wider interpretative opportunities.

Bogost and Montfort want to launch an approach that’s based “in being technically rigorous and in deeply investigating computing systems in their interactions with creativity, expression, and culture.” They demonstrated this approach themselves with the kickoff book in the Platform Studies series for MIT Press: Racing the Beam: The Atari Video Computer System. That book starts by introducing the hardware design of the Atari and how it evolved in relationship to the available options at the time. They then construct a comprehensive description of the affordances that this system provided to game designers. The rest of the book is a history of the VCS platform told through a series of close analyses of games and how their creators co-evolved the games’ cultural footprints with their understanding of how to work with, around, and through the Atari’s technical affordances.

Bogost and Montfort have put out the call for additional books in the Platform Studies series. Their topic wish list includes a wide variety of platforms from Unix to the Game Boy to the iPhone. In this post, I would like to propose an addition to this list: OpenGL. In addition to arguing for OpenGL as an important candidate for inclusion in the series, I would also like to present a sketch for what a Platform Studies approach to OpenGL might look like.

According to Wikipedia, OpenGL “is a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics.” This dry description belies the fact that OpenGL has been at the center of the evolution of computer graphics for more than 20 years. It has been the venue for a series of negotiations that have redefined visuality for the digital age.

In the introduction to his seminal study, Techniques of the Observer: On Vision and Modernity in the 19th Century, Jonathan Crary describes the introduction of computer graphics as “a transformation in the nature of visuality probably more profound than the break that separates medieval imagery from Renaissance perspective”. Crary’s study itself tells the story of the transformation of vision enacted by 19th century visual technology and practices. However, he recognized that, as he was writing in the early 1990s, yet another equally significant remodeling of vision was underway towards the “fabricated visual spaces” of computer graphics. Crary described this change as “a sweeping reconfiguration of relations between an observing subject and modes of representation that effectively nullifies most of the culturally established meanings of the term observer and representation.”

I propose that the framework Crary laid out in his analysis of the emergence of modern visual culture can act as a guide in understanding this more recent digital turn. In this proposal, I will summarize Crary’s analysis of the emergence of modern visual culture and try to posit an analogous description of the contemporary digital visual regime of which OpenGL is the foundation. In doing so, I will constantly seek to point out how such a description could be supported by close analysis of OpenGL as a computing platform and to answer the two core questions that Crary poses of any transformation of vision: “What forms or modes are being left behind?” and “What are the elements of continuity that link contemporary imagery with older organizations of the visual?” Due to the nature of OpenGL, this analysis will constantly take technical, visual, and social forms.

As a platform, OpenGL has been the stage for two stories that are quintessential to the development of much 21st century computing. It has been the site of a process of industry standardization and it represents an attempt to model the real world in a computational environment. Under close scrutiny, both of these stories reveal themselves to be tales of negotiation between multiple parties and along multiple axes. These stories are enacted on top of OpenGL as what Crary calls the “social surface” that drives changes in vision:

“Whether perception or vision actually change is irrelevant, for they have no autonomous history. What changes are the plural forces and rules composing the field in which perception occurs. And what determines vision at any given historical moment is not some deep structure, economic base, or world view, but rather the functioning of a collective assemblage of disparate parts on a single social surface.”

As the Wikipedia entry emphasized, OpenGL is a platform for industry standardization. It arose out of the late 80s and early 90s when a series of competing companies (notably Silicon Graphics, Sun Microsystems, Hewlett-Packard, and IBM) each brought incompatible 3D hardware systems to market. Each of these systems was accompanied by its own graphics programming API that took advantage of that hardware’s particular capabilities. Out of a series of competitive stratagems and developments, OpenGL emerged as a standard, backed by Silicon Graphics, the market leader.

The history of its creation and governance was a process of negotiating both these market convolutions and the increasing interdependence of these graphics programming APIs with the hardware on which they executed. An understanding of the forces at play in this history is necessary to comprehend the compromises that OpenGL represents today and how they shape the contemporary hardware and software industries. Further, OpenGL is not a static, complete system, but rather is undergoing continuous development and evolution. A comprehensive account of this history would provide the backstory that shapes these developments and help the reader understand the tensions and politics that structure the current discourse about how OpenGL should change in the future, a topic I will return to at the end of this proposal.

The OpenGL software API co-evolved with the specialized graphics hardware that computer vendors introduced to execute it efficiently. These Graphics Processing Units (GPUs) were added to computers to make common graphical programming tasks faster as part of the competition between hardware vendors. In the process, the vendors built assumptions and concepts from OpenGL into these specialized graphics cards in order to improve the performance of OpenGL-based applications on their systems. And, simultaneously, the constraints and affordances of this new graphics hardware influenced the development of new OpenGL APIs and software capabilities. Through this process, the GPU evolved to be highly distinct from the existing Central Processing Units (CPUs) on which all modern computing had previously taken place. The GPU became highly tailored to the parallel processing of large matrices of floating point numbers. This is the fundamental computing technique underlying high-level GPU features such as texture mapping, rendering, and coordinate transformations. As GPUs became more performant and added more features, they became more and more important to OpenGL programming, and the boundary where execution moves between the CPU and the GPU became one of the central features in the OpenGL programming model.

OpenGL is a kind of pidgin language built up between programmers and the computer. It negotiates between the programmers’ mental model of physical space and visuality and the data structures and functional operations which the graphics hardware is tuned to work with. In the course of its evolution it has shaped and transformed both sides of this negotiation. I have pointed to some ways in which computer hardware evolved in the course of OpenGL’s development, but what about the other side of the negotiation? What about cultural representations of space and visuality? In order to answer these questions I need both to articulate the regime of space and vision embedded in OpenGL’s programming model and to situate that regime in a historical context, to contrast it with earlier modes of visuality. In order to achieve these goals, I’ll begin by summarizing Crary’s account of the emergence of modern visual culture in the 19th century. I believe this account will provide both historical background and a vocabulary for describing the OpenGL vision regime itself.

In Techniques of the Observer, Crary describes the transition between the Renaissance regime of vision and the modern one by contrasting the camera obscura with the stereograph. In the Renaissance, Crary argues, the camera obscura was both an actual technical apparatus and a model for “how observation leads to truthful inferences about the world”. By admitting a viewer into its “chamber”, the camera obscura allowed him to separate himself from the world and view it objectively and completely. But, simultaneously, the flat image formed by the camera obscura was finite and comprehensible. This relation was made possible by the Renaissance regime of “geometrical optics”, where space obeyed well-known rigid rules. By employing these rules, the camera obscura could become, in Crary’s words, an “objective ground of visual truth”, a canvas on which perfect images of the world would necessarily form in obeisance to the universal rules of geometry.

In contrast to this Renaissance mode of vision, the stereograph represented a radically different modern visuality. Unlike the camera obscura’s “geometrical optics”, the stereograph and its fellow 19th century optical devices were designed to take advantage of the “physiological optics” of the human eye and vision system. Instead of situating their image objectively in a rule-based world, they constructed illusions using eccentricities of the human sensorium itself. Techniques like persistence of vision and stereography manipulate the biology of the human perception system to create an image that only exists within the individual viewer’s eye. For Crary, this change moves visuality from the “objective ground” of the camera obscura to possess a new “mobility and exchangeability” within the 19th century individual. Being located within the body, this regime also made vision regulatable and governable by the manipulation and control of that body, and Crary spends a significant portion of Techniques of the Observer teasing out the political implications of this change.

But what of the contemporary digital mode of vision? If interactive computer graphics built with OpenGL are the contemporary equivalent of the Renaissance camera obscura or 19th century stereography, what mode of vision do they embody?

OpenGL enacts a simulation of the rational Renaissance perspective within the virtual environment of the computer. The process of producing an image with OpenGL involves generating a mathematical description of the full three dimensional world that you want to depict and then rendering that world into a single image. OpenGL contains within itself the camera obscura, its image, and the world outside its walls. OpenGL programmers begin by describing objects in the world using geometric terms such as points and shapes in space. They then apply transformations and scaling to this geometry in absolute and relative spatial coordinates. They proceed to annotate these shapes with color, texture, and lighting information. They describe the position of a virtual camera within the three dimensional scene to capture it into a two dimensional image. And finally they animate all of these properties and make them responsive to user interaction.
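The sequence described above (geometry, transformation, virtual camera, two dimensional image) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not OpenGL code: the helper names (mat_mul, translate, perspective, project) are hypothetical, and the projection matrix follows the convention of the classic gluPerspective call. Real OpenGL performs the same arithmetic on the GPU.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Apply a 4x4 matrix to a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translate(tx, ty, tz):
    """Matrix that moves geometry by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def scale(s):
    """Matrix that scales geometry uniformly by s."""
    return [[s, 0, 0, 0],
            [0, s, 0, 0],
            [0, 0, s, 0],
            [0, 0, 0, 1]]

def perspective(fov_y_deg, aspect, near, far):
    """Perspective projection in the style of gluPerspective."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
            [0, 0, -1, 0]]

def project(vertex, model, view, proj, width, height):
    """Carry one 3D point from object space to 2D window coordinates."""
    x, y, z = vertex
    mvp = mat_mul(proj, mat_mul(view, model))      # compose the three stages
    cx, cy, cz, cw = mat_vec(mvp, [x, y, z, 1.0])  # clip-space position
    ndc_x, ndc_y = cx / cw, cy / cw                # perspective divide
    px = (ndc_x + 1.0) * 0.5 * width               # viewport mapping,
    py = (ndc_y + 1.0) * 0.5 * height              # origin at bottom left
    return px, py

# Hypothetical scene: one point, viewed from 5 and then 10 units away.
proj = perspective(90.0, 1.0, 0.1, 100.0)
near_view = project((1, 0, 0), scale(1), translate(0, 0, -5), proj, 400, 400)
far_view = project((1, 0, 0), scale(1), translate(0, 0, -10), proj, 400, 400)
```

With these numbers the point lands at roughly x = 240 when the camera is five units back and roughly x = 220 at ten units, drifting toward the center of the 400 pixel wide image as it recedes. That convergence is the rule of linear perspective, here produced entirely by matrix arithmetic.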

To extend Crary’s history, where the camera obscura embodied a “geometrical optics” and the stereograph a “physiological optics”, OpenGL employs a “symbolic optics”. It produces a rule-based simulation of the Renaissance geometric world, but leaves that simulation inside the virtual realm of the computer, keeping it as matrices of vertices on the GPU rather than presuming it to be the world itself. OpenGL acknowledges its system is a simulation, but we undergo a process of “suspension of simulation” to operate within its rules (both as programmers and as users of games, etc. built on the system). According to Crary, modern vision “encompasses an autonomous perception severed from any system”. OpenGL embodies the Renaissance system and imbues it with new authority. It builds this system’s metaphors and logics into its frameworks. We agree to this suspension because the system enforces the rules of a Renaissance camera obscura-style objective world, but one that is fungible and controllable.

The Matrix is the perfect metaphor for this “symbolic optics”. In addition to being a popular metaphor of a reconfigurable reality that exists virtually within a computer, the matrix is actually the core symbolic representation within OpenGL. OpenGL transmutes our description of objects and their properties into a series of matrices whose values can then be manipulated according to the rules of the simulation. Since OpenGL’s programming model embeds the “geometrical optics” of the Renaissance within it, this simulation is not infinitely fungible. It possesses a grain towards a set of “realistic” representational results, and attempting to go against that grain requires working outside the system’s assumptions. However, the recent history of OpenGL has seen an evolution towards making its system itself programmable, loosening these restrictions by giving programmers the ability to reprogram parts of its default pipeline themselves in the form of “shaders”. I’ll return to this topic in more detail at the end of this proposal.

To illustrate these “symbolic optics”, I would conduct a close analysis of various components of the OpenGL programming model in order to examine how they embed Renaissance-style “geometrical optics” within OpenGL’s “fabricated visual spaces”. For example, OpenGL’s lighting model with its distinction between ambient, diffuse, and specular forms of light and material properties would bear close analysis. Similarly, I’d look closely at OpenGL’s various mechanisms for representing perspective, from the depth buffer to its various blending modes and fog implementation. Both of these topics, light and distance, have a rich literature in the history of visuality that would make for a powerful launching point for this analysis of OpenGL.
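To give a sense of what a close reading of the lighting model would be working with, here is a minimal Python sketch of the ambient, diffuse, and specular distinction. It is a simplification under stated assumptions: it computes a single brightness value rather than separate color channels, it ignores attenuation and emission, and it uses a Blinn-Phong style half-vector for the specular term. The function and dictionary key names are hypothetical, not OpenGL API calls.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length (zero vectors are returned as-is)."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return tuple(v)
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def shade(normal, to_light, to_viewer, material, light):
    """Brightness of one surface point as ambient + diffuse + specular."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_viewer)

    # Ambient: light that arrives from everywhere, regardless of geometry.
    ambient = material["ambient"] * light["ambient"]

    # Diffuse: depends only on the angle between surface and light.
    n_dot_l = max(dot(n, l), 0.0)
    diffuse = material["diffuse"] * light["diffuse"] * n_dot_l

    # Specular: a highlight that also depends on where the viewer stands.
    specular = 0.0
    if n_dot_l > 0.0:
        half = normalize(tuple(a + b for a, b in zip(l, v)))
        specular = (material["specular"] * light["specular"]
                    * max(dot(n, half), 0.0) ** material["shininess"])

    return min(ambient + diffuse + specular, 1.0)

# Hypothetical material and light, chosen only for illustration.
material = {"ambient": 0.1, "diffuse": 0.6, "specular": 0.3, "shininess": 32}
light = {"ambient": 1.0, "diffuse": 1.0, "specular": 1.0}
lit = shade((0, 0, 1), (0, 0, 1), (0, 0, 1), material, light)
unlit = shade((0, 0, 1), (0, 0, -1), (0, 0, 1), material, light)
```

A surface facing both the light and the viewer receives all three terms and comes out at roughly full brightness, while a surface turned away from the light keeps only its ambient term. Only the specular term takes the viewer’s position into account, which is one place where the model’s assumptions about observation become visible.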

To conclude this proposal, I want to discuss two topics that look forward to how OpenGL will change in the future both in terms of its ever-widening cultural application and the immediate roadmap for the evolution of the core platform.

Recently, Matt Jones of British design firm Berg London and James Bridle of the Really Interesting Group have been tracking an aesthetic movement that they’ve been struggling to describe. In his post introducing the idea, The New Aesthetic, Bridle describes this as “a new aesthetic of the future” based on seeing “the technologies we actually have with a new wonder”. In his piece, Sensor-Vernacular, Jones describes it as “an aesthetic born of the grain of seeing/computation. Of computer-vision, of 3d-printing; of optimised, algorithmic sensor sweeps and compression artefacts. Of LIDAR and laser-speckle. Of the gaze of another nature on ours.”

What both Jones and Bridle are describing is the introduction of a “photographic” trace of the non-digital world into the matrix space of computer graphics. Where previously the geometry represented by OpenGL’s “symbolic optics” was entirely specified by designers and programmers working within its explicit affordances, the invention of 3D scanners and sensors allows for the introduction of geometry that is derived “directly” from the world. The result is imagery that feels made up of OpenGL’s symbols (the images are clearly textured three dimensional meshes with lighting) but in a configuration different from what human authors have previously made with these symbols. However, these images also feel dramatically distinct from traditional photographic representation, as the translation to OpenGL’s symbolic optics is not transparent, but instead reconfigures the image along lines recognizable from games, simulations, special effects, and the other cultural objects previously produced on the OpenGL platform. The “photography effect” that witnessed the transition from the Renaissance mode of vision to the modern becomes a “Kinect effect”.

A full-length platform studies account of OpenGL should include analyses of some of these Sensor-Vernacular images. A particularly good candidate subject for this would be Robert Hodgin’s Body Dysmorphic Disorder, a realtime software video project that used the Kinect’s depth image to distort the artist’s own body. Hodgin has discussed the technical implementation of the project in depth and has even put much of the source code for the project online.

Finally, I want to discuss the most recent set of changes to OpenGL as a platform in order to position them within the framework I’ve established here and sketch some ideas of what issues might be in play as they develop.

Much of the OpenGL system as I have referred to it here assumes the use of the “fixed-function pipeline”. The fixed-function pipeline represents the default way in which OpenGL transforms user-specified three dimensional geometry into pixel-based two dimensional images. Until recently, in fact, the fixed-function pipeline was the only rendering route available within OpenGL. However, around 2004, with the introduction of the OpenGL 2.0 specification, OpenGL began to make parts of the rendering pipeline itself programmable. Instead of simply abiding by the logic of simulation embedded in the fixed-function pipeline, programmers began to be able to write special programs, called “shaders”, that manipulated the GPU directly. These programs provided major performance improvements, dramatically widened the range of visual effects that could be achieved, and placed programmers in more direct contact with the highly parallel matrix-oriented architecture of the GPU.
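The difference between a fixed stage and a programmable one can be modeled with a toy renderer in Python. This is a loose analogy rather than real shader code: actual shaders are written in a dedicated shading language such as GLSL and run on the GPU, and every name here (render, fixed_function_fragment, gradient_shader) is hypothetical. The sketch only shows the structural idea that a step the system once decided for you becomes a function you supply.

```python
def fixed_function_fragment(u, v):
    """Stand-in for built-in behavior: every pixel gets the same flat gray."""
    return (0.5, 0.5, 0.5)

def gradient_shader(u, v):
    """A programmer-supplied rule: color depends on position in the image."""
    return (u, v, 1.0 - u)

def render(width, height, fragment=fixed_function_fragment):
    """Call the fragment function once per pixel and collect the colors.

    A GPU would run these calls in parallel; this loop runs them in order.
    """
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Sample at the pixel center, in normalized 0..1 coordinates.
            u = (x + 0.5) / width
            v = (y + 0.5) / height
            row.append(fragment(u, v))
        image.append(row)
    return image

default_image = render(4, 4)                   # the built-in rule
custom_image = render(4, 4, gradient_shader)   # the rule replaced
```

Because each pixel’s color is computed independently of every other pixel’s, the work parallelizes naturally, which is why this style of program maps so directly onto the architecture of the GPU.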

Since their introduction, shaders have gradually transitioned from the edge of the OpenGL universe to its center. New types of shaders, such as geometry and tessellation shaders, have been added that allow programmers not just to manipulate superficial features of the image’s final appearance but to control how the system generates the geometry itself. Further, in the most recent versions of the OpenGL standard (versions 4.0 and 4.1), the procedural, non-shader approach has been removed entirely from the core profile.

How will this change alter OpenGL’s “symbolic optics”? Will the move towards shaders remove the limits of the fixed-function pipeline that enforced OpenGL’s rule-based simulation logic or will that logic be re-inscribed in this new programming model? Either way, how will the move to shaders alter the affordances and restrictions of the OpenGL platform?

To answer these questions, a platform studies approach to OpenGL would have to include an analysis of the shader programming model: how it provides different aesthetic opportunities than the procedural model, and how those differences have shaped both the work made with OpenGL and the programming culture around it. Further, this analysis, which began with a discussion of standards when looking at the emergence of OpenGL, would have to return to that topic when looking at the platform’s present prospects and conditions in order to explain how the shader model became central to the OpenGL spec and what that means for the future of the platform as a whole.

That concludes my proposal for a platform studies approach to OpenGL. I’d be curious to hear from people more experienced in both OpenGL and Platform Studies as to what they think of this approach. And if anyone wants to collaborate in taking on this project, I’d be glad to discuss it.
