Machine Pareidolia: Hello Little Fella Meets FaceTracker

In a recent post on the BERG blog, Gardens and Zoos, Matt Jones explored a series of ideas for designing personality and life into technology products. One of the most compelling of these takes advantage of pareidolia, the natural human inclination to see faces everywhere around us.


Jones’s slide introducing pareidolia.

Jones advocates designing faces into new technology products as a way of making them more approachable, using pareidolia to give products personality and humanize them without climbing all the way down into the Uncanny Valley. He even runs a Flickr group collecting images of pareidolia-inducing objects: Hello Little Fella!

Lately I’ve been thinking a lot about faces. I’ve had mine scanned and turned it into a digital puppet. I’ve been working extensively with face tracking, building a series of experiments and prototypes with Kyle McDonald’s ofxFaceTracker, an OpenFrameworks frontend to Jason Saragih’s excellent FaceTracker project. Most publicly so far, I demonstrated that FaceTracker can track hand-drawn faces.

Using FaceTracker OSC to draw in Processing

Accessing FaceTracker data in Processing.

Facial recognition techniques give computers their own flavor of pareidolia. In addition to responding to actual human faces, facial recognition systems, just like the human vision system, sometimes produce false positives, latching onto some set of features in the image as matching their model of a face. Rather than the millions of years of evolution that shaped human vision, their pareidolia is based on the details of their algorithms and the vicissitudes of the training data they’ve been exposed to.

Their pareidolia is different from ours. Different things trigger it.

Face In The Window

Face in the Window. FaceTracker seeing a face in a window at CMU’s Studio for Creative Inquiry during Art && Code.

After reading Jones’s post, I came up with an experiment designed to explore this difference. I decided to run all of the images from the Hello Little Fella Flickr group through FaceTracker and record the result. These images induce pareidolia in us, but would they do the same to the machine?

Using the Flickr API, I pulled down 681 images from the group. I whipped up an OpenFrameworks app that loaded each image and passed it to FaceTracker for detection, saving an image of the resulting face if it was detected. The result was that FaceTracker detected a face in 50 of the images, or about 7%.
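The tally behind that figure is simple arithmetic. Here is a minimal JavaScript sketch of the batch loop, assuming a hypothetical detectFace callback standing in for the FaceTracker call. The real app was an OpenFrameworks program, and none of these names come from it.

```javascript
// Run a detector over a list of image names and report how many produced a face.
// `detectFace` is a hypothetical stand-in for the FaceTracker call; it should
// return true when a face was found in the named image.
function tallyDetections(imageNames, detectFace) {
  const hits = imageNames.filter((name) => detectFace(name));
  const percent = imageNames.length === 0
    ? 0
    : (hits.length / imageNames.length) * 100;
  return { total: imageNames.length, detected: hits.length, percent, hits };
}

// Example with a fake detector that "finds" a face in the first 50 of 681 images.
const names = Array.from({ length: 681 }, (_, i) => `image-${i}.jpg`);
const summary = tallyDetections(names, (name) => Number(name.match(/\d+/)[0]) < 50);
console.log(`${summary.detected} of ${summary.total} (${summary.percent.toFixed(1)}%)`);
// prints "50 of 681 (7.3%)"
```

Keeping the list of hits alongside the counts makes it easy to go back and sort the detected images into categories by hand.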

When I looked through the results I found that they broke down into three different categories in terms of how the face detected by the software related to the face that a person would see in the photo: agreement, near agreement, and totally other. Each of these categories reveals a different possible relationship between the human vision system and the software vision system. Significantly, I also found that I had a different emotional reaction to each of these types of results. I think the spectrum of possibilities outlined by these three categories is one we’re going to see a lot as we find ourselves surrounded by more and more designed objects that are embedded with computer vision. At the end of this post I’ll share some ideas about the repercussions this might have for the design of the Robot-Readable World, both for the robots themselves and the things we create for them to look at.

But first a little more about each of the categories.

Agreement

Agreement happens when the face tracking system detects exactly the part of the scene that originally induced pareidolia in the photographer, inspiring them to take the photo in the first place. In many ways these are the most satisfying results. They give you the confirming feeling that YES it saw just what I saw. Here are some results that show Agreement:

450

320

281

This one is rather good. I hadn’t really even been able to see the face in this cookie until the app showed it to me.

508

I think this one is especially exciting because there’s an inductive implication that it could see all of these:

201

50

One major ingredient of Agreement seems to be a clearly defined boundary around the prospective face’s features. I discovered something similar when experimenting with getting FaceTracker to see hand-drawn faces.

Near Agreement

The next category is Near Agreement. Near Agreement takes place when some — but not all — facial features the algorithm picks out match those a human eye would see.

For example, here’s a case where it sees the same eyes as I do, but we disagree about the nose and mouth.

28

I see the black hole there as the mouth of the little fella. The algorithm sees that as his nose and the shift in the reflection below that as the mouth.

When these kinds of Near Agreements occur I find myself going through a quick series of emotions. Excitement: it sees it! Let down: oh, but that’s not quite it. Empathy: you were so close; just a little to the left, I see where you went wrong…

662

Got the mouth right, but the eyes were just a little too far out of reach:

633

The back of this truck I actually find quite compelling. I think the original photographer was thinking of arrows at the top as the eyes and the circular extrusion as the border of the face. But now, having seen the face that the algorithm detected, I can actually see that face more clearly than the one I think the photographer intended.

468

369

181

Totally Other

This last category is the one I find the most fascinating. Sometimes FaceTracker would detect a face in a part of the image totally separate from the face the image was intended to capture. Something in that portion of the image, which frequently looked like an undifferentiated portion of some surface, or a bit of seemingly meaningless detail, triggered the system’s pattern for a face.

These elicit the most complex emotional response of all. It starts off with “huh?”, a sense of mystification about what the algorithm could be responding to. Then there’s a kind of aesthetic of the glitch. “Oh it’s a screw up, how funny and slightly troubling”. But then finally, the more of these I saw, the more the effect started to feel truly other: like a coherent, but alien idea of what faces were. It made me wonder what I was missing. “What is it seeing there?” It’s a feeling akin to having a conversation with someone who’s gradually losing interest in what you’re saying and starting to scan the room over your shoulder.

445

438

436

29

You can see the rest of the 50 photos in my Machine Pareidolia set on Flickr.

So what can we learn from these results? Let’s return to Mr. Jones for a moment. He explained his interest in human pareidolia thusly:

One of the prime materials we work with as interaction designers is human perception. We try to design things that work to take advantage of its particular capabilities and peculiarities.

As designers of the Robot-Readable World we need to have a similar sense of the capabilities and peculiarities of this new computational perception. Hopefully this experiment can give us some sense of the texture of that perception, an idea of how much of its circle overlaps with ours in the Venn diagram of vision systems and how the non-overlapping parts look and behave.

Human-machine venn diagram


26 Books in 2011

Last year, I read 43 books, a relatively high annual total for me. This was largely due to spending so much time that year working on a stop-motion animated music video, which led to a huge amount of audiobook listening. This year, I read much less. The two main factors in this falloff were my busy last semester at ITP and the fact that I spent much of the second half of the year writing a book. The total for this year came out to 26 books, plus eight additional comics, an area I’ve started dabbling in due to the influence of Matt Jones and Jack Schulze of BERG London, who I had the pleasure to meet this year.

Looking at the list, the topics of this year’s books much resemble the list from last year, with sci-fi and special effects behind-the-scenes making up the lion’s share. Of these, I wanted to specially point out The Gone-Away World by Nick Harkaway, which I just finished recently. It’s a great weird mix of post-apocalyptic sci-fi, coming-of-age college novel, and Tarantino-esque madcap kung-fu. But somehow darker and more moving than that description makes it sound. There are also a few tech/business history books: the Steve Jobs bio, Steven Levy on Google, The Toyota Way, and The Gun by CJ Chivers, which is an excellent history of the AK-47 and one of the best books on design I’ve ever read.

Here are the comics I read this year (I would link to these too, but, weirdly enough, I have no clue of the best place to acquire them online having, amazingly, actually bought nearly all of them from in-person “stores” such as Forbidden Planet and St. Marks Comics.):

  • SVK by Warren Ellis
  • Invincible Iron Man: The Five Nightmares by Matt Fraction
  • Invincible Iron Man: Extremis by Warren Ellis
  • Transmetropolitan Vol 1 by Warren Ellis
  • Planetary Vol 1 by Warren Ellis
  • The Punisher: Born by Garth Ennis
  • The Punisher MAX, Vol 1 by Garth Ennis
  • Usagi Yojimbo Book 2: Samurai by Stan Sakai

A Personal Fabrication Nightmare

Just received the following story from my friend Devin Chalmers. I asked for his permission to publish it because I think it is telling and disturbingly likely to come true.

I had a personal fabrication nightmare last night. I’d just gotten off a roller coaster, and at the photo booth where you can get commemorative prints of your shit-your-pants face they had just gotten a whole 3D printing/lasercutter workflow set up. I was overwhelmed by the choices of materials and patterns: the sample book was like 40 pages long. They could do steins, shot glasses, brass plaques, 3D and 2.5D scene reconstructions, six different sorts of wood, marquetry, choices of how to define figure and ground—it was all very confusing. I came back after an hour to let the crowd die down and I still couldn’t decide what the best way to physicalize my roller coaster adventure would be. I awoke still anxious.


Announcing ofxaddons.com, a directory of OpenFrameworks extensions

At Art && Code 3D a few weeks back I met James George. We immediately found we had a lot in common, kicking off a wide-ranging conversation about everything from miniature worlds to Portland food carts to ways of making the OpenFrameworks community more accessible. On this last topic, we even conceived a project: a website that searches Github for OpenFrameworks addons written by the community and indexes them for easier discovery. Today, I’m proud to announce the launch of exactly that site: ofxaddons.com.

The site features nearly 300 addons that we’ve divided into 13 categories: Animation, Bridges, Computer Vision, Graphics, GUI, Hardware Interface, iOS, Physics, Sound, Typography, Utilities, Video/Camera, and Web/Networking. We’ve also put together a how-to guide on creating your own addons. That guide includes standards for how to structure an addon so it is easy to install and will work smoothly for all users of OpenFrameworks. It’s based on the emerging standards coming out of the community of addon authors.

While categorizing them, James and I came across a bunch of really remarkable addons. In the rest of this post, I want to highlight a few of the addons that most struck us.

ofxGrabCam

ofxGrabCam by Elliot Woods provides an intuitive interactive camera for 3D apps. It was inspired by the camera in Google Sketchup: it uses the z-buffer to automatically select the object that’s under your mouse when you click as the center of your translations and rotations. Here’s a video Elliot made showing it in action:

And here’s Elliot’s full write-up. Rumor on the street is that this might make it into OF core in a future version, so check it out now.

ofxGifEncoder and ofxGifDecoder

Both by Jesus Gollonet, this pair of libraries lets you create and parse animated GIFs. ofxGifEncoder does the creating and ofxGifDecoder does the parsing. You can create GIFs programmatically to look however you want. The animated GIF above shows an awesome glitch I achieved recently while screwing up some pixel math on one of the sample OF videos.

FUGIFs is an app that uses ofxGifEncoder to automatically turn video files into animated GIFs. Sounds like it was made by a frustrated designer of animated Flash banners. Useful.

ofxGts

ofxGts is an addon from Karl D.D. Willis that wraps the Gnu Triangulated Surface Library, a useful set of tools for dealing with 3D surfaces. GTS can add vertices to meshes to make them smoother (as shown in the horse model illustrated above), it can simplify models, it can decompose models into triangle strips, etc., etc.

Karl’s version of the addon seems to have some compatibility issues with OF 007, so James put together a fork that fixes those: obviousjim/ofxGts. Merge that pull request, Karl!

ofxKyonyu: Kinect Breast Enlarger

This addon by novogrammer was too absurd not to share. It seems (the site, like most of the documentation and code comments, is in Japanese) to use the Kinect to enlarge the breasts of people it detects. I’m sure this will get reused in tons of projects.

ofxSoftKeyboard

Here’s a great addon that could have a lot of application in accessibility and kiosk work: ofxSoftKeyboard by Lensley. This addon provides an onscreen software keyboard that generates key events when the user clicks (or taps, etc.) on a key. It works well and they’ve already accepted James’ pull request updating it to full OF 007 compatibility!

ofxUeye

Last, but not least, we’ve got this addon, which provides an interface to the GigE uEye SE, a small form-factor Gigabit Ethernet camera that looks really useful. It’s Windows-only at the moment so we haven’t been able to actually run it, but it seems quite well put together.

That’s just a sampling of all of the great addons that are available. If you browse around the site for just a few minutes I’ll bet you’ll be amazed at what you find. In fact, I bet, like me, you’ll immediately think of three project ideas just seeing what kinds of cool things are possible.


Streaming Kinect skeleton data to the web with Node.js

This past weekend, I had the honor of participating in Art && Code 3D, a conference hosted by Golan Levin of the CMU Studio for Creative Inquiry about DIY 3D sensing. It was, much as Matt Jones predicted, “Woodstock for the robot-readable world”. I gave two talks at the conference, but those aren’t what I want to talk about now (I’ll have a post with a report on those shortly). For the week before the start of the actual conference, Golan invited a group of technologists to come work collaboratively on 3D sensing projects in an intensive atmosphere, “conference as laboratory” as he called it. This group included Diederick Huijbers, Elliot Woods, Joel Gethin Lewis, Josh Blake, James George, Kyle McDonald, Matt Mets, Kyle Machulis, Zach Lieberman, Nick Fox-Gieg, and a few others. It was truly a rockstar lineup and they took on a bunch of hard and interesting projects that have been out there in 3D sensing and made all kinds of impressive progress.

One of the projects this group executed was a system for streaming the depth data from the Kinect to the web in real time. This let as many as a thousand people watch some of the conference talks in a 3D interface rendered in their web browser while they were going on. An anaglyphic option was available for those with red-blue glasses.

I was inspired by this truly epic hack to take a shot at an idea I’ve had for a while now: streaming the skeleton data from the Kinect to the browser. As you can see from the video at the top of this post, today I got that working. I’ll spend the bulk of this post explaining some of the technical details involved, but first I want to talk about why I’m interested in this problem.

As I’ve learned more and more about the making of Avatar, amongst the many innovations, one struck me most. The majority of the performances for the movie were recorded using a motion capture system. The actors would perform on a nearly empty motion capture stage, just them, the director, and a few technicians. After they had successful takes, the actors left the stage, the motion capture data was edited, and James Cameron, the director, returned. Cameron was then able to play the perfect, edited performances back over and over ad infinitum as he chose angles using a tablet device that let him position a virtual camera around the virtual actors. The actors performed without the distractions of a camera on a nearly black box set. The director could work for 18 hours on a single scene without having to worry about the actors getting tired or screwing up any takes. The performance of the scene and the rendering of it into shots had been completely decoupled.

I think this decoupling is very promising for future creative filmmaking environments. I can imagine an online collaborative community triangulated between a massively multiplayer game, an open source project, and a traditional film crew where some people contribute scripts, some contribute motion capture recorded performances of scenes, others build 3D characters, models, and environments, still others light and frame these for cameras, still others edit and arrange the final result. Together they produce an interlocking network of aesthetic choices and contributions that produce not a single coherent work, but a mesh of creative experiences and outputs. Where current films resemble a giant shrink-wrapped piece of proprietary software, this new world would look more like Github, a constantly shifting graph of contributions and related evolving projects.

The first step towards this networked participatory filmmaking model is an application that allows remote real time motion capture performance. This hack is a prototype of that application. Here’s a diagram of its architecture:

Skelestreamer architecture diagram

The source for all of the components is available on Github: Skelestreamer. In explaining the architecture, I’ll start from the Kinect and work my way towards the browser.

Kinect, OpenNI, and Processing

The Processing sketch starts by accessing the Kinect depth image and the OpenNI skeleton data using SimpleOpenNI, the excellent library I’m using throughout my book. The sketch waits for the user to calibrate. When the user has calibrated, it begins capturing the position of each of the user’s 15 joints into a custom class designed for the purpose. The sketch then sticks these objects into a queue, which is consumed by a separate thread. This separate thread takes items out of the queue, serializes them to JSON, and sends them to the server over a persistent socket connection that was created at the time the user was calibrated and we began streaming. This background thread and queue are a hedge against the possibility of latency in the streaming process. Right now, as I’ve been running everything on one computer, I haven’t seen any latency; the queue nearly always runs empty. I’m curious to see if this level of throughput will continue once the sketch needs to stream to a remote server rather than simply over localhost.
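To make the queue-and-serialize step concrete, here is a rough sketch of the pattern. It is written in JavaScript rather than Processing, it is single-threaded, and it is not the code from the Skelestreamer repository. The wire format (a flat JSON array of floats, one frame per line) and all of the names are assumptions made for illustration.

```javascript
// Flatten [{x, y, z}, ...] into [x0, y0, z0, x1, y1, z1, ...].
function flattenJoints(joints) {
  return joints.flatMap((joint) => [joint.x, joint.y, joint.z]);
}

// Serialize one captured frame as a single line of JSON.
// Newline-delimited framing is an assumption, chosen so the receiver can
// split the byte stream back into frames.
function serializeFrame(joints) {
  return JSON.stringify(flattenJoints(joints)) + '\n';
}

// A first-in, first-out buffer between the capture loop and the sender.
class FrameQueue {
  constructor() {
    this.items = [];
  }

  // Called by the capture loop for every new skeleton frame.
  push(joints) {
    this.items.push(joints);
  }

  // Called by the sender: writes out everything queued so far, oldest
  // first, and returns how many frames were sent.
  drain(write) {
    let sent = 0;
    while (this.items.length > 0) {
      write(serializeFrame(this.items.shift()));
      sent += 1;
    }
    return sent;
  }
}

// Example: queue one two-joint frame, then drain it into an array that
// stands in for the socket.
const queue = new FrameQueue();
const wire = [];
queue.push([{ x: 1, y: 2, z: 3 }, { x: 4, y: 5, z: 6 }]);
const sentCount = queue.drain((line) => wire.push(line));
console.log(sentCount, JSON.stringify(wire[0])); // prints: 1 "[1,2,3,4,5,6]\n"
```

The point of the queue is that the capture loop only ever calls push, so a slow network write delays the sender without stalling capture. If the queue stays empty, as it did over localhost, the buffering costs almost nothing.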

Node.js and Socket.io

The server’s only job is to accept the stream from the Processing sketch and forward it on to any browsers that connect and ask for the data. In theory I thought this would be a perfect job for Node.js and it turned out I was right. This is my first experience with Node and while I’m not sure I’d want to build a conventional CRUD-y web app in it, it was a joy to work with for this kind of socket plumbing. The Node app has two components: one of them listens on a custom port to accept the streaming JSON data from the Processing sketch. The other component accepts connections on port 80 from browsers. These connections are made using Socket.io. Socket.io is a protocol meant to provide a cross-browser socket API on top of the rapidly evolving state of adoption of the Web Sockets Spec. It includes both a Node library and a client JavaScript library, both of which speak the Socket.io protocol transparently, making socket communication between browsers and Node almost embarrassingly easy. Once a browser has connected, Node begins streaming the JSON from Processing to it. Node acts like a simple T-connector in a pipe, taking the stream from one place and splitting it out to many.
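The T-connector idea can be sketched with nothing but Node’s built-in net module. This is not the Skelestreamer server: the Relay class, the newline-delimited framing, and the single shared buffer (which assumes only one sketch is connected at a time) are all assumptions for illustration, and the Socket.io side is left as a callback so the example runs without any dependencies.

```javascript
const net = require('net');

class Relay {
  constructor() {
    this.clients = new Set();
    this.buffer = '';
  }

  // Register a function that delivers one frame to one browser.
  // Returns a function that unregisters it again.
  addClient(send) {
    this.clients.add(send);
    return () => this.clients.delete(send);
  }

  // Feed raw text from the sketch's socket. Complete lines are parsed and
  // broadcast; a trailing partial line waits for the next chunk.
  feed(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop();
    for (const line of lines) {
      if (line.trim() === '') continue;
      const frame = JSON.parse(line);
      for (const send of this.clients) send(frame);
    }
  }
}

// Accept the sketch's TCP connection on a port chosen by the caller.
// Defined here but not called, so the example exits immediately.
function listenForSketch(relay, port) {
  const server = net.createServer((socket) => {
    socket.setEncoding('utf8');
    socket.on('data', (chunk) => relay.feed(chunk));
  });
  server.listen(port);
  return server;
}

// Example: a frame split across two chunks still arrives whole.
const relay = new Relay();
const seen = [];
const unsubscribe = relay.addClient((frame) => seen.push(frame));
relay.feed('[1,2,3]\n[4,');
relay.feed('5,6]\n');
unsubscribe();
relay.feed('[7,8,9]\n'); // no clients left, so nothing is delivered
console.log(seen.length); // prints 2
```

On the browser-facing side, each Socket.io connection would register itself with something like relay.addClient((frame) => socket.emit('skeleton', frame)) and call the returned function on disconnect. The event name 'skeleton' is made up for this example.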

Three.js

At this point, we’ve got a real time stream of skeleton data arriving in the browser: 45 floats representing the x-, y-, and z-components of 15 joint vectors arriving 30 times a second. In order to display this data I needed a 3D graphics library for JavaScript. After the Art && Coders’ success with Three.js, I decided to give it a shot myself. I started from a basic Three.js example and was easily able to modify it to create one sphere for each of the 15 joints. I then used the streaming data arriving from Socket.io to update the position of each sphere as appropriate in the Three.js render function. Pointing the camera at the torso joint brought the skeleton into view and I was off to the races. Three.js is extremely rich and I’ve barely scratched the surface here, but it was relatively straightforward to build this simple application.
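The per-frame update amounts to copying three floats onto each sphere. The sketch below assumes the 45 floats arrive as one flat array in joint order (x, y, z for the first joint, then the next, and so on), which is a guess about the layout rather than something taken from the Skelestreamer source. It relies only on each sphere exposing position.set(x, y, z), as Three.js meshes do, and uses a stub object in place of a real mesh so it runs outside the browser.

```javascript
const FLOATS_PER_JOINT = 3;

// Copy a flat [x0, y0, z0, x1, y1, z1, ...] frame onto a list of objects
// that expose position.set(x, y, z).
function applyFrame(floats, spheres) {
  const expected = spheres.length * FLOATS_PER_JOINT;
  if (floats.length !== expected) {
    throw new Error(`expected ${expected} floats, got ${floats.length}`);
  }
  spheres.forEach((sphere, i) => {
    const base = i * FLOATS_PER_JOINT;
    sphere.position.set(floats[base], floats[base + 1], floats[base + 2]);
  });
}

// Stand-in for a Three.js mesh: just enough of `position` to be updated.
function makeStubSphere() {
  const position = {
    x: 0,
    y: 0,
    z: 0,
    set(x, y, z) {
      this.x = x;
      this.y = y;
      this.z = z;
    },
  };
  return { position };
}

// Example: 15 joints, one fake frame whose values are 0, 1, 2, ..., 44.
const spheres = Array.from({ length: 15 }, makeStubSphere);
const fakeFrame = Array.from({ length: 45 }, (_, i) => i);
applyFrame(fakeFrame, spheres);
console.log(spheres[14].position.x, spheres[14].position.y, spheres[14].position.z);
// prints: 42 43 44
```

In the browser, the same function would be called with the real meshes from the Socket.io message handler or the render function. The length check is there to catch a frame that arrives with the wrong number of values before it scrambles the skeleton.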

Conclusion

In general I’m skeptical of the browser as a platform for rich graphical applications. I think a lot of the time building these kinds of apps in the browser has mainly novelty appeal, adding levels of abstraction that hurt performance and coding clarity without contributing much to the user experience. However, since I explicitly want to explore the possibilities of collaborative social graphics production and animation, the browser seems a natural platform. That said, I’m also excited to experiment with Unity3D as a potential rich client environment for this idea. There’s ample reason to have a diversity of clients for an application like this where different users will have different levels of engagement, skills, comfort, resources, and roles. The streaming architecture demonstrated here will act as a vital glue binding these diverse clients together.

One next step I’m exploring that should be straightforward is the process of sending the stream of joint positions to CouchDB as they pass through Node on the way to the browser. This will automatically make the app into a recorder as well as a streaming server. My good friend Chris Anderson was instrumental in helping me get up and running with Node and has been pointing me in the right direction for this Couch integration.
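Since this part isn’t built yet, the following is only a guess at what the recording step could look like. It buffers frames as they pass through and writes them in batches to CouchDB’s _bulk_docs endpoint, which accepts a JSON body of the form { "docs": [...] }. The document fields, the batch size, the database name, and the FrameRecorder class itself are all hypothetical.

```javascript
const http = require('http');

// Buffer frames and hand them off in batches shaped for _bulk_docs.
class FrameRecorder {
  constructor(postBatch, batchSize = 30) {
    this.postBatch = postBatch;
    this.batchSize = batchSize;
    this.pending = [];
  }

  // Store one frame; the field names here are illustrative only.
  record(joints, timestamp = Date.now()) {
    this.pending.push({ type: 'frame', recordedAt: timestamp, joints });
    if (this.pending.length >= this.batchSize) this.flush();
  }

  // Send whatever is buffered, if anything.
  flush() {
    if (this.pending.length === 0) return;
    const body = JSON.stringify({ docs: this.pending });
    this.pending = [];
    this.postBatch(body);
  }
}

// One way to deliver a batch with Node's built-in http module.
// Host, port, and database name are placeholders. Not called in this example.
function postToCouch(body, { host = 'localhost', port = 5984, db = 'skeleton_frames' } = {}) {
  const request = http.request({
    host,
    port,
    method: 'POST',
    path: `/${db}/_bulk_docs`,
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body),
    },
  });
  request.on('error', (err) => console.error('CouchDB write failed:', err.message));
  request.end(body);
}

// Example: a fake poster collects the request bodies instead of sending them.
const bodies = [];
const recorder = new FrameRecorder((body) => bodies.push(body), 2);
recorder.record([0, 1, 2], 1000);
recorder.record([3, 4, 5], 1033);
recorder.record([6, 7, 8], 1066);
recorder.flush();
console.log(bodies.length); // prints 2
```

Batching keeps the number of HTTP requests well below the frame rate. A real recorder would pass postToCouch as the first argument and would probably also want to flush on a timer so a short performance is not left sitting in the buffer.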

Interested in these ideas? You can help! I’d especially love to work with someone with advanced Three.js skills who can help me figure out things like model importing and rigging. Let’s put some flesh on those skeletons…
