You’ve seen it in just about every classic noir film: the disheveled suspect is manhandled into an old wooden chair before a small table in a nondescript, dimly lit room. A bright, harsh light is suddenly turned on and tilted, full glare, into the unshaven face. Then the interrogation begins. It’s a crude tactic, but one designed to extract the truth and do it fast. “Turn it off! Turn it off! I’ll talk!”
It turns out there’s a lot of truth to the scene: humans do indeed exhibit emotional and psychological responses to light across its spectrum. In fact, the unlikely field of lighting psychology shows the extent to which lighting plays a role in the human experience. But let’s turn this around. If we so readily respond to lighting, can we make the lighting respond to us? That was the question asked by Modar “JR” Alaoui, CEO of a fascinating deep learning company called Eyeris.
Alaoui set up an experiment. He hooked up a Philips Hue bulb to the output of his Emovu analytics platform, a system comprising deep learning-based emotion recognition software that reads facial micro-expressions. Look into the system’s camera with an angry frown, and the bulb turns red. Give it an expression of joy, and the light turns green. Show it surprise, and the white light grows brighter. Close your eyes, and the bulb dims. Pretty cool.
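To make the idea concrete, here’s a minimal sketch of that demo loop in Python. The emotion label is hard-coded where the Emovu engine would normally supply it, and the bulb is driven through the open-source phue library; the bridge IP, light ID, and color mapping are illustrative assumptions, not details from the demo itself.

```python
# A minimal sketch of the emotion-to-light demo, assuming the `phue` library
# (pip install phue) and a placeholder in place of the recognition engine.
from phue import Bridge

# Hue values run 0-65535 around the color wheel; saturation and brightness run 0-254.
EMOTION_TO_LIGHT = {
    "anger":    {"on": True, "hue": 0,     "sat": 254, "bri": 254},  # red
    "joy":      {"on": True, "hue": 25500, "sat": 254, "bri": 200},  # green
    "surprise": {"on": True, "sat": 0, "bri": 254},                  # bright white
    "neutral":  {"on": True, "sat": 0, "bri": 120},                  # soft white
}

def update_bulb(bridge, light_id, emotion, eyes_closed):
    """Push a light state that mirrors the detected facial expression."""
    if eyes_closed:
        bridge.set_light(light_id, {"on": True, "bri": 10})  # dim when eyes close
        return
    state = EMOTION_TO_LIGHT.get(emotion, EMOTION_TO_LIGHT["neutral"])
    bridge.set_light(light_id, state)

if __name__ == "__main__":
    bridge = Bridge("192.168.1.10")  # assumed bridge IP for the sketch
    bridge.connect()                 # press the bridge's link button on first run
    # In the real demo, a camera plus the recognition model would supply these values.
    update_bulb(bridge, light_id=1, emotion="joy", eyes_closed=False)
```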
Given this simple demonstration, it wasn’t much of a leap to ask: if lighting can respond to our state of being, what else might?
How about your car?
It turns out this was an idea that resonated with a growing number of automakers, with whom the company is now engaged. And with every car company now working on ADAS and autonomous driving technologies, the prospect of augmenting those systems with the capacity to understand the driver’s state of mind opens up real possibilities: enhancing safety, reinventing the driver/occupant experience, or providing any number of customized services, each uniquely tailored to every person in the vehicle. It’s all made possible by the elevated contextual awareness that ambient intelligence enables.
Using AI to monitor driver attention, cognitive awareness, and emotional distraction
“It all has to do with driver state monitoring,” Alaoui explains. “Mood, attention, what the car guys call cognitive load. It’s all relevant. How frequently does a driver become angry within a timeframe? How many episodes of road rage occur? What is the driver expressing? All of that matters to safety.”
The good news, in the context of developing the deep learning algorithms, is that everybody emotes in the same ways. “Seven emotions have been universalized for three decades,” Alaoui says. “And these states are manifested on the face dynamically even before we know it. Emotions are hard-wired into our brains at birth. In fact, people who are born blind emote the same way as everybody else. So we leverage this as ground truth to argue that you can derive a lot of information about a person’s behavior just from their facial micro-expressions.”
Dubbed the Emovu Driver Monitoring System, the system tracks the seven emotional states that the company has translated into slightly more actionable descriptions: positive, neutral, negative, attention, distraction, micro-sleep, and yawning. Using a combination of cameras, graphics processing—and deep learning—the system analyzes each of the passengers in a car, determining from their facial expressions which of the seven states applies at any moment. And that includes detecting whether the driver is nodding off behind the wheel. “The micro-sleep metric looks at things like the driver’s blinking frequency, and there is a great deal of information in that. If the driver is yawning, that indicates fatigue. You want the vehicle to be aware of the driver’s state and react accordingly.”
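As an illustration of the kind of eye-closure signal Alaoui describes, here is a simplified, hypothetical micro-sleep heuristic: it watches per-frame eye state over a sliding window and flags trouble when a single closure runs too long or closed-eye frames pile up, so the vehicle can be made aware of the driver’s state and react. The window length and thresholds are assumptions, not Eyeris’s actual metric.

```python
import time
from collections import deque
from typing import Optional

class MicroSleepDetector:
    """Flags possible micro-sleep from per-frame eye-open/closed samples."""

    def __init__(self, window_s=60.0, closed_fraction_limit=0.15, max_closure_s=0.5):
        self.window_s = window_s                        # sliding window length, seconds
        self.closed_fraction_limit = closed_fraction_limit
        self.max_closure_s = max_closure_s              # longer than a normal blink
        self.samples = deque()                          # (timestamp, eyes_closed)
        self.closure_started: Optional[float] = None

    def update(self, eyes_closed: bool, now: Optional[float] = None) -> bool:
        """Feed one per-frame eye-state sample; return True if micro-sleep is suspected."""
        now = time.monotonic() if now is None else now
        self.samples.append((now, eyes_closed))
        while self.samples and now - self.samples[0][0] > self.window_s:
            self.samples.popleft()

        # A single closure lasting well beyond a normal ~0.1-0.4 s blink.
        if eyes_closed:
            if self.closure_started is None:
                self.closure_started = now
            if now - self.closure_started > self.max_closure_s:
                return True
        else:
            self.closure_started = None

        # Too many closed-eye frames across the window (a PERCLOS-style measure).
        closed = sum(1 for _, c in self.samples if c)
        return closed / len(self.samples) > self.closed_fraction_limit
```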
The nature of those reactions, Alaoui is quick to point out, is up to the automakers who integrate the system into their autonomous vehicles. “Our system simply outputs a data stream. The automakers can use that data to determine whether to change the car’s mode from, say, Level 2 autonomy to Level 4, or to provide various driver alerts, as the Toyota Concept-i does via its Yui interface. In any case, vehicles that can operate in an autonomous mode need a driver monitoring system like this to determine if the driver is paying attention. It’s particularly critical when switching modes from autonomous to manual and vice versa.”
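Here is a sketch of how an automaker’s integration layer might consume such a data stream. The field names and the decision policy are purely illustrative assumptions; the article only says Emovu outputs driver-state data that integrators map to alerts or autonomy-mode changes.

```python
from dataclasses import dataclass

@dataclass
class DriverState:
    attention: float      # 0.0 (absent) .. 1.0 (fully attentive)
    emotion: str          # e.g. "positive", "neutral", "negative"
    micro_sleep: bool
    yawning: bool

def vehicle_policy(state: DriverState, autonomy_level: int) -> str:
    """One frame's worth of decision logic an integrator might layer on the stream."""
    if state.micro_sleep:
        # A vehicle capable of higher autonomy might take over; otherwise, alert loudly.
        return "raise_autonomy_level" if autonomy_level >= 2 else "audible_alert"
    if state.attention < 0.4:
        return "visual_alert"
    if state.yawning:
        return "suggest_break"
    return "no_action"

# Example: a drowsy driver in a Level 2-capable vehicle.
print(vehicle_policy(DriverState(0.2, "neutral", True, True), autonomy_level=2))
```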
Deploying the system
Eyeris maintains a lab in San Jose where they receive vehicles from the automakers and optimize the software for the particular vehicle environments. “We leave it up to their UX teams to determine what cameras they want to use, how many, and where they want them positioned. Likewise for the hardware they want to process this on. Our system is hardware agnostic; it can run on any platform and use any camera.”
Once installed, the Eyeris team builds a dedicated data set for the particular automotive environment. It’s a big project, involving hundreds of people (system-training subjects) over a three- to four-month period. “We build the data set with people of five different races, two genders, and four age groups,” Alaoui adds. “And with all these people, we test and train the system under different lighting conditions, different head poses, different emotions, and with different props—mustache, sunglasses, hats—as well as with a variety of facial occlusions, for example, with the driver holding a cellphone to his ear. We simulate all the conditions of a typical driving experience.”
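One way to picture that data-collection effort is as a cross-product of capture conditions, so that every combination of demographics, lighting, head pose, and occlusion gets recorded. The sketch below enumerates such a coverage matrix; the specific axis values are placeholders loosely drawn from the conditions Alaoui lists, not Eyeris’s actual protocol.

```python
from itertools import product

# Condition axes to cross; the values here are placeholder labels for illustration.
AXES = {
    "age_group": ["18-30", "31-45", "46-60", "60+"],
    "gender":    ["female", "male"],
    "lighting":  ["daylight", "dusk", "night_ir"],
    "head_pose": ["frontal", "left_45", "right_45"],
    "prop":      ["none", "sunglasses", "hat", "phone_to_ear"],
}

def coverage_matrix():
    """Yield one capture-session spec per combination of conditions."""
    keys = list(AXES)
    for combo in product(*(AXES[k] for k in keys)):
        yield dict(zip(keys, combo))

# 4 * 2 * 3 * 3 * 4 = 288 condition cells to capture per subject.
print(sum(1 for _ in coverage_matrix()))
```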
A Taste of the Secret Sauce
Deep learning methods have settled into fairly standard neural network models and architectures. As such, differentiation and value are increasingly shifting to the training side of the equation, along with the many ways the inference engine that runs the algorithm can be optimized. And Eyeris has crafted its algorithms accordingly.
“The deep learning library and algorithm make use of the data sets, which are rigorously annotated,” Alaoui explains. “But it’s not just the data; it’s the actual algorithm that can detect under non-ideal conditions by leveraging data collected in those non-controlled environments. The other significant thing we’ve done is add a temporal dimension to the data, as this has proven to be much more useful to the success of a deep learning algorithm versus just static data. Data that is tied to time, say, a three-second period, is better than one-second or single-frame data. It takes into consideration the previous frame and the following frame, and that helps with predictability.”
And that, of course, improves accuracy. “We’re after that extra 5 or 6 percent accuracy that the traditional machine learning can’t do. That additional accuracy is extremely important, especially under non-frontal head poses, under low-light conditions, or other non-uniform conditions. It’s because of the number of analytics that we use, the methodology we employ, the deep learning, and the way we train the network that we’re able to detect driver behavior clues better than others.”
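The temporal idea can be pictured as feeding the model short clips rather than single frames. Below is a sketch that slices a face-crop video into the three-second windows Alaoui mentions, at the 10 FPS production rate quoted later in the article; the tensor shapes and stride are assumptions, and the model itself is left as a placeholder.

```python
import numpy as np

FPS = 10                       # production frame rate cited later in the article
WINDOW_S = 3                   # the three-second period Alaoui describes
WINDOW_FRAMES = FPS * WINDOW_S  # 30 frames per clip

def sliding_windows(frames: np.ndarray, stride: int = FPS) -> np.ndarray:
    """frames: (T, H, W, C) video -> (N, WINDOW_FRAMES, H, W, C) overlapping clips."""
    clips = [frames[i:i + WINDOW_FRAMES]
             for i in range(0, len(frames) - WINDOW_FRAMES + 1, stride)]
    if not clips:
        return np.empty((0, WINDOW_FRAMES) + frames.shape[1:], dtype=frames.dtype)
    return np.stack(clips)

# Example: one minute of 10 FPS grayscale face crops.
video = np.zeros((600, 64, 64, 1), dtype=np.float32)
clips = sliding_windows(video)   # shape (58, 30, 64, 64, 1): one clip per second
# Each clip lets the model see the frames before and after any given moment,
# rather than a single static image.
```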
Embedded versus the Cloud
Where AI is concerned, particularly in the context of the Internet of Things (and the car is a thing), the cloud figures prominently. Intel, for example, hopes to have Xeon and Atom processors and a modem in every vehicle someday. Data generated by the car’s myriad sensors—LiDAR, medium- and short-range radar, ultrasonic sensors, cameras, SoundView—would be sent up to Intel processors running in datacenters, which would correlate that information with data from other cars on the road and perform big data analytics, ultimately returning the results to the driver as, say, a warning about a big road construction project just ahead. And certainly, this will be an important part of the mix, along with other cloud-based apps such as maps. But this scenario stands in stark contrast to Alaoui’s approach, which is wholly local.
“Our view is that the class of analytics we’re performing is time-critical. You don’t have time to send something to the cloud to be processed if the car needs to take over control from manual to autonomous or vice versa. So local processing is key.” And this, of course, suggests an embedded solution, which in turn demands optimizing the solution for a very narrow set of signals.
“We’ve carefully selected the set of signals to deploy so that you can run on an embedded system with no problems at 20 to 30 frames/second. For production purposes, 10 FPS is more than enough. That’s what these cars will be running; they don’t want to overload their processing, and that’s sufficient for detecting the signals of interest. We actually have it running on a $30 Raspberry Pi at 15 FPS. We run on an NVIDIA Drive CX at 30 FPS. And we have it running on the Intel Joule, a new development board focused on robotics and IoT applications. Deploying deep learning models that are ultra-lightweight is the key; we deploy models whose size is very small. We don’t need to deploy the entire neural network.”
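To show what a fixed frame budget looks like in practice, here is a sketch of an embedded-style capture-and-infer loop that drops frames to hold a 10 FPS target, in line with the figures above. The OpenCV capture calls are standard; infer() is a placeholder for whatever hardware-specific engine a given board runs.

```python
import time
import cv2  # pip install opencv-python

TARGET_FPS = 10
FRAME_BUDGET = 1.0 / TARGET_FPS   # ~100 ms per processed frame

def infer(frame):
    """Placeholder for the embedded driver-state model."""
    return {"attention": 1.0, "micro_sleep": False}

def run(camera_index: int = 0) -> None:
    cap = cv2.VideoCapture(camera_index)
    next_deadline = time.monotonic()
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            now = time.monotonic()
            if now < next_deadline:
                continue                      # ahead of budget: drop this frame
            next_deadline = now + FRAME_BUDGET
            state = infer(frame)              # must finish within the frame budget
            # ...hand `state` to the vehicle's alerting or mode-switching logic...
    finally:
        cap.release()
```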
The embedded future only looks brighter as Alaoui looks forward to the eventual availability of vision-specialized chips on which to run his software. “Google’s TPU, Intel’s Nervana, as well as other AIOCs—AI on a chip—from Qualcomm and NVIDIA will all be very interesting. We’ll port to them all. We believe these developments validate our thesis that the future of AI for ambient intelligence lies in embedded.”
For more information, visit www.emovu.com