How do technologies recognize our emotions, and why is this so promising?
How technologies have learned to recognize emotions
A surge of public and scientific interest in emotion detection and recognition, along with a boom in data-driven technology solutions, occurred in the second half of the 2000s and the early 2010s. At the turn of 2015-2016, many Russian and foreign startups began using artificial neural networks to build entertainment applications. For example, you upload a photo, the application analyzes the facial expression, and the result shows, in percentages, the emotions the person in the photo is experiencing.
Such applications share one major problem: they usually recognize emotions from facial expressions alone and ignore other emotional cues such as posture, gestures, eye movements, heart rate and voice. Recognizing true emotions requires more channels and, therefore, more sophisticated technologies.
The emotion detection market is now booming: according to Western analysts, it will reach, by various estimates, between $19 billion and $37 billion by 2021. Let's take a closer look at what emotions are, what the market volume and its projected growth are, and which companies have already occupied the niche of emotion detection and recognition systems (EDRS).
Emotion detection and recognition market
It is hard to deny that research in artificial intelligence, both fundamental and strictly applied, whether conducted under the aegis and in the interests of the state or financed by corporations, foundations and private business, has finally gained independent market status, scale and long-term investment.
We are talking about a complex, knowledge-intensive industry that spans many fields and has consistently demonstrated impressive growth rates. Emotion detection, together with the recognition of physiological states and behavioral patterns, is one of its most important components. “Emotional (or affective) technologies have successfully passed their formative phase; they are no longer out of reach and have become embedded in the current market context.”
Their interdisciplinary basis is undeniable: the natural and cognitive sciences (biology, psychophysiology, neurolinguistics) interact with data science, computer vision, deep learning, and technologies for processing visual, acoustic and speech information.
It is therefore interesting to look at the market for emotion detection and recognition systems (EDRS), keeping in mind that it is young (a newly emerging market), still malleable, and has technological horizons that practical applications have yet to reach.
Estimating market volume
Emotion Detection & Recognition Systems (EDRS) and affective computing form their own ecosystem in the field of artificial intelligence (AI). Estimates of the market size and its prospects for the period until 2022 vary widely because they are based on dissimilar metrics and calculation formulas.
However, several consolidated reports issued in late 2016 and early 2017 illustrate the general trend. Markets & Markets believes that the emotion detection market will grow from $6.72 billion in 2016 to $36.07 billion by 2021. The forecasts from Reportlinker and Orbis Research are more conservative: $29.17 billion (27.4% annual growth) and $19.96 billion (21.2% annual growth) by 2022, respectively.
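To make such forecasts comparable, it helps to convert a start value and an end value into an implied compound annual growth rate. The sketch below does this for the Markets & Markets figures quoted above; the function name is illustrative, and the assumption that the forecast spans exactly five years (2016 to 2021) is ours, not the report's.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures quoted in the article, in billions of US dollars.
start_2016 = 6.72
forecast_2021 = 36.07

# Assumption: the forecast covers five full years of growth.
implied_rate = compound_annual_growth_rate(start_2016, forecast_2021, years=5)
print(f"Implied annual growth: {implied_rate:.1%}")
```

Under these assumptions the implied rate is a little under 40% a year, noticeably higher than the rates attached to the more conservative forecasts.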
“Of course, the numbers soar if you try to cover the entire affective computing industry in a broad sense: $53.98 billion by 2021, growing 34.7% annually, as Markets & Markets suggests. But one should not be deceived, as this figure takes into account the enormous contribution of the market leaders, key corporate players such as IBM, Google, Apple, Facebook and Microsoft.”
The industry has three main geographic areas: the Asia-Pacific region, North America (US and Canada) and the European Union. The most attractive growth rates are still shown by two channels of emotion analysis: recognition of facial microexpressions and smart biosensors built into wearable electronic devices. They are followed by voice and speech analysis and by video-oculography (eye tracking).
Whatever the discrepancies between the figures, the overall trend can be trusted with a high degree of confidence: hardware and software solutions now make it possible to read and cluster data with much greater accuracy than ten years ago. With an ordinary webcam and specialized software, one can determine the user's state at any moment, analyze not only emotions but also physiological and behavioral aspects, and record the slightest changes in the mood and well-being of the person in front of the lens.
The most striking example
Perhaps the best example in market terms is Affectiva, an MIT Media Lab startup with a mission to bring emotional intelligence to our devices and digital experiences. Its co-founders, Rana el Kaliouby and Rosalind Picard, are pioneers of affective computing. The company was founded in 2009 and has attracted more than $25 million in total investment.
The company has the world's largest database of analyzed faces, more than five million of them, as well as the priceless experience of a pioneer in a number of industries where emotion recognition technology had not previously been considered. However, the analysis itself is still limited to seven basic emotions and a single channel (facial microexpressions).
“In addition, the company offers SDKs and APIs that allow developers to integrate an emotional analysis layer into new applications or devices where needed.”
Affectiva pays great attention to establishing partnerships with business and science. The company launches joint projects in various industries (with Uber, Tesla, Qualtrics and others) and devotes great effort to introducing the concept of Emotion A.I. (emotional artificial intelligence) into the business environment and the minds of consumers.
For example, in cooperation with Jury Lab LLC, the Affectiva solution was trialed in court hearings involving jurors. Tracking jurors' emotions lets the prosecution and the defense sharpen their arguments and build an adequate model of interaction with the participants, relying not only on the letter of the law and rational considerations but also on emotional appeal, which, of course, is not always amenable to rational control.
The weak point of Affectiva's strategy is its desire to grow in breadth rather than in depth: it has not reconsidered the gradually obsolescent idea of a small set of basic emotions and focuses on a single module, facial microexpressions, which is insufficient for a full interpretation of an individual's emotional state.
Who else is on the market?
If we had to describe the current configuration of the emotional technology market in one phrase, it would be "simple multiplicity." The paradox is that the market of this very young industry is filled with various solutions and products, yet many of the companies operating in it are typologically similar (differing mainly in marketing policy), have not advanced technologically in recent years, and can easily be sorted into a few niches.
“There are some "old" players on the market, as well as numerous small startups with niche or national specifics. Let’s mention some of them as illustrative examples.”
Case 1
Microsoft's Project Oxford is a catalog of ready-made artificial intelligence APIs focused on computer vision algorithms. The user can upload a photo; the service detects the face and distributes the detected emotion among eight categories (seven basic emotions plus a neutral state), giving each a share to five decimal places.
The project went viral last year, both because of users' excitement and because of occasional absurdities in how it interpreted expressions. For example, the system suggested that Keanu Reeves's sadness in one photo was 0.01831.
You can try your own sad or happy face on Microsoft's Project Oxford page and see whether the machine's evaluation is correct and how far it differs from everyday perception.
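A common way for a classifier to produce shares that add up to one across a fixed set of categories is a softmax over its raw scores. The sketch below illustrates that idea only; it is not Microsoft's implementation, the raw scores are invented for the example, and the category names are the eight that this article lists for tools of this kind.

```python
import math

# The eight categories named in this article (seven basic emotions plus neutral).
EMOTIONS = ["joy", "sadness", "anger", "surprise",
            "fear", "disgust", "contempt", "neutral"]

def softmax(raw_scores):
    """Turn arbitrary real-valued scores into positive shares that sum to 1."""
    peak = max(raw_scores)  # subtracting the maximum keeps exp() numerically stable
    exponentials = [math.exp(score - peak) for score in raw_scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]

# Hypothetical raw classifier outputs for one photo (invented for illustration).
raw_scores = [0.2, 1.5, -0.8, -1.0, -1.2, -0.5, -0.9, 3.0]

# Report each share to five decimal places, as the service described above does.
shares = {name: round(p, 5) for name, p in zip(EMOTIONS, softmax(raw_scores))}
for name, share in shares.items():
    print(f"{name:>9}: {share:.5f}")
```

Because every category always receives some non-zero share, a mostly neutral face will still show a small "sadness" value, which is one reason such outputs can look odd when read too literally.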
A more serious example of emotion detection from facial expressions is the FaceReader service from the Dutch company Noldus Information Technology. The program is able to interpret a person's microexpressions, assigning them to the same eight categories: joy, sadness, anger, surprise, fear, disgust, contempt and neutral.
“FaceReader can accurately "catch" the direction of gaze, register the orientation of the head, and determine age, gender and other personal characteristics. The program is based on computer vision technology.”
In particular, it uses the Active Template method, which superimposes a deformable template on the image of the face, and the Active Appearance Model method, which builds an artificial face model from reference points while taking surface details into account.
According to the developers, classification is performed by neural networks trained on a set of 10,000 frames.
The Emovu service from the California company Eyeris operates in a similar vein. The solution combines computer vision and deep learning algorithms to analyze a large set of cues (microexpressions, eye tracking, blinking, head tilts, etc.), which it also uses to read a person's emotional response to, engagement with and interest in the content shown to them in a video stream.
The company closely follows trends in intelligent transportation systems and cooperates with automakers (such as Toyota and Honda) to introduce emotion recognition into fully autonomous vehicles, where the emphasis in software and services shifts to the passengers, their feelings and their needs.
Clinical studies often turn to physiology as a source of information about a person's emotions. For example, this way of detecting emotions is built into the biofeedback method, in which the patient is shown on a screen the current values of the physiological parameters specified by the clinical protocol: cardiogram, heart rate, electrical activity of the skin, and others.
As for experiments that examine subcutaneous blood flow in the face (detected as changes in pixel color), it is worth mentioning NuraLogix, a startup from the University of Toronto. Its highly sensitive transdermal optical technology is tuned to isolate hidden emotions, but it is still highly dependent on light sources and other environmental conditions.
Pilot tests in several Toronto department stores are encouraging, and grants from Canadian innovation support centers (over $100,000) allow the company to prepare releases of updated versions of the program.
Such techniques have found application in other spheres as well. For example, determining emotions from physiological data is the main function of the MindWave Mobile device from NeuroSky. The device is worn on the head and activates a built-in brain activity sensor. It establishes a person's degree of concentration, relaxation or anxiety, rating it on a scale of 1 to 100.
MindWave Mobile adapts the electroencephalography recording method used in scientific studies. The only difference is that the system has a single electrode, whereas laboratory installations may have far more than ten.
The pursuit of a healthy lifestyle in cities has generated demand for fitness bracelets, and the emotional component fits into them organically.
For example, Sentio Solutions has designed a stylish bracelet called Feel that tracks, recognizes and collects information about a person's emotions throughout the day. Its mobile application, in turn, offers recommendations intended to help the user form positive habits.
Built-in sensors monitor multiple physiological parameters, such as pulse, electrodermal activity (EDA) and skin temperature, and the system's algorithms translate these biological signals into the "language" of emotions.
One of the undisputed leaders in emotional voice technologies is the Israeli company Beyond Verbal Communications. Its product draws on 21 years of academic research, which indicates that at least 35 to 40% of the emotional information transmitted in human communication is carried by vocal intonation. The software analyzes raw ten-second recordings (samples) and extracts from them data about a person's emotions, mood, speech habits, stress and even state of health.
The company is proud of its database of approximately 2.3 million unique voice samples in forty languages. The project's current long-term strategy lies at the intersection of digital medicine and embedded virtual assistants: detecting emotion from the voice removes many of the blind spots inherent in classic virtual personal assistants (like Amazon Echo) and accelerates the ubiquitous adoption of smart assistants.
Why track eye movements?
Of course, we cannot skip the topic of tracking emotions through eye movements, whose main parameters are fixations and saccades. The most common method of recording them is called video-oculography, or eye tracking.
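One simple and widely used way to separate fixations from saccades in recorded gaze data is a velocity threshold: slow movement between consecutive samples is treated as a fixation, fast movement as a saccade. The sketch below illustrates that idea and is not the method of any product named in this article. The 30 degrees-per-second threshold, the function name and the sample coordinates are assumptions chosen for the example; the 120 Hz default matches the sampling rate mentioned later in the text.

```python
from math import hypot

def classify_gaze(samples, sampling_rate_hz=120.0, velocity_threshold=30.0):
    """Label each interval between consecutive gaze samples.

    samples: list of (x, y) gaze positions in degrees of visual angle.
    velocity_threshold: degrees per second; an assumed, commonly quoted value.
    """
    seconds_per_sample = 1.0 / sampling_rate_hz
    labels = []
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        # Angular distance covered between two samples, divided by elapsed time.
        velocity = hypot(x1 - x0, y1 - y0) / seconds_per_sample
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels

# Invented gaze trace: the eye rests, jumps to a new point, then rests again.
gaze = [(10.0, 5.0), (10.02, 5.01), (10.03, 5.0),
        (12.5, 6.0), (15.0, 7.0), (15.01, 7.0)]
labels = classify_gaze(gaze)
print(labels)
```

In practice, consecutive intervals with the same label would then be merged into fixation and saccade events with a start time and a duration.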
“Video-oculography is used in science, in the gaming industry and in online marketing (neuromarketing). In online purchases, the placement of product information that drives conversion is decisive. The positions of banners and other advertising products matter a lot.”
For example, Google uses tracking software in designing the display areas of its search results page in order to generate the most effective offers for advertisers. An analytical approach to video-oculography helps web designers ensure that information above the fold is better perceived by users.
We created a prototype software tracker, Eye Catcher 0.1, which extracts eye and head movements from video files recorded with an ordinary camera. This technology opens new horizons for studying human eye movements in natural conditions and significantly expands research capabilities.
In general, the eye tracking industry has seen a curious phenomenon: a sharp decline in the number of independent players. The largest manufacturers of oculographic systems, such as the Swedish Tobii and the Canadian SR Research (maker of EyeLink), draw additional resources for usability tests and other needs from external sources and strengthen their near-monopoly positions.
Along with this, corporations are buying up mid-size companies and startups:
1) Google acquired Eyefluence,
2) Facebook acquired EyeTribe,
3) Apple acquired the German company SMI with its proprietary technology for capturing and recording gaze in real time at a sampling rate of up to 120 Hz.
The industry faces many forks in the road, along with research questions and technological bottlenecks that still need to be resolved. Yet its long-term prospects (at least until the mid-to-late 2020s) are obvious.
“R&D teams, competence centers and laboratories working in the field of "emotional computation" and EDRS will certainly find a way to go beyond the narrow limits of mono- or bi-channel logic and thereby come close to truly multimodal technologies for detecting and interpreting complex emotional states.”
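One of the simplest ways to move beyond a single channel is late fusion: each channel produces its own emotion estimate, and the estimates are combined with weights that reflect how much each channel is trusted. The sketch below is a minimal illustration of that idea, not a description of any vendor's system; the channel names, weights and scores are all invented for the example.

```python
EMOTIONS = ["joy", "sadness", "anger", "surprise",
            "fear", "disgust", "contempt", "neutral"]

def fuse_channels(channel_scores, channel_weights):
    """Weighted average of per-channel emotion distributions (late fusion).

    channel_scores: {channel name: {emotion: share}}, each summing to 1.
    channel_weights: {channel name: trust weight}; normalized inside.
    """
    total_weight = sum(channel_weights[name] for name in channel_scores)
    fused = {}
    for emotion in EMOTIONS:
        weighted_sum = sum(
            channel_weights[name] * scores.get(emotion, 0.0)
            for name, scores in channel_scores.items()
        )
        fused[emotion] = weighted_sum / total_weight
    return fused

# Hypothetical outputs of two independent recognizers for the same moment.
scores = {
    "face": {"joy": 0.6, "neutral": 0.3, "surprise": 0.1},
    "voice": {"sadness": 0.5, "neutral": 0.4, "joy": 0.1},
}
weights = {"face": 0.6, "voice": 0.4}  # assumed trust in each channel

fused = fuse_channels(scores, weights)
dominant = max(fused, key=fused.get)
print(dominant, round(fused[dominant], 2))
```

Even in this toy case the fused result is less confident than the face channel alone, because the voice channel disagrees; handling such disagreement well is where real multimodal systems become complex.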
The best minds of mankind are trying to understand how psychological phenomena are structured and what neurobiological mechanisms underlie them, so that they can be reproduced in computing systems, drawing inspiration from the gradually accumulating knowledge of how the neocortex works.
Acceleration along this path is inevitable, a view that Kurzweil shares. We need to be patient. In any case, the engineering of emotional intelligence (EI) brings the era of artificial intelligence closer: a truly humanized machine will become intelligent when logic is balanced by feelings, emotions and sensations.
In conclusion, to paraphrase Descartes, the reality of the 21st century often boils down to the concise formula Sentio, ergo sum (I feel, therefore I exist). Hardly anything will change dramatically here.