Emotional Robots Aim to Read Human Feelings
1. Introduction
Emotions are fundamental aspects of human beings and affect decisions and behavior. They play an important role in communication, and emotional intelligence, i.e., the ability to understand, use, and manage emotions (Salovey and Mayer, 1990), is crucial for successful interactions. Affective computing aims to endow machines with emotional intelligence (Picard, 1999) to improve natural human-machine interaction (HMI). In the context of human-robot interaction (HRI), it is hoped that robots can be endowed with human-like capabilities of observation, interpretation, and emotion expression. Emotions have been considered from three main points of view:
• Formalization of the robot's own emotional state: the inclusion of emotional traits into agents and robots can improve their effectiveness and adaptiveness and enhance their believability (Hudlicka, 2011). Therefore, the design of robots in recent years has focused on modeling emotions by defining neurocomputational models, formalizing them in existing cognitive architectures, adapting known cognitive models, or defining specialized affective architectures (Cañamero, 2005, 2019; Krasne et al., 2011; Navarro-Guerrero et al., 2012; Reisenzein et al., 2013; Tanevska et al., 2017; Sánchez-López and Cerezo, 2019);
• Emotional expression of robots: in complex interaction scenarios, such as assistive, educational, and social robotics (Fong et al., 2003; Rossi et al., 2020), the ability of robots to exhibit recognizable emotional expressions strongly impacts the resulting social interaction (Mavridis, 2015). Several studies explored which modalities (e.g., facial expression, body posture, movement, vocalization) can convey emotional information from robots to humans and how people perceive and recognize emotional states (Tsiourti et al., 2017; Marmpena et al., 2018; Rossi and Ruocco, 2019);
• Ability of robots to infer the human emotional state: robots able to infer and interpret human emotions would be more effective in interacting with people. Recent works aim to design algorithms for classifying emotional states from different input modalities, such as facial expression, body language, voice, and physiological signals (McColl et al., 2016; Cavallo et al., 2018).
In the following, we focus on the third aspect, reporting recent advances in emotion recognition (ER), in particular in the HRI context. ER is a challenging task, especially when performed in actual HRI, where the scenario can differ greatly from the controlled environments in which most recognition experiments are usually performed. Moreover, the presence itself of the robot represents a bias, since the robot's presence, embodiment, and behavior can affect empathy (Kwak et al., 2013), elicit emotions (Guo et al., 2019; Saunderson and Nejat, 2019; Shao et al., 2020), and impact the experience (Cameron et al., 2015). For these reasons, we limit our study to articles that perform ER in actual HRI with physical robots. Our aim is to summarize the state of the art and existing resources for the design of emotion-aware robots, discuss the characteristics that are desirable in HRI, and offer a perspective on future developments.
The literature search was carried out by querying the Google Scholar1, Scopus2, and WebOfScience3 databases with basic keywords from the HRI and ER domains, limiting the search to recent years (≥2015). The submitted queries were as follows:
• Google Scholar: allintitle: "human–robot interaction"|"human robot interaction"|hri emotion|affective, resulting in 80 documents.
• Scopus: TITLE (("human–robot interaction" OR "human robot interaction" OR hri) AND (emotion* OR affective)) OR KEY (("human–robot interaction" OR "human robot interaction" OR hri) AND (emotion* OR affective)), resulting in 629 documents.
• WebOfScience: TI=(("human robot interaction" OR "human–robot interaction" OR hri) AND (emotion* OR affective)) OR AK=(("human robot interaction" OR "human–robot interaction" OR hri) AND (emotion* OR affective)), resulting in 201 documents.
By using a simple script based on the edit distance between article titles, we looked for repeated items among the search engine results. Note that 425 papers were found only on Scopus, 22 only on Google Scholar, and 11 only on WebOfScience, while 38 documents were returned by all the search engines. Among the remaining articles, 150 were on both Scopus and WebOfScience, 16 on Scopus and Google Scholar, and 1 on WebOfScience and Google Scholar. We merged the results into a single list of 664 items. Since we used loose selection queries, the resulting articles were highly heterogeneous and most of them were out of the scope of our review. Therefore, we then selected published articles that addressed ER in HRI, reported significant results with respect to the recent literature, and that (a) performed emotion recognition in an actual HRI (i.e., where at least a physical robot and a subject were included in the testing phase), reporting results; (b) focused on modalities that can be acquired during HRI by using either the robot's embedded sensors or external devices: facial expression, body pose and kinematics, voice, brain activity, and peripheral physiological responses; and (c) relied on either discrete or dimensional models of emotions (see section 2.1). This phase allowed us to select 14 articles. During the process, however, we also examined the references of the selected papers in order to find other works that fit our inclusion criteria. In this way, 3 articles were added to this review. Finally, we organized the resulting articles by considering modalities and emotional models.
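For illustration, a de-duplication pass of this kind can be sketched in a few lines of Python. The helper names and the similarity threshold below are assumptions for demonstration, not the exact script we used:

```python
# Minimal sketch of a title-based de-duplication pass (illustrative, not the
# exact script used for this review). Titles are normalized and compared with
# a similarity ratio; pairs above a threshold are treated as duplicates.
from difflib import SequenceMatcher

def normalize(title: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not count.
    return " ".join(title.lower().split())

def is_duplicate(title_a: str, title_b: str, threshold: float = 0.9) -> bool:
    # SequenceMatcher.ratio() returns 1.0 for identical strings.
    return SequenceMatcher(None, normalize(title_a), normalize(title_b)).ratio() >= threshold

def merge_results(result_lists):
    # result_lists: iterable of lists of titles, one list per search engine.
    merged = []
    for titles in result_lists:
        for title in titles:
            if not any(is_duplicate(title, kept) for kept in merged):
                merged.append(title)
    return merged
```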
2. State of the Art
2.1. Emotional Models
A fundamental concept in ER is the model used to represent emotions, since it affects the formalization of the problem and the definition and separation of classes. Several models have been proposed for describing emotions, as reported in Table 1. The main distinction is between categorical models, in which emotions consist of discrete entities associated with labels, and dimensional models, in which emotions are defined by continuous values of their describing features, usually represented on axes.
Table 1. Emotional models.
It is difficult to say which model is better for representing emotions; the debate is still open and is strictly related to the nature of emotions, on which there is a lack of consensus (Lench et al., 2011; Lindquist et al., 2013). Some scholars claim that emotions are discrete natural kinds associated with categorically distinct patterns of activation at the level of the autonomic nervous system (Kragel and LaBar, 2016; Saarimäki et al., 2018). Others point out intra-emotion differences and the overlap of different emotions with respect to observed behavior and autonomic activity (Siegel et al., 2018). Since the purpose of this work is not to support one or the other thesis, we step back from this discussion and focus on the usability of the models. From the point of view of the recognition process, the possibility to identify distinct emotions is, ideally, the simplest method. Unfortunately, as the number of emotions considered increases, it becomes more difficult to distinguish between classes. On the other hand, despite the usefulness of obtaining information about the features of emotions (e.g., valence and arousal), a small number of dimensions could lead to an over-simplification and to an "overlap" of different emotions that share similar feature values (Liberati et al., 2015). For this reason, the choice of informative and possibly uncorrelated dimensions is critical (Trnka et al., 2016). Datasets are often annotated by using both categorical and dimensional models, but it can be observed that those employing discrete models do not always use the same labels, that is, the annotated emotions differ both in number and in name. Conversely, several dimensionally annotated datasets share a common valence-arousal (VA) (Russell, 1980) representation, which allows data from different datasets to be compared and merged, in the worst case by ignoring additional axes. When annotating a dataset, a good practice is to provide at least the VA labels.
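To illustrate why a shared VA representation eases comparing and merging datasets, a discrete label can be mapped to approximate VA coordinates. The values below are rough assumptions loosely following Russell's circumplex, not taken from any annotated dataset:

```python
# Illustrative only: approximate valence-arousal (VA) coordinates in [-1, 1]
# for a few discrete labels, roughly following Russell's circumplex. The
# numeric values are assumptions for demonstration purposes.
DISCRETE_TO_VA = {
    "happiness": (0.8, 0.5),
    "anger":     (-0.7, 0.7),
    "sadness":   (-0.7, -0.4),
    "calm":      (0.4, -0.6),
}

def to_va(label: str):
    """Map a discrete label to a (valence, arousal) pair, if known."""
    return DISCRETE_TO_VA.get(label.lower())
```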
Table A1 in the Supplementary Material reports a non-comprehensive list of datasets that can be used to train and test ER approaches.
2.2. Facial Expressions
A natural way to detect emotions is the analysis of facial expressions (Ko, 2018). Conventional facial emotion recognition (FER) systems aim to detect the face region in images and to compute geometric and appearance features, which are used to train machine learning (ML) algorithms (Kumar and Gupta, 2015). Geometric features are obtained by identifying facial landmarks and by computing their reciprocal positions and action units (AUs) (Ghimire and Lee, 2013; Suk and Prabhakaran, 2014; Álvarez et al., 2018), while appearance-based features rely on texture information (Turan and Lam, 2018).
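A minimal sketch of such geometric features, assuming a 68-point landmark array of the kind produced by common face-alignment libraries, could look like this:

```python
# Sketch of geometric FER features: pairwise Euclidean distances between
# facial landmarks and the inner angles of triangles built on them.
# `landmarks` is assumed to be an (N, 2) array of (x, y) points, e.g., the
# classic 68-point layout produced by libraries such as dlib.
import numpy as np
from itertools import combinations

def pairwise_distances(landmarks: np.ndarray) -> np.ndarray:
    idx = list(combinations(range(len(landmarks)), 2))
    return np.array([np.linalg.norm(landmarks[i] - landmarks[j]) for i, j in idx])

def triangle_angles(landmarks: np.ndarray, triangles) -> np.ndarray:
    # For each triangle (i, j, k), compute the angle at vertex i.
    feats = []
    for i, j, k in triangles:
        u = landmarks[j] - landmarks[i]
        v = landmarks[k] - landmarks[i]
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
        feats.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.array(feats)
```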
In recent years, deep learning (DL) approaches have emerged. DL aims to develop end-to-end systems that reduce the dependency on hand-crafted features, pre-processing, and feature extraction techniques (Ghayoumi, 2017). Notably, convolutional neural networks (CNNs) have proven to be particularly efficient in this task (Mollahosseini et al., 2017; Zhang, 2017; Refat and Azlan, 2019).
When dealing with video clips, the temporal component of the data can also be exploited. In traditional FER, this is usually accomplished by including in the feature vector information about landmark displacement between frames (Ghimire and Lee, 2013). In DL approaches, temporal data are handled by means of specific architectures and layers, such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks (Ebrahimi Kahou et al., 2015).
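A minimal PyTorch sketch of this idea, with a per-frame CNN followed by an LSTM over the frame features, is shown below; all layer sizes and the grayscale input assumption are illustrative, not taken from any of the cited works:

```python
# Minimal sketch of a frame-level CNN followed by an LSTM over time, one
# common way to handle the temporal component of video-based FER.
import torch
import torch.nn as nn

class CnnLstmFER(nn.Module):
    def __init__(self, n_emotions: int = 7, hidden: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(                  # per-frame feature extractor
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),                           # -> 32 * 4 * 4 = 512
        )
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_emotions)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 1, H, W) grayscale face crops
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)    # last hidden state summarizes the clip
        return self.head(h[-1])         # per-clip emotion logits
```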
In the context of HRI, FER has been performed through conventional and DL approaches.
2.2.1. Discrete Models
In Faria et al. (2017), an emotion classification approach based on simple geometrical features was proposed. Given the positions of 68 facial landmarks, their Euclidean distances and the angles of 91 triangles formed by the landmarks were considered. A probabilistic ensemble of classifiers, namely a dynamic Bayesian mixture model (DBMM), was employed, using linear regression (LR), a support vector machine (SVM), and a random forest (RF) combined in a probabilistic weighting strategy. The proposed approach was tested both on the Karolinska Directed Emotional Faces (KDEF) dataset (Lundqvist et al., 1998) and during HRI for recognizing 7 discrete emotions. In particular, for the HRI test, the humanoid robot NAO was programmed to react to the recognized emotion. The overall accuracy on the KDEF dataset was 85%, while in actual HRI it was 80.6%. Note that the test performed on KDEF was limited to images with frontal or ±45° orientation of the faces.
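The DBMM itself updates its weights dynamically; as a simplified, static analogue, a weighted soft-voting ensemble over comparable base learners can be sketched with scikit-learn (logistic regression stands in here as a probabilistic linear base learner, and the uniform weights are an assumption):

```python
# Simplified sketch of the ensemble idea behind approaches such as the DBMM:
# several base classifiers whose class posteriors are fused with weights.
# The actual DBMM updates weights dynamically, which this static soft-voting
# example does not reproduce.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),    # probability=True enables soft voting
        ("rf", RandomForestClassifier(n_estimators=100)),
    ],
    voting="soft",
    weights=[1.0, 1.0, 1.0],               # e.g., weights from validation accuracy
)
# Usage: ensemble.fit(X_train, y_train); probs = ensemble.predict_proba(X_test)
```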
In Chen et al. (2017), the authors proposed a method for real-time dynamic emotion recognition based on facial expression and emotional intention understanding. In detail, emotion recognition was performed by using a dynamic model of AUs (Candide3-based dynamic feature point matching) and an algorithm implementing fuzzy rules for each of the 7 basic emotions considered. Experiments were conducted on 30 volunteers experiencing a bar-drinking scenario. One of the two mobile robots employed was used for emotion recognition, achieving an accuracy of 80.48%.
Candide3-based features were also adopted in Chen et al. (2019). Here, the authors proposed an adaptive feature selection strategy based on the plus-L minus-R selection (LRS) algorithm in order to reduce the model dimensionality. Classification was performed with a set of k-nearest neighbors (kNN) classifiers integrated into an AdaBoost framework with direct optimization. The proposed approach was tested both on the JAFFE dataset (Lyons et al., 1998) and on data acquired by a mobile robot equipped with a Kinect. In the latter experiment, the proposed method achieved an average accuracy of 81.42% in the classification of 7 discrete emotions.
In Liu et al. (2019), FER was performed by combining local binary patterns (LBP) and the 2D Gabor wavelet transform for feature extraction and by training an extreme learning machine (ELM) for the classification of basic emotions. Experiments were conducted both on public datasets [JAFFE (Lyons et al., 1998), CK+ (Lucey et al., 2010)] and during actual HRI as part of a multimodal system setup (Liu et al., 2016). In the latter case, the method was able to distinguish between seven emotions with an overall accuracy of 81.9%.
Histograms of oriented gradients (HOG) and LBP were used as feature descriptors in Reyes et al. (2020), where an SVM was trained to classify 7 discrete emotions. The system was initially fed with images from the extended Cohn-Kanade (CK+) dataset and then fine-tuned by adding batches of different sizes of local images (i.e., facial images of participants acquired during the test). Classification of data acquired during the interaction with the robot NAO achieved an accuracy of 87.7%.
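A sketch of this kind of HOG + LBP descriptor feeding an SVM, with illustrative parameter values, could be:

```python
# Sketch of a HOG + LBP feature descriptor for face crops, in the spirit of
# the approach above (all parameter values are illustrative assumptions).
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import SVC

def describe_face(gray_face: np.ndarray) -> np.ndarray:
    # gray_face: 2D grayscale face crop, e.g., 64x64 pixels.
    hog_vec = hog(gray_face, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2))
    # "uniform" LBP with P=8 yields values in 0..9, histogrammed below.
    lbp = local_binary_pattern(gray_face, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=np.arange(11), density=True)
    return np.concatenate([hog_vec, lbp_hist])

# Usage sketch: X = np.stack([describe_face(f) for f in faces])
# clf = SVC(kernel="rbf").fit(X, labels)
```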
The reported results suggest that FER systems designed for HRI can be developed by using several features and decision strategies. However, differences between HMI and HRI arise. In HMI scenarios, FER is in general easier: the position of the face with respect to the camera is more constrained, the user is close to the camera, and the environmental conditions do not change abruptly. Due to these differences, it is preferable to train FER systems on in-the-wild data or on data acquired during real interaction with robots. Moreover, it could be useful to endow robots with the capability of recognizing emotions not just from facial information, but also from contextual and environmental data (Lee et al., 2019). Future developments of FER will probably also depend on emerging technologies: in the last decades, there has been a rapid development of relatively cheap depth cameras (RGB-D sensors) and thermal cameras. The work by Corneanu et al. (2016) offers a comprehensive taxonomy of FER approaches based on RGB, 3D, and thermal sensors. For example, 3D information can improve face detection, landmark localization, and AU computation (Mao et al., 2015; Szwoch and Pieniażek, 2015; Zhang et al., 2015; Patil and Bailke, 2016).
2.3. Thermal Facial Images
Changes in the affective state produce a redistribution of the blood in the vessels, due to vasodilation/vasoconstriction and emotional sweating phenomena. Infrared thermal cameras can detect these changes, since they cause variations in skin temperature (Ioannou et al., 2014). Therefore, thermal images can be used to perform ER (Liu and Wang, 2011; Wang et al., 2014). Usually, this is done by considering temperature variations of specific regions of interest (ROIs), e.g., the tip of the nose, the forehead, the orbicularis oculi, and the cheeks.
2.3.1. Discrete Models
In Boccanfuso et al. (2016), ten subjects played a trivia game with a MyKeepon robot behaving so as to induce happiness and anger, and watched emotional video clips selected to elicit the same emotions. RGB and thermal facial images were acquired, together with the galvanic skin response (GSR) signal. In particular, the thermal trends of 5 ROIs were analyzed by combining principal component analysis (PCA) and logistic regression, achieving a prediction success of 100%.
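The PCA + logistic regression combination can be sketched as a simple scikit-learn pipeline; the feature layout and the number of components below are assumptions for illustration:

```python
# Sketch of a PCA + logistic regression combination on thermal ROI trends.
# Each sample is assumed to be a vector of statistics of the mean-temperature
# time series from 5 ROIs; all shapes and parameters are illustrative.
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

thermal_clf = make_pipeline(
    StandardScaler(),        # put ROI temperature features on a common scale
    PCA(n_components=3),     # keep the main modes of ROI variation
    LogisticRegression(),    # e.g., a binary happiness-vs-anger decision
)
# Usage: thermal_clf.fit(X_train, y_train); thermal_clf.predict(X_test)
```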
In Goulart et al. (2019b), a system for the classification of 5 emotions during the interaction between children and a robot was proposed. A mobile robot (N-MARIA) was equipped with an RGB camera and a thermal camera, which were used to locate the face and to acquire thermal information from 11 ROIs, respectively. Statistical features were computed for each ROI, and multiple combinations of feature reduction techniques and classification algorithms were tested over a database of 28 developing children (Goulart et al., 2019a). A PCA + linear discriminant analysis (LDA) system, trained and tested on the database (accuracy 85%), was used to infer the emotional responses of children during the interaction with N-MARIA, with results consistent with self-reported emotions.
Performing ER from thermal images in actual HRI is still a challenge due to constraints (beyond those that affect standard FER) that cannot be met in all real-life scenarios, first of all the need to maintain a stable environmental temperature. Nonetheless, recent results show that thermal images have the potential to facilitate HRI (Filippini et al., 2020).
2.4. Body Pose and Kinematics
Like facial expressions, body posture, movements, and gestures are natural and intuitive means to infer the affective state of a person. Emotional body gesture recognition (EBGR) has been widely explored (Noroozi et al., 2018). In order to take advantage of the information conveyed by static or dynamic cues, an EBGR system has to model the body position from input signals, usually RGB data, depth maps, or their combination. The first step of the canonical recognition pipeline is the detection of human bodies; the literature offers several approaches for addressing this problem (Ioffe and Forsyth, 2001; Viola and Jones, 2001; Viola et al., 2005; Wang and Lien, 2007; Nguyen et al., 2016). Then, the pose of the body has to be estimated by fitting an a priori defined model, typically a skeleton, over the body region. This task can be performed either by solving an inverse kinematics problem (Barron and Kakadiaris, 2000) or by using DL, if a large amount of skeletonized data is available (Toshev and Szegedy, 2014). Features can include absolute or reciprocal positions and orientations of limbs, as well as motion information such as speed or acceleration (Glowinski et al., 2011; Saha et al., 2014). Classification can be performed either by traditional ML or by deep learning (Savva et al., 2012; Saha et al., 2014).
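A minimal sketch of the pose and motion features mentioned above, assuming a (frames × joints × 3) skeleton sequence from a depth sensor, is the following:

```python
# Sketch of simple skeleton-based features for EBGR: per-frame joint
# positions ("pose") and frame-to-frame differences ("motion").
# `skeleton_seq` is assumed to be a (T, J, 3) array: T frames, J joints,
# 3D coordinates; layout and feature choices are illustrative.
import numpy as np

def pose_motion_features(skeleton_seq: np.ndarray) -> np.ndarray:
    pose = skeleton_seq.reshape(len(skeleton_seq), -1)      # (T, J*3)
    motion = np.diff(pose, axis=0, prepend=pose[:1])        # per-frame displacement
    speed = np.linalg.norm(motion, axis=1, keepdims=True)   # overall movement energy
    return np.hstack([pose, motion, speed])                 # (T, 2*J*3 + 1)
```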
2.4.1. Discrete Models
An interesting approach was proposed in Elfaramawy et al. (2017), where a neural network architecture was designed for classifying 6 emotions from body motion patterns. In particular, the classification was performed by grow-when-required (GWR) networks, self-organizing architectures able to grow nodes whenever the network does not sufficiently match the input. Two GWR networks learned samples of pose and motion, and a recurrent variant of GWR, namely gamma-GWR, was subsequently used to take the temporal context into account. The dataset, which included 19 subjects, was collected by extending a NAO robot with a depth camera (Asus Xtion) and using a second NAO located on the side to allow acquisition from two points of view. Subjects performed body motions related to the emotions, elicited by the description of inspiring scenarios. Pose features (positions of joints) and motion features (the difference in pose features between consecutive frames) were considered. The system achieved an overall accuracy of 88.8%.
2.4.2. Dimensional Models
In Sun et al. (2019), the authors proposed local joint transformations for describing body poses, and a two-layered LSTM for estimating the emotional intensity of discrete emotions. The authors tested the system over the Emotional Body Motion Database (Volkova et al., 2014) by considering the intensity as the percentage of correctly perceived segments for each scenario. The Pearson correlation coefficient (PCC) between the ground truth and the estimated intensity was 0.81. In real HRI experiments with a Pepper robot, the system enabled the robot to sense subjects' emotional intensities effectively.
Summarizing these results, we can say that body poses and movements are excellent conveyors of emotional cues and that EBGR systems can be successfully employed in HRI scenarios. Moreover, the fact that FER and EBGR rely on the same sensors (RGB, depth cameras) makes it possible to take advantage of both modalities.
2.5. Brain Activity
Inferring the emotional state from brain activity represents a challenging and fascinating possibility, since having access to the cerebral activity would make it possible to avoid any filter, voluntary or not, that could interfere with ER (Kappas, 2010). Several measurement systems can be used for acquiring brain activity. Among them, electroencephalography (EEG) is characterized by high temporal resolution, is portable, easy to use, and inexpensive. Moreover, it has proven to be suitable for brain monitoring also in HMI applications (Lahane et al., 2019). Consumer-grade devices, although not accurate enough for neuroscience research and critical control tasks, have been reported to be a viable choice for applications such as affective computing (Duvinage et al., 2013; Nijboer et al., 2015; Maskeliunas et al., 2016).
ER from EEG has been widely explored in the literature (Alarcao and Fonseca, 2017; Spezialetti et al., 2018). The most commonly used features can be roughly classified by the domain from which they are extracted (time, frequency, time-frequency). Time-domain features include statistical values, Hjorth parameters, fractal dimension (FD), and higher order crossings (HOC) (Jenke et al., 2014). Frequency analyses of EEG are very common, also because of the known association between the frequency bands of the EEG signal and specific mental tasks. The most intuitive and widely used frequency feature is band power, but other measures, such as higher order spectra (HOS), have also been employed (Hosseini et al., 2010). Time-frequency analysis aims to describe the frequency content of the signal without losing information about its temporal evolution. Among others, the wavelet transform (WT) has proven to be particularly suited for analyzing non-stationary signals such as EEG (Akin, 2002). The previously listed features are generally computed in a channel-wise manner, but the topography of EEG signals can also be taken into account. Since frontal EEG asymmetry has been shown to be involved in emotional activity (Palmiero and Piccardi, 2017), asymmetry indices have often been employed in emotion recognition.
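As an illustration, two of the features most often mentioned in this context, band power and frontal alpha asymmetry, can be sketched as follows; the channel names, sampling-rate handling, and log-ratio formulation are common choices assumed here:

```python
# Sketch of two standard EEG features for ER: alpha band power via Welch's
# PSD and a frontal alpha asymmetry index (assumed formulations).
import numpy as np
from scipy.signal import welch

def band_power(x: np.ndarray, fs: float, band=(8.0, 13.0)) -> float:
    # x: 1D signal of one channel; integrate the PSD over the alpha band.
    freqs, psd = welch(x, fs=fs, nperseg=int(2 * fs))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.trapz(psd[mask], freqs[mask]))

def frontal_alpha_asymmetry(left_f3: np.ndarray, right_f4: np.ndarray, fs: float) -> float:
    # Classic ln(right) - ln(left) alpha power index over frontal electrodes.
    return float(np.log(band_power(right_f4, fs)) - np.log(band_power(left_f3, fs)))
```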
Some interesting works in the traditional literature (Ansari-Asl et al., 2007; Petrantonakis and Hadjileontiadis, 2009; Valenzi et al., 2014) were devoted to testing which classification approaches, features, and channel configurations are best suited for EEG-based ER. Results from these studies suggest two significant points. First, the emotional state of subjects can be inferred with quite good accuracy from EEG. Second, using a reduced set of channels and commercial-grade devices still allows an adequate level of accuracy to be preserved. The latter point is critical, since in most HRI scenarios the EEG equipment must be worn continuously for long periods and while moving. Light, easy-to-mount devices would be preferable to research-grade hardware. Until 2017 (Al-Nafjan et al., 2017), the most adopted classification approach was the SVM, often in conjunction with power spectral density (PSD)-based features. However, deep learning approaches are standing out in this domain as well, showing the potential to outperform traditional ML techniques (Zheng and Lu, 2015; Li et al., 2016; Wang et al., 2019). Nevertheless, to the best of our knowledge, few works have addressed EEG-based ER in real HRI scenarios.
2.5.1. Dimensional Models
In Shao et al. (2019), the authors employed the Softbank Pepper robot as an autonomous exercise facilitator that encourages the user during physical activity and can autonomously adapt its emotional behaviors on the basis of the user's affect. In particular, valence detection was performed by analyzing EEG from a commercial-grade device. The selected features were PSD and frontal asymmetry. Among 6 classifiers, a neural network (NN) achieved the highest accuracy over a dataset from 10 subjects obtained by inducing emotions with pictures and videos. When employed in a real HRI scenario, the robot was able to correctly recognize the valence (five levels) for 14 of the 15 subjects.
In Shao et al. (2020), the authors proposed a novel paradigm for eliciting emotions by directly employing the non-verbal communication of the robot (Pepper), in order to train a detection model with data from actual HRI. The elicitation methodology was based both on music and on body movements of the robot, and aimed to elicit two types of affect: positive valence with high arousal, and negative valence with low arousal. EEG was acquired with a 4-channel headset in order to extract PSD and frontal asymmetry features and feed an NN and an SVM. The affect detection approach was tested on 14 subjects for valence and 12 for arousal, obtaining overall accuracies of 71.9 and 70.1% (valence), and 70.6 and 69.5% (arousal) using the NN and the SVM, respectively.
2.6. Voice
Everyday experience tells us that the voice, like facial expressions, is an informative channel about our interlocutor's emotions. We have a natural ability to infer the emotional state underlying the semantic content of what the speaker is saying. Changes in emotional states correspond to variations in the features of the vocal organs, such as larynx position and vocal fold tension, and thus to variations in the voice (Johnstone, 2001). In HRI, automatic audio emotion recognition (AER) has to be performed in order to allow robots to perceive human vocal affect. Commonly, AER does not examine speech and words in the semantic sense; instead, it analyzes variations with respect to neutral speech of prosodic (e.g., pitch, energy, and formant data), voice quality (e.g., voice level and temporal structure), and spectral (e.g., cepstral-based coefficients) features. Features can be extracted either locally, segmenting the signal into frames, or globally, considering the whole utterance. In traditional ML approaches, feature extraction is followed by classification, performed mostly with hidden Markov models (HMMs), Gaussian mixture models (GMMs), and SVMs (El Ayadi et al., 2011; Gangamohan et al., 2016). In the AER field too, deep learning approaches have rapidly emerged, providing end-to-end mechanisms in contrast with those based on hand-crafted features and demonstrating that they can perform well compared with traditional techniques (Khalil et al., 2019). Employed models include the deep Boltzmann machine (DBM) (Poon-Feng et al., 2014), RNNs (Lee and Tashev, 2015), deep belief networks (DBN) (Wen et al., 2017), and CNNs (Zheng et al., 2015).
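A sketch of such frame-level prosodic and spectral features, extracted here with librosa under illustrative parameter choices, could be:

```python
# Sketch of prosodic and spectral speech features of the kind listed above,
# summarized into utterance-level statistics (parameters are illustrative).
import numpy as np
import librosa

def speech_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # pitch contour (prosody)
    energy = librosa.feature.rms(y=y)[0]                # frame energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # cepstral (spectral) features
    # Global (utterance-level) statistics of the local features.
    return np.concatenate([
        [np.nanmean(f0), np.nanstd(f0)],
        [energy.mean(), energy.std()],
        mfcc.mean(axis=1), mfcc.std(axis=1),
    ])
```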
2.6.1. Discrete Models
In Chen et al. (2020), a two-layer fuzzy multiple random forest (TLFMRF) was proposed for speech emotion recognition. Statistics of 32 features (16 basic features and their derivatives) were extracted from speech samples. Then, clustering by fuzzy C-means (FCM) was adopted to divide the feature data into different subclasses to address differences in identification information such as gender and age. In the TLFMRF, a cascade of RFs was employed to improve the classification between emotions that are difficult to distinguish. The approach was tested in the classification of six basic emotions from short utterances spoken by 5 participants in front of a mobile robot. The average accuracy was 80.73%.
2.7. Peripheral Physiological Responses and Multimodal Approaches
Emotions affect body physiology, producing significant modifications to heart rate, blood volume pulse (BVP), respiration, skin conductivity, and temperature (Kreibig, 2010), which can contribute to predicting the emotional state of a person. A large effort has been made to develop datasets and techniques for ER, often by considering multiple signal sources together (Koelstra et al., 2011; Soleymani et al., 2011; Chen et al., 2015; Correa et al., 2018). Besides the accuracy obtained with peripheral signals alone, these works show that the fusion of multiple modalities can outperform single-modality approaches; therefore, they can be useful sources of information for improving multimodal system performance. Moreover, the rapid evolution and mass production of consumer-grade devices, such as smartbands and smartwatches (Poongodi et al., 2020), will facilitate the integration of these signals into most HRI systems. For example, in Lazzeri et al. (2014) a multimodal acquisition platform, including a humanoid robot capable of expressing emotions, was tested in a social robot-based therapy scenario for children with autism. The system included audio and video sources, together with electrocardiogram (ECG), GSR, respiration, and accelerometer signals integrated into a sensorized t-shirt, but the platform was designed to be flexible and reconfigurable in order to connect with various hardware devices. Another multimodal platform was described in Liu et al. (2016): a complex multimodal emotional communication based human-robot interaction (MEC-HRI) system. The whole platform was composed of 3 NAO robots, two mobile robots, a workstation, and several devices such as a Kinect and an EEG headset, and it was designed to allow robots to recognize humans' emotions and respond accordingly, based on facial expression, speech, posture, and EEG. The article does not report numerical results of multimodal classification in the tested HRI scenarios, but the modules achieved promising results when tested on benchmark datasets (Liu et al., 2018a,b) or in single-modality HRI experiments (Liu et al., 2019).
To date, several studies have employed single peripheral measures for ER in HRI (McColl et al., 2016), but the majority focused on narrow aspects of ER (e.g., level of stress or fatigue) instead of referring to a broader emotional model.
2.7.1. Discrete Models
One of the highest accuracy results we found in the literature was reported in Perez-Gaspar et al. (2016). Here, the authors developed a multimodal emotion recognition system that integrated facial expressions and voice patterns, based on the evolutionary optimization of NNs and HMMs. Genetic algorithms (GAs) were used to estimate the most suitable structures of the ANNs and HMMs for modeling speech and visual emotional features. Speech and visual data were managed separately by two distinct modules, and a decision-level fusion was performed by averaging, for each class, the output probabilities from the different modalities. The system was trained on a dataset of Mexican people, containing pictures from 9 subjects and speech samples from 8 subjects. Four basic emotions were considered (anger, sadness, happiness, and neutral). Live tests were performed with 10 unseen subjects who interacted with a graphical interface, and an accuracy of 97% was reported. Finally, in all HRI experiments (a dialogue with a Bioloid robot), the robot's speech was consistent with the emotion shown by the users.
In Filntisis et al. (2019), the authors proposed a system that hierarchically fuses body and facial features from images, based on the simultaneous use of a residual network (ResNet) and a deep neural network (DNN) to analyze face and body information, respectively. The system was not incorporated into a robot, but tested on a database containing images of children interacting with 2 different robots (Zeno and Furhat) in a game in which they had to express 6 basic emotions. Classification accuracy was 72%.
An interesting approach was proposed in Yu and Tapus (2019) for multimodal emotion recognition from thermal facial images and gait analysis. Here, interactive robot learning (IRL) was proposed to take advantage of human feedback obtained by the robot during HRI. First, two RF models for thermal images and gait were trained separately on a dataset of 15 subjects labeled with 4 emotions. Computed features included the PSD of 4 joint angles and angular velocities for gait, and the mean and variance of 3 ROIs for thermal images. A decision-level fusion was performed based on weights computed from the confusion matrices of the two RF classifiers. In the proposed IRL, during the interaction with the robot, if the predicted emotion did not correspond to the human feedback, the gait and thermal facial features were used to update the emotion recognition models. The online test included emotion elicitation by movies, followed by gait and thermal image acquisition, and involved a Pepper robot. Results showed that IRL can improve the classification accuracy from 65.6 to 78.1%.
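A simplified sketch of this kind of confusion-matrix-weighted decision-level fusion is the following; the per-class recall weighting rule is an assumption for illustration, not the exact scheme of the cited work:

```python
# Sketch of decision-level fusion with per-modality weights derived from
# validation confusion matrices (weighting rule assumed for illustration).
import numpy as np

def fuse(probs_thermal: np.ndarray, probs_gait: np.ndarray,
         cm_thermal: np.ndarray, cm_gait: np.ndarray) -> int:
    # Per-class reliability of each modality: diagonal of the row-normalized
    # confusion matrix, i.e., the recall of each class.
    w_t = np.diag(cm_thermal) / cm_thermal.sum(axis=1)
    w_g = np.diag(cm_gait) / cm_gait.sum(axis=1)
    fused = w_t * probs_thermal + w_g * probs_gait   # weighted class posteriors
    return int(np.argmax(fused))
```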
2.7.2. Dimensional Models
Barros et al. (2015) presented a neural architecture, named Cross-Channel CNN, for multimodal data extraction. This network is able to extract emotion features from facial expression and body motion. Among different experiments, the approach was tested in a real HRI scenario. An iCub robot was used to interact with a subject who presented one emotional state (positive, negative, or neutral). The robot recognized the emotional state and gave feedback by changing its mouth and eyebrow LEDs, with an average accuracy of 74%.
In Val-Calvo et al. (2020), an interesting analysis of the possibilities of ER in HRI was performed by using facial images, EEG, GSR, and blood volume pulse. In a realistic HRI scenario, a Pepper robot dynamically drove subjects' emotional responses through story-telling and multimedia stimuli. The acquired data (from 16 participants) were labeled with the emotional score that each subject self-reported using 3 levels of valence and arousal. Classification experiments were conducted, together with a population-based statistical analysis. Facial expression estimation was achieved with a CNN strategy: the model was trained on the FER2013 database (Goodfellow et al., 2013) in order to map facial images to 7 emotions, grouped into 3 levels of valence. Three independent classifications were used for estimating valence from EEG and arousal from BVP and GSR. The classification process was carried out using a set of 8 standard classifiers and considering statistical features of the signals. The accuracies obtained on both emotional dimensions were higher than 80% on average.
3. Discussion
As can be observed from our brief summary of the state of the art, ER is feasible by collecting different kinds of data. Some modalities have been widely explored, both in the broader HMI context and specifically for HRI (FER, EBGR); others should be investigated more deeply, either because ER has not yet been tested enough in HRI applications (EEG) or because existing HRI field tests are focused on narrow aspects of emotions (peripheral responses). In our opinion, all the considered modalities represent promising information sources for future developments: innovative and accessible technologies, such as depth cameras, consumer-grade EEG, and smart devices, together with advances in ML, will lead to rapid developments of emotion-aware robots. However, emotion recognition is currently still a challenge for robots, due to the need for reliable results in order to provide a trustworthy interaction, and to the time constraints required to account for the recognized emotion in the adaptation of the robot's behavior. Moreover, many of the datasets used came from general HMI research and are therefore not suited for emotion recognition in real settings. There is still the need for datasets from real HRI. Indeed, the visual field of view of the robot may not be aligned with the images stored in the dataset (i.e., a face-to-face interaction), the perceived sound may be affected by the noise of the robot's ego-motion, and the robot's movements may even occlude its field of view. In this perspective, multimodal systems will have a key role in improving the performance of ER with respect to single-modality approaches, and ML methods and DL architectures have to be developed to deal with heterogeneous data. Particular attention has to be paid to the data used to train and test ER: HRI presents some critical and challenging aspects that could make data collected in controlled environments or from different contexts unsuitable for real HRI applications. Nonetheless, recently published datasets have the advantage of containing data collected from a large number of sensors. This is a valuable feature, since it will allow the development of feature-level fusion approaches for multimodal ER.
Author Contributions
SR conceived the study. MS contributed to finding the relevant literature. All authors contributed to manuscript writing and revision, and read and approved the submitted version.
Funding
This work has been supported by PON I&C 2014-2020 within the BRILLO research project "Bartending Robot for Interactive Long-Lasting Operations," Prog. n. F/190066/01-02/X44.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frobt.2020.532279/full#supplementary-material
Footnotes
References
Alarcao, S. M., and Fonseca, M. J. (2017). Emotions recognition using EEG signals: a survey. IEEE Trans. Affect. Comput. 10, 374–393. doi: 10.1109/TAFFC.2017.2714671
Al-Nafjan, A., Hosny, M., Al-Ohali, Y., and Al-Wabil, A. (2017). Review and classification of emotion recognition based on EEG brain-computer interface system research: a systematic review. Appl. Sci. 7:1239. doi: 10.3390/app7121239
Álvarez, V. M., Sánchez, C. N., Gutiérrez, S., Domínguez-Soberanes, J., and Velázquez, R. (2018). "Facial emotion recognition: a comparison of different landmark-based classifiers," in 2018 International Conference on Research in Intelligent and Computing in Engineering (RICE) (New York, NY: IEEE), 1–4.
Ansari-Asl, K., Chanel, G., and Pun, T. (2007). "A channel selection method for EEG classification in emotion assessment based on synchronization likelihood," in Signal Processing Conference, 2007 15th European (New York, NY), 1241–1245.
Barron, C., and Kakadiaris, I. A. (2000). "Estimating anthropometry and pose from a single image," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2000 (New York, NY: IEEE), 669–676.
Barros, P., Weber, C., and Wermter, S. (2015). "Emotional expression recognition with a cross-channel convolutional neural network for human-robot interaction," in 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids) (New York, NY), 582–587.
Boccanfuso, L., Wang, Q., Leite, I., Li, B., Torres, C., Chen, L., et al. (2016). "A thermal emotion classifier for improved human-robot interaction," in 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) (New York, NY: IEEE), 718–723.
Cameron, D., Fernando, S., Collins, E., Millings, A., Moore, R., Sharkey, A., et al. (2015). "Presence of life-like robot expressions influences children's enjoyment of human-robot interactions in the field," in Proceedings of the AISB Convention 2015 (The Society for the Study of Artificial Intelligence and Simulation of Behaviour).
Cañamero, L. (2019). Embodied robot models for interdisciplinary emotion research. IEEE Trans. Affect. Comput. 1. doi: 10.1109/TAFFC.2019.2908162
Cavallo, F., Semeraro, F., Fiorini, L., Magyar, G., Sinčák, P., and Dario, P. (2018). Emotion modelling for social robotics applications: a review. J. Bionic Eng. 15, 185–203. doi: 10.1007/s42235-018-0015-y
Chen, J., Hu, B., Xu, L., Moore, P., and Su, Y. (2015). "Feature-level fusion of multimodal physiological signals for emotion recognition," in 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (New York, NY: IEEE), 395–399. doi: 10.1109/BIBM.2015.7359713
Chen, L., Li, M., Su, W., Wu, M., Hirota, K., and Pedrycz, W. (2019). "Adaptive feature selection-based AdaBoost-KNN with direct optimization for dynamic emotion recognition in human-robot interaction," in IEEE Transactions on Emerging Topics in Computational Intelligence (New York, NY). doi: 10.1109/TETCI.2019.2909930
Chen, L., Su, W., Feng, Y., Wu, M., She, J., and Hirota, K. (2020). Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction. Inform. Sci. 509, 150–163. doi: 10.1016/j.ins.2019.09.005
Chen, L., Wu, M., Zhou, M., Liu, Z., She, J., and Hirota, K. (2017). Dynamic emotion understanding in human-robot interaction based on two-layer fuzzy SVR-TS model. IEEE Trans. Syst. Man Cybernet. Syst. 50, 490–501. doi: 10.1109/TSMC.2017.2756447
Corneanu, C. A., Simón, M. O., Cohn, J. F., and Guerrero, S. E. (2016). Survey on RGB, 3D, thermal, and multimodal approaches for facial expression recognition: history, trends, and affect-related applications. IEEE Trans. Pattern Anal. Mach. Intell. 38, 1548–1568. doi: 10.1109/TPAMI.2016.2515606
Correa, J. A. M., Abadi, M. K., Sebe, N., and Patras, I. (2018). AMIGOS: a dataset for affect, personality and mood research on individuals and groups. IEEE Trans. Affect. Comput. 1. doi: 10.1109/TAFFC.2018.2884461
Duvinage, M., Castermans, T., Petieau, M., Hoellinger, T., Cheron, G., and Dutoit, T. (2013). Performance of the Emotiv EPOC headset for P300-based applications. Biomed. Eng. Online 12:56. doi: 10.1186/1475-925X-12-56
Ebrahimi Kahou, S., Michalski, V., Konda, K., Memisevic, R., and Pal, C. (2015). "Recurrent neural networks for emotion recognition in video," in Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (New York, NY: ACM), 467–474. doi: 10.1145/2818346.2830596
El Ayadi, M., Kamel, M. S., and Karray, F. (2011). Survey on speech emotion recognition: features, classification schemes, and databases. Pattern Recogn. 44, 572–587. doi: 10.1016/j.patcog.2010.09.020
Elfaramawy, N., Barros, P., Parisi, G. I., and Wermter, S. (2017). "Emotion recognition from body expressions with a neural network architecture," in Proceedings of the 5th International Conference on Human Agent Interaction (New York, NY), 143–149. doi: 10.1145/3125739.3125772
Faria, D. R., Vieira, M., and Faria, F. C. (2017). "Towards the development of affective facial expression recognition for human-robot interaction," in Proceedings of the 10th International Conference on PErvasive Technologies Related to Assistive Environments (New York, NY), 300–304. doi: 10.1145/3056540.3076199
Filippini, C., Perpetuini, D., Cardone, D., Chiarelli, A. M., and Merla, A. (2020). Thermal infrared imaging-based affective computing and its application to facilitate human robot interaction: a review. Appl. Sci. 10:2924. doi: 10.3390/app10082924
Filntisis, P. P., Efthymiou, N., Koutras, P., Potamianos, G., and Maragos, P. (2019). Fusing body posture with facial expressions for joint recognition of affect in child-robot interaction. IEEE Robot. Automat. Lett. 4, 4011–4018. doi: 10.1109/LRA.2019.2930434
Fong, T., Nourbakhsh, I., and Dautenhahn, K. (2003). A survey of socially interactive robots. Robot. Auton. Syst. 42, 143–166. doi: 10.1016/S0921-8890(02)00372-X
Gangamohan, P., Kadiri, S. R., and Yegnanarayana, B. (2016). "Analysis of emotional speech–a review," in Toward Robotic Socially Believable Behaving Systems-Volume I, eds A. Esposito and L. Jain (Cham: Springer), 205–238. doi: 10.1007/978-3-319-31056-5_11
Ghayoumi, M. (2017). A quick review of deep learning in facial expression. J. Commun. Comput. 14, 34–38. doi: 10.17265/1548-7709/2017.01.004
Ghimire, D., and Lee, J. (2013). Geometric feature-based facial expression recognition in image sequences using multi-class AdaBoost and support vector machines. Sensors 13, 7714–7734. doi: 10.3390/s130607714
Glowinski, D., Dael, N., Camurri, A., Volpe, G., Mortillaro, M., and Scherer, K. (2011). Toward a minimal representation of affective gestures. IEEE Trans. Affect. Comput. 2, 106–118. doi: 10.1109/T-AFFC.2011.7
Goodfellow, I. J., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., et al. (2013). "Challenges in representation learning: a report on three machine learning contests," in International Conference on Neural Information Processing (Berlin: Springer), 117–124. doi: 10.1007/978-3-642-42051-1_16
Goulart, C., Valadão, C., Delisle-Rodriguez, D., Caldeira, E., and Bastos, T. (2019a). Emotion analysis in children through facial emissivity of infrared thermal imaging. PLoS ONE 14:e0212928. doi: 10.1371/journal.pone.0212928
Goulart, C., Valadão, C., Delisle-Rodriguez, D., Funayama, D., Favarato, A., Baldo, G., et al. (2019b). Visual and thermal image processing for facial specific landmark detection to infer emotions in a child-robot interaction. Sensors 19:2844. doi: 10.3390/s19132844
Guo, F., Li, M., Qu, Q., and Duffy, V. G. (2019). The effect of a humanoid robot's emotional behaviors on users' emotional responses: evidence from pupillometry and electroencephalography measures. Int. J. Hum. Comput. Interact. 35, 1947–1959. doi: 10.1080/10447318.2019.1587938
Hosseini, S. A., Khalilzadeh, M. A., Naghibi-Sistani, M. B., and Niazmand, V. (2010). "Higher order spectra analysis of EEG signals in emotional stress states," in 2010 Second International Conference on Information Technology and Computer Science (New York, NY: IEEE), 60–63. doi: 10.1109/ITCS.2010.21
Hudlicka, E. (2011). Guidelines for designing computational models of emotions. Int. J. Synthet. Emot. 2, 26–79. doi: 10.4018/jse.2011010103
Ioffe, S., and Forsyth, D. A. (2001). Probabilistic methods for finding people. Int. J. Comput. Vis. 43, 45–68. doi: 10.1023/A:1011179004708
Jenke, R., Peer, A., and Buss, M. (2014). Feature extraction and selection for emotion recognition from EEG. IEEE Trans. Affect. Comput. 5, 327–339. doi: 10.1109/TAFFC.2014.2339834
Johnstone, T. (2001). The effect of emotion on voice production and speech acoustics (Ph.D. thesis), Psychology Department, The University of Western Australia, Perth, WA, Australia.
Kappas, A. (2010). Smile when you read this, whether you like it or not: conceptual challenges to affect detection. IEEE Trans. Affect. Comput. 1, 38–41. doi: 10.1109/T-AFFC.2010.6
Khalil, R. A., Jones, E., Babar, M. I., Jan, T., Zafar, M. H., and Alhussain, T. (2019). Speech emotion recognition using deep learning techniques: a review. IEEE Access 7, 117327–117345. doi: 10.1109/ACCESS.2019.2936124
Koelstra, S., Muhl, C., Soleymani, M., Lee, J.-S., Yazdani, A., Ebrahimi, T., et al. (2011). DEAP: a database for emotion analysis using physiological signals. IEEE Trans. Affect. Comput. 3, 18–31. doi: 10.1109/T-AFFC.2011.15
Kumar, S., and Gupta, A. (2015). "Facial expression recognition: a review," in Proceedings of the National Conference on Cloud Computing and Big Data (Shanghai), 4–6.
Kwak, S. S., Kim, Y., Kim, E., Shin, C., and Cho, K. (2013). "What makes people empathize with an emotional robot?: The impact of agency and physical embodiment on human empathy for a robot," in 2013 IEEE RO-MAN (New York, NY: IEEE), 180–185. doi: 10.1109/ROMAN.2013.6628441
Lahane, P., Jagtap, J., Inamdar, A., Karne, N., and Dev, R. (2019). "A review of recent trends in EEG based brain-computer interface," in 2019 International Conference on Computational Intelligence in Data Science (ICCIDS) (New York, NY: IEEE), 1–6. doi: 10.1109/ICCIDS.2019.8862054
Lazzeri, N., Mazzei, D., and De Rossi, D. (2014). Development and testing of a multimodal acquisition platform for human-robot interaction affective studies. J. Hum. Robot Interact. 3, 1–24. doi: 10.5898/JHRI.3.2.Lazzeri
Lee, J., Kim, S., Kim, S., Park, J., and Sohn, K. (2019). "Context-aware emotion recognition networks," in Proceedings of the IEEE International Conference on Computer Vision (New York, NY), 10143–10152. doi: 10.1109/ICCV.2019.01024
Lee, J., and Tashev, I. (2015). "High-level feature representation using recurrent neural network for speech emotion recognition," in Sixteenth Annual Conference of the International Speech Communication Association (Baixas).
Lench, H. C., Flores, S. A., and Bench, S. W. (2011). Discrete emotions predict changes in cognition, judgment, experience, behavior, and physiology: a meta-analysis of experimental emotion elicitations. Psychol. Bull. 137:834. doi: 10.1037/a0024244
Li, X., Song, D., Zhang, P., Yu, G., Hou, Y., and Hu, B. (2016). "Emotion recognition from multi-channel EEG data through convolutional recurrent neural network," in 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (Shenzhen: IEEE), 352–359. doi: 10.1109/BIBM.2016.7822545
Liberati, G., Federici, S., and Pasqualotto, E. (2015). Extracting neurophysiological signals reflecting users' emotional and affective responses to BCI use: a systematic literature review. NeuroRehabilitation 37, 341–358. doi: 10.3233/NRE-151266
Lindquist, K. A., Siegel, E. H., Quigley, K. S., and Barrett, L. F. (2013). The hundred-year emotion war: are emotions natural kinds or psychological constructions? Comment on Lench, Flores, and Bench (2011). Psychol. Bull. 139, 255–263. doi: 10.1037/a0029038
Liu, Z., and Wang, S. (2011). "Emotion recognition using hidden Markov models from facial temperature sequence," in International Conference on Affective Computing and Intelligent Interaction (Berlin: Springer), 240–247. doi: 10.1007/978-3-642-24571-8_26
Liu, Z.-T., Li, S.-H., Cao, W.-H., Li, D.-Y., Hao, M., and Zhang, R. (2019). Combining 2D Gabor and local binary pattern for facial expression recognition using extreme learning machine. J. Adv. Comput. Intell. Intell. Inform. 23, 444–455. doi: 10.20965/jaciii.2019.p0444
Liu, Z.-T., Pan, F.-F., Wu, M., Cao, W.-H., Chen, L.-F., Xu, J.-P., et al. (2016). "A multimodal emotional communication based humans-robots interaction system," in 2016 35th Chinese Control Conference (CCC) (New York, NY: IEEE), 6363–6368. doi: 10.1109/ChiCC.2016.7554357
Liu, Z.-T., Wu, M., Cao, W.-H., Mao, J.-W., Xu, J.-P., and Tan, G.-Z. (2018a). Speech emotion recognition based on feature selection and extreme learning machine decision tree. Neurocomputing 273, 271–280. doi: 10.1016/j.neucom.2017.07.050
Liu, Z.-T., Xie, Q., Wu, M., Cao, W.-H., Li, D.-Y., and Li, S.-H. (2018b). Electroencephalogram emotion recognition based on empirical mode decomposition and optimal feature selection. IEEE Trans. Cogn. Dev. Syst. 11, 517–526. doi: 10.1109/TCDS.2018.2868121
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., and Matthews, I. (2010). "The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops (New York, NY: IEEE), 94–101. doi: 10.1109/CVPRW.2010.5543262
Lundqvist, D., Flykt, A., and Öhman, A. (1998). The Karolinska Directed Emotional Faces (KDEF). CD ROM from Department of Clinical Neuroscience, Psychology Section, Karolinska Institutet. doi: 10.1037/t27732-000
Lyons, M., Akamatsu, S., Kamachi, Grand., and Gyoba, J. (1998). "Coding facial expressions with gabor wavelets," in Proceedings Third IEEE International Conference on Automated Face and Gesture Recognition (New York, NY: IEEE), 200–205. doi: 10.1109/AFGR.1998.670949
CrossRef Total Text | Google Scholar
Mao, Q.-R., Pan, 10.-Y., Zhan, Y.-Z., and Shen, X.-J. (2015). Using kinect for existent-time emotion recognition via facial expressions. Front. Inform. Technol. Electron. Eng. 16, 272–282. doi: 10.1631/FITEE.1400209
CrossRef Full Text | Google Scholar
Marmpena, Yard., Lim, A., and Dahl, T. S. (2018). How does the robot experience? Perception of valence and arousal in emotional body language. Paladyn J. Behav. Robot. 9, 168–182. doi: x.1515/pjbr-2018-0012
CrossRef Total Text | Google Scholar
Mavridis, Due north. (2015). A review of verbal and non-exact human-robot interactive advice. Robot. Auton. Syst. 63, 22–35. doi: 10.1016/j.robot.2014.09.031
CrossRef Total Text | Google Scholar
McColl, D., Hong, A., Hatakeyama, N., Nejat, G., and Benhabib, B. (2016). A survey of autonomous human touch detection methods for social robots engaged in natural HRI. J. Intell. Robot. Syst. 82, 101–133. doi: ten.1007/s10846-015-0259-2
CrossRef Full Text | Google Scholar
Mehrabian, A. (1996). Pleasure-arousal-authorization: a general framework for describing and measuring individual differences in temperament. Curr. Psychol. 14, 261–292. doi: x.1007/BF02686918
CrossRef Full Text | Google Scholar
Mollahosseini, A., Hasani, B., and Mahoor, Chiliad. H. (2017). Affectnet: a database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Bear upon. Comput. x, 18–31. doi: 10.1109/TAFFC.2017.2740923
PubMed Abstract | CrossRef Full Text | Google Scholar
Navarro-Guerrero, Northward., Lowe, R., and Wermter, S. (2012). "A neurocomputational amygdala model of auditory fear conditioning: a hybrid system approach," in The 2012 International Articulation Conference on Neural Networks (IJCNN) (New York, NY: IEEE), 1–8. doi: x.1109/IJCNN.2012.6252392
CrossRef Total Text | Google Scholar
Nguyen, D. T., Li, Westward., and Ogunbona, P. O. (2016). Human detection from images and videos: a survey. Pattern Recogn. 51, 148–175. doi: ten.1016/j.patcog.2015.08.027
CrossRef Full Text | Google Scholar
Nijboer, F., Van De Laar, B., Gerritsen, S., Nijholt, A., and Poel, M. (2015). Usability of iii electroencephalogram headsets for brain-computer interfaces: a within subject comparison. Interact. Comput. 27, 500–511. doi: 10.1093/iwc/iwv023
CrossRef Full Text | Google Scholar
Noroozi, F., Kaminska, D., Corneanu, C., Sapinski, T., Escalera, S., and Anbarjafari, G. (2018). Survey on emotional torso gesture recognition. IEEE Trans. Touch. Comput. doi: ten.1109/TAFFC.2018.2874986
CrossRef Full Text | Google Scholar
Palmiero, One thousand., and Piccardi, L. (2017). Frontal EEG asymmetry of mood: a mini-review. Front. Behav. Neurosci. 11:224. doi: 10.3389/fnbeh.2017.00224
CrossRef Full Text | Google Scholar
Patil, J. V., and Bailke, P. (2016). "Existent time facial expression recognition using realsense camera and ANN," in 2016 International Conference on Inventive Ciphering Technologies (ICICT), Vol. 2 (New York, NY: IEEE), 1–6. doi: 10.1109/INVENTIVE.2016.7824820
CrossRef Full Text | Google Scholar
Perez-Gaspar, L.-A., Caballero-Morales, S.-O., and Trujillo-Romero, F. (2016). Multimodal emotion recognition with evolutionary computation for human-robot interaction. Expert Syst. Appl. 66, 42–61. doi: 10.1016/j.eswa.2016.08.047
CrossRef Full Text | Google Scholar
Petrantonakis, P. C., and Hadjileontiadis, L. J. (2009). Emotion recognition from EEG using higher order crossings. IEEE Trans. Inform. Technol. Biomed. 14, 186–197. doi: 10.1109/TITB.2009.2034649
PubMed Abstract | CrossRef Full Text | Google Scholar
Picard, R. W. (1999). "Affective computing for HCI," in Proceedings of HCI International (the 8th International Conference on Human-Computer Interaction) on Human-Computer Interaction: Ergonomics and User Interfaces-Volume I, eds H. J. Bullinger and J. Ziegler (Mahwah, NJ: L. Erlbaum Associates Inc.), 829–833.
Google Scholar
Plutchik, R., and Kellerman, H. (2013). Theories of Emotion, Vol. 1. Cambridge, MA: Academic Press.
Google Scholar
Poon-Feng, K., Huang, D.-Y., Dong, M., and Li, H. (2014). "Acoustic emotion recognition based on fusion of multiple feature-dependent deep Boltzmann machines," in The 9th International Symposium on Chinese Spoken Language Processing (New York, NY: IEEE), 584–588. doi: 10.1109/ISCSLP.2014.6936696
CrossRef Full Text | Google Scholar
Poongodi, T., Krishnamurthi, R., Indrakumari, R., Suresh, P., and Balusamy, B. (2020). "Wearable devices and IoT," in A Handbook of Internet of Things in Biomedical and Cyber Physical System (Berlin: Springer), 245–273. doi: 10.1007/978-3-030-23983-1_10
CrossRef Full Text | Google Scholar
Refat, C. M. M., and Azlan, N. Z. (2019). "Deep learning methods for facial expression recognition," in 2019 7th International Conference on Mechatronics Engineering (ICOM) (New York, NY: IEEE), 1–6. doi: 10.1109/ICOM47790.2019.8952056
CrossRef Full Text | Google Scholar
Reisenzein, R., Hudlicka, E., Dastani, M., Gratch, J., Hindriks, K., Lorini, E., et al. (2013). Computational modeling of emotion: toward improving the inter- and intradisciplinary exchange. IEEE Trans. Affect. Comput. 4, 246–266. doi: 10.1109/T-AFFC.2013.14
CrossRef Full Text | Google Scholar
Reyes, S. R., Depano, K. G., Velasco, A. M. A., Kwong, J. C. T., and Oppus, C. M. (2020). "Face detection and recognition of the seven emotions via facial expression: integration of machine learning algorithm into the NAO robot," in 2020 5th International Conference on Control and Robotics Engineering (ICCRE) (New York, NY: IEEE), 25–29. doi: 10.1109/ICCRE49379.2020.9096267
CrossRef Full Text | Google Scholar
Rossi, S., Larafa, M., and Ruocco, M. (2020). Emotional and behavioural distraction by a social robot for children anxiety reduction during vaccination. Int. J. Soc. Robot. 12, 1–13. doi: 10.1007/s12369-019-00616-w
CrossRef Full Text | Google Scholar
Rossi, S., and Ruocco, M. (2019). Better alone than in bad company: effects of incoherent non-verbal emotional cues for a humanoid robot. Interact. Stud. 20, 487–508. doi: 10.1075/is.18066.ros
CrossRef Full Text | Google Scholar
Saarimäki, H., Ejtehadian, L. F., Glerean, E., Jääskeläinen, I. P., Vuilleumier, P., Sams, M., et al. (2018). Distributed affective space represents multiple emotion categories across the human brain. Soc. Cogn. Affect. Neurosci. 13, 471–482. doi: 10.1093/scan/nsy018
PubMed Abstract | CrossRef Full Text | Google Scholar
Saha, S., Datta, S., Konar, A., and Janarthanan, R. (2014). "A study on emotion recognition from body gestures using Kinect sensor," in 2014 International Conference on Communication and Signal Processing (New York, NY: IEEE), 056–060. doi: 10.1109/ICCSP.2014.6949798
CrossRef Full Text | Google Scholar
Salovey, P., and Mayer, J. D. (1990). Emotional intelligence. Imaginat. Cogn. Pers. 9, 185–211. doi: 10.2190/DUGG-P24E-52WK-6CDG
CrossRef Full Text | Google Scholar
Sánchez-López, Y., and Cerezo, E. (2019). Designing emotional BDI agents: good practices and open questions. Knowledge Eng. Rev. 34:e26. doi: 10.1017/S0269888919000122
CrossRef Full Text | Google Scholar
Saunderson, S., and Nejat, G. (2019). How robots influence humans: a survey of nonverbal communication in social human-robot interaction. Int. J. Soc. Robot. 11, 575–608. doi: 10.1007/s12369-019-00523-0
CrossRef Full Text | Google Scholar
Savva, N., Scarinzi, A., and Bianchi-Berthouze, N. (2012). Continuous recognition of player's affective body expression as dynamic quality of aesthetic experience. IEEE Trans. Comput. Intell. AI Games 4, 199–212. doi: 10.1109/TCIAIG.2012.2202663
CrossRef Full Text | Google Scholar
Shao, M., Alves, S. F. R., Ismail, O., Zhang, X., Nejat, G., and Benhabib, B. (2019). "You are doing great! Only one rep left: an affect-aware social robot for exercising," in 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC) (Bari: IEEE), 3811–3817. doi: 10.1109/SMC.2019.8914198
CrossRef Full Text | Google Scholar
Shao, M., Snyder, M., Nejat, G., and Benhabib, B. (2020). User affect elicitation with a socially emotional robot. Robotics 9:44. doi: 10.3390/robotics9020044
CrossRef Full Text | Google Scholar
Siegel, E. H., Sands, M. K., Van den Noortgate, W., Condon, P., Chang, Y., Dy, J., et al. (2018). Emotion fingerprints or emotion populations? A meta-analytic investigation of autonomic features of emotion categories. Psychol. Bull. 144:343. doi: 10.1037/bul0000128
PubMed Abstract | CrossRef Full Text | Google Scholar
Soleymani, M., Lichtenauer, J., Pun, T., and Pantic, M. (2011). A multimodal database for affect recognition and implicit tagging. IEEE Trans. Affect. Comput. 3, 42–55. doi: 10.1109/T-AFFC.2011.25
CrossRef Full Text | Google Scholar
Spezialetti, M., Cinque, L., Tavares, J. M. R., and Placidi, G. (2018). Towards EEG-based BCI driven by emotions for addressing BCI-illiteracy: a meta-analytic review. Behav. Inform. Technol. 37, 855–871. doi: 10.1080/0144929X.2018.1485745
CrossRef Full Text | Google Scholar
Suk, M., and Prabhakaran, B. (2014). "Real-time mobile facial expression recognition system-a case study," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (New York, NY), 132–137. doi: 10.1109/CVPRW.2014.25
CrossRef Full Text | Google Scholar
Sun, M., Mou, Y., Xie, H., Xia, M., Wong, M., and Ma, X. (2019). "Estimating emotional intensity from body poses for human-robot interaction," in 2018 IEEE International Conference on Systems, Man and Cybernetics (SMC) (New York, NY: IEEE), 3811–3817.
Google Scholar
Szwoch, M., and Pieniażek, P. (2015). "Facial emotion recognition using depth information," in 2015 8th International Conference on Human System Interaction (HSI) (New York, NY: IEEE), 271–277. doi: 10.1109/HSI.2015.7170679
CrossRef Full Text | Google Scholar
Tanevska, A., Rea, F., Sandini, G., and Sciutti, A. (2017). "Can emotions enhance the robot's cognitive abilities: a study in autonomous HRI with an emotional robot," in Proceedings of AISB Convention (Bath).
Google Scholar
Tomkins, S. S. (2008). Affect Imagery Consciousness: The Complete Edition: Two Volumes. New York, NY: Springer Publishing Company.
Google Scholar
Toshev, A., and Szegedy, C. (2014). "DeepPose: human pose estimation via deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (New York, NY), 1653–1660. doi: 10.1109/CVPR.2014.214
CrossRef Full Text | Google Scholar
Trnka, R., Lačev, A., Balcar, K., Kuška, M., and Tavel, P. (2016). Modeling semantic emotion space using a 3D hypercube-projection: an innovative analytical approach for the psychology of emotions. Front. Psychol. 7:522. doi: 10.3389/fpsyg.2016.00522
PubMed Abstract | CrossRef Full Text | Google Scholar
Tsiourti, C., Weiss, A., Wac, K., and Vincze, M. (2017). "Designing emotionally expressive robots: a comparative study on the perception of communication modalities," in Proceedings of the 5th International Conference on Human Agent Interaction (New York, NY: ACM), 213–222. doi: 10.1145/3125739.3125744
CrossRef Full Text | Google Scholar
Turan, C., and Lam, K.-M. (2018). Histogram-based local descriptors for facial expression recognition (FER): a comprehensive study. J. Vis. Commun. Image Represent. 55, 331–341. doi: 10.1016/j.jvcir.2018.05.024
CrossRef Full Text | Google Scholar
Val-Calvo, M., Álvarez-Sánchez, J. R., Ferrández-Vicente, J. M., and Fernández, E. (2020). Affective robot story-telling human-robot interaction: exploratory real-time emotion estimation analysis using facial expressions and physiological signals. IEEE Access 8, 134051–134066. doi: 10.1109/ACCESS.2020.3007109
CrossRef Full Text | Google Scholar
Valenzi, S., Islam, T., Jurica, P., and Cichocki, A. (2014). Individual classification of emotions using EEG. J. Biomed. Sci. Eng. 7:604. doi: 10.4236/jbise.2014.78061
CrossRef Full Text | Google Scholar
Viola, P., and Jones, M. (2001). "Rapid object detection using a boosted cascade of simple features," in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, Vol. 1 (New York, NY: IEEE). doi: 10.1109/CVPR.2001.990517
CrossRef Full Text | Google Scholar
Viola, P., Jones, M. J., and Snow, D. (2005). Detecting pedestrians using patterns of motion and appearance. Int. J. Comput. Vis. 63, 153–161. doi: 10.1007/s11263-005-6644-8
CrossRef Full Text | Google Scholar
Volkova, E., De La Rosa, S., Bülthoff, H. H., and Mohler, B. (2014). The MPI emotional body expressions database for narrative scenarios. PLoS ONE 9:e113647. doi: 10.1371/journal.pone.0113647
PubMed Abstract | CrossRef Full Text | Google Scholar
Wang, C.-C. R., and Lien, J.-J. J. (2007). "AdaBoost learning for human detection based on histograms of oriented gradients," in Asian Conference on Computer Vision (Berlin: Springer), 885–895. doi: 10.1007/978-3-540-76386-4_84
CrossRef Full Text | Google Scholar
Wang, K.-Y., Ho, Y.-L., Huang, Y.-D., and Fang, W.-C. (2019). "Design of intelligent EEG system for human emotion recognition with convolutional neural network," in 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) (New York, NY: IEEE), 142–145. doi: 10.1109/AICAS.2019.8771581
PubMed Abstract | CrossRef Full Text | Google Scholar
Wang, S., He, M., Gao, Z., He, S., and Ji, Q. (2014). Emotion recognition from thermal infrared images using deep Boltzmann machine. Front. Comput. Sci. 8, 609–618. doi: 10.1007/s11704-014-3295-3
CrossRef Full Text | Google Scholar
Wen, G., Li, H., Huang, J., Li, D., and Xun, E. (2017). Random deep belief networks for recognizing emotions from speech signals. Comput. Intell. Neurosci. 2017:1945630. doi: 10.1155/2017/1945630
PubMed Abstract | CrossRef Full Text | Google Scholar
Yu, C., and Tapus, A. (2019). "Interactive robot learning for multimodal emotion recognition," in International Conference on Social Robotics (Berlin: Springer), 633–642. doi: 10.1007/978-3-030-35888-4_59
PubMed Abstract | CrossRef Full Text | Google Scholar
Zhang, T. (2017). "Facial expression recognition based on deep learning: a survey," in International Conference on Intelligent and Interactive Systems and Applications (Cham: Springer), 345–352. doi: 10.1007/978-3-319-69096-4_48
CrossRef Full Text | Google Scholar
Zhang, Y., Zhang, L., and Hossain, M. A. (2015). Adaptive 3D facial action intensity estimation and emotion recognition. Expert Syst. Appl. 42, 1446–1464. doi: 10.1016/j.eswa.2014.08.042
CrossRef Full Text | Google Scholar
Zheng, W.-L., and Lu, B.-L. (2015). Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Trans. Auton. Mental Dev. 7, 162–175. doi: 10.1109/TAMD.2015.2431497
CrossRef Full Text | Google Scholar
Zheng, W. Q., Yu, J. S., and Zou, Y. X. (2015). "An experimental study of speech emotion recognition based on deep convolutional neural networks," in 2015 International Conference on Affective Computing and Intelligent Interaction (ACII) (New York, NY: IEEE), 827–831. doi: 10.1109/ACII.2015.7344669
CrossRef Full Text | Google Scholar