WO2001038959A2 - An apparatus and method for determining emotional and conceptual context from a user input - Google Patents

An apparatus and method for determining emotional and conceptual context from a user input

Info

Publication number
WO2001038959A2
Authority
WO
WIPO (PCT)
Prior art keywords
output
input
state
emotional
concept
Prior art date
Application number
PCT/US2000/032305
Other languages
French (fr)
Other versions
WO2001038959A3 (en)
Inventor
Robert Simon Hairman
Joseph Lawrence Sandmeyer
Original Assignee
Talkie, Inc.
Priority date
Filing date
Publication date
Application filed by Talkie, Inc.
Priority to AU16288/01A (AU1628801A)
Publication of WO2001038959A2
Publication of WO2001038959A3

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method and system of generating a contextually correct output for an input. An input concept and an indication of an input emotional state are received. A state machine adjusts an output emotional state in response to the input concept and input emotional state. An output is generated factoring in the output emotional state and the input concept.

Description

AN APPARATUS AND METHOD FOR DETERMINING
EMOTIONAL AND CONCEPTUAL CONTEXT
FROM A USER INPUT
BACKGROUND
(1) Field of the Invention
The invention relates to computer/human interfaces. More specifically, the invention relates to increasing the ability of the computer to respond in a humanlike manner to an indeterminate input stream.
(2) Background
Improving the computer-human interface has long been a goal, and software designers have tried to make computers more user-friendly. Great strides have been made with the invention of graphical user interfaces, which obscure from the user most of the underlying complexity of carrying out a desired function. The ability to "point and click" and "drag and drop" vastly facilitated the acceptance of the computer as a mainstay in modern society. But computers remain intimidating to a wide variety of potential users, and almost no one would characterize a computer session as resembling a typical human interaction. In customer service situations, consumers resent being funneled through a computer attendant before reaching a "real person." Even the proliferation of speech recognition software that has permitted computers to take oral instruction to perform certain predefined tasks has not significantly resolved these issues. Such prior art speech recognition typically employs grammar parsers to discern what command has been issued, coupled with pattern matching to match the voice command against the set of possible commands.
Notwithstanding these strides in human-computer interaction, computers are largely regarded as machines, like a car or a washing machine. However, unlike a car, computers can emulate certain human functions, but their ability to emulate those functions is constrained by the command set provided. Moreover, computers have no humanesque embodiment identified with them. The general problem remains how to improve the ease of use of human-computer interactions and make the user experience more enjoyable.
BRIEF SUMMARY OF THE INVENTION
A method and system of generating a contextually correct output for an input is disclosed. An input concept and an indication of an input emotional state are received. A state machine adjusts an output emotional state in response to the input concept and input emotional state. An output is generated factoring in the output emotional state and the input concept.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of a system of one embodiment of the invention.
Figure 2 is a flow chart of an operation of a natural language filter of one embodiment of the invention.
Figure 3 is a flow chart of operation of the emotional scoring filter of one embodiment of the invention.
Figure 4 is a flow chart of operation of the context filter in one embodiment of the invention.
Figure 5 is a flow chart of the operation of a continuous emotional state machine of one embodiment of the invention.
Figure 6 is a flow chart of discrete state emotion suppression in one embodiment of the invention.
Figure 7 is a flow chart of output concept dialogue generation of one embodiment of the invention.
Figure 8 is a flow chart of operation of the vocal expression module in one embodiment of the invention.
Figure 9 is a flow chart of output generation in the facial animation module of one embodiment of the invention.
Figure 10 is a flow chart of the body animation module of one embodiment of the invention.
DETAILED DESCRIPTION
The invention generally provides a new paradigm for human computer interaction by using as much information as can be derived about the context of the interaction and appropriately defining emotional states. The system uses a data driven algorithm to provide an animated embodiment to the interacting computer, as well as natural language response capability. Some embodiments may provide for only one of audio and graphical output. In any case, providing the computer with appropriate understanding of and response to context improves the overall user experience.
Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it should be appreciated that throughout the present invention, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices, including robotic devices, such as robots and dolls.
The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose machines may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these machines will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
Figure 1 is a block diagram of a system of one embodiment of the invention. In this embodiment of the invention, a context filter 10 accepts certain inputs and controls the generation of emotionally consistent audio and graphical output. A user interface 16 receives an input stream. The input stream may include typed input, spoken input, galvanic skin response or other biofeedback, video or still capture of user expressions, or any other information useful in derivation of context. Depending on the types of input accepted by the user interface 16, the user interface 16 may contain a voice recognition module 56, a voice stress analyzer 54, and an expression identifier 52. Other modules for the user interface 16 are within the scope and contemplation of the invention. The form of the input stream is also likely to vary between different applications of the invention. For example, a phone center computer may receive only audio and touch tone input (and only provide audio output).
The language portion of the input stream, be it spoken or typed, is passed to a natural language filter 18. Natural language filter 18 derives an input concept from the natural language input. As used herein "concept" is abbreviated data reflecting a semantic meaning of the language input stream. In some embodiments, the natural language filter also receives biometric data to assist in the identification of, e.g., irony and sarcasm. The natural language filter 18 may access an idiom database and rule set 24 in the course of matching the natural language input with an input concept. It is well understood in the field of natural language that grammatical rules alone may, in many cases, fail to identify the semantic meaning of a language string. Thus, the use of an idiomatic database to identify input concepts significantly improves the probability that the natural language filter 18 will generate a semantically correct input concept. The natural language filter 18 forwards the input concept to the context filter 10 for later use.
Concurrently, the natural language input and other emotional indicators derived from the input stream are forwarded to an emotional scoring filter 14. In one embodiment, the emotional scoring filter is modeled after the work of Louis Gottschalk, who developed a method of psychological analysis known as the "Gottschalk-Gleser Method of Content Analysis." This system uses grammatical clauses as a unit of communication from which to derive the psychological message being conveyed by the speaker. More than three decades of empirical data back the accuracy of the Gottschalk Method, which scores numerous parameters, such as: hostility, social alienation, cognitive and intellectual impairment, depression, hope, and anxiety.
While the Gottschalk parameters work well in a strict pathological context, it is desirable to avoid their psychological bias and generate more basic emotions for use in the preferred embodiment. Thus, the emotional scoring filter 14, in one embodiment of the invention, will score for common emotional states, such as fear, anger, joy, sadness, disgust, and disdain, rather than the wide range of pathology typical of a traditional Gottschalk filter. The score will be representative of the user's mood and permit the context filter 10 to adjust the emotional state of the system responsive to the user's emotional state. Emotional scoring filter 14 may also use other emotional indicators, such as galvanic skin response or other biofeedback mechanisms, voice stress analysis, and expression identification as contributions to, or tags on, the emotional score derived from the language input.
The context filter 10 then employs the emotional score to determine emotional state and variable shifts for a plurality of emotional state variables which define the system's current emotional state. The emotional shift caused by any score and input concept may vary from system to system or character to character. For example, one character may be defined to meet hostility with hostility, while another meets hostility with placation. These emotional variable shifts are provided to a continuous emotional state machine 12. The system may have a default or base emotional state which it will exhibit at a first interaction with the user. In one embodiment, the default emotional state is defined to be random within a particular range. The state variables correspond to primary emotions, such as fear, joy, arrogance, disgust, sadness, anger, and coldness. An arbitrarily large number of other emotional states may be reflected in the state variables. Alternatively, other emotional states can be designed as a combination of the primary emotional states.
The continuous emotional state machine 12 employs a plurality of state vectors, referred to as match vectors, which describe idealized conditions under which a particular event occurs. For example, there might be a match vector corresponding to an angry response, and a match vector corresponding to a joyful response. The concept of match vectors is derived from fuzzy logic and neural network systems that categorize input values by similarity to given patterns. By comparing the existing emotional state to the set of match vectors, the match vector closest to the existing emotional state dictates the emotional context of the response. Typically, the emotional variables will decay to their default value or fall within their default range. A decay function may be linear or non-linear relative to time and may vary from one emotion to the next. For example, fear may decay exponentially with time from the event giving rise to the fear, while joy may decay linearly. Similarly, two emotional state variables decaying linearly may decay at different rates. For example, joy may decay at a rate of 2t, while arrogance decays at a rate of t/2. Decay profiles may vary from one animated character to another within a single system or from system to system.
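By way of illustration only, the decay behavior described above could be sketched as in the following Python fragment. It is not taken from the disclosure; the defaults, rates, half-life, and time step are assumptions chosen to mirror the examples given (joy decaying linearly at 2t, arrogance at t/2, fear decaying exponentially).

```python
# A minimal sketch (not from the patent) of per-emotion decay toward a default
# value. The rates, defaults, and time step are illustrative assumptions only.
DEFAULTS = {"fear": 0.0, "joy": 0.1, "arrogance": 0.1}

def decay_linear(value, default, rate, dt):
    """Move value toward default by at most rate * dt per cycle."""
    step = rate * dt
    if value > default:
        return max(default, value - step)
    return min(default, value + step)

def decay_exponential(value, default, half_life, dt):
    """Relax value toward default exponentially with the given half-life."""
    factor = 0.5 ** (dt / half_life)
    return default + (value - default) * factor

def decay_state(state, dt):
    """Apply a per-emotion decay profile; profiles may differ per character."""
    return {
        "fear": decay_exponential(state["fear"], DEFAULTS["fear"], half_life=5.0, dt=dt),
        "joy": decay_linear(state["joy"], DEFAULTS["joy"], rate=2.0, dt=dt),                    # "2t"
        "arrogance": decay_linear(state["arrogance"], DEFAULTS["arrogance"], rate=0.5, dt=dt),  # "t/2"
    }

state = {"fear": 0.9, "joy": 0.8, "arrogance": 0.6}
for _ in range(3):
    state = decay_state(state, dt=0.1)
print(state)
```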
A vector corresponding to the current existing emotional state is returned to the context filter 10 and retained as part of the current state 62 of the system. The context filter 10 may also employ match vectors to define the current context. In such an embodiment, the vector corresponding to the emotional state is only one of the variables that defines the current context. The current state 62 information may include other information about the user or the current context, such as the user's name, gender, etc., as well as the number of I/O pairs and number of conversational threads currently in use.
The context filter 10 may also contain one or more triggers 64 which each cause the system to enter a discrete state 32 in which the emotions are suppressed or partially suppressed and the system is permitted to carry out any standard software function. An example of a possible discrete state is "make the sale" for a sales system. The trigger might be that the purchaser indicates a desire to buy. The underlying software functions may include credit card verification. Discrete state transitions are deemed instantaneous; the system will never be in more than one discrete state at a time, nor will it be between discrete states. Context filter 10 is also responsible for driving the generation of output responses. In that capacity, the context filter 10 supplies the input concept and state information to an adaptive filter 20. The adaptive filter 20 uses the current state information to assess user understanding and the depth of the conversation. The adaptive filter 20 constantly tries to keep the level of conversation consistent with the user's understanding and other contextual factors. Additionally, if after a threshold number of I/O pairs, for example, two or three, the system is unable to match the input concept with an output concept, the adaptive filter 20 may assume control of the conversation by asking the user questions to get back on track. Prior to reaching the threshold, if no match is found for an input concept, the adaptive filter 20 may merely provide, for example, a guiding response to attempt to move the user back into an area in which the system has knowledge.
The input concept and state information are tagged by the adaptive filter 20 with, for example, information reflecting a suitable level for output concept, and the adaptive filter checks the output concept database 22. If an output concept is identified in the output concept database 22, it is compared with the idiom database and rule set 24 to select a suitable idiom based on the state information. Idiom database and rule set 24 converts the output concept into a suitable word output (dialogue) and forwards it and the current state to a vocal expression module 30 and a facial animation module 26.
The vocal expression module 30 uses a variation of three variables, pitch, speed, and volume, to inflect emotional content into the audio output. The pitch, speed, and volume parameters are provided to the facial animation module 26 to permit lip synchronization between the animated face and the words based on the pitch, speed, and volume employed by the vocal expression module. The facial animation module 26 and body animation module 28 provide the graphical output of the system. The context filter 10 provides the current state to both modules 26, 28 and indicates the action required for the body animation module 28. In this manner, the facial expressions and body language of the animated character are made consistent with the emotional and semantic context of the interaction.
Figure 2 is a flow chart of an operation of a natural language filter of one embodiment of the invention. At functional block 100, the natural language filter receives a natural language input. As previously noted, this may be in the form of a typed character string or the output of speech recognition software. Various third party speech recognition software is available, such as Nuance, available from Nuance Communications of Menlo Park, California, and the L&H Speech Recognition System from Lernout & Hauspie Speech Products N.V. (L&H) of Belgium. Both of the above systems are multi-voice software suitable for a multi-user environment. In one embodiment, the natural language filter uses a pattern matching algorithm that includes anaphoric references and uses the ASL 1600 from L&H. Other embodiments may employ another, superior recognizer as it becomes available. In some embodiments, the natural language filter parses the grammatical structure. In one embodiment, the natural language filter provides a mechanism for tagging anaphoric references, such as he, she, me, they, it, etc. Tagging of such anaphoric references is important to generation of contextually correct output concepts. A layer of abstraction separates the natural language filter from the context filter, permitting ready adaptation as improved technology becomes available.
The natural language input received at functional block 100 is compared to an idiomatic database in an effort to determine its meaning, which might not be apparent from parsing its grammatical structure. A determination is made at decision block 104 whether an idiom exists within the idiomatic database. The idiomatic database includes libraries of concepts matching various idioms and an extensive dictionary including common misspellings. If an idiom exists, the natural language filter creates a concept matching the idiom at functional block 106. If no idiom is found, a concept is derived from the original natural language input at functional block 108. In one embodiment, the concept is expressed in tagged English based on recognition and matching of multiple key words within sentences related by word distance. By way of example, in one possible syntactic formation, the question "what is context?" would become "WhatContext." In one embodiment, this may be performed using a lookup table in which each rule is associated with a possible input string. Other syntax is, of course, within the scope and contemplation of the invention. Then, at functional block 110, the natural language filter provides the input concept to the context filter. The natural language filter's function is then complete for the current input/output pair.
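The idiom lookup and keyword-to-concept mapping described above might be sketched as follows. This is illustrative only; the table entries, the "WhatContext" tag, and the word-distance rule mirror the example in the text but are otherwise assumptions.

```python
import re

# Illustrative idiom table and keyword rule set; entries are assumptions.
IDIOM_TABLE = {
    "what's up": "GreetingCasual",
    "what is context": "WhatContext",
}

KEYWORD_RULES = [
    ({"what", "context"}, 3, "WhatContext"),   # (key words, max word distance, concept)
]

def derive_concept(utterance):
    text = re.sub(r"[^\w\s']", "", utterance.lower()).strip()
    if text in IDIOM_TABLE:                    # idiom match (blocks 104/106)
        return IDIOM_TABLE[text]
    words = text.split()                       # fall back to key-word matching (block 108)
    for keywords, max_dist, concept in KEYWORD_RULES:
        positions = [i for i, w in enumerate(words) if w in keywords]
        if positions and {words[i] for i in positions} == keywords and \
                max(positions) - min(positions) <= max_dist:
            return concept
    return "Unknown"

print(derive_concept("What is context?"))      # -> WhatContext
```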
Figure 3 is a flow chart of operation of the emotional scoring filter of one embodiment of the invention. At functional block 150, the emotional scoring filter is provided with the emotional content from the input stream. As previously indicated, this may include the natural language input, captured expression, voice stress analysis, and galvanic skin response or other biofeedback mechanism. Other sources of emotional content may also contribute to the input of the emotional scoring filter. The language portion of the stream is scored for emotional context using a modified Gottschalk approach (such as discussed above) at functional block 152. A determination is made at functional block 154 whether other data is present in the input stream from which emotional content may be derived. If it is, the score of the language portion may be modified based on that additional data, or the score may have tags appended to it reflecting the emotional weight of that additional data, at functional block 156. In either case, the score is forwarded to the context filter at functional block 158, after which the emotional scoring filter has completed its function for the input/output pair.
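A toy stand-in for this scoring step is sketched below. The disclosure's filter is modeled on Gottschalk-style clause analysis; the word lexicon, weights, and tag format here are purely illustrative assumptions and not the Gottschalk method itself.

```python
# Toy lexicon-based scorer with optional biometric tags; all values are assumptions.
EMOTIONS = ("fear", "anger", "joy", "sadness", "disgust", "disdain")

LEXICON = {
    "hate": {"anger": 0.8, "disgust": 0.4},
    "love": {"joy": 0.7},
    "afraid": {"fear": 0.9},
    "terrible": {"sadness": 0.5, "anger": 0.3},
}

def score_clause(clause):
    score = dict.fromkeys(EMOTIONS, 0.0)
    for word in clause.lower().split():
        for emotion, weight in LEXICON.get(word.strip(".,!?"), {}).items():
            score[emotion] += weight
    return score

def score_input(clauses, biometric_tags=None):
    """Score each clause (block 152), then append non-language indicators as tags (block 156)."""
    total = dict.fromkeys(EMOTIONS, 0.0)
    for clause in clauses:
        for emotion, value in score_clause(clause).items():
            total[emotion] += value
    return {"score": total, "tags": biometric_tags or {}}

print(score_input(["I hate waiting", "this is terrible"], {"voice_stress": "high"}))
```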
Figure 4 is a flow chart of operation of the context filter in one embodiment of the invention. At functional block 200, the context filter receives the input concept and emotional score. At functional block 202, the context filter identifies the conversation thread corresponding to the input concept. It is possible that a user may carry on multiple conversational threads with the system, and tracking the thread to which the particular input concept applies is important to providing an appropriate contextualized output. A determination is made at decision block 204 whether the thread has been identified. If no thread has been identified matching the input concept, a new thread is deemed created at functional block 206. If a thread is identified or after a new thread is created, the context filter then defines the adjustment for emotional state variables based on the emotional score at functional block 208. Then at functional block 210, the adjustment is sent to the emotional state machine. The current emotional state is returned to the context filter at functional block 212. Then at functional block 214, the input concept and state information are forwarded for generation of a corresponding output concept. The input concept and any output concept corresponding thereto are stored in a neural network table at functional block 216. This has several advantages inasmuch as it permits follow-up questions because history is maintained. Also, over time, an extensive dataset will be developed that may be data mined to teach other animated characters natural language skills. At functional block 218, the context filter provides the voice output subsystem with the then existing emotional context. Similarly, at functional block 220, the context filter provides the emotional context to the graphical animation subsystem which includes both facial animation and body animation.
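A skeleton of this cycle, under stated assumptions, might look like the following. The class and method names, the thread-matching rule, and the adjustment scaling are all placeholders; only the ordering of steps follows the blocks above.

```python
# Skeleton of the context-filter cycle (blocks 200-216); names and rules are assumptions.
class ContextFilter:
    def __init__(self, state_machine, output_generator):
        self.state_machine = state_machine
        self.output_generator = output_generator
        self.threads = {}     # thread id -> list of (input concept, output concept)
        self.history = []     # the "neural network table" of I/O pairs (block 216)

    def identify_thread(self, concept):
        for tid, pairs in self.threads.items():          # block 202: placeholder matching rule
            if pairs and pairs[-1][0][:4] == concept[:4]:
                return tid
        tid = len(self.threads)                          # blocks 204/206: create a new thread
        self.threads[tid] = []
        return tid

    def handle(self, input_concept, emotional_score):
        tid = self.identify_thread(input_concept)
        adjustment = {k: v * 0.5 for k, v in emotional_score.items()}          # block 208
        current_state = self.state_machine.adjust(adjustment)                  # blocks 210/212
        output_concept = self.output_generator(input_concept, current_state)   # block 214
        self.threads[tid].append((input_concept, output_concept))
        self.history.append((input_concept, output_concept))
        return output_concept, current_state

class DummyStateMachine:
    def adjust(self, adjustment):
        return {"anger": adjustment.get("anger", 0.0)}

cf = ContextFilter(DummyStateMachine(), lambda concept, state: "Reply:" + concept)
print(cf.handle("WhatContext", {"anger": 0.2}))
```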
Figure 5 is a flow chart of the operation of a continuous emotional state machine of one embodiment of the invention. At functional block 250, the state machine receives state variable adjustments, if any, from the context filter. Then at functional block 252, it adjusts those state variables based on the adjustments received and any decay which has occurred since the previous cycle. Then at functional block 234, it updates the current emotional state in the context filter to reflect the current emotional state after any adjustments or decay. In one embodiment, the emotional state is determined using match vectors. As adjustments (either positive or negative) are applied to the various defined emotions, the value of that emotion's contribution to the emotional state of the system changes. The aggregate of the various values of the defined emotions is compared with existing match vectors to yield the current emotional state.
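The match-vector comparison can be illustrated with a short sketch. The vectors, emotion values, and the use of Euclidean distance as the similarity measure are assumptions; the disclosure specifies only that the aggregate of emotion values is compared against idealized match vectors to yield the closest match.

```python
import math

# Illustrative match-vector lookup; vectors and values are assumptions.
EMOTIONS = ("fear", "joy", "arrogance", "disgust", "sadness", "anger", "coldness")

MATCH_VECTORS = {
    "angry_response": {"anger": 0.9, "disgust": 0.4},
    "joyful_response": {"joy": 0.9},
    "neutral": {},
}

def as_vector(values):
    return [values.get(e, 0.0) for e in EMOTIONS]

def closest_match(state):
    current = as_vector(state)
    # The closest idealized vector names the current emotional state.
    return min(MATCH_VECTORS,
               key=lambda name: math.dist(current, as_vector(MATCH_VECTORS[name])))

print(closest_match({"anger": 0.8, "disgust": 0.3}))   # -> angry_response
```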
Figure 6 is a flow chart of discrete state emotion suppression in one embodiment of the invention. At functional block 270, a signal to enter the discrete state is received. In response to entering the discrete state, the context filter is signaled to suppress one or more emotions, either partially or totally, at functional block 272. Then, at functional block 274, a predefined software function is performed. Upon completion of the predefined software function, or if the triggering event otherwise ceases, the system leaves the discrete state and emotion suppression is discontinued.
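A minimal sketch of this behavior, under stated assumptions, follows. The suppression policy, the set of suppressed emotions, and the example function are illustrative only.

```python
# Minimal sketch of discrete-state behavior (blocks 270-274); names and policy are assumptions.
def run_discrete_state(current_state, predefined_function, suppress=("anger", "fear")):
    saved = dict(current_state)
    for emotion in suppress:             # partial or total suppression (block 272)
        current_state[emotion] = 0.0
    try:
        result = predefined_function()   # e.g. a credit card verification step (block 274)
    finally:
        current_state.update(saved)      # leave the discrete state; emotions resume
    return result

state = {"anger": 0.7, "fear": 0.2, "joy": 0.1}
print(run_discrete_state(state, lambda: "sale completed"), state)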
Figure 7 is a flow chart of output concept dialogue generation of one embodiment of the invention. At functional block 300, the adaptive filter receives the input concept and the state information from the context filter. At functional block 302, the adaptive filter assesses user understanding and conversational thread progression from the state information. The adaptive filter then accesses an output concept database for an output concept which is consistent with the input concept and the assessment of understanding at functional block 304. If an output concept is not found at decision block 306, an indication of the number of I/O pairs during which no match was found is incremented at functional block 308. A determination is then made at decision block 310 whether the number of times no match was found exceeds the threshold. If the threshold has been exceeded, the adaptive filter enters an adaptive testing mode at functional block 314. In this adaptive testing mode, the adaptive filter formulates questions which steer the user back into a discourse in which matches may be found. By way of example, a first question may be an open-ended question. If the user's response to the open-ended question returns to a thread which permits matching, the adaptive test is over. If the system is unable to make sense of the user response, the focus of the question will be narrowed on consecutive questions until a bi-conditional response is called for. In this manner, it is expected that the adaptive testing procedure will permit the system to move the user back into a conversational context suitable to the system. If, at decision block 310, the no-match count has not reached the threshold, the system will generate a guiding response to try to guide the user back into the context, or possibly make some kind of general comment relevant to the topic. The system will not yet assume control of the conversation, as in the case of the adaptive test.
After an output concept is found, either at decision block 306 or through the guiding response or adaptive test, the output concept is checked against the idiom database to generate a suitable dialogue string corresponding to the conversational context and the output concept at functional block 316. At functional block 318, the generated dialogue is sent to the voice expression module and the facial animation module.
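The no-match escalation in Figure 7 can be illustrated with the sketch below. The threshold value, the canned questions, and the concept table are assumptions; only the escalation order (guiding response, then progressively narrower adaptive-test questions) follows the description.

```python
# Sketch of the Figure 7 no-match escalation; threshold, questions, and table are assumptions.
NO_MATCH_THRESHOLD = 3

class AdaptiveFilter:
    def __init__(self, output_concepts):
        self.output_concepts = output_concepts   # input concept -> output concept
        self.no_match_count = 0
        # Questions narrow from open-ended toward a bi-conditional (yes/no) form.
        self.adaptive_questions = [
            "What would you like to talk about?",
            "Are you asking about a product or about your account?",
            "Do you want to place an order, yes or no?",
        ]

    def respond(self, input_concept):
        match = self.output_concepts.get(input_concept)   # block 304
        if match is not None:
            self.no_match_count = 0
            return match
        self.no_match_count += 1                          # block 308
        if self.no_match_count >= NO_MATCH_THRESHOLD:     # block 310 -> adaptive test (314)
            index = min(self.no_match_count - NO_MATCH_THRESHOLD,
                        len(self.adaptive_questions) - 1)
            return self.adaptive_questions[index]
        return "GuidingResponse"                          # guide without taking control

af = AdaptiveFilter({"WhatContext": "ExplainContext"})
for concept in ("Gibberish1", "Gibberish2", "Gibberish3", "WhatContext"):
    print(af.respond(concept))
```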
Figure 8 is a flow chart of operation of the vocal expression module in one embodiment of the invention. The vocal expression module receives the dialogue and state information at functional block 340. Using the state information, the vocal expression module establishes a pitch, speed, and volume map to apply to the dialogue output to impart emotional and semantic content. By way of example, if the output is a question, the pitch may be elevated at the end of the question. If the emotional state is happy, pitch and speed may be adjusted to effect a lilting vocal output. The pitch, speed, and volume map may be established using library lookups to find the closest matches to the existing state. Additionally, each variable within the map may be randomized within a range. This randomization reduces the machine-like quality that might otherwise result. The pitch, speed, and volume map is forwarded to the facial animation module at functional block 344. Then, at functional block 346, the dialogue is synthesized or played from pre-recorded voice recordings in synchronization with the facial animation using the pitch, speed, and volume map.
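A sketch of such a pitch/speed/volume (PSV) map follows. The library values, jitter range, and emotion names are assumptions; the disclosure says only that library lookups are used and that each variable may be randomized within a range.

```python
import random

# Illustrative PSV mapping; library values, jitter, and emotion names are assumptions.
PSV_LIBRARY = {
    "happy":   {"pitch": 1.15, "speed": 1.10, "volume": 1.00},
    "sad":     {"pitch": 0.90, "speed": 0.85, "volume": 0.80},
    "neutral": {"pitch": 1.00, "speed": 1.00, "volume": 1.00},
}

def build_psv_map(emotional_state, is_question, jitter=0.05):
    base = dict(PSV_LIBRARY.get(emotional_state, PSV_LIBRARY["neutral"]))
    for key in base:                         # randomize to reduce a machine-like quality
        base[key] *= random.uniform(1.0 - jitter, 1.0 + jitter)
    if is_question:
        base["end_pitch"] = base["pitch"] * 1.2   # elevate pitch at the end of a question
    return base

print(build_psv_map("happy", is_question=True))
```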
Figure 9 is a flow chart of output generation in the facial animation module of one embodiment of the invention. At functional block 370, the facial animation module receives the state information, the dialogue, and the pitch, speed, and volume (PSV) map. At functional block 372, the facial vertices are adjusted to reflect the dominant emotion reflected in the state information. At functional block 374, the movement is synchronized with the dialogue and the PSV map.
In one embodiment, the face is based on a face template with a fixed number of vertices. When modifying a copy of the template, vertices may be moved but not detached, reattached, deleted, or inserted; the order of vertices as they appear in the vertex list and the parts of the face that they represent must be maintained. The fixed length vertex list describing the face covers the area from the hairline of the forehead to the whole jaw, including the top of the neck and excluding the ears. The ears will be part of the rigid head geometry and will be represented by texturing a simple shape.
The system employs a vertex deformation protocol to deform vertices of the face to create different facial expressions on the fly. Rather than create key frames on a time line, the system algorithmically interpolates between morph targets.
The facial art consists of the following morph targets: Target N, a neutral face that would be characteristic of listening to someone else speaking; Target A, the prototypical face of angry disgust for this character with teeth clenched; and Target B, the sleepy yawn face with eyes closed. It is expected that targets N, A, and B will be sufficient for recombining into all of the various facial expressions to be represented. Additional targets may be added as needed.
Contiguous ranges of vertices within the fixed length vertex list will be assigned to distinct facial features. This will allow facial animation to be divided into subchannels. These subchannels will, for example, allow 80% of the left eye from target B to be combined with 50% of the entire target N and 10% of the entire target A. By defining different morph formulae to correspond to various emotional states, an appropriate facial animation of a dominant state is achieved on the fly.
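The subchannel blend described above (80% of the left eye from target B with 50% of all of target N and 10% of all of target A) might be sketched as follows. The vertex count, feature ranges, and random target data are assumptions, and the per-vertex weight normalization is one possible interpretation of the morph formula, not the patent's stated method.

```python
import numpy as np

# Illustrative subchannel blend of morph targets; sizes, ranges, and data are assumptions.
NUM_VERTICES = 12                                   # stand-in for the fixed-length vertex list
FEATURE_RANGES = {"left_eye": range(0, 4), "all": range(0, NUM_VERTICES)}

def blend_face(targets, formula):
    """targets: name -> (NUM_VERTICES, 3) array; formula: list of (target, feature, weight)."""
    blended = np.zeros((NUM_VERTICES, 3))
    weight_sum = np.zeros((NUM_VERTICES, 1))
    for target, feature, weight in formula:
        idx = list(FEATURE_RANGES[feature])
        blended[idx] += weight * targets[target][idx]
        weight_sum[idx] += weight
    return blended / np.maximum(weight_sum, 1e-9)   # normalize per-vertex weights

targets = {name: np.random.rand(NUM_VERTICES, 3) for name in ("N", "A", "B")}
morph_formula = [("B", "left_eye", 0.8), ("N", "all", 0.5), ("A", "all", 0.1)]
print(blend_face(targets, morph_formula).shape)     # -> (12, 3)
```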
Figure 10 is a flow chart of the body animation module of one embodiment of the invention. At functional block 400, the body animation module receives the current state information and action required. At functional block 402, the body of the animated character assumes the posture of the dominant emotion. At functional block 404, the body animation module accesses an action library to identify details of any action required. The action library is created using motion capture for actions for which such techniques are viable. Smaller actions are defined by key frame animation. In either case, a library lookup permits the character to synthesize motion in real time. At functional block 406, the body is animated to perform the action.
The body animation module controls bodily movements. That module tells body parts how to move in response to commands from the context filter. In one embodiment, 3D character modeling constraints include: (i) all models must be created near the 3D origin (0, 0, 0); (ii) whole units of distance must approximate inches; the default camera must be from the perspective of a child looking slightly up in a positive Z direction at an adult standing at the origin; the child is standing about 100 inches away in the negative Z; (iii) camera position (0, 50, -100); (iv) camera up vector (0, 1, 0); (v) camera look at (0, 60, 0); (vi) the camera frustum will approximate a 35 mm lens; and (vii) the head will be attached to a piece of body geometry named "neckcenter."
In one embodiment, the body parts of each animated character are subject to certain additional constraints. All animation of the characters, except for the faces, will be based on quaternions applied to a hierarchy of segmented meshes. The fingers are all slightly curved at the knuckles so that the only joint that needs to be animated is the joint at the base of the finger. There are three separate rigid meshes representing the fingers for each hand: one for each of the thumb and index finger, and a single rigid mesh that groups the remaining three fingers. This will provide a reasonable approximation of a wide range of finger and hand positions with a minimum number of polygons and joints.
The hierarchy of body parts provides a structure for the relative positioning of body parts, as shown in Table 1 below. To assist in diagnosing problems that may arise, all body parts will bear labels according to this standard. Each relationship between adjacent rigid components in the hierarchy is a joint to which quaternion animation can be applied. The position and animation of each body part is computed relative to the part immediately above it in the hierarchy. Every other part is positioned directly or indirectly relative to the pelvis. An invisible frame of reference with its origin on the ground between the feet determines the position of the pelvis. The origin should appear at the world origin, or position x = 0, y = 0, z = 0. (A simplified sketch of this relative positioning follows Table 1.)
Table 1
Character Root Node - origin on ground between feet, no geometry
Furniture Node - any 3D geometry that the character may interact with (e.g. a chair)
Parent Node - mesh - the parent is located at the center of the pelvis - it should not be confused with the pelvis, which may move independently
Pelvis Node - mesh
Torso Node - mesh
Neck Node - mesh
Head Node - mesh including ears, excluding face
Face Node - deformations of fixed length vertex list
Eyes Node - attached to head
Upper Teeth - attached to head
Headwear - i.e. glasses, visors, hats - attached to head
Left Upper Arm Node - mesh
Left Forearm Node - mesh
Left Hand Node - mesh
Left Thumb Node - mesh
Left Index Finger Node - mesh
Left Minor Finger Group Node - mesh
Right Upper Arm Node - mesh
Right Forearm Node - mesh
Right Hand Node - mesh
Right Thumb Node - mesh
Right Index Finger Node - mesh
Right Minor Finger Group Node - mesh
Left Thigh Node - mesh
Left Lower Leg Node - mesh
Left Foot Node - mesh
Right Thigh Node - mesh
Right Lower Leg Node - mesh
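The sketch below illustrates the relative positioning just described: each node's position is computed from its parent, so every part is positioned directly or indirectly relative to the pelvis and, ultimately, the root frame between the feet. The offsets (in inches), the node subset, and the omission of quaternion composition are illustrative simplifications, not the patent's animation engine.

```python
import numpy as np

# Simplified sketch of hierarchical, parent-relative positioning; values are assumptions.
def quat_rotate(q, v):
    """Rotate vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = np.array([x, y, z])
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

IDENTITY = (1.0, 0.0, 0.0, 0.0)

class Node:
    def __init__(self, name, offset, rotation=IDENTITY):
        self.name = name
        self.offset = np.array(offset, dtype=float)   # offset relative to the parent
        self.rotation = rotation                      # joint quaternion relative to the parent
        self.children = []

    def world_positions(self, parent_pos=(0.0, 0.0, 0.0), parent_rot=IDENTITY):
        pos = np.array(parent_pos) + quat_rotate(parent_rot, self.offset)
        positions = {self.name: pos}
        for child in self.children:
            # Quaternion composition with parent_rot is omitted in this sketch.
            positions.update(child.world_positions(pos, self.rotation))
        return positions

root = Node("character_root", (0, 0, 0))     # origin on the ground between the feet
pelvis = Node("pelvis", (0, 40, 0))
torso = Node("torso", (0, 20, 0))
root.children.append(pelvis)
pelvis.children.append(torso)
print(root.world_positions())
```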
While each of the above-described flow charts sets forth a particular ordering, it is within the scope and contemplation of the invention that the ordering of certain decisions and/or functional blocks may occur in parallel or in an order other than specified. These flow charts are merely exemplary, and other orderings are within the scope and contemplation of the invention.
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes can be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. Therefore, the scope of the invention should be limited only by the appended claims.

Claims

CLAIMS
What is claimed is:
1. An apparatus comprising: an input channel receiving an input concept and an indication of an input emotional state from an endpoint; a first state machine to adjust an output emotional state responsive to the input concept and the indication of input emotional state, the state machine to force a selection of an output based on the output emotional state and the input concept; and an output channel to return an output to the endpoint.
2. The apparatus of claim 1 further comprising: a second state machine continually defining an output emotional state based on adjustments from the first state machine.
3. The apparatus of claim 1 wherein the output emotional state has a default value, the output emotional state decaying to that default value over time in the absence of an emotionally modifying input.
4. The apparatus of claim 1 further comprising: an animation module coupled to the first state machine; an adaptive filter coupled to the first state machine; and a database of output concepts coupled to the adaptive filter.
5. The apparatus of claim 1 further comprising: a current state register; and a register holding at least one trigger point that triggers entry into a discrete state.
6. The apparatus of claim 2 wherein a match vector algorithm is used to determine a current state.
7. The apparatus of claim 1 further comprising: a neural network table storing historical input/output pairs.
8. A method comprising: receiving an input concept indicating semantic content of a user input; receiving an indication of emotional content of the user input; adjusting a state variable to model an emotional shift caused by the user input; and formulating an output responding to the user input consistent with a then existing emotional state and the input concept.
9. The method of claim 8 further comprising: defining a default emotional state; and decaying a plurality of components contributing to the emotional state towards the default emotional state.
10. The method of claim 8 wherein formulating an output comprises: identifying an output concept that matches the input concept; translating the output concept into an idiom selected based on a plurality of criteria, including the then existing emotional state.
11. The method of claim 8 further comprising: maintaining a history of input/output pairs.
12. The method of claim 11 wherein the listing is maintained as a neural net further comprising: data mining the history to automatically encode learned behavior in an animated character.
13. The method of claim 10 further comprising: providing current state information to a facial animation module causing the facial animation module to form an expression on an animated face consistent with the then existing emotional state.
14. A computer readable storage media containing executable computer program instructions which when executed cause a digital processing system to perform a method comprising: receiving an input concept indicating semantic content of a user input; receiving an indication of emotional content of the user input; adjusting a state variable to model an emotional shift caused by the user input; and formulating an output responding to the user input consistent with a then existing emotional state and the concept.
15. The computer readable storage media of claim 14 which when executed cause the digital processing system to perform the method further comprising: defining a system default emotional state; and decaying a plurality of components contributing to the emotional state towards the default emotional state.
16. The computer readable storage media of claim 14 which when executed cause the digital processing system to perform the method further comprising: identifying an output concept that matches the input concept; translating the output concept into an idiom selected based on a plurality of criteria, including the then existing emotional state.
17. The computer readable storage media of claim 14 which when executed cause the digital processing system to perform the method further comprising: maintaining a history of input/output pairs.
18. The computer readable storage media of claim 13 which when executed cause the digital processing system to perform the method further comprising: data mining the history to automatically encode learned behavior in an animated character.
PCT/US2000/032305 1999-11-22 2000-11-21 An apparatus and method for determining emotional and conceptual context from a user input WO2001038959A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU16288/01A AU1628801A (en) 1999-11-22 2000-11-21 An apparatus and method for determining emotional and conceptual context from a user input

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US44453499A 1999-11-22 1999-11-22
US09/444,534 1999-11-22

Publications (2)

Publication Number Publication Date
WO2001038959A2 true WO2001038959A2 (en) 2001-05-31
WO2001038959A3 WO2001038959A3 (en) 2002-01-10

Family

ID=23765319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/032305 WO2001038959A2 (en) 1999-11-22 2000-11-21 An apparatus and method for determining emotional and conceptual context from a user input

Country Status (2)

Country Link
AU (1) AU1628801A (en)
WO (1) WO2001038959A2 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10154423A1 (en) * 2001-11-06 2003-05-15 Deutsche Telekom Ag Speech controlled interface for accessing an information or computer system in which a digital assistant analyses user input and its own output so that it can be personalized to match user requirements
DE102004001801A1 (en) * 2004-01-05 2005-07-28 Deutsche Telekom Ag System and process for the dialog between man and machine considers human emotion for its automatic answers or reaction
DE102010012427A1 (en) * 2010-03-23 2011-09-29 Zoobe Gmbh Method for assigning speech characteristics to motion patterns
US20130234933A1 (en) * 2011-08-26 2013-09-12 Reincloud Corporation Coherent presentation of multiple reality and interaction models
WO2015041668A1 (en) * 2013-09-20 2015-03-26 Intel Corporation Machine learning-based user behavior characterization
CN106055662A (en) * 2016-06-02 2016-10-26 竹间智能科技(上海)有限公司 Emotion-based intelligent conversation method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5577165A (en) * 1991-11-18 1996-11-19 Kabushiki Kaisha Toshiba Speech dialogue system for facilitating improved human-computer interaction
US5918222A (en) * 1995-03-17 1999-06-29 Kabushiki Kaisha Toshiba Information disclosing apparatus and multi-modal information input/output system
US5974262A (en) * 1997-08-15 1999-10-26 Fuller Research Corporation System for generating output based on involuntary and voluntary user input without providing output information to induce user to alter involuntary input

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5577165A (en) * 1991-11-18 1996-11-19 Kabushiki Kaisha Toshiba Speech dialogue system for facilitating improved human-computer interaction
US5918222A (en) * 1995-03-17 1999-06-29 Kabushiki Kaisha Toshiba Information disclosing apparatus and multi-modal information input/output system
US5974262A (en) * 1997-08-15 1999-10-26 Fuller Research Corporation System for generating output based on involuntary and voluntary user input without providing output information to induce user to alter involuntary input

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10154423A1 (en) * 2001-11-06 2003-05-15 Deutsche Telekom Ag Speech controlled interface for accessing an information or computer system in which a digital assistant analyses user input and its own output so that it can be personalized to match user requirements
DE102004001801A1 (en) * 2004-01-05 2005-07-28 Deutsche Telekom Ag System and process for the dialog between man and machine considers human emotion for its automatic answers or reaction
DE102010012427A1 (en) * 2010-03-23 2011-09-29 Zoobe Gmbh Method for assigning speech characteristics to motion patterns
DE102010012427B4 (en) * 2010-03-23 2014-04-24 Zoobe Gmbh Method for assigning speech characteristics to motion patterns
US20130234933A1 (en) * 2011-08-26 2013-09-12 Reincloud Corporation Coherent presentation of multiple reality and interaction models
WO2015041668A1 (en) * 2013-09-20 2015-03-26 Intel Corporation Machine learning-based user behavior characterization
CN106055662A (en) * 2016-06-02 2016-10-26 竹间智能科技(上海)有限公司 Emotion-based intelligent conversation method and system

Also Published As

Publication number Publication date
AU1628801A (en) 2001-06-04
WO2001038959A3 (en) 2002-01-10

Similar Documents

Publication Publication Date Title
JP7022062B2 (en) VPA with integrated object recognition and facial expression recognition
Ball et al. Emotion and personality in a conversational agent
Poggi et al. Eye communication in a conversational 3D synthetic agent
Pelachaud et al. Linguistic issues in facial animation
US6731307B1 (en) User interface/entertainment device that simulates personal interaction and responds to user's mental state and/or personality
Levelt The architecture of normal spoken language use
US6721706B1 (en) Environment-responsive user interface/entertainment device that simulates personal interaction
US6795808B1 (en) User interface/entertainment device that simulates personal interaction and charges external database with relevant data
US6728679B1 (en) Self-updating user interface/entertainment device that simulates personal interaction
CN106653052A (en) Virtual human face animation generation method and device
KR101006191B1 (en) Emotion and Motion Extracting Method of Virtual Human
JP7352115B2 (en) Non-linguistic information generation device, non-linguistic information generation model learning device, non-linguistic information generation method, non-linguistic information generation model learning method and program
Gibbon et al. Audio-visual and multimodal speech-based systems
US20210005218A1 (en) Nonverbal information generation apparatus, method, and program
Hubal et al. Lessons learned in modeling schizophrenic and depressed responsive virtual humans for training
Gibet et al. High-level specification and animation of communicative gestures
WO2001038959A2 (en) An apparatus and method for determining emotional and conceptual context from a user input
Barbulescu et al. A generative audio-visual prosodic model for virtual actors
Poggi et al. Signals and meanings of gaze in animated faces
Gibet et al. Signing Avatars-Multimodal Challenges for Text-to-sign Generation
Mancini Multimodal distinctive behavior for expressive embodied conversational agents
Mairesse Learning to adapt in dialogue systems: data-driven models for personality recognition and generation.
US20230377238A1 (en) Autonomous animation in embodied agents
Amini Learning data-driven models of non-verbal behaviors for building rapport using an intelligent virtual agent
Ranhel et al. Guidelines for creating man-machine multimodal interfaces

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: COMMUNICATION UNDER RULE 69 EPC ( EPO FORM 1205A DATED 20/02/03 )

122 Ep: pct application non-entry in european phase