US20060122834A1 - Emotion detection device & method for use in distributed systems - Google Patents
- Publication number
- US20060122834A1 (application no. US 11/294,918)
- Authority
- US
- United States
- Prior art keywords
- emotion
- data
- speech
- prosodic
- utterance
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/011—Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
Definitions
- the invention relates to a system and an interactive method for detecting and processing prosodic elements of speech based user inputs and queries presented over a distributed network such as the Internet or local intranet.
- the system has particular applicability to such applications as remote learning, e-commerce, technical e-support services, Internet searching, etc.
- Prosody, the rhythmic and melodic qualities of speech that are used to convey emphasis, intent, attitude and semantic meaning, is a key component in the recovery of the speaker's communication and expression embedded in his or her speech utterance. Detection of prosody and emotional content in speech is known in the art, and is discussed for example in the following representative references which are incorporated by reference herein: U.S. Pat. No. 6,173,260 to Slaney; U.S. Pat. No. 6,496,799 to Pickering; U.S. Pat. No. 6,873,953 to Lenning; U.S. Publication No.
- An object of the present invention is to provide an improved system and method for overcoming the limitations of the prior art noted above;
- Another object of the present invention is to provide an improved system and method for formulating SQL queries that includes parameters based on user emotional content
- a further object of the present invention is to provide a speech and natural language recognition system that efficiently integrates a distributed prosody interpretation system with a natural language processing system, so that speech utterances can be quickly and accurately recognized based on literal content and user emotional state information;
- a related object of the present invention is to provide an efficient mechanism for training a prosody analyzer so that the latter can operate in real-time.
- a first aspect of the invention concerns a system and method for incorporating prosodic features while performing real-time speech recognition distributed across a client device and a server device.
- the SR process typically transfers speech data from an utterance to be recognized using a packet stream of extracted acoustic feature data including at least some cepstral coefficients.
- this aspect of the invention extracts prosodic features from the utterance to generate extracted prosodic data; transfers the extracted prosodic data with the extracted acoustic feature data to the server device; and recognizes an emotion state of a speaker of the utterance based on at least the extracted prosodic data. In this manner operations associated with recognition of prosodic features in the utterance are also distributed across the client device and server device.
- the operations are distributed across the client device and server device on a case-by-case basis.
- a parts-of-speech analyzer is also preferably included for identifying a first set of emotion cues based on evaluating a syntax structure of the utterance.
- a preferred embodiment includes a real-time classifier for identifying the emotion state based on the first set of emotion cues and a second set of emotion cues derived from the extracted prosodic data.
- the various operations/features can be implemented by one or more software routines executing on a processor (such as a microprocessor or DSP) or by dedicated hardware logic (e.g., an FPGA, an ASIC, PIA, etc.).
- a calibration routine can be stored and used on the client side or server side depending on the particular hardware and system configuration, performance requirements, etc.
- the extracted prosodic features can be varied according to the particular application, and can include data values which are related to one or more acoustic measures including one of PITCH, DURATION & ENERGY.
- the emotion state to be detected can be varied and can include for example at least one of STRESS & NON-STRESS; or CERTAINTY, UNCERTAINTY and/or DOUBT.
- a further aspect concerns a system and method for performing real-time emotion detection which performs the following steps: extracting selected acoustic features of a speech utterance; extracting syntactic cues relating to an emotion state of a speaker of the speech utterance; and classifying inputs from the prosody analyzer and the parts-of-speech analyzer and processing the same to output an emotion cue data value corresponding to the emotion state.
- Another aspect concerns a system/method training a real-time emotion detector which performs the following steps: presenting a series of questions to a first group of persons concerning a first topic (wherein the questions are configured to elicit a plurality of distinct emotion states from the first group of persons); recording a set of responses from the first group of persons to the series of questions; annotating the set of responses to include a corresponding emotion state; and training an emotion modeler based on the set of responses and corresponding emotion state annotations.
- an emotion modeler is adapted to be used in an emotion detector distributed between a client device and a server device.
- visual cues are also used to elicit the distinct emotion states.
- the annotations can be derived from Kappa statistics associated with a second group of reviewers.
- the emotion modeler can be transferred in electronic form to a client device or a server device, where it can be used to determine an emotion state of a speaker of an utterance.
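The agreement check behind the reviewer annotations can be sketched as follows. The source only says that annotations can be derived from "Kappa statistics"; the choice of Cohen's kappa for two reviewers, the function name `cohens_kappa`, and the sample labels are assumptions made for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement computed from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)

# Made-up annotations from two hypothetical reviewers.
reviewer_1 = ["CERTAINTY", "DOUBT", "UNCERTAINTY", "CERTAINTY"]
reviewer_2 = ["CERTAINTY", "DOUBT", "CERTAINTY", "CERTAINTY"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

A kappa near 1 indicates that reviewers agree well beyond chance, so the shared label could plausibly be kept as the annotation; how low-agreement responses are handled is not specified in the source.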
- Still a further aspect of the invention concerns a real-time emotion detector which includes: a prosody analyzer configured to extract selected acoustic features of a speech utterance; a parts-of-speech analyzer configured to extract syntactic cues relating to an emotion state of a speaker of the speech utterance; a classifier configured to receive inputs from the prosody analyzer and the parts-of-speech analyzer and process the same to output an emotion cue data value corresponding to the emotion state.
- the classifier is a trained Classification and Regression Tree classifier, which is trained with data obtained during an off-line training phase.
- the classifier uses a history file containing data values for emotion cues derived from a sample population of test subjects and using a set of sample utterances common to content associated with the real-time recognition system.
- emotion cue data value is in the form of a data variable suitable for inclusion within a SQL construct or some similar form of database query format.
- Systems employing the present invention can also use the emotion state to formulate a response by an interactive agent in a real-time natural language processing system.
- These interactive agents are found online, as well as in advanced interactive voice response systems which communicate over conventional phone lines with assistance from voice browsers, VXML formatted documents, etc.
- the interactive agent may be programmed to respond appropriately and control dialog content and/or a dialog sequence with a user of a speech recognition system in response to the emotion state. For example, callers who are confused or express doubt may be routed to another dialog module, or to a live operator.
- an emotion state can be used to control visual feedback presented to a user of the real-time speech recognition system.
- an emotion state can be used to control non-verbal audio feedback; for example, selection from potential “earcons” or hold music may be made in response to a detected emotion state.
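As a rough illustration of how an interactive agent might act on a detected emotion state, the sketch below maps states to dialog actions. The action names and the mapping itself are hypothetical; the source only says that confused or doubtful callers may be routed to another dialog module or a live operator.

```python
# Hypothetical mapping from a detected emotion state to a dialog action.
DIALOG_ACTIONS = {
    "CERTAINTY": "continue_dialog",
    "UNCERTAINTY": "route_to_clarification_module",
    "DOUBT": "route_to_live_operator",
}

def choose_dialog_action(emotion_state):
    """Return the next dialog action, defaulting to the normal flow."""
    return DIALOG_ACTIONS.get(emotion_state, "continue_dialog")
```

The same lookup pattern could select visual feedback or an earcon instead of a dialog module.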
- FIG. 1 is a block diagram of a preferred embodiment of an emotion analyzer distributed across a client/server computing architecture, and can be used as an interactive learning system, an e-commerce system, an e-support system, and the like;
- FIG. 2 illustrates a preferred embodiment of an emotion modeler and classifier of the present invention
- FIG. 4 is a diagram illustrating an activation - evaluation relationship implemented in preferred embodiments of the present invention.
- NLQS Natural Language Query System
- as shown in FIG. 3, the processing for NLQS 100 is generally distributed across a client side system 150, a data link 160, and a server-side system 180.
- These components are well known in the art, and in a preferred embodiment include a personal computer system 150 , an INTERNET connection 160 A, 160 B, and a larger scale computing system 180 .
- client-side system 150 could also be implemented as a computer peripheral, a PDA, as part of a cell-phone, as part of an INTERNET-adapted appliance, an INTERNET linked kiosk, etc.
- although an INTERNET connection is depicted for data link 160 A, it is apparent that any channel that is suitable for carrying data between client system 150 and server system 180 will suffice, including a wireless link, an RF link, an IR link, a LAN, and the like.
- server system 180 may be a single, large-scale system, or a collection of smaller systems interlinked to support a number of potential network users.
- the output of the partial processing done by SRE 155 is a set of speech vectors that are transmitted over communication channel 160 that links the user's machine or personal accessory to a server or servers via the INTERNET or a wireless gateway that is linked to the INTERNET as explained above.
- the partially processed speech signal data is handled by a server-side SRE 182 , which then outputs recognized speech text corresponding to the user's question.
- a text-to-query converter 184 formulates a suitable query that is used as input to a database processor 186 .
- database processor 186 locates and retrieves an appropriate answer using a customized SQL query from database 188 .
- a Natural Language Engine 190 facilitates structuring the query to database 188 . After a matching answer to the user's question is found, the former is transmitted in text form across data link 160 B, where it is converted into speech by text to speech engine 159 , and thus expressed as oral feedback by animated character agent 157 .
- the present invention features and incorporates cooperation between the following components:
- the key focus of this approach is to use the acoustic features extracted from representative speech samples as the mechanism for identifying the prosodic cues in real-time from a speech utterance and which can then be used to detect emotion states.
- Other components may be included herein without deviating from the scope of the present invention.
- An emotion modeler comprising the above implements the extraction of the speaker's emotion state, and uses the benefits from the optimization of the machine learning algorithms derived from the training session.
- the function of emotion detector 100 is to model the emotion state of the speaker. This model is derived preferably using the acoustic and syntactic properties of the speech utterance. Emotion is an integral component of human speech and prosody is the principal way it is communicated. Prosody, the rhythmic and melodic qualities of speech that are used to convey emphasis, intent, attitude and semantic meaning, is a key component in the recovery of the speaker's communication and expression embedded in a speech utterance.
- a key concept in emotion theory is the representation of emotion as a two-dimensional activation—evaluation space.
- the vertical axis represents the activation of the emotion state, i.e. its level of activity: exhilaration, for example, represents a high level of activation, whereas boredom involves a small amount of activation.
- the horizontal axis represents the evaluation of the emotion state, i.e. the feeling associated with it: happiness, for example, is very positive, whereas despair is very negative.
- Psychologists [see references 1, 2, 3, 4, 5 above] have long used this two dimensional circle to represent emotion states.
- the circumference of the circle defines the extreme limits of emotion intensity such as bliss, and the center of the circle is defined as the neutral point.
- Strong emotions such as those with high activation and very positive evaluation are represented on the periphery of the circle.
- An example of a strong emotion is exhilaration, an emotional state which is associated with very positive evaluation and high activation.
- Common emotions such as bored, angry etc. are placed within the circle at activation-evaluation coordinates calculated from values derived from tables published by Whissell referenced above.
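The activation-evaluation circle described above can be sketched as points in a unit circle, with intensity given by the distance from the neutral center. The coordinates below are placeholders chosen for illustration and are not the values from Whissell's tables; the function name and the [-1, 1] axis scaling are likewise assumptions.

```python
import math

def describe_emotion(evaluation, activation):
    """Return (intensity, angle in degrees) for a point in the unit circle.

    evaluation: horizontal axis, -1 (very negative) to +1 (very positive)
    activation: vertical axis, -1 (low activation) to +1 (high activation)
    """
    intensity = math.hypot(evaluation, activation)
    if intensity > 1.0 + 1e-9:
        raise ValueError("point lies outside the unit circle")
    angle = math.degrees(math.atan2(activation, evaluation)) % 360.0
    return intensity, angle

# Placeholder coordinates for illustration only; NOT Whissell's values.
EXAMPLE_POINTS = {
    "neutral": (0.0, 0.0),
    "exhilaration": (0.6, 0.8),
    "boredom": (-0.3, -0.6),
}
```

With these placeholder values, exhilaration sits on the periphery (intensity close to 1) in the positive, high-activation quadrant, while boredom falls inside the circle in the negative, low-activation quadrant.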
- Pitch: the fundamental frequency (F0) of a speech utterance is the acoustic correlate of pitch. It is considered to be one of the most important attributes in expressing and detecting emotion. For this we extract F0 and compute its mean, maximum, minimum, variance and standard deviation. In some applications, of course, it may not be necessary or desirable to compute all such variables, and in other instances it may be useful to use additional frequency components (or derivatives thereof).
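The pitch statistics listed above can be computed from an F0 contour roughly as follows. This is a minimal sketch: the contour is assumed to be a list of per-frame F0 values in Hz produced by some pitch tracker, and the convention that a value of 0 marks an unvoiced frame is an assumption, not something the source states.

```python
import statistics

def f0_statistics(f0_contour_hz):
    """Summarize an F0 contour; zero entries mark unvoiced frames and are skipped."""
    voiced = [f for f in f0_contour_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "mean": statistics.fmean(voiced),
        "max": max(voiced),
        "min": min(voiced),
        "variance": statistics.pvariance(voiced),
        "std": statistics.pstdev(voiced),
    }

# Made-up contour: two unvoiced frames around three voiced ones.
stats = f0_statistics([0.0, 180.0, 200.0, 220.0, 0.0])
```

Population variance is used here for simplicity; a sample variance (`statistics.variance`) would be an equally reasonable choice.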
- Duration: the duration of the syllables that make up the speech utterance is also an acoustic correlate from which an emotion cue can be extracted.
- for example, a long duration of a syllable may imply an emotional state corresponding to doubt (DOUBT), compared to the alternate emotional state of certainty (CERTAINTY), which in turn may be represented by a shorter time duration of the same syllable.
- An emotion modeler and classifier system 200 of the present invention is shown in FIG. 2.
- This system is trained with actual examples from test subjects to improve performance.
- This training data is generated based on Prosodic Feature Vectors calculated by a routine 230 .
- a data experiment is devised as follows: preferably a group of persons (i.e. in one preferred embodiment, students of a representative age comparable to the user group of students expected to use a natural language query system) is presented with a series of questions for which answers are to be articulated by the person. These questions are designed so that the expected elicited answers aided by visual cues exhibit emotions of CERTAINTY, UNCERTAINTY and DOUBT.
- questions that have obvious answers typically will have a response that is closely correlated to the emotion state of CERTAINTY, which can be ascribed to be present in more than 90% of the answers, whereas questions which are difficult will elicit answers of which the person is not sure and which therefore contain the UNCERTAINTY emotion, also in greater than 90% of the cases.
- the formulation of the questions can be performed using any of a variety of known techniques.
- Speech samples from these representative test subjects are recorded in a controlled environment—i.e. in a localized environment with low background noise.
- the speech is articulated by speakers speaking in different styles, but with emphasis on the styles that represent the intended emotion modes that each sample requires.
- the recordings are preferably saved as .wav files and analysis performed using a speech tool such as the Sony Sound Forge and open source speech tools such as PRAAT [11] speech analyzer and the Edinburgh Speech Tools [12].
- Other similar tools for achieving a similar result are clearly useable within the present invention. The analysis is discussed in the next section.
- the recorded speech samples are labeled using Tone and Break Indices (ToBI) [13] annotation, as illustrated in 210 ( FIG. 2 ), using the definitions and criteria for specific emotional states.
- ToBI is a widely used annotation system for speech intonational analysis; again, other annotation systems may be more appropriate for different applications.
- the emotion categories and criteria are as follows:
  Emotion | Description
  CERTAINTY | No disfluencies; fluent answer; high energy
  UNCERTAINTY | Disfluencies present; additional questions asked by the user re clarification - what is meant etc.
  DOUBT | Slower response; heavily disfluent; lower energy
- the emotion states described in the table above can be extended to include other emotion states.
- the CART decision tree algorithm 260 extends the decision tree method to handle numerical values and is less susceptible to noisy or missing data.
- CART Classification and Regression Tree
- the CART technique uses a combination of statistical learning and expert knowledge to construct binary decision trees, which are formulated as a set of yes-no questions about the features in the raw data. The best predictions based on the training data are stored in the leaf nodes of the CART.
- Another approach is to prune the tree—i.e. the tree is first grown out to a large size, then it is cut back or pruned to its best size.
- Other well-known approaches can also be used of course, and may vary from application to application.
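The way a CART-style tree chooses one yes-no question about a numeric feature can be sketched as below. Gini impurity is used as the split criterion because it is a common choice for classification trees; the source does not state which criterion its CART implementation uses, and the function names and sample data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Find the yes/no question 'value <= t?' with the lowest weighted Gini."""
    pairs = sorted(zip(values, labels))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Made-up syllable durations (seconds) with made-up labels.
durations = [0.12, 0.15, 0.18, 0.30, 0.35, 0.40]
labels = ["CERTAINTY"] * 3 + ["DOUBT"] * 3
threshold, impurity = best_threshold(durations, labels)
```

A full tree would apply this search recursively to each resulting subset and over every feature, then stop early or prune as described above.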
- the acoustic features (as described in a following section, Prosody Analysis) are extracted in 220.
- Prosodic Feature Vectors as described previously are formed in 221 .
- the raw data 290 for the Wagon CART is provided to the input of the Wagon CART.
- the output of the CART is sent to 250 .
- the optimization of the CART tree output results is done in 280 by comparing the CART results 270 with the ToBI labeled speech utterances of 210 . Once optimized, the trained CART trees are then outputted to 250 for later use.
- the emotion detector 100 is integrated with the NLQS system of the prior art ( FIG. 3 ). Specifically as shown in FIG. 1 , the emotion detector is preferably implemented in distributed configuration in which some functions reside at a client 110 , and other functions are at a server side 120 . As noted above, a speech recognition process is also distributed, so that a portion of speech operations is performed by hardware/software routines 115 . Like the NLQS distributed speech recognition process, a significant portion of the emotion modeling and detection is implemented at the client side by a prosody analyzer 118 . Data values that are extracted at the client side are transmitted to the server for incorporation in the SQL construct for the database query process, or incorporated in higher level logic of the dialog manager. In this way the turn-taking and control of the dialogue is significantly shaped by the emotion states extracted from the speaker's utterance.
- emotion detector 100 works in parallel with the speech recognition processes. It consists of three main sections:
- the outputs of the prosody analyzer 118 and the parts-of-speech analyzer 121 are fed preferably to a trained CART classifier 125 .
- This classifier 125 is trained with data obtained during the off-line training phase described previously.
- the data which populate the history file contained within the trained CART trees 250 represent data values for the emotion cues derived from the sample population of test subjects, using the sample utterances common to the content in question.
- the content would include tutoring materials; in other commercial applications the content will vary of course depending on the designs, objectives and nature of a vendor/operator's business.
- the prosody analysis, as noted above, is preferably based on three key acoustic features, Fundamental Frequency (F0), Amplitude (RMS) and Duration (DUR), extracted in real-time from the utterance. These features and derivatives of the features as described in Table 1 are used as inputs by the trained classifier 125. Again this is not intended to be an exhaustive list, and other prosodic parameters could be used in many applications.
- F0: Fundamental Frequency
- RMS: Amplitude
- DUR: Duration
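The amplitude feature can be illustrated with a simple root-mean-square computation over frames of audio samples. This is a sketch under stated assumptions: the samples are taken to be a plain list of numbers, and the non-overlapping framing and the function names are illustrative choices not given in the source.

```python
import math

def rms_amplitude(samples):
    """Root-mean-square amplitude of one frame of audio samples."""
    if not samples:
        raise ValueError("frame is empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def frame_rms(samples, frame_len):
    """RMS per non-overlapping frame; a trailing partial frame is dropped."""
    return [
        rms_amplitude(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
```

For example, `frame_rms(samples, 160)` would give one energy value per 10 ms frame for audio sampled at 16 kHz.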
- the calibration routine 130 uses a test utterance from which a baseline is computed for one or more acoustic features that are extracted by the prosody analysis block 118 .
- the test utterance includes a set of phrases, one of which contains significant stress or accent or other emotion indicator from which a large shift in the fundamental frequency (F0), or pitch, can be calculated. Other acoustic correlates such as amplitude and duration can also be calculated.
- This test utterance, as in the analogous case of the signal-to-noise ratio calibration of speech recognition routines, allows the system to automatically compute a calibration baseline for the emotion detector/modeler while taking into account other environmental variables.
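One way such a calibration baseline could be used is sketched below: the baseline is taken as the mean and standard deviation of voiced F0 in the test utterance, and later F0 values are expressed relative to it. The z-score normalization, the zero-means-unvoiced convention, and the function names are all assumptions; the source does not specify how the baseline is computed or applied.

```python
import statistics

def calibration_baseline(test_f0_hz):
    """Baseline (mean, standard deviation) of voiced F0 in the test utterance."""
    voiced = [f for f in test_f0_hz if f > 0]
    return statistics.fmean(voiced), statistics.pstdev(voiced)

def normalize_f0(f0_hz, baseline):
    """Express an F0 value as a z-score relative to the speaker's baseline."""
    mean, std = baseline
    if std == 0:
        return 0.0
    return (f0_hz - mean) / std

# Made-up test utterance contour and a later, strongly raised F0 value.
baseline = calibration_baseline([190.0, 200.0, 210.0, 0.0])
shift = normalize_f0(230.0, baseline)
```

Normalizing against a per-speaker baseline lets the same thresholds be applied to speakers with naturally higher or lower voices.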
- the parts-of-speech analysis from routine(s) 121 yields syntactic elements from which emotion cues can be derived.
- the acoustic analyzer routine(s) 118 in turn yields separate data values for the prosodic acoustic correlates which are also related to emotion cues. These two categories of data are then inputted to a decision tree 125 where the patterns are extracted to estimate the emotion state embedded in the speaker's utterance.
- the real-time classifier is preferably based on the Classification and Regression Tree (CART) procedure, a widely used decision tree-based approach for extracting and mining patterns from raw data.
- This procedure, introduced by Breiman, Friedman, Olshen and Stone in 1984, is basically a flow chart or diagram that represents a classification system or model. The tree is structured as a sequence of simple questions, and the answers to these questions trace a path down the tree. The end point reached determines the classification or prediction made by the model.
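The path-tracing behavior described above can be sketched with a small hand-written tree. The feature names, thresholds, and tree shape are invented for illustration and do not come from a trained model or from the source.

```python
# A tiny hand-written tree: internal nodes ask 'feature <= threshold?',
# leaves carry the predicted label. Thresholds are invented for illustration.
TREE = {
    "feature": "duration_s", "threshold": 0.25,
    "yes": {"label": "CERTAINTY"},
    "no": {
        "feature": "rms", "threshold": 0.4,
        "yes": {"label": "DOUBT"},
        "no": {"label": "UNCERTAINTY"},
    },
}

def classify(node, features):
    """Follow yes/no answers from the root until a leaf is reached."""
    while "label" not in node:
        answer = "yes" if features[node["feature"]] <= node["threshold"] else "no"
        node = node[answer]
    return node["label"]
```

Calling `classify(TREE, {"duration_s": 0.5, "rms": 0.3})` answers "no" to the first question and "yes" to the second, ending at the DOUBT leaf.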
- the emotion cue data value output from CART decision tree 125 can be in the form of a data variable suitable for inclusion within a SQL construct, such as illustrated in the aforementioned U.S. Pat. No. 6,165,172.
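Including the emotion cue as a variable in a SQL construct might look like the sketch below, which uses an in-memory SQLite database. The table name, columns, and sample rows are hypothetical; the source does not describe the schema of database 188 or the exact form of its queries.

```python
import sqlite3

# Hypothetical schema and rows, created in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (topic TEXT, emotion TEXT, answer TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("fractions", "CERTAINTY", "Correct. Let's move on."),
        ("fractions", "DOUBT", "Let's review the definition first."),
    ],
)

def find_answer(topic, emotion_cue):
    """Select an answer using the recognized topic and the emotion cue value."""
    row = conn.execute(
        "SELECT answer FROM answers WHERE topic = ? AND emotion = ?",
        (topic, emotion_cue),
    ).fetchone()
    return row[0] if row else None
```

Passing the emotion cue as a bound parameter keeps the query text fixed while the classifier output varies from utterance to utterance.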
- the detected emotion state can also be used by an interactive agent to formulate a response, control dialog content and/or a dialog sequence, control visual feedback presented to a user, control non-verbal audio feedback such as selecting one or more audio recordings, etc., and as such are correlated/associated with different user emotion states.
- the prosodic data is also preferably sent in a packet stream, which may or may not also include the extracted acoustic feature data for a speech recognition process, such as cepstral coefficients.
- the prosodic data and acoustic feature data are packaged within a common data stream to be sent to the server, but it may be desirable to separate the two into different sessions depending on channel conditions and capabilities of a server. For example the latter may not include a prosody capability, and therefore emotion detection may need to be facilitated by a separate server device.
- the prosodic data and acoustic feature data can be transmitted using different priorities. For example, if for a particular application prosodic data is more critical, then computations for prosodic data can be accelerated and a higher priority given to packets of such data in a distributed environment. In some instances, because of the nature of the data communicated, it may be desirable to format a prosodic data packet with a different payload than a corresponding speech recognition data packet (such as an MFCC packet sent via an RTP protocol). Other examples will be apparent to those skilled in the art.
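A distinct payload for prosodic data versus speech recognition features could be sketched as below. The byte layout (a 1-byte type tag, a 2-byte value count, then 32-bit floats) is entirely hypothetical and is not the RTP or MRCP wire format; it only illustrates tagging packets so a server can tell the two kinds of data apart.

```python
import struct

# Hypothetical layout: 1-byte type tag, 2-byte value count, then float32
# values, all in network byte order. Not the RTP or MRCP wire format.
TYPE_CEPSTRAL = 1
TYPE_PROSODIC = 2

def pack_features(packet_type, values):
    """Serialize a typed list of feature values into bytes."""
    header = struct.pack("!BH", packet_type, len(values))
    return header + struct.pack(f"!{len(values)}f", *values)

def unpack_features(payload):
    """Recover the type tag and feature values from a packed payload."""
    packet_type, count = struct.unpack_from("!BH", payload, 0)
    values = struct.unpack_from(f"!{count}f", payload, 3)
    return packet_type, list(values)
```

A receiver could inspect the type tag first and hand prosodic packets to a higher-priority queue, or forward them to a separate emotion detection server.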
- emotion detection/prosodic analysis operations can be distributed across the client device and server device on a case-by-case basis to achieve real-time performance, and configured during an initialization procedure (such as within an MRCP type protocol).
- An amount of prosodic data to be transferred to said server device can be determined on a case-by-case basis in accordance with one or more of the following parameters: a) computational capabilities of the respective devices; b) communications capability of a network coupling the respective devices; c) loading of said server device; d) a performance requirement of a speech recognition task associated with a user.
- This project will yield a computer-based spoken language training system that will begin to approximate the benefits provided by a one-on-one tutor-student session. This system will decrease the costs of tutoring as well as help compensate for the lack of human (tutor) resources in a broad set of educational settings.
- a successful training system will be able to tap into a large commercial training market as well as adults with retraining needs that result from technology or process changes in the workplace, other employment dislocations or career changes; students with learning disabilities and remedial needs; and students who are working on advanced topics beyond the scope of assistance available in their classroom.
- the prosodic information contained in speech can be extracted automatically and used to assist the computer in guiding the tutoring dialog.
- enhanced dialog system capabilities incorporating semantic and prosodic understanding, will here be designed and constructed to enable an intelligent tutoring system to simulate the responsive and efficient techniques associated with good one-on-one human tutoring and to thereby approach an optimal learning environment.
- the mnemonic ITS is used in the literature to signify an Intelligent Tutoring System. In the context of this proposal, ITS is used to refer to our proposed system—an Interactive Training System. We also wish to clarify that training involves tutoring on a one-on-one basis or in a classroom setting.
- the challenge then for a computer-based tutoring system such as the one proposed is to emulate the desirable human one-on-one tutoring environment.
- Human one-on-one tutors interact with students via natural language—they prompt to construct knowledge, give explanations, and assess the student's understanding of the lesson. Most importantly, tutors give and receive additional linguistic cues during the lesson about how the dialogue is progressing.
- the cues received by the one-on-one tutor give the tutor information about the student's understanding of the material and allow the tutor to determine when a tutoring strategy is working or is not working. Natural language therefore is an important modality for the student/one-on-one tutor environment.
- a further important requirement is that the system must not ignore signs of confusion or misconception as the presentation evolves.
- the interactive training system like its human counterpart, must detect and understand cues contained in the student's dialogue and be able to alter or tailor its response and its tutoring strategies.
- Published results on cognition and spoken dialog indicate that human tutors rely on subtle content present in the student's dialog to guide their participation in a meaningful, enjoyable and effective dialogue, thus augmenting the student's learning performance.
- Tsukahara and Ward have described systems which infer the user's internal state, such as feelings of confidence, confusion and pleasure, from the prosody of the user's utterances and the context, and use this information to select the most appropriate acknowledgement form at each moment.
- the interactive training system in its goal of emulating a human tutor must identify the emotional states contained in the student's dialogue and apply it to the dialogue management in such a way that the specific task at hand—reinforcement of a concept, or spotting a misconception, or evaluation of progress—can be accomplished in real-time during the course of the dialog.
- More recently Litman [Litman, Forbes-Riley, 2004] reports that acoustic-prosodic and lexical features can be used to identify and predict student emotion in computer-human tutoring dialogs. They identify simple two-way (emotion/non-emotion) and three-way (negative/neutral/positive) classification schemes. Additionally, other researchers [Holzapfel et al, 2002] have also explored the use of emotions for dialog management strategies that assist in minimizing the misunderstanding of the user and thus improve user acceptance.
- prosody is generally used to refer to aspects of a sentence's pronunciation which are not described by the sequence of phones derived from the lexicon. It includes the whole class of variations in voice pitch and intensity that have linguistic functions.
- the vision that guides this research proposal is the goal of creating a spoken language interactive training system that mimics and captures the strategies of a one-on-one human tutor, because learning gains have been shown to be high for students tutored in this fashion.
- the dialog manager for the proposed interactive training system must therefore be designed to accommodate the unique requirements specific to the tutoring domain. This design stands in contrast to currently existing dialog management strategies for an information-type domain which have discourse plans that are either elaborate or based on form-filling or finite state machine approaches.
- the dialog manager for the proposed ITS must combine low-level responsive dialogue activities with its high level educational plan. Put another way, the dialog manager for our ITS must interweave high-level tutorial strategy with on-the-fly adaptive planning.
- The detection of emotion in the student's utterances is important for the tutorial domain because detecting a negative emotion (such as confusion, boredom, irritation or intimidation) or, conversely, a positive state (such as confidence or enthusiasm) in the student can allow the system to provide a more appropriate response, thus better emulating the human one-on-one tutor environment [Forbes-Riley, Litman, 2004].
- each cognitive-based agent will be assigned a task or function such as assessing the student's performance, or creating a profile or characterization of the student before and after the lesson;
- CARMEL Core component for Assessing the Meaning of Explanatory Language is a language understanding framework for intelligent developed by the CIRCLE group—a joint center between Univ. of Pittsburgh and CMU and funded by the National Science Foundation.
- OAA Open Agent Architecture is an open framework developed by the Stanford Research Institute for integrating the various components that comprise a software system such as the proposed spoken language ITS. Specifically it is a piece of middleware that supports C++, Java, Lisp and Prolog and enables one to rapidly prototype components into a system.
- Objective 1 To implement an algorithm for real time prosody modeling based on the prosodic characteristics of speech in order to extract and classify acoustic-prosodic characteristics contained in the student's speech.
- Speech, as a rich medium of communication, contains acoustic correlates such as pitch, duration and amplitude, which are related to the speaker's emotion.
- The objective in Phase I will focus on developing techniques to extract such acoustic correlates related to two specific conditions, STRESS and NON-STRESS, to classify these conditions using machine learning algorithms with sufficient accuracy, and then to develop an algorithm for real-time prosody modeling that can be implemented as a module of the ITS.
- the anticipated outcome of this objective will be a software algorithm which analyzes the student's dialog in real-time, and outputs data values corresponding to the prosody characteristics embedded in the student's speech. This algorithm will be extended in Phase II to cover additional emotion states and the data values used then in the operation of the dialog manager.
- Objective 2 To implement the front end of the ITS, comprised of the Speech Recognition, Natural Language and real-time prosody modeling modules (the last developed in Objective 1), so that the emotional state detection algorithm can be tested in a system setting. This algorithm extracts acoustic-prosodic cues from the speech corpora, and maps these to data-driven values representing emotional states.
- the expected outcome of this objective at the end of Phase I is the prototype of the front-end of the proposed spoken language ITS architecture—i.e. the Speech Recognition, Emotion Detection and Natural Language modules.
- This front-end will be prototyped within the Open Agent Architecture (OAA) environment and will serve as an important step in proving the feasibility of interfacing the spoken language interface with real-time emotion detection and testing the algorithm developed in Objective 1. Additionally, these modules are important to the planned dialog management schemes for this tutoring domain.
- the dialog manager and other modules such as the text to speech synthesis agent and the speech error compensation strategies as well as questions 2 and 3 will be addressed in Phase II.
- Phase II Version 1.0 of the spoken language interactive training system will be completed. Additionally during Phase II, other tasks such as integrating an interface to the widely-used authoring tools, Authorware and Director, and testing the system using live subjects in real situations will be completed.
- the spoken language interactive training system that we propose to build over the course of the SBIR Phase I and Phase II effort serves both a long term objective as well as the immediate Phase I project objective.
- the immediate objectives for this Phase I component of the project are to automatically identify, classify and map in real-time a number of acoustic-prosodic cues to emotional states embedded in a typical student's dialog. These data-driven values corresponding to these states will then be used to assist in formulating the dialog strategies for the ITS.
- the long term objective of the research is to build a spoken language-based ITS system that incorporates dialog control strategies that also incorporate emotional cues contained in the utterances of the student's dialogue.
- Another key long term objective is to develop and incorporate error-handling strategies into the dialog manager to compensate for speech recognition errors that occur within the speech recognition transcription process. This key objective ensures that the dialog remains robust, stable and stays on track so that the user experience is productive, engaging and enjoyable.
- In Phase I we will pursue the following two key objectives: (1) development of an algorithm for real-time prosody modeling based on the acoustic-prosodic characteristics of speech; (2) implementation of the front-end of this spoken language ITS, i.e. the Speech Recognition, Natural Language and real-time prosody modeler modules.
- the spoken language interactive training system uses traditional components required for implementing a spoken dialogue system.
- Spoken language systems are, in general, complex frameworks involving the integration of several components such as speech recognition, speech synthesis, natural language understanding and dialog management, as in an information retrieval application using spoken language interfaces.
- the representative functions of each component are:
- AutoTutor domain: computer literacy
- CIRCSIM domain: circulatory system
- ATLAS-ANDES domain: Newtonian mechanics
- AutoTutor's DM is an adaptation of the form-filling approach to tutorial dialogue. It relies on a curriculum script, a sequence of topic formats, each of which contains a main focal question and an ideal answer.
- Speaktomi's proposed architecture for its spoken language ITS is based on a configuration of modular components functioning as software agents and adhering to the SRI Open Agent Architecture (OAA) 6 framework as shown in FIG. 1a.
- OAA allows rapid and flexible integration of software agents in a prototyping development environment. Because these components can be coded in different languages and run on different platforms, the OAA framework is an ideal environment for rapid software prototyping and makes it easy to add or remove software components.
- agent refers to a software process that meets the conventions of the OAA framework, where communication between each agent using the Interagent Communication Language (ICL) is via a solvable—a specific query that can be solved by special agents.
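- The hub-and-agent arrangement described above can be illustrated with a minimal sketch. This is not the OAA library or its ICL syntax; the `Facilitator` class, the `classify_prosody` solvable name and the 220 Hz rule are all hypothetical and exist only to show how a query might be routed to whichever agent registered for it.

```python
from typing import Callable, Dict, List


class Facilitator:
    """Toy hub: routes a named query (a "solvable") to the agents registered for it."""

    def __init__(self) -> None:
        self._solvers: Dict[str, List[Callable[[dict], dict]]] = {}

    def register(self, solvable: str, handler: Callable[[dict], dict]) -> None:
        # An agent declares which solvable it is able to handle.
        self._solvers.setdefault(solvable, []).append(handler)

    def solve(self, solvable: str, args: dict) -> List[dict]:
        # Forward the query to every registered agent and collect the replies.
        handlers = self._solvers.get(solvable, [])
        if not handlers:
            raise LookupError(f"no agent can solve {solvable!r}")
        return [handler(args) for handler in handlers]


def prosody_agent(args: dict) -> dict:
    # Hypothetical rule for illustration only: a mean pitch above 220 Hz counts as STRESS.
    label = "STRESS" if args["mean_pitch_hz"] > 220.0 else "NON-STRESS"
    return {"agent": "prosody_modeler", "label": label}


hub = Facilitator()
hub.register("classify_prosody", prosody_agent)
replies = hub.solve("classify_prosody", {"mean_pitch_hz": 245.0})
print(replies)
```

- Because agents only share the solvable name, an agent could be swapped or removed without touching the others, which is the prototyping benefit the text attributes to the framework.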
- FIG. 1b shows a high-level view of specific software agents that comprise the speech-enabled ITS. Although each agent is connected to a central hub or Facilitator, there is a functional hierarchy which describes each agent and the flow of messages between them as shown in FIG. 3 . This diagram illustrates the functional dependencies between the various blocks and the message interfaces between them.
- the architecture will support the following software agents: user interface agents (microphone input and audio speaker output), the speech recognition agent, prosody modeler agent, natural language (NL) agent, dialog manager (DM) agent, synthesis agent, inference, history and knowledge base.
- pitch in English can be defined at four levels—low, mid, high or extra high; and having three terminal contours—fading, rising, or sustained.
- Fundamental frequency measurements will be used to characterize syllable or sub-syllable level pitch contours with this vocabulary. These characterizations along with the other acoustic measures of amplitude and duration, along with pitch range, will be used to classify syllables by phrasal stress levels.
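- As a rough illustration of the pitch vocabulary above, the following sketch maps fundamental frequency values to the four levels and three terminal contours. The quartile mapping over a speaker's range, the thirds-based contour test and the 5 Hz tolerance are assumptions made for this example, not parameters given in the text.

```python
from statistics import mean
from typing import List

LEVELS = ["low", "mid", "high", "extra high"]


def pitch_level(f0_hz: float, speaker_min: float, speaker_max: float) -> str:
    """Map an F0 value to one of four levels by its quarter of the speaker's range."""
    span = speaker_max - speaker_min
    if span <= 0:
        raise ValueError("speaker_max must exceed speaker_min")
    position = min(max((f0_hz - speaker_min) / span, 0.0), 1.0)
    return LEVELS[min(int(position * 4), 3)]


def terminal_contour(f0_track: List[float], tolerance_hz: float = 5.0) -> str:
    """Compare the mean F0 of the final third of a track with the third before it."""
    if len(f0_track) < 3:
        raise ValueError("need at least three F0 samples")
    third = len(f0_track) // 3
    previous = mean(f0_track[-2 * third:-third])
    final = mean(f0_track[-third:])
    delta = final - previous
    if delta > tolerance_hz:
        return "rising"
    if delta < -tolerance_hz:
        return "fading"
    return "sustained"


print(pitch_level(210.0, 100.0, 300.0))
print(terminal_contour([200.0, 200.0, 200.0, 180.0, 160.0, 150.0]))
```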
- a speech corpus in the form of a database or a corpus containing a set of files from one of several recognized linguistic repositories 7 .
- The corpus sourced from the Oregon Graduate Institute is supplied with a phonetic transcription file with each speech file. If the sourced files are not annotated, the files will be manually marked or linguistically annotated 8 in terms of prosodic stress by a pair of linguistically trained individuals. In order to provide more robustness for the experimental task, each of two subsets of files will be annotated by a manual transcriber. In addition, we will use a jackknife training and testing procedure.
- The WEKA environment is flexible: it provides the capability to do cross validation and comparisons between the various machine learning schemes. For example, we will be able to compare the classifications generated by each machine learning algorithm, such as K-Nearest Neighbors, AdaBoost, CART decision trees, and the rule-based RIPPER (Repeated Incremental Pruning to Produce Error Reduction), and to generate optimum parameters for the real-time prosodic modeling algorithm based on acoustic feature importance, acoustic feature usage and accuracy rate. In this way we will assess which classification scheme most accurately predicts the prosodic models. Confusion matrices will then be used to represent and compare the recognition accuracy, as a percentage, of the stress levels and the two-way classifications generated by the different classifiers.
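- The evaluation procedure described here (jackknife testing plus a confusion matrix) can be sketched without WEKA. The example below is plain Python, uses a 1-nearest-neighbor rule as a stand-in for the classifiers named above, and runs on toy feature values invented purely for illustration; none of these numbers come from the corpus.

```python
from collections import Counter
from typing import Sequence, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (acoustic feature vector, annotated label)


def nearest_neighbor(train: Sequence[Sample], features: Tuple[float, ...]) -> str:
    """Return the label of the training sample closest to `features`."""
    def sq_dist(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda s: sq_dist(s[0], features))[1]


def jackknife_confusion(samples: Sequence[Sample]) -> Counter:
    """Hold out each sample in turn, classify it from the rest, tally (actual, predicted)."""
    matrix: Counter = Counter()
    for i, (features, actual) in enumerate(samples):
        rest = list(samples[:i]) + list(samples[i + 1:])
        matrix[(actual, nearest_neighbor(rest, features))] += 1
    return matrix


def accuracy_percent(matrix: Counter) -> float:
    """Share of samples on the diagonal of the confusion matrix, as a percentage."""
    total = sum(matrix.values())
    correct = sum(n for (actual, predicted), n in matrix.items() if actual == predicted)
    return 100.0 * correct / total


# Toy (pitch range in Hz, duration in s) pairs, invented for this sketch.
samples = [
    ((40.0, 0.30), "STRESS"), ((45.0, 0.32), "STRESS"), ((42.0, 0.28), "STRESS"),
    ((10.0, 0.12), "NON-STRESS"), ((12.0, 0.15), "NON-STRESS"), ((8.0, 0.10), "NON-STRESS"),
]
matrix = jackknife_confusion(samples)
print(dict(matrix), accuracy_percent(matrix))
```

- Running the same tally for each candidate classifier would give directly comparable matrices, which is the comparison the paragraph describes.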
- Phase I The expected outcome of this objective at the end of Phase I is a prototype of the front-end of the ITS architecture defined in Objective 2—i.e. the speech recognition, natural language and prosody modeler modules as shown in FIG. 3 .
- a second task will be to prepare a road-map and plan for Phase II.
- Speaktomi will implement the front end of the speech-based ITS within the reference architecture discussed previously, based on a configuration of modular components functioning as software agents that adhere to a software framework called the Open Agent Architecture (OAA) 15.
- the SRI Open Agent Architecture (OAA) is a framework for integrating the various components that comprise a spoken dialogue system such as the FASTER ITS. Specifically it is a piece of middleware that supports C++, Java, Lisp and Prolog and enables one to rapidly prototype components into a system.
- the OAA-based speech recognition agent will be created by writing a software wrapper for the SRI EduSpeak speech recognition engine.
- This engine incorporates specific features required for education and tutoring applications, such as pronunciation grading and a broad array of interfaces to multimedia development tools and languages: Director, Authorware, Flash, Active X, Java and C/C++.
- the EduSpeak SR engine works for adult and child voices as well as native and non-native speakers.
- The key performance enablers of this SR engine are high speech recognition accuracy, speaker-independent recognition that requires no user training, a small, scalable footprint dependent on vocabulary requirements, support for unlimited-size dynamically loadable grammars, and support for statistical language models (SLM).
- Speaktomi's unique technology is critical to the next stage of e-Learning and computer based training tools.
- the leaders in the e-Learning provider market such as IBM, Docent, WBT and Saba Software are seeing increasing traction in this space mostly through their deployment of Learning Management Systems or LMS which store learning content.
- the next wave of innovation in the space is improving the process for the content creation and improving the ease and effectiveness of student interaction.
- the critical need to improve content is to provide the right kinds of tools for building learning environments that are easier to deploy and easier to use.
- Speaktomi's technology by supporting voice interaction by the student with the e-Learning content, provides the critical ease of use platform that e-Learning tools developers need to make their systems more user-friendly and easier to interact with.
- Because content creators are able to gauge student understanding and concern more intuitively through a voice interface, their conceptual workload is eased: they can create more engaging content without having to build exhaustive cases to gauge user feedback on content that has been presented.
- Speaktomi seeks not only to provide embedded technology to the corporate training software providers, but also eventually to provide this technology for e-Learning in the U.S. education and training market, which had an overall market size of $772 billion in 2000 and a growth rate of over 9%. While speech technology may be considered a small component of a training solution, it is a critical user interface and interaction component that is extremely valuable. The size of this addressable market for Speaktomi is conservatively estimated to be $90 million, and $2.3 billion for the wider educational market. Clearly there is a substantial opportunity for a company focused on speech recognition and intelligent learning in both the corporate and wider educational e-Learning business.
- ITS ITS architecture which addresses the issue of student understanding, so as to raise the level of performance by 1 to 2 standard deviation units.
- This level of tutorial performance would allow our system to be adopted by more users and to be used more effectively in the e-learning market.
- Authorware and Director our spoken language ITS will provide a direct and effective mechanism whereby the technology could be rapidly adopted by the existing educational customer base.
- this programming interface will allow legacy educational content to be accessed by the ITS; and in the future be extended to other commercial educational platforms and tools.
- the resulting benefits that would accrue include the features of an advanced intelligent training system that significantly raise the students' performance.
- the business model for marketing and selling Speaktomi's technology will be based on the following:
Abstract
Description
- The present application claims priority to provisional application Ser. No. 60/633,239 filed Dec. 3, 2004 which is hereby incorporated by reference herein.
- The invention relates to a system and an interactive method for detecting and processing prosodic elements of speech based user inputs and queries presented over a distributed network such as the Internet or local intranet. The system has particular applicability to such applications as remote learning, e-commerce, technical e-support services, Internet searching, etc.
- Emotion is an integral component of human speech and prosody is the principal way it is communicated. Prosody, the rhythmic and melodic qualities of speech that are used to convey emphasis, intent, attitude and semantic meaning, is a key component in the recovery of the speaker's communication and expression embedded in his or her speech utterance. Detection of prosody and emotional content in speech is known in the art, and is discussed for example in the following representative references which are incorporated by reference herein: U.S. Pat. No. 6,173,260 to Slaney; U.S. Pat. No. 6,496,799 to Pickering; U.S. Pat. No. 6,873,953 to Lenning; U.S. Publication No. 2005/0060158 to Endo et al.; U.S. Publication No. 2004/0148172 to Cohen et al.; U.S. Publication No. 2002/0147581 to Shriberg et al.; and U.S. Publication No. 2005/0182625 to Azara et al. Training of emotion modelers is also known, as set out for example in the following, also incorporated by reference herein:
- 1. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees, Chapman & Hall, New York, 1984.
- 2. Schlosberg, H., A scale for the judgment of facial expressions, J of Experimental Psychology, 29, 1954, pages 497-510.
- 3. Plutchik, R., The Psychology and Biology of Emotion, Harper Collins, New York 1994.
- 4. Russell, J. A., How shall an Emotion be called, in R. Plutchik & H. Conte (editors), Circumplex Models of Personality and Emotion, Washington, APA, 1997.
- 5. Whissell, C., The Dictionary of Affect in Language, in R. Plutchik & H. Kellerman, Editors, Emotion: Theory, Research & Experience, Vol. 4, Academic Press, New York 1959.
- 6. ‘FEELTRACE’: An Instrument for Recording Perceived Emotion in Real Time, Ellen Douglas-Cowie, Roddy Cowie, Marc Schröder: Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research Pages 19-24, Textflow, Belfast, 2000.
- 7. Silverman, K., Beckman, M., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J. & Hirschberg, J. (1992), A standard for labelling English prosody, in ‘Proceedings of the International Conference on Spoken Language Processing (ICSLP)’, Vol. 2, Banff, pp. 867-870.
- 8. Shriberg, E., Taylor, P., Bates, R., Stolcke, A., Ries, K., Jurafsky, D., Coccaro, N., Martin, R., Meteer, M.& Ess-Dykema, C. (1998), ‘Can prosody aid the automatic classification of dialog acts in conversational speech?’, Language and Speech 41(3-4), 439-487.
- 9. Grosz, B. & Hirschberg, J. (1992), Some intonational characteristics of discourse structure, in ‘Proceedings of the International Conference on Spoken Language Processing’, Banff, Canada, pp. 429-432.
- 10. Grosz, B. & Sidner, C. (1986), ‘Attention, intentions, and the structure of discourse’, Computational Linguistics 12, 175-204.
- 11. P. Boersma, D. Weenink, PRAAT, Doing Phonetics by Computer, Institute of Phonetic Sciences, University of Amsterdam, Netherlands, 2004, hhtp://www.praat.org
- 12. Taylor, P., R. Caley, A. W. Black and S. King, Chapter 10, Classification and Regression Trees, Edinburgh Speech Tools Library, System Documentation, Edition 1.2, hxxp://festvox.org/docs/speech_tools-1.2.0/c16616.htm (replace xx with “tt”) Centre for Speech Technology, Univ. of Edinburgh, (2003)
- 13. Beckman, M. E. & G. Ayers Elam, (1997): Guidelines for ToBI labelling, version 3. The Ohio State University Research Foundation, hxxp://www.ling.ohio-state.edu/research/phonetics/E_ToBI/ (replace xx with “tt”)
- Conversely, real-time speech and natural language recognition systems are also known in the art, as depicted in Applicant's prior patents, including U.S. Pat. No. 6,615,172 which is also incorporated by reference herein. Because of the significant benefits offered by prosodic elements in identifying a meaning of speech utterances (as well as other human input), it would be clearly desirable to integrate such features within the aforementioned Bennett et al. speech recognition/natural language processing architectures. Nonetheless to do this a prosodic analyzer must also operate in real-time and be distributable across a client/server architecture. Furthermore to improve performance, a prosodic analyzer should be trained/calibrated in advance.
- An object of the present invention, therefore, is to provide an improved system and method for overcoming the limitations of the prior art noted above;
- A primary object of the present invention is to provide a prosody and emotion recognition system that is flexibly and optimally distributed across a client/platform computing architecture, so that improved accuracy, speed and uniformity can be achieved for a wide group of users;
- Another object of the present invention, therefore, is to provide an improved system and method for formulating SQL queries that includes parameters based on user emotional content;
- A further object of the present invention is to provide a speech and natural language recognition system that efficiently integrates a distributed prosody interpretation system with a natural language processing system, so that speech utterances can be quickly and accurately recognized based on literal content and user emotional state information;
- A related object of the present invention is to provide an efficient mechanism for training a prosody analyzer so that the latter can operate in real-time.
- A first aspect of the invention concerns a system and method for incorporating prosodic features while performing real-time speech recognition distributed across a client device and a server device. The SR process typically transfers speech data from an utterance to be recognized using a packet stream of extracted acoustic feature data including at least some cepstral coefficients. In a preferred embodiment this aspect of the invention extracts prosodic features from the utterance to generate extracted prosodic data; transfers the extracted prosodic data with the extracted acoustic feature data to the server device; and recognizes an emotion state of a speaker of the utterance based on at least the extracted prosodic data. In this manner operations associated with recognition of prosodic features in the utterance are also distributed across the client device and server device.
- In other embodiments the operations are distributed across the client device and server device on a case-by-case basis. A parts-of-speech analyzer is also preferably included for identifying a first set of emotion cues based on evaluating a syntax structure of the utterance. In addition a preferred embodiment includes a real-time classifier for identifying the emotion state based on the first set of emotion cues and a second set of emotion cues derived from the extracted prosodic data.
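- The transfer of extracted prosodic data together with extracted acoustic feature data could be sketched as below. The packet layout (a 4-byte header followed by 32-bit floats), the field order and the function names are assumptions for illustration; the text does not specify a wire format.

```python
import struct
from typing import List, Tuple

# Assumed header: frame index (uint16), cepstral count (uint8), prosodic count (uint8).
HEADER = struct.Struct("!HBB")


def pack_frame(index: int, cepstra: List[float], prosody: List[float]) -> bytes:
    """Client side: place cepstral and prosodic values in one network-order packet."""
    header = HEADER.pack(index, len(cepstra), len(prosody))
    body = struct.pack(f"!{len(cepstra) + len(prosody)}f", *cepstra, *prosody)
    return header + body


def unpack_frame(packet: bytes) -> Tuple[int, List[float], List[float]]:
    """Server side: split the packet back into its two feature groups."""
    index, n_cep, n_pros = HEADER.unpack_from(packet, 0)
    values = struct.unpack_from(f"!{n_cep + n_pros}f", packet, HEADER.size)
    return index, list(values[:n_cep]), list(values[n_cep:])


# Values chosen to be exactly representable as 32-bit floats.
packet = pack_frame(7, [1.5, -0.25, 0.5], [220.0, 0.125, 64.0])
print(len(packet), unpack_frame(packet))
```

- Keeping the two counts in the header lets the client vary how many prosodic values it sends per frame without changing the server-side parser.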
- In a system employing this aspect of the invention, the various operations/features can be implemented by one or more software routines executing on a processor (such as a microprocessor or DSP) or by dedicated hardware logic (i.e., such as an FPGA, an ASIC, PIA, etc.). A calibration routine can be stored and used on the client side or server side depending on the particular hardware and system configuration, performance requirements, etc.
- The extracted prosodic features can be varied according to the particular application, and can include data values which are related to one or more acoustic measures including one of PITCH, DURATION & ENERGY. Correspondingly, the emotion state to be detected can be varied and can include for example at least one of STRESS & NON-STRESS; or CERTAINTY, UNCERTAINTY and/or DOUBT.
- A further aspect concerns a system and method for performing real-time emotion detection which performs the following steps: extracting selected acoustic features of a speech utterance; extracting syntactic cues relating to an emotion state of a speaker of the speech utterance; and classifying inputs from the prosody analyzer and the parts-of-speech analyzer and processing the same to output an emotion cue data value corresponding to the emotion state.
- Another aspect concerns a system/method training a real-time emotion detector which performs the following steps: presenting a series of questions to a first group of persons concerning a first topic (wherein the questions are configured to elicit a plurality of distinct emotion states from the first group of persons); recording a set of responses from the first group of persons to the series of questions; annotating the set of responses to include a corresponding emotion state; and training an emotion modeler based on the set of responses and corresponding emotion state annotations. In this fashion, an emotion modeler is adapted to be used in an emotion detector distributed between a client device and a server device.
- In certain preferred embodiments visual cues are also used to elicit the distinct emotion states. The annotations can be derived from Kappa statistics associated with a second group of reviewers. The emotion modeler can be transferred in electronic form to a client device or a server device, where it can be used to determine an emotion state of a speaker of an utterance.
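- One common agreement measure of this kind is Cohen's kappa for two reviewers; whether the embodiment uses this particular variant is an assumption here. The sketch below computes it for two hypothetical reviewers labelling the same four responses.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Agreement between two raters, corrected for agreement expected by chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of four recorded responses.
reviewer_1 = ["CERTAINTY", "CERTAINTY", "UNCERTAINTY", "UNCERTAINTY"]
reviewer_2 = ["CERTAINTY", "CERTAINTY", "UNCERTAINTY", "CERTAINTY"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(kappa)
```

- Here the reviewers agree on three of four items while chance agreement is one half, so kappa comes out at 0.5; responses with low agreement could be excluded or re-annotated before training.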
- Still a further aspect of the invention concerns a real-time emotion detector which includes: a prosody analyzer configured to extract selected acoustic features of a speech utterance; a parts-of-speech analyzer configured to extract syntactic cues relating to an emotion state of a speaker of the speech utterance; a classifier configured to receive inputs from the prosody analyzer and the parts-of-speech analyzer and process the same to output an emotion cue data value corresponding to the emotion state. In this manner an emotion state is determined by evaluating both individual words and an entire sentence of words uttered by the user.
- In preferred embodiments the classifier is a trained Classification and Regression Tree classifier, which is trained with data obtained during an off-line training phase. The classifier uses a history file containing data values for emotion cues derived from a sample population of test subjects and using a set of sample utterances common to content associated with the real-time recognition system. In the end, the emotion cue data value is in the form of a data variable suitable for inclusion within a SQL construct or some similar form of database query format.
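- A minimal sketch of an emotion cue value used as a variable inside a SQL construct follows. The table name, columns and reply strings are hypothetical, and an in-memory SQLite database stands in for whatever database the system actually uses.

```python
import sqlite3
from typing import Optional

# Hypothetical answer table keyed by topic and detected emotion state.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (topic TEXT, emotion TEXT, reply TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("loops", "CERTAINTY", "Correct. Let us move on to nested loops."),
        ("loops", "UNCERTAINTY", "Let us walk through one loop iteration together."),
    ],
)


def lookup(topic: str, emotion_cue: str) -> Optional[str]:
    """Bind the emotion cue as a query parameter alongside the recognized topic."""
    row = conn.execute(
        "SELECT reply FROM answers WHERE topic = ? AND emotion = ?",
        (topic, emotion_cue),
    ).fetchone()
    return row[0] if row else None


print(lookup("loops", "UNCERTAINTY"))
```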
- Systems employing the present invention can also use the emotion state to formulate a response by an interactive agent in a real-time natural language processing system. These interactive agents are found online, as well as in advanced interactive voice response systems which communicate over conventional phone lines with assistance from voice browsers, VXML formatted documents, etc. The interactive agent may be programmed to respond appropriately and control dialog content and/or a dialog sequence with a user of a speech recognition system in response to the emotion state. For example, callers who are confused or express doubt may be routed to another dialog module, or to a live operator.
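- The routing behaviour in the example above might look like the following sketch. The step names and the two-turn threshold are assumptions chosen for illustration only.

```python
from typing import List

DOUBTFUL_STATES = {"UNCERTAINTY", "DOUBT"}


def next_dialog_step(recent_states: List[str], max_doubt_turns: int = 2) -> str:
    """Pick the next dialog step from the most recent run of doubtful emotion states."""
    streak = 0
    for state in reversed(recent_states):
        if state in DOUBTFUL_STATES:
            streak += 1
        else:
            break
    if streak == 0:
        return "continue"
    if streak < max_doubt_turns:
        return "clarification_module"
    return "live_operator"


print(next_dialog_step(["CERTAINTY", "DOUBT"]))
print(next_dialog_step(["DOUBT", "UNCERTAINTY"]))
```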
- In some preferred embodiments an emotion state can be used to control visual feedback presented to a user of the real-time speech recognition system. Alternatively, in an application where display space is limited or non-existent, an emotion state can be used to control non-verbal audio feedback; for example, selection from potential “earcons” or hold music may be made in response to a detected emotion state.
- In other preferred embodiments an amount of prosodic data to be transferred to the server device is determined on a case by case basis in accordance with one or more of the following parameters: a) computational capabilities of the respective devices; b) communications capability of a network coupling the respective devices; c) loading of the server device; d) a performance requirement of a speech recognition task associated with a user query. Both the prosodic data and the acoustic feature data may or may not be packaged within a common data stream as received at the server device, depending on the nature of the data, the content of the data streams, available bandwidth, prioritizations required, etc. Different payloads may be used for transporting prosodic data and acoustic feature data for speech recognition within their respective packets.
- It will be understood from the Detailed Description that the inventions can be implemented in a multitude of different embodiments. Furthermore, it will be readily appreciated by skilled artisans that such different embodiments will likely include only one or more of the aforementioned objects of the present inventions. Thus, the absence of one or more of such characteristics in any particular embodiment should not be construed as limiting the scope of the present inventions. Furthermore, while the inventions are presented in the context of certain exemplary embodiments, it will be apparent to those skilled in the art that the present teachings could be used in any application where it would be desirable and useful to implement fast, accurate speech recognition, and/or to provide a human-like dialog capability to an intelligent system.
- FIG. 1 is a block diagram of a preferred embodiment of an emotion analyzer distributed across a client/server computing architecture, and can be used as an interactive learning system, an e-commerce system, an e-support system, and the like;
- FIG. 2 illustrates a preferred embodiment of an emotion modeler and classifier of the present invention;
- FIG. 3 is a block diagram of a prior art natural language query system (NLQS);
- FIG. 4 is a diagram illustrating an activation - evaluation relationship implemented in preferred embodiments of the present invention.
- Brief Overview of Natural Language Query Systems
- As alluded to above, the present inventions are intended to be integrated as part of a Natural Language Query System (NLQS) such as that shown in FIG. 3 which is configured to interact on a real-time basis to give a human-like dialog capability/experience for e-commerce, e-support, and e-learning applications. As seen in FIG. 3 the processing for NLQS 100 is generally distributed across a client side system 150, a data link 160, and a server-side system 180. These components are well known in the art, and in a preferred embodiment include a personal computer system 150, an INTERNET connection scale computing system 180. It will be understood by those skilled in the art that these are merely exemplary components, and that the present invention is by no means limited to any particular implementation or combination of such systems. For example, client-side system 150 could also be implemented as a computer peripheral, a PDA, as part of a cell-phone, as part of an INTERNET-adapted appliance, an INTERNET linked kiosk, etc. Similarly, while an INTERNET connection is depicted for data link 160A, it is apparent that any channel that is suitable for carrying data between client system 150 and server system 180 will suffice, including a wireless link, an RF link, an IR link, a LAN, and the like. Finally, it will be further appreciated that server system 180 may be a single, large-scale system, or a collection of smaller systems interlinked to support a number of potential network users.
- Initially speech input is provided in the form of a question or query articulated by the speaker at the client's machine or personal accessory as a speech utterance. This speech utterance is captured and partially processed by NLQS client-side software 155 resident in the client's machine. To facilitate and enhance the human-like aspects of the interaction, the question is presented in the presence of an animated character 157 visible to the user who assists the user as a personal information retriever/agent. The agent can also interact with the user using both visible text output on a monitor/display (not shown) and/or in audible form using a text to speech engine 159. The output of the partial processing done by SRE 155 is a set of speech vectors that are transmitted over communication channel 160 that links the user's machine or personal accessory to a server or servers via the INTERNET or a wireless gateway that is linked to the INTERNET as explained above.
- At server 180, the partially processed speech signal data is handled by a server-side SRE 182, which then outputs recognized speech text corresponding to the user's question. Based on this user question related text, a text-to-query converter 184 formulates a suitable query that is used as input to a database processor 186. Based on the query, database processor 186 then locates and retrieves an appropriate answer using a customized SQL query from database 188. A Natural Language Engine 190 facilitates structuring the query to database 188. After a matching answer to the user's question is found, the former is transmitted in text form across data link 160B, where it is converted into speech by text to speech engine 159, and thus expressed as oral feedback by animated character agent 157.
- Because the speech processing is broken up in this fashion, it is possible to achieve real-time, interactive, human-like dialog consisting of a large, controllable set of questions/answers. The assistance of the animated agent 157 further enhances the experience, making it more natural and comfortable for even novice users. To make the speech recognition process more reliable, context-specific grammars and dictionaries are used, as well as natural language processing routines at NLE 190, to analyze user questions lexically. By optimizing the interaction and relationship of the SR engines FIG. 3 , please see U.S. Pat. No. 6,615,172.
- Overview of System for Real Time Emotion Detection
- The present invention features and incorporates cooperation between the following components:
- 1. a data acquisition component which utilizes speech utterances from test subjects.
- 2. a prosodic extraction component for extracting prosodic related acoustic features in real-time preferably from speech utterances.
- 3. a comparator component which applies machine learning to the datasets, i.e. the dataset corresponding to the features extracted from the speech samples is fed to a decision tree-based machine learning algorithm.
- 4. decision trees, learned from the dataset by that algorithm, which implement the decision tree used in the real-time emotion detector.
- The key focus of this approach is to use the acoustic features extracted from representative speech samples as the mechanism for identifying, in real-time, the prosodic cues in a speech utterance, which can then be used to detect emotion states. Other components may be included herein without deviating from the scope of the present invention.
- An emotion modeler comprising the above implements the extraction of the speaker's emotion state, and benefits from the optimization of the machine learning algorithms derived from the training session.
- Emotion Detector
- The function of emotion detector 100 (
FIG. 1) is to model the emotion state of the speaker. This model is derived preferably using the acoustic and syntactic properties of the speech utterance. Emotion is an integral component of human speech and prosody is the principal way it is communicated. Prosody, the rhythmic and melodic qualities of speech that are used to convey emphasis, intent, attitude and semantic meaning, is a key component in the recovery of the speaker's communication and expression embedded in a speech utterance. - A key concept in emotion theory is the representation of emotion as a two-dimensional activation-evaluation space. As seen in
FIG. 4, the activation of the emotion state (the vertical axis) represents the activity of the emotion state, e.g. exhilaration represents a high level of activation, whereas boredom involves a small amount of activation. The evaluation of the emotion state (the horizontal axis) represents the feeling associated with the emotional state. For example, happiness is very positive, whereas despair is very negative. Psychologists [see references 1, 2, 3, 4, 5 above] have long used this two-dimensional circle to represent emotion states. The circumference of the circle defines the extreme limits of emotion intensity such as bliss, and the center of the circle is defined as the neutral point. Strong emotions such as those with high activation and very positive evaluation are represented on the periphery of the circle. An example of a strong emotion is exhilaration, an emotional state which is associated with very positive evaluation and high activation. Common emotions such as bored, angry etc. are placed within the circle at activation-evaluation coordinates calculated from values derived from tables published by Whissell referenced above. - Representative Prosodic Features
- Pitch: the fundamental frequency (F0) of a speech utterance is the acoustic correlate of pitch. It is considered to be one of the most important attributes in expressing and detecting emotion. For this we extract F0 and compute its mean, maximum, minimum, variance and standard deviation. In some applications, of course, it may not be necessary or desirable to compute all such variables, and in other instances it may be useful to use additional frequency components (or derivatives thereof).
- Energy: the energy of the speech utterance is an acoustic correlate of the loudness of the speech utterance of the speaker. For example, high energy in a speech utterance is associated with high activation of the emotion state. Conversely, low energy levels of the speech utterance are associated with emotion states with low activation values.
- Duration: the duration of the syllables that make up the speech utterance is also an acoustic correlate from which an emotion cue can be extracted. For example, a long syllable duration may imply an emotional state corresponding to doubt (DOUBT), whereas the alternate emotional state of certainty (CERTAINTY) may be represented by a shorter duration of the same syllable.
- In some applications, of course, it may not be necessary or desirable to compute all such variables, and in other instances it may be useful to use additional frequency, energy and/or duration components (or derivatives thereof). For example, in many cases it may be useful to incorporate certain acoustic features (such as MFCCs and delta MFCCs), changes in energy, and other well-known prosody-related data.
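As a minimal sketch of the pitch statistics listed above, the following Python computes the mean, maximum, minimum, variance and standard deviation of an F0 track. The function name, the convention that unvoiced frames are marked with 0, and the use of population statistics are assumptions made for illustration; the text does not specify them.

```python
import statistics

def f0_summary(f0_hz):
    """Summarize a pitch track given as one F0 value (Hz) per frame.

    Assumption: unvoiced frames are marked with 0 and are skipped.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "mean": statistics.fmean(voiced),
        "max": max(voiced),
        "min": min(voiced),
        "variance": statistics.pvariance(voiced),  # population variance
        "stdev": statistics.pstdev(voiced),        # population std. deviation
    }

# Hypothetical five-frame track; the first and last frames are unvoiced.
summary = f0_summary([0.0, 100.0, 120.0, 140.0, 0.0])
```

In practice the F0 track would come from a pitch tracker such as the tools named later in the text; this sketch only covers the summary step.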
- Data Acquisition
- An emotion modeler and
classifier system 200 of the present invention is shown in FIG. 2. This system is trained with actual examples from test subjects to improve performance. This training data is generated based on Prosodic Feature Vectors calculated by a routine 230. - To implement a training session, a data experiment is devised as follows: preferably a group of persons (i.e. in one preferred embodiment, students of a representative age comparable to the user group of students expected to use a natural language query system) is presented with a series of questions for which answers are to be articulated by the person. These questions are designed so that the expected elicited answers, aided by visual cues, exhibit emotions of CERTAINTY, UNCERTAINTY and DOUBT. For example, questions that have obvious answers typically will have a response that is closely correlated to the emotion state of CERTAINTY, which can be ascribed to be present in more than 90% of the answers, whereas questions which are difficult will elicit answers of which the person is not sure and which therefore contain the UNCERTAINTY emotion, also in greater than 90% of the cases. The formulation of the questions can be performed using any of a variety of known techniques.
- Speech samples from these representative test subjects are recorded in a controlled environment, i.e. in a localized environment with low background noise. The speech is articulated by speakers speaking in different styles, with emphasis on the styles that represent the intended emotion mode that each sample requires. The recordings are preferably saved as .wav files and analysis is performed using a speech tool such as Sony Sound Forge and open source speech tools such as the PRAAT [11] speech analyzer and the Edinburgh Speech Tools [12]. Other similar tools for achieving a similar result are clearly useable within the present invention. The analysis is discussed in the next section.
- The recorded speech data is then played back and each sample is manually annotated preferably using Tone and Break Indices (ToBI) [13] annotation as illustrated in 210 (
FIG. 2) using the definitions and criteria for specific emotional states. ToBI is a widely used annotation system for speech intonational analysis; again other annotation systems may be more appropriate for different applications. - By using the ToBI annotation, one is able to derive the intonational events in speech from the human perception of speech intonation. Kappa statistics are then used to evaluate the consistency between the annotators. Kappa coefficients are well known: K = [P(A) − P(E)] / [1 − P(E)], where P(A), the observed agreement, represents the proportion of times the transcribers agree, and P(E) represents the agreement expected by chance. Again any number of statistical approaches may be employed instead.
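The kappa coefficient defined above can be sketched in a few lines of Python. This sketch assumes exactly two annotators who each assign one label per sample (Cohen's form of the statistic); the function name and the sample labels are illustrative and not taken from the source.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """K = [P(A) - P(E)] / [1 - P(E)] for two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # P(A): proportion of samples on which the annotators agree.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # P(E): chance agreement from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_a - p_e) / (1.0 - p_e)

# Hypothetical annotations of four recorded answers.
annotator_1 = ["CERTAINTY", "CERTAINTY", "DOUBT", "DOUBT"]
annotator_2 = ["CERTAINTY", "DOUBT", "DOUBT", "DOUBT"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four samples (P(A) = 0.75) and chance agreement is P(E) = 0.5, so K = 0.5. With more than two annotators a multi-rater variant would be needed.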
- The emotion categories and criteria are as follows:
Emotion | Description
CERTAINTY | No disfluencies; fluent answer; high energy
UNCERTAINTY | Disfluencies present; additional questions asked by the user for clarification (what is meant, etc.)
DOUBT | Slower response; heavily disfluent; lower energy
- The emotion states described in the table above can be extended to include other emotion states.
- Feature Extraction
- Acoustic features are extracted by a routine shown as 220. Before the initiation of the feature extraction process, the speech samples are preferably re-sampled at a 44 kHz sampling rate to ensure higher-fidelity speech samples and higher-quality source data for the speech feature extraction tools. The PRAAT speech analysis tool and the Edinburgh Speech Tools (EST) are the preferred tools used to extract the training session speech features. Using scripts, the PRAAT tool automatically extracts and archives a large number of speech and spectrographic features from each speech sample. The EST library also contains a number of speech analysis tools from which other speech features such as linear predictive coefficients (LPC), cepstrum coefficients, mel-frequency cepstrum coefficients (MFCC), area, energy and power can be extracted. Most importantly, the EST library includes Wagon, a CART
decision tree tool 260 which is used to extract prosodic patterns from the speech data. - Decision Tree Classifier Training
- Decision tree classifiers, such as shown in
FIG. 2, are probabilistic classifiers that transform the data input to them into binary questions based on the attributes of the data that is supplied. At each node of the decision tree, the decision tree will select the best attribute and the question to be asked about that attribute for that particular node. The attribute and question are selected so that they give the best predictive value for the classification or bin. When the tree reaches the leaf nodes, the probability distribution of all instances in the branch is calculated, which is then used as a predictor for new raw data. The selection of the node splitting is based on an information theory-based concept called entropy, a measure of how much information some data contains. In the decision tree, entropy can be measured by looking at the purity of the resulting subsets of a split. For example, a subset that contains only one class is purest; conversely, the largest impurity occurs when all classes are equally mixed in the subset. (See, e.g., Breiman et al., 1984, referenced above.) - The CART
decision tree algorithm 260 extends the decision tree method to handle numerical values and is less susceptible to noisy or missing data. CART (Classification and Regression Tree), introduced by Breiman, Friedman, Olshen and Stone referenced above, is a widely used decision tree-based procedure for data mining. The CART technique uses a combination of statistical learning and expert knowledge to construct binary decision trees, which are formulated as a set of yes-no questions about the features in the raw data. The best predictions based on the training data are stored in the leaf nodes of the CART. - During the training phase of the
CART decision tree 260, data is fed to the tree from a Prosodic Description File 240 and training data from Prosodic Feature Vectors 230, and the values of key parameters such as stop value and balance are optimized so that the output results of the tree have maximum correspondence with the results of the manual annotations. - The specific and preferred CART used in the present invention is the Wagon CART of the Edinburgh Speech Tools library. Wagon CART consists of two separate applications: wagon for building the trees, and wagon_test for testing the decision trees with new data. Wagon supports two variables used in the tree-building process: a stop value for fine-tuning the tree to the training data set; the lower the value (i.e. the number of vectors in a node before considering a split), the more fine-tuned the tree and the larger the risk of an over-trained tree. If a low stop value is used, the over-trained tree can be pruned using the hold out option, where a subset is removed from the training set and then used for pruning to build a smaller CART. The Wagon CART requires a special structure of input, a prosodic feature vector (PFV), i.e., a vector that contains prosodic features as both predictors and predictees. Each row of this prosodic feature vector represents one predictee (a part of the PFV that has information about the class value, e.g. the accented class), and one or more predictors, each row having the same order of the predictors with the predictee as the first element in the row. The predictors are the values of the different prosodic cues that are selected. The size of the CART tree is optimized by means of the stopping criteria, which define the point when splitting of the nodes stops, i.e. when the purity of the node is highest. Another approach is to prune the tree, i.e. the tree is first grown out to a large size, then it is cut back or pruned to its best size. 
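To make the entropy-based node splitting and the stop value concrete, here is a minimal Python sketch that picks the best threshold on a single numeric predictor. It is an illustration of the general technique only, not Wagon's implementation; the function names, the single-predictor restriction and the sample rows are assumptions. Rows follow the PFV convention of placing the predictee first.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits; 0 for a pure subset."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, stop=2):
    """rows: (predictee, predictor_value) pairs.

    Returns (threshold, information_gain), or None when the node holds
    fewer than `stop` vectors or no split is possible.
    """
    if len(rows) < stop:
        return None
    parent = entropy([label for label, _ in rows])
    values = sorted({value for _, value in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [label for label, value in rows if value <= threshold]
        right = [label for label, value in rows if value > threshold]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(rows)
        gain = parent - weighted
        if best is None or gain > best[1]:
            best = (threshold, gain)
    return best

# Hypothetical rows: emotion label first, then a normalized energy value.
rows = [("CERTAINTY", 0.9), ("CERTAINTY", 0.8), ("DOUBT", 0.3), ("DOUBT", 0.2)]
split = best_split(rows, stop=2)
```

With these rows the best threshold falls between 0.3 and 0.8 and yields two pure subsets, so the gain equals the parent entropy of one bit. Raising `stop` above the node size suppresses the split, which is the coarse-tree end of the trade-off described above.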
Other well-known approaches can also be used of course, and may vary from application to application. Referring to
FIG. 2, the acoustic features (as described in a following section, Prosody Analysis) are extracted in 220. Then Prosodic Feature Vectors as described previously are formed in 221. The raw data 290 for the Wagon CART is provided to the input of the Wagon CART. Then the output of the CART is sent to 250. The optimization of the CART tree output results is done in 280 by comparing the CART results 270 with the ToBI labeled speech utterances of 210. Once optimized, the trained CART trees are then output to 250 for later use. - Structure/Operation of Real-Time, Client Server Emotion Detector
- The
emotion detector 100 is integrated with the NLQS system of the prior art (FIG. 3). Specifically as shown in FIG. 1, the emotion detector is preferably implemented in a distributed configuration in which some functions reside at a client 110, and other functions are at a server side 120. As noted above, a speech recognition process is also distributed, so that a portion of speech operations is performed by hardware/software routines 115. Like the NLQS distributed speech recognition process, a significant portion of the emotion modeling and detection is implemented at the client side by a prosody analyzer 118. Data values that are extracted at the client side are transmitted to the server for incorporation in the SQL construct for the database query process, or incorporated in higher level logic of the dialog manager. In this way the turn-taking and control of the dialogue is significantly shaped by the emotion states extracted from the speaker's utterance. - Accordingly
emotion detector 100 as shown works in parallel with the speech recognition processes. It consists of three main sections: - 1. A
prosody analyzer 118 which operates based on extracted acoustic features of the utterance. - 2. A parts-of-
speech analyzer 121 which yields syntactic cues relating to the emotion state. - 3. A trained
classifier 125 that accepts inputs from the prosody analyzer 118 and the parts-of-speech analyzer 121 and outputs data values which correspond to the emotion state embedded in the utterance. - The outputs of the
prosody analyzer 118 and the parts-of-speech analyzer 121 are fed preferably to a trained CART classifier 125. This classifier 125 is trained with data obtained during the off-line training phase described previously. The data which populate the history file contained within the trained CART trees 250 represent data values for the emotion cues derived from the sample population of test subjects and using the sample utterances common to the content in question. For example, in an educational application, the content would include tutoring materials; in other commercial applications the content will vary of course depending on the designs, objectives and nature of a vendor/operator's business. - Prosody Analysis
- The prosody analysis as noted above is preferably based on three key acoustic features, Fundamental Frequency (F0), Amplitude (RMS) and Duration (DUR), extracted in real-time from the utterance. These features and derivatives of the features as described in Table 1 are used as inputs by the trained
classifier 125. Again this is not intended to be an exhaustive list, and other prosodic parameters could be used in many applications. As in the initialization of the speech recognition process at the client side, there is an analogous calibration procedure used to calibrate the speech and silence components of the speaker's utterance. The user initially articulates a sentence that is displayed visually, and the calibration process 130 estimates the noise and other parameters required to find the silence and speech elements of future utterances. - Specifically, the
calibration routine 130 uses a test utterance from which a baseline is computed for one or more acoustic features that are extracted by the prosody analysis block 118. For example, the test utterance includes a set of phrases, one of which contains significant stress or accent or other emotion indicator from which a large shift in the fundamental frequency (F0), or pitch, can be calculated. Other acoustic correlates such as amplitude and duration can also be calculated. This test utterance, as in the analogous case of the signal-to-noise ratio calibration of speech recognition routines, allows the system to automatically compute a calibration baseline for the emotion detector/modeler while taking into account other environmental variables.
TABLE 1
Acoustic Feature | Description
F0 | Fundamental frequency
F0_MAX | Maximum F0
F0_MIN | Minimum F0
F0_MEAN | Mean F0
F0_RANGE | Difference between the highest F0 and lowest F0
F0_STDV | Standard deviation in F0
F0_ABOVE | Ratio of F0 100 ms from the median of F0 compared to F0 in the previous 100 ms range
RMS | Amplitude
RMS_MIN | Minimum amplitude
RMS_MAX | Maximum amplitude
RMS_RANGE | Difference between the highest and lowest amplitudes
RMS_STDV | Standard deviation from the amplitude mean
DUR | Duration, i.e. maximum time of word duration; the word duration is preferably normalized by the number of syllables contained in that word
DUR_MIN | Word duration minimum
DUR_MAX | Word duration maximum
DUR_MEAN | Word duration mean
DUR_STDV | Word duration standard deviation
(F0_RANGE) × (DUR) | Combination of above
(F0_RANGE) × (RMS) × (DUR) | Combination of above
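Some of the derived entries of Table 1 (the ranges, the syllable-normalized word durations and the two combination features) can be sketched as follows. The function name and input layout are hypothetical, and the table does not say which amplitude scalar enters the product features, so the mean amplitude is used here as a labeled assumption.

```python
import statistics

def derived_features(f0_hz, rms, words):
    """f0_hz, rms: per-frame values for the utterance.
    words: one (duration_seconds, syllable_count) pair per word.
    """
    # Word duration normalized by syllable count, as Table 1 prefers.
    durations = [seconds / max(syllables, 1) for seconds, syllables in words]
    features = {
        "F0_RANGE": max(f0_hz) - min(f0_hz),
        "RMS_RANGE": max(rms) - min(rms),
        "DUR": max(durations),  # Table 1 defines DUR as the maximum
        "DUR_MIN": min(durations),
        "DUR_MEAN": statistics.fmean(durations),
        "DUR_STDV": statistics.pstdev(durations),
    }
    # Assumption: mean amplitude stands in for the scalar "RMS" term.
    rms_mean = statistics.fmean(rms)
    features["F0_RANGE_x_DUR"] = features["F0_RANGE"] * features["DUR"]
    features["F0_RANGE_x_RMS_x_DUR"] = (
        features["F0_RANGE"] * rms_mean * features["DUR"]
    )
    return features

# Hypothetical utterance: three voiced frames, two amplitude frames, two words.
example = derived_features([100.0, 150.0, 200.0], [0.1, 0.3],
                           [(0.4, 2), (0.3, 1)])
```

A dictionary keyed by the Table 1 names keeps the feature order explicit when rows are later written out as feature vectors for the classifier.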
Parts of Speech (POS) Analysis - A NLQS system typically includes a parts-of-
speech module 121 to extract parts-of-speech from the utterance. In the present invention this same speech module is also used in a prosodic analysis. Further processing results in tagging and grouping of the different parts-of-speech. In the present invention this same routine is extended to detect a syntactic structure at the beginning and the end of the utterance so as to identify the completeness and incompleteness of the utterance and/or any other out-of-grammar words that indicate emotion state such as DOUBT. For instance the sentences: - “This shape has a larger number of.”
- “This shape has a larger number of sides than the slot.”
- The first sentence, ending in 'of', is incomplete, indicating DOUBT, whereas the second sentence is complete and indicates CERTAINTY. Other examples will be apparent from the present teachings. Thus this additional POS analysis can be used to supplement a prosodic analysis. Those skilled in the art will appreciate that other POS features may be exploited to further determine syntax structures correlative with emotion states. In this fashion an emotion state can be preferably determined by evaluating both individual words (from a prosodic/POS analysis) and an entire sentence of words uttered by the user (POS analysis).
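The completeness check on the two example sentences can be illustrated with a deliberately simple Python heuristic. A real system would rely on the parts-of-speech module 121; the small word list below merely stands in for a tagger and, like the function name, is an assumption made for this sketch.

```python
# Assumption: a short list of function words that rarely end a complete
# English sentence; it is a stand-in for real part-of-speech tags.
DANGLING_WORDS = {"of", "the", "a", "an", "to", "than", "and", "or", "with"}

def syntactic_cue(utterance):
    """Return a coarse emotion cue from the final word of the utterance."""
    words = utterance.lower().rstrip(".?! ").split()
    if not words:
        return None  # nothing recognized, so no syntactic cue
    return "DOUBT" if words[-1] in DANGLING_WORDS else "CERTAINTY"

incomplete = syntactic_cue("This shape has a larger number of.")
complete = syntactic_cue("This shape has a larger number of sides than the slot.")
```

The first call returns DOUBT because the utterance trails off on 'of'; the second returns CERTAINTY. Such a cue would be only one input to the classifier, alongside the prosodic features.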
- Real-Time Classifier
- The parts-of-speech analysis from routine(s) 121 yields syntactic elements from which emotion cues can be derived. The acoustic analyzer routine(s) 118 in turn yields separate data values for the prosodic acoustic correlates which are also related to emotion cues. These two categories of data are then inputted to a
decision tree 125 where the patterns are extracted to estimate the emotion state embedded in the speaker's utterance. - Again, the real-time classifier is preferably based on the Classification and Regression Tree (CART) procedure, a widely used decision tree-based approach for extracting and mining patterns from raw data. This procedure, introduced by Breiman, Friedman, Olshen and Stone in 1984, yields a tree that is basically a flow chart or diagram representing a classification system or model. The tree is structured as a sequence of simple questions, and the answers to these questions trace a path down the tree. The end point reached determines the classification or prediction made by the model.
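The path-tracing behavior described above can be sketched as a walk over a small hand-built tree. The tree shape, feature names and thresholds below are invented for illustration and are not the trained trees of the described system; a real tree would be produced by the training phase.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str  # classification stored at the end point

@dataclass
class Node:
    feature: str      # name of the feature the question asks about
    threshold: float  # question: "is feature < threshold?"
    yes: "Tree"
    no: "Tree"

Tree = Union[Leaf, Node]

def classify(tree, features):
    """Answer one yes/no question per node until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.yes if features[tree.feature] < tree.threshold else tree.no
    return tree.label

# Hypothetical tree mixing a syntactic cue with a prosodic feature.
tree = Node("INCOMPLETE", 0.5,
            yes=Node("DUR_MEAN", 0.30,
                     yes=Leaf("CERTAINTY"),
                     no=Leaf("UNCERTAINTY")),
            no=Leaf("DOUBT"))

fluent = classify(tree, {"INCOMPLETE": 0.0, "DUR_MEAN": 0.20})
slow = classify(tree, {"INCOMPLETE": 0.0, "DUR_MEAN": 0.50})
cut_off = classify(tree, {"INCOMPLETE": 1.0, "DUR_MEAN": 0.20})
```

Each call traces one path: a complete, quickly spoken utterance ends at CERTAINTY, a complete but slow one at UNCERTAINTY, and an incomplete one at DOUBT.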
- In the end the emotion cue data value output from
CART decision tree 125 can be in the form of a data variable suitable for inclusion within a SQL construct, such as illustrated in the aforementioned U.S. Pat. No. 6,615,172. The detected emotion state can also be used by an interactive agent to formulate a response, control dialog content and/or a dialog sequence, control visual feedback presented to a user, control non-verbal audio feedback such as selecting one or more audio recordings, etc., which as such are correlated/associated with different user emotion states. - In a distributed environment, the prosodic data is also preferably sent in a packet stream, which may or may not also include the extracted acoustic feature data for a speech recognition process, such as cepstral coefficients. Typically the prosodic data and acoustic feature data are packaged within a common data stream to be sent to the server, but it may be desirable to separate the two into different sessions depending on channel conditions and capabilities of a server. For example the latter may not include a prosody capability, and therefore emotion detection may need to be facilitated by a separate server device.
- Moreover in some instances the prosodic data and acoustic feature data can be transmitted using different priorities. For example, if for a particular application prosodic data is more critical, then computations for prosodic data can be accelerated and a higher priority given to packets of such data in a distributed environment. In some instances because of the nature of the data communicated, it may be desirable to format a prosodic data packet with a different payload than a corresponding speech recognition data packet (such as an MFCC packet sent via an RTP protocol, for example). Other examples will be apparent to those skilled in the art. Furthermore the emotion detection/prosodic analysis operations can be distributed across the client device and server device on a case-by-case basis to achieve real-time performance, and configured during an initialization procedure (such as within an MRCP type protocol). An amount of prosodic data to be transferred to said server device can be determined on a case-by-case basis in accordance with one or more of the following parameters: a) computational capabilities of the respective devices; b) communications capability of a network coupling the respective devices; c) loading of said server device; d) a performance requirement of a speech recognition task associated with a user.
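One way to picture a prosodic packet that carries its own priority and a payload distinct from a speech recognition packet is the following Python sketch. The byte layout (a 4-byte header followed by 32-bit floats), the constants and the function names are all hypothetical; the text does not define a wire format, and this is not the RTP or MRCP packet format.

```python
import struct

# Hypothetical header: packet kind, priority, number of values
# (network byte order, no padding).
HEADER = struct.Struct("!BBH")
PROSODIC, RECOGNITION = 1, 2

def pack_features(kind, priority, values):
    """Serialize feature values as a header plus 32-bit floats."""
    payload = struct.pack(f"!{len(values)}f", *values)
    return HEADER.pack(kind, priority, len(values)) + payload

def unpack_features(packet):
    """Inverse of pack_features; returns (kind, priority, values)."""
    kind, priority, count = HEADER.unpack_from(packet)
    values = struct.unpack_from(f"!{count}f", packet, HEADER.size)
    return kind, priority, list(values)

# Prosodic values (illustrative F0 mean, RMS mean, DUR) sent at high priority.
packet = pack_features(PROSODIC, 7, [120.0, 0.5, 0.25])
decoded = unpack_features(packet)
```

A sender could queue packets by the priority byte so that prosodic data goes out first when an application treats it as more critical, and a server without prosody capability could route by the kind byte.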
- The attached Appendix is taken from Applicant's provisional application referenced above.
- Part 1: Identification and Significance of the Innovation
- Introduction
- This project will yield a computer-based spoken language training system that will begin to approximate the benefits provided by a one-on-one tutor-student session. This system will decrease the costs of tutoring as well as help compensate for the lack of human (tutor) resources in a broad set of educational settings.
- If the next generation of computer-based training systems with spoken language interfaces is going to be successful, it must also provide a comfortable, satisfying and user-friendly environment. This project helps to approach the important goal of improving user experience so that the student experiences a satisfying, effective and enjoyable tutoring session comparable to that offered by one-on-one human tutors.
- A successful training system will be able to tap into a large commercial training market, as well as serve adults with retraining needs that result from technology or process changes in the workplace, other employment dislocations or career changes; students with learning disabilities and remedial needs; and students who are working on advanced topics beyond the scope of assistance available in their classroom.
- It is widely accepted that students achieve large gains in learning when they receive good one-on-one human tutoring [Cohen et al, 1982]. One of the success factors of human tutors is their ability to use prosodic information embedded in a student's unconstrained speech in order to draw inferences about the student's understanding of the lesson as it progresses, and to structure the tutor/student dialog accordingly. Current intelligent tutoring systems¹ are largely text-based, and thus lack the capability to use both semantic understanding and prosodic cues to fully interpret the spoken words contained in the student's dialog. Recently, researchers have demonstrated that spoken language interfaces, with semantic understanding, can be implemented with computer-based tutoring systems. Furthermore, the prosodic information contained in speech can be extracted automatically and used to assist the computer in guiding the tutoring dialog. Accordingly, enhanced dialog system capabilities, incorporating semantic and prosodic understanding, will here be designed and constructed to enable an intelligent tutoring system to simulate the responsive and efficient techniques associated with good one-on-one human tutoring and to thereby approach an optimal learning environment.
¹ The mnemonic ITS is used in the literature to signify an Intelligent Tutoring System. In the context of this proposal, ITS is used to refer to our proposed system, an Interactive Training System. We also wish to clarify that training involves tutoring on a one-on-one basis or in a classroom setting.
- What We Will Do
- In the course of Phase I and Phase II the proposed research will focus on three key strategies to investigate the hypothesis that a pure text-based computer-based training can be improved to levels approaching the best one-on-one human tutors by:
- Investigating how to extract conversational cues from a student's dialog using prosodic information, and apply data derived from these cues to the dialog manager to recognize misconceptions and clarify issues that the student has with the lesson.
- Developing an architecture for Speaktomi's Spoken Language Interactive Training System that combines spoken language interfaces, and real-time prosody modeling together with a dialog manager implemented with cognitive reasoning agents. The spoken language interface is a stable, widely deployed speech recognition engine, designed and targeted for educational applications. Cognitive reasoning models implemented within cognitive reasoning agents will be used to create models of tutors that can be embedded in the interactive learning environment which will monitor and assess the student's performance, coach and guide the student as needed, and keep a record of what knowledge or skills the student has demonstrated and areas where there is need for improvement.
- Testing the rudimentary system on previously developed corpora.
Part 2: Background & Phase I Technical Objectives
Background
- The goal of computer-based tutoring environments has been to create an optimum educational tool which emulates the methods of good human tutors. One-on-one human tutoring has repeatedly been shown to be more effective than other types of instruction. An analysis of 65 independent evaluations indicated that one-on-one tutoring raised students' performance by 0.4 standard deviation units [Cohen et al, 1982]. Other studies report that the average student tutored by a 'good' one-on-one tutor scored 2.0 standard deviation units above average students receiving standard class instruction [Bloom, 1984]. Cognitive psychologists believe that important, "deep" learning occurs when students encounter obstacles and work around them, explaining to themselves what worked and what did not work, and how the new information fits in with what they already know [Chi et al, 1989; Chi et al, 1994; VanLehn, 1990].
- The challenge then for a computer-based tutoring system such as the one proposed is to emulate the desirable human one-on-one tutoring environment. Human one-on-one tutors interact with students via natural language: they prompt students to construct knowledge, give explanations, and assess the student's understanding of the lesson. Most importantly, tutors give and receive additional linguistic cues during the lesson about how the dialogue is progressing. The cues received by the one-on-one tutor give the tutor information about the student's understanding of the material and allow the tutor to determine when a tutoring strategy is working or is not working. Natural language therefore is an important modality for the student/one-on-one tutor environment.
- A further important requirement is that the system must not ignore signs of confusion or misconception as the presentation evolves. This means that the interactive training system, like its human counterpart, must detect and understand cues contained in the student's dialogue and be able to alter or tailor its response and its tutoring strategies. Published results on cognition and spoken dialog indicate that human tutors rely on subtle content present in the student's dialog to guide their participation in a meaningful, enjoyable and effective dialogue, thus augmenting the student's learning performance. Other researchers such as Tsukahara and Ward [Tsukahara & Ward, 2001; Ward & Tsukahara, 2003] have described systems which infer the user's internal state, such as feelings of confidence, confusion and pleasure, from the prosody² of the user's utterances and the context, and which use this information to select the most appropriate acknowledgement form at each moment³. Thus, in addition to the key challenge of correctly understanding the student's dialog, as transcribed by the speech-recognizer-based spoken language interface, the interactive training system in its goal of emulating a human tutor must identify the emotional states contained in the student's dialogue and apply them to the dialogue management in such a way that the specific task at hand (reinforcement of a concept, spotting a misconception, or evaluation of progress) can be accomplished in real-time during the course of the dialog. This research proposal for an ITS is based in part on the assumptions regarding the acoustic-prosodic characteristics of speech and published results [Litman, Forbes-Riley, 2004; Shriberg, 1998; Rosalio et al, 1999] that emotion cues contained in the human speech can be extracted and applied in a data-driven manner to the dialog manager, the subsystem that controls the dialog between the student and system. 
This research proposal builds on the considerable work in the area of detecting emotion states in natural human-computer dialog. Silipo & Greenberg [Silipo, Greenberg, 2000] found that amplitude and duration are the primary acoustic parameters associated with patterns of stress-related cues. More recently Litman and Forbes-Riley [Litman, Forbes-Riley, 2004] report that acoustic-prosodic and lexical features can be used to identify and predict student emotion in computer-human tutoring dialogs. They identify simple two-way (emotion/non-emotion) and three-way (negative/neutral/positive) classification schemes. Additionally, other researchers [Holzapfel et al, 2002] have also explored the use of emotions for dialog management strategies that assist in minimizing the misunderstanding of the user and thus improve user acceptance.
2The term prosody is generally used to refer to aspects of a sentence's pronunciation which are not described by the sequence of phones derived from the lexicon. It includes the whole class of variations in voice pitch and intensity that have linguistic functions.
3Ward and Tsukahara, “A Study in Responsiveness in Spoken Dialog”, International Journal of Human-Computer Studies, March 2003. Tsukahara and Ward, “Responding to Subtle, Fleeting Changes in the User's Internal State”, SIGCHI, March 2001, Seattle, Wash.
- Vision and Research Goals
- The vision that guides this research proposal is the goal of creating a spoken language interactive training system that mimics and captures the strategies of a one-on-one human tutor, because learning gains have been shown to be high for students tutored in this fashion. The dialog manager for the proposed interactive training system must therefore be designed to accommodate the unique requirements specific to the tutoring domain. This design stands in contrast to currently existing dialog management strategies for an information-type domain which have discourse plans that are either elaborate or based on form-filling or finite state machine approaches. The dialog manager for the proposed ITS must combine low-level responsive dialogue activities with its high level educational plan. Put another way, the dialog manager for our ITS must interweave high-level tutorial strategy with on-the-fly adaptive planning. The detection of emotion in the student's utterances is important for the tutorial domain because detecting a negative state in the student (such as confusion, boredom, irritation or intimidation) or, conversely, a positive state (such as confidence or enthusiasm) can allow the system to provide a more appropriate response, thus better emulating the human one-on-one tutor environment [Forbes-Riley, Litman, 2004].
- For our proposed research we will use a speech corpus that contains emotion-related utterances, such as the one available from the Oregon Graduate Institute. The main thrust of this research proposal is the development of a spoken language interactive training system with a unified architecture which combines spoken language interfaces, real-time emotion detection, cognitive-based reasoning agents and a dialog manager. The architecture will be tailored for the special requirements of the tutoring domain with a dialog manager that enables smooth and robust conversational dialogs between the student and tutor, while allowing for better understanding of the student during the student-system dialog. What is new and innovative in this architecture is:
- (1) prosody-based modeling of the student's dialog, and its use in managing the dialog so as to recognize misconceptions and clarify issues the student has with the lesson;
- (2) the innovative use of multiple cognitive agents—each cognitive-based agent will be assigned a task or function such as assessing the student's performance, or creating a profile or characterization of the student before and after the lesson;
- (3) the use of spoken language interfaces and the flexibility of the natural language modality that makes it possible to extract additional information contained in prosody of speech;
- (4) an architecture that is tailored to the special requirements of the tutoring domain; and
- (5) the incorporation of an application programming interface for compatibility with two widely deployed and popular software products used in the educational and multimedia market—Authorware and Director respectively, so as to accelerate adoption of the Speaktomi spoken language interactive training system in the targeted commercial market segment.
- One of the overarching goals behind this research is embodied in our design approach, which emphasizes the use of rapid and flexible prototyping natural language tools and environments such as the CARMEL4 language understanding framework and the Open Agent Architecture environment. The CARMEL framework facilitates the rapid development of deep sentence-level language understanding interfaces required by the ITS without requiring that we address complex computational linguistic aspects, while being flexible enough to allow the developer to be involved in these issues. Similarly, the OAA5 environment allows more flexible and more rapid prototyping and debugging than alternative schemes. This proposal anticipates that the approach taken will save time and will allow us to focus on issues such as the ‘tutoring domain’-specific architectural issues, speech recognition imperfections and other key system integration issues.
4CARMEL=Core component for Assessing the Meaning of Explanatory Language is a language understanding framework for intelligent tutoring systems developed by the CIRCLE group, a joint center between Univ. of Pittsburgh and CMU funded by the National Science Foundation.
5OAA=Open Agent Architecture is an open framework developed by the Stanford Research Institute for integrating the various components that comprise a software system such as the proposed spoken language ITS. Specifically it is a piece of middleware that supports C++, Java, Lisp and Prolog and enables one to rapidly prototype components into a system.
- Challenges
- One of the key challenges is in the speech transcription process—i.e. the transcription of speech to text by the speech recognizer is not ideal or error free, and speech recognition errors that result from using even the best speech recognizer will give rise to misunderstandings and non-understandings by the system, thus leading to non-robust and brittle performance. A key goal of the proposed research is to develop indicators of speech recognition errors that lead to these misunderstanding and non-understanding events, and to develop strategies for handling errors of this kind so that the resulting system performance is as robust as possible. We recognize this issue and the implementation of this component of the work will be done in Phase II.
- Key Questions and Technical Objectives
- The time required to develop a Version 1.0 of the commercially-ready spoken language ITS is projected to span Phase I and Phase II. In Phase I, exploratory work will confirm or not confirm the technical and commercial feasibility of the system by answering the first three key questions. The implementation of a solution to the fourth question will be deferred to Phase II.
-
- 1. How do we extract the acoustic-prosodic cues embedded in the utterances of a typical tutoring speech corpus?
- 2. What reference architecture can be defined for the interactive training system that makes it suitable for the tutoring domain and combines spoken language interfaces, a real-time prosody modeler, a dialog manager and cognitive-based reasoning agents?
- 3. What can be done or incorporated in the design of this ITS to accelerate the product adoption in the commercial market?
- 4. What is the road map or plan for detecting speech recognition transcription errors, and strategies to compensate for problems that arise from such speech recognition errors?
- Question 1 will be answered fully and Question 2 partially by Objectives 1 and 2 below. The two technical objectives are:
- Objective 1: To implement an algorithm for real time prosody modeling based on the prosodic characteristics of speech in order to extract and classify acoustic-prosodic characteristics contained in the student's speech.
- We will develop techniques to model prosody characteristics from the corpus of a typical tutorial dialog. Speech, as a rich medium of communication, contains acoustic correlates such as pitch, duration, amplitude which are related to the speaker's emotion. The objective in Phase I will focus on developing techniques to extract such acoustic correlates related to two specific conditions—STRESS and NON-STRESS, to classify these conditions using machine learning algorithms with sufficient accuracy and then develop an algorithm for a real-time prosody modeling that can be implemented as a module of the ITS. The anticipated outcome of this objective will be a software algorithm which analyzes the student's dialog in real-time, and outputs data values corresponding to the prosody characteristics embedded in the student's speech. This algorithm will be extended in Phase II to cover additional emotion states and the data values used then in the operation of the dialog manager.
- Objective 2: To implement the front end of the ITS, comprised of the Speech Recognition, Natural Language and real-time prosody modeling modules (the last developed in Objective 1), so that the emotional state detection algorithm can be tested in a system setting. This algorithm extracts acoustic-prosodic cues from the speech corpora, and maps these to data-driven values representing emotional states.
- The expected outcome of this objective at the end of Phase I is the prototype of the front-end of the proposed spoken language ITS architecture—i.e. the Speech Recognition, Emotion Detection and Natural Language modules. This front-end will be prototyped within the Open Agent Architecture (OAA) environment and will serve as an important step in proving the feasibility of interfacing the spoken language interface with real-time emotion detection and testing the algorithm developed in Objective 1. Additionally, these modules are important to the planned dialog management schemes for this tutoring domain. The dialog manager and other modules such as the text to speech synthesis agent and the speech error compensation strategies as well as questions 2 and 3 will be addressed in Phase II. In Phase II, Version 1.0 of the spoken language interactive training system will be completed. Additionally during Phase II, other tasks such as integrating an interface to the widely-used authoring tools, Authorware and Director, and testing the system using live subjects in real situations will be completed.
- Part 3: Phase I Research Plan
- Introduction
- The spoken language interactive training system that we propose to build over the course of the SBIR Phase I and Phase II effort serves both a long term objective as well as the immediate Phase I project objective. The immediate objectives for this Phase I component of the project are to automatically identify, classify and map in real-time a number of acoustic-prosodic cues to emotional states embedded in a typical student's dialog. These data-driven values corresponding to these states will then be used to assist in formulating the dialog strategies for the ITS. The long term objective of the research is to build a spoken language-based ITS system that incorporates dialog control strategies that also incorporate emotional cues contained in the utterances of the student's dialogue. Another key long term objective is to develop and incorporate error-handling strategies into the dialog manager to compensate for speech recognition errors that occur within the speech recognition transcription process. This key objective ensures that the dialog remains robust, stable and stays on track so that the user experience is productive, engaging and enjoyable.
- Specific Aims
- In Phase I, we will pursue the following two key objectives: (1) development of an algorithm for real-time prosody modeling based on the acoustic-prosodic characteristics of speech; (2) implementation of the front-end of this spoken language ITS—i.e. the Speech Recognition, Natural Language and the real-time prosody modeler.
- Background & Research Methodology
- Overview of the Reference Architecture
- The spoken language interactive training system uses traditional components required for implementing a spoken dialogue system. Spoken language systems are, in general, complex frameworks involving the integration of several components such as speech recognition, speech synthesis, natural language understanding and dialog management, as in an information retrieval application using spoken language interfaces. The representative functions of each component are:
-
- Speech Recognizer (SR)—receives the acoustic signal from the user and generates a text string or other representation containing the utterances most likely to have been pronounced.
- Natural Language Understanding—generates a particular natural language representation of the syntax and semantics of the text received from the speech recognizer.
- Dialogue Manager (DM)—the core of the system—it controls the interaction with the user and coordinates other components.
- Response Generator—produces the appropriate system replies using the information from the database.
- Speech Synthesis—constructs the acoustic form of the system replies produced by the response generator.
- The dialogue manager is the key component in dialog systems. Approaches such as finite state machines (FSM) are not appropriate in environments that must deal with unplanned events; FSM technology is usually found in limited-domain environments. The DM must be an agent that monitors the execution of dialogue strategies and is able to change plans as unplanned events occur. In general, the dialog manager for a tutorial type domain must interweave high-level tutorial planning with adaptive on-the-fly plans. The environment for supporting such dialog management and control strategies must also be flexible enough to add agents that carry out tasks such as intention understanding and inference.
- AutoTutor (domain: computer literacy), CIRCSIM (domain: circulatory system) and ATLAS-ANDES (domain: Newtonian mechanics) are representative examples of ITS that have been implemented. Each of these systems utilizes DM models that implement a combination of different strategies. For example, AutoTutor's DM is an adaptation of the form-filling approach to tutorial dialogue: it relies on a curriculum script, a sequence of topic formats, each of which contains a main focal question and an ideal answer.
- Speaktomi's proposed architecture for its spoken language ITS is based on a configuration of modular components functioning as software agents and adhering to the SRI Open Agent Architecture (OAA)6 framework as shown in FIG. 1a. OAA allows rapid and flexible integration of software agents in a prototyping development environment. Because these components can be coded in different languages and run on different platforms, the OAA framework is an ideal environment for rapid software prototyping and makes it easy to add or remove software components. The term agent refers to a software process that meets the conventions of the OAA framework, in which agents communicate using the Interagent Communication Language (ICL) via solvables; a solvable is a specific query that can be solved by special agents. Each application agent as shown can be interfaced to an existing legacy application such as a speech recognition engine or a library via a wrapper that calls a pre-existing application programming interface (API). Meta-agents assist the facilitator agent in coordinating their activities. The FacilitatorAgent is a specialized server that is responsible for coordinating agent communications and cooperative problem solving. OAA agents employ ICL via solvables to perform queries, execute actions, exchange information, set triggers and manipulate data in the agent community.
6The SRI Open Agent Architecture (OAA) is a framework for integrating the various components that comprise a spoken dialogue system such as the FASTER ITS. Specifically it is a piece of middleware that supports C++, Java, Lisp and Prolog and enables one to rapidly prototype components into a system.
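- The hub-and-agent arrangement described above can be pictured with a small sketch. This is only an illustration in Python of the general idea of agents declaring the queries they can solve and a facilitator routing requests to them; it does not use the OAA library or ICL syntax, and the names Facilitator, register, solve, recognizer_agent and prosody_agent are all hypothetical.

```python
class Facilitator:
    """Toy hub: maps a solvable name to the agents that declared it."""

    def __init__(self):
        self._providers = {}

    def register(self, solvable, handler):
        # An agent announces that it can answer queries of this kind.
        self._providers.setdefault(solvable, []).append(handler)

    def solve(self, solvable, *args):
        # Route the query to every agent that declared the solvable.
        handlers = self._providers.get(solvable)
        if not handlers:
            raise LookupError(f"no agent can solve {solvable!r}")
        return [handler(*args) for handler in handlers]


# Hypothetical agents; each stands in for a wrapper around a legacy API.
def recognizer_agent(audio_id):
    return {"text": "i am not sure", "source": audio_id}


def prosody_agent(audio_id):
    return {"stress": "STRESS", "source": audio_id}


hub = Facilitator()
hub.register("recognize", recognizer_agent)
hub.register("model_prosody", prosody_agent)
```

A requesting agent would call, for instance, hub.solve("recognize", "utt01") without knowing which agent answers, which is the property that makes it easy to add or remove components.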
-
FIG. 1b shows a high-level view of specific software agents that comprise the speech-enabled ITS. Although each agent is connected to a central hub or Facilitator, there is a functional hierarchy which describes each agent and the flow of messages between them as shown in FIG. 3. This diagram illustrates the functional dependencies between the various blocks and the message interfaces between them.
- The architecture will support the following software agents: user interface agents (microphone input and audio speaker output), the speech recognition agent, prosody modeler agent, natural language (NL) agent, dialog manager (DM) agent, synthesis agent, inference, history and knowledge base. This community of agents will be required for the full implementation of the ITS (Version 1.0) to be completed in Phase II.
- The following paragraphs describe the brief background of each agent:
- Objective 1: To Develop a Real-Time Algorithm that Builds a Dialog Prosody Model.
- This objective has two goals:
- 1. To develop an algorithm for detecting prosodic structure of dialog in real-time & with sufficiently reliable performance for use in an interactive training system.
- 2. To assess the effectiveness of the selected acoustic features for prosodic cue detection. Because of the connection between learning and social interaction, we are motivated to enhance the capabilities and performance of the ITS by detecting interactional characteristics contained in speech in real-time, and then use data derived from the detected interactional dialog model to tailor the response of the system so that the system takes into account the student's interactional characteristics during the tutoring session.
- Para-linguistic states including emotion, attention, motivation, interest level, degree of comprehension, degree of interactivity, and responsiveness are integral determinants of prosodic aspects of human speech, and prosody is the important mechanism through which the speaker's emotional and other states are expressed. Hence the prosodic information contained in speech is important if we want to ascertain these qualities in the speaker [cf. Shriberg, 1998]. Prosody is a general term for those aspects of speech that span groups of syllables [Par 86], and we incorporate in the concept dialog prosody: characteristics spanning not just one but multiple conversational turns. Prosody conveys information between the speaker and the listener on several layers. Prosodic features spanning several speech units that are larger than phonemes—i.e. syllables, words and turns—can be built up incrementally from characteristics of smaller units. Thus the prosody of phrases incorporates the characteristics of the syllables that make it up; the linguistic stress levels of the syllables, their syllable-length melodic characteristics, can be combined to form phrase-level prosodic structures, similarly smaller phrases can be combined to form utterance-level models, and utterance sequences along with timing and other relationships between turns are combined into a dialog level prosodic model. We will proceed incrementally from bottom up in this work, keeping in mind the higher level modeling structures which are to be developed. The first level of post-speech-recognition modeling, which is our objective in this Phase I proposal, incorporates syllabification or grouping of phones into syllables, syllable-level pitch contour characterization, and syllable stress level classification. For this objective, in addition to dictionary entries for syllabification and lexical stress levels we will measure the three key speech signal acoustic features—pitch, duration and energy. 
Duration of segments, syllables, and phrases; fundamental frequency (F0), the acoustic correlate of pitch; and, to a lesser degree, energy or amplitude, the correlate of loudness, are the observational basis of prosody in human speech. The variation of pitch over a sentence, also called intonation, is used in spoken language to shape sentences and give additional meaning or emotion to the verbal message during human communication [Mom02, Abe 01]. Simplifying for analytic purposes, pitch in English can be defined at four levels (low, mid, high or extra high) and as having three terminal contours (fading, rising, or sustained). Fundamental frequency measurements will be used to characterize syllable or sub-syllable level pitch contours with this vocabulary. These characterizations, along with the other acoustic measures of amplitude and duration and with pitch range, will be used to classify syllables by phrasal stress levels. To facilitate the successful outcome of the objective, the research will be broken out into the following four key sections:
-
- 1. Preparation of the corpus.
- 2. Extraction of the acoustic correlates from which prosodic cues can be derived.
- 3. Classification of the acoustic correlates using machine learning algorithms & verification with the manually annotated corpus.
- 4. Development of the real-time prosodic modeling algorithm.
The desired outcome for this research objective is an algorithm implemented in software that extracts, classifies and verifies in real-time the prosodic structure contained in the spoken dialog. What follows is a description of the research methodology for each of the above sections.
Preparation of the Corpus
- Before the extraction can begin, we will acquire a speech corpus in the form of a database or a set of files from one of several recognized linguistic repositories7. The corpus sourced from the Oregon Graduate Institute is supplied with a phonetic transcription file with each speech file. If the sourced files are not annotated, the files will be manually marked or linguistically annotated8 in terms of prosodic stress by a pair of linguistically trained individuals. In order to provide more robustness for the experimental task, each of two subsets of files will be annotated by a manual transcriber. In addition, we will use a jack-knifing training and testing procedure. With this procedure, two thirds of the files used as the training set and one third of the files used as the test set will be cyclically exchanged so that three different pairs of training and test sets are created for the entire research measurements. Before going to the next step we will compare the annotations made by each transcriber to ascertain the agreement between the two transcribers in annotating the files that are common to each of the two subsets of files. We will initially aim to annotate syllables into two categories: STRESS and UNSTRESSED9. Once we develop and confirm experimental procedures for classifying these two levels, we can proceed to prosodic modeling of larger units. At the level of turn-taking our experimental procedures will provide information that would enable a tutoring system to infer paralinguistic characteristics of the dialog participants. This raises the possibility of emotion classification as in the work of Litman [Litman et al, 2004], for example Positive (confident, enthusiastic), Negative (confused, bored, uncertain) and Neutral (neither Positive nor Negative).
7Linguistic Data Consortium, Univ. of Pennsylvania; Oregon Graduate Institute (OGI); and the Berkeley International Computer Science Institute (ICSI).
8ToBI—tone and break indices—is a method used in linguistics to annotate English utterances with intonation patterns & other aspects of the prosody.
9Although many levels of prosodic stress are claimed to exist by some phonologists, at most three levels of stress can be detected in speech recordings by trained linguists with even moderate reliability—primary stress, absence of stress and weak stress. To achieve good reliability, at most two levels can be used [Veatch 1991].
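- The jack-knifing procedure described above, in which a two-thirds training set and a one-third test set are cyclically exchanged to give three train/test pairs, can be sketched as follows. This is a minimal illustration, assuming the corpus is represented as a list of file names; the function name jackknife_splits and the way files are dealt into thirds are assumptions, not details taken from the proposal.

```python
def jackknife_splits(files, folds=3):
    """Yield (train, test) pairs, holding out each third of the files in turn."""
    # Deal the files into `folds` roughly equal parts.
    parts = [files[i::folds] for i in range(folds)]
    for held_out in range(folds):
        test = parts[held_out]
        # The remaining parts together form the training set.
        train = [f for i, part in enumerate(parts) if i != held_out for f in part]
        yield train, test
```

With nine files this yields three pairs, each with six training files and three test files, and every file appears in exactly one test set.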
- The Kappa coefficient K10 will be used as the metric that measures the pairwise agreement between the two transcribers. This metric represents the ratio of the proportion of times that the transcribers agreed to the maximum proportion of times that they could have agreed, with both proportions corrected for chance agreement. Using the criteria established by Carletta [1996], Kappa values greater than 0.8 imply good reproducibility, while those within the range of 0.67-0.8 allow only tentative conclusions regarding the labeling agreement between the transcribers.
10Kappa Coefficient: K = [P(A) − P(E)] / [1 − P(E)], where P(A), the observed agreement, is the proportion of times the transcribers agree, and P(E) is the agreement expected by chance.
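- As an illustration of the Kappa formula in footnote 10, the sketch below computes K for two transcribers who labeled the same syllables. It assumes Cohen's form of the chance term, in which P(E) is estimated from each transcriber's own label frequencies; the footnote does not say how P(E) is to be estimated, so this choice and the function name cohen_kappa are assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # P(A): proportion of items on which the annotators agree.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # P(E): chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_a - p_e) / (1.0 - p_e)
```

For example, if two transcribers agree on 8 of 10 syllables and each marks 4 of the 10 as STRESS, then P(A) = 0.8, P(E) = 0.4·0.4 + 0.6·0.6 = 0.52 and K = 0.28/0.48 ≈ 0.58, which falls below the 0.67 threshold cited above.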
- Extraction of Acoustic Features
- In addition to the segmentally time-stamped transcript of the dialog provided with the corpus or provided in live usage by the speech recognizer, we will extract the following primary acoustic features—pitch, duration and energy. For this latter purpose we will use the PRAAT11 software, a widely available and accurate speech analysis tool to extract the following measures for each syllabic unit:
11PRAAT web page: http://www.praat.org. The PRAAT speech analysis tool incorporates an accurate fundamental frequency algorithm developed by Professor Boersma of the University of Amsterdam, Holland.
-
- 1. Pitch frequency and related correlates: F0 12: maximum (F0_MAX), minimum (F0_MIN), mean (F0_MEAN) & standard deviation (F0_STDV), difference between highest and lowest (F0_RANGE), ratio of those above center of F0 range to those below the center (F0_ABOVE).
12F0=fundamental frequency also called pitch, is the periodicity of voiced speech reflecting the rate of vibration of the vocal folds.
- 2. Duration and related correlates: duration (DUR), maximum duration (DUR_MAX), minimum duration (DUR_MIN), mean duration (DUR_MEAN), standard deviation from duration mean (DUR_STDV).
- 3. Amplitude and related correlates: Amplitude (RMS), Minimum amplitude (RMS_MIN), Maximum amplitude (RMS_MAX), mean amplitude (RMS_MEAN), difference between the highest and lowest amplitudes (RMS_RANGE), standard deviation from amplitude mean (RMS_STDV).
- Combinations of the above primary features such as [F0_RANGE×DUR] or [F0_RANGE×RMS×DUR] will be calculated and used in the analysis.
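- The per-syllable pitch measures and the combined features listed above can be computed as in the following sketch. It assumes the F0 samples for one syllable have already been extracted (for example, exported from PRAAT) into a non-empty list of values in Hz; the function names, the dictionary keys for the combined features, and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics


def f0_features(f0_hz):
    """Summarize the F0 samples (Hz) of one syllable; the list must be non-empty."""
    lo, hi = min(f0_hz), max(f0_hz)
    center = (lo + hi) / 2.0
    above = sum(1 for f in f0_hz if f > center)
    below = sum(1 for f in f0_hz if f < center)
    return {
        "F0_MAX": hi,
        "F0_MIN": lo,
        "F0_MEAN": statistics.fmean(f0_hz),
        "F0_STDV": statistics.pstdev(f0_hz),
        "F0_RANGE": hi - lo,
        # The ratio is undefined when no sample lies below the center.
        "F0_ABOVE": above / below if below else None,
    }


def composite_features(f0_range, dur_s, rms):
    """Products of primary features, as in [F0_RANGE x DUR]."""
    return {
        "F0_RANGE_x_DUR": f0_range * dur_s,
        "F0_RANGE_x_RMS_x_DUR": f0_range * rms * dur_s,
    }
```

The duration and amplitude measures would follow the same pattern, applied to the syllable durations and RMS values respectively.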
- Classification of Acoustic Features for Prosodic Modeling
- After the above acoustic features are extracted for each syllable in each voice file in each subset, we will employ machine learning algorithms to classify the acoustic data and map acoustic correlates to the prosodic structure, specifically stress levels. The main goal of this part of the experiment is to use machine learning algorithms to automatically determine which acoustic-prosodic features are the most informative in identifying and mapping the two stress levels from these features. Machine learning has been established to be a valuable tool for data exploration for a number of data classification problems in fields such as linguistics. Some of these schemes [Witten & Frank, Data Mining, Morgan Kaufmann, 2000] are more efficient or better with certain types of data than others, and some are more suited for classifying certain data distributions that have many subtle features. Since we do not know the structure of the data and the relevance or irrelevance of some of the features, it behooves us to attempt to classify the extracted data with more than a few learning schemes. We will use a representative number of these data-driven algorithms such as boosting (AdaBoost), classification and regression trees (CART), artificial neural networks (ANN), support vector machines (SVM) and nearest neighbor methods. For each machine learning algorithm there will be a training phase. For the ANN, for example, the input vector will consist of four parameters (duration, amplitude, average pitch and pitch range), and the output will consist of two normal units, one for STRESS and the other for UNSTRESSED. After the network is trained, the acoustic measurements from the test files contained in one third of the subset will be input to the ANN. The implementation of the machine learning classifier-based experiments will be performed within the WEKA13 machine learning software environment and with the Stuttgart Neural Network Simulator (SNNS)14.
All of the software that will be used in this objective (PRAAT, WEKA and SNNS) is already installed and working on our workstations. WEKA is a flexible environment that provides the capability to do cross validation and comparisons between the various machine learning schemes. For example, we will be able to compare the classifications generated by each machine learning algorithm, such as K-Nearest Neighbors, AdaBoost, CART decision trees, and the rule-based RIPPER (Repeated Incremental Pruning to Produce Error Reduction), and to generate optimum parameters for the real-time prosodic modeling algorithm based on acoustic feature importance, acoustic feature usage and accuracy rate. In this way we will assess which classification scheme most accurately predicts the prosodic models. Confusion matrices will then be used to represent and compare the recognition accuracy as a percentage of stress levels and the two-way classifications generated by the different classifiers.
13WEKA software—a public domain and widely used data mining and machine learning software available from the University of Waikato, New Zealand, http://www.cs.waikato.ac.nz/ml/
14SNNS software: http://www-ra.informatik.uni-tuebingen.de/SNNS/
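- To make the classification and evaluation step concrete, the sketch below classifies four-parameter syllable vectors (duration, amplitude, average pitch, pitch range) with a small nearest-neighbor classifier and tallies the results in a two-way confusion matrix. The proposal performs these experiments in WEKA and SNNS; this stand-alone Python version is only an illustration of the idea, the function names are hypothetical, and in practice the features would need to be scaled to comparable ranges before distances are computed.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label). Returns the majority label
    among the k training vectors closest to query (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(true_labels, predicted, classes=("STRESS", "UNSTRESSED")):
    """matrix[t][p] counts items whose true label is t and predicted label is p."""
    matrix = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(true_labels, predicted):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of items on the diagonal of the confusion matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    return correct / total if total else 0.0
```

Running each candidate classifier over the same test set and comparing the resulting matrices shows not only which scheme is most accurate overall but also whether its errors fall mainly on stressed or unstressed syllables.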
- Development of the Real-Time Prosodic Modeling Algorithm
- At this point in the analysis, we will discover by experiment which combination of acoustic features (amplitude, pitch, pitch range or others derived from the base set) will most accurately classify syllables into STRESS or NON-STRESS categories. Specifically, we will develop an algorithm based on the combination of measured acoustic features such as amplitude, average pitch, pitch range and duration. As shown in FIG. 1, the result is an evidence variable that represents the combination of the above correlates that leads to a local maximum for stress level classification accuracy. Additionally, receiver operating characteristic curves will be plotted using the key acoustic parameters to ascertain which acoustic parameter or combination of parameters plays the dominant role in recognizing the stress level. We will also use this evidence variable, which combines acoustic features, to formulate an algorithm from which the prosodic structure can be detected in real-time.
- Objective 2: To Implement the Front End of the Spoken Language ITS (Comprised of the Speech Recognition, Natural Language and the Prosody Modeler (Developed in Objective 1)) Using the Rapid Prototyping Open Agent Architecture (OAA) Environment.
- The expected outcome of this objective at the end of Phase I is a prototype of the front-end of the ITS architecture defined in Objective 2, i.e. the speech recognition, natural language and prosody modeler modules as shown in FIG. 3. A second task will be to prepare a road-map and plan for Phase II.
- The integration of this front-end of the system will serve as an important step in proving the feasibility of interfacing the spoken language interface with real-time emotion detection and would be critical to the strategy for the dialog management for this tutoring domain.
- Speaktomi will implement the front end of the speech-based ITS within the reference architecture discussed previously and based on a configuration of modular components functioning as software agents that adhere to a software framework called the Open Agent Architecture (OAA)15.
15The SRI Open Agent Architecture (OAA) is a framework for integrating the various components that comprise a spoken dialogue system such as the FASTER ITS. Specifically it is a piece of middleware that supports C++, Java, Lisp and Prolog and enables one to rapidly prototype components into a system.
- Research Plan: The front end for the ITS will be implemented by creating software agents for each of the Speech Recognition, Emotion Detector and Natural Language software modules. We will employ the SRI EduSpeak engine as the speech recognition module. The emotion detector will be the software application described under Objective 1. For the Natural Language module we will use the CARMEL Workbench for language understanding.
- Speech Recognition Agent
- The OAA-based speech recognition agent will be created by writing a software wrapper for the SRI EduSpeak speech recognition engine. This engine incorporates specific features required for education and tutoring applications, such as pronunciation grading and a broad array of interfaces to multimedia development tools and languages: Director, Authorware, Flash, ActiveX, Java and C/C++. The EduSpeak SR engine works for adult and child voices as well as native and non-native speakers. The key performance enablers of this SR engine are high speech recognition accuracy, speaker-independent recognition, no requirement for user training, a small, scalable footprint dependent on vocabulary requirements, support for unlimited-size dynamically loadable grammars, and support for statistical language models (SLMs). This last feature is important because SLMs can be exploited to provide the broadest speech recognition dialog coverage. Optionally, if dialogs are written as finite state grammars, the UNIANCE compiler [Bos16] can be used to add a general semantic component to the Grammar Specification Language (GSL) before the grammar is compiled to the finite state machine needed for the language model. In this way, the speech recognition engine can provide an output that is a syntactic or semantic representation of the student's utterance and can be used directly with the dialog manager.
16 Bos, J., Compilation of Unification Grammars with Compositional Semantics to Speech Recognition Packages, COLING 2002. Proceedings of the 19th International Conference on Computational Linguistics, 106-112. Taipei, Taiwan.
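As a rough illustration of why a statistical language model gives broader coverage than a finite state grammar, the following minimal sketch trains an add-one smoothed bigram model on a tiny, invented tutoring corpus. It is not EduSpeak or SRILM code. The function names (`train_bigram`, `log_prob`) and the corpus sentences are assumptions made for this example only.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)  # includes the padding symbols
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Smoothing keeps unseen word pairs at a small but nonzero probability.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


# Hypothetical dialog training corpus (invented for illustration).
corpus = [
    "the force is gravity",
    "gravity pulls the ball down",
    "the ball falls down",
]
unigrams, bigrams = train_bigram(corpus)

seen = log_prob("the ball falls down", unigrams, bigrams)
novel = log_prob("the ball is gravity", unigrams, bigrams)      # never seen as a whole
scrambled = log_prob("down falls ball the", unigrams, bigrams)  # implausible word order
```

A finite state grammar built from the same three sentences would reject the novel utterance outright. The smoothed model still scores it, and ranks the scrambled word order lowest.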
- The research methodology and representative sub-tasks required to build and test the speech recognition interface and associated grammar development for the ITS are to:
- Acquire speech recognition engine license from SRI.
- Identify syntactical constructs of GSL and the elements in the ITS dialog design that drive the grammar.
- Build statistical grammars so as to improve accuracy and compatibility with the dialog manager; create statistical language models (SLMs) from a dialog training corpus using SRILM (publicly available from SRI).
- Train SLM using SLM.EXE or other tools.
- Compare speech recognition and dialog performance using finite state and statistical language models:
- Build data sets for standalone speech recognition tests.
- [The option of investigating the use of the UNIANCE compiler with the EduSpeak speech engine for providing a semantic output representation is not required in Phase I, but is a valuable option during Phase II when interfacing with the dialog manager.]
- Write wrapper to transform engine to OAA agent.
- Configure agent to be part of the community of OAA agents.
- Once configured and debugged, carry out in-system tests to calculate sentence-level accuracy using the standard US NIST FOM metric: % Correct=[H/N]×100% and Accuracy=[(H−I)/N]×100%, where H=# of correct labels, S=# of substitutions, I=# of insertions and N=# of labels. Other speech recognition error metrics include: Sentence Error=[(N−H)/N]×100%, where N is the number of sentences and H is the number of sentences with totally correct transcriptions or with totally correct semantic interpretations; and Word Error=[(Insertions+Deletions+Substitutions)/NumWords]×100%.
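The scoring formulas in the last step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the NIST scoring tool. The deletion count D and the relation H = N − S − D are not given in the text and are assumed here. Sentence error is taken as the share of sentences that are not fully correct. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlignmentCounts:
    """Counts from aligning recognizer output against a reference transcript."""
    n_labels: int       # N: number of labels (words) in the reference
    substitutions: int  # S
    deletions: int      # D (assumed; the text does not define it)
    insertions: int     # I

    @property
    def hits(self) -> int:
        # H: reference labels that were neither substituted nor deleted (assumption).
        return self.n_labels - self.substitutions - self.deletions


def percent_correct(c: AlignmentCounts) -> float:
    """% Correct = H / N x 100."""
    return 100.0 * c.hits / c.n_labels


def accuracy(c: AlignmentCounts) -> float:
    """Accuracy = (H - I) / N x 100; insertions are penalized."""
    return 100.0 * (c.hits - c.insertions) / c.n_labels


def word_error(c: AlignmentCounts) -> float:
    """Word Error = (I + D + S) / NumWords x 100."""
    return 100.0 * (c.insertions + c.deletions + c.substitutions) / c.n_labels


def sentence_error(n_sentences: int, n_fully_correct: int) -> float:
    """Share of sentences that are not transcribed (or interpreted) fully correctly."""
    return 100.0 * (n_sentences - n_fully_correct) / n_sentences


# Example: a 20-word reference with 2 substitutions, 1 deletion and 1 insertion.
counts = AlignmentCounts(n_labels=20, substitutions=2, deletions=1, insertions=1)
```

For the example counts, % Correct is 85, Accuracy is 80 and Word Error is 20. Under the assumed definition of H, Accuracy and Word Error always sum to 100.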
- Emotion Detector Agent
- We will implement the algorithm of the emotional state detector developed in Objective 1 in C++ software code. Within the scope of Objective 1, the functionality of this agent will be tested as a standalone unit. Assuming that the standalone implementation meets the specifications and technical requirements, we will convert it to an agent running within the OAA environment and proceed to integrate it with the speech recognition agent and the natural language agent.
- Natural Language Agent
- We will source the software for the CARMEL deep language understanding framework and, following the same procedure used for the other modules, convert it to a software agent running within the OAA environment.
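The wrapper-and-community arrangement described for the three agents can be pictured with a toy broker. The sketch below is only loosely modeled on facilitator-style delegation and does not use the OAA libraries. Every class, function and capability name in it is hypothetical, and the handlers are stubs standing in for the real EduSpeak, emotion detector and CARMEL wrappers.

```python
from typing import Callable, Dict


class Facilitator:
    """Toy broker: agents register a capability name; callers request it by name."""

    def __init__(self) -> None:
        self._capabilities: Dict[str, Callable[[dict], dict]] = {}

    def register(self, capability: str, handler: Callable[[dict], dict]) -> None:
        self._capabilities[capability] = handler

    def solve(self, capability: str, request: dict) -> dict:
        if capability not in self._capabilities:
            raise LookupError(f"no agent offers {capability!r}")
        return self._capabilities[capability](request)


# Stub agents (placeholders, not real engine calls).
def recognize_speech(request: dict) -> dict:
    # A real wrapper would pass audio to the recognizer; here the text is supplied directly.
    return {"text": request["utterance"]}


def detect_emotion(request: dict) -> dict:
    # A real wrapper would run the prosody-based detector from Objective 1.
    return {"emotion": "neutral"}


def understand(request: dict) -> dict:
    # A real wrapper would call the language understanding module.
    return {"tokens": request["text"].split()}


def process_utterance(facilitator: Facilitator, utterance: str) -> dict:
    """Route one utterance through the three front-end agents."""
    text = facilitator.solve("recognize_speech", {"utterance": utterance})["text"]
    emotion = facilitator.solve("detect_emotion", {"utterance": utterance})["emotion"]
    tokens = facilitator.solve("understand", {"text": text})["tokens"]
    return {"text": text, "emotion": emotion, "tokens": tokens}


facilitator = Facilitator()
facilitator.register("recognize_speech", recognize_speech)
facilitator.register("detect_emotion", detect_emotion)
facilitator.register("understand", understand)
result = process_utterance(facilitator, "gravity pulls the ball down")
```

Because callers only name a capability, each stub could be replaced by a wrapped engine without changing `process_utterance`. That is the property that makes this style of middleware suited to rapid prototyping.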
- The following describes the steps that will be followed as we assemble and run the community of OAA agents for speech recognition, emotion detection and natural language functions:
- We expect that by fully achieving the goals in the objectives as described above, we will have laid a solid foundation going into Phase II. The final task will be to develop a plan and road map for the full implementation of the architecture for an intelligent tutoring system having the specifications and requirements described in this proposal.
- Part 5. Commercial Potential
- The Problem
- Demand for knowledge sharing and learning in the U.S. has increased due to several factors including:
- Competition from an increasingly skilled global workforce.
- Virtualization and outsourcing of highly skilled projects and services to more cost-competitive (i.e., lower-cost) human resources in areas outside the U.S.
- Increased technological complexity in the workplace
- Greater collaboration between businesses and their partners requires that increased knowledge and learning be brought to not only an internal audience but to external audiences as well.
- The acceptance of e-Learning systems which have been effective in providing training but have not yet achieved the improvement in performance provided by human tutors.
- A student can access the training at a time convenient to him/her.
- There are some key problems that must be solved before computer-based learning systems are fully accepted and able to penetrate the training and tutoring market. The first and most important is to provide a more user-friendly way to access this training and learning content. Currently most learning systems have content developed and deployed with very little interactivity. Speaktomi aims to enhance this user-system interaction so that it is more intuitive for less experienced workers.
- The Opportunity
- Speaktomi's unique technology is critical to the next stage of e-Learning and computer-based training tools. The leaders in the e-Learning provider market, such as IBM, Docent, WBT and Saba Software, are seeing increasing traction in this space, mostly through their deployment of Learning Management Systems (LMS), which store learning content. The next wave of innovation in the space is improving the process of content creation and improving the ease and effectiveness of student interaction. The critical need in improving content is to provide the right kinds of tools for building learning environments that are easier to deploy and easier to use. Speaktomi's technology, by supporting voice interaction by the student with the e-Learning content, provides the critical ease-of-use platform that e-Learning tools developers need to make their systems more user-friendly and easier to interact with. As content creators become able to gauge student understanding and concern more intuitively through a voice interface, their conceptual workload in creating engaging content will ease, because they will not have to build exhaustive cases to gauge user feedback on content that has been presented.
- Speaktomi's platform for improving user interaction will allow educational content tool developers such as Macromedia to offer a wider array of modes of interaction to content developers and reduce the cost of creating engaging content which is one of the major concerns in the emerging e-Learning space. Gartner19 has found that 74% of organizations that create content for e-Learning are spending a greater amount on content creation than before and 37% believe that the cost of delivering the content is greater than their traditional methods. Improvement in tools and effectiveness of e-Learning results are critical to continue to drive the market. Speaktomi will provide the core technology for human interaction to make this possible.
19 “Academic E-Learning Must Confront Content Development Costs”, Gartner study, April 2003
- The Market
- The e-Learning market is poised for explosive growth through 2005. The global e-Learning market was projected to grow to approximately $4.2 billion. By 2005, it will hit approximately $33.6 billion. e-Learning is still a relatively small part of the worldwide training market (estimated at more than $100 billion), but by the middle of this decade, it will make up almost one-third of all training deployed. Larger numbers of enterprises recognize that e-Learning is an obvious benefit in their technology infrastructure. Just as most e-mail projects were never cost justified, e-Learning will become a standard way of deploying knowledge transfer programs20.
20 “E-Learning in 2002: Growth, Mergers, Mainstream Adoption”, Gartner study December 2001
- The reality of explosive growth in e-Learning within companies has put more pressure on them to provide content that is more accessible to a wider array of their staff members. Some 63 percent of all training in corporations from external providers (non-e-Learning) is for new software applications that are critical for job functions. It is imperative that e-Learning tools and software developers provide a mechanism to allow better interaction with class participants. It is this market for improved tools and software that Speaktomi will seek to penetrate. Much as the current crop of speech recognition systems has been used as a platform for developing a broad array of customer service applications over the phone, Speaktomi will provide a platform to software and tools developers so that they have the technology to provide voice-enabled learning for their e-Learning software products.
- Speaktomi seeks not only to provide embedded technology to the corporate training software providers, but eventually also to provide this technology for e-Learning in the U.S. education and training market, which had an overall market size of $772 billion in 2000 and a growth rate of over 9%. While speech technology may be considered a small component of a training solution, it is a critical user interface and interaction component that is extremely valuable. The size of the addressable market for Speaktomi is conservatively estimated to be $90 million, and $2.3 billion for the wider educational market. Clearly there is a substantial opportunity for a company focused on speech recognition and intelligent learning in both the corporate and wider educational e-Learning business.
- The Product
- The focus of our investigation is to implement an ITS architecture which addresses the issue of student understanding, so as to raise the level of performance by 1 to 2 standard deviation units. This level of tutorial performance would allow our system to be adopted by more users and to be used more effectively in the e-learning market. Additionally, by interfacing our system to Macromedia's widely-used authoring tools—Authorware and Director—our spoken language ITS will provide a direct and effective mechanism whereby the technology could be rapidly adopted by the existing educational customer base. Most importantly, this programming interface will allow legacy educational content to be accessed by the ITS; and in the future be extended to other commercial educational platforms and tools. The resulting benefits that would accrue include the features of an advanced intelligent training system that significantly raise the students' performance.
- Competition
- The market for e-Learning software is just evolving and is currently led by six major companies: Docent and Click2learn (now SumTotal), Saba Software, IBM, Pathlore and WBT Systems. Other companies include Sun Microsystems, IBM, Siebel, SAP, PeopleSoft, KnowledgePlanet, THINQ and Plateau. Currently these providers have focused on the software for creating and managing learning content rather than on featured technologies for improving the experience of students in a training environment. The main competitive thrust will come from companies already entrenched in embedded speech technology for telephony. The leaders in this space are Nuance Communications, SpeechWorks/ScanSoft and IBM, along with 5-10 others. It is quite likely that some of these companies will offer competitive voice-enabled e-Learning products. Our proposed technology will substantially differentiate us in tutoring and learning environments, where the interactive process with students is extremely important. It is also highly likely that these companies will license our technology for deployment in telephony applications, thus providing another channel for Speaktomi to sell products.
- Business Model
- The business model for marketing and selling Speaktomi's technology will be based on the following:
- Focus on generating revenue through technology licensing
- The development of an interface with Macromedia.
- Start by winning customer acceptance and build market penetration within the existing corporate/education/eLearning markets.
- Replicate strategy and build relationships with other key software vendors, schools, training institutions such as Kaplan, Educational Testing Service (ETS), Thomson and other training companies—key business strategy will be licensing.
- Collaborate with key technical partners such as SRI International and CHI Systems to leverage advanced technology for driving the development of sophisticated interactive training systems.
- Our focus on a license-based, embedded technology is important as our first point of entry. As the company grows, Speaktomi will focus on enhancing its relationship with content creators and providing services to these creators so that they may better use speech technologies in their learning/tutoring applications. In the long run, Speaktomi intends to maintain core competence in automated speech learning environments, providing core technology, consulting services and eventually outsourced speech-enabled courseware development. The initial plan for Speaktomi is to work with providers of e-Learning educational authoring tools to integrate its technology into their platforms. License revenue will be targeted at between 2% and 5% of the ASP of the finished product or tool (client-based tools and support, and server-based products for the corporate environment), with competitive pricing.
-
- American Society for Trainers and Development (ASTD), A Vision for E-Learning for America's Workforce, referencing Moe, Michael, and Henry Blodgett, The Knowledge Web, Merrill Lynch & Co., Global Securities Research & Economics Group, 2000.
- Ang J., Dhillon R., Krupski A., Shriberg E., and Stolcke A., Prosody-Based Automatic Detection of Annoyance and Frustration in Human-Computer Dialog, ICSLP-2002, Denver, Colo., USA, September 2002
- Ang J., Prosodic Cues For Emotion Recognition In Communicator Dialogs, M.S. Thesis, University of California at Berkeley, December 2002.
- Bahl, L. R., Jeninek, F., Mercer, R. L., A maximum likelihood approach to continuous speech recognition, IEEE Trans. Pattern Anal. Mach. Intell., PAMI-5: 179-190, 1983.
- Baker, Collin F., Fillmore, Charles J., and Lowe, John B. (1998): The Berkeley FrameNet project. In Proceedings of the COLING-ACL, Montreal, Canada.
- Baker, J. H., The Dragon system—An Overview, IEEE Trans. on ASSP Proc. ASSP-23(1): 24-29, February 1975.
- Baum L. E., An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes, Inequalities 3: 1- 8, 1972.
- Baum, L. E., Petrie, T., Statistical inference for probabilistic functions for finite state Markov chains, Ann. Math. Stat., 37: 1554-1563, 1966.
- Bennett, C. et al, Building VoiceXML-based Applications, ICSLP-2002 Proceedings, 7th International Conference On Spoken Language Processing, September 2002, Denver, Colo., USA.
- Bennett, C., Font Llitjos, A., Shriver, S., Rudnicky, A., Black, A., Building VoiceXML-Based Applications, 7th International Conference On Spoken Language Processing, September 2002, Denver, Colo., USA.
- Boyce, S. (2000). Natural Spoken Dialogue Systems for Telephony Applications. Communications of the ACM., Vol. 43, No. 9, pp. 29-34.
- Business Week On-line, Web Training Explodes, May 22, 2000
- Chi, M. T. H., Slofta, J. D., & de Leeuw, N. (1994). From things to processes: A theory of conceptual change for learning science concepts, Learning and Instruction, 4, 27-43.
- Classroom Lessons: Integrating Cognitive Theory and Classroom Practice (pp. 51-74). Cambridge: MIT
- Collins M., Three generative, lexicalised models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, Madrid, Spain, July 1997.
- de Kleer, J. & Brown, J. S. (1984), A qualititative physics based on confluences, Artificial Intelligence, 24, 7-83.
- Docio-Ferandez, L., Garcia-Mateo, C., Distributed Speech Recognition Over IP Networks on the Aurora 3 Database, ICSLP-2002 Proceedings, 7th International Conference On Spoken Language Processing, September 2002, Denver, Colo., USA.
- Education, 1, 205-221.
- Ferandez and Garcia-Mateo, Distributed Speech Recognition over IP networks on the Aurora 3 Database, ICSLP-2002 Proceedings, Denver, Colo., USA.
- Ferguson, J. D., Hidden Markov Analysis: An Introduction, in Hidden Markov Models for Speech, Institute of Defense Analyses, Princeton, N.J. 1980.
- Fillmore, C. J. 1971. ‘Some problems for case grammar’. In: O'Brien, R. J. (ed.) Report of the 22nd Annual Round Table Meeting on Linguistics and Language Studies. Washington: Georgetown UP. 35-56.
- Fillmore, Charles J. (1976): Frame semantics and the nature of language; Annals of the New York Academy of Sciences: Conference on the Origin and Development of Language and Speech, Volume 280 (pp. 20-32).
- Finscheidt, T., Aalburg, S., Stan, S., Beaugeant, C., Network-Based vs.Distributed Speech Recognition in Adaptive Multi-Rate Wireless Systems, ICSLP-2002 Proceedings, 7th International Conference On Spoken Language Processing, September 2002, Denver, Colo., USA.
- Forbes, Master of the Knowledge Universe, Sep. 10, 2001
- Forbes, Special E-Learning Section, referencing Corporate E-Learning: Exploring a New Frontier by Hambrecht, W.R. & Company, March 2000.
- FrameNet: Theory and Practice. Christopher R. Johnson et al, http://www.icsi.berkeley.edu/˜framenet/book/book.html
- FRG: Institute for Science Education.
- Gildea D and Jurafsky D., 2002. Automatic Labeling of Semantic Roles. Computational Linguistics 28:3, 245-288.
- Graesser, A., Wiemer-Hastings, K., Wiemer-Hastings, P., Kreuz, R., & the Tutoring Research Group (2000). AutoTutor: A simulation of a human tutor, Journal of Cognitive Systems Research, 1,35-51.
- Grosz, B., and Sidner, C., Attention, intention and the structure of discourse, Computational Linguistics, 12(3), 1986.
- Guinn, C., & Montoya, R. (1997), Natural Language Processing in Virtual Reality Training Environments, Proceedings of the 19th Interservice/Industry Training Systems and Education Conference (I/ITSEC '97), Orlando, Fla.
- Guinn, C., & Montoya, R. (1997). Natural Language Processing in Virtual Reality Training Environments, Proceedings of the 19th Interservice/Industry Training Systems and Education Conference (I/ITSEC '97), Orlando, Fla.
- Hake, R. R. (under review), Interactive-engagement vs. traditional methods: A six-thousand student survey of mechanics test data for introductory physics courses.
- Halloun, I. A. & Hestenes, D. (1985), Common sense concepts about motion, American Journal of Physics, 53(11), 1056-1065.
- Hambrecht, W. R. & Company, A Vision for E-Learning for America's Workforce, American Society for Trainers and Development (ASTD), referencing Corporate E-Learning: Exploring a New Frontier, 2000.
- Henton, C. (2002), Fiction and reality of TTS, Speech Technology Magazine, January-February, pp. 36-39.
- Hestenes, D., Wells, M., & Swackhamer, G. (1992), Force concept inventory, Physics Teacher, 30, 141—
- Holzmann, G. J., Design and Validation of Computer Protocols, Prentice Hall, New Jersey, 1991, ISBN 0-13-539925-4.
- Hunt, E. & Minstrell, J. (1994), A cognitive approach to the teaching of physics, In K. McGilly (Ed.), In K. McGilly (Ed.), Classroom Lessons: Integrating Cognitive Theory and Classroom Practice (pp. 51-74). Cambridge: MIT Press.
- Jeninek, F., et al, Continuous Speech Recognition: Statistical methods in Handbook of Statistics, II, P. R. Kristnaiad, Ed. Amsterdam, The Netherlands, North-Holland, 1982.
- Johnston, M., Bangalore, S., Stent, A., Vasireddy, G., Ehlen, P., Multimodal Language Processing for Mobile Information Access, ICSLP-2002 Proceedings, 7th International Conference On Spoken Language Processing, September 2002, Denver, Colo., USA.
- Jordan, P. , Makatchev, M., and VanLehn, K., 2003. Abductive Theorem Proving for Analyzing Student Explanations. In Proceedings of Artificial Intelligence in Education Conference.
- Karat, C., Halverson, C., and Karat, J. (1999), Patterns of Entry and Correction in Large Vocabulary Continuous Speech Recognition Systems. Proceedings of CHI'99: Human Factors in Computing Systems, New York, N.Y., May 15-20, pp. 568-575.
- Lea, W. A. (ed.), Trends in speech recognition, Englewood Cliffs, N.J., Prentice Hall, 1980.
- Litman D. and Forbes-Riley K., Predicting Student Emotions in Computer-Human Tutoring Dialogues. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), Barcelona, Spain, July 2004.
- Litman D. and Silliman S., ITSPOKE: An Intelligent Tutoring Spoken Dialogue System. In Proceedings of the Human Language Technology Conference: 4th Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL) (Companion Proceedings), Boston, Mass., May 2004.
- Litman D., and Forbes K., Recognizing Emotions from Student Speech in Tutoring Dialogues. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), St. Thomas, Virgin Islands, November-December, 2003
- Litman, D., and Allen, J. F., A plan recognition model for sub dialogues in conversation, Cognitive Science, 11(2): 163-200.
- Macho, D., et al, Evaluation of a Noise-Robust DSR Front-End on Aurora Databases, ICSLP-2002 Proceedings, 7th International Conference On Spoken Language Processing, Sept. 2002, Denver, Colo., USA.
- Maeireizo B., Litman D., and Hwa R., Co-training for Predicting Emotions with Spoken Dialogue Data. In Companion Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), Barcelona, Spain, July 2004
- Martinovic, Miroslav (2002) Integrating statistical and linguistic approaches in building intelligent question answering systems. A presentation at the International Conference on Advances in Infrastructure for 3-busines, e-Education, e-Science, and e-Medicine on the Internet, SSGRR 2002W
- Mazur, E. (1993). Peer Instruction: A User's Manual, Cambridge, Mass.: Harvard University Press
- McCloskey, M., Caramazza, A., & Green, B. (1980), Curvilinear motion in the absence of external forces: Naive beliefs about the motion of objects. Science, 210(5), 1139-1141
- Meng, H., et al, ISIS: A Multi-Modal, Trilingual, Distributed Spoken Dialog System developed with CORBA, Java, XML and KQML, ICSLP 2002 Proceedings, Denver, Colo., USA.
- Mindlever.com, Market Trends and E-Learning, a white paper referencing IDC.
- Morgan, N., Bourlard, H., Renals, S., Cohen, M., and Franco, H., (1993), Hybrid Neural Network/Hidden Markov Model Systems for Continuous Speech Recognition, Journal of Pattern Recognition and Artificial Intelligence, Vol. 7, No. 4 pp. 899-916.
- Ortiz, C. L. and Grosz, B., Interpreting Information Requests in Context: A Collaborative Web Interface for Distance Learning. To appear, Autonomous Agents and Multi-Agent Systems, 2002.
- Pfundt, H. & Duit, R. (1991). Bibliography: Students' Alternative Frameworks and Science Education, Kiel,
- Ploetzner, R. & VanLehn, K. (1997). The acquisition of informal physics knowledge during formal physics
- Poison, M., & Richardson, J. (Eds.) (1988). Foundations of intelligent tutoring systems. Hillsdale, N.J.: Erlbaum Press.
- Profit Magazine, The E-Learning Curve, referencing Information Week, quoting IDC and W.R. Hambrecht, May 2001
- Rabiner, H. R., and Juang, B. H., Fundamentals of Speech Recognition, Prentice Hall, 1993.
- Rabiner, H. R., Digital Processing of Speech Signals, Prentice Hall, 1978.
- Rosé C., Litman D., Bhembe D., Silliman S., Srivastava R., and Van Lehn K., A Comparison of Tutor and Student Behavior in Speech versus Text-Based Tutoring, Proceedings of the HLT/NAACL Workshop: Building Educational Applications Using NLP, June, 2003.
- Ryder, J, Santarelli, T, Scolaro, J., Hicinbothom, J., & Zachary, W. (2000). Comparison of cognitive model uses in intelligent training systems. In Proceedings of IEA2000/HFES2000 (pp. 2-374 to 2-377). Santa Monica, Calif.: Human Factors Society.
- Ryder, J. M., Graesser, A. C., McNamara, J., Karnavat, A., & Popp, E. (2002). A dialog-based intelligent tutoring system for practicing command reasoning skills. Proceedings of the 2002 Interservice/Industry Training Simulation, and Education Conference [CD-ROM]. Arlington, Va.: National Defense Industrial Association.
- SALT Forum at: http://www.saltforum.org
- Shneiderman, B. (2000), The Limits of Speech Recognition. Communications of the ACM., Vol. 43, No. 9, pp. 63-65.
- Shriberg E. and Stolcke A., Direct Modeling of Prosody: An Overview of Applications in Automatic Speech Processing, Proc. International Conference on Speech Prosody, Nara, Japan, March 2004
- Shriberg E., Stolcke A., Prosody Modeling for Automatic Speech Recognition and Understanding, Mathematical Foundations of Speech and Language Modeling, M. Johnson, M. Ostendorf, S. Khudanpur, R. Rosenfeld (eds.), Volume 138 in IMA Volumes in Mathematics and its Applications, pp. 105-114, Springer-Verlag.
- Shute, V. J., & Psotka, J. (1995). Intelligent tutoring systems: Past, present, and future. In D. Janassen (Ed.), Handbook of Research on Educational Communications and Technology, Scholastic Publications.
- Slotta, J. D., Chi, M. T. H., & Joram, E. (1995), Assessing students' misclassifications of physics concepts: An ontological basis for conceptual change, Cognition and Instruction, 13(3), 373-400.
- Smith, J. P., diSessa, A. A., & Roschelle, J. (1993), Misconceptions reconceived: A constructivist analysis of knowledge in transition, Journal of the Learning Sciences, 2(2), 115-164.
- Speech technology and natural language research at the Microsoft Corp. Redmond and Shanghai Research Laboratories applied to the Pocket PC, MiPad and other mobile appliances, see http://www.microsoft.com/research/speech
- Steven Abney, Partial Parsing via Finite-State Cascades. J. of Natural Language Engineering, 2(4): 337-344. 1996.
- The VoiceXML Forum at http://www.voicexml.org
- Thomas K. Landauer, Peter W. Foltz, and Darrell Laham. An introduction to latent semantic analysis,.Discourse Processes, 25:259-284, 1998.
- Training, Cognition and Instruction, 15(2), 169-206.
- Tversky, A. & Kahneman, D. (1974), Judgments under uncertainty: Heuristics and biases, Science, 185,
- U.S. Bancorp Piper Jaffray, 20001
- Van Valin R. (ed.). 1993. Advances in role and reference grammar. Amsterdam John Benjamins.P166 A34
- Viennot, L. (1979), Spontaneous reasoning in elementary dynamics, European Journal of Science
- Walker, M., et al., DARPA Communicator Evaluation: Progress from 2000 to 2001, ICSLP-2002 Proceedings, 7th International Conference On Spoken Language Processing, September 2002, Denver, Colo., USA.
- Walker, M., et al., DARPA Communicator: Cross-System Results for the 2001 Evaluation, ICSLP-2002 Proceedings, 7th International Conference On Spoken Language Processing, Sept. 2002, Denver, Colo., USA.
- Wang, K., SALT: A Spoken Language Interface for Web-Based Multimodal Dialog Systems, 7th International Conference On Spoken Language Processing, September 2002, Denver, Colo., USA.
- Weld, D. & de Kleer, J. (1990), Readings in Qualitative Reasoning about Physical Systems, Menlo Park, Calif.: Morgan Kaufmann.
- Wenger, E. (1987). Artificial intelligence and tutoring systems, Los Altos, Morgan Kaufmann, 1987.
- WordNet, A Lexical Database for English. Cognitive Science Laboratory, Princeton University. http://www.cogsci.princeton.edu/˜wn/.
- Young, S. J., and C E Proctor, C. E., The design and implementation of dialogue control in voice operated database inquiry systems, Computer Speech and Language, vol. 3, no. 4, pp. 329-353,1989.
- Zachary, W. Santarelli, T., Lyons, D., Bergondy, M. and Johnston, J. (2001). Using a Community of Intelligent Synthetic Entities to Support Operational Team Training. In Proceedings of the Tenth Conference on Computer Generated Forces and Behavioral Representation. Orlando: Institute for Simulation and Training.
- Zachary, W., Le Mentec, J-C., & Ryder, J. (1996). Interface agents in complex systems. In C. Ntuen & E. H. Park (Eds.), Human interaction with complex systems: Conceptual Principles and Design Practice. Norwell, Mass.: Kluwer Academic Publishers.
- Zachary, W. W., Ryder, J. M., & Hicinbothom, J. H. (2000). Building cognitive task analyses and models of a decision-making team in a complex real-time environment. In J. M. Schraagen, S. F. Chipman, & V. L. Shalin (Eds.), Cognitive Task Analysis. Mahwah, N.J.: Erlbaum.
- Zachary, W. W., Ryder, J. M., Ross, L., & Weiland, M. Z. (1992). Intelligent computer-human interaction in real-time multi-tasking process control and monitoring systems. In M. Helander and M. Nagamachi (Eds.), Design for Manufacturability. New York: Taylor and Francis.
- Zue, V., Seneff, S., Glass, J. R., Polifroni, J., Pao, C., Hazen, T. J., and Hetherington, L., Jupiter: A telephone-based conversational interface for weather information, IEEE Trans Acoustics, Speech and Signal Processing, vol. 8, no. 1, pp. 85-96, 2000.
- While the preferred embodiment is directed specifically to integrating the prosody analyzer with embodiments of a NLQS system of the type noted above, it will be understood that it could be incorporated within a variety of statistical based NLQS systems. Furthermore, the present invention can be used in both shallow and deep type semantic processing systems of the type noted in the incorporated patents. The microcode and software routines executed to effectuate the inventive methods may be embodied in various forms, including in a permanent magnetic media, a non-volatile ROM, a CD-ROM, or any other suitable machine-readable format. Accordingly, it is intended that all such alterations and modifications be included within the scope and spirit of the invention as defined by the following claims.
Claims (34)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/294,918 US20060122834A1 (en) | 2004-12-03 | 2005-12-05 | Emotion detection device & method for use in distributed systems |
PCT/US2006/061558 WO2007067878A2 (en) | 2005-12-05 | 2006-12-04 | Emotion detection device & method for use in distributed systems |
US12/579,233 US8214214B2 (en) | 2004-12-03 | 2009-10-14 | Emotion detection device and method for use in distributed systems |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US63323904P | 2004-12-03 | 2004-12-03 | |
US11/294,918 US20060122834A1 (en) | 2004-12-03 | 2005-12-05 | Emotion detection device & method for use in distributed systems |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/579,233 Continuation-In-Part US8214214B2 (en) | 2004-12-03 | 2009-10-14 | Emotion detection device and method for use in distributed systems |
Publications (1)
Publication Number | Publication Date |
---|---|
US20060122834A1 (en) | 2006-06-08 |
Family
ID=38123599
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/294,918 Abandoned US20060122834A1 (en) | 2004-12-03 | 2005-12-05 | Emotion detection device & method for use in distributed systems |
Country Status (2)
Country | Link |
---|---|
US (1) | US20060122834A1 (en) |
WO (1) | WO2007067878A2 (en) |
Cited By (363)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050187932A1 (en) * | 2004-02-20 | 2005-08-25 | International Business Machines Corporation | Expression extraction device, expression extraction method, and recording medium |
US20050198597A1 (en) * | 2004-03-08 | 2005-09-08 | Yunshan Zhu | Method and apparatus for performing generator-based verification |
US20050261905A1 (en) * | 2004-05-21 | 2005-11-24 | Samsung Electronics Co., Ltd. | Method and apparatus for generating dialog prosody structure, and speech synthesis method and system employing the same |
US20060129403A1 (en) * | 2004-12-13 | 2006-06-15 | Delta Electronics, Inc. | Method and device for speech synthesizing and dialogue system thereof |
US20060136207A1 (en) * | 2004-12-21 | 2006-06-22 | Electronics And Telecommunications Research Institute | Two stage utterance verification device and method thereof in speech recognition system |
US20070033050A1 (en) * | 2005-08-05 | 2007-02-08 | Yasuharu Asano | Information processing apparatus and method, and program |
US20070055526A1 (en) * | 2005-08-25 | 2007-03-08 | International Business Machines Corporation | Method, apparatus and computer program product providing prosodic-categorical enhancement to phrase-spliced text-to-speech synthesis |
US20070061139A1 (en) * | 2005-09-14 | 2007-03-15 | Delta Electronics, Inc. | Interactive speech correcting method |
US20070150281A1 (en) * | 2005-12-22 | 2007-06-28 | Hoff Todd M | Method and system for utilizing emotion to search content |
US20070206017A1 (en) * | 2005-06-02 | 2007-09-06 | University Of Southern California | Mapping Attitudes to Movements Based on Cultural Norms |
US20070208569A1 (en) * | 2006-03-03 | 2007-09-06 | Balan Subramanian | Communicating across voice and text channels with emotion preservation |
US20070225975A1 (en) * | 2006-03-27 | 2007-09-27 | Kabushiki Kaisha Toshiba | Apparatus, method, and computer program product for processing voice in speech |
US20070271098A1 (en) * | 2006-05-18 | 2007-11-22 | International Business Machines Corporation | Method and apparatus for recognizing and reacting to user personality in accordance with speech recognition system |
US20070276659A1 (en) * | 2006-05-25 | 2007-11-29 | Keiichi Yamada | Apparatus and method for identifying prosody and apparatus and method for recognizing speech |
US20080033994A1 (en) * | 2006-08-07 | 2008-02-07 | Mci, Llc | Interactive voice controlled project management system |
US20080050014A1 (en) * | 2006-08-22 | 2008-02-28 | Gary Bradski | Training and using classification components on multiple processing units |
US20080052080A1 (en) * | 2005-11-30 | 2008-02-28 | University Of Southern California | Emotion Recognition System |
WO2008033095A1 (en) * | 2006-09-15 | 2008-03-20 | Agency For Science, Technology And Research | Apparatus and method for speech utterance verification |
WO2008092473A1 (en) * | 2007-01-31 | 2008-08-07 | Telecom Italia S.P.A. | Customizable method and system for emotional recognition |
US20080269958A1 (en) * | 2007-04-26 | 2008-10-30 | Ford Global Technologies, Llc | Emotive advisory system and method |
US20080270133A1 (en) * | 2007-04-24 | 2008-10-30 | Microsoft Corporation | Speech model refinement with transcription error detection |
US20090003549A1 (en) * | 2007-06-29 | 2009-01-01 | Henry Baird | Methods and Apparatus for Defending Against Telephone-Based Robotic Attacks Using Permutation of an IVR Menu |
US20090003539A1 (en) * | 2007-06-29 | 2009-01-01 | Henry Baird | Methods and Apparatus for Defending Against Telephone-Based Robotic Attacks Using Random Personal Codes |
US20090003548A1 (en) * | 2007-06-29 | 2009-01-01 | Henry Baird | Methods and Apparatus for Defending Against Telephone-Based Robotic Attacks Using Contextual-Based Degradation |
US20090004633A1 (en) * | 2007-06-29 | 2009-01-01 | Alelo, Inc. | Interactive language pronunciation teaching |
US20090055175A1 (en) * | 2007-08-22 | 2009-02-26 | Terrell Ii James Richard | Continuous speech transcription performance indication |
US20090113297A1 (en) * | 2007-10-24 | 2009-04-30 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Requesting a second content based on a user's reaction to a first content |
US20090112696A1 (en) * | 2007-10-24 | 2009-04-30 | Jung Edward K Y | Method of space-available advertising in a mobile device |
US20090112656A1 (en) * | 2007-10-24 | 2009-04-30 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Returning a personalized advertisement |
US20090112694A1 (en) * | 2007-10-24 | 2009-04-30 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Targeted-advertising based on a sensed physiological response by a person to a general advertisement |
US20090112713A1 (en) * | 2007-10-24 | 2009-04-30 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Opportunity advertising in a mobile device |
US20090112693A1 (en) * | 2007-10-24 | 2009-04-30 | Jung Edward K Y | Providing personalized advertising |
US20090138544A1 (en) * | 2006-11-22 | 2009-05-28 | Rainer Wegenkittl | Method and System for Dynamic Image Processing |
US20090177300A1 (en) * | 2008-01-03 | 2009-07-09 | Apple Inc. | Methods and apparatus for altering audio output signals |
US20090192964A1 (en) * | 2008-01-30 | 2009-07-30 | Aptima, Inc. | System and method for comparing system features |
US20090222305A1 (en) * | 2008-03-03 | 2009-09-03 | Berg Jr Charles John | Shopper Communication with Scaled Emotional State |
US20090228796A1 (en) * | 2008-03-05 | 2009-09-10 | Sony Corporation | Method and device for personalizing a multimedia application |
US20090248372A1 (en) * | 2008-03-25 | 2009-10-01 | Electronics And Telecommunications Research Institute | Method of modeling composite emotion in multidimensional vector space |
US20090287678A1 (en) * | 2008-05-14 | 2009-11-19 | International Business Machines Corporation | System and method for providing answers to questions |
US20090306979A1 (en) * | 2008-06-10 | 2009-12-10 | Peeyush Jaiswal | Data processing system for autonomously building speech identification and tagging data |
US20090313018A1 (en) * | 2008-06-17 | 2009-12-17 | Yoav Degani | Speaker Characterization Through Speech Analysis |
US20090313019A1 (en) * | 2006-06-23 | 2009-12-17 | Yumiko Kato | Emotion recognition apparatus |
US20100030714A1 (en) * | 2007-01-31 | 2010-02-04 | Gianmario Bollano | Method and system to improve automated emotional recognition |
US20100049525A1 (en) * | 2008-08-22 | 2010-02-25 | Yap, Inc. | Methods, apparatuses, and systems for providing timely user cues pertaining to speech recognition |
US20100082329A1 (en) * | 2008-09-29 | 2010-04-01 | Apple Inc. | Systems and methods of detecting language and natural language strings for text to speech synthesis |
US20100100218A1 (en) * | 2006-10-09 | 2010-04-22 | Siemens Aktiengesellschaft | Method for Controlling and/or Regulating an Industrial Process |
US20100114556A1 (en) * | 2008-10-31 | 2010-05-06 | International Business Machines Corporation | Speech translation method and apparatus |
US7809663B1 (en) | 2006-05-22 | 2010-10-05 | Convergys Cmg Utah, Inc. | System and method for supporting the utilization of machine language |
US7912720B1 (en) * | 2005-07-20 | 2011-03-22 | At&T Intellectual Property Ii, L.P. | System and method for building emotional machines |
US20110110534A1 (en) * | 2009-11-12 | 2011-05-12 | Apple Inc. | Adjustable voice output based on device status |
US20110125734A1 (en) * | 2009-11-23 | 2011-05-26 | International Business Machines Corporation | Questions and answers generation |
US20110172992A1 (en) * | 2010-01-08 | 2011-07-14 | Electronics And Telecommunications Research Institute | Method for emotion communication between emotion signal sensing device and emotion service providing device |
US20110208522A1 (en) * | 2010-02-21 | 2011-08-25 | Nice Systems Ltd. | Method and apparatus for detection of sentiment in automated transcriptions |
US20110282666A1 (en) * | 2010-04-22 | 2011-11-17 | Fujitsu Limited | Utterance state detection device and utterance state detection method |
US20110307423A1 (en) * | 2010-06-09 | 2011-12-15 | Microsoft Corporation | Distributed decision tree training |
US20120072217A1 (en) * | 2010-09-17 | 2012-03-22 | At&T Intellectual Property I, L.P | System and method for using prosody for voice-enabled search |
WO2012058691A1 (en) * | 2010-10-31 | 2012-05-03 | Speech Morphing Systems, Inc. | Speech morphing communication system |
US8204751B1 (en) * | 2006-03-03 | 2012-06-19 | At&T Intellectual Property Ii, L.P. | Relevance recognition for a human machine dialog system contextual question answering based on a normalization of the length of the user input |
US20120166198A1 (en) * | 2010-12-22 | 2012-06-28 | Industrial Technology Research Institute | Controllable prosody re-estimation system and method and computer program product thereof |
US20120173464A1 (en) * | 2009-09-02 | 2012-07-05 | Gokhan Tur | Method and apparatus for exploiting human feedback in an intelligent automated assistant |
US20120239393A1 (en) * | 2008-06-13 | 2012-09-20 | International Business Machines Corporation | Multiple audio/video data stream simulation |
US20120243694A1 (en) * | 2011-03-21 | 2012-09-27 | The Intellisis Corporation | Systems and methods for segmenting and/or classifying an audio signal from transformed audio information |
US20120253807A1 (en) * | 2011-03-31 | 2012-10-04 | Fujitsu Limited | Speaker state detecting apparatus and speaker state detecting method |
US8289283B2 (en) | 2008-03-04 | 2012-10-16 | Apple Inc. | Language input interface on a device |
US8296383B2 (en) | 2008-10-02 | 2012-10-23 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8311838B2 (en) | 2010-01-13 | 2012-11-13 | Apple Inc. | Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts |
WO2012151786A1 (en) * | 2011-05-11 | 2012-11-15 | 北京航空航天大学 | Chinese voice emotion extraction and modeling method combining emotion points |
US8332394B2 (en) | 2008-05-23 | 2012-12-11 | International Business Machines Corporation | System and method for providing question and answers with deferred type evaluation |
US8345665B2 (en) | 2001-10-22 | 2013-01-01 | Apple Inc. | Text to speech conversion of text messages from mobile communication devices |
US8352272B2 (en) | 2008-09-29 | 2013-01-08 | Apple Inc. | Systems and methods for text to speech synthesis |
US8352268B2 (en) | 2008-09-29 | 2013-01-08 | Apple Inc. | Systems and methods for selective rate of speech and speech preferences for text to speech synthesis |
US8355919B2 (en) | 2008-09-29 | 2013-01-15 | Apple Inc. | Systems and methods for text normalization for text to speech synthesis |
US8364694B2 (en) | 2007-10-26 | 2013-01-29 | Apple Inc. | Search assistant for digital media assets |
US20130030812A1 (en) * | 2011-07-29 | 2013-01-31 | Hyun-Jun Kim | Apparatus and method for generating emotion information, and function recommendation apparatus based on emotion information |
US8379830B1 (en) | 2006-05-22 | 2013-02-19 | Convergys Customer Management Delaware Llc | System and method for automated customer service with contingent live interaction |
US8380507B2 (en) | 2009-03-09 | 2013-02-19 | Apple Inc. | Systems and methods for determining the language to use for speech generated by a text to speech engine |
US8396714B2 (en) | 2008-09-29 | 2013-03-12 | Apple Inc. | Systems and methods for concatenation of words in text to speech synthesis |
US20130085760A1 (en) * | 2008-08-12 | 2013-04-04 | Morphism Llc | Training and applying prosody models |
US20130103398A1 (en) * | 2009-08-04 | 2013-04-25 | Nokia Corporation | Method and Apparatus for Audio Signal Classification |
US20130110510A1 (en) * | 2011-10-28 | 2013-05-02 | Cellco Partnership D/B/A Verizon Wireless | Natural language call router |
US8452668B1 (en) | 2006-03-02 | 2013-05-28 | Convergys Customer Management Delaware Llc | System for closed loop decisionmaking in an automated care system |
US8458278B2 (en) | 2003-05-02 | 2013-06-04 | Apple Inc. | Method and apparatus for displaying information during an instant messaging session |
US8493410B2 (en) | 2008-06-12 | 2013-07-23 | International Business Machines Corporation | Simulation method and system |
US8510296B2 (en) | 2010-09-24 | 2013-08-13 | International Business Machines Corporation | Lexical answer type confidence estimation and application |
US8527861B2 (en) | 1999-08-13 | 2013-09-03 | Apple Inc. | Methods and apparatuses for display and traversing of links in page character array |
US8543407B1 (en) | 2007-10-04 | 2013-09-24 | Great Northern Research, LLC | Speech interface system and method for control and interaction with applications on a computing system |
EP2645364A1 (en) | 2012-03-29 | 2013-10-02 | Honda Research Institute Europe GmbH | Spoken dialog system using prominence |
US8583569B2 (en) * | 2007-04-19 | 2013-11-12 | Microsoft Corporation | Field-programmable gate array based accelerator system |
US20130304686A1 (en) * | 2012-05-09 | 2013-11-14 | Yahoo! Inc. | Methods and systems for personalizing user experience based on attitude prediction |
US20130311185A1 (en) * | 2011-02-15 | 2013-11-21 | Nokia Corporation | Method apparatus and computer program product for prosodic tagging |
US8600743B2 (en) | 2010-01-06 | 2013-12-03 | Apple Inc. | Noise profile determination for voice-related feature |
US8614431B2 (en) | 2005-09-30 | 2013-12-24 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US8620662B2 (en) | 2007-11-20 | 2013-12-31 | Apple Inc. | Context-aware unit selection |
US20140025385A1 (en) * | 2010-12-30 | 2014-01-23 | Nokia Corporation | Method, Apparatus and Computer Program Product for Emotion Detection |
US8639516B2 (en) | 2010-06-04 | 2014-01-28 | Apple Inc. | User-specific noise suppression for voice quality improvements |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US8660849B2 (en) | 2010-01-18 | 2014-02-25 | Apple Inc. | Prioritizing selection criteria by automated assistant |
US20140058735A1 (en) * | 2012-08-21 | 2014-02-27 | David A. Sharp | Artificial Neural Network Based System for Classification of the Emotional Content of Digital Music |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US8682649B2 (en) | 2009-11-12 | 2014-03-25 | Apple Inc. | Sentiment prediction from textual data |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US8688446B2 (en) | 2008-02-22 | 2014-04-01 | Apple Inc. | Providing text input using speech data and non-speech data |
US8706472B2 (en) | 2011-08-11 | 2014-04-22 | Apple Inc. | Method for disambiguating multiple readings in language conversion |
US8712776B2 (en) | 2008-09-29 | 2014-04-29 | Apple Inc. | Systems and methods for selective text to speech synthesis |
US8713021B2 (en) | 2010-07-07 | 2014-04-29 | Apple Inc. | Unsupervised document clustering using latent semantic density analysis |
US8719014B2 (en) | 2010-09-27 | 2014-05-06 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US8719006B2 (en) | 2010-08-27 | 2014-05-06 | Apple Inc. | Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis |
US20140136196A1 (en) * | 2012-11-09 | 2014-05-15 | Institute For Information Industry | System and method for posting message by audio signal |
US8738617B2 (en) | 2010-09-28 | 2014-05-27 | International Business Machines Corporation | Providing answers to questions using multiple models to score candidate answers |
US8762156B2 (en) | 2011-09-28 | 2014-06-24 | Apple Inc. | Speech recognition repair using contextual information |
US8767978B2 (en) | 2011-03-25 | 2014-07-01 | The Intellisis Corporation | System and method for processing sound signals implementing a spectral motion transform |
US8768702B2 (en) | 2008-09-05 | 2014-07-01 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US20140188876A1 (en) * | 2012-12-28 | 2014-07-03 | Sony Corporation | Information processing device, information processing method and computer program |
US8775442B2 (en) | 2012-05-15 | 2014-07-08 | Apple Inc. | Semantic search using a single-source semantic model |
US8781836B2 (en) | 2011-02-22 | 2014-07-15 | Apple Inc. | Hearing assistance system for providing consistent human speech |
US8812294B2 (en) | 2011-06-21 | 2014-08-19 | Apple Inc. | Translating phrases from one language into another using an order-based set of declarative rules |
US20140244249A1 (en) * | 2013-02-28 | 2014-08-28 | International Business Machines Corporation | System and Method for Identification of Intent Segment(s) in Caller-Agent Conversations |
US8825584B1 (en) | 2011-08-04 | 2014-09-02 | Smart Information Flow Technologies LLC | Systems and methods for determining social regard scores |
US20140249823A1 (en) * | 2013-03-04 | 2014-09-04 | Fujitsu Limited | State estimating apparatus, state estimating method, and state estimating computer program |
CN104078045A (en) * | 2013-03-26 | 2014-10-01 | 联想(北京)有限公司 | Identifying method and electronic device |
US20140297551A1 (en) * | 2013-04-02 | 2014-10-02 | Hireiq Solutions, Inc. | System and Method of Evaluating a Candidate Fit for a Hiring Decision |
US8862252B2 (en) | 2009-01-30 | 2014-10-14 | Apple Inc. | Audio user interface for displayless electronic device |
US8892550B2 (en) | 2010-09-24 | 2014-11-18 | International Business Machines Corporation | Source expansion for information retrieval and information extraction |
US8898159B2 (en) | 2010-09-28 | 2014-11-25 | International Business Machines Corporation | Providing answers to questions using logical synthesis of candidate answers |
US8898568B2 (en) | 2008-09-09 | 2014-11-25 | Apple Inc. | Audio user interface |
CN104167208A (en) * | 2014-08-08 | 2014-11-26 | 中国科学院深圳先进技术研究院 | Speaker recognition method and device |
US20150006170A1 (en) * | 2013-06-28 | 2015-01-01 | International Business Machines Corporation | Real-Time Speech Analysis Method and System |
US8935167B2 (en) | 2012-09-25 | 2015-01-13 | Apple Inc. | Exemplar-based latent perceptual modeling for automatic speech recognition |
US8943051B2 (en) | 2010-09-24 | 2015-01-27 | International Business Machines Corporation | Lexical answer type confidence estimation and application |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US20150100312A1 (en) * | 2013-10-04 | 2015-04-09 | At&T Intellectual Property I, L.P. | System and method of using neural transforms of robust audio features for speech processing |
US9053089B2 (en) | 2007-10-02 | 2015-06-09 | Apple Inc. | Part-of-speech tagging using latent analogy |
US20150170644A1 (en) * | 2013-12-16 | 2015-06-18 | Sri International | Method and apparatus for classifying lexical stress |
US20150206543A1 (en) * | 2014-01-22 | 2015-07-23 | Samsung Electronics Co., Ltd. | Apparatus and method for emotion recognition |
US20150213800A1 (en) * | 2014-01-28 | 2015-07-30 | Simple Emotion, Inc. | Methods for adaptive voice interaction |
US9104670B2 (en) | 2010-07-21 | 2015-08-11 | Apple Inc. | Customized search or acquisition of digital media assets |
WO2015123332A1 (en) * | 2013-02-12 | 2015-08-20 | Begel Daniel | Method and system to identify human characteristics using speech acoustics |
US20150254061A1 (en) * | 2012-11-28 | 2015-09-10 | OOO "Speaktoit" | Method for user training of information dialogue system |
US9183850B2 (en) | 2011-08-08 | 2015-11-10 | The Intellisis Corporation | System and method for tracking sound pitch across an audio signal |
US9195641B1 (en) * | 2011-07-01 | 2015-11-24 | West Corporation | Method and apparatus of processing user text input information |
US20150348569A1 (en) * | 2014-05-28 | 2015-12-03 | International Business Machines Corporation | Semantic-free text analysis for identifying traits |
EP2812897A4 (en) * | 2012-02-10 | 2015-12-30 | Intel Corp | Perceptual computing with conversational agent |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
WO2015184196A3 (en) * | 2014-05-28 | 2016-03-17 | Aliphcom | Speech summary and action item generation |
WO2016014597A3 (en) * | 2014-07-21 | 2016-03-24 | Feele, A Partnership By Operation Of Law | Translating emotions into electronic representations |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US9311043B2 (en) | 2010-01-13 | 2016-04-12 | Apple Inc. | Adaptive audio feedback system and method |
US9317586B2 (en) | 2010-09-28 | 2016-04-19 | International Business Machines Corporation | Providing answers to questions using hypothesis pruning |
US20160118050A1 (en) * | 2014-10-24 | 2016-04-28 | Sestek Ses Ve Iletisim Bilgisayar Teknolojileri Sanayi Ticaret Anonim Sirketi | Non-standard speech detection system and method |
US9330381B2 (en) | 2008-01-06 | 2016-05-03 | Apple Inc. | Portable multifunction device, method, and graphical user interface for viewing and managing electronic calendars |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US20160132482A1 (en) * | 2014-11-10 | 2016-05-12 | Oracle International Corporation | Automatic ontology generation for natural-language processing applications |
US20160148616A1 (en) * | 2014-11-26 | 2016-05-26 | Panasonic Intellectual Property Corporation Of America | Method and apparatus for recognizing speech by lip reading |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US9390706B2 (en) * | 2014-06-19 | 2016-07-12 | Mattersight Corporation | Personality-based intelligent personal assistant system and methods |
US20160210117A1 (en) * | 2015-01-19 | 2016-07-21 | Ncsoft Corporation | Methods and systems for recommending dialogue sticker based on similar situation detection |
US20160210985A1 (en) * | 2013-09-25 | 2016-07-21 | Intel Corporation | Improving natural language interactions using emotional modulation |
US20160210963A1 (en) * | 2015-01-19 | 2016-07-21 | Ncsoft Corporation | Methods and systems for determining ranking of dialogue sticker based on situation and preference information |
US20160210279A1 (en) * | 2015-01-19 | 2016-07-21 | Ncsoft Corporation | Methods and systems for analyzing communication situation based on emotion information |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9443515B1 (en) * | 2012-09-05 | 2016-09-13 | Paul G. Boyce | Personality designer system for a detachably attachable remote audio object |
US9449275B2 (en) | 2011-07-12 | 2016-09-20 | Siemens Aktiengesellschaft | Actuation of a technical system based on solutions of relaxed abduction |
US9472185B1 (en) * | 2011-01-05 | 2016-10-18 | Interactions Llc | Automated recognition system for natural language understanding |
US9473866B2 (en) | 2011-08-08 | 2016-10-18 | Knuedge Incorporated | System and method for tracking sound pitch across an audio signal using harmonic envelope |
US9472207B2 (en) | 2013-06-20 | 2016-10-18 | Suhas Gondi | Portable assistive device for combating autism spectrum disorders |
US9485597B2 (en) | 2011-08-08 | 2016-11-01 | Knuedge Incorporated | System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US20160328623A1 (en) * | 2014-05-09 | 2016-11-10 | Samsung Electronics Co., Ltd. | Liveness testing methods and apparatuses and image processing methods and apparatuses |
US9495481B2 (en) | 2010-09-24 | 2016-11-15 | International Business Machines Corporation | Providing answers to questions including assembling answers from multiple document segments |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
WO2016183229A1 (en) * | 2015-05-11 | 2016-11-17 | Olsher Daniel Joseph | Universal task independent simulation and control platform for generating controlled actions using nuanced artificial intelligence |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US20160343268A1 (en) * | 2013-09-11 | 2016-11-24 | Lincoln Global, Inc. | Learning management system for a real-time simulated virtual reality welding training environment |
US9508038B2 (en) | 2010-09-24 | 2016-11-29 | International Business Machines Corporation | Using ontological information in open domain type coercion |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US20170060839A1 (en) * | 2015-09-01 | 2017-03-02 | Casio Computer Co., Ltd. | Dialogue control device, dialogue control method and non-transitory computer-readable information recording medium |
US9601104B2 (en) | 2015-03-27 | 2017-03-21 | International Business Machines Corporation | Imbuing artificial intelligence systems with idiomatic traits |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9715498B2 (en) | 2015-08-31 | 2017-07-25 | Microsoft Technology Licensing, Llc | Distributed server system for language understanding |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9733821B2 (en) | 2013-03-14 | 2017-08-15 | Apple Inc. | Voice control to diagnose inadvertent activation of accessibility features |
US9741347B2 (en) | 2011-01-05 | 2017-08-22 | Interactions Llc | Automated speech recognition proxy system for natural language understanding |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9779084B2 (en) | 2013-10-04 | 2017-10-03 | Mattersight Corporation | Online classroom analytics system and methods |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9798800B2 (en) | 2010-09-24 | 2017-10-24 | International Business Machines Corporation | Providing question and answers with deferred type evaluation using text with limited structure |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US20170344713A1 (en) * | 2014-12-12 | 2017-11-30 | Koninklijke Philips N.V. | Device, system and method for assessing information needs of a person |
US9833200B2 (en) | 2015-05-14 | 2017-12-05 | University Of Florida Research Foundation, Inc. | Low IF architectures for noncontact vital sign detection |
US9842611B2 (en) | 2015-02-06 | 2017-12-12 | Knuedge Incorporated | Estimating pitch using peak-to-peak distances |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US20180012230A1 (en) * | 2016-07-11 | 2018-01-11 | International Business Machines Corporation | Emotion detection over social media |
US9870785B2 (en) | 2015-02-06 | 2018-01-16 | Knuedge Incorporated | Determining features of harmonic signals |
US9875445B2 (en) | 2014-02-25 | 2018-01-23 | Sri International | Dynamic hybrid models for multimodal analysis |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9922668B2 (en) | 2015-02-06 | 2018-03-20 | Knuedge Incorporated | Estimating fractional chirp rate with multiple frequency representations |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US9924906B2 (en) | 2007-07-12 | 2018-03-27 | University Of Florida Research Foundation, Inc. | Random body movement cancellation for non-contact vital sign detection |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US20180096698A1 (en) * | 2016-09-30 | 2018-04-05 | Honda Motor Co., Ltd. | Processing result error detection device, processing result error detection program, processing result error detection method, and moving entity |
US9946706B2 (en) | 2008-06-07 | 2018-04-17 | Apple Inc. | Automatic language identification for dynamic text processing |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US20180122377A1 (en) * | 2016-10-31 | 2018-05-03 | Furhat Robotics Ab | Voice interaction apparatus and voice interaction method |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9973450B2 (en) | 2007-09-17 | 2018-05-15 | Amazon Technologies, Inc. | Methods and systems for dynamically updating web service profile information by parsing transcribed message strings |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
CN108133038A (en) * | 2018-01-10 | 2018-06-08 | 重庆邮电大学 | Entity-level sentiment classification system and method based on a dynamic memory network |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US20180190284A1 (en) * | 2015-06-22 | 2018-07-05 | Carnegie Mellon University | Processing speech signals in voice-based profiling |
US10019994B2 (en) | 2012-06-08 | 2018-07-10 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
CN108470024A (en) * | 2018-03-12 | 2018-08-31 | 北京灵伴即时智能科技有限公司 | Chinese prosodic structure prediction method fusing syntactic, semantic, and pragmatic information |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10074359B2 (en) * | 2016-11-01 | 2018-09-11 | Google Llc | Dynamic text-to-speech provisioning |
US10078487B2 (en) | 2013-03-15 | 2018-09-18 | Apple Inc. | Context-sensitive handling of interruptions |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10133733B2 (en) | 2007-03-06 | 2018-11-20 | Botanic Technologies, Inc. | Systems and methods for an autonomous avatar driver |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
CN109074809A (en) * | 2016-07-26 | 2018-12-21 | 索尼公司 | Information processing equipment, information processing method and program |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10235990B2 (en) | 2017-01-04 | 2019-03-19 | International Business Machines Corporation | System and method for cognitive intervention on human interactions |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
CN109559734A (en) * | 2018-12-18 | 2019-04-02 | 百度在线网络技术(北京)有限公司 | Method and device for accelerating acoustic model training |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US10262061B2 (en) | 2015-05-19 | 2019-04-16 | Oracle International Corporation | Hierarchical data classification using frequency analysis |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
CN109818833A (en) * | 2019-03-14 | 2019-05-28 | 北京信而泰科技股份有限公司 | Ethernet test system and Ethernet test method |
US10318639B2 (en) | 2017-02-03 | 2019-06-11 | International Business Machines Corporation | Intelligent action recommendation |
US10331312B2 (en) | 2015-09-08 | 2019-06-25 | Apple Inc. | Intelligent automated assistant in a media environment |
US20190198040A1 (en) * | 2017-12-22 | 2019-06-27 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Mood recognition method, electronic device and computer-readable storage medium |
US10347244B2 (en) | 2017-04-21 | 2019-07-09 | Go-Vivace Inc. | Dialogue system incorporating unique speech to text conversion method for meaningful dialogue response |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10360903B2 (en) * | 2015-03-20 | 2019-07-23 | Kabushiki Kaisha Toshiba | Spoken language understanding apparatus, method, and program |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US10373515B2 (en) | 2017-01-04 | 2019-08-06 | International Business Machines Corporation | System and method for cognitive intervention on human interactions |
CN110147432A (en) * | 2019-05-07 | 2019-08-20 | 大连理工大学 | Method for implementing a decision search engine based on finite-state automata |
US10394958B2 (en) * | 2017-11-09 | 2019-08-27 | Conduent Business Services, Llc | Performing semantic analyses of user-generated text content using a lexicon |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US10430157B2 (en) * | 2015-01-19 | 2019-10-01 | Samsung Electronics Co., Ltd. | Method and apparatus for recognizing speech signal |
US20190304480A1 (en) * | 2018-03-29 | 2019-10-03 | Ford Global Technologies, Llc | Neural Network Generative Modeling To Transform Speech Utterances And Augment Training Data |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
WO2020036195A1 (en) * | 2018-08-15 | 2020-02-20 | 日本電信電話株式会社 | End-of-speech determination device, end-of-speech determination method, and program |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
WO2020051500A1 (en) * | 2018-09-06 | 2020-03-12 | Coffing Daniel L | System for providing dialogue guidance |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10614725B2 (en) | 2012-09-11 | 2020-04-07 | International Business Machines Corporation | Generating secondary questions in an introspective question answering system |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
RU2721180C1 (en) * | 2019-12-02 | 2020-05-18 | Самсунг Электроникс Ко., Лтд. | Method for generating an animation model of a head based on a speech signal and an electronic computing device which implements it |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10742500B2 (en) * | 2017-09-20 | 2020-08-11 | Microsoft Technology Licensing, Llc | Iteratively updating a collaboration site or template |
US10748644B2 (en) | 2018-06-19 | 2020-08-18 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US10764206B2 (en) | 2016-08-04 | 2020-09-01 | International Business Machines Corporation | Adjusting network bandwidth based on an analysis of a user's cognitive state |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
WO2020227557A1 (en) * | 2019-05-09 | 2020-11-12 | Sri International | Method, system and apparatus for understanding and generating human conversational cues |
US10867128B2 (en) | 2017-09-12 | 2020-12-15 | Microsoft Technology Licensing, Llc | Intelligently updating a collaboration site or template |
WO2021011139A1 (en) * | 2019-07-18 | 2021-01-21 | Sri International | The conversational assistant for conversational engagement |
US10938592B2 (en) * | 2017-07-21 | 2021-03-02 | Pearson Education, Inc. | Systems and methods for automated platform-based algorithm monitoring |
US10956009B2 (en) | 2011-12-15 | 2021-03-23 | L'oreal | Method and system for interactive cosmetic enhancements interface |
US10991384B2 (en) | 2017-04-21 | 2021-04-27 | audEERING GmbH | Method for automatic affective state inference and an automated affective state inference system |
CN112735427A (en) * | 2020-12-25 | 2021-04-30 | 平安普惠企业管理有限公司 | Radio reception control method and device, electronic equipment and storage medium |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US11042711B2 (en) | 2018-03-19 | 2021-06-22 | Daniel L. Coffing | Processing natural language arguments and propositions |
US11051702B2 (en) | 2014-10-08 | 2021-07-06 | University Of Florida Research Foundation, Inc. | Method and apparatus for non-contact fast vital sign acquisition based on radar signal |
US11068519B2 (en) * | 2016-07-29 | 2021-07-20 | Microsoft Technology Licensing, Llc | Conversation oriented machine-user interaction |
US11120895B2 (en) | 2018-06-19 | 2021-09-14 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
US20210287664A1 (en) * | 2020-03-13 | 2021-09-16 | Palo Alto Research Center Incorporated | Machine learning used to detect alignment and misalignment in conversation |
US11151899B2 (en) | 2013-03-15 | 2021-10-19 | Apple Inc. | User training by intelligent digital assistant |
US11183174B2 (en) * | 2018-08-31 | 2021-11-23 | Samsung Electronics Co., Ltd. | Speech recognition apparatus and method |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US11250038B2 (en) * | 2018-01-21 | 2022-02-15 | Microsoft Technology Licensing, Llc. | Question and answer pair generation using machine learning |
WO2022036446A1 (en) * | 2020-08-17 | 2022-02-24 | Jali Inc. | System and method for triggering animated paralingual behavior from dialogue |
US11295736B2 (en) * | 2016-01-25 | 2022-04-05 | Sony Corporation | Communication system and communication control method |
US20220129627A1 (en) * | 2020-10-27 | 2022-04-28 | Disney Enterprises, Inc. | Multi-persona social agent |
US11335347B2 (en) * | 2019-06-03 | 2022-05-17 | Amazon Technologies, Inc. | Multiple classifications of audio data |
US11341962B2 (en) | 2010-05-13 | 2022-05-24 | Poltorak Technologies Llc | Electronic personal interactive device |
US20220164539A1 (en) * | 2019-04-26 | 2022-05-26 | Tucknologies Holdings, Inc. | Human Emotion Detection |
US20220208216A1 (en) * | 2020-12-28 | 2022-06-30 | Sharp Kabushiki Kaisha | Two-way communication support system and storage medium |
US11416556B2 (en) * | 2019-12-19 | 2022-08-16 | Accenture Global Solutions Limited | Natural language dialogue system perturbation testing |
US11430438B2 (en) * | 2019-03-22 | 2022-08-30 | Samsung Electronics Co., Ltd. | Electronic device providing response corresponding to user conversation style and emotion and method of operating same |
US20220284920A1 (en) * | 2019-07-05 | 2022-09-08 | Gn Audio A/S | A method and a noise indicator system for identifying one or more noisy persons |
US11521220B2 (en) | 2019-06-05 | 2022-12-06 | International Business Machines Corporation | Generating classification and regression tree from IoT data |
US11563852B1 (en) * | 2021-08-13 | 2023-01-24 | Capital One Services, Llc | System and method for identifying complaints in interactive communications and providing feedback in real-time |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US11599332B1 (en) | 2007-10-04 | 2023-03-07 | Great Northern Research, LLC | Multiple shell multi faceted graphical user interface |
US20230119954A1 (en) * | 2020-06-01 | 2023-04-20 | Amazon Technologies, Inc. | Sentiment aware voice user interface |
US11700426B2 (en) | 2021-02-23 | 2023-07-11 | Firefly 14, Llc | Virtual platform for recording and displaying responses and reactions to audiovisual contents |
US11743268B2 (en) | 2018-09-14 | 2023-08-29 | Daniel L. Coffing | Fact management system |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5602653B2 (en) * | 2011-01-31 | 2014-10-08 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Information processing apparatus, information processing method, information processing system, and program |
CN105513587B (en) * | 2014-09-22 | 2020-07-24 | 联想(北京)有限公司 | MFCC extraction method and device |
US10334103B2 (en) | 2017-01-25 | 2019-06-25 | International Business Machines Corporation | Message translation for cognitive assistance |
US10921755B2 (en) | 2018-12-17 | 2021-02-16 | General Electric Company | Method and system for competence monitoring and contiguous learning for control |
CN109754809B (en) * | 2019-01-29 | 2021-02-09 | 北京猎户星空科技有限公司 | Voice recognition method and device, electronic equipment and storage medium |
CN109817201B (en) * | 2019-03-29 | 2021-03-26 | 北京金山安全软件有限公司 | Language learning method and device, electronic equipment and readable storage medium |
Citations (98)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4473904A (en) * | 1978-12-11 | 1984-09-25 | Hitachi, Ltd. | Speech information transmission method and system |
US4521907A (en) * | 1982-05-25 | 1985-06-04 | American Microsystems, Incorporated | Multiplier/adder circuit |
US4587670A (en) * | 1982-10-15 | 1986-05-06 | At&T Bell Laboratories | Hidden Markov model speech recognition arrangement |
US4783803A (en) * | 1985-11-12 | 1988-11-08 | Dragon Systems, Inc. | Speech recognition apparatus and method |
US4785408A (en) * | 1985-03-11 | 1988-11-15 | AT&T Information Systems Inc. American Telephone and Telegraph Company | Method and apparatus for generating computer-controlled interactive voice services |
US4852170A (en) * | 1986-12-18 | 1989-07-25 | R & D Associates | Real time computer speech recognition system |
US4868750A (en) * | 1987-10-07 | 1989-09-19 | Houghton Mifflin Company | Collocational grammar system |
US4914590A (en) * | 1988-05-18 | 1990-04-03 | Emhart Industries, Inc. | Natural language understanding system |
US4937870A (en) * | 1988-11-14 | 1990-06-26 | American Telephone And Telegraph Company | Speech recognition arrangement |
US4956865A (en) * | 1985-01-30 | 1990-09-11 | Northern Telecom Limited | Speech recognition |
US4991094A (en) * | 1989-04-26 | 1991-02-05 | International Business Machines Corporation | Method for language-independent text tokenization using a character categorization |
US4991217A (en) * | 1984-11-30 | 1991-02-05 | Ibm Corporation | Dual processor speech recognition system with dedicated data acquisition bus |
US5036539A (en) * | 1989-07-06 | 1991-07-30 | Itt Corporation | Real-time speech processing development system |
US5068789A (en) * | 1988-09-15 | 1991-11-26 | Oce-Nederland B.V. | Method and means for grammatically processing a natural language sentence |
US5146405A (en) * | 1988-02-05 | 1992-09-08 | At&T Bell Laboratories | Methods for part-of-speech determination and usage |
US5157727A (en) * | 1980-02-22 | 1992-10-20 | Alden Schloss | Process for digitizing speech |
US5231670A (en) * | 1987-06-01 | 1993-07-27 | Kurzweil Applied Intelligence, Inc. | Voice controlled system and method for generating text from a voice controlled input |
US5265014A (en) * | 1990-04-10 | 1993-11-23 | Hewlett-Packard Company | Multi-modal user interface |
US5278980A (en) * | 1991-08-16 | 1994-01-11 | Xerox Corporation | Iterative technique for phrase query formation and an information retrieval system employing same |
US5293584A (en) * | 1992-05-21 | 1994-03-08 | International Business Machines Corporation | Speech recognition system for natural language translation |
US5317507A (en) * | 1990-11-07 | 1994-05-31 | Gallant Stephen I | Method for document retrieval and for word sense disambiguation using neural networks |
US5371901A (en) * | 1991-07-08 | 1994-12-06 | Motorola, Inc. | Remote voice control system |
US5384892A (en) * | 1992-12-31 | 1995-01-24 | Apple Computer, Inc. | Dynamic language model for speech recognition |
US5454106A (en) * | 1993-05-17 | 1995-09-26 | International Business Machines Corporation | Database retrieval system using natural language for presenting understood components of an ambiguous query on a user interface |
US5475792A (en) * | 1992-09-21 | 1995-12-12 | International Business Machines Corporation | Telephony channel simulator for speech recognition application |
US5500920A (en) * | 1993-09-23 | 1996-03-19 | Xerox Corporation | Semantic co-occurrence filtering for speech recognition and signal transcription applications |
US5509104A (en) * | 1989-05-17 | 1996-04-16 | At&T Corp. | Speech recognition employing key word modeling and non-key word modeling |
US5513298A (en) * | 1992-09-21 | 1996-04-30 | International Business Machines Corporation | Instantaneous context switching for speech recognition systems |
US5519608A (en) * | 1993-06-24 | 1996-05-21 | Xerox Corporation | Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation |
US5524169A (en) * | 1993-12-30 | 1996-06-04 | International Business Machines Incorporated | Method and system for location-specific speech recognition |
US5540589A (en) * | 1994-04-11 | 1996-07-30 | Mitsubishi Electric Information Technology Center | Audio interactive tutor |
US5553119A (en) * | 1994-07-07 | 1996-09-03 | Bell Atlantic Network Services, Inc. | Intelligent recognition of speech signals using caller demographics |
US5602963A (en) * | 1993-10-12 | 1997-02-11 | Voice Powered Technology International, Inc. | Voice activated personal organizer |
US5615296A (en) * | 1993-11-12 | 1997-03-25 | International Business Machines Corporation | Continuous speech recognition and voice response system and method to enable conversational dialogues with microprocessors |
US5625748A (en) * | 1994-04-18 | 1997-04-29 | Bbn Corporation | Topic discriminator using posterior probability or confidence scores |
US5625814A (en) * | 1992-05-27 | 1997-04-29 | Apple Computer, Inc. | Method and apparatus for processing natural language with a hierarchy of mapping routines |
US5652789A (en) * | 1994-09-30 | 1997-07-29 | Wildfire Communications, Inc. | Network based knowledgeable assistant |
US5652897A (en) * | 1993-05-24 | 1997-07-29 | Unisys Corporation | Robust language processor for segmenting and parsing-language containing multiple instructions |
US5668854A (en) * | 1993-07-29 | 1997-09-16 | International Business Machine Corp. | Distributed system for call processing |
US5675788A (en) * | 1995-09-15 | 1997-10-07 | Infonautics Corp. | Method and apparatus for generating a composite document on a selected topic from a plurality of information sources |
US5675707A (en) * | 1995-09-15 | 1997-10-07 | At&T | Automated call router system and method |
US5675819A (en) * | 1994-06-16 | 1997-10-07 | Xerox Corporation | Document information retrieval using global word co-occurrence patterns |
US5680628A (en) * | 1995-07-19 | 1997-10-21 | Inso Corporation | Method and apparatus for automated search and retrieval process |
US5680511A (en) * | 1995-06-07 | 1997-10-21 | Dragon Systems, Inc. | Systems and methods for word recognition |
US5694592A (en) * | 1993-11-05 | 1997-12-02 | University Of Central Florida | Process for determination of text relevancy |
US5696962A (en) * | 1993-06-24 | 1997-12-09 | Xerox Corporation | Method for computerized information retrieval using shallow linguistic analysis |
US5727950A (en) * | 1996-05-22 | 1998-03-17 | Netsage Corporation | Agent based instruction system and method |
US5730603A (en) * | 1996-05-16 | 1998-03-24 | Interactive Drama, Inc. | Audiovisual simulation system and method with dynamic intelligent prompts |
US5737485A (en) * | 1995-03-07 | 1998-04-07 | Rutgers The State University Of New Jersey | Method and apparatus including microphone arrays and neural networks for speech/speaker recognition systems |
US5748841A (en) * | 1994-02-25 | 1998-05-05 | Morin; Philippe | Supervised contextual language acquisition system |
US5758023A (en) * | 1993-07-13 | 1998-05-26 | Bordeaux; Theodore Austin | Multi-language speech recognition system |
US5758322A (en) * | 1994-12-09 | 1998-05-26 | International Voice Register, Inc. | Method and apparatus for conducting point-of-sale transactions using voice recognition |
US5765130A (en) * | 1996-05-21 | 1998-06-09 | Applied Language Technologies, Inc. | Method and apparatus for facilitating speech barge-in in connection with voice recognition systems |
US5774841A (en) * | 1995-09-20 | 1998-06-30 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Real-time reconfigurable adaptive speech recognition command and control apparatus and method |
US5774859A (en) * | 1995-01-03 | 1998-06-30 | Scientific-Atlanta, Inc. | Information system having a speech interface |
US5794193A (en) * | 1995-09-15 | 1998-08-11 | Lucent Technologies Inc. | Automated phrase generation |
US5797123A (en) * | 1996-10-01 | 1998-08-18 | Lucent Technologies Inc. | Method of key-phrase detection and verification for flexible speech understanding |
US5802251A (en) * | 1993-12-30 | 1998-09-01 | International Business Machines Corporation | Method and system for reducing perplexity in speech recognition via caller identification |
US5802526A (en) * | 1995-11-15 | 1998-09-01 | Microsoft Corporation | System and method for graphically displaying and navigating through an interactive voice response menu |
US5819220A (en) * | 1996-09-30 | 1998-10-06 | Hewlett-Packard Company | Web triggered word set boosting for speech interfaces to the world wide web |
US5826227A (en) * | 1995-12-18 | 1998-10-20 | Lucent Technologies Inc. | Hiding a source identifier within a signal |
US5838683A (en) * | 1995-03-13 | 1998-11-17 | Selsius Systems Inc. | Distributed interactive multimedia system architecture |
US5836771A (en) * | 1996-12-02 | 1998-11-17 | Ho; Chi Fai | Learning method and system based on questioning |
US5878406A (en) * | 1993-01-29 | 1999-03-02 | Noyes; Dallas B. | Method for representation of knowledge in a computer as a network database system |
US5913193A (en) * | 1996-04-30 | 1999-06-15 | Microsoft Corporation | Method and system of runtime acoustic unit selection for speech synthesis |
US5987415A (en) * | 1998-03-23 | 1999-11-16 | Microsoft Corporation | Modeling a user's emotion and personality in a computer user interface |
US6078886A (en) * | 1997-04-14 | 2000-06-20 | At&T Corporation | System and method for providing remote automatic speech recognition services via a packet network |
US6085160A (en) * | 1998-07-10 | 2000-07-04 | Lernout & Hauspie Speech Products N.V. | Language independent speech recognition |
US6173260B1 (en) * | 1997-10-29 | 2001-01-09 | Interval Research Corporation | System and method for automatic classification of speech based upon affective content |
US6246981B1 (en) * | 1998-11-25 | 2001-06-12 | International Business Machines Corporation | Natural language task-oriented dialog manager and method |
US6246977B1 (en) * | 1997-03-07 | 2001-06-12 | Microsoft Corporation | Information retrieval utilizing semantic representation of text and based on constrained expansion of query words |
US6304864B1 (en) * | 1999-04-20 | 2001-10-16 | Textwise Llc | System for retrieving multimedia information from the internet using multiple evolving intelligent agents |
US6311182B1 (en) * | 1997-11-17 | 2001-10-30 | Genuity Inc. | Voice activated web browser |
US6345245B1 (en) * | 1997-03-06 | 2002-02-05 | Kabushiki Kaisha Toshiba | Method and system for managing a common dictionary and updating dictionary data selectively according to a type of local processing system |
US20020095295A1 (en) * | 1998-12-01 | 2002-07-18 | Cohen Michael H. | Detection of characteristics of human-machine interactions for dialog customization and analysis |
US20020147581A1 (en) * | 2001-04-10 | 2002-10-10 | Sri International | Method and apparatus for performing prosody-based endpointing of a speech signal |
US6496799B1 (en) * | 1999-12-22 | 2002-12-17 | International Business Machines Corporation | End-of-utterance determination for voice processing |
US20030028383A1 (en) * | 2001-02-20 | 2003-02-06 | I & A Research Inc. | System for modeling and simulating emotion states |
US6526382B1 (en) * | 1999-12-07 | 2003-02-25 | Comverse, Inc. | Language-oriented user interfaces for voice activated services |
US20030069728A1 (en) * | 2001-10-05 | 2003-04-10 | Raquel Tato | Method for detecting emotions involving subspace specialists |
US6587822B2 (en) * | 1998-10-06 | 2003-07-01 | Lucent Technologies Inc. | Web-based platform for interactive voice response (IVR) |
US6609089B1 (en) * | 1999-08-30 | 2003-08-19 | Lucent Technologies Inc. | Method and apparatus for providing interactive services with multiple interfaces |
US6615172B1 (en) * | 1999-11-12 | 2003-09-02 | Phoenix Solutions, Inc. | Intelligent query engine for processing voice based queries |
US20030179877A1 (en) * | 2002-03-21 | 2003-09-25 | Anthony Dezonno | Adaptive transaction guidance |
US6745161B1 (en) * | 1999-09-17 | 2004-06-01 | Discern Communications, Inc. | System and method for incorporating concept-based retrieval within boolean search engines |
US20040148172A1 (en) * | 2003-01-24 | 2004-07-29 | Voice Signal Technologies, Inc. | Prosodic mimic method and apparatus |
US6842767B1 (en) * | 1999-10-22 | 2005-01-11 | Tellme Networks, Inc. | Method and apparatus for content personalization over a telephone interface with adaptive personalization |
US6851115B1 (en) * | 1999-01-05 | 2005-02-01 | Sri International | Software-based architecture for communication and cooperation among distributed electronic agents |
US20050060158A1 (en) * | 2003-09-12 | 2005-03-17 | Norikazu Endo | Method and system for adjusting the voice prompt of an interactive system based upon the user's state |
US6873953B1 (en) * | 2000-05-22 | 2005-03-29 | Nuance Communications | Prosody based endpoint detection |
US6879956B1 (en) * | 1999-09-30 | 2005-04-12 | Sony Corporation | Speech recognition with feedback from natural language processing for adaptation of acoustic models |
US20050102135A1 (en) * | 2003-11-12 | 2005-05-12 | Silke Goronzy | Apparatus and method for automatic extraction of important events in audio signals |
US6909453B2 (en) * | 2001-12-20 | 2005-06-21 | Matsushita Electric Industrial Co., Ltd. | Virtual television phone apparatus |
US6970915B1 (en) * | 1999-11-01 | 2005-11-29 | Tellme Networks, Inc. | Streaming content over a telephone interface |
US6985852B2 (en) * | 2001-08-21 | 2006-01-10 | Microsoft Corporation | Method and apparatus for dynamic grammars and focused semantic parsing |
US7143042B1 (en) * | 1999-10-04 | 2006-11-28 | Nuance Communications | Tool for graphically defining dialog flows and for establishing operational links between speech applications and hypermedia content in an interactive voice response environment |
US7321854B2 (en) * | 2002-09-19 | 2008-01-22 | The Penn State Research Foundation | Prosody based audio/visual co-analysis for co-verbal gesture recognition |
US7398209B2 (en) * | 2002-06-03 | 2008-07-08 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7222075B2 (en) * | 1999-08-31 | 2007-05-22 | Accenture Llp | Detecting emotions using voice signal analysis |
US9076448B2 (en) * | 1999-11-12 | 2015-07-07 | Nuance Communications, Inc. | Distributed real time speech recognition system |
US6934756B2 (en) * | 2000-11-01 | 2005-08-23 | International Business Machines Corporation | Conversational networking via transport, coding and control conversational protocols |
EP1256937B1 (en) * | 2001-05-11 | 2006-11-02 | Sony France S.A. | Emotion recognition method and device |
US20050234727A1 (en) * | 2001-07-03 | 2005-10-20 | Leo Chiu | Method and apparatus for adapting a voice extensible markup language-enabled voice system for natural speech recognition and system response |
US6795793B2 (en) * | 2002-07-19 | 2004-09-21 | Med-Ed Innovations, Inc. | Method and apparatus for evaluating data and implementing training based on the evaluation of the data |
- 2005-12-05: US application US11/294,918, publication US20060122834A1, not active (Abandoned)
- 2006-12-04: WO application PCT/US2006/061558, publication WO2007067878A2, active (Search and Examination)
Patent Citations (99)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4473904A (en) * | 1978-12-11 | 1984-09-25 | Hitachi, Ltd. | Speech information transmission method and system |
US5157727A (en) * | 1980-02-22 | 1992-10-20 | Alden Schloss | Process for digitizing speech |
US4521907A (en) * | 1982-05-25 | 1985-06-04 | American Microsystems, Incorporated | Multiplier/adder circuit |
US4587670A (en) * | 1982-10-15 | 1986-05-06 | AT&T Bell Laboratories | Hidden Markov model speech recognition arrangement |
US4991217A (en) * | 1984-11-30 | 1991-02-05 | IBM Corporation | Dual processor speech recognition system with dedicated data acquisition bus |
US4956865A (en) * | 1985-01-30 | 1990-09-11 | Northern Telecom Limited | Speech recognition |
US4785408A (en) * | 1985-03-11 | 1988-11-15 | AT&T Information Systems Inc. American Telephone and Telegraph Company | Method and apparatus for generating computer-controlled interactive voice services |
US4783803A (en) * | 1985-11-12 | 1988-11-08 | Dragon Systems, Inc. | Speech recognition apparatus and method |
US4852170A (en) * | 1986-12-18 | 1989-07-25 | R & D Associates | Real time computer speech recognition system |
US5231670A (en) * | 1987-06-01 | 1993-07-27 | Kurzweil Applied Intelligence, Inc. | Voice controlled system and method for generating text from a voice controlled input |
US4868750A (en) * | 1987-10-07 | 1989-09-19 | Houghton Mifflin Company | Collocational grammar system |
US5146405A (en) * | 1988-02-05 | 1992-09-08 | AT&T Bell Laboratories | Methods for part-of-speech determination and usage |
US4914590A (en) * | 1988-05-18 | 1990-04-03 | Emhart Industries, Inc. | Natural language understanding system |
US5068789A (en) * | 1988-09-15 | 1991-11-26 | Oce-Nederland B.V. | Method and means for grammatically processing a natural language sentence |
US4937870A (en) * | 1988-11-14 | 1990-06-26 | American Telephone And Telegraph Company | Speech recognition arrangement |
US4991094A (en) * | 1989-04-26 | 1991-02-05 | International Business Machines Corporation | Method for language-independent text tokenization using a character categorization |
US5509104A (en) * | 1989-05-17 | 1996-04-16 | AT&T Corp. | Speech recognition employing key word modeling and non-key word modeling |
US5036539A (en) * | 1989-07-06 | 1991-07-30 | Itt Corporation | Real-time speech processing development system |
US5265014A (en) * | 1990-04-10 | 1993-11-23 | Hewlett-Packard Company | Multi-modal user interface |
US5317507A (en) * | 1990-11-07 | 1994-05-31 | Gallant Stephen I | Method for document retrieval and for word sense disambiguation using neural networks |
US5371901A (en) * | 1991-07-08 | 1994-12-06 | Motorola, Inc. | Remote voice control system |
US5278980A (en) * | 1991-08-16 | 1994-01-11 | Xerox Corporation | Iterative technique for phrase query formation and an information retrieval system employing same |
US5293584A (en) * | 1992-05-21 | 1994-03-08 | International Business Machines Corporation | Speech recognition system for natural language translation |
US5625814A (en) * | 1992-05-27 | 1997-04-29 | Apple Computer, Inc. | Method and apparatus for processing natural language with a hierarchy of mapping routines |
US5475792A (en) * | 1992-09-21 | 1995-12-12 | International Business Machines Corporation | Telephony channel simulator for speech recognition application |
US5513298A (en) * | 1992-09-21 | 1996-04-30 | International Business Machines Corporation | Instantaneous context switching for speech recognition systems |
US5384892A (en) * | 1992-12-31 | 1995-01-24 | Apple Computer, Inc. | Dynamic language model for speech recognition |
US5878406A (en) * | 1993-01-29 | 1999-03-02 | Noyes; Dallas B. | Method for representation of knowledge in a computer as a network database system |
US5454106A (en) * | 1993-05-17 | 1995-09-26 | International Business Machines Corporation | Database retrieval system using natural language for presenting understood components of an ambiguous query on a user interface |
US5652897A (en) * | 1993-05-24 | 1997-07-29 | Unisys Corporation | Robust language processor for segmenting and parsing-language containing multiple instructions |
US5519608A (en) * | 1993-06-24 | 1996-05-21 | Xerox Corporation | Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation |
US5696962A (en) * | 1993-06-24 | 1997-12-09 | Xerox Corporation | Method for computerized information retrieval using shallow linguistic analysis |
US5758023A (en) * | 1993-07-13 | 1998-05-26 | Bordeaux; Theodore Austin | Multi-language speech recognition system |
US5668854A (en) * | 1993-07-29 | 1997-09-16 | International Business Machines Corp. | Distributed system for call processing |
US5500920A (en) * | 1993-09-23 | 1996-03-19 | Xerox Corporation | Semantic co-occurrence filtering for speech recognition and signal transcription applications |
US5602963A (en) * | 1993-10-12 | 1997-02-11 | Voice Powered Technology International, Inc. | Voice activated personal organizer |
US5694592A (en) * | 1993-11-05 | 1997-12-02 | University Of Central Florida | Process for determination of text relevancy |
US5615296A (en) * | 1993-11-12 | 1997-03-25 | International Business Machines Corporation | Continuous speech recognition and voice response system and method to enable conversational dialogues with microprocessors |
US5802251A (en) * | 1993-12-30 | 1998-09-01 | International Business Machines Corporation | Method and system for reducing perplexity in speech recognition via caller identification |
US5524169A (en) * | 1993-12-30 | 1996-06-04 | International Business Machines Incorporated | Method and system for location-specific speech recognition |
US5748841A (en) * | 1994-02-25 | 1998-05-05 | Morin; Philippe | Supervised contextual language acquisition system |
US5540589A (en) * | 1994-04-11 | 1996-07-30 | Mitsubishi Electric Information Technology Center | Audio interactive tutor |
US5625748A (en) * | 1994-04-18 | 1997-04-29 | BBN Corporation | Topic discriminator using posterior probability or confidence scores |
US5675819A (en) * | 1994-06-16 | 1997-10-07 | Xerox Corporation | Document information retrieval using global word co-occurrence patterns |
US5553119A (en) * | 1994-07-07 | 1996-09-03 | Bell Atlantic Network Services, Inc. | Intelligent recognition of speech signals using caller demographics |
US5652789A (en) * | 1994-09-30 | 1997-07-29 | Wildfire Communications, Inc. | Network based knowledgeable assistant |
US5758322A (en) * | 1994-12-09 | 1998-05-26 | International Voice Register, Inc. | Method and apparatus for conducting point-of-sale transactions using voice recognition |
US5774859A (en) * | 1995-01-03 | 1998-06-30 | Scientific-Atlanta, Inc. | Information system having a speech interface |
US5737485A (en) * | 1995-03-07 | 1998-04-07 | Rutgers The State University Of New Jersey | Method and apparatus including microphone arrays and neural networks for speech/speaker recognition systems |
US5838683A (en) * | 1995-03-13 | 1998-11-17 | Selsius Systems Inc. | Distributed interactive multimedia system architecture |
US5680511A (en) * | 1995-06-07 | 1997-10-21 | Dragon Systems, Inc. | Systems and methods for word recognition |
US5680628A (en) * | 1995-07-19 | 1997-10-21 | Inso Corporation | Method and apparatus for automated search and retrieval process |
US5675788A (en) * | 1995-09-15 | 1997-10-07 | Infonautics Corp. | Method and apparatus for generating a composite document on a selected topic from a plurality of information sources |
US5794193A (en) * | 1995-09-15 | 1998-08-11 | Lucent Technologies Inc. | Automated phrase generation |
US5675707A (en) * | 1995-09-15 | 1997-10-07 | AT&T | Automated call router system and method |
US5774841A (en) * | 1995-09-20 | 1998-06-30 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Real-time reconfigurable adaptive speech recognition command and control apparatus and method |
US5802526A (en) * | 1995-11-15 | 1998-09-01 | Microsoft Corporation | System and method for graphically displaying and navigating through an interactive voice response menu |
US5826227A (en) * | 1995-12-18 | 1998-10-20 | Lucent Technologies Inc. | Hiding a source identifier within a signal |
US5913193A (en) * | 1996-04-30 | 1999-06-15 | Microsoft Corporation | Method and system of runtime acoustic unit selection for speech synthesis |
US5730603A (en) * | 1996-05-16 | 1998-03-24 | Interactive Drama, Inc. | Audiovisual simulation system and method with dynamic intelligent prompts |
US5765130A (en) * | 1996-05-21 | 1998-06-09 | Applied Language Technologies, Inc. | Method and apparatus for facilitating speech barge-in in connection with voice recognition systems |
US5727950A (en) * | 1996-05-22 | 1998-03-17 | Netsage Corporation | Agent based instruction system and method |
US5819220A (en) * | 1996-09-30 | 1998-10-06 | Hewlett-Packard Company | Web triggered word set boosting for speech interfaces to the world wide web |
US5797123A (en) * | 1996-10-01 | 1998-08-18 | Lucent Technologies Inc. | Method of key-phase detection and verification for flexible speech understanding |
US5836771A (en) * | 1996-12-02 | 1998-11-17 | Ho; Chi Fai | Learning method and system based on questioning |
US6345245B1 (en) * | 1997-03-06 | 2002-02-05 | Kabushiki Kaisha Toshiba | Method and system for managing a common dictionary and updating dictionary data selectively according to a type of local processing system |
US6246977B1 (en) * | 1997-03-07 | 2001-06-12 | Microsoft Corporation | Information retrieval utilizing semantic representation of text and based on constrained expansion of query words |
US6078886A (en) * | 1997-04-14 | 2000-06-20 | AT&T Corporation | System and method for providing remote automatic speech recognition services via a packet network |
US6173260B1 (en) * | 1997-10-29 | 2001-01-09 | Interval Research Corporation | System and method for automatic classification of speech based upon affective content |
US6311182B1 (en) * | 1997-11-17 | 2001-10-30 | Genuity Inc. | Voice activated web browser |
US5987415A (en) * | 1998-03-23 | 1999-11-16 | Microsoft Corporation | Modeling a user's emotion and personality in a computer user interface |
US6085160A (en) * | 1998-07-10 | 2000-07-04 | Lernout & Hauspie Speech Products N.V. | Language independent speech recognition |
US6587822B2 (en) * | 1998-10-06 | 2003-07-01 | Lucent Technologies Inc. | Web-based platform for interactive voice response (IVR) |
US6246981B1 (en) * | 1998-11-25 | 2001-06-12 | International Business Machines Corporation | Natural language task-oriented dialog manager and method |
US20020095295A1 (en) * | 1998-12-01 | 2002-07-18 | Cohen Michael H. | Detection of characteristics of human-machine interactions for dialog customization and analysis |
US6851115B1 (en) * | 1999-01-05 | 2005-02-01 | SRI International | Software-based architecture for communication and cooperation among distributed electronic agents |
US6304864B1 (en) * | 1999-04-20 | 2001-10-16 | Textwise Llc | System for retrieving multimedia information from the internet using multiple evolving intelligent agents |
US6609089B1 (en) * | 1999-08-30 | 2003-08-19 | Lucent Technologies Inc. | Method and apparatus for providing interactive services with multiple interfaces |
US6745161B1 (en) * | 1999-09-17 | 2004-06-01 | Discern Communications, Inc. | System and method for incorporating concept-based retrieval within boolean search engines |
US6910003B1 (en) * | 1999-09-17 | 2005-06-21 | Discern Communications, Inc. | System, method and article of manufacture for concept based information searching |
US6879956B1 (en) * | 1999-09-30 | 2005-04-12 | Sony Corporation | Speech recognition with feedback from natural language processing for adaptation of acoustic models |
US7143042B1 (en) * | 1999-10-04 | 2006-11-28 | Nuance Communications | Tool for graphically defining dialog flows and for establishing operational links between speech applications and hypermedia content in an interactive voice response environment |
US6842767B1 (en) * | 1999-10-22 | 2005-01-11 | Tellme Networks, Inc. | Method and apparatus for content personalization over a telephone interface with adaptive personalization |
US6970915B1 (en) * | 1999-11-01 | 2005-11-29 | Tellme Networks, Inc. | Streaming content over a telephone interface |
US6615172B1 (en) * | 1999-11-12 | 2003-09-02 | Phoenix Solutions, Inc. | Intelligent query engine for processing voice based queries |
US6526382B1 (en) * | 1999-12-07 | 2003-02-25 | Comverse, Inc. | Language-oriented user interfaces for voice activated services |
US6496799B1 (en) * | 1999-12-22 | 2002-12-17 | International Business Machines Corporation | End-of-utterance determination for voice processing |
US6873953B1 (en) * | 2000-05-22 | 2005-03-29 | Nuance Communications | Prosody based endpoint detection |
US20030028383A1 (en) * | 2001-02-20 | 2003-02-06 | I & A Research Inc. | System for modeling and simulating emotion states |
US20020147581A1 (en) * | 2001-04-10 | 2002-10-10 | SRI International | Method and apparatus for performing prosody-based endpointing of a speech signal |
US6985852B2 (en) * | 2001-08-21 | 2006-01-10 | Microsoft Corporation | Method and apparatus for dynamic grammars and focused semantic parsing |
US20030069728A1 (en) * | 2001-10-05 | 2003-04-10 | Raquel Tato | Method for detecting emotions involving subspace specialists |
US6909453B2 (en) * | 2001-12-20 | 2005-06-21 | Matsushita Electric Industrial Co., Ltd. | Virtual television phone apparatus |
US20030179877A1 (en) * | 2002-03-21 | 2003-09-25 | Anthony Dezonno | Adaptive transaction guidance |
US7398209B2 (en) * | 2002-06-03 | 2008-07-08 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance |
US7321854B2 (en) * | 2002-09-19 | 2008-01-22 | The Penn State Research Foundation | Prosody based audio/visual co-analysis for co-verbal gesture recognition |
US20040148172A1 (en) * | 2003-01-24 | 2004-07-29 | Voice Signal Technologies, Inc. | Prosodic mimic method and apparatus |
US20050060158A1 (en) * | 2003-09-12 | 2005-03-17 | Norikazu Endo | Method and system for adjusting the voice prompt of an interactive system based upon the user's state |
US20050102135A1 (en) * | 2003-11-12 | 2005-05-12 | Silke Goronzy | Apparatus and method for automatic extraction of important events in audio signals |
Cited By (628)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8527861B2 (en) | 1999-08-13 | 2013-09-03 | Apple Inc. | Methods and apparatuses for display and traversing of links in page character array |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US8345665B2 (en) | 2001-10-22 | 2013-01-01 | Apple Inc. | Text to speech conversion of text messages from mobile communication devices |
US8718047B2 (en) | 2001-10-22 | 2014-05-06 | Apple Inc. | Text to speech conversion of text messages from mobile communication devices |
US10623347B2 (en) | 2003-05-02 | 2020-04-14 | Apple Inc. | Method and apparatus for displaying information during an instant messaging session |
US10348654B2 (en) | 2003-05-02 | 2019-07-09 | Apple Inc. | Method and apparatus for displaying information during an instant messaging session |
US8458278B2 (en) | 2003-05-02 | 2013-06-04 | Apple Inc. | Method and apparatus for displaying information during an instant messaging session |
US7475007B2 (en) * | 2004-02-20 | 2009-01-06 | International Business Machines Corporation | Expression extraction device, expression extraction method, and recording medium |
US20050187932A1 (en) * | 2004-02-20 | 2005-08-25 | International Business Machines Corporation | Expression extraction device, expression extraction method, and recording medium |
US7149987B2 (en) * | 2004-03-08 | 2006-12-12 | Synopsys, Inc. | Method and apparatus for performing generator-based verification |
US20050198597A1 (en) * | 2004-03-08 | 2005-09-08 | Yunshan Zhu | Method and apparatus for performing generator-based verification |
US20050261905A1 (en) * | 2004-05-21 | 2005-11-24 | Samsung Electronics Co., Ltd. | Method and apparatus for generating dialog prosody structure, and speech synthesis method and system employing the same |
US8234118B2 (en) * | 2004-05-21 | 2012-07-31 | Samsung Electronics Co., Ltd. | Method and apparatus for generating dialog prosody structure, and speech synthesis method and system employing the same |
US20060129403A1 (en) * | 2004-12-13 | 2006-06-15 | Delta Electronics, Inc. | Method and device for speech synthesizing and dialogue system thereof |
US7529665B2 (en) * | 2004-12-21 | 2009-05-05 | Electronics And Telecommunications Research Institute | Two stage utterance verification device and method thereof in speech recognition system |
US20060136207A1 (en) * | 2004-12-21 | 2006-06-22 | Electronics And Telecommunications Research Institute | Two stage utterance verification device and method thereof in speech recognition system |
US20070206017A1 (en) * | 2005-06-02 | 2007-09-06 | University Of Southern California | Mapping Attitudes to Movements Based on Cultural Norms |
US7778948B2 (en) | 2005-06-02 | 2010-08-17 | University Of Southern California | Mapping each of several communicative functions during contexts to multiple coordinated behaviors of a virtual character |
US7912720B1 (en) * | 2005-07-20 | 2011-03-22 | AT&T Intellectual Property II, L.P. | System and method for building emotional machines |
US20070033050A1 (en) * | 2005-08-05 | 2007-02-08 | Yasuharu Asano | Information processing apparatus and method, and program |
US8407055B2 (en) * | 2005-08-05 | 2013-03-26 | Sony Corporation | Information processing apparatus and method for recognizing a user's emotion |
US20070055526A1 (en) * | 2005-08-25 | 2007-03-08 | International Business Machines Corporation | Method, apparatus and computer program product providing prosodic-categorical enhancement to phrase-spliced text-to-speech synthesis |
US9501741B2 (en) | 2005-09-08 | 2016-11-22 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US20070061139A1 (en) * | 2005-09-14 | 2007-03-15 | Delta Electronics, Inc. | Interactive speech correcting method |
US9619079B2 (en) | 2005-09-30 | 2017-04-11 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US9389729B2 (en) | 2005-09-30 | 2016-07-12 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US8614431B2 (en) | 2005-09-30 | 2013-12-24 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US9958987B2 (en) | 2005-09-30 | 2018-05-01 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US20080052080A1 (en) * | 2005-11-30 | 2008-02-28 | University Of Southern California | Emotion Recognition System |
US8209182B2 (en) * | 2005-11-30 | 2012-06-26 | University Of Southern California | Emotion recognition system |
US20070150281A1 (en) * | 2005-12-22 | 2007-06-28 | Hoff Todd M | Method and system for utilizing emotion to search content |
US8452668B1 (en) | 2006-03-02 | 2013-05-28 | Convergys Customer Management Delaware Llc | System for closed loop decisionmaking in an automated care system |
US8639517B2 (en) | 2006-03-03 | 2014-01-28 | AT&T Intellectual Property II, L.P. | Relevance recognition for a human machine dialog system contextual question answering based on a normalization of the length of the user input |
US8204751B1 (en) * | 2006-03-03 | 2012-06-19 | AT&T Intellectual Property II, L.P. | Relevance recognition for a human machine dialog system contextual question answering based on a normalization of the length of the user input |
US7983910B2 (en) * | 2006-03-03 | 2011-07-19 | International Business Machines Corporation | Communicating across voice and text channels with emotion preservation |
US20070208569A1 (en) * | 2006-03-03 | 2007-09-06 | Balan Subramanian | Communicating across voice and text channels with emotion preservation |
US20070225975A1 (en) * | 2006-03-27 | 2007-09-27 | Kabushiki Kaisha Toshiba | Apparatus, method, and computer program product for processing voice in speech |
US7949523B2 (en) * | 2006-03-27 | 2011-05-24 | Kabushiki Kaisha Toshiba | Apparatus, method, and computer program product for processing voice in speech |
US9583107B2 (en) | 2006-04-05 | 2017-02-28 | Amazon Technologies, Inc. | Continuous speech transcription performance indication |
US9576571B2 (en) | 2006-05-18 | 2017-02-21 | Nuance Communications, Inc. | Method and apparatus for recognizing and reacting to user personality in accordance with speech recognition system |
US8719035B2 (en) * | 2006-05-18 | 2014-05-06 | Nuance Communications, Inc. | Method and apparatus for recognizing and reacting to user personality in accordance with speech recognition system |
US20070271098A1 (en) * | 2006-05-18 | 2007-11-22 | International Business Machines Corporation | Method and apparatus for recognizing and reacting to user personality in accordance with speech recognition system |
US8150692B2 (en) | 2006-05-18 | 2012-04-03 | Nuance Communications, Inc. | Method and apparatus for recognizing a user personality trait based on a number of compound words used by the user |
US20080177540A1 (en) * | 2006-05-18 | 2008-07-24 | International Business Machines Corporation | Method and Apparatus for Recognizing and Reacting to User Personality in Accordance with Speech Recognition System |
US8379830B1 (en) | 2006-05-22 | 2013-02-19 | Convergys Customer Management Delaware Llc | System and method for automated customer service with contingent live interaction |
US9549065B1 (en) | 2006-05-22 | 2017-01-17 | Convergys Customer Management Delaware Llc | System and method for automated customer service with contingent live interaction |
US7809663B1 (en) | 2006-05-22 | 2010-10-05 | Convergys Cmg Utah, Inc. | System and method for supporting the utilization of machine language |
US20070276659A1 (en) * | 2006-05-25 | 2007-11-29 | Keiichi Yamada | Apparatus and method for identifying prosody and apparatus and method for recognizing speech |
US7908142B2 (en) * | 2006-05-25 | 2011-03-15 | Sony Corporation | Apparatus and method for identifying prosody and apparatus and method for recognizing speech |
US20090313019A1 (en) * | 2006-06-23 | 2009-12-17 | Yumiko Kato | Emotion recognition apparatus |
US8204747B2 (en) * | 2006-06-23 | 2012-06-19 | Panasonic Corporation | Emotion recognition apparatus |
US20080033994A1 (en) * | 2006-08-07 | 2008-02-07 | MCI, LLC | Interactive voice controlled project management system |
US8296147B2 (en) * | 2006-08-07 | 2012-10-23 | Verizon Patent And Licensing Inc. | Interactive voice controlled project management system |
US7783114B2 (en) * | 2006-08-22 | 2010-08-24 | Intel Corporation | Training and using classification components on multiple processing units |
US20080050014A1 (en) * | 2006-08-22 | 2008-02-28 | Gary Bradski | Training and using classification components on multiple processing units |
US8942986B2 (en) | 2006-09-08 | 2015-01-27 | Apple Inc. | Determining user intent based on ontologies of domains |
US9117447B2 (en) | 2006-09-08 | 2015-08-25 | Apple Inc. | Using event alert text as input to an automated assistant |
US8930191B2 (en) | 2006-09-08 | 2015-01-06 | Apple Inc. | Paraphrasing of user requests and results by automated digital assistant |
WO2008033095A1 (en) * | 2006-09-15 | 2008-03-20 | Agency For Science, Technology And Research | Apparatus and method for speech utterance verification |
US20100004931A1 (en) * | 2006-09-15 | 2010-01-07 | Bin Ma | Apparatus and method for speech utterance verification |
US20100100218A1 (en) * | 2006-10-09 | 2010-04-22 | Siemens Aktiengesellschaft | Method for Controlling and/or Regulating an Industrial Process |
US8391998B2 (en) * | 2006-10-09 | 2013-03-05 | Siemens Aktiengesellschaft | Method for controlling and/or regulating an industrial process |
US8793301B2 (en) * | 2006-11-22 | 2014-07-29 | Agfa Healthcare | Method and system for dynamic image processing |
US20090138544A1 (en) * | 2006-11-22 | 2009-05-28 | Rainer Wegenkittl | Method and System for Dynamic Image Processing |
US8538755B2 (en) * | 2007-01-31 | 2013-09-17 | Telecom Italia S.P.A. | Customizable method and system for emotional recognition |
US20100030714A1 (en) * | 2007-01-31 | 2010-02-04 | Gianmario Bollano | Method and system to improve automated emotional recognition |
WO2008092473A1 (en) * | 2007-01-31 | 2008-08-07 | Telecom Italia S.P.A. | Customizable method and system for emotional recognition |
US20100088088A1 (en) * | 2007-01-31 | 2010-04-08 | Gianmario Bollano | Customizable method and system for emotional recognition |
US10133733B2 (en) | 2007-03-06 | 2018-11-20 | Botanic Technologies, Inc. | Systems and methods for an autonomous avatar driver |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8583569B2 (en) * | 2007-04-19 | 2013-11-12 | Microsoft Corporation | Field-programmable gate array based accelerator system |
US20080270133A1 (en) * | 2007-04-24 | 2008-10-30 | Microsoft Corporation | Speech model refinement with transcription error detection |
US7860716B2 (en) * | 2007-04-24 | 2010-12-28 | Microsoft Corporation | Speech model refinement with transcription error detection |
US20090055824A1 (en) * | 2007-04-26 | 2009-02-26 | Ford Global Technologies, Llc | Task initiator and method for initiating tasks for a vehicle information system |
US20090063154A1 (en) * | 2007-04-26 | 2009-03-05 | Ford Global Technologies, Llc | Emotive text-to-speech system and method |
US20090064155A1 (en) * | 2007-04-26 | 2009-03-05 | Ford Global Technologies, Llc | Task manager and method for managing tasks of an information system |
US9811935B2 (en) * | 2007-04-26 | 2017-11-07 | Ford Global Technologies, Llc | Emotive advisory system and method |
US8812171B2 (en) | 2007-04-26 | 2014-08-19 | Ford Global Technologies, Llc | Emotive engine and method for generating a simulated emotion for an information system |
US9495787B2 (en) | 2007-04-26 | 2016-11-15 | Ford Global Technologies, Llc | Emotive text-to-speech system and method |
US20090055190A1 (en) * | 2007-04-26 | 2009-02-26 | Ford Global Technologies, Llc | Emotive engine and method for generating a simulated emotion for an information system |
US9292952B2 (en) | 2007-04-26 | 2016-03-22 | Ford Global Technologies, Llc | Task manager and method for managing tasks of an information system |
US9189879B2 (en) | 2007-04-26 | 2015-11-17 | Ford Global Technologies, Llc | Emotive engine and method for generating a simulated emotion for an information system |
US20080269958A1 (en) * | 2007-04-26 | 2008-10-30 | Ford Global Technologies, Llc | Emotive advisory system and method |
US20090004633A1 (en) * | 2007-06-29 | 2009-01-01 | Alelo, Inc. | Interactive language pronunciation teaching |
WO2009006433A1 (en) * | 2007-06-29 | 2009-01-08 | Alelo, Inc. | Interactive language pronunciation teaching |
US8005197B2 (en) | 2007-06-29 | 2011-08-23 | Avaya Inc. | Methods and apparatus for defending against telephone-based robotic attacks using contextual-based degradation |
US7978831B2 (en) | 2007-06-29 | 2011-07-12 | Avaya Inc. | Methods and apparatus for defending against telephone-based robotic attacks using random personal codes |
US20090003549A1 (en) * | 2007-06-29 | 2009-01-01 | Henry Baird | Methods and Apparatus for Defending Against Telephone-Based Robotic Attacks Using Permutation of an IVR Menu |
US20090003539A1 (en) * | 2007-06-29 | 2009-01-01 | Henry Baird | Methods and Apparatus for Defending Against Telephone-Based Robotic Attacks Using Random Personal Codes |
US8005198B2 (en) * | 2007-06-29 | 2011-08-23 | Avaya Inc. | Methods and apparatus for defending against telephone-based robotic attacks using permutation of an IVR menu |
US20090003548A1 (en) * | 2007-06-29 | 2009-01-01 | Henry Baird | Methods and Apparatus for Defending Against Telephone-Based Robotic Attacks Using Contextual-Based Degradation |
US9924906B2 (en) | 2007-07-12 | 2018-03-27 | University Of Florida Research Foundation, Inc. | Random body movement cancellation for non-contact vital sign detection |
US8543396B2 (en) | 2007-08-22 | 2013-09-24 | Canyon Ip Holdings Llc | Continuous speech transcription performance indication |
US20090055175A1 (en) * | 2007-08-22 | 2009-02-26 | Terrell II James Richard | Continuous speech transcription performance indication |
US8510109B2 (en) | 2007-08-22 | 2013-08-13 | Canyon Ip Holdings Llc | Continuous speech transcription performance indication |
US8868420B1 (en) | 2007-08-22 | 2014-10-21 | Canyon Ip Holdings Llc | Continuous speech transcription performance indication |
US9973450B2 (en) | 2007-09-17 | 2018-05-15 | Amazon Technologies, Inc. | Methods and systems for dynamically updating web service profile information by parsing transcribed message strings |
US9053089B2 (en) | 2007-10-02 | 2015-06-09 | Apple Inc. | Part-of-speech tagging using latent analogy |
US11599332B1 (en) | 2007-10-04 | 2023-03-07 | Great Northern Research, LLC | Multiple shell multi faceted graphical user interface |
US8543407B1 (en) | 2007-10-04 | 2013-09-24 | Great Northern Research, LLC | Speech interface system and method for control and interaction with applications on a computing system |
US9582805B2 (en) | 2007-10-24 | 2017-02-28 | Invention Science Fund I, Llc | Returning a personalized advertisement |
US20090113298A1 (en) * | 2007-10-24 | 2009-04-30 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Method of selecting a second content based on a user's reaction to a first content |
US9513699B2 (en) | 2007-10-24 | 2016-12-06 | Invention Science Fund I, Llc | Method of selecting a second content based on a user's reaction to a first content |
US20090112696A1 (en) * | 2007-10-24 | 2009-04-30 | Jung Edward K Y | Method of space-available advertising in a mobile device |
US20090112656A1 (en) * | 2007-10-24 | 2009-04-30 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Returning a personalized advertisement |
US20090113297A1 (en) * | 2007-10-24 | 2009-04-30 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Requesting a second content based on a user's reaction to a first content |
US20090112693A1 (en) * | 2007-10-24 | 2009-04-30 | Jung Edward K Y | Providing personalized advertising |
US20090112713A1 (en) * | 2007-10-24 | 2009-04-30 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Opportunity advertising in a mobile device |
US20090112694A1 (en) * | 2007-10-24 | 2009-04-30 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Targeted-advertising based on a sensed physiological response by a person to a general advertisement |
US8364694B2 (en) | 2007-10-26 | 2013-01-29 | Apple Inc. | Search assistant for digital media assets |
US9305101B2 (en) | 2007-10-26 | 2016-04-05 | Apple Inc. | Search assistant for digital media assets |
US8943089B2 (en) | 2007-10-26 | 2015-01-27 | Apple Inc. | Search assistant for digital media assets |
US8639716B2 (en) | 2007-10-26 | 2014-01-28 | Apple Inc. | Search assistant for digital media assets |
US8620662B2 (en) | 2007-11-20 | 2013-12-31 | Apple Inc. | Context-aware unit selection |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US20090177300A1 (en) * | 2008-01-03 | 2009-07-09 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9330720B2 (en) * | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9330381B2 (en) | 2008-01-06 | 2016-05-03 | Apple Inc. | Portable multifunction device, method, and graphical user interface for viewing and managing electronic calendars |
US10503366B2 (en) | 2008-01-06 | 2019-12-10 | Apple Inc. | Portable multifunction device, method, and graphical user interface for viewing and managing electronic calendars |
US11126326B2 (en) | 2008-01-06 | 2021-09-21 | Apple Inc. | Portable multifunction device, method, and graphical user interface for viewing and managing electronic calendars |
US8407173B2 (en) * | 2008-01-30 | 2013-03-26 | Aptima, Inc. | System and method for comparing system features |
US20090192964A1 (en) * | 2008-01-30 | 2009-07-30 | Aptima, Inc. | System and method for comparing system features |
US9361886B2 (en) | 2008-02-22 | 2016-06-07 | Apple Inc. | Providing text input using speech data and non-speech data |
US8688446B2 (en) | 2008-02-22 | 2014-04-01 | Apple Inc. | Providing text input using speech data and non-speech data |
US20090222305A1 (en) * | 2008-03-03 | 2009-09-03 | Berg Jr Charles John | Shopper Communication with Scaled Emotional State |
USRE46139E1 (en) | 2008-03-04 | 2016-09-06 | Apple Inc. | Language input interface on a device |
US8289283B2 (en) | 2008-03-04 | 2012-10-16 | Apple Inc. | Language input interface on a device |
US9491256B2 (en) * | 2008-03-05 | 2016-11-08 | Sony Corporation | Method and device for personalizing a multimedia application |
US20090228796A1 (en) * | 2008-03-05 | 2009-09-10 | Sony Corporation | Method and device for personalizing a multimedia application |
US20170031647A1 (en) * | 2008-03-05 | 2017-02-02 | Sony Corporation | Method and device for personalizing a multimedia application |
US8099372B2 (en) * | 2008-03-25 | 2012-01-17 | Electronics And Telecommunications Research Institute | Method of modeling composite emotion in multidimensional vector space |
US20090248372A1 (en) * | 2008-03-25 | 2009-10-01 | Electronics And Telecommunications Research Institute | Method of modeling composite emotion in multidimensional vector space |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US8768925B2 (en) | 2008-05-14 | 2014-07-01 | International Business Machines Corporation | System and method for providing answers to questions |
US20090287678A1 (en) * | 2008-05-14 | 2009-11-19 | International Business Machines Corporation | System and method for providing answers to questions |
US8275803B2 (en) | 2008-05-14 | 2012-09-25 | International Business Machines Corporation | System and method for providing answers to questions |
US9703861B2 (en) | 2008-05-14 | 2017-07-11 | International Business Machines Corporation | System and method for providing answers to questions |
US8332394B2 (en) | 2008-05-23 | 2012-12-11 | International Business Machines Corporation | System and method for providing question and answers with deferred type evaluation |
US9946706B2 (en) | 2008-06-07 | 2018-04-17 | Apple Inc. | Automatic language identification for dynamic text processing |
US8219397B2 (en) * | 2008-06-10 | 2012-07-10 | Nuance Communications, Inc. | Data processing system for autonomously building speech identification and tagging data |
US20090306979A1 (en) * | 2008-06-10 | 2009-12-10 | Peeyush Jaiswal | Data processing system for autonomously building speech identification and tagging data |
US9294814B2 (en) | 2008-06-12 | 2016-03-22 | International Business Machines Corporation | Simulation method and system |
US8493410B2 (en) | 2008-06-12 | 2013-07-23 | International Business Machines Corporation | Simulation method and system |
US9524734B2 (en) | 2008-06-12 | 2016-12-20 | International Business Machines Corporation | Simulation |
US20120239393A1 (en) * | 2008-06-13 | 2012-09-20 | International Business Machines Corporation | Multiple audio/video data stream simulation |
US8644550B2 (en) | 2008-06-13 | 2014-02-04 | International Business Machines Corporation | Multiple audio/video data stream simulation |
US8392195B2 (en) * | 2008-06-13 | 2013-03-05 | International Business Machines Corporation | Multiple audio/video data stream simulation |
US8682666B2 (en) | 2008-06-17 | 2014-03-25 | Voicesense Ltd. | Speaker characterization through speech analysis |
US20090313018A1 (en) * | 2008-06-17 | 2009-12-17 | Yoav Degani | Speaker Characterization Through Speech Analysis |
US8195460B2 (en) * | 2008-06-17 | 2012-06-05 | Voicesense Ltd. | Speaker characterization through speech analysis |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US20130085760A1 (en) * | 2008-08-12 | 2013-04-04 | Morphism Llc | Training and applying prosody models |
US8856008B2 (en) * | 2008-08-12 | 2014-10-07 | Morphism Llc | Training and applying prosody models |
US20150012277A1 (en) * | 2008-08-12 | 2015-01-08 | Morphism Llc | Training and Applying Prosody Models |
US8554566B2 (en) * | 2008-08-12 | 2013-10-08 | Morphism Llc | Training and applying prosody models |
US9070365B2 (en) * | 2008-08-12 | 2015-06-30 | Morphism Llc | Training and applying prosody models |
US8301454B2 (en) * | 2008-08-22 | 2012-10-30 | Canyon Ip Holdings Llc | Methods, apparatuses, and systems for providing timely user cues pertaining to speech recognition |
US20100049525A1 (en) * | 2008-08-22 | 2010-02-25 | Yap, Inc. | Methods, apparatuses, and systems for providing timely user cues pertaining to speech recognition |
US9691383B2 (en) | 2008-09-05 | 2017-06-27 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8768702B2 (en) | 2008-09-05 | 2014-07-01 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8898568B2 (en) | 2008-09-09 | 2014-11-25 | Apple Inc. | Audio user interface |
US8712776B2 (en) | 2008-09-29 | 2014-04-29 | Apple Inc. | Systems and methods for selective text to speech synthesis |
US8352272B2 (en) | 2008-09-29 | 2013-01-08 | Apple Inc. | Systems and methods for text to speech synthesis |
US8352268B2 (en) | 2008-09-29 | 2013-01-08 | Apple Inc. | Systems and methods for selective rate of speech and speech preferences for text to speech synthesis |
US8583418B2 (en) | 2008-09-29 | 2013-11-12 | Apple Inc. | Systems and methods of detecting language and natural language strings for text to speech synthesis |
US8396714B2 (en) | 2008-09-29 | 2013-03-12 | Apple Inc. | Systems and methods for concatenation of words in text to speech synthesis |
US8355919B2 (en) | 2008-09-29 | 2013-01-15 | Apple Inc. | Systems and methods for text normalization for text to speech synthesis |
US20100082329A1 (en) * | 2008-09-29 | 2010-04-01 | Apple Inc. | Systems and methods of detecting language and natural language strings for text to speech synthesis |
US8713119B2 (en) | 2008-10-02 | 2014-04-29 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9412392B2 (en) | 2008-10-02 | 2016-08-09 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8762469B2 (en) | 2008-10-02 | 2014-06-24 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8296383B2 (en) | 2008-10-02 | 2012-10-23 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US20100114556A1 (en) * | 2008-10-31 | 2010-05-06 | International Business Machines Corporation | Speech translation method and apparatus |
US9342509B2 (en) * | 2008-10-31 | 2016-05-17 | Nuance Communications, Inc. | Speech translation method and apparatus utilizing prosodic information |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US8862252B2 (en) | 2009-01-30 | 2014-10-14 | Apple Inc. | Audio user interface for displayless electronic device |
US8380507B2 (en) | 2009-03-09 | 2013-02-19 | Apple Inc. | Systems and methods for determining the language to use for speech generated by a text to speech engine |
US8751238B2 (en) | 2009-03-09 | 2014-06-10 | Apple Inc. | Systems and methods for determining the language to use for speech generated by a text to speech engine |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US10475446B2 (en) | 2009-06-05 | 2019-11-12 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US20130103398A1 (en) * | 2009-08-04 | 2013-04-25 | Nokia Corporation | Method and Apparatus for Audio Signal Classification |
US9215538B2 (en) * | 2009-08-04 | 2015-12-15 | Nokia Technologies Oy | Method and apparatus for audio signal classification |
US20120173464A1 (en) * | 2009-09-02 | 2012-07-05 | Gokhan Tur | Method and apparatus for exploiting human feedback in an intelligent automated assistant |
US10366336B2 (en) * | 2009-09-02 | 2019-07-30 | Sri International | Method and apparatus for exploiting human feedback in an intelligent automated assistant |
US8682649B2 (en) | 2009-11-12 | 2014-03-25 | Apple Inc. | Sentiment prediction from textual data |
US20110110534A1 (en) * | 2009-11-12 | 2011-05-12 | Apple Inc. | Adjustable voice output based on device status |
US20110125734A1 (en) * | 2009-11-23 | 2011-05-26 | International Business Machines Corporation | Questions and answers generation |
US8600743B2 (en) | 2010-01-06 | 2013-12-03 | Apple Inc. | Noise profile determination for voice-related feature |
US8775186B2 (en) * | 2010-01-08 | 2014-07-08 | Electronics And Telecommunications Research Institute | Method for emotion communication between emotion signal sensing device and emotion service providing device |
US20110172992A1 (en) * | 2010-01-08 | 2011-07-14 | Electronics And Telecommunications Research Institute | Method for emotion communication between emotion signal sensing device and emotion service providing device |
US8670985B2 (en) | 2010-01-13 | 2014-03-11 | Apple Inc. | Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts |
US9311043B2 (en) | 2010-01-13 | 2016-04-12 | Apple Inc. | Adaptive audio feedback system and method |
US8311838B2 (en) | 2010-01-13 | 2012-11-13 | Apple Inc. | Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts |
US8706503B2 (en) | 2010-01-18 | 2014-04-22 | Apple Inc. | Intent deduction based on previous user interactions with voice assistant |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US8670979B2 (en) | 2010-01-18 | 2014-03-11 | Apple Inc. | Active input elicitation by intelligent automated assistant |
US8799000B2 (en) | 2010-01-18 | 2014-08-05 | Apple Inc. | Disambiguation based on active input elicitation by intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US8903716B2 (en) | 2010-01-18 | 2014-12-02 | Apple Inc. | Personalized vocabulary for digital assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US8731942B2 (en) | 2010-01-18 | 2014-05-20 | Apple Inc. | Maintaining context information between user interactions with a voice assistant |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US8660849B2 (en) | 2010-01-18 | 2014-02-25 | Apple Inc. | Prioritizing selection criteria by automated assistant |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US8412530B2 (en) * | 2010-02-21 | 2013-04-02 | Nice Systems Ltd. | Method and apparatus for detection of sentiment in automated transcriptions |
US20110208522A1 (en) * | 2010-02-21 | 2011-08-25 | Nice Systems Ltd. | Method and apparatus for detection of sentiment in automated transcriptions |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US9190062B2 (en) | 2010-02-25 | 2015-11-17 | Apple Inc. | User profiling for voice input processing |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US20110282666A1 (en) * | 2010-04-22 | 2011-11-17 | Fujitsu Limited | Utterance state detection device and utterance state detection method |
US9099088B2 (en) * | 2010-04-22 | 2015-08-04 | Fujitsu Limited | Utterance state detection device and utterance state detection method |
US11367435B2 (en) | 2010-05-13 | 2022-06-21 | Poltorak Technologies Llc | Electronic personal interactive device |
US11341962B2 (en) | 2010-05-13 | 2022-05-24 | Poltorak Technologies Llc | Electronic personal interactive device |
US8639516B2 (en) | 2010-06-04 | 2014-01-28 | Apple Inc. | User-specific noise suppression for voice quality improvements |
US10446167B2 (en) | 2010-06-04 | 2019-10-15 | Apple Inc. | User-specific noise suppression for voice quality improvements |
US20110307423A1 (en) * | 2010-06-09 | 2011-12-15 | Microsoft Corporation | Distributed decision tree training |
US8543517B2 (en) * | 2010-06-09 | 2013-09-24 | Microsoft Corporation | Distributed decision tree training |
US8713021B2 (en) | 2010-07-07 | 2014-04-29 | Apple Inc. | Unsupervised document clustering using latent semantic density analysis |
US9104670B2 (en) | 2010-07-21 | 2015-08-11 | Apple Inc. | Customized search or acquisition of digital media assets |
US8719006B2 (en) | 2010-08-27 | 2014-05-06 | Apple Inc. | Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis |
US10002608B2 (en) * | 2010-09-17 | 2018-06-19 | Nuance Communications, Inc. | System and method for using prosody for voice-enabled search |
US20120072217A1 (en) * | 2010-09-17 | 2012-03-22 | At&T Intellectual Property I, L.P. | System and method for using prosody for voice-enabled search |
US8892550B2 (en) | 2010-09-24 | 2014-11-18 | International Business Machines Corporation | Source expansion for information retrieval and information extraction |
US9798800B2 (en) | 2010-09-24 | 2017-10-24 | International Business Machines Corporation | Providing question and answers with deferred type evaluation using text with limited structure |
US9495481B2 (en) | 2010-09-24 | 2016-11-15 | International Business Machines Corporation | Providing answers to questions including assembling answers from multiple document segments |
US9830381B2 (en) | 2010-09-24 | 2017-11-28 | International Business Machines Corporation | Scoring candidates using structural information in semi-structured documents for question answering systems |
US8600986B2 (en) | 2010-09-24 | 2013-12-03 | International Business Machines Corporation | Lexical answer type confidence estimation and application |
US11144544B2 (en) | 2010-09-24 | 2021-10-12 | International Business Machines Corporation | Providing answers to questions including assembling answers from multiple document segments |
US9864818B2 (en) | 2010-09-24 | 2018-01-09 | International Business Machines Corporation | Providing answers to questions including assembling answers from multiple document segments |
US9508038B2 (en) | 2010-09-24 | 2016-11-29 | International Business Machines Corporation | Using ontological information in open domain type coercion |
US10482115B2 (en) | 2010-09-24 | 2019-11-19 | International Business Machines Corporation | Providing question and answers with deferred type evaluation using text with limited structure |
US10223441B2 (en) | 2010-09-24 | 2019-03-05 | International Business Machines Corporation | Scoring candidates using structural information in semi-structured documents for question answering systems |
US9569724B2 (en) | 2010-09-24 | 2017-02-14 | International Business Machines Corporation | Using ontological information in open domain type coercion |
US10318529B2 (en) | 2010-09-24 | 2019-06-11 | International Business Machines Corporation | Providing answers to questions including assembling answers from multiple document segments |
US9965509B2 (en) | 2010-09-24 | 2018-05-08 | International Business Machines Corporation | Providing answers to questions including assembling answers from multiple document segments |
US9600601B2 (en) | 2010-09-24 | 2017-03-21 | International Business Machines Corporation | Providing answers to questions including assembling answers from multiple document segments |
US8510296B2 (en) | 2010-09-24 | 2013-08-13 | International Business Machines Corporation | Lexical answer type confidence estimation and application |
US8943051B2 (en) | 2010-09-24 | 2015-01-27 | International Business Machines Corporation | Lexical answer type confidence estimation and application |
US10331663B2 (en) | 2010-09-24 | 2019-06-25 | International Business Machines Corporation | Providing answers to questions including assembling answers from multiple document segments |
US9075783B2 (en) | 2010-09-27 | 2015-07-07 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US8719014B2 (en) | 2010-09-27 | 2014-05-06 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US9110944B2 (en) | 2010-09-28 | 2015-08-18 | International Business Machines Corporation | Providing answers to questions using multiple models to score candidate answers |
US9323831B2 (en) | 2010-09-28 | 2016-04-26 | International Business Machines Corporation | Providing answers to questions using hypothesis pruning |
US9348893B2 (en) | 2010-09-28 | 2016-05-24 | International Business Machines Corporation | Providing answers to questions using logical synthesis of candidate answers |
US11409751B2 (en) | 2010-09-28 | 2022-08-09 | International Business Machines Corporation | Providing answers to questions using hypothesis pruning |
US9037580B2 (en) | 2010-09-28 | 2015-05-19 | International Business Machines Corporation | Providing answers to questions using logical synthesis of candidate answers |
US8738617B2 (en) | 2010-09-28 | 2014-05-27 | International Business Machines Corporation | Providing answers to questions using multiple models to score candidate answers |
US8898159B2 (en) | 2010-09-28 | 2014-11-25 | International Business Machines Corporation | Providing answers to questions using logical synthesis of candidate answers |
US10823265B2 (en) | 2010-09-28 | 2020-11-03 | International Business Machines Corporation | Providing answers to questions using multiple models to score candidate answers |
US9990419B2 (en) | 2010-09-28 | 2018-06-05 | International Business Machines Corporation | Providing answers to questions using multiple models to score candidate answers |
US10133808B2 (en) | 2010-09-28 | 2018-11-20 | International Business Machines Corporation | Providing answers to questions using logical synthesis of candidate answers |
US9507854B2 (en) | 2010-09-28 | 2016-11-29 | International Business Machines Corporation | Providing answers to questions using multiple models to score candidate answers |
US10216804B2 (en) | 2010-09-28 | 2019-02-26 | International Business Machines Corporation | Providing answers to questions using hypothesis pruning |
US8819007B2 (en) | 2010-09-28 | 2014-08-26 | International Business Machines Corporation | Providing answers to questions using multiple models to score candidate answers |
US10902038B2 (en) | 2010-09-28 | 2021-01-26 | International Business Machines Corporation | Providing answers to questions using logical synthesis of candidate answers |
US9852213B2 (en) | 2010-09-28 | 2017-12-26 | International Business Machines Corporation | Providing answers to questions using logical synthesis of candidate answers |
US9317586B2 (en) | 2010-09-28 | 2016-04-19 | International Business Machines Corporation | Providing answers to questions using hypothesis pruning |
WO2012058691A1 (en) * | 2010-10-31 | 2012-05-03 | Speech Morphing Systems, Inc. | Speech morphing communication system |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US8706493B2 (en) * | 2010-12-22 | 2014-04-22 | Industrial Technology Research Institute | Controllable prosody re-estimation system and method and computer program product thereof |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US20120166198A1 (en) * | 2010-12-22 | 2012-06-28 | Industrial Technology Research Institute | Controllable prosody re-estimation system and method and computer program product thereof |
US20140025385A1 (en) * | 2010-12-30 | 2014-01-23 | Nokia Corporation | Method, Apparatus and Computer Program Product for Emotion Detection |
US10147419B2 (en) | 2011-01-05 | 2018-12-04 | Interactions Llc | Automated recognition system for natural language understanding |
US9472185B1 (en) * | 2011-01-05 | 2016-10-18 | Interactions Llc | Automated recognition system for natural language understanding |
US10810997B2 (en) | 2011-01-05 | 2020-10-20 | Interactions Llc | Automated recognition system for natural language understanding |
US9741347B2 (en) | 2011-01-05 | 2017-08-22 | Interactions Llc | Automated speech recognition proxy system for natural language understanding |
US10049676B2 (en) | 2011-01-05 | 2018-08-14 | Interactions Llc | Automated speech recognition proxy system for natural language understanding |
US20130311185A1 (en) * | 2011-02-15 | 2013-11-21 | Nokia Corporation | Method apparatus and computer program product for prosodic tagging |
US8781836B2 (en) | 2011-02-22 | 2014-07-15 | Apple Inc. | Hearing assistance system for providing consistent human speech |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
US20120243694A1 (en) * | 2011-03-21 | 2012-09-27 | The Intellisis Corporation | Systems and methods for segmenting and/or classifying an audio signal from transformed audio information |
US8849663B2 (en) * | 2011-03-21 | 2014-09-30 | The Intellisis Corporation | Systems and methods for segmenting and/or classifying an audio signal from transformed audio information |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US9601119B2 (en) | 2011-03-21 | 2017-03-21 | Knuedge Incorporated | Systems and methods for segmenting and/or classifying an audio signal from transformed audio information |
US9142220B2 (en) | 2011-03-25 | 2015-09-22 | The Intellisis Corporation | Systems and methods for reconstructing an audio signal from transformed audio information |
US9177561B2 (en) | 2011-03-25 | 2015-11-03 | The Intellisis Corporation | Systems and methods for reconstructing an audio signal from transformed audio information |
US9620130B2 (en) | 2011-03-25 | 2017-04-11 | Knuedge Incorporated | System and method for processing sound signals implementing a spectral motion transform |
US8767978B2 (en) | 2011-03-25 | 2014-07-01 | The Intellisis Corporation | System and method for processing sound signals implementing a spectral motion transform |
US9177560B2 (en) | 2011-03-25 | 2015-11-03 | The Intellisis Corporation | Systems and methods for reconstructing an audio signal from transformed audio information |
US9002704B2 (en) * | 2011-03-31 | 2015-04-07 | Fujitsu Limited | Speaker state detecting apparatus and speaker state detecting method |
US20120253807A1 (en) * | 2011-03-31 | 2012-10-04 | Fujitsu Limited | Speaker state detecting apparatus and speaker state detecting method |
WO2012151786A1 (en) * | 2011-05-11 | 2012-11-15 | 北京航空航天大学 | Chinese voice emotion extraction and modeling method combining emotion points |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US8812294B2 (en) | 2011-06-21 | 2014-08-19 | Apple Inc. | Translating phrases from one language into another using an order-based set of declarative rules |
US9195641B1 (en) * | 2011-07-01 | 2015-11-24 | West Corporation | Method and apparatus of processing user text input information |
US9449275B2 (en) | 2011-07-12 | 2016-09-20 | Siemens Aktiengesellschaft | Actuation of a technical system based on solutions of relaxed abduction |
US20130030812A1 (en) * | 2011-07-29 | 2013-01-31 | Hyun-Jun Kim | Apparatus and method for generating emotion information, and function recommendation apparatus based on emotion information |
US9311680B2 (en) * | 2011-07-29 | 2016-04-12 | Samsung Electronics Co., Ltd. | Apparatus and method for generating emotion information, and function recommendation apparatus based on emotion information |
US8825584B1 (en) | 2011-08-04 | 2014-09-02 | Smart Information Flow Technologies LLC | Systems and methods for determining social regard scores |
US10217049B2 (en) | 2011-08-04 | 2019-02-26 | Smart Information Flow Technologies, LLC | Systems and methods for determining social perception |
US9053421B2 (en) | 2011-08-04 | 2015-06-09 | Smart Information Flow Technologies LLC | Systems and methods for determining social perception scores |
US10217051B2 (en) | 2011-08-04 | 2019-02-26 | Smart Information Flow Technologies, LLC | Systems and methods for determining social perception |
US10217050B2 (en) | 2011-08-04 | 2019-02-26 | Smart Information Flow Technologies, Llc | Systems and methods for determining social perception |
US9183850B2 (en) | 2011-08-08 | 2015-11-10 | The Intellisis Corporation | System and method for tracking sound pitch across an audio signal |
US9473866B2 (en) | 2011-08-08 | 2016-10-18 | Knuedge Incorporated | System and method for tracking sound pitch across an audio signal using harmonic envelope |
US9485597B2 (en) | 2011-08-08 | 2016-11-01 | Knuedge Incorporated | System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain |
US8706472B2 (en) | 2011-08-11 | 2014-04-22 | Apple Inc. | Method for disambiguating multiple readings in language conversion |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US8762156B2 (en) | 2011-09-28 | 2014-06-24 | Apple Inc. | Speech recognition repair using contextual information |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US8942981B2 (en) * | 2011-10-28 | 2015-01-27 | Cellco Partnership | Natural language call router |
US20130110510A1 (en) * | 2011-10-28 | 2013-05-02 | Cellco Partnership D/B/A Verizon Wireless | Natural language call router |
US10956009B2 (en) | 2011-12-15 | 2021-03-23 | L'oreal | Method and system for interactive cosmetic enhancements interface |
EP2812897A4 (en) * | 2012-02-10 | 2015-12-30 | Intel Corp | Perceptual computing with conversational agent |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9202466B2 (en) | 2012-03-29 | 2015-12-01 | Honda Research Institute Europe Gmbh | Spoken dialog system using prominence |
EP2645364A1 (en) | 2012-03-29 | 2013-10-02 | Honda Research Institute Europe GmbH | Spoken dialog system using prominence |
US20130304686A1 (en) * | 2012-05-09 | 2013-11-14 | Yahoo! Inc. | Methods and systems for personalizing user experience based on attitude prediction |
US9092757B2 (en) * | 2012-05-09 | 2015-07-28 | Yahoo! Inc. | Methods and systems for personalizing user experience based on attitude prediction |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US8775442B2 (en) | 2012-05-15 | 2014-07-08 | Apple Inc. | Semantic search using a single-source semantic model |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US10019994B2 (en) | 2012-06-08 | 2018-07-10 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9263060B2 (en) * | 2012-08-21 | 2016-02-16 | Marian Mason Publishing Company, Llc | Artificial neural network based system for classification of the emotional content of digital music |
US20140058735A1 (en) * | 2012-08-21 | 2014-02-27 | David A. Sharp | Artificial Neural Network Based System for Classification of the Emotional Content of Digital Music |
US9443515B1 (en) * | 2012-09-05 | 2016-09-13 | Paul G. Boyce | Personality designer system for a detachably attachable remote audio object |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US10614725B2 (en) | 2012-09-11 | 2020-04-07 | International Business Machines Corporation | Generating secondary questions in an introspective question answering system |
US10621880B2 (en) | 2012-09-11 | 2020-04-14 | International Business Machines Corporation | Generating secondary questions in an introspective question answering system |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
US8935167B2 (en) | 2012-09-25 | 2015-01-13 | Apple Inc. | Exemplar-based latent perceptual modeling for automatic speech recognition |
US20140136196A1 (en) * | 2012-11-09 | 2014-05-15 | Institute For Information Industry | System and method for posting message by audio signal |
US20150254061A1 (en) * | 2012-11-28 | 2015-09-10 | OOO "Speaktoit" | Method for user training of information dialogue system |
US9946511B2 (en) * | 2012-11-28 | 2018-04-17 | Google Llc | Method for user training of information dialogue system |
US10489112B1 (en) | 2012-11-28 | 2019-11-26 | Google Llc | Method for user training of information dialogue system |
US10503470B2 (en) | 2012-11-28 | 2019-12-10 | Google Llc | Method for user training of information dialogue system |
US20140188876A1 (en) * | 2012-12-28 | 2014-07-03 | Sony Corporation | Information processing device, information processing method and computer program |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
WO2015123332A1 (en) * | 2013-02-12 | 2015-08-20 | Begel Daniel | Method and system to identify human characteristics using speech acoustics |
US20140244249A1 (en) * | 2013-02-28 | 2014-08-28 | International Business Machines Corporation | System and Method for Identification of Intent Segment(s) in Caller-Agent Conversations |
US10354677B2 (en) * | 2013-02-28 | 2019-07-16 | Nuance Communications, Inc. | System and method for identification of intent segment(s) in caller-agent conversations |
US10074384B2 (en) * | 2013-03-04 | 2018-09-11 | Fujitsu Limited | State estimating apparatus, state estimating method, and state estimating computer program |
US20140249823A1 (en) * | 2013-03-04 | 2014-09-04 | Fujitsu Limited | State estimating apparatus, state estimating method, and state estimating computer program |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US9733821B2 (en) | 2013-03-14 | 2017-08-15 | Apple Inc. | Voice control to diagnose inadvertent activation of accessibility features |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US10078487B2 (en) | 2013-03-15 | 2018-09-18 | Apple Inc. | Context-sensitive handling of interruptions |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11151899B2 (en) | 2013-03-15 | 2021-10-19 | Apple Inc. | User training by intelligent digital assistant |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
CN104078045A (en) * | 2013-03-26 | 2014-10-01 | 联想(北京)有限公司 | Identifying method and electronic device |
US20140297551A1 (en) * | 2013-04-02 | 2014-10-02 | Hireiq Solutions, Inc. | System and Method of Evaluating a Candidate Fit for a Hiring Decision |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US9472207B2 (en) | 2013-06-20 | 2016-10-18 | Suhas Gondi | Portable assistive device for combating autism spectrum disorders |
US20150006170A1 (en) * | 2013-06-28 | 2015-01-01 | International Business Machines Corporation | Real-Time Speech Analysis Method and System |
US11062726B2 (en) | 2013-06-28 | 2021-07-13 | International Business Machines Corporation | Real-time speech analysis method and system using speech recognition and comparison with standard pronunciation |
US10586556B2 (en) * | 2013-06-28 | 2020-03-10 | International Business Machines Corporation | Real-time speech analysis and method using speech recognition and comparison with standard pronunciation |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US20160343268A1 (en) * | 2013-09-11 | 2016-11-24 | Lincoln Global, Inc. | Learning management system for a real-time simulated virtual reality welding training environment |
US10198962B2 (en) * | 2013-09-11 | 2019-02-05 | Lincoln Global, Inc. | Learning management system for a real-time simulated virtual reality welding training environment |
US20160210985A1 (en) * | 2013-09-25 | 2016-07-21 | Intel Corporation | Improving natural language interactions using emotional modulation |
US9761249B2 (en) * | 2013-09-25 | 2017-09-12 | Intel Corporation | Improving natural language interactions using emotional modulation |
US9779084B2 (en) | 2013-10-04 | 2017-10-03 | Mattersight Corporation | Online classroom analytics system and methods |
US9754587B2 (en) * | 2013-10-04 | 2017-09-05 | Nuance Communications, Inc. | System and method of using neural transforms of robust audio features for speech processing |
US10191901B2 (en) | 2013-10-04 | 2019-01-29 | Mattersight Corporation | Enrollment pairing analytics system and methods |
US20170358298A1 (en) * | 2013-10-04 | 2017-12-14 | Nuance Communications, Inc. | System and method of using neural transforms of robust audio features for speech processing |
US20160180843A1 (en) * | 2013-10-04 | 2016-06-23 | At&T Intellectual Property I, L.P. | System and method of using neural transforms of robust audio features for speech processing |
US20150100312A1 (en) * | 2013-10-04 | 2015-04-09 | At&T Intellectual Property I, L.P. | System and method of using neural transforms of robust audio features for speech processing |
US9280968B2 (en) * | 2013-10-04 | 2016-03-08 | At&T Intellectual Property I, L.P. | System and method of using neural transforms of robust audio features for speech processing |
US10096318B2 (en) * | 2013-10-04 | 2018-10-09 | Nuance Communications, Inc. | System and method of using neural transforms of robust audio features for speech processing |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
WO2015094617A1 (en) * | 2013-12-16 | 2015-06-25 | Sri International | Method and apparatus for classifying lexical stress |
US20150170644A1 (en) * | 2013-12-16 | 2015-06-18 | Sri International | Method and apparatus for classifying lexical stress |
US9928832B2 (en) * | 2013-12-16 | 2018-03-27 | Sri International | Method and apparatus for classifying lexical stress |
US9972341B2 (en) * | 2014-01-22 | 2018-05-15 | Samsung Electronics Co., Ltd. | Apparatus and method for emotion recognition |
US20150206543A1 (en) * | 2014-01-22 | 2015-07-23 | Samsung Electronics Co., Ltd. | Apparatus and method for emotion recognition |
US9549068B2 (en) * | 2014-01-28 | 2017-01-17 | Simple Emotion, Inc. | Methods for adaptive voice interaction |
US20150213800A1 (en) * | 2014-01-28 | 2015-07-30 | Simple Emotion, Inc. | Methods for adaptive voice interaction |
US9875445B2 (en) | 2014-02-25 | 2018-01-23 | Sri International | Dynamic hybrid models for multimodal analysis |
US20170228609A1 (en) * | 2014-05-09 | 2017-08-10 | Samsung Electronics Co., Ltd. | Liveness testing methods and apparatuses and image processing methods and apparatuses |
US10360465B2 (en) * | 2014-05-09 | 2019-07-23 | Samsung Electronics Co., Ltd. | Liveness testing methods and apparatuses and image processing methods and apparatuses |
US11151397B2 (en) * | 2014-05-09 | 2021-10-19 | Samsung Electronics Co., Ltd. | Liveness testing methods and apparatuses and image processing methods and apparatuses |
US20160328623A1 (en) * | 2014-05-09 | 2016-11-10 | Samsung Electronics Co., Ltd. | Liveness testing methods and apparatuses and image processing methods and apparatuses |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
WO2015184196A3 (en) * | 2014-05-28 | 2016-03-17 | Aliphcom | Speech summary and action item generation |
US9508360B2 (en) * | 2014-05-28 | 2016-11-29 | International Business Machines Corporation | Semantic-free text analysis for identifying traits |
US20150348569A1 (en) * | 2014-05-28 | 2015-12-03 | International Business Machines Corporation | Semantic-free text analysis for identifying traits |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10748534B2 (en) | 2014-06-19 | 2020-08-18 | Mattersight Corporation | Personality-based chatbot and methods including non-text input |
US9390706B2 (en) * | 2014-06-19 | 2016-07-12 | Mattersight Corporation | Personality-based intelligent personal assistant system and methods |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
WO2016014597A3 (en) * | 2014-07-21 | 2016-03-24 | Feele, A Partnership By Operation Of Law | Translating emotions into electronic representations |
CN104167208A (en) * | 2014-08-08 | 2014-11-26 | 中国科学院深圳先进技术研究院 | Speaker recognition method and device |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US11051702B2 (en) | 2014-10-08 | 2021-07-06 | University Of Florida Research Foundation, Inc. | Method and apparatus for non-contact fast vital sign acquisition based on radar signal |
US11622693B2 (en) | 2014-10-08 | 2023-04-11 | University Of Florida Research Foundation, Inc. | Method and apparatus for non-contact fast vital sign acquisition based on radar signal |
US9659564B2 (en) * | 2014-10-24 | 2017-05-23 | Sestek Ses Ve Iletisim Bilgisayar Teknolojileri Sanayi Ticaret Anonim Sirketi | Speaker verification based on acoustic behavioral characteristics of the speaker |
US20160118050A1 (en) * | 2014-10-24 | 2016-04-28 | Sestek Ses Ve Iletisim Bilgisayar Teknolojileri Sanayi Ticaret Anonim Sirketi | Non-standard speech detection system and method |
US9678946B2 (en) | 2014-11-10 | 2017-06-13 | Oracle International Corporation | Automatic generation of N-grams and concept relations from linguistic input data |
US20160132482A1 (en) * | 2014-11-10 | 2016-05-12 | Oracle International Corporation | Automatic ontology generation for natural-language processing applications |
US9582493B2 (en) | 2014-11-10 | 2017-02-28 | Oracle International Corporation | Lemma mapping to universal ontologies in computer natural language processing |
US9842102B2 (en) * | 2014-11-10 | 2017-12-12 | Oracle International Corporation | Automatic ontology generation for natural-language processing applications |
US9741342B2 (en) * | 2014-11-26 | 2017-08-22 | Panasonic Intellectual Property Corporation Of America | Method and apparatus for recognizing speech by lip reading |
US20160148616A1 (en) * | 2014-11-26 | 2016-05-26 | Panasonic Intellectual Property Corporation Of America | Method and apparatus for recognizing speech by lip reading |
US11556230B2 (en) | 2014-12-02 | 2023-01-17 | Apple Inc. | Data detection |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US20170344713A1 (en) * | 2014-12-12 | 2017-11-30 | Koninklijke Philips N.V. | Device, system and method for assessing information needs of a person |
US20160210963A1 (en) * | 2015-01-19 | 2016-07-21 | Ncsoft Corporation | Methods and systems for determining ranking of dialogue sticker based on situation and preference information |
US9792903B2 (en) * | 2015-01-19 | 2017-10-17 | Ncsoft Corporation | Methods and systems for determining ranking of dialogue sticker based on situation and preference information |
US20160210279A1 (en) * | 2015-01-19 | 2016-07-21 | Ncsoft Corporation | Methods and systems for analyzing communication situation based on emotion information |
US9792909B2 (en) * | 2015-01-19 | 2017-10-17 | Ncsoft Corporation | Methods and systems for recommending dialogue sticker based on similar situation detection |
US9792279B2 (en) * | 2015-01-19 | 2017-10-17 | Ncsoft Corporation | Methods and systems for analyzing communication situation based on emotion information |
US10430157B2 (en) * | 2015-01-19 | 2019-10-01 | Samsung Electronics Co., Ltd. | Method and apparatus for recognizing speech signal |
US20160210117A1 (en) * | 2015-01-19 | 2016-07-21 | Ncsoft Corporation | Methods and systems for recommending dialogue sticker based on similar situation detection |
US9922668B2 (en) | 2015-02-06 | 2018-03-20 | Knuedge Incorporated | Estimating fractional chirp rate with multiple frequency representations |
US9870785B2 (en) | 2015-02-06 | 2018-01-16 | Knuedge Incorporated | Determining features of harmonic signals |
US9842611B2 (en) | 2015-02-06 | 2017-12-12 | Knuedge Incorporated | Estimating pitch using peak-to-peak distances |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US10360903B2 (en) * | 2015-03-20 | 2019-07-23 | Kabushiki Kaisha Toshiba | Spoken language understanding apparatus, method, and program |
US9601104B2 (en) | 2015-03-27 | 2017-03-21 | International Business Machines Corporation | Imbuing artificial intelligence systems with idiomatic traits |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
WO2016183229A1 (en) * | 2015-05-11 | 2016-11-17 | Olsher Daniel Joseph | Universal task independent simulation and control platform for generating controlled actions using nuanced artificial intelligence |
US9833200B2 (en) | 2015-05-14 | 2017-12-05 | University Of Florida Research Foundation, Inc. | Low IF architectures for noncontact vital sign detection |
US10262061B2 (en) | 2015-05-19 | 2019-04-16 | Oracle International Corporation | Hierarchical data classification using frequency analysis |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10529328B2 (en) * | 2015-06-22 | 2020-01-07 | Carnegie Mellon University | Processing speech signals in voice-based profiling |
US20180190284A1 (en) * | 2015-06-22 | 2018-07-05 | Carnegie Mellon University | Processing speech signals in voice-based profiling |
US11538472B2 (en) | 2015-06-22 | 2022-12-27 | Carnegie Mellon University | Processing speech signals in voice-based profiling |
US9715498B2 (en) | 2015-08-31 | 2017-07-25 | Microsoft Technology Licensing, Llc | Distributed server system for language understanding |
US9965465B2 (en) | 2015-08-31 | 2018-05-08 | Microsoft Technology Licensing, Llc | Distributed server system for language understanding |
US20170060839A1 (en) * | 2015-09-01 | 2017-03-02 | Casio Computer Co., Ltd. | Dialogue control device, dialogue control method and non-transitory computer-readable information recording medium |
US9953078B2 (en) * | 2015-09-01 | 2018-04-24 | Casio Computer Co., Ltd. | Dialogue control device, dialogue control method and non-transitory computer-readable information recording medium |
US10379715B2 (en) * | 2015-09-08 | 2019-08-13 | Apple Inc. | Intelligent automated assistant in a media environment |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10956006B2 (en) | 2015-09-08 | 2021-03-23 | Apple Inc. | Intelligent automated assistant in a media environment |
US10331312B2 (en) | 2015-09-08 | 2019-06-25 | Apple Inc. | Intelligent automated assistant in a media environment |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US20220189479A1 (en) * | 2016-01-25 | 2022-06-16 | Sony Group Corporation | Communication system and communication control method |
US11295736B2 (en) * | 2016-01-25 | 2022-04-05 | Sony Corporation | Communication system and communication control method |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US20180012230A1 (en) * | 2016-07-11 | 2018-01-11 | International Business Machines Corporation | Emotion detection over social media |
EP3493201A4 (en) * | 2016-07-26 | 2019-07-03 | Sony Corporation | Information processing device, information processing method, and program |
US10847154B2 (en) * | 2016-07-26 | 2020-11-24 | Sony Corporation | Information processing device, information processing method, and program |
CN109074809A (en) * | 2016-07-26 | 2018-12-21 | 索尼公司 | Information processing equipment, information processing method and program |
US20190103110A1 (en) * | 2016-07-26 | 2019-04-04 | Sony Corporation | Information processing device, information processing method, and program |
US11693894B2 (en) * | 2016-07-29 | 2023-07-04 | Microsoft Technology Licensing, Llc | Conversation oriented machine-user interaction |
US20210319051A1 (en) * | 2016-07-29 | 2021-10-14 | Microsoft Technology Licensing, Llc | Conversation oriented machine-user interaction |
US11068519B2 (en) * | 2016-07-29 | 2021-07-20 | Microsoft Technology Licensing, Llc | Conversation oriented machine-user interaction |
US10764206B2 (en) | 2016-08-04 | 2020-09-01 | International Business Machines Corporation | Adjusting network bandwidth based on an analysis of a user's cognitive state |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US20180096698A1 (en) * | 2016-09-30 | 2018-04-05 | Honda Motor Co., Ltd. | Processing result error detection device, processing result error detection program, processing result error detection method, and moving entity |
US10475470B2 (en) * | 2016-09-30 | 2019-11-12 | Honda Motor Co., Ltd. | Processing result error detection device, processing result error detection program, processing result error detection method, and moving entity |
US10573307B2 (en) * | 2016-10-31 | 2020-02-25 | Furhat Robotics Ab | Voice interaction apparatus and voice interaction method |
US20180122377A1 (en) * | 2016-10-31 | 2018-05-03 | Furhat Robotics Ab | Voice interaction apparatus and voice interaction method |
US10074359B2 (en) * | 2016-11-01 | 2018-09-11 | Google Llc | Dynamic text-to-speech provisioning |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10235990B2 (en) | 2017-01-04 | 2019-03-19 | International Business Machines Corporation | System and method for cognitive intervention on human interactions |
US10373515B2 (en) | 2017-01-04 | 2019-08-06 | International Business Machines Corporation | System and method for cognitive intervention on human interactions |
US10902842B2 (en) | 2017-01-04 | 2021-01-26 | International Business Machines Corporation | System and method for cognitive intervention on human interactions |
US10318639B2 (en) | 2017-02-03 | 2019-06-11 | International Business Machines Corporation | Intelligent action recommendation |
US10991384B2 (en) | 2017-04-21 | 2021-04-27 | audEERING GmbH | Method for automatic affective state inference and an automated affective state inference system |
US10347244B2 (en) | 2017-04-21 | 2019-07-09 | Go-Vivace Inc. | Dialogue system incorporating unique speech to text conversion method for meaningful dialogue response |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US11621865B2 (en) * | 2017-07-21 | 2023-04-04 | Pearson Education, Inc. | Systems and methods for automated platform-based algorithm monitoring |
US10938592B2 (en) * | 2017-07-21 | 2021-03-02 | Pearson Education, Inc. | Systems and methods for automated platform-based algorithm monitoring |
US20210152385A1 (en) * | 2017-07-21 | 2021-05-20 | Pearson Education, Inc. | Systems and methods for automated platform-based algorithm monitoring |
US10867128B2 (en) | 2017-09-12 | 2020-12-15 | Microsoft Technology Licensing, Llc | Intelligently updating a collaboration site or template |
US10742500B2 (en) * | 2017-09-20 | 2020-08-11 | Microsoft Technology Licensing, Llc | Iteratively updating a collaboration site or template |
US10394958B2 (en) * | 2017-11-09 | 2019-08-27 | Conduent Business Services, Llc | Performing semantic analyses of user-generated text content using a lexicon |
US10964338B2 (en) * | 2017-12-22 | 2021-03-30 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Mood recognition method, electronic device and computer-readable storage medium |
US20190198040A1 (en) * | 2017-12-22 | 2019-06-27 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Mood recognition method, electronic device and computer-readable storage medium |
CN108133038A (en) * | 2018-01-10 | 2018-06-08 | 重庆邮电大学 | Entity-level sentiment classification system and method based on a dynamic memory network |
US11250038B2 (en) * | 2018-01-21 | 2022-02-15 | Microsoft Technology Licensing, Llc. | Question and answer pair generation using machine learning |
CN108470024A (en) * | 2018-03-12 | 2018-08-31 | 北京灵伴即时智能科技有限公司 | Chinese prosodic structure prediction method fusing syntactic, semantic, and pragmatic information |
US11042711B2 (en) | 2018-03-19 | 2021-06-22 | Daniel L. Coffing | Processing natural language arguments and propositions |
US20190304480A1 (en) * | 2018-03-29 | 2019-10-03 | Ford Global Technologies, Llc | Neural Network Generative Modeling To Transform Speech Utterances And Augment Training Data |
US10937438B2 (en) * | 2018-03-29 | 2021-03-02 | Ford Global Technologies, Llc | Neural network generative modeling to transform speech utterances and augment training data |
US10748644B2 (en) | 2018-06-19 | 2020-08-18 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
US11942194B2 (en) | 2018-06-19 | 2024-03-26 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
US11120895B2 (en) | 2018-06-19 | 2021-09-14 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
WO2020036195A1 (en) * | 2018-08-15 | 2020-02-20 | 日本電信電話株式会社 | End-of-speech determination device, end-of-speech determination method, and program |
JPWO2020036195A1 (en) * | 2018-08-15 | 2021-08-10 | 日本電信電話株式会社 | End-of-speech determination device, end-of-speech determination method and program |
JP7007617B2 (en) | 2018-08-15 | 2022-01-24 | 日本電信電話株式会社 | End-of-speech judgment device, end-of-speech judgment method and program |
US11183174B2 (en) * | 2018-08-31 | 2021-11-23 | Samsung Electronics Co., Ltd. | Speech recognition apparatus and method |
WO2020051500A1 (en) * | 2018-09-06 | 2020-03-12 | Coffing Daniel L | System for providing dialogue guidance |
US11429794B2 (en) | 2018-09-06 | 2022-08-30 | Daniel L. Coffing | System for providing dialogue guidance |
US11743268B2 (en) | 2018-09-14 | 2023-08-29 | Daniel L. Coffing | Fact management system |
US11302303B2 (en) | 2018-12-18 | 2022-04-12 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for training an acoustic model |
CN109559734A (en) * | 2018-12-18 | 2019-04-02 | 百度在线网络技术(北京)有限公司 | Method and device for accelerating acoustic model training |
CN109818833A (en) * | 2019-03-14 | 2019-05-28 | 北京信而泰科技股份有限公司 | Ethernet test system and Ethernet test method |
US11430438B2 (en) * | 2019-03-22 | 2022-08-30 | Samsung Electronics Co., Ltd. | Electronic device providing response corresponding to user conversation style and emotion and method of operating same |
US20220164539A1 (en) * | 2019-04-26 | 2022-05-26 | Tucknologies Holdings, Inc. | Human Emotion Detection |
US11847419B2 (en) * | 2019-04-26 | 2023-12-19 | Virtual Emotion Resource Network, Llc | Human emotion detection |
CN110147432A (en) * | 2019-05-07 | 2019-08-20 | 大连理工大学 | Decision search engine implementation method based on finite-state automata |
WO2020227557A1 (en) * | 2019-05-09 | 2020-11-12 | Sri International | Method, system and apparatus for understanding and generating human conversational cues |
US11335347B2 (en) * | 2019-06-03 | 2022-05-17 | Amazon Technologies, Inc. | Multiple classifications of audio data |
US11521220B2 (en) | 2019-06-05 | 2022-12-06 | International Business Machines Corporation | Generating classification and regression tree from IoT data |
US20220284920A1 (en) * | 2019-07-05 | 2022-09-08 | Gn Audio A/S | A method and a noise indicator system for identifying one or more noisy persons |
WO2021011139A1 (en) * | 2019-07-18 | 2021-01-21 | Sri International | The conversational assistant for conversational engagement |
US20220310079A1 (en) * | 2019-07-18 | 2022-09-29 | Sri International | The conversational assistant for conversational engagement |
RU2721180C1 (en) * | 2019-12-02 | 2020-05-18 | Самсунг Электроникс Ко., Лтд. | Method for generating an animation model of a head based on a speech signal and an electronic computing device which implements it |
US11416556B2 (en) * | 2019-12-19 | 2022-08-16 | Accenture Global Solutions Limited | Natural language dialogue system perturbation testing |
US20210287664A1 (en) * | 2020-03-13 | 2021-09-16 | Palo Alto Research Center Incorporated | Machine learning used to detect alignment and misalignment in conversation |
US11817086B2 (en) * | 2020-03-13 | 2023-11-14 | Xerox Corporation | Machine learning used to detect alignment and misalignment in conversation |
US20230119954A1 (en) * | 2020-06-01 | 2023-04-20 | Amazon Technologies, Inc. | Sentiment aware voice user interface |
WO2022036446A1 (en) * | 2020-08-17 | 2022-02-24 | Jali Inc. | System and method for triggering animated paralingual behavior from dialogue |
US11488339B2 (en) | 2020-08-17 | 2022-11-01 | Jali Inc. | System and method for triggering animated paralingual behavior from dialogue |
US11748558B2 (en) * | 2020-10-27 | 2023-09-05 | Disney Enterprises, Inc. | Multi-persona social agent |
US20220129627A1 (en) * | 2020-10-27 | 2022-04-28 | Disney Enterprises, Inc. | Multi-persona social agent |
CN112735427A (en) * | 2020-12-25 | 2021-04-30 | 平安普惠企业管理有限公司 | Radio reception control method and device, electronic equipment and storage medium |
US20220208216A1 (en) * | 2020-12-28 | 2022-06-30 | Sharp Kabushiki Kaisha | Two-way communication support system and storage medium |
US11700426B2 (en) | 2021-02-23 | 2023-07-11 | Firefly 14, Llc | Virtual platform for recording and displaying responses and reactions to audiovisual contents |
US11563852B1 (en) * | 2021-08-13 | 2023-01-24 | Capital One Services, Llc | System and method for identifying complaints in interactive communications and providing feedback in real-time |
Also Published As
Publication number | Publication date |
---|---|
WO2007067878A2 (en) | 2007-06-14 |
WO2007067878A3 (en) | 2008-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20060122834A1 (en) | Emotion detection device & method for use in distributed systems | |
US8214214B2 (en) | Emotion detection device and method for use in distributed systems | |
Agarwal et al. | A review of tools and techniques for computer aided pronunciation training (CAPT) in English | |
Litman et al. | Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors | |
Levis et al. | Automatic speech recognition | |
Lee et al. | On the effectiveness of robot-assisted language learning | |
Cole et al. | The challenge of spoken language systems: Research directions for the nineties | |
Ashwell et al. | How Accurately Can the Google Web Speech API Recognize and Transcribe Japanese L2 English Learners' Oral Production?. | |
CN101551947A (en) | Computer system for assisting spoken language learning | |
US20070206017A1 (en) | Mapping Attitudes to Movements Based on Cultural Norms | |
US8036896B2 (en) | System, server and method for distributed literacy and language skill instruction | |
JP2001159865A (en) | Method and device for leading interactive language learning | |
Godwin-Jones | Speech tools and technologies | |
US9520068B2 (en) | Sentence level analysis in a reading tutor | |
WO2019075828A1 (en) | Voice evaluation method and apparatus | |
LaRocca et al. | On the path to 2X learning: Exploring the possibilities of advanced speech recognition | |
Shao et al. | AI-based Arabic Language and Speech Tutor | |
Kantor et al. | Reading companion: The technical and social design of an automated reading tutor | |
Zhang et al. | Cognitive state classification in a spoken tutorial dialogue system | |
Kochem et al. | The Use of ASR-Equipped Software in the Teaching of Suprasegmental Features of Pronunciation: A Critical Review. | |
Wik | Designing a virtual language tutor | |
Li et al. | Speech interaction of educational robot based on Ekho and Sphinx | |
Pachler | Speech technologies and foreign language teaching and learning | |
CN114783412B (en) | Spanish spoken language pronunciation training correction method and system | |
Delmonte | A prosodic module for self-learning activities |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: PHOENIX SOLUTIONS, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GURURAJ, PALLAKI;REEL/FRAME:018433/0739 Effective date: 20061025 |
|
AS | Assignment |
Owner name: PHOENIX SOLUTIONS, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BENNETT, IAN M.;REEL/FRAME:022708/0202 Effective date: 20061031 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE |
|
AS | Assignment |
Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PHOENIX SOLUTIONS, INC.;REEL/FRAME:030949/0249 Effective date: 20130715 |