WO2000022597A1 - Method for computer-aided foreign language instruction - Google Patents

Method for computer-aided foreign language instruction

Info

Publication number
WO2000022597A1
WO2000022597A1 PCT/US1999/023264
Authority
WO
WIPO (PCT)
Prior art keywords
host
student
response
setting
computer
Prior art date
Application number
PCT/US1999/023264
Other languages
French (fr)
Inventor
David Lawrence Topolewski
Luther Marvin Shannon
Original Assignee
Planetlingo Inc.
Priority date
Filing date
Publication date
Application filed by Planetlingo Inc. filed Critical Planetlingo Inc.
Priority to AU62927/99A priority Critical patent/AU6292799A/en
Publication of WO2000022597A1 publication Critical patent/WO2000022597A1/en

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/06 Foreign languages
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065 Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00 Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02 Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
    • G09B7/04 Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student characterised by modifying the teaching programme in response to a wrong answer, e.g. repeating the question, supplying a further explanation

Abstract

A foreign spoken language instructional method for use by a student utilizing a multi-media computer in which the computer is programmed to display virtual settings that would be encountered by the foreign student in everyday life in the country in which the foreign language being learned is spoken. For example, one of the settings could be a fast food restaurant. The computer is also programmed to create a virtual host that is appropriate for that setting, and to allow, through the use of automatic speech recognition and natural language understanding technologies, a real-life conversation to occur between the student and the host that is appropriate to the setting. The computer will have stored or have access to a vocabulary and library of possible responses and statements by both the host and the student, along with a conversational tree utilizing those vocabularies and libraries that will enable the conversation to follow any of several twists and turns, rather than a precise, pre-set, structured command-response protocol.

Description

DESCRIPTION
Method For Computer-Aided Foreign Language Instruction
Background Of The Invention
1. Field of the Invention
This invention relates to the field of foreign language instruction generally, and particularly to a multi-media, computer-aided system for helping a student learn a foreign language in a simulated conversational environment.
2. Background
Learning a foreign language has been a universal pursuit literally since Biblical times and the Tower of Babel. This pursuit is more prevalent and important today than ever before. As the global community becomes more and more interdependent, as international travel and commerce become more and more the norm, and as the world becomes more and more linked by the aptly named World Wide Web, being able to communicate with people from other nations and cultures is becoming more of a necessity and less of a scholarly pursuit. Indeed, because the English language has become the international language of business, learning English has taken on prerequisite status in developed and developing nations around the world. In Japan, for example, not only are several years of classroom instruction in English required course work in the Japanese equivalent of elementary and high school, but augmenting that class work with private instruction in English is quite common.
One has only to travel outside the United States to be amazed at the degree to which children and adults in other countries are taking the time and expending the effort to learn English. Even in the United States learning a foreign language is a common goal.
Up to the present time, the process of learning a foreign language has followed one or another of several time-honored methods - classroom instruction in which an instructor leads the students through the learning process using face-to-face interaction and textual materials; flash cards; self-instruction audio or video tapes in which words and phrases are spoken on the tape amid pauses for the student to repeat the phrase back to the tape player.
With the advent of personal computing, foreign language instructional software has also now become available. See, for example, those described in U.S. Patent Nos. 5,810,599 and 5,697,789. The currently available programs, however, are essentially transpositions of the previously available non-interactive media such as flash cards and tapes, sometimes augmented with entertaining or educational graphics or visuals. (For example, the '789 patent includes a video display of an animated representation of a person's lips as the person is pronouncing the selected words.) However, the currently available foreign language instructional software does not provide a truly interactive system which approximates or simulates a teacher-student interactive environment. Of all of these currently available educational systems, it is generally agreed that the interactive approach of student-to-teacher (necessarily well-trained and experienced), and preferably in a one-on-one setting, is the most effective and efficient way to learn a foreign language, and is much preferable to the various non-interactive systems such as flash cards or tapes. Unfortunately, most teaching and interaction is done without the benefit of context.
Face-to-face interactive instruction is generally best for several reasons. First, in the interactive student-teacher environment there can be real time communication that is tailored exclusively to that student's particular progress, strengths and weaknesses. Second, the teacher is able to receive and process a variety of responses that may include improper pronunciation and grammatical construction, and still provide the correct response, and a correcting of the erroneous portion of the student's response. Thirdly, there is a recognized physio-linguistic phenomenon in which what a person sees in the way of visual cues affects what that person simultaneously perceives aurally. This is sometimes referred to as the McGurk Effect. Specifically, during childhood a person learns to associate certain mouth/facial positions and expressions with certain vocal sounds. In other words, when the person combines the visual cues simultaneously with the spoken word, understanding and comprehension are remarkably improved. This phenomenon is readily understood by reference to the preference of business people to have face-to-face meetings rather than telephone conferences. While the latter is often more convenient, it is generally believed that communication in the face-to-face meetings is typically better than in the telephone conference which is devoid of visual cues, even though every spoken word will have been heard precisely.
However, the face-to-face student/teacher system has its drawbacks as well. In the classroom setting, there are multiple students, so a truly personalized approach is not possible. Plus, the classroom setting inherently involves bringing a number of people together at the same place and time. This takes scheduling, and forces the student to adapt his or her schedule to that of the class, rather than being able to have the teacher available on-call. The classroom system is also typically limited to at most a few hours a week. Therefore, the system cannot provide a truly personalized educational experience. While personal tutoring solves some of these drawbacks, it does not solve all of them, and has some of its own. For example, even though the personal tutoring system involves just two people, it still requires scheduling. It can also be expensive. And, the tutor is not always going to be available at any time or place, at the student's beck and call.
Therefore, there exists a need in the art for an improved foreign language instructional system that incorporates the benefits of these prior art systems while minimizing their drawbacks.
Summary Of The Invention
Such an improved system is provided in a method that utilizes a commercially available personal computer capable of what is now commonly referred to as multi-media computing, and which comprises the steps of storing in the computer's memory (or otherwise providing for access) for display on the monitor a visual simulation of a real-life setting, such as the interior of a fast food restaurant, or a bank, or a doctor's office, for only a few examples; storing in the computer's memory (or otherwise providing for access) for display on the monitor an animated character appropriate to the real-life setting, such as, for example, an order taker in the fast food restaurant setting, a teller in a bank, or the receptionist, nurse or doctor in the doctor's office in the examples set forth above; storing in memory conversational statements for the host that are appropriate to the setting displayed; storing in the computer's memory (or otherwise providing for access), for replay through the computer's speaker system, a library of possible responses by the student to each of the host's conversational statements; having the host initiate a conversation with the student in the foreign language that is appropriate for the setting displayed; allowing the student to respond to the host verbally in the foreign language; converting the verbal response to text utilizing automatic speech recognition technology that has been stored in the computer's memory (or is otherwise accessible); interpreting the meaning of the response using natural language understanding technology that has been stored in the computer's memory (or is otherwise accessible); comparing the converted response to a library of possible responses by the student that has been stored in the computer memory or is otherwise accessible; selecting the most appropriate responsive host statement from the library of stored responses; causing that response to be played on the computer speaker system and synchronized with the visual display of the host "mouthing" the audible response; and continuing in this fashion through a plurality of conversational turns comprising a statement or inquiry by the host, a response by the student, a response by the host, and so on until the conversation appropriate for the setting is completed.
By bringing together several separate technologies in a novel, non-obvious, fully integrated way, a foreign language instructional method is provided that closely simulates the teacher-student interactive environment via a multi-media, computer-assisted system, thereby allowing the student, on his or her own personal computer, to simultaneously enjoy a truly interactive learning situation and the personal convenience of being able to study when, where and for how long the student alone dictates.
This invention enables simulated, situational conversations to take place between the student and one of several alternative "teachers" — a server attached to the telephone; a stand-alone computer or other stand-alone device; or a personal computer attached to the Internet. For example, a student with a standard personal computer can "enter" a fast food restaurant and engage in a situational conversation with an on-screen host. After the simulated host opens the conversation with a greeting and a first inquiry (such as "Hello, welcome to Cyber Cafe, can I help you?"), the system will await a verbal response from the student.
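The step sequence above amounts to a simple dialogue loop: present the host, listen, recognize, interpret, select a reply, and play it back with synchronized animation. The Python sketch below is purely illustrative of that loop; every name in it (HostTurn, run_conversation, recognizer, interpreter, player) is a hypothetical stand-in rather than anything named in the patent or in a particular product, and a real system would substitute its own ASR, NLU and playback components.

```python
# Illustrative only: a minimal version of the conversational loop described
# above. Every name here is a hypothetical stand-in, not part of the patent.
from dataclasses import dataclass, field


@dataclass
class HostTurn:
    text: str                     # what the host says in the foreign language
    audio_file: str               # recorded speech played through the speakers
    animation: str                # lip-sync/gesture data for the on-screen host
    branches: dict = field(default_factory=dict)  # student intent -> next HostTurn


def run_conversation(opening: HostTurn, recognizer, interpreter, player) -> None:
    """Drive one simulated exchange: host speaks, student answers, repeat."""
    turn = opening
    while turn is not None:
        player.play(turn.audio_file, turn.animation)   # host statement + animation
        if not turn.branches:                          # conversation is complete
            return
        heard = recognizer.transcribe()                # ASR: speech -> text
        intent = interpreter.classify(heard)           # NLU: text -> intent
        # Follow the matching limb of the conversation tree, or re-prompt the
        # same turn if the response was not understood.
        turn = turn.branches.get(intent, turn)
```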
By incorporating Automatic Speech Recognition technology and Natural Language Understanding technology, the student's response will be "interpreted" by the computer and compared against a pre-programmed set of expected responses and/or knowledge representation bases. Whereas prior art language instruction systems required precise responses, this invention will accept a large variety of acceptable grammars (or responses) within a constrained topic. Indeed, this invention will accept hundreds or thousands of variations of spoken input, both correct and incorrect, and then provide a response to the student based upon the system's understanding of what the student has said. These responses can include, for example, an answer, a request for more information, a correction, a rejection, or a hint.
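As a deliberately simplified illustration of accepting many phrasings within a constrained topic, the matcher below maps several surface forms onto a single intent. It is only a sketch under the assumption of plain keyword spotting; the intent names and keyword lists are invented for the example, and a real NLU engine would be far more sophisticated.

```python
# Sketch only: keyword-based intent matching for a constrained fast-food topic.
# Intent names and keyword sets are illustrative assumptions, not from the patent.
from typing import Optional

ORDER_INTENTS = {
    "order_hamburger": {"hamburger", "burger", "cheeseburger"},
    "order_pizza": {"pizza"},
    "order_fries": {"fries"},
}


def classify_order(utterance: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the recognized text."""
    text = utterance.lower()
    for intent, keywords in ORDER_INTENTS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None  # not understood: the host can re-prompt, hint, or correct


assert classify_order("I'll have a hamburger please") == "order_hamburger"
assert classify_order("Give me a burger") == "order_hamburger"
assert classify_order("I want a cheeseburger") == "order_hamburger"
```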
The vocal response will be "communicated" to the student via the computer speakers and also by the animated host on-screen, whose facial movements will be synchronized with the sound output from the speakers and will be properly animated to visually cue the student as well.
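One plausible, entirely hypothetical way to keep the host's facial movements in step with the audio is to attach a time-stamped schedule of mouth shapes and gestures to each recorded utterance. The sketch below assumes such pre-authored cue lists; the shape names and timings are invented for illustration and are not taken from the patent.

```python
# Hypothetical sketch of audio/animation synchronization: each host utterance
# carries cues (mouth shapes, gestures) keyed to offsets in the audio clip.
from dataclasses import dataclass
from typing import List


@dataclass
class AnimationCue:
    time_sec: float  # offset into the recorded audio
    shape: str       # mouth shape or gesture to show at that moment


ITS_OVER_THERE = [
    AnimationCue(0.00, "mouth_closed"),
    AnimationCue(0.10, "mouth_ih"),       # "It's"
    AnimationCue(0.35, "mouth_oh"),       # "over"
    AnimationCue(0.70, "mouth_eh"),       # "there"
    AnimationCue(0.75, "gesture_point"),  # an accompanying visual cue
    AnimationCue(1.10, "mouth_closed"),
]


def cues_due(cues: List[AnimationCue], elapsed_sec: float) -> List[str]:
    """Return the shapes whose scheduled time has already passed during playback."""
    return [cue.shape for cue in cues if cue.time_sec <= elapsed_sec]
```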
The invention will also include a conversation "tree" that will allow the interaction between the student and the host to follow the path dictated by the student. For example, in the Cyber Cafe, if the student orders a hamburger, he or she would be able to do so in a variety of ways - for example "I'll have a hamburger please." "Give me a burger." "I want a cheeseburger." All of these would be recognized by the computer as appropriate responses, to which the host would inquire about how well done the student wanted it cooked, whether the student wants onions and pickles, or french fries, etc. If the student asks for pizza instead, the method of this invention will recognize the response, and will inquire about thin or thick crust, toppings, etc.
Detailed Description Of The Preferred Embodiment
In the presently preferred embodiment, the method of this invention is designed for use with a typical commercially available personal computer capable of multi-media processing, having a processing unit, memory, monitor, operating system, CD-ROM drive, keyboard, mouse (or other equivalent device), speakers, and microphone, such as those computers widely available from innumerable sources such as IBM, Compaq, Dell, and Gateway.
The computer memory is pre-loaded in the conventional way with both Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) engines. ASR technologies of various types are now available commercially. Some of these systems are speaker-dependent, meaning that they are customized, over time, to recognize one speaker's phrasing and pronunciation. Some systems recognize discrete speech patterns in which the spoken words are separated by short pauses. Others have the capability to recognize continuous speech. Some have very small vocabularies and libraries. Others have quite large vocabularies and libraries of phrases. Regardless of the system, all of them convert the spoken word to text, then analyze the text to determine its meaning.
In the preferred embodiment, the ASR and NLU products offered by Unisys are utilized along with the Unisys Natural Language Assistant (NLA), and stored in the computer memory according to the manufacturer's specifications and directions. These commercially available technologies allow the computer to recognize and interpret conversational English, for example, so that interactions can occur in a natural way, without complex menus or artificially constrained vocabularies or syntax.
The memory of the computer is also loaded in the conventional way, for display on the computer's monitor, with a visual simulation of a real-life situational setting that a person would typically encounter during a trip to the country in which the foreign language being learned is spoken. For example, if the method of this invention is being used to teach the English language, the situational setting could be a fast food restaurant, or a bank, or an airport, or a post office, or a doctor's office, or a car rental agency, or a hotel, or a bus, etc. Those skilled in the art will understand that there are many, many situational settings that a tourist will encounter, each of which has its own situational conversation vocabularies. A tourist who, by virtue of his or her computer, has visited a virtual American fast food restaurant and successfully ordered food many times before actually walking into one in the United States will be far better able to handle the real life situation than a tourist who has only memorized the English words hamburger, french fries and a cola.
The computer memory also has stored, in the conventional way, for display on the monitor in conjunction with the setting, a "host" who is appropriate for the setting. For example, if the setting is a fast food restaurant, the host will be the order taker who works behind the counter and cash register, who greets the customer (in this instance the student), takes the order, retrieves and delivers the food to the student, states the total cost, collects the money, and makes the change. In the bank setting, the host could be the teller, who also greets the bank customer-student, inquires as to or receives the student's input on the desired transaction, and then completes the transaction.
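A simple way to picture the pairing of settings with context-appropriate hosts and conversation material is a small lookup table. The sketch below is purely illustrative; the file names, host labels, and tree identifiers are assumptions made for the example, not taken from the patent.

```python
# Illustrative registry pairing each virtual setting with a context-appropriate
# host and its conversation tree. All entries are invented for the example.
SETTINGS = {
    "fast_food_restaurant": {"backdrop": "cyber_cafe_interior.png",
                             "host": "order_taker",
                             "conversation_tree": "cyber_cafe_tree"},
    "bank":                 {"backdrop": "bank_lobby.png",
                             "host": "teller",
                             "conversation_tree": "bank_tree"},
    "doctors_office":       {"backdrop": "clinic_reception.png",
                             "host": "receptionist",
                             "conversation_tree": "clinic_tree"},
}


def load_setting(name: str) -> dict:
    """Fetch the backdrop, host, and conversation tree for a chosen setting."""
    return SETTINGS[name]
```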
For each of the virtual settings, the computer memory has stored in a conventional way a conversation tree that is appropriate to the setting. The conversation tree will comprise a vocabulary of words and phrases that enable the host and the student to have a simulated real life conversation that is appropriate to the setting. The conversation will be initiated by the host. For example, in the Cyber Cafe, the host would welcome the student, then ask what he or she wanted. Using ASR, NLU and NLA, the computer will be able to recognize a large number of variant responses. If the student responds with any statement that is recognized as asking for a hamburger, the "limb" of the conversation tree dealing with a hamburger order is pursued. The host may be programmed to ask, in response to an order for a hamburger, how the student wants it cooked. An appropriate response (again recognized by the combination of ASR, NLU and NLA) would then elicit the next logical and appropriate statement or inquiry by the host - whether the student wants it with or without onions, for example. The computer will have been programmed to ask for a repeated response from the student if the first response is not understood or is incorrect. After a pre-determined number of inaccurate or inappropriate responses, the computer will provide a translation of the host's statement or inquiry into the student's native language, then restart the conversation.
Each language and culture has visual cues that are quite important in communication. The combination of the spoken word with the appropriate visual cue provides the most effective communication. To enhance the learning experience in the preferred embodiment of the method of this invention, the computer memory has stored in the conventional way data that will allow the host, at the same time it is "mouthing" the words that are being played on the computer's speakers, to also provide those visual cues that typically accompany that statement. As a very simple example, in the fast food restaurant during some portion of the conversation the host may advise the customer/student of the location of the napkin dispenser by saying "It's over there." In real life, the fast food worker would likely also point in the right direction. In the preferred embodiment of this invention, the simulated host would also point at the same time as the host mouthed the words.
The method of this invention will include any number of such settings, which can be accessed through computer disk or CD-ROM. The student will be able to experience a simulated trip to the foreign country. For example, utilizing the method of this invention, a Japanese businessman or businesswoman will be able to have virtually visited the United States, and successfully communicated his or her way in English from the airport, to the hotel, to a restaurant, to a bank, and to a business meeting many times before actually setting foot in the United States.
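Read as data plus a small amount of control logic, the branching, re-prompting, and native-language fallback described above might look roughly like the sketch below. The node layout, the limit of two misunderstood responses, and all strings are assumptions made only for illustration.

```python
# Sketch of a conversation-tree node with re-prompt and translation fallback.
# The structure, the MAX_MISSES limit, and the example text are assumptions.
from dataclasses import dataclass, field
from typing import Optional, Tuple

MAX_MISSES = 2  # assumed number of misunderstood responses before falling back


@dataclass
class Node:
    prompt: str                                   # host's statement or inquiry
    translation: str                              # native-language fallback text
    branches: dict = field(default_factory=dict)  # recognized intent -> next Node


def advance(node: Node, intent: Optional[str], misses: int) -> Tuple[Node, int]:
    """Follow a limb of the tree, re-prompt, or fall back to a translation."""
    if intent in node.branches:
        return node.branches[intent], 0           # understood: pursue that limb
    if misses + 1 >= MAX_MISSES:
        print(node.translation)                   # show the native-language help
        return node, 0                            # then restart from this prompt
    return node, misses + 1                       # ask the student to try again


# A two-level fragment of the Cyber Cafe tree:
doneness = Node("How would you like it cooked?", "<native-language translation>")
greeting = Node("Hello, welcome to Cyber Cafe, can I help you?",
                "<native-language translation>",
                branches={"order_hamburger": doneness})
```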
While the preferred embodiment of the method of this invention has been set forth above, it will be obvious to those skilled in the art that many modifications are possible to the embodiments disclosed without departing from the inventive concept claimed below.
Therefore, this patent and the protection it provides are not limited to the embodiments set forth above, but are of the full scope of the following claims, including equivalents thereof.

Claims

Claims
1. A foreign spoken language instructional method for use by a student utilizing a multi-media computer, the computer having a processor, memory, an operating system, a monitor, a microphone, a modem and speakers, the memory including automatic speech recognition and natural voice understanding technologies, the method comprising the steps of: a. storing in memory for display on the monitor a visual simulation of a real-life setting, such as the interior of a fast food restaurant, or a bank, or a doctor's office, for only a few examples; b. storing in memory for display on the monitor an animated character appropriate to the real-life setting, such as, for example, an order taker in the fast food restaurant setting, a teller in a bank, or the receptionist, nurse or doctor in the doctor's office in the examples set forth above; c. storing in memory conversational statements for the host that are appropriate to the setting displayed; d. storing in memory a library of possible responses by the student to each of the host's conversational statements; e. having the host initiate a conversation with the student in the foreign language that is appropriate for the setting displayed; f. allowing the student to respond to the host verbally in the foreign language; g. converting the verbal response to text utilizing automatic speech recognition technology; h. comparing the converted response to a library of possible responses by the student; i. selecting the most appropriately responsive host statement from the library of stored responses; j. causing that response to be played on the computer speakers and synchronized with the visual display of the host "mouthing" the audible response; k. continuing in this fashion through a plurality of conversational turns comprising a statement or inquiry by the host, a response by the student, a response by the host, and so on until the conversation appropriate for the setting is completed.
2. The invention of claim 1 further comprising the step of having the animated host simultaneously provide the appropriate visual cues that correspond to the statement being made by the host in that foreign language.
3. The invention of claim 1 further comprising the step of simultaneously displaying on the monitor the text of the response by the host, if desired by the student.
4. The invention of claim 1 further comprising the step of utilizing the stored natural language technology to analyze the student's response so as to be able to recognize many different syntaxes, grammars, sentence structures to discern the concept being communicated by the student.
5. A foreign language instructional method for use by a student utilizing a multi-media computer, the computer having a processor, memory, an operating system, a monitor, a microphone and speakers, the memory including automatic speech recognition and natural voice understanding technologies, the computer further having stored in memory or otherwise being accessible a visual simulation of a real-life setting, an animated character appropriate to the real-life setting, conversational statements for the host that are appropriate to the setting displayed, a library of possible responses by the student to each of the host's conversational statements, the method comprising the steps of: a. recalling for display on the monitor a visual simulation of a real-life setting, such as the interior of a fast food restaurant, or a bank, or a doctor's office, for only a few examples; b. recalling for display on the monitor an animated character appropriate to the real-life setting, such as, for example, an order taker in the fast food restaurant setting, a teller in a bank, or the receptionist, nurse or doctor in the doctor's office in the examples set forth above; c. having the host initiate a conversation with the student in the foreign language that is appropriate for the setting displayed; d. allowing the student to respond to the host verbally in the foreign language; e. converting the verbal response to text utilizing automatic speech recognition technology; f. comparing the converted response to a library of possible responses by the student;
AMENDED CLAIMS
[received by the International Bureau on 24 March 2000 (24.03.00); original claims 1-3 and 5-7 amended; original claims 4 and 8 cancelled; new claims 9-14 added; remaining claims unchanged (8 pages)]
1. A foreign spoken language instructional method for use by a student utilizing a multi-media computer, the computer having a processor, memory, an operating system, a monitor, a microphone, and speakers, the memory including automatic speech recognition and natural language understanding technologies, the method comprising the steps of: a. storing in memory for display on the monitor a visual simulation of a background contextual setting; b. storing in memory for display on the monitor an animated character visually tailored to the context of the displayed background setting; c. storing in memory a library of possible conversational statements for the host that correspond to the context of the displayed background setting; d. storing in memory a library of possible responses by the student to each of the host's conversational statements; e. having the host initiate a conversation with the student corresponding to the displayed background setting in a selected foreign language; f. converting a foreign language verbal response from the student to text utilizing an automatic speech recognizer; g. utilizing the stored natural language technology to analyze the student's response so as to recognize a plurality of different syntaxes, grammars, and sentence structures to discern the concept being communicated by the student; h. comparing the converted response to the library of possible responses by the student; i. selecting a host statement from the library of stored responses that corresponds to the translated verbal response of the student; j. causing the host response to be played on the computer speakers synchronized with a visual display of the host "mouthing" the audible response; k. repeating steps (f) through (j) through a plurality of conversational exchanges comprising a statement or inquiry by the host, a response by the student, a response by the host, and so on until a conversation corresponding to the context of the background setting is completed.
2. The method of claim 1 further comprising the step of having the animated host simultaneously provide visual cues corresponding to the statement being made by the host in the selected foreign language.
3. The method of claim 1 further comprising the step of selectively displaying on the monitor the text of the response by the host.
4. (Canceled)
5. A foreign language instructional method for use by a student utilizing a multimedia computer, the computer having a processor, memory, an operating system, a monitor, a microphone and speakers, the memory including automatic speech recognition and natural language understanding technologies, the computer further having stored in memory or otherwise being accessible a visual simulation of a background setting, an animated character visually tailored to the context of the background setting, a library of conversational statements for the host that correspond to the context of the background setting, and a library of possible responses by the student to each of the host's conversational statements, the method comprising the steps of: a. recalling for display on the monitor a visual simulation of a background contextual setting; b. recalling for display on the monitor an animated character visually tailored to the context of the displayed background setting; c. having the host initiate a conversation with the student corresponding to the displayed background setting in a selected foreign language; d. converting a foreign language verbal response from the student to text utilizing an automatic speech recognizer; e. utilizing the stored natural language technology to analyze the student's response so as to recognize a plurality of different syntaxes, grammars, and sentence structures to discern the concept being communicated by the student; f. comparing the converted response to the library of possible responses by the student; g. selecting a host statement from the library of stored responses that corresponds to the translated verbal response of the student; h. causing the response to be played on the computer speakers synchronized with a visual display of the host "mouthing" the audible response; i. repeating steps (d) through (h) through a plurality of conversational exchanges comprising a statement or inquiry by the host, a response by the student, a response by the host, and so on until a conversation corresponding to the context of the background setting is completed.
6. The method of claim 5 further comprising the step of having the animated host simultaneously provide visual cues corresponding to the statement being made by the host in the selected foreign language.
7. The method of claim 5 further comprising the step of selectively displaying on the monitor the text of the response by the host.
8. (Canceled)
9. A method for computer-aided language interaction involving a multi-media computing device, the computing device having a processor, memory, an operating system, a graphical display, a microphone, and speakers, the memory including automatic speech recognition and natural language understanding technologies, the method comprising the steps of: a. storing in memory for display on the graphical display a visual simulation of a background contextual setting; b. storing in memory for display on the graphical display a host character visually tailored to the context of the displayed background setting; c. storing in memory a library of possible conversational statements for the host that correspond to the context of the displayed background setting; d. storing in memory a library of possible responses by a student to each of the host's conversational statements; e. having the host initiate a conversation with the student corresponding to the displayed background setting in a selected language; f. analyzing a verbal response from the student utilizing an automatic speech recognizer and the stored natural language technology so as to recognize a plurality of different syntaxes, grammars, and sentence structures to discern the concept being communicated by the student; g. comparing the analyzed response to the library of possible responses by the student; h. selecting a host statement from the library of stored responses that corresponds to the translated verbal response of the student; i. causing the host response to be played on the computer speakers; j. repeating steps (f) through (i) through a plurality of conversational exchanges comprising a statement or inquiry by the host, a response by the student, a response by the host, and so on until a conversation corresponding to the context of the background setting is completed.
10. The method of claim 9 wherein the host character is animated, further comprising the steps of synchronizing said host response with a visual display of the host "mouthing" the audible response, and having the host simultaneously provide visual cues corresponding to the statement being made by the host in the selected language.
11. The method of claim 9 further comprising the step of selectively displaying on the graphical display the text of the response by the host.
12. A method for conversation-based language interaction involving a multi-media computer, the computer having a processor, memory, an operating system, a monitor, a microphone and speakers, the memory including automatic speech recognition and natural language understanding technologies, the computer further having stored in memory or otherwise being accessible a visual simulation of a background setting, a host character visually tailored to the context of the background setting, a library of conversational statements for the host that correspond to the context of the background setting, and a library of possible responses by the user to each of the host's conversational statements, the method comprising the steps of: a. recalling for display on the monitor a visual simulation of a background contextual setting; b. recalling for display on the monitor a host character visually tailored to the context of the displayed background setting; c. having the host initiate a conversation with a user corresponding to the displayed background setting in a selected language; d. analyzing a verbal response from the user to text utilizing an automatic speech recognizer and the stored natural language technology so as to recognize a plurality of different syntaxes, grammars, and sentence structures to discern the concept being communicated by the user; e. comparing the analyzed response to the library of possible responses by the user; f. selecting a host statement from the library of stored responses that corresponds to the translated verbal response of the user; g. causing the response to be played on the computer speakers; h. repeating steps (d) through (g) through a plurality of conversational exchanges comprising a statement or inquiry by the host, a response by the user, a response by the host, and so on until a conversation corresponding to the context of the background setting is completed.
13. The method of claim 12, wherein said host character is animated, further comprising the steps of synchronizing the host response with a visual display of the host "mouthing" the host response, and having the animated host simultaneously provide visual cues corresponding to the statement being made by the host in the selected language.
14. The method of claim 12 further comprising the step of selectively displaying on the monitor the text of the response by the host.
Statement Under Article 19
The ISA cites WO 98/11523 (Appleby) with respect to claims 1-3 and 5-7 as an allegedly pertinent document.
Claims 1 and 5 have been amended to incorporate the recitals of claims 4 and 8, respectively. Claim 4, for example, recites the step of "utilizing the stored natural language technology to analyze the student's response so as to recognize a plurality of different syntaxes, grammars, and sentence structures to discern the concept being communicated by the student," and claim 8 contains similar recitals. Claims 4 and 8 were not found by the ISA to lack either novelty or an inventive step over Appleby. Accordingly, claims 1 and 5 as amended are believed to be both novel and to contain an inventive step over the cited items.
Claims 2-3 and 6-7 are each dependent upon one of the aforementioned independent claims.
Claims 9 and 12 are new independent claims. Similar to original claims 4 and 8, and as combined with further subject matter appearing generally in claims 1 and 5, they each recite the step of "analyzing a verbal response from the student utilizing an automatic speech recognizer and the stored natural language technology so as to recognize a plurality of different syntaxes, grammars, and sentence structures to discern the concept being communicated by" the student or user. New claims 10-11 and 13-14 depend upon claims 9 and 12. It is respectfully submitted that new claims 9-14 are thus novel and contain an inventive step over Appleby.
PCT/US1999/023264 1998-10-15 1999-10-06 Method for computer-aided foreign language instruction WO2000022597A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU62927/99A AU6292799A (en) 1998-10-15 1999-10-06 Method for computer-aided foreign language instruction

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17335798A 1998-10-15 1998-10-15
US09/173,357 1998-10-15

Publications (1)

Publication Number Publication Date
WO2000022597A1 (en) 2000-04-20

Family

ID=22631646

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/023264 WO2000022597A1 (en) 1998-10-15 1999-10-06 Method for computer-aided foreign language instruction

Country Status (3)

Country Link
AU (1) AU6292799A (en)
TW (1) TW448379B (en)
WO (1) WO2000022597A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000043975A1 (en) * 1999-01-26 2000-07-27 Microsoft Corporation Virtual challenge system and method for teaching a language
EP1217609A2 (en) * 2000-12-22 2002-06-26 Hewlett-Packard Company Speech recognition
WO2012174506A1 (en) * 2011-06-17 2012-12-20 Rosetta Stone, Ltd. System and method for language instruction using visual and/or audio prompts
JP2020030400A (en) * 2018-08-23 2020-02-27 國立台湾師範大学 Method of education and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997021201A1 (en) * 1995-12-04 1997-06-12 Bernstein Jared C Method and apparatus for combined information from speech signals for adaptive interaction in teaching and testing
EP0801370A1 (en) * 1996-04-09 1997-10-15 HE HOLDINGS, INC. dba HUGHES ELECTRONICS System and method for multimodal interactive speech and language training
US5717828A (en) * 1995-03-15 1998-02-10 Syracuse Language Systems Speech recognition apparatus and method for learning
WO1998011523A1 (en) * 1996-09-13 1998-03-19 British Telecommunications Public Limited Company Training apparatus and method
CA2228917A1 (en) * 1997-04-14 1998-10-14 At&T Corp. System and method for providing remote automatic speech recognition services via a packet network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5717828A (en) * 1995-03-15 1998-02-10 Syracuse Language Systems Speech recognition apparatus and method for learning
WO1997021201A1 (en) * 1995-12-04 1997-06-12 Bernstein Jared C Method and apparatus for combined information from speech signals for adaptive interaction in teaching and testing
EP0801370A1 (en) * 1996-04-09 1997-10-15 HE HOLDINGS, INC. dba HUGHES ELECTRONICS System and method for multimodal interactive speech and language training
WO1998011523A1 (en) * 1996-09-13 1998-03-19 British Telecommunications Public Limited Company Training apparatus and method
CA2228917A1 (en) * 1997-04-14 1998-10-14 At&T Corp. System and method for providing remote automatic speech recognition services via a packet network
EP0872827A2 (en) * 1997-04-14 1998-10-21 AT&T Corp. System and method for providing remote automatic speech recognition services via a packet network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000043975A1 (en) * 1999-01-26 2000-07-27 Microsoft Corporation Virtual challenge system and method for teaching a language
US6234802B1 (en) 1999-01-26 2001-05-22 Microsoft Corporation Virtual challenge system and method for teaching a language
EP1217609A2 (en) * 2000-12-22 2002-06-26 Hewlett-Packard Company Speech recognition
EP1217609A3 (en) * 2000-12-22 2004-02-25 Hewlett-Packard Company Speech recognition
WO2012174506A1 (en) * 2011-06-17 2012-12-20 Rosetta Stone, Ltd. System and method for language instruction using visual and/or audio prompts
US9911349B2 (en) 2011-06-17 2018-03-06 Rosetta Stone, Ltd. System and method for language instruction using visual and/or audio prompts
JP2020030400A (en) * 2018-08-23 2020-02-27 國立台湾師範大学 Method of education and electronic device

Also Published As

Publication number Publication date
AU6292799A (en) 2000-05-01
TW448379B (en) 2001-08-01

Similar Documents

Publication Publication Date Title
US7778948B2 (en) Mapping each of several communicative functions during contexts to multiple coordinated behaviors of a virtual character
US20100304342A1 (en) Interactive Language Education System and Method
US20050255431A1 (en) Interactive language learning system and method
US20020150869A1 (en) Context-responsive spoken language instruction
Bernstein et al. Subarashii: Encounters in Japanese spoken language education
US20030028378A1 (en) Method and apparatus for interactive language instruction
WO2000030059A1 (en) Method and apparatus for increased language fluency
WO2005099414A2 (en) Comprehensive spoken language learning system
Rypa et al. VILTS: A tale of two technologies
Ehsani et al. An interactive dialog system for learning Japanese
JP6993745B1 (en) Lecture evaluation system and method
Aktuğ Common pronunciation errors of seventh grade EFL learners: A case from Turkey
Holden III Extensive listening: A new approach to an old problem
WO2000022597A1 (en) Method for computer-aided foreign language instruction
Alimin Developing listening materials for the tenth graders of Islamic senior high school
WO2002050799A2 (en) Context-responsive spoken language instruction
JP2001337594A (en) Method for allowing learner to learn language, language learning system and recording medium
Andriani The Use of Natural Reader Software in Teaching Pronunciation and Speaking Performances
Çekiç The effects of computer assisted pronunciation teaching on the listening comprehension of Intermediate learners
JP6998420B2 (en) Task-oriented digital language learning method
Lê et al. Speech-enabled tools for augmented interaction in e-learning applications
TWI227449B (en) Match-making system and method for on-line language learning
Havrylenko ESP LISTENING IN ONLINE LEARNING TO UNIVERSITY STUDENTS
PARAMIDA A Thesis
Montgomery Self-reported listening strategies by students in an intensive English language program

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU BR CA CN JP KR MX SG

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase