EP1073036A2 - Parsing of downloaded documents for a speech synthesis enabled browser - Google Patents

Publication number
EP1073036A2
Authority
EP
European Patent Office
Prior art keywords
data
speech
text data
data file
control
Prior art date
Legal status
Granted
Application number
EP00306355A
Other languages
German (de)
French (fr)
Other versions
EP1073036B1 (en)
EP1073036A3 (en)
Inventor
Mitsuru Otsuka
Yasuko Miyazaki
Shinichi Kamiyama
Takashi Aso
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Application filed by Canon Inc
Publication of EP1073036A2
Publication of EP1073036A3
Application granted
Publication of EP1073036B1
Anticipated expiration
Legal status: Expired - Lifetime (current)

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems

Abstract

A browser is provided which is operable to download web pages from a remote server and to process each received web page to identify whether or not it includes a command identifying that a speech data file exists for the retrieved web page. If such a command is present, the browser retrieves the appropriate speech data file and uses it to synthesize speech associated with the web page currently being viewed.

Description

  • The present invention relates to a speech processing method and apparatus. The invention has particular, although not exclusive relevance to the retrieval of data including speech data, which comprises control data and phoneme data identifying phoneme symbols, from a remote server over a communications channel.
  • Web browsing software exists for allowing users to browse web pages which are stored on servers located throughout the Internet. There are currently two techniques which allow the browsing software to output speech relevant to the web page. In particular, either the remote server can store and send the user recorded voice data which is relevant to the web page, or the user's browser may include a text to speech synthesizer which converts the hypertext in the web page into speech.
  • The problem with sending recorded speech waveform data is that speech waveform data files are relatively large and they must be transmitted over the Internet to the user. Further, each time the user selects a new web page, new speech waveform data files may need to be downloaded. Since this type of speech waveform data is much larger than the hypertext associated with the web page, this will slow down the browser's operation. This is not a problem where the user's browser converts the hypertext of the web page into speech using a text to speech synthesizer. However, with this technique, the synthesized speech can have a poor quality since it depends upon the accuracy of the text to speech converter used by the synthesizer in the web browser.
  • According to one aspect, the present invention provides a web browser which is operable to process web pages received from the Internet in order to determine whether or not there is an associated speech data file stored for the web page in a server on the Internet and, if there is, means for retrieving the speech data file from that server.
  • The invention also provides a data retrieval system which can process received text data to determine whether or not there is an associated speech data file and, if there is, means for retrieving the speech data file from a remote storage location.
  • Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings in which:
  • Figure 1 is a block diagram illustrating a server, a client and a computer network which allows the client to browse web pages stored on the server;
  • Figure 2a is a flow diagram illustrating the processing steps performed during a browsing operation using the system shown in Figure 1;
  • Figure 2b illustrates the data format of a web page which can be read out by the client from the server shown in Figure 1 and which includes a speech command which informs the client that speech data is stored for this web page;
  • Figure 2c illustrates another data format of a web page which can be read out by the client from the server shown in Figure 1 and which includes a speech command which informs the client that speech data is stored for this web page;
  • Figure 2d illustrates another data format of a web page which can be read out by the client from the server shown in Figure 1 and which includes a speech command which informs the client that speech data is stored for this web page;
  • Figure 3 illustrates the phoneme symbols used in the speech data and, next to each symbol, an example of how the phoneme is used;
  • Figure 4 illustrates prosody control data which is embedded within the phoneme data and which controls the pronunciation of the phoneme data using a speech synthesizer;
  • Figure 5 illustrates the form of speaker control data which is transmitted from the server to the client and which controls the speed and pitch of the speech generated by a synthesizer forming part of the client shown in Figure 1;
  • Figure 6 illustrates two examples of text and the corresponding speech data and prosody data transmitted by the server;
  • Figure 7 illustrates a further example of text for display and corresponding speech data transmitted by the server to the client which includes both prosody control data and speaker control data;
  • Figure 8 is a block diagram illustrating a client server system similar to that shown in Figure 1 but which includes a user interface which allows a user to set parameters of the speaker control data;
  • Figure 9 is a flow diagram illustrating the processing steps performed during a browsing operation using the system shown in Figure 8;
  • Figure 10 illustrates the processing steps performed by the speech synthesizer under control of user commands input via the user interface;
  • Figure 11a illustrates a displayed user interface which allows a user of the client to control the synthesizing operation of the speech synthesizer;
  • Figure 11b illustrates another displayed user interface which allows a user of the client to control the synthesizing operation of the speech synthesizer;
  • Figure 12 is a table illustrating the correlation between key-strokes and their corresponding function in a keyboard controlled user interface for allowing a user to control the synthesizing operation of the speech synthesizer;
  • Figure 13 is a block diagram illustrating a client server system similar to that shown in Figure 8 but which allows a user to input commands via the user interface to move to another part of the speech data or to other linked data;
  • Figure 14 is a flow diagram illustrating the processing steps performed during a browsing operation using the system shown in Figure 13;
  • Figure 15 is a flow diagram illustrating the processing steps performed in controlling the operation of the speech synthesizer in moving to another part of the speech data or to other linked data;
  • Figure 16 illustrates an example of text for display and corresponding speech data transmitted by the server to the client, which speech data includes links to other data files in addition to the prosody control data and the speaker control data;
  • Figure 17 is a table illustrating the correlation between key-strokes and their corresponding function in a keyboard controlled user interface used to control the browsing operation of the system shown in Figure 13;
  • Figure 18 is a block diagram of a client server system similar to that shown in Figure 1, in which the client does not include a speech synthesizer but which includes a module for receiving a speech synthesizer from the server; and
  • Figure 19 is a flow diagram illustrating the processing steps performed during a browsing operation using the system shown in Figure 18.
  • FIRST EMBODIMENT
  • Figure 1 is a block diagram showing an Internet browsing system embodying the present invention. Although the invention is described with reference to an Internet browsing system, the invention may be used in any hypertext system which includes a first and second apparatus which are connected together via a network or other communications channel and in which data from the second apparatus can be retrieved by the first apparatus and can be output as speech.
  • As shown in Figure 1, the system comprises a client 1, a server 3 and a communications network 5 for linking the client 1 to the server 3. The system is generally called a client server system. Both the client 1 and the server 3 comprise the modules illustrated in Figure 1 and operate under control of a stored program in order to carry out the process steps shown in Figure 2. The control program for each of the client and server may be stored in a memory (not shown) at the time of manufacture, or it may be read out from a storage medium detachable from the client or server, or downloaded from a remote terminal connected to the client or server via a computer network.
  • The operation of the client server system shown in Figure 1 will now be described, referring also to Figure 2. As shown in Figure 2, in step s101, an input address for a web page (which is a hypertext data file) is input by a user of the client 1. The input web address is stored in the address memory 110 and identifies the server which stores the web page and the file name of the web page that the client wishes to download. The input web address is then transmitted, in step s102, by a web address transmitting unit 101 to the server 3, where it is received, in step s151, by a web page communicating unit 102. The web page communicating unit 102 then reads out the web page corresponding to the received web page address from the storage unit 103 and transmits it, in step s152, back to the client 1. The web page receiving unit 104 in the client 1 receives, in step s103, the web page transmitted from the server 3 and a web page display unit 105 in the client 1 develops image and character data on the basis of the received web page and displays it on a display (not shown) in step s104. Then, in step s105, a speech data require unit 106 in the client 1 processes the received web page and determines whether it comprises a requirement for speech data. In particular, the speech data require unit 106 processes the received web page to determine whether or not it includes a predetermined speech command which identifies that a speech data file exists which is associated with the received web page.
  • Figures 2b, 2c and 2d, illustrate the form of different types of web pages which may be received in step s103 by the client 1. As shown, each of the web pages includes a header portion 11 which defines, among other things, the size and position of a window in which the web page will be displayed; a text portion 13 which includes the text to be displayed and a speech command portion 15 which identifies that there is speech data associated with the current web page. As shown in Figures 2b, 2c and 2d, the speech command portion 15 may occur before or after the text portion 13 and will comprise at least the file name 17 of the file holding the associated speech data. In this embodiment, the speech command also includes an address portion 19 which identifies where the speech data file is stored. In most cases, the address 19 will identify the same server 3 which transmitted the current web page being viewed by the user. However this is not essential and the address portion 19 may identify a different server. Further, as illustrated in Figure 2d, the speech command may include the file names 17a and 17b of more than one speech data file and the corresponding address portions 19a and 19b. In this embodiment, the address portion 19 also identifies where the speech data file 17 is to be found within the identified server.
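  • For illustration only, the following Python sketch shows how a client might scan a received page of the kind shown in Figures 2b to 2d for the speech command portion 15 and extract the file name 17 and address portion 19. The patent does not specify concrete markup, so the tag and attribute names used here are assumptions.

      import re

      # Hypothetical markup for a page with a header portion (11), a text portion (13)
      # and a speech command portion (15) carrying a file name (17) and an address (19).
      SAMPLE_PAGE = '''
      <html>
        <head><title>Example page</title></head>
        <body>
          <p>This is a text-to-speech system.</p>
          <speech file="greeting.spd" address="http://www.example.com/speech/"/>
        </body>
      </html>
      '''

      # The <speech file=... address=.../> syntax is an illustrative assumption.
      SPEECH_COMMAND = re.compile(r'<speech\s+file="([^"]+)"\s+address="([^"]+)"\s*/>')

      def find_speech_commands(page):
          """Return (file name, address) pairs, one per speech command in the page."""
          return SPEECH_COMMAND.findall(page)

      # find_speech_commands(SAMPLE_PAGE) -> [('greeting.spd', 'http://www.example.com/speech/')]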
  • If at step s105, the client 1 determines that there is a requirement for speech data, then the processing proceeds to step s106, otherwise it proceeds to step s109. In step s106, a speech data require unit 106 transmits a request for the speech data. This request for speech data includes the file name 17 for the speech data file and its address 19. A speech data communicating unit 107 located within the server 3 receives, in step s153, the speech data request transmitted from the client 1 and then retrieves the appropriate speech data file from the speech data storage unit 108 and then transmits it back to the client 1 in step s154. A speech synthesizer 109 located within the client 1 receives, in step s107, the speech data file transmitted from the server 3, transforms the speech data into a synthesized speech signal in step s108 and outputs the synthesized speech signal to a loudspeaker (not shown).
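  • As a minimal sketch of steps s106 and s107 from the client's side, assuming an HTTP transport (the patent does not fix a protocol) and the hypothetical address and file name extracted above, the request for the speech data file could look like this:

      from urllib.parse import urljoin
      from urllib.request import urlopen

      def request_speech_data(address, file_name, timeout=10.0):
          """Send the file name 17 and address 19 from the speech command to the
          server and return the speech data file it transmits back (steps s106/s107)."""
          url = urljoin(address, file_name)   # e.g. http://www.example.com/speech/greeting.spd
          with urlopen(url, timeout=timeout) as response:
              return response.read()

  • The returned speech data would then be handed to the speech synthesizer 109 for step s108.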
  • The processing then proceeds to step s109 where the client 1 determines whether or not the browsing process is finished. If it has, then the processing ends, otherwise the processing returns to step s101 where the process is continued for a new input address. In this embodiment, the browser is a computer program run on a conventional computer system and which is displayed as a window on the display of the computer system. Therefore, in order to finish the browsing process, the user of the client 1 can simply close the window currently running the browser program.
  • In this embodiment, the speech data within the speech data file transmitted from the server 3 to the client 1 includes phoneme data (identifying phoneme symbols) which defines the sound signals to be output by the speech synthesizing unit 109; prosody control data which is used to control the stress and intonation of the sounds defined by the phoneme data; and speaker control data which controls the pitch, speed and type of speaker (male or female) of the speech signals generated by the speech synthesizing unit 109. Figure 3 illustrates the phoneme symbols used in this embodiment which are transmitted from the server 3 to the client 1. Figure 3 also gives, next to each symbol, an example of how the corresponding phoneme is used. As shown in Figure 3, the phoneme symbols are split into three groups - those relating to simple vowels 25, those relating to diphthongs 27 and those relating to consonants 29. Figure 4 illustrates the prosody control data used in this embodiment, which, in use, is embedded within the transmitted phoneme data. As shown in Figure 4, the prosody control data includes symbols which identify boundaries between syllables and words and symbols for controlling the stress applied to vowels and other symbols which control the pronunciation of the phonemes. Figure 5 illustrates the form of the speaker control data used in this embodiment. The top half 31 of Figure 5 illustrates the way in which the speaker control data is used, whilst the lower half 33 of Figure 5 illustrates the parameters that can be set. As shown, the parameters include the speed at which the speech is generated (which can be set to a value of 1 to 4); the pitch of the generated speech (which can be set to a value indicative of a shift from a predetermined standard speaker's pitch); and the type of speaker (which can either be set to male or female).
  • Figure 6 illustrates the phoneme data together with the embedded prosody control data generated for the example text: "New York City." and "This is a text-to-speech system.". As shown, the prosody control data used in these examples include the use of a primary stress control symbol "1" which causes the speech synthesizer 109 to place more stress on the subsequent vowel and a period control symbol "." which causes the speech synthesizer 109 to add a pause.
  • Figure 7 illustrates a further example which shows text data for display 39 together with corresponding speech data 41 which includes the use of both the above described prosody control data and speaker control data. As shown, the speaker control data may be inserted within the phoneme data so that, for example, the type of speaker outputting the speech may be changed during the speech synthesizing operation.
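  • The exact symbol sets of Figures 3 to 5 are not reproduced above, so the sketch below only illustrates the general structure of such a speech data stream: phoneme symbols with embedded prosody symbols ("1" for primary stress, "." for a pause) and embedded speaker control commands for speed (1 to 4), pitch shift and speaker type. The "\name=value" command notation is an assumption, not the patent's syntax.

      from dataclasses import dataclass, replace

      @dataclass
      class SpeakerSettings:
          # Parameters described for Figure 5.
          speed: int = 2         # 1 (slow) .. 4 (fast)
          pitch: int = 0         # shift from a standard speaker's pitch
          speaker: str = "male"  # "male" or "female"

      def parse_speech_data(data):
          """Split a speech data string into (token, settings) pairs, applying any
          speaker control commands embedded in the phoneme data as they appear."""
          settings = SpeakerSettings()
          tokens = []
          for item in data.split():
              if item.startswith("\\"):             # embedded speaker control data
                  name, _, value = item[1:].partition("=")
                  if name == "speed":
                      settings.speed = int(value)
                  elif name == "pitch":
                      settings.pitch = int(value)
                  elif name == "speaker":
                      settings.speaker = value
              else:                                 # phoneme or prosody symbol
                  tokens.append((item, replace(settings)))
          return tokens

      # Example call (placeholder phoneme symbols, not those of Figure 3):
      # parse_speech_data("\\speaker=female \\speed=3 nn 1 uu .")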
  • As those skilled in the art will appreciate, since the speech data file which is transmitted from the server 3 to the client 1 only includes phoneme data and appropriate control data, the speech data file will be much smaller than a corresponding recorded speech waveform file and it can therefore be retrieved from the server 3 more quickly. Further, since the data for controlling the way in which the speech synthesizer 109 synthesizes the speech is also transmitted from the server 3, it can be set in advance by the owner of the web site. Therefore, the web site owner can set the control data so that the client's speech synthesizer 109 will output the speech in a predetermined manner.
  • SECOND EMBODIMENT
  • A second embodiment of the client server system shown in Figure 1 will now be described with reference to Figures 8 to 12. Figure 8 is a block diagram illustrating the client server system of this embodiment and Figure 9 is a flow diagram illustrating the processing steps performed by this client server system during a browsing operation. As those skilled in the art will appreciate by comparing Figures 8 and 9 with Figures 1 and 2a, the main difference between this embodiment and the first embodiment is in the provision of a speech synthesizer user interface 209 within the client 1. The description of this second embodiment will therefore be restricted to the interaction of this speech synthesizer user interface unit 209 with the other modules of the system.
  • In this embodiment, when the speech data require unit 106 determines that the received web page includes a speech command identifying that a speech data file exists which is associated with the current web page being viewed, it starts the speech synthesizer user interface unit 209 as well as transmitting the requirement for speech data to the server 3. Once the speech data communicating unit 107 located within the server 3 retrieves the appropriate speech data file and transmits it back to the client 1, the speech synthesizer 109 synthesizes a corresponding speech signal under control of the speech synthesizer user interface unit 209.
  • Figure 11a shows an example of a graphical user interface which the speech synthesizer user interface 209 outputs to the display (not shown) of the client 1, and which allows the user of the client to set various control parameters and input various commands used to control the operation of the speech synthesizer 109. In particular, the graphical user interface illustrated in Figure 11a allows the user to start, stop, pause and restart the operation of the speech synthesizer 109, to change the pitch and speed of the synthesized speech signal and to change the type of speaker etc.
  • Figure 10 is a flow chart illustrating the way in which commands and settings input via the speech synthesizer user interface 209 control the operation of the speech synthesizer 109. As shown, in step s20901, the speech synthesizer 109 receives a command input by the user via the user interface 209. The speech synthesizer 109 then determines, in step s20902, if the input command is a start command. If it is, then the processing proceeds to step s20903 where the speech synthesizer 109 begins to output a synthesized speech signal from the received speech data and the processing returns to step s20901 where the speech synthesizer 109 awaits the next input command.
  • If at step s20902, the input command is not the start command, then the processing proceeds to step s20904 where the speech synthesizer 109 determines if the input command is a stop command. If it is, then the processing proceeds to step s20905 where the speech synthesizer 109 stops outputting the synthesized speech signal and the processing returns to step s20901 where the speech synthesizer 109 awaits the next input command.
  • If at step s20904, the input command is not the stop command, then the processing proceeds to step s20906 where the speech synthesizer 109 determines whether or not the input command is a pause command. If it is, then the processing proceeds to step s20907 where the speech synthesizer 109 pauses outputting the synthesized speech signal at the current location in the speech data file and then the processing returns to step s20901 where the speech synthesizer 109 awaits the next input command.
  • If at step s20906, the speech synthesizer 109 determines that the input command is not the pause command, then processing proceeds to step s20908 where the speech synthesizer 109 determines whether or not the input command is a restart command. If it is, then the processing proceeds to step s20909 where the speech synthesizer 109 restarts outputting the synthesized speech signal from the current location (corresponding to the location where the outputting of the synthesized speech was paused) in the speech data file and then the processing returns to step s20901 where the speech synthesizer awaits the next input command.
  • If at step s20908, the speech synthesizer 109 determines that the input command is not the restart command, then the processing proceeds to step s20910 where the speech synthesizer 109 determines if the input command is a command to change the pitch of the synthesized speech. If it is, then the processing proceeds to step s20911 where the pitch of the generated speech signal is changed on the basis of the pitch level set by the user. The processing then returns to step s20901 where speech synthesizer 109 awaits the next input command.
  • If at step s20910, the speech synthesizer 109 determines that the input command is not a change of pitch command, then the processing proceeds to step s20912 where the speech synthesizer 109 determines if the input command is for changing the speed at which the speech is being synthesized. If it is, then the processing proceeds to step s20913 where the speech synthesizer 109 changes the speed at which the speech is being synthesized on the basis of the speed level set by the user. The processing then returns to step s20901 where the speech synthesizer awaits the next input command.
  • If at step s20912, the speech synthesizer 109 determines that the input command is not a speed change command, then processing proceeds to step s20914 where the speech synthesizer 109 determines if the input command is a command to change the type of speaker. If it is, then the processing proceeds to step s20915 where the settings of the speech synthesizer 109 are changed in accordance with the type of speaker set by the user. The processing then returns to step s20901 where the next input command is awaited.
  • If at step s20914, the speech synthesizer 109 determines that the input command is not a command to change the type of speaker, then the processing proceeds to step s20916 where the speech synthesizer 109 determines whether or not the input command is to end the synthesizing operation. If it is, then the processing ends, otherwise the processing returns to step s20901 where the speech synthesizer 109 awaits the next input command.
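  • The decision chain of Figure 10 (steps s20901 to s20916) amounts to a command dispatch loop. The sketch below assumes a synthesizer object exposing start/stop/pause/restart/set_pitch/set_speed/set_speaker methods and a source of (command, value) pairs; both interfaces are illustrative, not the patent's.

      def run_user_interface(synth, next_command):
          """Dispatch user interface commands to the speech synthesizer 109."""
          while True:
              command, value = next_command()   # s20901: await the next input command
              if command == "start":            # s20902 -> s20903
                  synth.start()
              elif command == "stop":           # s20904 -> s20905
                  synth.stop()
              elif command == "pause":          # s20906 -> s20907
                  synth.pause()
              elif command == "restart":        # s20908 -> s20909
                  synth.restart()
              elif command == "pitch":          # s20910 -> s20911
                  synth.set_pitch(value)
              elif command == "speed":          # s20912 -> s20913
                  synth.set_speed(value)
              elif command == "speaker":        # s20914 -> s20915
                  synth.set_speaker(value)
              elif command == "end":            # s20916: end the synthesizing operation
                  break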
  • In this embodiment, the speech synthesizer user interface 209 is a graphical user interface such as the two example interfaces shown in Figure 11. As shown, the interfaces include a control button 10 for inputting a command to start a synthesizing operation; a control button 11 for stopping a synthesizing operation and a control button 12 for controlling the pausing and restarting of a synthesizing operation, each of which is activated by a user controlled cursor (not shown) in a known manner. The interfaces also include a menu select button 13 for changing the type of speaker from a male speaker to a female speaker and vice versa and two sliders 14 and 15 for controlling the pitch and speed of the synthesized speech respectively. In the interface shown in Figure 11a, a progress slider 16 is also provided to show the user the progress of the speech synthesizer in generating synthesized speech signals for the received speech data file.
  • As those skilled in the art will appreciate, as an alternative to providing a graphical user interface, the interface may be provided through a numerical or general keyboard forming part of the client, provided the relation between the keys and the commands is defined in advance. Figure 12 is a table illustrating the correlation between keys of a numeric keyboard and a general keyboard which may be programmed in advance to allow the user to control the speech synthesizer 109 using the keys of the keyboards.
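  • Figure 12's actual key assignments are not reproduced here, so the mapping below is only a hypothetical example of how keyboard keys could be bound to the same commands used by the dispatch loop above.

      # Hypothetical key bindings in the spirit of Figure 12.
      KEY_TO_COMMAND = {
          "s": ("start", None),
          "x": ("stop", None),
          "p": ("pause", None),
          "r": ("restart", None),
          "+": ("pitch", +1),      # raise the pitch one step
          "-": ("pitch", -1),      # lower the pitch one step
          "f": ("speed", +1),      # speak faster
          "d": ("speed", -1),      # speak slower
          "m": ("speaker", "male"),
          "w": ("speaker", "female"),
          "q": ("end", None),
      }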
  • As those skilled in the art will appreciate, with the client of this second embodiment, the user can change the default settings of the control parameters used to control the speech synthesizing operation set by the web site owners. In this way, the user of the client can customize the way in which the speech synthesizer synthesizes the speech in accordance with the user's preferences.
  • THIRD EMBODIMENT
  • A third embodiment of the client server system shown in Figure 1 will now be described with reference to Figures 13 to 17. Figure 13 is a block diagram illustrating the client server system of this embodiment and Figure 14 is a flow diagram illustrating the processing steps performed by this client server system during a browsing operation. As those skilled in the art will appreciate by comparing Figures 8 and 9 with Figures 13 and 14, the main difference between this embodiment and the second embodiment is the arrow coming from the speech synthesizer 109 to the address transmitting unit 101.
  • This allows addresses for other speech data or other hypertext data which is embedded within the received speech data file to be extracted and the corresponding data retrieved. The description of this third embodiment will therefore be restricted to the processing which is performed in this situation.
  • Referring to Figure 14, after the speech synthesizer 109 begins to synthesize speech in step s108, the processing proceeds to step s310 where the speech synthesizer 109 determines whether or not a hypertext address has been found in, and output from, the received speech data. If no address is output, then the processing proceeds to step s109 as before. On the other hand, if an address is output, then the processing returns to step s101 where the output address is input to the address transmitting unit 101 so that the linked web page is accessed.
  • In this embodiment, in addition to generating a synthesized speech signal corresponding to the received speech data on the basis of commands within the speech data and/or commands input by the user via the user interface 209, the speech synthesizer 109 also responds to input commands designating a move to another part of the received speech data or to a linked web page. The way in which this is performed will now be described with reference to Figures 15 and 16. In particular, Figure 15 is a flow chart illustrating the processing steps performed by the speech synthesizer 109 in determining whether or not to move to another part of the speech data or to linked data. Figure 16 shows an example of speech data which includes link data 162. Figure 16 also shows the corresponding text which is displayed to the user and which illustrates to the user that there is a link (by virtue of the underlined text). As shown in the speech data portion of Figure 16, the link data 162 includes an address 162-1 which identifies the location of related information (which may be a related web site or it may be a further file stored in the same web site) and speech data 162-2 of a message for the user (which in this example is: "For Canon-group companies pages, push M key now").
  • Returning to Figure 15, the process shown starts when the user inputs, in step s30901, one of the commands shown in Figure 17 whilst the speech synthesizer 109 is synthesizing speech. As shown in Figure 17, the commands include "next link", "previous link" and "go to the link". Once an input command is received in step s30901, the processing proceeds to step s30902 where the speech synthesizer 109 determines whether or not the input command is the command to move to the next link. If it is, then the processing proceeds to step s30903 where the speech synthesizer 109 searches forward from its current location within the speech data file, for the next link 162 and restarts the synthesizing operation from the speech data 162-2 within that link 162. The processing then returns to step s30901 where the next input command is awaited. In this way, the user can cause the speech synthesizer to skip portions of the speech data being synthesized.
  • If at step s30902, the speech synthesizer 109 determines that the input command is not a command to go to the next link, then the processing proceeds to step s30904 where the speech synthesizer 109 determines whether or not the input command is a command to return to the previous link. If it is, then the processing proceeds to step s30905 where the speech synthesizer 109 searches backward from its current location for the previous link and then restarts the synthesizing operation from the speech data 162-2 within that link 162. The processing then proceeds to step s30901 where the next input command is awaited.
  • If at step s30904, the speech synthesizer 109 determines that the input command is not a command to return to the previous link, then the processing proceeds to step s30906 where the speech synthesizer 109 determines whether or not the input command was to go to the link. If it is, then the speech synthesizer retrieves the hypertext address 162-1 from the link data and outputs the address in step s30907 to the address transmitting unit 101. The processing then ends. If, however, at step s30906 the speech synthesizer 109 determines that the input command is not to go to the link, then the processing returns to step s30901 where the next input command is awaited.
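  • A minimal sketch of the link handling in steps s30902 to s30907 is given below, assuming the received speech data has already been parsed into a list of segments in which link segments carry the address 162-1 and the spoken message 162-2; this list representation is an assumption made for illustration.

      from dataclasses import dataclass
      from typing import Optional

      @dataclass
      class Segment:
          speech: str                          # phoneme/prosody data to synthesize
          link_address: Optional[str] = None   # address 162-1, or None for ordinary speech

      def handle_link_command(segments, position, command):
          """Return (new position, address to fetch or None) for one user command."""
          if command == "next link":           # s30902 -> s30903: search forward
              for i in range(position + 1, len(segments)):
                  if segments[i].link_address is not None:
                      return i, None           # restart synthesis from that link's message 162-2
          elif command == "previous link":     # s30904 -> s30905: search backward
              for i in range(position - 1, -1, -1):
                  if segments[i].link_address is not None:
                      return i, None
          elif command == "go to the link":    # s30906 -> s30907: output the address to unit 101
              if segments[position].link_address is not None:
                  return position, segments[position].link_address
          return position, None                # no link found or command not recognised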
  • Therefore, in this way, the user can control the synthesizing of the current speech data file and can cause the client to access a linked hypertext address to retrieve further text data and/or further speech data.
  • In order to inform the user that the speech data 162-2 currently being synthesized corresponds to a link 162, the speech synthesizer 109 can be configured so that speech data 162-2 which forms part of a link is synthesized with a different speed or pitch or with a different type of speaker. If the speaker type is to change, then this can simply be set to be the opposite of the speaker type for speech being generated from speech data which is not part of a link.
  • FOURTH EMBODIMENT
  • A fourth embodiment of the client server system shown in Figure 1 will now be described with reference to Figures 18 and 19. Figure 18 is a block diagram illustrating the client server system of this embodiment and Figure 19 is a flow diagram illustrating the processing steps initially performed during a browsing operation. The main difference between this embodiment and the first embodiment is that the client 1 in this embodiment does not include a speech synthesizer. Therefore, prior to or at the same time that the client 1 transmits a request for the speech data, it transmits a request to the server 3 to send the client 1 a speech synthesizer. The way in which this is performed will now be described in more detail.
  • Referring to Figure 19, after the received hypertext has been displayed to the user in step s104, the processing proceeds to step s405 where the client 1 determines whether or not there is a speech synthesizer module in the client. If there is, then the processing shown in Figure 19 ends and the processing returns to step s105 shown in Figure 2, where the downloading of the speech data is performed in the manner described above.
  • If, however, at step s405 the client determines that there is no speech synthesizer, then processing proceeds to step s406 where a speech data requesting unit 406 transmits a request for a speech synthesizer module to the server 3. This request is received, in step s453, by the speech synthesizer module transmitting unit 407. The speech synthesizer module transmitting unit 407 then retrieves an appropriate speech synthesizer module from the storage unit 408 and transmits it, in step s454, back to the client 1. The transmitted speech synthesizer is received in step s407 by a speech synthesizer receiving unit 409. The processing then proceeds to step s408 where the received speech synthesizer is initialized and set into a working condition. The processing shown in Figure 19 then ends and returns to step s105 shown in Figure 2.
  • In some situations, although the client may have a speech synthesizer stored on, for example, a hard disk, it may not currently be running. In this case, rather than downloading the speech synthesizer from the server 3, the client may simply load the stored speech synthesizer into the working memory and set the synthesizer into a working condition.
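The following sketch outlines steps s405 to s408 under the assumption that the speech synthesizer module is fetched over HTTP from a hypothetical /synthesizer endpoint on the server; the endpoint, the module format and the loading mechanism are assumptions made for the example and are not specified in this document. It also covers the case just mentioned, in which a module already present on disk is simply loaded and initialized.

```python
# Sketch of steps s405-s408.  The /synthesizer endpoint, the module format
# (a Python file) and the loader are illustrative assumptions.
import importlib.util
import os
import urllib.request


def ensure_synthesizer(server_url: str, module_path: str = "synthesizer.py"):
    if not os.path.exists(module_path):                        # step s405: no synthesizer present
        # step s406: request the speech synthesizer module from the server
        with urllib.request.urlopen(f"{server_url}/synthesizer") as response:
            with open(module_path, "wb") as f:
                f.write(response.read())                       # step s407: module received
    # step s408: load the module and set it into a working condition
    spec = importlib.util.spec_from_file_location("synthesizer", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```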
  • In the above embodiments, the web page received from the server included both text data and a speech command which identifies that a speech data file exists for the received web page. This speech command included both a file name and an address for this speech data file. In an alternative embodiment, the speech command may only include a file name for the speech data file. In this case, the client may be programmed to assume that the speech data file is stored on the server from which it downloaded the web page. Therefore, in such an embodiment, the request for the speech data file would be transmitted to the server using the address stored in the address memory 110.
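A minimal sketch of this fallback: when the speech command carries only a file name, the request for the speech data file is directed to the server from which the web page was downloaded. The dictionary-based representation of the speech command and the example addresses and file extension are hypothetical stand-ins, not the command syntax used above.

```python
# Sketch only: resolve the location of the speech data file.  The dict-based
# command and the ".spd" file name below are hypothetical examples.
from urllib.parse import urljoin


def resolve_speech_file(speech_command: dict, page_address: str) -> str:
    """page_address plays the role of the address held in the address memory 110."""
    if "address" in speech_command:               # command carries a full storage location
        return urljoin(speech_command["address"], speech_command["file"])
    # only a file name: assume the file is stored on the originating server
    return urljoin(page_address, speech_command["file"])


# e.g. resolve_speech_file({"file": "page1.spd"}, "http://www.foo.bar/xxx.html")
# -> "http://www.foo.bar/page1.spd"
```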
  • In the third embodiment described above, the user interface allowed a user to move to different portions of a speech data file being synthesized and to retrieve a further web page and/or a further speech data file. In particular, the user interface allowed the user to input a command to move from the current point in the speech data file to a next or to a previous link. In this way, the links were used as control characters within the speech data files in addition to providing a link to another web page or another speech data file. In an alternative embodiment, the speech data file may be arranged in paragraphs or sections, with each paragraph or section having a control header which can be used to control movement of the speech synthesizing operation through the speech data file. For example, in such an embodiment, the user may input a command to move to a next paragraph or to a previous paragraph.
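A brief sketch of this alternative arrangement, assuming the speech data file is divided into sections that each begin with a control header. The Section type and the command names are illustrative assumptions.

```python
# Sketch only: navigate a speech data file arranged in sections with control
# headers.  The Section type and command strings are assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    header: str          # control header marking the start of the section
    phonemes: str        # phoneme and control data of the section body


def move(sections: List[Section], current: int, command: str) -> int:
    if command == "next paragraph":
        return min(current + 1, len(sections) - 1)
    if command == "previous paragraph":
        return max(current - 1, 0)
    return current       # unrecognized command: stay in the current section
```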

Claims (55)

  1. A data retrieval apparatus comprising:
    means for receiving text data;
    means for processing the received text data to determine whether or not it includes data identifying a speech data file which is associated with the received text data and which includes phoneme data and control data for use by a speech synthesizer in synthesizing a speech signal corresponding to the phoneme data under control of the control data; and
    means responsive to said processing means, for retrieving the speech data file.
  2. An apparatus according to claim 1, further comprising means for transmitting a request for the text data to a remote terminal which stores the text data.
  3. An apparatus according to claim 1 or 2, comprising storing means for storing one or more predetermined commands identifying speech data and wherein said processing means is operable to compare said stored predetermined commands with the received text data.
  4. An apparatus according to any preceding claim, wherein said retrieving means comprises means for transmitting a request for said speech data file to a storage location and means for receiving said speech data file from said storage location.
  5. An apparatus according to any preceding claim, further comprising a user interface for allowing a user of said apparatus to set one or more control parameters of said control data.
  6. An apparatus according to any preceding claim, wherein said control data comprises prosody control data for controlling pronunciation of said phoneme data.
  7. An apparatus according to any preceding claim, wherein said control data comprises speaker control data for controlling speaker related parameters of said speech.
  8. An apparatus according to claim 7, wherein said speaker related parameters include the speed at which the speech is synthesized.
  9. An apparatus according to claim 7 or 8, wherein said speaker related parameters include the pitch at which said speech is synthesized.
  10. An apparatus according to any of claims 7 to 9, wherein said speaker related parameters include whether the speech is synthesized as a male voice or a female voice.
  11. An apparatus according to any preceding claim, wherein said speech data file includes one or more links to other data and wherein said apparatus comprises means, responsive to a command input by a user, for retrieving said other data.
  12. An apparatus according to claim 11, wherein said one or more links include an address portion which identifies the storage location of said other data and speech data associated with said link.
  13. An apparatus according to any preceding claim, wherein said retrieving means is operable to retrieve a speech synthesizer from a remote terminal for synthesizing speech signals corresponding to said speech data.
  14. An apparatus according to any of claims 1 to 12 further comprising a speech synthesizer operable to receive said phoneme data and said control data and operable to output a speech signal corresponding to the phoneme data under control of said control data.
  15. An apparatus according to claim 14, further comprising a loudspeaker for generating speech sounds corresponding to the speech signal synthesized by said speech synthesizer.
  16. An apparatus according to claim 14 or 15, wherein said retrieving means is operable to retrieve set up data for controlling a set up procedure for said synthesizer prior to the generation of said speech signal corresponding to the speech data.
  17. An apparatus according to any of claims 14 to 16, further comprising a user interface for allowing a user to move to different portions of a speech data file to be synthesized.
  18. An apparatus according to claim 17 when dependent upon claim 11, wherein said user interface allows a user to move to a next link or to a previous link within said speech data file.
  19. An apparatus according to any preceding claim, further comprising output means for outputting the received text data to a user.
  20. An apparatus according to claim 19, wherein said output means comprises a display.
  21. An apparatus according to claim 19 or 20, wherein said processing means is operable to process said text data after said output means has output said text data to said user.
  22. An apparatus according to any preceding claim, wherein said text data forms part of a hypertext data file.
  23. An apparatus according to any preceding claim, wherein said text data forms part of a web page.
  24. An apparatus according to any preceding claim, wherein said text data includes data identifying a storage location of said speech data file and wherein said retrieving means is operable to retrieve said speech data file from said storage location.
  25. A web browser comprising a data retrieval apparatus according to any preceding claim and a display for displaying the received text data and a loudspeaker for outputting synthesized speech.
  26. A data retrieval system comprising one or more computer terminals storing at least text data and wherein at least one of said terminals also stores speech data corresponding to some of the stored text data, a data retrieval apparatus according to any preceding claim for retrieving text data and speech data from said terminals and a communications network for linking said retrieval apparatus with said one or more terminals and through which retrieved text data and speech data passes.
  27. A system according to claim 26, wherein said one or more computer terminals are servers, wherein said communications network forms part of the Internet and said data retrieval apparatus comprises a web browser.
  28. A data retrieval method comprising the steps of:
    receiving text data;
    processing the received text data to determine whether or not it includes data identifying a speech data file which is associated with the received text data and which includes phoneme data and control data for use by a speech synthesizer in synthesizing speech corresponding to the phoneme data under control of the control data; and
    retrieving the speech data file.
  29. A method according to claim 28, further comprising the step of transmitting a request for the text data to a remote terminal which stores the text data.
  30. A method according to claim 28 or 29, comprising the step of storing one or more predetermined commands identifying speech data and wherein said processing step compares said stored predetermined commands with the received text data.
  31. A method according to any of claims 28 to 30, wherein said retrieving step comprises the steps of transmitting a request for said speech data file to a storage location and receiving said speech data file from said storage location.
  32. A method according to any of claims 28 to 31, further comprising the step of receiving control parameters from a user interface.
  33. A method according to any of claims 28 to 32, wherein said control data comprises prosody control data for controlling pronunciation of said phoneme data.
  34. A method according to any of claims 28 to 33, wherein said control data comprises speaker control data for controlling speaker related parameters of said speech.
  35. A method according to claim 34, wherein said speaker related parameters include the speed at which the speech is synthesized.
  36. A method according to claim 34 or 35, wherein said speaker related parameters include the pitch at which said speech is synthesized.
  37. A method according to any of claims 34 to 36, wherein said speaker related parameters include whether the speech is synthesized as a male voice or a female voice.
  38. A method according to any of claims 28 to 37, wherein said speech data file includes one or more links to other data and wherein said method comprises the step of retrieving said other data in response to a command input by a user.
  39. A method according to claim 38, wherein said one or more links include an address portion which identifies the storage location of said other data and speech data associated with said link.
  40. A method according to any of claims 28 to 39, wherein said retrieving step retrieves a speech synthesizer from a remote terminal for synthesizing speech signals corresponding to said speech data.
  41. A method according to any of claims 28 to 39, further comprising the steps of synthesizing and outputting a speech signal corresponding to the phoneme data under control of said control data.
  42. A method according to claim 41, further comprising the step of outputting said synthesized speech signal to a loudspeaker.
  43. A method according to claim 41 or 42, wherein said retrieving step retrieves set up data for controlling a set up procedure for a speech synthesizer prior to the generation of said speech signal corresponding to the speech data.
  44. A method according to any of claims 41 to 43, further comprising the step of receiving an input command from a user to move to different portions of a speech data file to be synthesized.
  45. A method according to claim 44 when dependent upon claim 38, wherein said input command is a command to move to a next link or to a previous link within said speech data file.
  46. A method according to any of claims 28 to 45, further comprising the step of outputting the received text data to a user.
  47. A method according to claim 46, wherein said output step outputs said text data to a display.
  48. A method according to claim 46 or 47, wherein said processing step is performed after said output step has output said text data to said user.
  49. A method according to any of claims 28 to 48, wherein said text data forms part of a hypertext data file.
  50. A method according to any of claims 28 to 49, wherein said text data forms part of a web page.
  51. A method according to any of claims 28 to 50, wherein said text data includes data identifying a storage location of said speech data file and wherein said retrieving step retrieves said speech data file from said storage location.
  52. A data retrieval method comprising the steps of:
    at a first computer terminal:
    receiving text data;
    processing the received text data to determine whether or not it includes data identifying a speech data file which is associated with the received text data and which includes phoneme data and control data for use by a speech synthesizer in synthesizing a speech signal corresponding to the phoneme data under control of the control data; and
    requesting a second remote computer terminal to send said speech data file to said first computer terminal; and
    at said second remote computer terminal:
    receiving said request for said speech data file;
    retrieving said speech data file in accordance with said request; and
    transmitting said retrieved speech data file to said first computer terminal.
  53. A method according to claim 52 performed over the Internet.
  54. A storage medium storing processor implementable instructions for controlling a processor to implement the method of any one of claims 28 to 53.
  55. Processor implementable instructions for controlling a processor to implement the method of any one of claims 28 to 53.
EP00306355A 1999-07-30 2000-07-26 Parsing of downloaded documents for a speech synthesis enabled browser Expired - Lifetime EP1073036B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP21721999 1999-07-30
JP11217219A JP2001043064A (en) 1999-07-30 1999-07-30 Method and device for processing voice information, and storage medium

Publications (3)

Publication Number Publication Date
EP1073036A2 true EP1073036A2 (en) 2001-01-31
EP1073036A3 EP1073036A3 (en) 2003-12-17
EP1073036B1 EP1073036B1 (en) 2005-12-14

Family

ID=16700731

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00306355A Expired - Lifetime EP1073036B1 (en) 1999-07-30 2000-07-26 Parsing of downloaded documents for a speech synthesis enabled browser

Country Status (3)

Country Link
EP (1) EP1073036B1 (en)
JP (1) JP2001043064A (en)
DE (1) DE60024727T2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4014361B2 (en) * 2001-01-31 2007-11-28 シャープ株式会社 Speech synthesis apparatus, speech synthesis method, and computer-readable recording medium recording speech synthesis program
JP4225703B2 (en) * 2001-04-27 2009-02-18 インターナショナル・ビジネス・マシーンズ・コーポレーション Information access method, information access system and program
JP2002358092A (en) * 2001-06-01 2002-12-13 Sony Corp Voice synthesizing system
JP4653572B2 (en) * 2005-06-17 2011-03-16 日本電信電話株式会社 Client terminal, speech synthesis information processing server, client terminal program, speech synthesis information processing program

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0542628A2 (en) * 1991-11-12 1993-05-19 Fujitsu Limited Speech synthesis system
WO1997023973A1 (en) * 1995-12-22 1997-07-03 Rutgers University Method and system for audio access to information in a wide area computer network
US5915001A (en) * 1996-11-14 1999-06-22 Vois Corporation System and method for providing and using universally accessible voice and speech data files
EP0848373A2 (en) * 1996-12-13 1998-06-17 Siemens Corporate Research, Inc. A sytem for interactive communication
US5899975A (en) * 1997-04-03 1999-05-04 Sun Microsystems, Inc. Style sheets for speech-based presentation of web pages
GB2336974A (en) * 1998-04-28 1999-11-03 Ibm Singlecast interactive radio system

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003054731A2 (en) * 2001-12-20 2003-07-03 Siemens Aktiengesellschaft Method for conducting a computer-aided transformation of structured documents
WO2003054731A3 (en) * 2001-12-20 2004-04-01 Siemens Ag Method for conducting a computer-aided transformation of structured documents
CN1301452C (en) * 2003-11-11 2007-02-21 富士通株式会社 Modal synchronization control method and multimodal interface system
WO2008132533A1 (en) * 2007-04-26 2008-11-06 Nokia Corporation Text-to-speech conversion method, apparatus and system
CN103827961A (en) * 2011-10-28 2014-05-28 日立公共系统有限公司 Apparatus for providing text data with synthesized voice information and method for providing text data
US9318104B1 (en) 2013-02-20 2016-04-19 Google Inc. Methods and systems for sharing of adapted voice profiles
US9117451B2 (en) 2013-02-20 2015-08-25 Google Inc. Methods and systems for sharing of adapted voice profiles
CN105190745A (en) * 2013-02-20 2015-12-23 谷歌公司 Methods and systems for sharing of adapted voice profiles
WO2014130177A1 (en) * 2013-02-20 2014-08-28 Google Inc. Methods and systems for sharing of adapted voice profiles
CN105190745B (en) * 2013-02-20 2017-02-08 谷歌公司 Methods and device for sharing of adapted voice profiles
CN106847258A (en) * 2013-02-20 2017-06-13 谷歌公司 Method and apparatus for sharing adjustment speech profiles
CN106847258B (en) * 2013-02-20 2020-09-29 谷歌有限责任公司 Method and apparatus for sharing an adapted voice profile
WO2015116151A1 (en) * 2014-01-31 2015-08-06 Hewlett-Packard Development Company, L.P. Voice input command
US10978060B2 (en) 2014-01-31 2021-04-13 Hewlett-Packard Development Company, L.P. Voice input command
CN110737817A (en) * 2018-07-02 2020-01-31 中兴通讯股份有限公司 Information processing method and device of browser, intelligent device and storage medium

Also Published As

Publication number Publication date
DE60024727T2 (en) 2006-07-20
JP2001043064A (en) 2001-02-16
EP1073036B1 (en) 2005-12-14
EP1073036A3 (en) 2003-12-17
DE60024727D1 (en) 2006-01-19

Similar Documents

Publication Publication Date Title
US5983184A (en) Hyper text control through voice synthesis
EP1490861B1 (en) Method, apparatus and computer program for voice synthesis
US5899975A (en) Style sheets for speech-based presentation of web pages
EP2112650B1 (en) Speech synthesis apparatus, speech synthesis method, speech synthesis program, portable information terminal, and speech synthesis system
US20020110248A1 (en) Audio renderings for expressing non-audio nuances
JP2001521194A (en) System and method for aurally representing a page of HTML data
KR20030040486A (en) Method and system for synchronizing audio and visual presentation in a multi-modal content renderer
JPH10274997A (en) Document reading-aloud device
JPH06214741A (en) Graphics user interface for control of text-to-voice conversion
EP1073036B1 (en) Parsing of downloaded documents for a speech synthesis enabled browser
US6732078B1 (en) Audio control method and audio controlled device
JP2741833B2 (en) System and method for using vocal search patterns in multimedia presentations
JP2000231475A (en) Vocal reading-aloud method of multimedia information browsing system
JP2001268669A (en) Device and method for equipment control using mobile telephone terminal and recording medium
US6876969B2 (en) Document read-out apparatus and method and storage medium
JP2001306601A (en) Device and method for document processing and storage medium stored with program thereof
JP4311710B2 (en) Speech synthesis controller
WO1997037344A1 (en) Terminal having speech output function, and character information providing system using the terminal
JPH09311775A (en) Device and method voice output
JP2010146381A (en) Web page browsing apparatus and program
WO2013061719A1 (en) Device for providing text data appended with speech synthesis information, and method for providing text data
JPH08272388A (en) Device and method for synthesizing voice
JP3838193B2 (en) Text-to-speech device, program for the device, and recording medium
JP2002268664A (en) Voice converter and program
JP2005181358A (en) Speech recognition and synthesis system

Legal Events

Code  Description
PUAI  Public reference made under article 153(3) EPC to a published international application that has entered the European phase (ORIGINAL CODE: 0009012)
AK    Designated contracting states (kind code of ref document: A2): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
AX    Request for extension of the European patent: AL; LT; LV; MK; RO; SI
PUAL  Search report despatched (ORIGINAL CODE: 0009013)
AK    Designated contracting states (kind code of ref document: A3): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
AX    Request for extension of the European patent, extension state: AL LT LV MK RO SI
17P   Request for examination filed, effective date: 20040510
AKX   Designation fees paid, designated state(s): DE FR GB
GRAP  Despatch of communication of intention to grant a patent (ORIGINAL CODE: EPIDOSNIGR1)
GRAS  Grant fee paid (ORIGINAL CODE: EPIDOSNIGR3)
GRAA  (Expected) grant (ORIGINAL CODE: 0009210)
AK    Designated contracting states (kind code of ref document: B1): DE FR GB
REG   Reference to a national code: GB, legal event code FG4D
REF   Corresponds to ref document number 60024727 (DE), date of ref document: 20060119, kind code: P
ET    FR: translation filed
PLBE  No opposition filed within time limit (ORIGINAL CODE: 0009261)
STAA  Information on the status of an EP patent or granted EP patent: NO OPPOSITION FILED WITHIN TIME LIMIT
26N   No opposition filed, effective date: 20060915
PGFP  Annual fee paid to national office [announced via postgrant information from national office to EPO]: DE, payment date 20130731, year of fee payment 14
PGFP  Annual fee paid to national office [announced via postgrant information from national office to EPO]: FR, payment date 20130726, year of fee payment 14; GB, payment date 20130712, year of fee payment 14
REG   Reference to a national code: DE, legal event code R119, ref document number 60024727
GBPC  GB: European patent ceased through non-payment of renewal fee, effective date: 20140726
REG   Reference to a national code: FR, legal event code ST, effective date: 20150331
PG25  Lapsed in a contracting state [announced via postgrant information from national office to EPO]: DE, lapse because of non-payment of due fees, effective date: 20150203
REG   Reference to a national code: DE, legal event code R119, ref document number 60024727, effective date: 20150203
PG25  Lapsed in a contracting state [announced via postgrant information from national office to EPO]: GB, lapse because of non-payment of due fees, effective date: 20140726; FR, lapse because of non-payment of due fees, effective date: 20140731