US7978829B2 - Voice file retrieval method - Google Patents

Publication number
US7978829B2
Authority
US
United States
Prior art keywords
voice file
amr
word
voice
specific
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires 2030-05-11
Application number
US11/637,784
Other versions
US20070280440A1 (en)
Inventor
Ying-Long Mao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Appliances Corp
Original Assignee
Inventec Appliances Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2006-05-30
Filing date
2006-12-13
Publication date
2011-07-12
2006-12-13 Application filed by Inventec Appliances Corp
Assigned to INVENTEC APPLIANCES CORP. Assignment of assignors interest (see document for details). Assignor: MAO, Ying-long
2007-12-06 Publication of US20070280440A1
2011-07-12 Application granted
2011-07-12 Publication of US7978829B2
Legal status: Active
2030-05-11 Adjusted expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems

Abstract

A voice file retrieval method comprising the steps of: inputting a word; determining if voicing of the word is needed or not; obtaining a storage home address of a voice file corresponding to the word from a voice field of the word if the voice of the word is needed; and retrieving the voice file from the storage home address. By providing a voice field of the word, the storage home address of the voice file can be directly obtained. Hence, the retrieval speed can be increased, and the time to wait for an articulation of the word can be shortened.

Description

BACKGROUND OF THE INVENTION
(1) Field of the Invention
The invention relates to a method for retrieving voice files, and more particularly to a retrieval method for directly retrieving the home address of the voice file to speed up the searching process.
(2) Description of the Prior Art
Good spoken-language ability has become a mainstream goal and a practical necessity for many careers. Various electronic dictionaries have therefore come to market to help people achieve this goal, and lexical articulation (voicing of words) is one of their basic functions.
Generally, an electronic dictionary with a lexical articulation function needs to store both the definition of each word and a voice file recording its articulation. When the articulation function is selected for a word, all the voice files have to be searched so that the corresponding voice file can be correctly retrieved for playback.
The number of voice files is huge and keeps increasing. Therefore, the storage and search of the voice files in the electronic dictionary have become a severe challenge to the industry.
Taking an electronic Chinese-English bilingual dictionary as an example, at least 50,000 voice files are needed. Even if these voice files are compression-coded into individual files in adaptive multi-rate (AMR) format, the storage space required would be at least 20,000,000,000,000-bit bytes.
Besides the burden of the huge storage space demanded, retrieving the correct voice file from such a large bank of AMR files is usually intolerably slow.
Currently, simple lexical articulation merely satisfies users' basic needs. Voicing a complete example sentence has become a popular new function for electronic dictionaries. However, retrieving the voicing of a complete sentence that consists of several words is much more complicated. Generally, several voice files have to be retrieved and composed to articulate the sentence. As a result, articulating a complete sentence is far slower than users expect.
Therefore, how to resolve the storage and retrieval problems in the electronic dictionaries is an important issue that the skilled person in the art is particularly devoted to.
SUMMARY OF THE INVENTION
Accordingly, the present invention provides a voice file retrieval method comprising the steps of:
Inputting a word;
Determining if voicing of the word is needed or not;
If positive, obtaining a storage home address of a voice file corresponding to the word from a voice field of the word; and
Retrieving the voice file from the storage home address.
In the present invention, because the storage home address of each voice file can be obtained directly from the voice field of the word, the retrieval speed can be substantially increased. That is, the time a user waits for a lexical articulation in the electronic dictionary can be greatly shortened.
All these objects are achieved by the voice file retrieval method described below.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will now be specified with reference to its preferred embodiment illustrated in the drawings, in which:
FIG. 1 is a flowchart of a preferred voice file retrieval method in accordance with the present invention; and
FIG. 2 is a schematic diagram showing how a multiple retrieval in accordance with the present invention is performed.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The invention disclosed herein is directed to a voice file retrieval method. In the following description, numerous details are set forth in order to provide a thorough understanding of the present invention. It will be appreciated by one skilled in the art that variations of these specific details are possible while still achieving the results of the present invention. In other instances, well-known components are not described in detail in order not to unnecessarily obscure the present invention.
Referring now to FIG. 1, a flowchart of a preferred embodiment of the voice file retrieval method in accordance with the present invention is shown. The voice file retrieval method comprises the steps as follows.
S1: Input a word.
S2: Determine whether voicing of the word is needed or not. If negative, the method is ended directly. If the voicing of the word is needed, go to step S3.
S3: Retrieve the word as well as its accompanying field information from the electronic dictionary. In this embodiment, the decision whether articulation of the word is needed is made before the word is retrieved. In another embodiment, the decision can be made after the word is retrieved.
S4: Obtain a storage home address of a voice file 10 corresponding to the word from the field information of the word.
In the present invention, every word is mapped to its own voice file 10, and every voice file 10 has its storage home address. Every storage home address of the voice file 10 is tagged to the voice field information of the word.
With this arrangement, after step S3 has retrieved all the field information of the word, the voice field and all the information tagged to it can be read automatically. In particular, the message tagged in the voice field information can include the storage home address 101 of the voice file 10. The storage home address 101 includes at least index information and position information.
For example, FIG. 2 is a schematic diagram showing how a multiple retrieval of the present invention is performed. In this embodiment, after the field information of the word is retrieved, the message “0 0011FF” in the voice field is read. Namely, the storage home address 101 of the voice file 10 for the word is “0 0011FF”, in which the leading “0” and the following “0011FF” stand for the index information and the position information of the voice file 10, respectively.
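As an illustration, the two-part address message can be split as in the following minimal sketch. The function name `parse_home_address` and the `HomeAddress` type are hypothetical, and the sketch assumes that both parts are hexadecimal text separated by whitespace; the description shows only the single example “0 0011FF” and does not define the encoding further.

```python
from typing import NamedTuple


class HomeAddress(NamedTuple):
    index: int     # index information: which document packet holds the file
    position: int  # position information: where the file sits in that packet


def parse_home_address(field: str) -> HomeAddress:
    """Split a voice-field message such as "0 0011FF" into index and position.

    Assumption: both parts are hexadecimal and separated by whitespace.
    """
    index_text, position_text = field.split()
    return HomeAddress(int(index_text, 16), int(position_text, 16))


address = parse_home_address("0 0011FF")
print(address.index, hex(address.position))  # 0 0x11ff
```

Because the address is read directly from the word's own field information, no search over the voice files is needed to find it.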
S5: Retrieve the voice file 10 in accordance with the storage home address 101. In this step, the voice file 10 is retrieved in accordance with the aforesaid index and position information and an address index table 20 that is preset in advance.
As shown in FIG. 2, the index table 201 is established by regrouping the voice files 10 into a plurality of document packets. In accordance with the index table 201, the document packet for a particular voice file 10 can be located. A position table 202 can be established in accordance with the storage addresses of the document packets.
It is noted that the number of the voice files 10 may be different from the number of the document packets. For example, voice files 10 for 50,000 words can be divided into 16 document packets. Namely, every document packet can contain 3,125 voice files 10. The 16 document packets can be numbered to establish an index table 201.
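The packet arithmetic in this example can be checked with a short sketch. The contiguous assignment rule in `packet_number` (file number n goes to packet n // 3,125) is an assumption made purely for illustration; the description does not state how voice files are assigned to document packets.

```python
TOTAL_VOICE_FILES = 50_000  # figure from the example
PACKET_COUNT = 16           # figure from the example

files_per_packet, remainder = divmod(TOTAL_VOICE_FILES, PACKET_COUNT)
print(files_per_packet, remainder)  # 3125 0


def packet_number(file_number: int) -> int:
    """Hypothetical rule: fill the packets contiguously, 3,125 files each."""
    if not 0 <= file_number < TOTAL_VOICE_FILES:
        raise ValueError("file number out of range")
    return file_number // files_per_packet


# Index table modelled as: packet number -> range of file numbers it holds.
index_table = {
    n: range(n * files_per_packet, (n + 1) * files_per_packet)
    for n in range(PACKET_COUNT)
}
```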
Accordingly, the aforesaid storage home address “0 0011FF” indicates that the corresponding voice file 10 is stored in the document packet numbered “0” in the index table 201.
In the document packet numbered “0”, the addresses 0x000000 to 0xFFFFFF are assigned to the voice files in this packet. The position information “0011FF” represents the position 0x0011FF in the position table 202. Namely, the target voice file 10 can be retrieved from position 0x0011FF of document packet “0”.
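The two-level lookup can be sketched as follows. This is only one possible reading: it treats the position as a byte offset inside the packet and assumes a separate per-file length record (`length_table`), which the description does not mention. The function name and the in-memory packet are likewise hypothetical stand-ins for the index table 201 and position table 202.

```python
from typing import Dict, Tuple


def retrieve_voice_file(
    index: int,
    position: int,
    index_table: Dict[int, bytes],
    length_table: Dict[Tuple[int, int], int],
) -> bytes:
    """Return the stored voice data found at `position` inside packet `index`."""
    packet = index_table[index]               # locate the document packet
    length = length_table[(index, position)]  # assumed per-file length record
    if position + length > len(packet):
        raise ValueError("voice file extends past the end of the packet")
    return bytes(packet[position:position + length])


# Toy packet "0": filler bytes up to 0x0011FF, then a 5-byte stand-in voice file.
packet_0 = bytes(0x0011FF) + b"VOICE"
index_table = {0: packet_0}
length_table = {(0, 0x0011FF): 5}

print(retrieve_voice_file(0, 0x0011FF, index_table, length_table))  # b'VOICE'
```

Both steps are direct lookups, so the cost of retrieval does not grow with the number of voice files in the dictionary.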
With the storage home address of the voice file 10 tagged directly in the voice field information of the word, only a few steps are needed to obtain the home address of the voice file 10, and the retrieval speed can thereby be substantially increased.
S6: A heading message 30 is loaded to the voice file 10 so as to form a corresponding adaptive multi-rate (AMR) voice file.
In the present invention, the voice file 10 can be separated into a heading message region and a voice message region. When AMR compression coding is performed, all the voice files 10 receive the same heading message. For example, if a pulse code modulation (PCM) voice file undergoes AMR compression coding at an 8 kHz sampling rate and a bit rate of 4.75 kbit/s, its first 7 bytes would be 0x23, 0x21, 0x41, 0x4D, 0x52, 0x0A and 0x3C.
In one embodiment of the present invention, the voice files 10 under AMR compression coding preferably have no heading message. That is, the heading messages of all the voice files 10 have been removed.
Namely, the voice file 10 of the present invention is formed by removing the heading message after the voice file undergoes AMR compression coding. The heading message can be reloaded and joined to the voice message region when the voice file is played, so that the original AMR voice file is re-formed.
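The removal and reloading of the heading message can be sketched as a simple round trip. The 7 bytes are those quoted in the example above (the first six spell "#!AMR" followed by a newline in ASCII); the function names `strip_heading` and `reload_heading` and the stand-in frame data are hypothetical.

```python
# The 7-byte heading message quoted in the example.
AMR_HEADING = bytes([0x23, 0x21, 0x41, 0x4D, 0x52, 0x0A, 0x3C])


def strip_heading(amr_file: bytes) -> bytes:
    """Remove the common heading message before the file is stored."""
    if not amr_file.startswith(AMR_HEADING):
        raise ValueError("file does not start with the expected heading message")
    return amr_file[len(AMR_HEADING):]


def reload_heading(voice_region: bytes) -> bytes:
    """Put the common heading message back before the file is played."""
    return AMR_HEADING + voice_region


original = AMR_HEADING + b"\x01\x02\x03"  # stand-in for real AMR frame data
stored = strip_heading(original)
assert reload_heading(stored) == original
print(len(original) - len(stored))  # 7
```

Since every file shares the same heading message, it only needs to be kept once, in the player logic, rather than once per stored file.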
In the aforesaid example, every voice file 10 saves 7 bytes of storage space. That is, about 341.8 KB (50,000 × 7 = 350,000 bytes) can be saved in a lexicon of 50,000 words.
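The saving quoted here follows from simple arithmetic, shown below as a quick check. It assumes that the kilobyte figure uses 1 KB = 1,024 bytes, which is the convention that reproduces the 341.8 value.

```python
HEADING_BYTES = 7    # bytes saved per voice file
WORD_COUNT = 50_000  # words (and voice files) in the lexicon

saved_bytes = HEADING_BYTES * WORD_COUNT
saved_kb = saved_bytes / 1024  # assumption: 1 KB = 1,024 bytes

print(saved_bytes, round(saved_kb, 1))  # 350000 341.8
```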
In another embodiment of the present invention, the heading message can always remain with the voice file. In that case, the aforesaid separating and reloading processes for the heading message are no longer necessary.
S7: Load the voice file into the built-in memory. In this step, the original AMR voice file with the heading message is stored into the built-in memory.
S8: Play or broadcast the AMR voice file with an AMR player or broadcaster.
S9: After the AMR voice file is played, determine whether or not a replay of the AMR voice file is required. If positive, the AMR player plays the AMR voice file one more time.
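Steps S1 to S9 can be tied together in a minimal sketch. All names here (`voice_word`, the `"voice"` field key, the callback parameters) are hypothetical, and the fetch and play operations are passed in as stand-ins rather than implemented. The sketch also assumes that the replay decision of S9 is asked again after every play, which is one reading of the step.

```python
from typing import Callable, Dict

# The 7-byte heading message quoted in the description.
AMR_HEADING = bytes([0x23, 0x21, 0x41, 0x4D, 0x52, 0x0A, 0x3C])


def voice_word(
    word: str,
    dictionary: Dict[str, Dict[str, str]],
    voicing_needed: bool,
    fetch: Callable[[str], bytes],
    play: Callable[[bytes], None],
    replay_wanted: Callable[[], bool],
) -> int:
    """Run steps S2 to S9 for an input word (S1); return the number of plays."""
    if not voicing_needed:                 # S2: end directly if no voicing
        return 0
    fields = dictionary[word]              # S3: word plus its field information
    home_address = fields["voice"]         # S4: address from the voice field
    voice_region = fetch(home_address)     # S5: retrieve the stored voice file
    amr_file = AMR_HEADING + voice_region  # S6: reload the heading message
    buffer = bytes(amr_file)               # S7: hold the full file in memory
    play(buffer)                           # S8: play the AMR voice file
    plays = 1
    while replay_wanted():                 # S9: replay while the user asks
        play(buffer)
        plays += 1
    return plays


played = []
answers = iter([True, False])              # replay once, then stop
count = voice_word(
    "apple",
    {"apple": {"definition": "a fruit", "voice": "0 0011FF"}},
    True,
    fetch=lambda address: b"\x01\x02",     # stand-in for the packet lookup
    play=played.append,
    replay_wanted=lambda: next(answers),
)
print(count)  # 2
```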
In the present invention, a retrieval relationship between the word and its corresponding voice file has been established. By providing this relationship, the storage home address of the voice file can be directly obtained from the voice field of the word. Thereby, the retrieval speed of the voice file can be substantially increased and the idle time for a user to wait for an articulation is greatly shortened.
In the present invention, the voice file is stored as an AMR voice file stripped of its heading message during the storage step. Therefore, the storage space required for the voice files can be greatly reduced. By providing the present invention, both of the aforesaid problems in the art, retrieval speed and storage space, can be substantially resolved. The voice file retrieval method provided by the present invention is particularly suitable for mainstream slim mobile communication devices, which can provide only limited storage space.
While the present invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the present invention.

Claims (6)

1. A voice file retrieval method comprising the steps of:
inputting a word;
determining if voicing of the word is needed or not;
when voicing of the word is needed, obtaining a storage home address of an Adaptive Multi-Rate (AMR) voice file without a heading message corresponding to the word from a voice field of the word;
retrieving the voice file from the storage home address;
reloading the heading message to the AMR voice file after the AMR voice file is retrieved;
displaying the AMR voice file by an AMR displayer; and
determining whether or not a replay of said AMR voice file is needed, after said AMR displayer plays said AMR voice file.
2. The voice file retrieval method according to claim 1, wherein the AMR displayer displays the AMR voice file one more time, when the replay of the AMR voice file is needed.
3. The voice file retrieval method according to claim 1, wherein the storage home address includes an index information and a position information, the AMR voice file being retrieved by referring to an index table for the index information and the position information.
4. An Adaptive Multi-Rate (AMR) voice file retrieval method, applied to an electronic dictionary of a mobile communication device, the electronic dictionary including a plurality of words and a plurality of AMR voice files respective to the words, the AMR voice files being individually formed with a common heading message removed, the method comprising the steps of:
inputting a specific word of the words with a field information;
determining if voicing of the specific word is needed or not;
retrieving the specific word from the words for obtaining all of the field information of the specific word;
when voicing of the specific word is needed, obtaining a storage home address of a specific AMR voice file of the AMR voice files corresponding to the specific word from a voice field of the field information of the specific word;
retrieving the specific AMR voice file from the storage home address;
reloading the common heading message to the AMR specific voice file after the specific AMR voice file is retrieved;
displaying the specific AMR voice file by an AMR displayer; and
determining if or not a replay of said specific AMR voice file is needed, after said AMR displayer plays said specific AMR voice file.
5. The AMR voice file retrieval method according to claim 4, wherein the AMR displayer displays the specific AMR voice file one more time, when the specific AMR voice file needs to be replayed.
6. The AMR voice file retrieval method according to claim 4, wherein said storage home address includes an index information and a position information, said specific AMR voice file being retrieved by referring to an index table for the index information and the position information.
US11/637,784 2006-05-30 2006-12-13 Voice file retrieval method Active 2030-05-11 US7978829B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
TW95119230 2006-05-30
TW095119230A TW200744068A (en) 2006-05-30 2006-05-30 Voice file retrieving method
TW95119230A 2006-05-30

Publications (2)

Publication Number Publication Date
US20070280440A1 (en) 2007-12-06
US7978829B2 (en) 2011-07-12

Family

ID=38790190

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/637,784 Active 2030-05-11 US7978829B2 (en) 2006-05-30 2006-12-13 Voice file retrieval method

Country Status (2)

Country Link
US (1) US7978829B2 (en)
TW (1) TW200744068A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI383376B (en) * 2009-08-14 2013-01-21 Kuo Ping Yang Method and system for voice communication

Citations (8)

Publication number Priority date Publication date Assignee Title
US5920559A (en) * 1996-03-15 1999-07-06 Fujitsu Limited Voice information service system to be connected to ATM network
US6031915A (en) * 1995-07-19 2000-02-29 Olympus Optical Co., Ltd. Voice start recording apparatus
US6356634B1 (en) * 1999-02-25 2002-03-12 Noble Systems Corporation System for pre-recording and later interactive playback of scripted messages during a call engagement
US6493427B1 (en) * 1998-06-16 2002-12-10 Telemanager Technologies, Inc. Remote prescription refill system
US6879957B1 (en) * 1999-10-04 2005-04-12 William H. Pechter Method for producing a speech rendition of text from diphone sounds
US20060199594A1 (en) * 2005-03-04 2006-09-07 Veerabhadra Gundu Restructuring data packets to improve voice quality at low bandwidth conditions in wireless networks
US7746847B2 (en) * 2005-09-20 2010-06-29 Intel Corporation Jitter buffer management in a packet-based network
US7808988B2 (en) * 2006-02-10 2010-10-05 Packet Video Corporation System and method for connecting mobile devices

Patent Citations (8)

Publication number Priority date Publication date Assignee Title
US6031915A (en) * 1995-07-19 2000-02-29 Olympus Optical Co., Ltd. Voice start recording apparatus
US5920559A (en) * 1996-03-15 1999-07-06 Fujitsu Limited Voice information service system to be connected to ATM network
US6493427B1 (en) * 1998-06-16 2002-12-10 Telemanager Technologies, Inc. Remote prescription refill system
US6356634B1 (en) * 1999-02-25 2002-03-12 Noble Systems Corporation System for pre-recording and later interactive playback of scripted messages during a call engagement
US6879957B1 (en) * 1999-10-04 2005-04-12 William H. Pechter Method for producing a speech rendition of text from diphone sounds
US20060199594A1 (en) * 2005-03-04 2006-09-07 Veerabhadra Gundu Restructuring data packets to improve voice quality at low bandwidth conditions in wireless networks
US7746847B2 (en) * 2005-09-20 2010-06-29 Intel Corporation Jitter buffer management in a packet-based network
US7808988B2 (en) * 2006-02-10 2010-10-05 Packet Video Corporation System and method for connecting mobile devices

Also Published As

Publication number Publication date
TW200744068A (en) 2007-12-01
US20070280440A1 (en) 2007-12-06
TWI303804B (en) 2008-12-01

Similar Documents

Publication Publication Date Title
US8396714B2 (en) Systems and methods for concatenation of words in text to speech synthesis
US8712776B2 (en) Systems and methods for selective text to speech synthesis
US8117026B2 (en) String matching method and system using phonetic symbols and computer-readable recording medium storing computer program for executing the string matching method
US20100082348A1 (en) Systems and methods for text normalization for text to speech synthesis
US20100082327A1 (en) Systems and methods for mapping phonemes for text to speech synthesis
US20100082346A1 (en) Systems and methods for text to speech synthesis
US20070213983A1 (en) Spell checking system including a phonetic speller
US20070193437A1 (en) Apparatus, method, and medium retrieving a highlighted section of audio data using song lyrics
KR20080000203A (en) Method for searching music file using voice recognition
JP2013047809A (en) Methods and apparatus for automatically extending voice vocabulary of mobile communications devices
US20170060531A1 (en) Devices and related methods for simplified proofreading of text entries from voice-to-text dictation
CN105489072A (en) Method for the determination of supplementary content in an electronic device
RU2008128440A (en) METHOD AND DEVICE FOR ACCESSING A DIGITAL FILE FROM A SET OF DIGITAL FILES
JP5465926B2 (en) Speech recognition dictionary creation device and speech recognition dictionary creation method
US7978829B2 (en) Voice file retrieval method
US20050137869A1 (en) Method supporting text-to-speech navigation and multimedia device using the same
US20080192906A1 (en) Method and system for message management for audio storage devices
CN110600003A (en) Robot voice output method and device, robot and storage medium
KR20060014043A (en) Voice response system, voice response method, voice server, voice file processing method, program and recording medium
JP2004289560A (en) Image recording and reproducing method and image recording and reproducing apparatus
US20070136240A1 (en) Compact disc playing system and it spalying method
US20040243398A1 (en) Voice recording and reproducing apparatus and additional voice information recording method
KR20010018370A (en) Format For Digital Data Used for Studying Language and Method For Its Replay
CN101369447B (en) Method and system for information management of sound frequency memory devices
JP4189653B2 (en) Image recording / reproducing method and image recording / reproducing apparatus

Legal Events

Date Code Title Description
AS Assignment

Owner name: INVENTEC APPLIANCES CORP., TAIWAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MAO, YING-LONG;REEL/FRAME:018675/0332

Effective date: 20061106

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12