US6205420B1 - Method and device for instantly changing the speed of a speech - Google Patents
- Publication number
- US6205420B1 (application US09/180,429, US18042998A)
- Authority
- US (United States)
- Prior art keywords
- block
- data
- speech
- speech data
- connection
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/04—Time compression or expansion
- G10L21/003—Changing voice quality, e.g. pitch or formants
Definitions
- the connection order generator includes a read/write memory for storing the extension scaling factors in time for the respective attributes, and a connection order deciding processor for reading the extension scaling factors in time for the respective attributes stored in the read/write memory at a predetermined time interval, and generating the block connection order of the block speech data and the connection data at every moment based on the extension scaling factors, the block lengths output from the block data storing portion, and the already-connected information output from the speech data connector.
- FIG. 1 is a block diagram showing an example of a speech speed converting method according to the present invention and a speech speed converting device as an embodiment
- FIG. 2 is a schematic view showing an example of connection data generating steps executed in a connection data generator shown in FIG. 1;
- FIG. 3 is a schematic view showing an example of connection order generating steps executed in a connection order generator shown in FIG. 1 .
- FIG. 1 is a block diagram showing an embodiment of a speech speed converting device according to the present invention.
- a speech speed converting device 1 shown in this figure comprises an A/D converter 2 for converting an input speech signal into digital speech data, an analysis processor 3 for analyzing attributes of the speech data, a block data splitter 4 for splitting the speech data into blocks to generate block speech data, a block data memory 5 for storing the block speech data, a connection data generator 6 for generating connection data necessary for connecting the block speech data, a connection data memory 7 for storing the connection data, a connection order generator 8 for generating the connection order of the block speech data and the connection data, a speech data connector 9 for generating a series of speech data by connecting the block speech data and the connection data based on the connection order, and a D/A converter 10 for converting the series of speech data into speech signals.
- the speech speed converting device 1 applies an analysis process based on the attributes to the speech data input from the speaker, splits the speech data into blocks each having a predetermined time width according to the analyzed information derived by that process, and stores the block data. Also, in order to expand the speech data in time, the speech speed converting device 1 generates, for every block, speech data to be replaced or inserted between adjacent block speech data, and stores those data as well.
- the speech speed converting device 1 generates a block connection order for producing output speech data at any speech speed in response to the operation of the listener, and then connects sequentially, according to that connection order, the speech data already split into blocks and stored (block speech data) and the to-be-replaced/inserted speech data already stored (connection data) to generate the output speech data.
- the A/D converter 2 comprises an A/D converter circuit for converting an input speech signal into digital speech data by sampling the input speech signal at a predetermined sampling rate (e.g., 32 kHz), and a FIFO memory for receiving and storing the digital speech data output from the A/D converter circuit and then outputting them in FIFO fashion.
- the A/D converter 2 receives the speech signal input at an input terminal on the speaker side, e.g., the speech signal output from a microphone or from an analogue sound output terminal of a video or audio device such as a television or a radio, A/D-converts the speech signal into digital speech data, and supplies the resultant speech data to the analysis processor 3 and the block data splitter 4 while buffering them.
- the analysis processor 3 sequentially executes an input process for receiving the speech data output from the A/D converter 2; a decimation (thinning) process for reducing the load of the succeeding processes by lowering the sampling rate of the speech data obtained by the input process to 4 kHz; an attribute analysis process for analyzing attributes of the speech data output from the A/D converter 2 and of the speech data obtained by the decimation process, to divide the speech data into voiced sound, voiceless sound, and silence; and a block length decision process for detecting periodicity of the voiced sound, the voiceless sound, and the silence by autocorrelation analysis and then deciding the block lengths required to divide the speech data (block lengths required to prevent disadvantages such as a change in voice tone, e.g., a lowered voice, due to the repetition of block units) based on the detected results.
- the analysis processor 3 then supplies the resultant split information (block lengths of the voiced sound, the voiceless sound, and the silence) to the block data splitter 4.
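The decimation (thinning) step above can be sketched as follows. This is a minimal illustration: the text specifies only the rate reduction from 32 kHz to 4 kHz (a factor of 8), so the plain moving-average anti-alias filter used here is an assumption for brevity, not the patent's filter design.

```python
def decimate(samples, factor=8):
    """Low-pass the signal (moving average over `factor` samples) and then
    keep every `factor`-th sample, reducing e.g. 32 kHz data to 4 kHz."""
    out = []
    for i in range(0, len(samples) - factor + 1, factor):
        out.append(sum(samples[i:i + factor]) / factor)
    return out

# One millisecond at 32 kHz (32 samples) becomes 4 samples at 4 kHz.
x = [float(i % 2) for i in range(32)]   # crude high-frequency content
y = decimate(x)
print(y)   # [0.5, 0.5, 0.5, 0.5]
```

Each output sample averages away the alternating 0/1 content, which is exactly the aliasing the low-pass step is there to suppress.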
- a sum of squares of the speech data output from the A/D converter 2 is calculated by using a window width of about 30 ms, and power values P of the speech data are calculated at an interval of about 5 ms. The power values P are compared with a previously set threshold value Pmin; a data area satisfying “P < Pmin” is decided to be a silent interval, while a data area satisfying “Pmin ≦ P” is decided to be a voiced sound interval or a voiceless sound interval. Then, zero-crossing analysis of the speech data output from the A/D converter 2, autocorrelation analysis of the speech data obtained by the decimation process, etc. are carried out.
- the data area of the speech data which satisfies “Pmin ≦ P” belongs to either the voice interval with vibration of the vocal cords (voiced sound interval) or the voice interval without vibration of the vocal cords (voiceless sound interval).
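The power-based pre-classification above can be sketched as follows. `P_MIN` is a hypothetical threshold chosen for this toy signal, not a value given in the text; voiced vs. voiceless separation (zero-crossing and autocorrelation analysis) is omitted.

```python
import math

FS = 32000                 # sampling rate of the A/D converter (Hz)
WIN = int(0.030 * FS)      # ~30 ms window -> 960 samples
HOP = int(0.005 * FS)      # ~5 ms interval -> 160 samples
P_MIN = 1.0                # hypothetical threshold for this toy signal

def classify_frames(samples):
    """Label each ~30 ms frame 'silent' (P < P_MIN) or 'sound' (P_MIN <= P),
    where P is the sum of squares of the samples in the frame."""
    labels = []
    for start in range(0, len(samples) - WIN + 1, HOP):
        frame = samples[start:start + WIN]
        p = sum(s * s for s in frame)          # power value P
        labels.append('silent' if p < P_MIN else 'sound')
    return labels

# 30 ms of silence followed by 30 ms of a 200 Hz tone
tone = [math.sin(2 * math.pi * 200 * t / FS) for t in range(WIN)]
labels = classify_frames([0.0] * WIN + tone)
print(labels[0], labels[-1])   # silent sound
```

The first frame (all zeros) falls below the threshold; the last frame (pure tone) exceeds it.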
- attributes such as the noise or the background sound like the music may be considered as attributes of the speech data being output from the A/D converter 2 .
- the noise and the background sound are classified into any one of the voiced sound, the voiceless sound, and the silent.
- the above block length decision process applies autocorrelation analyses having different long/short window widths to the speech data decided as voiced sound intervals by the attribute analysis process, over the wide range of 1.25 ms to 28.0 ms in which pitch periods of the voiced sound are distributed, detects the pitch periods (the vibration periods of the vocal cords) as precisely as possible, and then decides block lengths based on the detection results such that respective pitch periods correspond to respective block lengths.
- the above block length decision process also detects periodicity of less than 10 ms from the speech data in the intervals decided as voiceless sound intervals and silent intervals by the attribute analysis process, and then decides the block lengths based on the detected results. As a result, the respective block lengths of the voiced sound, the voiceless sound, and the silence are supplied as split information to the block data splitter 4.
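The pitch-based block length decision for a voiced interval can be sketched as follows. A single analysis window is used here for brevity, whereas the text combines autocorrelation analyses with different long/short window widths; the 1.25 ms to 28 ms lag range is taken from the text.

```python
import math

FS = 4000  # the analysis runs on the decimated 4 kHz data

def pitch_block_length(samples):
    """Return the lag (in samples) in the 1.25-28 ms pitch range that
    maximizes the normalized autocorrelation; use it as the block length."""
    lo = int(0.00125 * FS)            # 1.25 ms -> 5 samples
    hi = int(0.028 * FS)              # 28 ms  -> 112 samples
    energy = sum(s * s for s in samples)
    best_lag, best_r = lo, -1.0
    for lag in range(lo, hi + 1):
        r = sum(samples[i] * samples[i + lag]
                for i in range(len(samples) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag

# 100 Hz voiced-like signal: pitch period = 40 samples at 4 kHz
voiced = [math.sin(2 * math.pi * 100 * t / FS) for t in range(400)]
print(pitch_block_length(voiced))     # 40
```

Using one pitch period per block is what lets a block be repeated later without the tone-change artifacts the text warns about.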
- the block data splitter 4 splits the speech data output from the A/D converter 2 based on the block lengths of the voiced sound intervals, the voiceless sound intervals, and the silent intervals indicated by the split information output from the analysis processor 3. Then, the block data splitter 4 supplies the speech data obtained by this split process in block units (block speech data) and the block lengths of those speech data to both the block data memory 5 and the connection data generator 6.
- the block data memory 5 is equipped with a ring buffer.
- the block data memory 5 receives the block speech data (speech data in block units) and the block lengths of the speech data output from the block data splitter 4, stores them temporarily in the ring buffer, reads the temporarily stored block lengths as appropriate, and supplies the block lengths to the connection order generator 8. Also, the block data memory 5 reads the temporarily stored block speech data as appropriate and supplies them to the speech data connector 9.
- the connection data generator 6 receives the block speech data output from the block data splitter 4, applies, for every block, windows to the speech data located at a start portion of a concerned block and to the speech data located at a start portion of a succeeding block by using an A window and a B window which change linearly in a time interval d (ms), as shown in FIG. 2, overlap-adds the start portion of the succeeding block to the start portion of the concerned block to generate the connection data of the time interval d (ms), and then supplies the connection data to the connection data memory 7.
- a value from 0.5 ms up to the shorter of the block lengths of the concerned block and the succeeding block can be selected as the time interval d; bounding d by the shorter block length keeps the required buffer capacity of the connection data memory 7 small.
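The window-and-overlap-add operation of FIG. 2 can be sketched as a linear cross-fade. The exact window endpoints (A falling from 1 to 0 while B rises from 0 to 1 over the interval d) are an assumption consistent with the description of linearly changing complementary windows.

```python
def make_connection_data(concerned, succeeding, d_samples):
    """Generate d_samples of connection data by applying a linearly falling
    A window to the start of the concerned block, a linearly rising B window
    to the start of the succeeding block, and overlap-adding the two."""
    assert d_samples <= min(len(concerned), len(succeeding))
    out = []
    for i in range(d_samples):
        a = 1.0 - i / d_samples        # A window: linear fall 1 -> 0
        b = i / d_samples              # B window: linear rise 0 -> 1
        out.append(a * concerned[i] + b * succeeding[i])
    return out

# Fading a constant 1.0 block into a constant 0.0 block over 4 samples
conn = make_connection_data([1.0] * 8, [0.0] * 8, 4)
print(conn)    # [1.0, 0.75, 0.5, 0.25]
```

Because the windows sum to one at every sample, inserting this data between two blocks avoids a discontinuity at the joint.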
- the connection data memory 7 has a ring buffer; it receives the connection data output from the connection data generator 6, stores them temporarily in the ring buffer, reads the temporarily stored connection data as appropriate, and supplies the connection data to the speech data connector 9.
- the connection order generator 8 includes a writable memory for storing the expansion magnifications in time of the respective attributes, which are input by the listener operating a digital setting means such as a digital volume; and a connection order deciding processor for reading the expansion magnifications in time of the respective attributes stored in the writable memory at a predetermined, previously set time interval, e.g., about every 100 ms, and generating at every moment the connection order (the connection order required to implement the desired speech speed set by the listener) of the speech data in block units and the connection data in block units, based on these expansion magnifications, the respective block lengths output from the block data memory 5, and the already-connected information output from the speech data connector 9.
- connection data which correspond to the finally connected block, out of the connection data output from the connection data memory 7, are replaced/inserted at a timing that satisfies a condition given in terms of the following quantities:
- Si is the total sum of all the block lengths of the block speech data, counted from a start time T0, which have already been output from the block data memory 5 to the speech data connector 9 before the speech speed is changed;
- So is the total sum of all the block lengths of the block speech data, counted from the start time T0, which have already been connected;
- r (where r ≧ 1.0) is the target expansion magnification; and
- L is the block length of the block speech data which have been connected last.
- a connection order indicating that the remaining blocks are to be connected sequentially after this block is generated and then supplied to the speech data connector 9.
- the connection data corresponding to the block ( 8 ) are replaced/inserted after the block ( 8 ), and then the part of the block ( 8 ) located after the part employed in generating the connection data is connected repeatedly.
- the block ( 4 ) has already been connected repeatedly once.
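The text defines Si, So, r, and L, but the inequality itself does not survive in this text. The sketch below therefore assumes one plausible reading of the timing condition — repeat the lastly connected block (via its connection data) while doing so keeps the connected total within the target r × Si — and that assumption, not the patent's literal condition, is what the `while` test implements.

```python
def connection_order(block_lengths, r):
    """Yield (block_index, repeat_flag) pairs approximating expansion r.
    repeat_flag=True marks a repetition realized through connection data."""
    order = []
    s_i = 0.0   # Si: input-side total of block lengths delivered so far
    s_o = 0.0   # So: output-side total of block lengths connected so far
    for idx, length in enumerate(block_lengths):
        s_i += length
        order.append((idx, False))
        s_o += length
        # Assumed condition: repeat the last block (length L = length)
        # while the output total stays within the target r * Si.
        while s_o + length <= r * s_i:
            order.append((idx, True))
            s_o += length
    return order

# Four 40-sample blocks expanded by r = 1.5: two repetitions are scheduled,
# giving 240 output samples from 160 input samples.
order = connection_order([40, 40, 40, 40], r=1.5)
repeats = sum(1 for _, rep in order if rep)
print(order, repeats)
```

Because the decision uses only running totals and the stored block lengths, the target r can be swapped at any block boundary, which is what lets the output speed follow the listener's operation instantly.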
- the speech data connector 9 supplies the connected contents, such as the block speech data which have already been connected, as the already-connected information to the connection order generator 8.
- the speech data connector 9 connects the block speech data being output from the block data memory 5 and the connection data being output from the connection data memory 7 to thus generate a series of speech data.
- the speech data connector 9 supplies a series of resultant speech data to the D/A converter 10 while buffering them.
- the D/A converter 10 includes a memory for storing the speech data and then outputting the speech data in the FIFO manner, and a D/A converting circuit for reading the speech data from the memory at a predetermined sampling rate (e.g., 32 kHz) and then D/A-converting the speech data into speech signals.
- the D/A converter 10 receives a series of speech data being output from the speech data connector 9 , then D/A-converts the speech data into the speech signals, and then outputs resultant speech signals from an output terminal.
- the output voice can be created based on speech speed conversion controlling information indicating any speech speed in response to the operation of the listener, while controlling the order of the previously stored block speech data and connection data. Therefore, the voice can be output promptly at the desired speech speed even when the listener changes the speech speed by a manual operation, so that the listener does not feel a time delay when the speech speed is changed in the middle of listening.
- as a result, simply by applying the speech speed converting device 1 according to the present invention to various video devices, audio devices, medical devices, etc. such as the television set, the radio, the tape recorder, the video tape recorder, the video disk player, etc., the speech speed of the output voice can be changed instantly in response to the operation of the listener while the speech speed is fitted to the listening capability of the listener by processing the speech of the speaker.
- the windows have been applied to the starting portions of respective block speech data by using the A window and the B window, which are changed linearly as shown in FIG. 2, in the connection data generator 6 .
- the windows may instead be applied to the starting portions of the respective block speech data by using windows each having a cosine-curve shape.
- the window may also be applied not only to the starting portions of the respective block speech data but over the full block length.
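The cosine-curve variant mentioned above can be sketched as a raised-cosine fade; the specific half-cosine shape below is an assumption consistent with a window "having a cosine curve", chosen so the A and B windows remain complementary.

```python
import math

def raised_cosine_fade(d_samples):
    """B window: rises smoothly from 0 to 1 over d_samples; the A window is
    its complement (1 - B), so the pair still sums to one everywhere."""
    return [0.5 - 0.5 * math.cos(math.pi * i / d_samples)
            for i in range(d_samples)]

b = raised_cosine_fade(4)          # B window samples
a = [1.0 - v for v in b]           # complementary A window
```

Unlike the linear windows, the cosine pair has zero slope at the fade boundaries, which softens the joint further.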
- the connection data of the block speech data ( 4 ), ( 8 ) and the latter halves of the block speech data ( 4 ), ( 8 ) are repeated only once in the connection order generator 8. However, if the expansion magnification “r” satisfies “r > 2”, the same block speech data may be repeated twice or more.
- the speech speed of the output voice can be converted to follow instantly an operation of the listener, and thus the convenience of use on the listener side can be improved extremely.
Abstract
An analysis processor applies an analysis process to input speech data thereby to obtain block lengths for respective attributes of voiced sound, voiceless sound and silence. A block data splitter splits the input speech data into blocks having the block lengths dependent on the respective attributes. A block data memory sequentially stores speech data split by the block data splitter as block speech data and the block lengths. A connection data generator generates connection data for connecting the adjacent block speech data to each other at every moment by using the block speech data. A connection data storing portion sequentially stores the connection data. A connection order generator generates the block connection order of the block speech data and the connection data at every moment according to at least the block lengths output sequentially from the block data storing portion and extension scaling factors in time for the respective attributes. A speech data connector sequentially connects the block speech data and the connection data based on the block connection order. Accordingly, the speed of output speech can be instantly changed in response to an instruction of an operator.
Description
The present invention relates to a speech speed converting method and a device for embodying the same which are employed in various video devices, audio devices, medical devices, etc. such as a television set, a radio, a tape recorder, a video tape recorder, a video disk player, etc. and, more particularly, to a speech speed converting method and a device for embodying the same which are able to provide speed-converted speech whose speech speed is fitted to the listening capability of a listener by processing the speech of a speaker.
In general, for example, in the case that one person (the listener) listens to the speech of another person (the speaker), when the listening capability, e.g., the speech recognition critical speed (the maximum speech speed at which the speech can be precisely identified) of the listener has declined because of aging or some disorder, it often becomes hard for the listener to identify speech at an ordinary speed or the speech of rapid talking. In such a case, the listener can normally make up for the listening capability by using a so-called hearing aid.
However, the conventional hearing aid which is used by the person having declined listening capability or hearing disorder can simply make up for propagation characteristics of an external ear and a middle ear in an auditory organ by virtue of an improvement of a frequency characteristic, a gain control, etc. Therefore, there has been such a problem that decline of the speech identification capability which is mainly associated with degradation of an auditory center cannot be compensated.
In light of the above, a speech speed controlled type hearing aiding device has recently been devised which can aid the hearing by processing the speech of the speaker such that the speech speed is adjusted to the listening capability of the listener in substantially real time.
According to this speech speed controlled type hearing aiding device, by executing an expansion process for expanding the speech of the speaker in time, and then storing sequentially the speech obtained by the expansion process into an output buffer memory, and then outputting stored speech, the speech speed of the speaker is changed (slowed down) to compensate the decline of the listening capability of the listener.
However, in the above speech speed controlled type hearing aid in the prior art, there have been problems described in the following.
To begin with, the speech speed controlled type hearing aid in the prior art expands the input speech data by the expansion process as described above, then stores the speech data obtained by the expansion process sequentially into the output buffer memory, and then outputs the stored speech data. Therefore, for example, in case the listener wishes to slow down the speech speed still more, or to restore the speech speed to the original speed, in the middle of listening, the speech speed cannot be changed until all the speech data which are stored in the output buffer memory have been output.
For this reason, there has been a problem that, in order to restore the speech speed in the middle of listening, a considerably long delay in time is caused until the existing speech speed can be restored into the original speed.
In addition, such a speech speed controlled type hearing aid in the prior art can be employed not only by the above listener who has a declined listening capability but also by a listener who has a normal listening capability but wishes, for example, to listen to a foreign language, as an application field in which the speech speed of the speaker is changed (slowed down) in order to compensate the listening capability. However, in this case there has been a problem that, as above, a time delay is caused upon changing the speech speed in the middle of listening.
The present invention has been made in light of the above circumstances, and it is an object of the present invention to provide a speech speed converting method and a device for embodying the same which are able to convert the speech speed of the output voice so as to follow an operation of the listener instantly, and thus to extremely improve the convenience of use on the listener side.
In order to achieve the above object, according to one aspect of the present invention, there is provided a method for instantly changing the speed of speech, comprising the steps of applying an analysis process to input speech data thereby to obtain block lengths for respective attributes of voiced sound, voiceless sound and silence; splitting the input speech data having a voiced sound section, a voiceless sound section and a silent section into blocks having the block lengths dependent on the respective attributes; storing the split speech data as block speech data and the block lengths sequentially in a buffer and outputting the block speech data and the block lengths sequentially from the buffer; generating connection data at every moment, which are to be replaced or inserted between adjacent block speech data to connect the adjacent block speech data to each other, every block, and then storing the connection data sequentially in another buffer and outputting the connection data sequentially from the other buffer; generating block connection order of the block speech data and the connection data at every moment according to at least the block lengths output sequentially from the buffer and extension scaling factors in time for the respective attributes; and connecting sequentially the block speech data output from the buffer and the connection data output from the other buffer according to the block connection order to thus generate output speech data extended in time as compared with the input speech data.
Accordingly, the speech speed of the output voice can be converted to follow instantly an operation of the listener, and thus the convenience of use on the listener side can be improved extremely.
In a preferred embodiment of the present invention, the connection data are generated block by block by applying two windows to speech data located at a start portion of a concerned block and speech data located at a start portion of a succeeding block respectively, and then overlap-adding the start portion of the succeeding block to the start portion of the concerned block, each window having the shape of a predetermined line in a predetermined time interval.
In order to achieve the above object, according to another aspect of the present invention, there is provided a device for instantly changing the speed of speech, comprising: an analysis processor for applying an analysis process to input speech data to obtain block lengths for respective attributes of voiced sound, voiceless sound and silence; a block data splitter for splitting the input speech data, which have a voiced sound section, a voiceless sound section and a silent section, into blocks having the block lengths dependent on the respective attributes; a block data storing portion for sequentially storing the speech data split by the block data splitter as block speech data, together with the block lengths; a connection data generator for generating, at every moment, connection data which are able to be replaced or inserted between adjacent block speech data to connect the adjacent block speech data to each other, by using the block speech data obtained by the block data splitter; a connection data storing portion for sequentially storing the connection data generated by the connection data generator; a connection order generator for generating at every moment a block connection order of the block speech data and the connection data according to at least the block lengths output sequentially from the block data storing portion and extension scaling factors in time for the respective attributes; and a speech data connector for sequentially connecting the block speech data output from the block data storing portion and the connection data output from the connection data storing portion, based on the block connection order obtained by the connection order generator, to thus generate output speech data extended in time as compared with the input speech data.
In a preferred embodiment of the present invention, the connection data generator generates the connection data block by block by applying two windows to speech data located at a start portion of a concerned block and speech data located at a start portion of a succeeding block respectively, and then overlap-adding the start portion of the succeeding block to the start portion of the concerned block, each window having the shape of a predetermined line in a predetermined time interval.
In a preferred embodiment of the present invention, the connection order generator includes a read/write memory for storing the extension scaling factors in time for the respective attributes, and a connection order deciding processor for reading the extension scaling factors in time for the respective attributes stored in the read/write memory at a predetermined time interval, and generating the block connection order of the block speech data and the connection data at every moment based on the extension scaling factors, the block lengths output from the block data storing portion, and the already-connected information output from the speech data connector.
Accordingly, the speech speed of the output voice can be converted to momentarily follow an operation of the listener, and thus the convenience of use on the listener side can be greatly improved.
FIG. 1 is a block diagram showing an example of a speech speed converting method according to the present invention and a speech speed converting device as an embodiment;
FIG. 2 is a schematic view showing an example of connection data generating steps executed in a connection data generator shown in FIG. 1; and
FIG. 3 is a schematic view showing an example of connection order generating steps executed in a connection order generator shown in FIG. 1.
FIG. 1 is a block diagram showing an embodiment of a speech speed converting device according to the present invention.
A speech speed converting device 1 shown in this figure comprises an A/D converter 2 for converting an input speech signal into digital speech data, an analysis processor 3 for analyzing attributes of the speech data, a block data splitter 4 for splitting the speech data into block data to generate block speech data, a block data memory 5 for storing the block speech data, a connection data generator 6 for generating connection data necessary for connecting the block speech data, a connection data memory 7 for storing the connection data, a connection order generator 8 for generating the connection order of the block speech data and the connection data, a speech data connector 9 for generating a series of speech data by connecting the block speech data and the connection data based on the connection order, and a D/A converter 10 for converting the series of speech data into speech signals.
Then, the speech speed converting device 1 applies an analysis process, based on the attributes, to the speech data input by the speaker, splits the speech data into blocks each having a predetermined time width according to the analyzed information derived by the analysis process, and stores the block data. Also, in order to achieve expansion of the speech data in time, the speech speed converting device 1 generates, block by block, the speech data to be replaced or inserted between adjacent block speech data, and stores them. Then, the speech speed converting device 1 generates the block connection order needed to produce output speech data at any speech speed in response to the operation of the listener, and sequentially connects the speech data already split into blocks and stored (block speech data) and the to-be-replaced/inserted speech data already stored (connection data) according to the connection order, to generate the output speech data. As a result, the speech speed of the output voice can instantly follow an operation of the listener.
The A/D converter 2 comprises an A/D converter circuit for converting an input speech signal into digital speech data by sampling the input speech signal at a predetermined sampling rate (e.g., 32 kHz), and a FIFO memory for receiving the digital speech data output from the A/D converter circuit, storing them, and outputting them in the FIFO fashion. The A/D converter 2 receives the speech signal input into an input terminal on the speaker side, e.g., a speech signal from a microphone or from the analog sound output terminal of a video or audio device such as a television or radio, A/D-converts the speech signal into digital speech data, and supplies the resultant speech data to the analysis processor 3 and the block data splitter 4 while buffering them.
The analysis processor 3 sequentially executes: an input process for receiving the speech data output from the A/D converter 2; a decimation (thinning) process for reducing the load of the succeeding processes by lowering the sampling rate of the speech data obtained by the input process to 4 kHz; an attribute analysis process for analyzing attributes of the speech data output from the A/D converter 2 and the speech data obtained by the decimation process, to divide the speech data into voiced sound, voiceless sound, and silence; and a block length decision process for detecting periodicity of the voiced sound, the voiceless sound, and the silence by executing their autocorrelation analysis, and then deciding the block lengths required to divide the speech data (block lengths required to prevent disadvantages such as a change in voice tone, e.g., a lowered voice, due to the repetition of block units) based on the detected results. The analysis processor 3 then supplies the resultant split information (block lengths of the voiced sound, the voiceless sound, and the silence) to the block data splitter 4.
In the above attribute analysis process, a sum of squares of the speech data output from the A/D converter 2 is calculated by using a window width of about 30 ms, and power values P of the speech data are calculated at an interval of about 5 ms. The power values P are then compared with a previously set threshold value Pmin: a data area satisfying “P<Pmin” is decided to be a silent interval, while a data area satisfying “Pmin≦P” is decided to be either a voiced sound interval or a voiceless sound interval. Then, zero-crossing analysis of the speech data output from the A/D converter 2, autocorrelation analysis of the speech data obtained by the above decimation process, etc. are carried out. Based on these analysis results and the power values P, it is decided whether a data area satisfying “Pmin≦P” belongs to the voice interval with vibration of the vocal cords (voiced sound interval) or the voice interval without vibration of the vocal cords (voiceless sound interval). Attributes such as noise or background sound like music may also be considered as attributes of the speech data output from the A/D converter 2; however, since it is generally hard to automatically discriminate speech signals precisely from noise and background sound, noise and background sound are classified into one of the voiced sound, the voiceless sound, and the silence.
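As an illustrative sketch only (not part of the patented embodiment), the attribute analysis described above can be approximated as follows. The threshold `p_min` and the zero-crossing cutoff of 0.1 are assumed values; the text specifies only the ~30 ms power window, the ~5 ms analysis interval, and the existence of a threshold Pmin.

```python
import numpy as np

def classify_frames(x, fs=32000, win_ms=30, hop_ms=5, p_min=1e-4):
    """Label ~5 ms analysis points of speech as 'silent', 'voiced', or 'voiceless'.

    Power is the mean of squares over a ~30 ms window; points below the
    (assumed) threshold p_min are silent.  Remaining points are split by
    zero-crossing rate: voiced speech (vocal-cord vibration) has a low
    rate, voiceless sound such as fricatives a high one.
    """
    win = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    labels = []
    for start in range(0, len(x) - win, hop):
        frame = x[start:start + win]
        power = np.sum(frame ** 2) / win          # mean-square power P
        if power < p_min:
            labels.append('silent')
        else:
            # fraction of sample pairs whose sign flips (zero crossings)
            zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
            labels.append('voiced' if zcr < 0.1 else 'voiceless')
    return labels
```

A 100 Hz tone classifies as voiced (few zero crossings), broadband noise as voiceless, and an all-zero stretch as silent.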
The above block length decision process applies autocorrelation analyses with different long and short window widths to the speech data decided as the voiced sound interval by the attribute analysis process, over the wide range of 1.25 ms to 28.0 ms in which pitch periods of voiced sound are distributed, detects the pitch periods (the vibration periods of the vocal cords) as precisely as possible, and then decides block lengths based on the detection results such that respective pitch periods correspond to respective block lengths. Meanwhile, the block length decision process detects periodicity of less than 10 ms from the speech data in the intervals decided as the voiceless sound interval and the silent interval by the attribute analysis process, and then decides the block lengths based on the detected results. As a result, the respective block lengths of the voiced sound, the voiceless sound, and the silence are supplied as split information to the block data splitter 4.
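The pitch-period detection by autocorrelation over the 1.25–28.0 ms lag range may be sketched as below. The single fixed analysis frame and the decimated 4 kHz rate are simplifying assumptions for a compact example; the text describes using several long/short window widths.

```python
import numpy as np

def pitch_period_ms(frame, fs=4000, lo_ms=1.25, hi_ms=28.0):
    """Estimate the pitch period of a voiced frame by autocorrelation,
    searching lags over the 1.25-28.0 ms range cited in the text."""
    frame = frame - np.mean(frame)                     # remove DC offset
    # one-sided autocorrelation: ac[k] = sum_n frame[n] * frame[n + k]
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo = max(1, int(fs * lo_ms / 1000))
    hi = min(len(ac) - 1, int(fs * hi_ms / 1000))
    lag = lo + int(np.argmax(ac[lo:hi + 1]))           # strongest periodicity
    return 1000.0 * lag / fs                           # period in ms
```

For a 200 Hz tone this returns its 5 ms period, which would then be used as the voiced-sound block length.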
The block data splitter 4 splits the speech data output from the A/D converter 2 based on the block lengths of the voiced sound interval, the voiceless sound interval, and the silent interval indicated by the split information output from the analysis processor 3. Then, the block data splitter 4 supplies the speech data obtained by this split process in block units (block speech data), together with their block lengths, to both the block data memory 5 and the connection data generator 6.
The block data memory 5 is equipped with a ring buffer. The block data memory 5 receives the block speech data (speech data in block units) and their block lengths output from the block data splitter 4, temporarily stores them in the ring buffer, then reads the temporarily stored block lengths as appropriate and supplies them to the connection order generator 8. Also, the block data memory 5 reads the temporarily stored block speech data as appropriate and supplies them to the speech data connector 9.
Then, the connection data generator 6 receives the block speech data output from the block data splitter 4, applies, for every block, windows to the speech data located at a start portion of a concerned block and the speech data located at a start portion of the succeeding block by using an A window and a B window, which change linearly over a time interval d (ms) as shown in FIG. 2, then overlap-adds the start portion of the succeeding block to the start portion of the concerned block to generate connection data of the time interval d (ms), and supplies the connection data to the connection data memory 7. Any value from 0.5 ms up to the shorter of the block lengths of the concerned block and the succeeding block can be selected as the time interval d, and a shorter time interval d allows a smaller buffer capacity in the connection data memory 7.
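The generation of connection data with the two linear windows of FIG. 2 amounts to a linear cross-fade between the start portions of the two blocks. A sketch under that reading, with `d_samples` standing in for the time interval d:

```python
import numpy as np

def make_connection_data(block_a, block_b, d_samples):
    """Generate connection data by overlap-adding the start of the
    succeeding block onto the start of the concerned block.

    The A window fades the concerned block linearly from 1 to 0 and the
    B window fades the succeeding block from 0 to 1 over d_samples,
    mirroring the linearly changing windows of FIG. 2."""
    n = min(d_samples, len(block_a), len(block_b))
    a_win = np.linspace(1.0, 0.0, n)   # A window: fade out concerned block
    b_win = np.linspace(0.0, 1.0, n)   # B window: fade in succeeding block
    return block_a[:n] * a_win + block_b[:n] * b_win
```

Because the two windows sum to one at every sample, cross-fading two identical signals reproduces the signal unchanged, which is what makes the splice inaudible.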
The connection data memory 7 has a ring buffer. It receives the connection data output from the connection data generator 6, temporarily stores the connection data in the ring buffer, then reads the temporarily stored connection data as appropriate and supplies them to the speech data connector 9.
The connection order generator 8 includes a writable memory for storing the expansion magnifications in time of the respective attributes, which are input by the listener operating a digital setting means such as a digital volume; and a connection order deciding processor for reading the expansion magnifications in time of the respective attributes stored in the writable memory at a previously set time interval, e.g., about every 100 ms, and generating at every moment the connection order of the block speech data and the connection data (the connection order required to implement the desired speech speed set by the listener) based on these expansion magnifications, the respective block lengths output from the block data memory 5, and the already-connected information output from the speech data connector 9.
Then, in the situation that speech signals in which the voiced sound interval, the voiceless sound interval, and the silent interval alternately appear are being input, the starting condition for generating the connection order is decided to be satisfied either when switching of the attributes of the block speech data is detected from the already-connected information output from the speech data connector 9, as shown in FIG. 3, or when it is detected that the expansion magnifications read from the writable memory have been changed even while block speech data having the same attribute are still being connected. The time at that moment is set as a time T0.
Then, out of the connection data being output from the connection data memory 7, the connection data corresponding to the lastly connected block are replaced/inserted at a timing that satisfies the condition given by Eq. [1],
where “Si” is the total sum of all the block lengths of the block speech data from a start time T0 which have already been output from the block data memory 5 to the speech data connector 9 before the speech speed is changed, “So” is the total sum of all the block lengths of the block speech data from the start time T0 which have already been connected, “r” (where r≧1.0) is a target expansion magnification, and “L” is the block length of the lastly connected block speech data. Then, the part of the lastly connected block located after the part of the block employed in generating the connection data is connected again repeatedly, and the connection order indicating that the remaining blocks are to be connected sequentially after this block is generated and supplied to the speech data connector 9.
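Since Eq. [1] itself is not reproduced here, the following sketch assumes a plausible form of the condition: connection data are inserted and the lastly connected block (length L) is repeated whenever doing so does not overshoot the target output length r·Si. Both `should_insert_repeat` and `connect` are hypothetical names introduced only for this illustration.

```python
def should_insert_repeat(si, so, r, l):
    """Assumed form of Eq. [1]: So + L <= r * Si, i.e. repeating a block
    of length L keeps the connected output within the target r * Si."""
    return so + l <= r * si

def connect(block_lengths, r):
    """Toy connection-order generator: emit each block in turn, repeating
    the lastly connected block while the condition holds, so the total
    output length So tracks r times the input length Si (r >= 1.0)."""
    si = so = 0.0
    order = []
    for i, l in enumerate(block_lengths):
        si += l                  # input-side running total
        so += l                  # block i connected once
        order.append(i)
        while l > 0 and should_insert_repeat(si, so, r, l):
            so += l              # connection data + repeated block part
            order.append(i)      # block i connected again
    return order, si, so
```

With four 10 ms blocks and r = 2.0, every block is connected twice and the output is exactly doubled in length, matching the repeat-the-last-block behavior of FIG. 3.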
Accordingly, in the example shown in FIG. 3, since the condition given by Eq. [1] is satisfied at the time point when the block (1) to the block (8) have been connected sequentially, the connection data corresponding to the block (8) are replaced/inserted after the block (8), and then the part located after the part of the block (8) employed in generating the connection data is connected repeatedly. In the example shown in FIG. 3, the block (4) has already been repeated once.
The speech data connector 9 supplies connected contents, such as the block speech data which have already been connected, as the already-connected information to the connection order generator 8. At the same time, based on the connection order output from the connection order generator 8, the speech data connector 9 connects the block speech data output from the block data memory 5 and the connection data output from the connection data memory 7 to generate a series of speech data. Then, the speech data connector 9 supplies the resultant series of speech data to the D/A converter 10 while buffering them.
The D/A converter 10 includes a memory for storing the speech data and outputting them in the FIFO manner, and a D/A converting circuit for reading the speech data from the memory at a predetermined sampling rate (e.g., 32 kHz) and D/A-converting them into speech signals. The D/A converter 10 receives the series of speech data output from the speech data connector 9, D/A-converts the speech data into speech signals, and outputs the resultant speech signals from an output terminal.
In this manner, in the present embodiment, the output voice can be created based on speech speed conversion controlling information indicating any speech speed in response to the operation of the listener, while controlling the order of the previously stored block speech data and connection data. Therefore, the voice can be output promptly at the desired speech speed even when the listener changes the speech speed by a manual operation, so that the listener does not perceive a time delay when the speech speed is changed midway.
As a result, simply by applying the speech speed converting device 1 according to the present invention to various video devices, audio devices, medical devices, etc., such as a television set, a radio, a tape recorder, a video tape recorder, or a video disk player, the speech speed of the output voice can be changed instantly in response to the operation of the listener, so that the speech of the speaker is processed to fit the listening capability of the listener.
In the above embodiment, the windows have been applied to the start portions of the respective block speech data by using the A window and the B window, which change linearly as shown in FIG. 2, in the connection data generator 6. However, the windows may instead be applied by using windows each having a cosine curve. In addition, if the buffer capacity of the connection data memory 7 is sufficiently large, the window may be applied not only to the start portions of the respective block speech data but also over the full block length.
Moreover, in the above embodiment, as shown in FIG. 3, the connection data of the block speech data (4), (8) and the latter halves of the block speech data (4), (8) are repeated only once in the connection order generator 8. However, if the expansion magnification “r” satisfies “r>2”, the same block speech data may be repeated twice or more.
As described above, according to the present invention, the speech speed of the output voice can be converted to instantly follow an operation of the listener, and thus the convenience of use on the listener side can be greatly improved.
Claims (5)
1. A method for instantly changing the speed of speech, comprising the steps of:
applying an analysis process to input speech data thereby to obtain block lengths for respective attributes of voiced sound, voiceless sound and silence;
splitting the input speech data having a voiced sound section, a voiceless sound section and a silent section into blocks having the block lengths dependent on the respective attributes;
storing the split speech data as block speech data and the block lengths sequentially in a buffer and outputting the block speech data and the block lengths sequentially from the buffer;
generating connection data at every moment, which are to be replaced or inserted between adjacent block speech data to connect the adjacent block speech data to each other, every block, and then storing the connection data sequentially in another buffer and outputting the connection data sequentially from the other buffer;
generating block connection order of the block speech data and the connection data at every moment according to at least the block lengths output sequentially from the buffer and extension scaling factors in time for the respective attributes; and
connecting sequentially the block speech data output from the buffer and the connection data output from the other buffer according to the block connection order to thus generate output speech data extended in time as compared with the input speech data.
2. A method for instantly changing the speed of speech according to claim 1, wherein the connection data are generated block by block by applying two windows to speech data located at a start portion of a concerned block and speech data located at a start portion of a succeeding block respectively, and then overlap-adding the start portion of the succeeding block to the start portion of the concerned block, each window having the shape of a predetermined line in a predetermined time interval.
3. A device for instantly changing the speed of speech, comprising:
an analysis processor for applying an analysis process to input speech data thereby to obtain block lengths for respective attributes of voiced sound, voiceless sound and silence;
a block data splitter for splitting the input speech data having a voiced sound section, a voiceless sound section and a silent section into blocks having the block lengths dependent on the respective attributes;
a block data storing portion for sequentially storing speech data split by the block data splitter as block speech data and the block lengths;
a connection data generator for generating connection data at every moment, which are able to be replaced or inserted between adjacent block speech data to connect the adjacent block speech data to each other, by using the block speech data obtained by the block data splitter;
a connection data storing portion for sequentially storing the connection data being generated by the connection data generator;
a connection order generator for generating block connection order of the block speech data and the connection data at every moment according to at least the block lengths output sequentially from the block data storing portion and extension scaling factors in time for the respective attributes; and
a speech data connector for connecting sequentially the block speech data output from the block data storing portion and the connection data output from the connection data storing portion based on the block connection order obtained by the block connection order generator to thus generate output speech data extended in time as compared with the input speech data.
4. A device for instantly changing the speed of speech according to claim 3, wherein the connection data generator generates the connection data block by block by applying two windows to speech data located at a start portion of a concerned block and speech data located at a start portion of a succeeding block respectively, and then overlap-adding the start portion of the succeeding block to the start portion of the concerned block, each window having the shape of a predetermined line in a predetermined time interval.
5. A device for instantly changing the speed of speech according to claim 3, wherein the connection order generator includes,
a read/write memory for storing the extension scaling factors in time for the respective attributes, and
a connection order deciding processor for reading the extension scaling factors in time for the respective attributes stored in the read/write memory at a predetermined time interval, and generating the block connection order of the block speech data and the connection data at every moment based on the extension scaling factors, the block lengths output from the block data storing portion, and the already-connected information output from the speech data connector.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP9061015A JP2955247B2 (en) | 1997-03-14 | 1997-03-14 | Speech speed conversion method and apparatus |
JP9-061015 | 1997-03-19 | ||
PCT/JP1998/001063 WO1998041976A1 (en) | 1997-03-14 | 1998-03-13 | Speaking speed changing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
US6205420B1 true US6205420B1 (en) | 2001-03-20 |
Family
ID=13159086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/180,429 Expired - Lifetime US6205420B1 (en) | 1997-03-14 | 1998-03-13 | Method and device for instantly changing the speed of a speech |
Country Status (10)
Country | Link |
---|---|
US (1) | US6205420B1 (en) |
EP (1) | EP0910065B1 (en) |
JP (1) | JP2955247B2 (en) |
KR (1) | KR100283421B1 (en) |
CN (1) | CN1101581C (en) |
CA (1) | CA2253749C (en) |
DE (1) | DE69816221T2 (en) |
DK (1) | DK0910065T3 (en) |
NO (1) | NO316414B1 (en) |
WO (1) | WO1998041976A1 (en) |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6490553B2 (en) * | 2000-05-22 | 2002-12-03 | Compaq Information Technologies Group, L.P. | Apparatus and method for controlling rate of playback of audio data |
US20030165325A1 (en) * | 2002-03-01 | 2003-09-04 | Blair Ronald Lynn | Trick mode audio playback |
US6671292B1 (en) * | 1999-06-25 | 2003-12-30 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and system for adaptive voice buffering |
US20040002868A1 (en) * | 2002-05-08 | 2004-01-01 | Geppert Nicolas Andre | Method and system for the processing of voice data and the classification of calls |
US20040006482A1 (en) * | 2002-05-08 | 2004-01-08 | Geppert Nicolas Andre | Method and system for the processing and storing of voice information |
US20040006464A1 (en) * | 2002-05-08 | 2004-01-08 | Geppert Nicolas Andre | Method and system for the processing of voice data by means of voice recognition and frequency analysis |
US20040037398A1 (en) * | 2002-05-08 | 2004-02-26 | Geppert Nicholas Andre | Method and system for the recognition of voice information |
US20040042591A1 (en) * | 2002-05-08 | 2004-03-04 | Geppert Nicholas Andre | Method and system for the processing of voice information |
US20040073424A1 (en) * | 2002-05-08 | 2004-04-15 | Geppert Nicolas Andre | Method and system for the processing of voice data and for the recognition of a language |
US20040090555A1 (en) * | 2000-08-10 | 2004-05-13 | Magdy Megeid | System and method for enabling audio speed conversion |
US20050027523A1 (en) * | 2003-07-31 | 2005-02-03 | Prakairut Tarlton | Spoken language system |
US20050149329A1 (en) * | 2002-12-04 | 2005-07-07 | Moustafa Elshafei | Apparatus and method for changing the playback rate of recorded speech |
US20050228672A1 (en) * | 2004-04-01 | 2005-10-13 | International Business Machines Corporation | Method and system of dynamically adjusting a speech output rate to match a speech input rate |
US6993246B1 (en) | 2000-09-15 | 2006-01-31 | Hewlett-Packard Development Company, L.P. | Method and system for correlating data streams |
US20060187770A1 (en) * | 2005-02-23 | 2006-08-24 | Broadcom Corporation | Method and system for playing audio at a decelerated rate using multiresolution analysis technique keeping pitch constant |
US20070238443A1 (en) * | 2006-04-07 | 2007-10-11 | Richardson Roger D | Method and device for restricted access contact information datum |
US20080262856A1 (en) * | 2000-08-09 | 2008-10-23 | Magdy Megeid | Method and system for enabling audio speed conversion |
US20100106495A1 (en) * | 2007-02-27 | 2010-04-29 | Nec Corporation | Voice recognition system, method, and program |
US20100169075A1 (en) * | 2008-12-31 | 2010-07-01 | Giuseppe Raffa | Adjustment of temporal acoustical characteristics |
US20130325456A1 (en) * | 2011-01-28 | 2013-12-05 | Nippon Hoso Kyokai | Speech speed conversion factor determining device, speech speed conversion device, program, and storage medium |
US9036844B1 (en) | 2013-11-10 | 2015-05-19 | Avraham Suhami | Hearing devices based on the plasticity of the brain |
US20160379669A1 (en) * | 2014-01-28 | 2016-12-29 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US20170004848A1 (en) * | 2014-01-24 | 2017-01-05 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US20170032804A1 (en) * | 2014-01-24 | 2017-02-02 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US9907509B2 (en) | 2014-03-28 | 2018-03-06 | Foundation of Soongsil University—Industry Cooperation | Method for judgment of drinking using differential frequency energy, recording medium and device for performing the method |
US9916845B2 (en) | 2014-03-28 | 2018-03-13 | Foundation of Soongsil University—Industry Cooperation | Method for determining alcohol use by comparison of high-frequency signals in difference signal, and recording medium and device for implementing same |
US9943260B2 (en) | 2014-03-28 | 2018-04-17 | Foundation of Soongsil University—Industry Cooperation | Method for judgment of drinking using differential energy in time domain, recording medium and device for performing the method |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002050798A2 (en) * | 2000-12-18 | 2002-06-27 | Digispeech Marketing Ltd. | Spoken language teaching system based on language unit segmentation |
KR100445342B1 (en) * | 2001-12-06 | 2004-08-25 | 박규식 | Time scale modification method and system using Dual-SOLA algorithm |
KR100486734B1 (en) * | 2003-02-25 | 2005-05-03 | 삼성전자주식회사 | Method and apparatus for text to speech synthesis |
TWI312500B (en) | 2006-12-08 | 2009-07-21 | Micro Star Int Co Ltd | Method of varying speech speed |
JP4390289B2 (en) | 2007-03-16 | 2009-12-24 | 国立大学法人電気通信大学 | Playback device |
JP5093648B2 (en) | 2007-05-07 | 2012-12-12 | 国立大学法人電気通信大学 | Playback device |
CN101989252B (en) * | 2009-07-30 | 2012-10-03 | 华晶科技股份有限公司 | Numerical analyzing method and system of continuous data |
JP6912303B2 (en) * | 2017-07-20 | 2021-08-04 | 東京瓦斯株式会社 | Information processing equipment, information processing methods, and programs |
CN113611325B (en) * | 2021-04-26 | 2023-07-04 | 珠海市杰理科技股份有限公司 | Voice signal speed change method and device based on clear and voiced sound and audio equipment |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0193795A (en) | 1987-10-06 | 1989-04-12 | Nippon Hoso Kyokai <Nhk> | Enunciation speed conversion for voice |
JPH03123397A (en) | 1989-10-06 | 1991-05-27 | Matsushita Electric Ind Co Ltd | Device and method for converting voice speed |
US5073938A (en) * | 1987-04-22 | 1991-12-17 | International Business Machines Corporation | Process for varying speech speed and device for implementing said process |
EP0527527A2 (en) | 1991-08-09 | 1993-02-17 | Koninklijke Philips Electronics N.V. | Method and apparatus for manipulating pitch and duration of a physical audio signal |
US5305420A (en) * | 1991-09-25 | 1994-04-19 | Nippon Hoso Kyokai | Method and apparatus for hearing assistance with speech speed control function |
JPH06202691A (en) | 1993-01-07 | 1994-07-22 | Nippon Telegr & Teleph Corp <Ntt> | Control method for speech information reproducing peed |
JPH06222794A (en) | 1993-01-25 | 1994-08-12 | Matsushita Electric Ind Co Ltd | Voice speed conversion method |
JPH07191695A (en) | 1993-11-17 | 1995-07-28 | Sanyo Electric Co Ltd | Speaking speed conversion device |
JPH0883095A (en) | 1994-09-14 | 1996-03-26 | Nippon Hoso Kyokai <Nhk> | Method and device for speech speed conversion |
JPH09152889A (en) | 1995-11-29 | 1997-06-10 | Sanyo Electric Co Ltd | Speech speed transformer |
US6009386A (en) * | 1997-11-28 | 1999-12-28 | Nortel Networks Corporation | Speech playback speed change using wavelet coding, preferably sub-band coding |
JP3123397B2 (en) | 1995-07-14 | 2001-01-09 | Toyota Motor Corporation | Variable steering angle ratio steering system for vehicles |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE69024919T2 (en) * | 1989-10-06 | 1996-10-17 | Matsushita Electric Ind Co Ltd | Setup and method for changing speech speed |
DE69428612T2 (en) * | 1993-01-25 | 2002-07-11 | Matsushita Electric Ind Co Ltd | Method and device for carrying out a time scale modification of speech signals |
1997

- 1997-03-14 JP JP9061015A patent/JP2955247B2/en not_active Expired - Lifetime

1998

- 1998-03-13 KR KR1019980709078A patent/KR100283421B1/en not_active IP Right Cessation
- 1998-03-13 EP EP98907216A patent/EP0910065B1/en not_active Expired - Lifetime
- 1998-03-13 WO PCT/JP1998/001063 patent/WO1998041976A1/en active IP Right Grant
- 1998-03-13 CN CN98800250A patent/CN1101581C/en not_active Expired - Lifetime
- 1998-03-13 DK DK98907216T patent/DK0910065T3/en active
- 1998-03-13 DE DE69816221T patent/DE69816221T2/en not_active Expired - Lifetime
- 1998-03-13 US US09/180,429 patent/US6205420B1/en not_active Expired - Lifetime
- 1998-03-13 CA CA002253749A patent/CA2253749C/en not_active Expired - Lifetime
- 1998-11-13 NO NO19985301A patent/NO316414B1/en not_active IP Right Cessation
Cited By (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6671292B1 (en) * | 1999-06-25 | 2003-12-30 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and system for adaptive voice buffering |
US6505153B1 (en) | 2000-05-22 | 2003-01-07 | Compaq Information Technologies Group, L.P. | Efficient method for producing off-line closed captions |
US6490553B2 (en) * | 2000-05-22 | 2002-12-03 | Compaq Information Technologies Group, L.P. | Apparatus and method for controlling rate of playback of audio data |
US20080262856A1 (en) * | 2000-08-09 | 2008-10-23 | Magdy Megeid | Method and system for enabling audio speed conversion |
US20040090555A1 (en) * | 2000-08-10 | 2004-05-13 | Magdy Megeid | System and method for enabling audio speed conversion |
US6993246B1 (en) | 2000-09-15 | 2006-01-31 | Hewlett-Packard Development Company, L.P. | Method and system for correlating data streams |
US20030165325A1 (en) * | 2002-03-01 | 2003-09-04 | Blair Ronald Lynn | Trick mode audio playback |
WO2003075262A1 (en) * | 2002-03-01 | 2003-09-12 | Thomson Licensing S.A. | Trick mode audio playback |
US7149412B2 (en) | 2002-03-01 | 2006-12-12 | Thomson Licensing | Trick mode audio playback |
KR100930610B1 (en) | 2002-03-01 | 2009-12-09 | 톰슨 라이센싱 | Trick mode audio playback |
US20040002868A1 (en) * | 2002-05-08 | 2004-01-01 | Geppert Nicolas Andre | Method and system for the processing of voice data and the classification of calls |
US20040073424A1 (en) * | 2002-05-08 | 2004-04-15 | Geppert Nicolas Andre | Method and system for the processing of voice data and for the recognition of a language |
US20040042591A1 (en) * | 2002-05-08 | 2004-03-04 | Geppert Nicholas Andre | Method and system for the processing of voice information |
US20040037398A1 (en) * | 2002-05-08 | 2004-02-26 | Geppert Nicholas Andre | Method and system for the recognition of voice information |
US20040006464A1 (en) * | 2002-05-08 | 2004-01-08 | Geppert Nicolas Andre | Method and system for the processing of voice data by means of voice recognition and frequency analysis |
US20040006482A1 (en) * | 2002-05-08 | 2004-01-08 | Geppert Nicolas Andre | Method and system for the processing and storing of voice information |
US7406413B2 (en) | 2002-05-08 | 2008-07-29 | Sap Aktiengesellschaft | Method and system for the processing of voice data and for the recognition of a language |
US7343288B2 (en) | 2002-05-08 | 2008-03-11 | Sap Ag | Method and system for the processing and storing of voice information and corresponding timeline information |
US20050149329A1 (en) * | 2002-12-04 | 2005-07-07 | Moustafa Elshafei | Apparatus and method for changing the playback rate of recorded speech |
US7143029B2 (en) | 2002-12-04 | 2006-11-28 | Mitel Networks Corporation | Apparatus and method for changing the playback rate of recorded speech |
US20050027523A1 (en) * | 2003-07-31 | 2005-02-03 | Prakairut Tarlton | Spoken language system |
US7412378B2 (en) | 2004-04-01 | 2008-08-12 | International Business Machines Corporation | Method and system of dynamically adjusting a speech output rate to match a speech input rate |
US20080262837A1 (en) * | 2004-04-01 | 2008-10-23 | International Business Machines Corporation | Method and system of dynamically adjusting a speech output rate to match a speech input rate |
US20050228672A1 (en) * | 2004-04-01 | 2005-10-13 | International Business Machines Corporation | Method and system of dynamically adjusting a speech output rate to match a speech input rate |
US7848920B2 (en) | 2004-04-01 | 2010-12-07 | Nuance Communications, Inc. | Method and system of dynamically adjusting a speech output rate to match a speech input rate |
US20060187770A1 (en) * | 2005-02-23 | 2006-08-24 | Broadcom Corporation | Method and system for playing audio at a decelerated rate using multiresolution analysis technique keeping pitch constant |
US20070238443A1 (en) * | 2006-04-07 | 2007-10-11 | Richardson Roger D | Method and device for restricted access contact information datum |
US7643820B2 (en) * | 2006-04-07 | 2010-01-05 | Motorola, Inc. | Method and device for restricted access contact information datum |
US20100069055A1 (en) * | 2006-04-07 | 2010-03-18 | Motorola, Inc. | Method and device for restricted access contact information datum |
US8204491B2 (en) | 2006-04-07 | 2012-06-19 | Motorola Mobility, Inc. | Method and device for restricted access contact information datum |
US20100106495A1 (en) * | 2007-02-27 | 2010-04-29 | Nec Corporation | Voice recognition system, method, and program |
US8417518B2 (en) * | 2007-02-27 | 2013-04-09 | Nec Corporation | Voice recognition system, method, and program |
US20100169075A1 (en) * | 2008-12-31 | 2010-07-01 | Giuseppe Raffa | Adjustment of temporal acoustical characteristics |
US8447609B2 (en) * | 2008-12-31 | 2013-05-21 | Intel Corporation | Adjustment of temporal acoustical characteristics |
US20130325456A1 (en) * | 2011-01-28 | 2013-12-05 | Nippon Hoso Kyokai | Speech speed conversion factor determining device, speech speed conversion device, program, and storage medium |
US9129609B2 (en) * | 2011-01-28 | 2015-09-08 | Nippon Hoso Kyokai | Speech speed conversion factor determining device, speech speed conversion device, program, and storage medium |
US9036844B1 (en) | 2013-11-10 | 2015-05-19 | Avraham Suhami | Hearing devices based on the plasticity of the brain |
US20170004848A1 (en) * | 2014-01-24 | 2017-01-05 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US20170032804A1 (en) * | 2014-01-24 | 2017-02-02 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US9899039B2 (en) * | 2014-01-24 | 2018-02-20 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US9934793B2 (en) * | 2014-01-24 | 2018-04-03 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US20160379669A1 (en) * | 2014-01-28 | 2016-12-29 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US9916844B2 (en) * | 2014-01-28 | 2018-03-13 | Foundation Of Soongsil University-Industry Cooperation | Method for determining alcohol consumption, and recording medium and terminal for carrying out same |
US9907509B2 (en) | 2014-03-28 | 2018-03-06 | Foundation of Soongsil University—Industry Cooperation | Method for judgment of drinking using differential frequency energy, recording medium and device for performing the method |
US9916845B2 (en) | 2014-03-28 | 2018-03-13 | Foundation of Soongsil University—Industry Cooperation | Method for determining alcohol use by comparison of high-frequency signals in difference signal, and recording medium and device for implementing same |
US9943260B2 (en) | 2014-03-28 | 2018-04-17 | Foundation of Soongsil University—Industry Cooperation | Method for judgment of drinking using differential energy in time domain, recording medium and device for performing the method |
Also Published As
Publication number | Publication date |
---|---|
DE69816221T2 (en) | 2004-02-05 |
DK0910065T3 (en) | 2003-10-27 |
KR100283421B1 (en) | 2001-03-02 |
NO316414B1 (en) | 2004-01-19 |
NO985301L (en) | 1998-12-16 |
EP0910065A1 (en) | 1999-04-21 |
WO1998041976A1 (en) | 1998-09-24 |
CN1101581C (en) | 2003-02-12 |
JPH10257596A (en) | 1998-09-25 |
DE69816221D1 (en) | 2003-08-14 |
KR20000010930A (en) | 2000-02-25 |
CA2253749C (en) | 2002-08-13 |
JP2955247B2 (en) | 1999-10-04 |
CA2253749A1 (en) | 1998-09-24 |
EP0910065A4 (en) | 2000-02-23 |
NO985301D0 (en) | 1998-11-13 |
CN1219264A (en) | 1999-06-09 |
EP0910065B1 (en) | 2003-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6205420B1 (en) | Method and device for instantly changing the speed of a speech | |
US5611018A (en) | System for controlling voice speed of an input signal | |
KR101334366B1 (en) | Method and apparatus for varying audio playback speed | |
EP1944753A2 (en) | Method and device for detecting voice sections, and speech velocity conversion method and device utilizing said method and device | |
EP0829851B1 (en) | Voice speed converter | |
US6085157A (en) | Reproducing velocity converting apparatus with different speech velocity between voiced sound and unvoiced sound | |
JPS5982608A (en) | System for controlling reproducing speed of sound | |
JP2001184100A (en) | Speaking speed converting device | |
JP3379348B2 (en) | Pitch converter | |
JP3378672B2 (en) | Speech speed converter | |
JP3081469B2 (en) | Speech speed converter | |
JP3373933B2 (en) | Speech speed converter | |
JP3219892B2 (en) | Real-time speech speed converter | |
JP3357742B2 (en) | Speech speed converter | |
JPH09152889A (en) | Speech speed transformer | |
KR100359988B1 (en) | real-time speaking rate conversion system | |
JP3102553B2 (en) | Audio signal processing device | |
JP2002297200A (en) | Speaking speed converting device | |
JP2001154684A (en) | Speech speed converter | |
JPH09146587A (en) | Speech speed changer | |
KR100372576B1 (en) | Method of Processing Audio Signal | |
JPH07210192A (en) | Method and device for controlling output data | |
JPH04367898A (en) | Method and device for voice reproduction | |
JPH0698398A (en) | Non-voice section detecting/expanding device/method | |
JPH06337696A (en) | Device and method for controlling speed conversion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: NIPPON HOSO KYOKAI, JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: TAKAGI, TOHRU; SEIYAMA, NOBUMASA; IMAI, ATSUSHI; AND OTHERS. REEL/FRAME: 009733/0798. Effective date: 19981106 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | FPAY | Fee payment | Year of fee payment: 4 |
 | FPAY | Fee payment | Year of fee payment: 8 |
 | FPAY | Fee payment | Year of fee payment: 12 |