WO2001019066A1

WO2001019066A1 - Method, system and software product for transmitting speech on internet

Info

Publication number: WO2001019066A1
Application number: PCT/FI2000/000759
Authority: WO
Inventors: Paavo Eskelinen
Original assignee: Voxlab Oy
Priority date: 1999-09-08
Filing date: 2000-09-07
Publication date: 2001-03-15
Also published as: AU7004200A; FI19991914A

Abstract

The invention relates to a method, system and software product for transmitting speech on the Internet. The method comprises: (402) compressing data containing digitized speech into a file at the transmission end by means of compression software; (404) arranging into the file decompression software for decompressing the compression employed; (406) dividing the file into data packets; (408) transmitting the packets to the recipient over the Internet; (410) decompressing the received compressed data at the reception end by using the received decompression software.

Description

METHOD, SYSTEM AND SOFTWARE PRODUCT FOR TRANSMITTING SPEECH ON INTERNET

FIELD OF THE INVENTION

The invention relates to a method for transmitting speech on the Internet, particularly data containing compressed speech.

BACKGROUND OF THE INVENTION

In addition to the public switched telephone network, a call can be implemented using the Internet. In such a case no circuit-switched connection of fixed data transfer capacity is reserved for the call; instead, speech is transferred in packets using a packet-switched data transfer connection available through the Internet. Significant problems arise from delays that may occur in the data transfer. One way of minimizing the delays is to minimize the amount of data to be transferred.

In telephones connected to the public switched telephone network, speech is coded using PCM (Pulse Code Modulation). PCM is a three-step process in which speech is first sampled at a rate twice the highest frequency, i.e. 2 × 4000 Hz = 8000 Hz. The samples are then quantized to 256 separate levels. Finally, each quantized sample is coded into an 8-bit code word, so that 8000 samples/s × 8 bits/sample = 64 000 bit/s, i.e. the transfer rate required by a standard telephone connection is 64 kbit/s. In practice the coding only means that the samples are digitized.
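The PCM figures above can be checked with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, and it quantizes samples in the range [-1.0, 1.0] uniformly for simplicity, whereas telephone-network PCM (ITU-T G.711) actually uses logarithmic A-law or μ-law companding.

```python
SAMPLE_RATE_HZ = 2 * 4000   # sampling at twice the highest frequency (4 kHz)
BITS_PER_SAMPLE = 8         # 8-bit code words give 2**8 = 256 levels
LEVELS = 2 ** BITS_PER_SAMPLE


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of uncompressed PCM: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample


def quantize(sample: float, levels: int = LEVELS) -> int:
    """Map an analog sample in [-1.0, 1.0] onto one of `levels` integer codes.

    Uniform quantization is a simplification; G.711 uses logarithmic companding.
    """
    clipped = max(-1.0, min(1.0, sample))       # keep the sample in range
    code = int((clipped + 1.0) / 2.0 * levels)  # scale [-1, 1] onto [0, levels]
    return min(levels - 1, code)                # +1.0 would otherwise map to `levels`


print(pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))  # 64000
```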

Efficient speech codecs have been developed for Internet calls, as well as for mobile networks, to minimize the amount of data to be transferred. The term 'speech codec' (coder-decoder) already indicates that the codec is responsible for coding and decoding speech. The ITU (International Telecommunication Union) recommendation H.323, for example, provides for a speech codec (G.723.1) operating at 5.3 kbit/s. However, the audio quality of speech produced by this codec is not as good as that of speech coded with PCM. Manufacturers are therefore continuously developing new, more efficient codecs producing audio of increasingly high quality. In fact, the codecs are usually different compression and decompression methods used for packing digitized speech.
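To put the two rates in proportion, the following Python sketch (illustrative; the helper name is hypothetical) compares the data volume of one minute of speech coded with 64 kbit/s PCM and with a 5.3 kbit/s codec.

```python
PCM_RATE_BPS = 64_000        # standard telephone PCM
LOW_RATE_CODEC_BPS = 5_300   # low-rate codec rate quoted above


def call_bytes(bit_rate_bps: int, duration_s: float) -> float:
    """Bytes needed to carry `duration_s` seconds of speech at `bit_rate_bps`."""
    return bit_rate_bps * duration_s / 8


one_minute_pcm = call_bytes(PCM_RATE_BPS, 60)          # 480000.0 bytes
one_minute_codec = call_bytes(LOW_RATE_CODEC_BPS, 60)  # 39750.0 bytes
ratio = PCM_RATE_BPS / LOW_RATE_CODEC_BPS              # roughly 12

print(f"PCM: {one_minute_pcm:.0f} B/min, codec: {one_minute_codec:.0f} B/min, "
      f"about {ratio:.1f}x less data")
```

The figures count speech payload only; packet headers add overhead to both cases in practice.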

The use of proprietary speech codecs involves major problems related to the incompatibility of the speech codecs of different manufacturers; in other words, data compressed with one speech codec cannot be decoded with another. The reason for this is that the speech codecs of different manufacturers employ different voice compression algorithms, which are usually implemented in software. A prior art solution to this problem is that the recipient fetches a speech codec, or at least software for decompressing the compression employed, from the manufacturer's WWW (World Wide Web) pages. This solution involves some drawbacks. The software must be fetched before a connection is activated, but how can the recipient predict an incoming call and, above all, the speech codec that the caller will use? Another problem is that the user usually has to pay for the software, unless it is freeware; even for shareware, payment is usually expected.

BRIEF DESCRIPTION OF THE INVENTION

It is therefore an object of the present invention to provide a method and equipment implementing the method which allow the above problems to be solved. This is achieved with the method described below. The method is used for transmitting speech on the Internet and comprises compressing data containing digitized speech into a file at a transmission end by using compression software. The method further comprises: arranging into the file decompression software for decompressing the compression employed; dividing the file into data packets; transmitting the packets to a recipient over the Internet; and decompressing the received compressed data at the reception end by using the received decompression software.

The invention further relates to a system for transmitting speech on the Internet. The system comprises: compression software at transmission end equipment for compressing data containing digitized speech into a file and for arranging into the file decompression software to be used for decompressing the compression employed; packet transmission software at the transmission end equipment for dividing the file into data packets and for transmitting the packets to the recipient over the Internet; and packet transmission software at a reception end for receiving the packets and for assembling the transmitted file from the packets, the reception end being arranged to decompress the received compressed data by using the received decompression software.

The invention still further relates to a computer software product for transmitting speech on the Internet, the product comprising software stored in a software storage means and readable into a computer. The software carries out the method steps of: compressing data containing digitized speech into a file at a transmission end by using compression software; arranging into the file decompression software for decompressing the compression employed, the recipient using the software at the reception end to decompress the received compressed data; dividing the file into data packets; and transmitting the packets to the recipient over the Internet. The preferred embodiments of the invention are disclosed in the dependent claims.

The invention is based on the idea that the recipient of a call does not need to worry about obtaining the software for decompressing compressed data packets, because the software is delivered to the recipient's equipment together with the speech packets.

The method and system of the invention provide many advantages. The invention allows speech to be transmitted on the Internet flexibly using diverse speech codecs, without the drawbacks of the prior art. The user does not necessarily need to know the technical details; it is enough that the speech quality and the costs meet his/her requirements. The recipient of speech coded into packets does not need a decoding system of his/her own to be able to listen to the speech, only a standard computer provided with standard user interfaces and a sound card, or even just an ordinary telephone.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following the invention will be described in greater detail in connection with preferred embodiments and with reference to the accompanying drawings, in which

Figures 1A and 1B illustrate different ways of setting up an Internet call;

Figure 2 illustrates an example of the content of the data packets to be transmitted;

Figure 3A illustrates an example of a structure of transmission end equipment;

Figure 3B illustrates an example of a structure of reception end equipment;

Figure 4 is a flow chart illustrating a method for transmitting speech on the Internet.

DETAILED DESCRIPTION OF THE INVENTION

There are different ways of transmitting speech on the Internet. In the following, two currently used methods will be described by way of example, the invention not being, however, restricted to them.

Figure 1A illustrates the first method. A user 100 typically has a computer 104 which includes a microphone, a sound card and the software needed to convert speech 102 into data. The conversion is made using a speech codec, which can also be used for compressing data. Speech data is transmitted in packets over the Internet 106 to a recipient 112, using for example TCP/IP as the data transfer protocol. In other words, the user's 100 computer 104 and the recipient's 112 computer 108 must be connected to the Internet 106 over an interface 140. The interface 140 to the Internet 106 may be implemented in any prior art manner. A typical way is to use a modem connected to the computer 104 to set up a public switched connection through a telephone exchange to the server of the Internet service provider. Other alternatives are to use a fixed connection, a cable television network connection or a wireless radio connection.

A characteristic of the first method is that the speech is coded and converted into packet format in the user's equipment, before the Internet interface 140. The server delivers the packets to the recipient's 112 computer 108, which also comprises a speech codec that converts the packets back to speech 110, which is played to the recipient 112 using the sound card and loudspeakers of the computer 108. The above example is only one alternative for implementing an Internet call; current technology already makes other solutions possible as well.

Figure 1B illustrates another method for implementing an Internet call. The user 100 has a standard analog or digital telephone 114 at his/her disposal. The speech codec of an ordinary analog telephone is located in a switching centre, where the speech signal is supplied in analog form to the speech codec, which converts it into digital form. An ISDN telephone comprises a built-in speech codec, so the signal supplied to the switching centre is already digital. So-called PBXs (private branch exchanges), i.e. private exchanges of companies, may comprise a speech codec, in which case the connection from the PBX to the switching centre is digital; alternatively the connection may be analog, in which case the speech codec is located in the switching centre. By calling a specific number with the telephone 114, or by selecting a specific network identifier, the call is connected to a server 118 of an operator providing Internet services. The user 100 is then given the dial tone again and may dial the number of the person 112 s/he wishes to call.

The speech codec can thus be located, for example, in a fixed network switching centre or at the service provider's server. The compression software employed and the software for packet-switched transmission, however, are preferably located at the operator's server. Compressed packets are then delivered over the Internet 106 to the recipient 112 of the call. Speech travels on an interface 152 in uncoded form; on an interface 154 it travels either coded or uncoded, depending on the location of the speech codec; and on an interface 156 coded packets are transmitted.

It is naturally also possible that the speech codec and the software for packet-switched transmission are in the user's 100 telephone 114, the packets being thus formed already in the telephone 114. In that case the local connection of the user 100 must be digital. Consequently, packets do not need to be formed at the server 118 of the operator providing Internet services, but they only need to be transmitted to the Internet.

The systems illustrated by way of example in Figures 1A and 1B can also be combined. With equipment integrated into the computer 104, the user can call another user over the Internet 106, the other user being connected to his/her telephone 124 through the server 120 of the service provider and the public switched telephone network 122, as shown in Figure 1B.

It can be anticipated that in the future telephone services will be charged according to the number of bits transferred, instead of the time the line is occupied, as in current billing. An interesting issue from the user's point of view will therefore be the efficiency of speech coding: the more efficient the speech coding, the less transfer capacity is needed for the call, and the lower the call charge. It is possible that in the future the party making the call may select the speech codec s/he wishes to use. For important calls the user selects a speech codec that ensures good audio quality but requires a large amount of transfer capacity. For less important calls the user selects a speech codec that merely ensures understandable speech. In practice, current speech codecs of fixed network telephones are implemented as chips; in mobile telephones they are provided by an optimized digital signal processor with the related software; and in Internet calls by a standard computer processor, the speech codec being implemented by software alone, without any special circuits.
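The per-call codec choice described above can be sketched as picking the lowest-rate codec that still meets a quality requirement, with the charge proportional to the bits transferred. Everything in this Python sketch is hypothetical: the codec names, the 16 kbit/s rate, the quality scores and the tariff per megabit are invented for illustration; only the 64 kbit/s and 5.3 kbit/s rates come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    name: str       # hypothetical codec identifier
    rate_bps: int   # bits per second the codec produces
    quality: int    # hypothetical quality score, higher is better


def call_charge(codec: Codec, duration_s: float, price_per_mbit: float) -> float:
    """Charge for a call billed per bit transferred (tariff is hypothetical)."""
    return codec.rate_bps * duration_s / 1_000_000 * price_per_mbit


def choose_codec(codecs: list[Codec], min_quality: int) -> Codec:
    """Cheapest (lowest-rate) codec whose quality meets the caller's requirement."""
    acceptable = [c for c in codecs if c.quality >= min_quality]
    if not acceptable:
        raise ValueError("no codec meets the requested quality")
    return min(acceptable, key=lambda c: c.rate_bps)


CODECS = [
    Codec("pcm", 64_000, 5),
    Codec("mid-rate", 16_000, 4),
    Codec("low-rate", 5_300, 2),
]

important = choose_codec(CODECS, min_quality=4)  # good audio quality wanted
routine = choose_codec(CODECS, min_quality=2)    # understandable speech is enough
print(important.name, call_charge(important, 180, price_per_mbit=0.01))
print(routine.name, call_charge(routine, 180, price_per_mbit=0.01))
```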

Consequently, there may be any number of different speech codecs available in the future. Each user may have personal preferences, or financial reasons, for example, for preferring a specific speech codec. However, the recipient of the call must have at his/her disposal the same speech codec as the party initiating the call, or at least its decoding portion; otherwise the speech coding cannot be decoded at the reception end.

There are currently different standardization bodies, such as the ETSI and the ITU, which set standards, for example, for accepted speech codecs. The question therefore arises of whether the speech codec chosen by a user has been accepted by a standardization body, in which case the speech codecs used by other users support it. If the speech codec is not accepted, it is not necessarily supported either, which makes it unsuitable for large-scale use. Alternatively, if the user wishes to use a specific speech codec, s/he must make sure in advance that the recipient of the call also has that speech codec at his/her disposal.

According to the invention, in Internet calls the speech codec is therefore transmitted to the recipient by packing it, already at the transmitter 100 end, into a file comprising compressed speech data. The file is transmitted over the Internet to the recipient in packet-switched format, i.e. divided into data packets. The computer 108 of the recipient 112 then only needs to extract the speech codec (or just a speech decoder) from the packets, install it and start decompressing speech from the file assembled from the packets. Another alternative is to run the speech codec only when a speech file is to be listened to.

Figure 2 illustrates an example of the content of the data packets to be transmitted. The rectangular areas depict the packets. For the sake of clarity, details required by the data transfer protocol employed are not shown, but only the payload to be transferred in the packets.

The upper part of the example illustrates the transmission of three packets. Two of the packets comprise only compressed speech data 200C, 200B, while one packet comprises both compressed speech data 200A and decompression software 202. The lower example shown in the Figure is otherwise the same as the upper one, except for the decompression software 202A, 202B, which is now longer and therefore requires the transfer capacity of two packets. In practice there are usually more packets than those shown in the Figure.

Figure 4 is a flow diagram illustrating the method of the invention.

The execution of the method starts at block 400. In block 402 data containing digitized speech is compressed at the transmission end into a file by using compression software. Next, in block 404, decompression software for decompressing the compression used is added into the file that comprises the compressed data.
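Blocks 402 and 404 can be sketched as building one file that carries both the decompression software and the compressed speech. In this illustrative Python sketch, zlib stands in for a speech codec (it is a general-purpose lossless compressor, not a speech codec), the "decompression software" is a short piece of Python source text, and the file layout (a 4-byte big-endian length prefix, then the decompressor, then the compressed speech) is an assumption, not a format defined in this document.

```python
import struct
import zlib

# Hypothetical decompression software shipped inside the file; zlib stands in
# for a real speech codec's decoder.
DECOMPRESSOR_SOURCE = (
    "import zlib\n"
    "def decompress(data):\n"
    "    return zlib.decompress(data)\n"
)


def build_speech_file(pcm_bytes: bytes, decompressor_source: str) -> bytes:
    """Block 402: compress the speech; block 404: add the decompressor to the file."""
    compressed = zlib.compress(pcm_bytes)               # block 402
    decompressor = decompressor_source.encode("utf-8")  # block 404
    # Assumed layout: [4-byte decompressor length][decompressor][compressed speech]
    return struct.pack(">I", len(decompressor)) + decompressor + compressed


def split_speech_file(file_bytes: bytes) -> tuple[str, bytes]:
    """Inverse of build_speech_file: recover the decompressor and the speech data."""
    (length,) = struct.unpack_from(">I", file_bytes, 0)
    decompressor = file_bytes[4:4 + length].decode("utf-8")
    return decompressor, file_bytes[4 + length:]


speech = bytes(range(256)) * 40  # placeholder for digitized speech samples
speech_file = build_speech_file(speech, DECOMPRESSOR_SOURCE)
source, payload = split_speech_file(speech_file)
assert source == DECOMPRESSOR_SOURCE and zlib.decompress(payload) == speech
```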

In block 406 the file is divided into data packets, for example as shown in Figure 2; the decompression software for decompressing the compression employed is placed in at least one packet containing compressed data, or in some other packet. In block 408 the packets are transmitted to the recipient over the Internet. At the reception end, in block 410, the compressed data is decompressed using the received decompression software.
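Block 406 and the packet layouts of Figure 2 can be sketched as follows. This Python sketch is illustrative only: the 100-byte payload size and the helper names are hypothetical, protocol headers are omitted (as in Figure 2), and the decompression software is assumed to sit at the start of the file. Each packet carries an explicit sequence number so that the reception end can rebuild the file even if packets arrive out of order.

```python
import random

PAYLOAD_SIZE = 100  # hypothetical payload bytes per packet


def packetize(file_bytes: bytes, size: int = PAYLOAD_SIZE) -> list[tuple[int, bytes]]:
    """Block 406: split the file into (sequence number, payload) packets."""
    return [(seq, file_bytes[off:off + size])
            for seq, off in enumerate(range(0, len(file_bytes), size))]


def packets_carrying_decompressor(decompressor_len: int,
                                  size: int = PAYLOAD_SIZE) -> list[int]:
    """Sequence numbers of packets holding any decompressor bytes, assuming the
    decompressor occupies the first `decompressor_len` bytes of the file."""
    if decompressor_len == 0:
        return []
    return list(range((decompressor_len - 1) // size + 1))


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Reception end: order packets by sequence number and rebuild the file."""
    return b"".join(payload for _, payload in sorted(packets))


decompressor = b"D" * 150  # placeholder decompression software (202A, 202B)
speech = b"S" * 150        # placeholder compressed speech data
file_bytes = decompressor + speech

packets = packetize(file_bytes)
print(len(packets), packets_carrying_decompressor(len(decompressor)))  # 3 [0, 1]

random.shuffle(packets)                   # block 408: packets may arrive out of order
assert reassemble(packets) == file_bytes  # block 410 can now decompress the file
```

With 150 bytes of decompression software the sketch reproduces the lower layout of Figure 2: the software spans two packets, the second of which also carries speech data. Over TCP the transport itself delivers the byte stream in order, so explicit sequence numbers of this kind matter mainly when an unordered transport such as UDP is used.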

In an embodiment, the decompression software received at the reception end is not made a permanent part of the reception end equipment, but is only executed in the reception end processor when a received file containing speech is to be listened to. This provides the advantage that listening to a speech message or a call does not cause permanent changes to the software in the recipient's equipment. The decompression software is run in the random access memory of the recipient's equipment to decompress the data packets or an assembled data file, and the speech data obtained is then played to the recipient. When the data has been listened to, the decompression software is not left permanently installed in the recipient's equipment. Another option is to store a decompression software file at the reception end. The advantage of this may be that a sender using the same compression software would no longer need to supply the decompression software to that recipient. Compressed speech data can also be stored as a file at the reception end. In that case the recipient effectively records the call, i.e. s/he can listen again at least to the calling party's portion. The decompression software and the compressed speech data are usually both stored in the same file at the reception end, which makes it easy to arrange the message to be played. In a preferred embodiment, software for compressing speech data is arranged into at least one packet. This also allows the recipient of the call to use the compression used by the caller, either during the same call or later.
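The two reception-end options above, running the received decompressor only in memory or storing it for reuse, can be sketched as follows. This Python sketch is a hypothetical illustration: the shipped decompressor is Python source executed with `exec()` in a throwaway namespace, zlib stands in for a speech codec, and the cache keyed by a SHA-256 digest is an assumed mechanism for recognizing a decompressor that has already been received.

```python
import hashlib
import zlib

# Hypothetical decompressor as received inside the speech file.
DECOMPRESSOR_SOURCE = (
    "import zlib\n"
    "def decompress(data):\n"
    "    return zlib.decompress(data)\n"
)


def run_transiently(decompressor_source: str, compressed: bytes) -> bytes:
    """Execute the received decompressor in a throwaway namespace.

    Nothing is written to disk or installed; the namespace is discarded
    when the function returns.
    """
    namespace: dict = {}
    exec(decompressor_source, namespace)  # WARNING: runs untrusted code
    return namespace["decompress"](compressed)


class DecompressorCache:
    """Alternative embodiment: store received decompressors, keyed by digest,
    so a sender using the same compression need not send it again."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def add(self, decompressor_source: str) -> str:
        digest = hashlib.sha256(decompressor_source.encode("utf-8")).hexdigest()
        self._store[digest] = decompressor_source
        return digest

    def has(self, digest: str) -> bool:
        return digest in self._store

    def get(self, digest: str) -> str:
        return self._store[digest]


speech = b"\x00\x01" * 1000  # placeholder speech samples
compressed = zlib.compress(speech)
assert run_transiently(DECOMPRESSOR_SOURCE, compressed) == speech

cache = DecompressorCache()
key = cache.add(DECOMPRESSOR_SOURCE)
assert cache.has(key) and cache.get(key) == DECOMPRESSOR_SOURCE
```

Executing code received over the network, as `exec()` does here, is unsafe without sandboxing or signature verification; the sketch deliberately leaves that concern aside to keep the two embodiments visible.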

Although the examples illustrate the implementation of an Internet call, the system can also be used for other speech transmission applications, such as voice mail. Voice mail is unidirectional communication that does not take place in real time.

With reference to Figures 3A and 3B, an example of the equipment needed in the arrangement of Figure 1A will be described. Figure 3A shows the transmission end equipment and Figure 3B the reception end equipment.

A computer 104, 108 comprises a display 300, a keyboard 302, a mouse 304, a sound card 314, at least one loudspeaker 306 connected to the sound card, a microphone 308 connected to the sound card, a device providing access to the Internet, e.g. a modem 310, a mass memory device, such as a hard disk 312, and a central processing unit 320. The central processing unit 320 is used for running the operating system and the application software. From the point of view of the invention, the most important application software is the one providing an interface that allows the sound card 314 to be used. This software can be used, for example, for playing and recording .wav files. In addition, the computer comprises telecommunications means with the associated software, such as the modem 310 and packet transmission software 324, which can be used for establishing a connection, for example through the public switched telephone network, to the server of the Internet service provider. Today's standard computers and their operating systems comprise the described elements. The Windows environment, for example, includes a specific sound reproduction service, MCI (Media Control Interface), which makes a speech codec available through an API (Application Programming Interface). The API is always the same irrespective of the sound card, because the Windows environment has separate drivers for different sound cards. The MCI alone cannot, however, be used as a speech codec because it is not efficient enough: when voice is to be created, the amount of data may even double.

Furthermore, in accordance with the invention, the software comprises compression software 326, at least at the transmission end, for packing data comprising voice. The transmission end must comprise decompression software 202, which is arranged into the packets together with the speech data 200. In a voice mail application, for example, the speech data 200 and the decompression software 202 for decompressing it are received from the network through the telecommunications means 310; the decompression software then decompresses the packets and plays them with a player, the sound being created in the sound card 314 and transmitted to the loudspeaker 306, which reproduces it. Correspondingly, voice mail may be created by recording speech with the microphone 308 and the sound card 314, in which case the compression software 326 packs the sound into a file. The file is converted into packets using the packet transmission software 324 and transmitted into the network using the telecommunications means 310. For the recipient, the file is provided with the decompression software 202.

The invention requires an efficient speech codec that can be accommodated in a small space and implemented, for example, as a Java class, which allows the amount of data needed for transmitting the codec to be minimized. The computer 104 must then naturally support Java. Java is not, however, the only technology available; the speech codec can also be implemented by other prior art means.
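Why the codec must be compact follows from simple arithmetic: the decompression software travels with the file, so its size is overhead on top of the speech data. The Python sketch below uses hypothetical decompressor sizes and the 5.3 kbit/s rate mentioned earlier; the function name is likewise hypothetical.

```python
def decompressor_overhead(decompressor_bytes: int, rate_bps: int,
                          duration_s: float) -> float:
    """Fraction of the transmitted file taken up by the decompression software."""
    speech_bytes = rate_bps * duration_s / 8
    return decompressor_bytes / (decompressor_bytes + speech_bytes)


# Hypothetical decompressor sizes for a one-minute message at 5.3 kbit/s.
for size in (2_000, 20_000, 200_000):
    share = decompressor_overhead(size, 5_300, 60)
    print(f"{size:>7} B decompressor: {share:.0%} of the file")
```

A bulky decompressor can outweigh the speech itself in short messages; where the reception end stores the decompression software, repeat messages from the same sender could avoid this overhead entirely.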

It is to be noted that although the example describes the invention in connection with an Internet call, the invention is not restricted thereto, but can in principle be utilized with any technology platform employing packet-switched traffic for implementing a call. Consequently, the speech codec may also be located in a telephone answering machine, in a Nokia Communicator-type device, in a PDA (Personal Digital Assistant), etc. Similarly, the telephone connection can be established using an analog or a digital telephone connection, as described above, or over the radio path, using a mobile telephone, for example, or over a cable television network, a wireless subscriber connection (Wireless Local Loop), etc.

Although the invention is described above with reference to an example according to the accompanying drawings, it is apparent that the invention is not restricted to it, but may vary in many ways within the inventive idea disclosed in the claims.

Claims

1. A method for transmitting speech on the Internet, the method comprising

(402) compressing data containing digitized speech into a file at a transmission end by using compression software; characterized by

(404) arranging into the file decompression software for decompressing the compression employed;

(406) dividing the file into data packets;

(408) transmitting the packets to the recipient over the Internet;

(410) decompressing the received, compressed data at a reception end by using the received decompression software.

2. A method according to claim 1, characterized in that the decompression software is not installed at the reception end as a permanent part of the reception end equipment software, but it is only carried out in the reception end processor when a received file containing speech is to be listened to.

3. A method according to claim 1, characterized in that the decompression software is stored as a separate file at the reception end.

4. A method according to claim 1, characterized in that the compressed speech data is stored into a file at the reception end.

5. A method according to claim 1, characterized in that at least one packet is provided with compression software for compressing speech data.

6. A method according to claim 1, characterized in that the transmission of speech implements an Internet call.

7. A method according to claim 1, characterized in that the transmission of speech implements a voice mail system.

8. A system for transmitting speech on the Internet, characterized in that the system comprises compression software (326) at transmission end equipment for compressing data containing digitized speech into a file and for arranging into the file decompression software (202) to be used for decompressing the compression employed; packet transmission software (324) at the transmission end equipment for dividing the file into data packets and for transmitting the packets to the recipient over the Internet (106); packet transmission software (324) at the reception end for receiving the packets and for assembling the transmitted file from the packets; the reception end being arranged to decompress the received compressed data by using the received decompression software (202).

9. A system according to claim 8, characterized in that the decompression software (202) is not installed as a permanent part of the software in the reception end equipment, but it is only carried out in the reception end processor when a received file containing speech is to be listened to.

10. A system according to claim 8, characterized in that the decompression software (202) is stored as a separate file at the reception end.

11. A system according to claim 8, characterized in that the compressed speech data is stored into a file at the reception end.

12. A system according to claim 8, characterized in that at least one packet is provided with compression software for compressing speech data.

13. A system according to claim 8, characterized in that the system implements an Internet call.

14. A system according to claim 8, characterized in that the system implements a voice mail system.

15. A computer software product for transmitting speech on the Internet, the product comprising software stored into a software storage means and readable into a computer, characterized in that the software carries out the method steps of

(402) compressing data containing digitized speech into a file at a transmission end by using compression software;

(404) arranging into the file decompression software for decompressing the compression employed, the recipient using the software at the reception end to decompress the received compressed data;

(406) dividing the file into data packets;

(408) transmitting the packets to the recipient over the Internet.