WO2001022711A1 - System and method for distribution of telephone audio data via a computer network - Google Patents

System and method for distribution of telephone audio data via a computer network

Info

Publication number
WO2001022711A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
data
software
encoding
author
Prior art date
Application number
PCT/US2000/025688
Other languages
French (fr)
Other versions
WO2001022711A9 (en)
Inventor
Lynda Meyer
Jeffrey Markel
Jeffrey O'connell
Original Assignee
Net Technologies, Inc.
Priority date
Filing date
Publication date
Application filed by Net Technologies, Inc.
Priority to AU75926/00A (AU7592600A)
Publication of WO2001022711A1
Publication of WO2001022711A9

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/487Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/75Media network packet handling
    • H04L65/762Media network packet handling at the source 
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80Responding to QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/55Push-based network services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/56Provisioning of proxy services
    • H04L67/565Conversion or adaptation of application format or content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/50Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
    • H04M3/53Centralised arrangements for recording incoming messages, i.e. mailbox systems
    • H04M3/533Voice mail systems
    • H04M3/53366Message disposing or creating aspects
    • H04M3/53375Message broadcasting

Definitions

  • the invention disclosed herein relates generally to audio streaming systems. More particularly, the present invention relates to a system and method for streaming audio data received from a telephone network over a computer network to client workstations.
  • the Internet has been heralded as perhaps the most important communication achievement since the advent of the printing press.
  • the Internet has provided people across the world with a new medium by which to publish their writings, be they educational or entertainment.
  • the publishing industry has traditionally been associated with a variety of barriers to entry, including the demands of editors and corporate publishing houses. With the rise of the Internet, however, authors have the ability to publish their works in any manner that they feel fit, allowing them to tackle controversial or other issues that would not be deemed acceptable by a publisher.
  • Audio and video authors can use the Internet in the same manner to disseminate their works without restrictions imposed by third parties. Instead of utilizing the public airwaves, individuals can distribute their message across the public Internet, thereby bypassing the traditional broadcasting infrastructure and its variety of limitations. With the elimination of these barriers to entry, applications such as personal content delivery become possible. When this flexibility is combined with the ubiquity of the public telephone network, individuals and corporations have the ability to deliver any type of audio to disparate computer clients across the globe using simple telephone equipment. Indeed, the applications presented by the marriage of computer and audio technology are limitless.
  • While there are many different ways through which a computer can receive audio data for retransmission across a computer network, existing systems either fail to provide both live and on-demand broadcasting or offer unacceptable audio quality.
  • One way for a computer system to receive audio data from a telephone network is through the use of a "Gentner Box", which is a type of hybrid audio coupler.
  • the box acts as an interface between the computer and the telephone network.
  • One input on the box carries data to and from the computer, while a separate input is connected to the telephone network.
  • Data is received from the telephone network and passed to the computer, via the Gentner Box, for encoding and distribution.
  • This solution suffers from the significant shortcoming of being able to host only one connection at a time and, therefore, encodes only one session at a time.
  • TellSoft, Inc. of Colorado Springs, Colorado, offers a "one box" solution to the problem, with no isolation of the several services that must interoperate to perform the desired functionality. This lack of isolation prevents scalability, which would allow the system to receive and encode multiple audio signals with optimal quality. This solution, therefore, is not suitable when the quality of the audio recording is an important consideration.
  • U.S. Patent No. 5,675,507 entitled “Message Storage and Delivery System” offers another solution to the problem.
  • the '507 patent discusses a system and method for receiving and distributing facsimile, voice, and data messages.
  • the system retrieves a message stored in VOX or AD/PCM format, which are compressed forms of PCM data.
  • This compressed data is then converted to AU or WAV format, depending on the preferences of the user.
  • intermediate file formats such as VOX or AD/PCM as the basis for the conversion, the '507 patent provides inferior audio quality.
  • a system that receives telephone audio data and directly encodes it to a destination file format without an intermediate transformation of the audio data, e.g., it is never saved in an intermediary file format.
  • the process of direct encoding produces significantly improved audio quality.
  • a system and method that accepts a telephone call from a user using telephone equipment and either broadcasts it live over a computer network, such as the Internet, stores it to a networked data storage device, or both.
  • the system retrieves a user's profile data from a system database.
  • the user's profile includes information that is used by the system to determine the disposition of the audio data, such as whether it should be broadcast live by a specific streaming audio server and/or archived on a storage device, the encoders that should be utilized to compress the data, file names for the compressed audio, etc.
  • This retrieved data is formatted according to an XML (Extensible Markup Language) schema that defines the format that the data must adhere to in order for system processing to continue.
  • other proprietary message formatting schemes could be utilized by the present invention.
  • the XML profile and audio data is channeled to audio routing software that sends the incoming data stream to the appropriate encoder or encoders as defined by the user's XML profile.
  • the data is routed to the appropriate encoder where it is compressed according to one of several available codec algorithms. Exemplary codecs used by the system include QuickTime and RealAudio.
  • the data is compressed and placed on a file system for archiving or broadcast by a streaming audio server, all according to the data contained in the user's XML profile.
  • the system also has workflow and publisher software.
  • these two subsystems can retrieve the textual address of audio data from the system database, according to the user id contained in a workflow, and deliver it to any number of destinations by any number of means.
  • the location can be written to an email message and delivered to a user.
  • the location text can be written to a file and transmitted via FTP (File Transfer Protocol) to a particular directory of a web server.
  • the server reads the text out of the file and embeds it as a link in a web page, which can be requested by users. When the requested page is received, the link is activated and the audio data is transmitted to the requesting party for playback.
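The publishing step described above — writing the location text to a file and transmitting it via FTP to a web server directory — can be sketched as follows. This is a minimal illustration using Python's standard `ftplib`; the function names, URL, and credentials are illustrative placeholders, not taken from the patent.

```python
from ftplib import FTP
from pathlib import Path

def write_link_file(audio_url: str, link_file: Path) -> Path:
    # Write the textual location of the encoded audio data to a local file.
    link_file.write_text(audio_url + "\n")
    return link_file

def ftp_publish(link_file: Path, host: str, user: str,
                password: str, remote_dir: str) -> None:
    # Transmit the link file to a particular directory of the web server
    # via FTP; the server can then embed its contents as a link in a page.
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        with link_file.open("rb") as fh:
            ftp.storbinary(f"STOR {link_file.name}", fh)
```

The two steps are kept separate so that the same link file could also be attached to an email, the other delivery means the patent mentions.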
  • System performance is enhanced because the use of an intermediate file format between the receipt and encode points has been eliminated.
  • Superior audio quality is achieved by receiving raw PCM data from a telephone network and directly encoding it into a destination file format, such as RealAudio, QuickTime, or Windows Media formats. These directly encoded audio files are then transmitted to streaming audio clients across a computer network, either live or on-demand.
  • Performance is also enhanced by distributing system duties between software running on physically different computers, thereby reducing the workload on any one computer.
  • This modularity also allows for scalability because computers running redundant software can be easily brought into the process by directing data to them. For example, if an overwhelming number of users are simultaneously encoding in the QuickTime format, more QuickTime encoders can be brought on line.
  • the process is expedited by simultaneously routing the incoming audio stream to a plurality of encoders. By directing audio files to specific server clusters, the system can encode locally, where large data streams are being generated, and stream globally, distributing the encoded and compressed files to servers anywhere on the network. The combination of these benefits provides increased performance and the optimal utilization of hardware resources.
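The scaling strategy above — bringing redundant encoders on line and directing new jobs to them — amounts to maintaining a pool of encoder addresses per format and rotating across them. A minimal sketch, assuming a simple round-robin policy (the class and its addresses are illustrative, not from the patent):

```python
from itertools import cycle

class EncoderPool:
    """Round-robin selection over redundant encoder instances.

    Adding capacity for a busy format (e.g. QuickTime) is just a
    matter of registering another (host, port) address.
    """
    def __init__(self):
        self._pools = {}    # format name -> list of (host, port) addresses
        self._cursors = {}  # format name -> cycling iterator over the pool

    def add_encoder(self, fmt: str, address: tuple) -> None:
        self._pools.setdefault(fmt, []).append(address)
        # Restart the cursor so it covers the enlarged pool.
        self._cursors[fmt] = cycle(self._pools[fmt])

    def next_encoder(self, fmt: str) -> tuple:
        # Each call hands back the next encoder for this format in turn.
        return next(self._cursors[fmt])
```

A production router would weight the rotation by encoder load, but round-robin captures the "bring more encoders on line" idea.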
  • Fig. 1 is a diagram presenting a configuration of the various hardware and software components of the system, according to one embodiment of the present invention
  • Fig. 2 is a diagram presenting a hardware and software embodiment of the present invention configured to run as an Application Service Provider (ASP);
  • Fig. 3 is a diagram presenting a hardware and software embodiment of the present invention configured to run as an in-house service at a corporation;
  • Fig. 4 is a flow diagram presenting a high-level overview of the steps involved in encoding and presenting telephone audio data, according to one embodiment of the present invention
  • Fig. 5 is a flow diagram presenting a process executed by an interactive audio system in accepting and processing a call, according to one embodiment of the present invention
  • Fig. 6 is a flow diagram presenting a continuation of the process executed by the interactive audio system in accepting and processing a call, according to one embodiment of the present invention
  • Fig. 7 is an XML profile, according to one embodiment of the present invention
  • Fig. 8 is a flow diagram outlining a process for accepting audio data and passing it off to an appropriate encoder for compression, according to one embodiment of the present invention
  • Fig. 9 is a flow diagram outlining a process of encoding audio data in either
  • Fig. 10 is a flow diagram outlining a subroutine of initializing the QuickTimeTM encoder
  • Fig. 11 is a flow diagram outlining a subroutine of initializing the RealAudioTM encoder
  • Fig. 12 is a flow diagram outlining a subroutine of initializing the Windows MediaTM encoder
  • Fig. 13 is a flow diagram outlining a process of encoding audio data in the .wav format, according to one embodiment of the present invention
  • Fig. 14 is a flow diagram outlining a process of scheduling a live broadcast
  • Fig. 15 is a flow diagram presenting a high level overview of a generic workflow process, according to one embodiment of the present invention.
  • Fig. 16 is a flow diagram presenting a specific workflow for approving the publication of audio data, according to one embodiment of the present invention
  • Fig. 17 is a flow diagram outlining a process of retrieving and formatting text indicating the location of an encoded audio data file, according to one embodiment of the present invention.
  • Fig. 18 is a flow diagram outlining a process of using a web based front end to add, edit, and browse data stored in the system database, according to one embodiment of the present invention.
  • the instant invention is a system for capturing audio data and broadcasting it across a computer network, either on-demand or in real time.
  • the invention is composed of the DOTELL Interactive Voice Response Server (DIVR) 108, which is outfitted with an interface card 106 through which it can access the telephone network and receive calls placed using standard telephone equipment 104.
  • the DIVR 108 acts according to user profile and other assorted data contained in a system database 112 and can execute workflow and publishing software located on the DIVR or on a physically distinct computer.
  • the DIVR 108 also comprises scheduler and loader software that can be used to broadcast hold audio data in anticipation of the start of a scheduled broadcast.
  • Incoming audio data is passed to a DTRouter 118, which uses data retrieved from the system database 112 to send the audio to one or more encoders 120.
  • the audio data is either broadcast live by a streaming media server or servers 124 or stored on one or more data storage devices 122 for playback on-demand.
  • the components comprising the system can reside on a single computer or can be distributed across a plurality of physically separate computers electrically coupled through the use of a computer network, such as the Internet.
  • Audio data author 102 uses a telephone 104 connected to a telephone network to place a call to the system.
  • Telephone systems typically transmit data according to the Pulse Code Modulation (PCM) format.
  • the system is connected to the telephone network through an interface device 106.
  • the interface device allows the system to receive PCM data from the telephone network.
  • the interface device also decodes telephone touch-tone signals, also known as Dual Tone Multi Frequency (DTMF) signals.
  • An exemplary interface would be the line of interface cards manufactured by DialogicTM, which are available in versions to connect to T-1 audio networks as well as POTS (Plain Old Telephone System) networks.
  • the interface is supplied in the form of a card that is installed within the computer accessing the phone network.
  • the interface card is installed within the DOTELL Interactive Voice Response Server (DIVR) 108.
  • Software residing in the DIVR 108 receives and validates an audio author attempting to access the system.
  • the DIVR 108 also validates the identity of an audio author by accessing user profiles stored on a system database 112.
  • the system database 112 is an ODBC (Open Database Connectivity) compliant database that is accessible by the DIVR 108 via a network, and is controlled by a Relational Database Management System (RDBMS) such as those available from SybaseTM or OracleTM.
  • the DIVR 108 initiates workflow and publishing software 110.
  • This software 110 is responsible for executing a series of steps or rules in response to an audio recording by an audio author. Each user's profile defines the particular steps to be executed after a recording is complete. Exemplary actions that could be executed in response to a newly recorded message include emailing a link to the recorded audio data to a manager for approval or sending the text of the link to a web server 126 via FTP for inclusion in a web page.
  • the web page is presented to a user via a web browser running on a client PC 128.
  • the system database 112 is also connected via a network to an administrative web server 114.
  • the administrative web server 114 is configured to issue queries and updates to the system database and format the result sets for presentation on a web page 116. Through this web based front end 114, system administrators can perform actions such as adding users and companies, editing existing user profiles and configuration, and previewing the output of a workflow.
  • the system database 112 holds all the data that describes the system's state and the preferences of individual users.
  • User-centric data includes the name and location of the encoders 120 that are to be used to compress the user's audio data, the format of the audio data that is being encoded, whether the user is authorized to conduct a live broadcast, and whether the user needs approval before links to the recorded audio data may be posted to clients on a computer network.
  • After the DIVR 108 has validated an author and the recording session begins, the DIVR accepts audio data and passes it off to the DTRouter 118.
  • the DTRouter 118 receives the audio data in discrete blocks from the DIVR 108.
  • one or more encoders 120 can be specified to compress the audio data. For each audio encoder 120 specified, the DTRouter 118 creates a copy of the received audio data.
  • the DTRouter 118 opens TCP/IP (Transmission Control Protocol/Internet Protocol) socket connections to each encoder 120 and passes the data off each time a block of data is received, or as the buffer reaches capacity.
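The fan-out just described — open a TCP/IP socket per encoder, send the profile, then copy each incoming audio block to every connection — can be sketched in a few lines. This is a simplified illustration; the function name and wire format (profile bytes followed by raw blocks) are assumptions, not the patent's actual protocol.

```python
import socket

def fan_out(encoder_addrs, xml_profile: bytes, blocks) -> None:
    # Open a TCP/IP socket connection to each specified encoder.
    conns = [socket.create_connection(addr) for addr in encoder_addrs]
    try:
        # The XML profile describing the encoding parameters is sent first.
        for conn in conns:
            conn.sendall(xml_profile)
        # As each block of audio arrives, pass a copy to every encoder.
        for block in blocks:
            for conn in conns:
                conn.sendall(block)
    finally:
        for conn in conns:
            conn.close()
```

Closing the connections signals the encoders that the recording has ended, mirroring the end-of-data handling described later for the consumer thread.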
  • the DTRouter 118 makes these decisions based on the user's preference data, which is passed to it as profile containing XML data generated by the DIVR 108 from data contained within the system database 112.
  • the system can be configured to optimally utilize its resources, e.g., perform load balancing on each encoder.
  • the system can also be configured to comprise multiple DTRouters 118, with the user's preference data or load balancing requirements determining which DTRouter is used.
  • Compression encoders include the RealAudioTM codecs (compressor/decompressor) created by RealNetworksTM Inc., the QuickTimeTM codecs developed by Apple ComputerTM Inc., the Windows MediaTM Format from MicrosoftTM Corp, or any other compression encoders
  • the system can also save data in the WAV format. Indeed, the system can be configured to utilize any number or types of encoders, once again allowing system resources to be optimally utilized.
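Saving the received audio in the WAV format is the simplest of the output paths, since WAV is essentially a container around linear PCM. A minimal sketch using Python's standard `wave` module, assuming 16-bit mono PCM at the 8 kHz rate typical of telephony (these parameters are assumptions, not specified by the patent):

```python
import wave

def save_wav(pcm: bytes, path: str, rate: int = 8000) -> None:
    """Write raw 16-bit mono linear PCM into a .wav container."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # telephone audio is a single channel
        wf.setsampwidth(2)      # 16-bit samples (2 bytes each)
        wf.setframerate(rate)   # 8000 Hz is the usual telephony rate
        wf.writeframes(pcm)
```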
  • Both data storage devices 122 and streaming media servers 124 can be located anywhere on the network.
  • the decision as to whether and which device to store the audio data on, e.g., an archival storage device 122 or a streaming media server 124, is determined by the author's XML profile.
  • the profile also indicates the name of the file or files that the data is to be saved to.
  • the name and location of the stored data is written to the system database 112 so it can be accessed by the workflow and publishing systems 110.
  • the workflow and publishing system 110 delivers the location of the audio data to an end user. In most instances, the publisher retrieves and formats text containing the location of the audio data.
  • the workflow system takes the formatted link and transmits it via FTP to a web server 126.
  • the web server 126 dynamically incorporates the link into a web page that is being served to client browsers 128.
  • client browsers 128 When a user accessing the web site 126 through a client browser 128 clicks on a link to an audio file, the web server retrieves the data from its location or issues instructions to the appropriate streaming media server 124 to begin streaming the audio data to the client.
  • Fig. 2 presents another embodiment of the present invention.
  • all the system components are operated by a single entity 202.
  • the system components can be distributed geographically so as to achieve the best performance possible.
  • This arrangement is also known as an Application Service Provider or ASP.
  • Fig. 3 presents a third configuration of the present invention 302.
  • a corporation wants to host the audio capturing functionality of the present invention in-house 302.
  • One advantage of this arrangement is ease of maintenance and the ability to customize all facets of the system to the needs of the particular corporation, such as unique privacy considerations.
  • Another benefit of this configuration is that while the core functionality of the system is run in-house and under the direct control of the company, the company need not bother itself with the details of delivering the audio content to the end users.
  • the corporation hosting the system in-house has control over the DIVR, system database, encoders, and administrative web server 302.
  • the system connects with storage devices and streaming media servers hosted by the licensor of the system or any other trusted third party with the capability to stream and archive audio sufficient to satisfy the needs of the corporation. These devices may be located on any network accessible by the system.
  • turning to Fig. 4, a high level overview of the programmatic flow of the system is presented. An audio author places a call to the system using standard telephone equipment 402, which is answered by the DIVR 404. The caller's identity is authenticated against his or her record in the system database 406. If the author has supplied data that is invalid, the DIVR drops the call and hangs up the connection 418.
  • the DIVR opens a connection to the DTRouter and sends recording meta data 408.
  • the DIVR begins reading audio data received from the telephone network via a telephony interface board 410 and streams the received data to the DTRouter as it arrives 412.
  • the author signals the system to stop recording through the use of pre-defined DTMF tones 414, which completes the audio recording.
  • the DIVR updates the system database with data regarding the recording and its disposition 416.
  • the system disconnects the call 418, while simultaneously initiating a workflow 420 that is executed according to a set of user defined rules and parameters 422.
  • in step 412, as data is received by the DIVR, it is channeled to the DTRouter.
  • the DTRouter receives meta and audio data 424 from either the DIVR or the scheduler and loader system. As the data is received, the DTRouter opens sockets to the appropriate encoders 426, as defined by the audio meta data. The data is passed off over the open connections to the encoders 428, which carry out the encoding of audio data according to a variety of codec algorithms 430. Encoded audio is transmitted to either a streaming media server for live broadcast 432 or an archival storage device for on-demand broadcast at a later date 434. The media server then makes the encoded data available on the network 436. The end result of the process is the availability of audio data created by an author through the use of a telephone 438, broadcast either live or on-demand.
  • turning to Fig. 5, an overview of the operations performed by the DIVR in answering and processing an incoming call is presented.
  • An audio author places a call using typical Customer Premises Equipment (CPE), e.g., a telephone, connected to a telephone network 502.
  • the DIVR, also connected to the telephone network by way of an interface device, is waiting to accept calls 504.
  • the DIVR acquires the dialed number 506 and attempts to validate it 510. If the number is invalid, the call is terminated and the system waits for a new call 504. If the number is valid, the author is prompted to enter a company identification code 512.
  • the author supplies the id data by pressing the telephone keypad to generate DTMF tones 514 that upon receipt are converted into the numbers represented by the tones.
  • the DIVR uses the input data to formulate a query 516 that is executed in the system database 528.
  • a determination is made whether the author has provided a valid company code 518. If the author has entered an invalid company code, program flow returns to step 512, where the author has another opportunity to provide a valid code. If the author fails to enter a valid company id after three attempts, the system plays a "sorry" message indicating that the author has failed to provide a valid company code 520 and terminates the call.
  • the system prompts the user to provide a user id 522, which is input via DTMF tones 524.
  • the supplied id code is checked against the system database 526 to determine if the author is a valid user 530. As before, the user is allowed three attempts to enter a valid user id. If a valid id is not provided, the system plays a message indicating that the author has failed to provide a valid user code 520 and terminates the call, returning to step 504 to await the receipt of another call.
  • the system checks the database to determine if the user has exceeded the maximum number of calls allowed 532. If the maximum number of calls has been exceeded, the system plays a "sorry" message indicating this fact 520 and terminates the call, returning to step 504 to await the receipt of another call. If the maximum number of calls has not been exceeded, the system updates the author's record in the system database to reflect the new total number of calls placed 534. The system also determines whether the user is going to be conducting a live distribution of the audio message 536 and issues the appropriate prompts 538 and 540. General recording instructions are then played 542.
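The two validation stages above (steps 512-520 for the company code, 522-530 for the user id) share the same three-strikes shape, which can be sketched as a small reusable loop. The `play`, `read_dtmf`, and `validate` callables stand in for the telephony prompt, DTMF capture, and database lookup; their names and the prompt keys are illustrative assumptions.

```python
def prompt_for_id(play, read_dtmf, validate, max_attempts: int = 3):
    """Three-attempt prompt loop for a company or user id.

    Returns the valid id string, or None after max_attempts failures
    (at which point the caller plays the "sorry" path and hangs up).
    """
    for _ in range(max_attempts):
        play("enter_id")          # prompt the author for an id
        digits = read_dtmf()      # DTMF tones converted into digits
        if validate(digits):      # query against the system database
            return digits
    play("sorry")                 # three failures: play "sorry" message
    return None
```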
  • When the author provides the DTMF tone 544 indicating that recording is to begin, the DIVR establishes a connection with the DTRouter 546. The DIVR also queries the system database to build an XML profile containing all the parameters needed to encode, store, and stream the telephonic audio data 548. Once the database returns the author's profile data, recording begins.
  • the system checks to determine if audio data is being generated, e.g., determine whether the author has begun talking 550. If the author is speaking, the DIVR records a packet of PCM data 574 received from the telephone network 552. The system is also listening for DTMF tones 574 to indicate the end of the message. A copy of the raw PCM data is stored in a local cache 554 so the author may review the message without having to retrieve it from a storage device across the network. The data packet is sent off to the DTRouter 556 and control returns to step 550 where the system once again checks for the next packet of audio data.
  • once audio data is no longer received, recording ends 558. The system then determines if the author has paused the message 560. If the author has indeed paused the message, the system waits 562 for a DTMF tone indicating that recording is to resume.
  • the system determines if recording is complete 568. If recording has ended incompletely, e.g., the "end recording" DTMF tone was not received in step 552, a hang-up or line drop has most likely occurred 572 and the system returns to its wait state 504 until another call is received. If the end recording tone has been received, the system proceeds to determine the disposition of the completed recording.
  • Fig. 6 presents the process of determining the disposition of a completed recording.
  • the system presents several disposition options for the author to select from 602.
  • the author makes a selection by generating a DTMF tone 604 that corresponds to the desired choice. If the author has chosen to replay the recording 606, the system plays back the PCM data previously stored in its local cache and then replays the disposition options 608. If the author chooses to discard the recording and re-record the message 610, the local cache is discarded and the message re-recorded 612, passing control back to step 542.
  • if the author decides to save the recording 614, the data indicating the number of recordings made is incremented in the database 616, the names and locations of the recording are stored in the database 618, and the author's workflow is executed 620. The system hangs up the connection 622 and returns control to step 504. Finally, where the author indicates a desire to simply disregard the recording and exit 624, the local cache is discarded and the system hangs up the connection 626, returning control to step 504.
  • Fig. 7 presents an exemplary XML profile, as contemplated by one embodiment of the present invention. All data is contained within a set of <ENCODELIST> tags 702. Immediately within the opening <ENCODELIST> tag are tags to identify the company and user id codes for the author of the audio data 704. There are also tags to set standby and priority values, discussed hereinafter, which are used when scheduling a live broadcast 704.
  • the <SOURCE> block 706 describes the format of the audio data being received by the system.
  • the source format, or what the raw audio data looks like, is described in terms of its coding, sampling rate, and size (how many bits). While most telephone systems broadcast data using the compressed µLaw and aLaw formats, the system is configured to accept this data and uncompress it, resulting in Linear PCM data.
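The µLaw-to-linear conversion mentioned above is the standard G.711 expansion, which can be written out directly. This is the textbook decode of one µLaw byte into a 16-bit linear PCM sample, shown here as a sketch of what "uncompressing" the telephone data entails (the patent does not give an implementation):

```python
def ulaw_to_linear(sample: int) -> int:
    """Decode one 8-bit G.711 mu-Law byte to a 16-bit linear PCM value."""
    sample = ~sample & 0xFF           # mu-Law bytes are stored inverted
    sign = sample & 0x80              # top bit carries the sign
    exponent = (sample >> 4) & 0x07   # 3-bit segment number
    mantissa = sample & 0x0F          # 4-bit step within the segment
    # Reconstruct the magnitude, including the 0x84 bias added on encode.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude
```

Applying this byte-by-byte to the incoming stream yields the Linear PCM data that is handed to the encoders.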
  • the remainder of the XML profile is composed of a plurality of <ENCODE> blocks 708.
  • the exemplary XML profile presented in Fig. 7 is comprised of three encode blocks, one for Sun Audio, one for Real Audio, and one for Windows Media, respectively.
  • Each block begins with the name of the media that the encoder is going to generate, followed by the name for the file that is to be used to perform a live broadcast.
  • a single stream may be encoded at multiple rates, using multiple codecs.
  • the streaming server and client then negotiate the fastest rate (e.g., highest quality) that can successfully be transmitted over the user's connection.
  • the address of the streaming server to use to perform a live broadcast, as well as the storage device to store an archive copy of the audio data are also provided.
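A profile with the structure described above can be parsed with any standard XML library. The sketch below uses Python's `xml.etree`; the `ENCODELIST`, `SOURCE`, and `ENCODE` tags come from the description of Fig. 7, but the specific attribute and id tag names are illustrative guesses at the layout, not the patent's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile; COMPANYID/USERID and the attribute names
# are assumed, only the three block names come from the text.
PROFILE = """
<ENCODELIST>
  <COMPANYID>1001</COMPANYID>
  <USERID>42</USERID>
  <SOURCE coding="linear" rate="8000" bits="16"/>
  <ENCODE media="RealAudio" livefile="live.rm"/>
  <ENCODE media="WindowsMedia" livefile="live.asf"/>
</ENCODELIST>
"""

def parse_profile(text: str):
    """Pull the source description and encode blocks out of a profile."""
    root = ET.fromstring(text)
    source = root.find("SOURCE").attrib          # raw audio description
    encodes = [e.attrib for e in root.findall("ENCODE")]  # one per encoder
    return source, encodes
```

The list of `ENCODE` dictionaries is what a router would iterate over when deciding how many encoder connections to open.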
• Turning to Fig. 8, the process of the DTRouter routing received audio data to an encoder or encoders is presented.
• Once the server is initialized 802, it sits in a wait state 804 until a connection is received.
  • the XML profile describing the audio data and encoding parameters is received first.
  • the DTRouter parses the XML profile 807 and determines whether it adheres to the specifications of the schema 808.
• If the profile does not adhere to the schema, the session is killed and the DTRouter returns to step 804 and waits for another connection.
• If the profile is valid, the data is loaded into an XMLInfo object 810, which is a data structure used to pass the data to the encoders.
  • a copy of the XMLInfo object is created for each encoder specified in the XML profile.
  • a check is performed to determine if the received XML profile matches that of an existing job that is currently being processed by any of the threads running within the DTRouter processing a plurality of audio calls 812. If the XML profile doesn't match that of an existing job, control is passed to a newly created consumer thread 814 that is responsible for accepting the incoming audio data.
  • the DTRouter also creates a thread or threads to send received audio data to the encoder or encoders specified by the author's XML profile 816. After the appropriate threads have been created and control passed to them, the main thread returns to step 804 and waits for the next connection.
  • the consumer thread attempts to read data from the open TCP/IP socket, starting with the XML profile 818. A check is made to see if data is available 820, e.g., that the DIVR is sending PCM data that is being captured from the audio author. If data is present, it is read from the socket 822. The data is read and placed in a buffer 824, and a signal is sent to the encoder threads that data is available 826. Control passes back to step 820 to see if another data block is available. If no data is available, the recording has ended and the consumer thread is killed 838.
  • the encoder thread opens a TCP/IP socket to an encoder 828. Once the connection has been opened, it checks to see if the consumer thread has generated the data available signal 830. If data is available, it is read from the buffer 832 and written out to the encoder via the socket 834. Once the data has been written, the encoder thread issues a signal to the consumer thread that the data buffer is available 836. Control returns to step 830 where a check is made to see if the next data block is available. If no data is available, the recording has ended and the encoder thread is killed 838.
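The consumer/encoder hand-off described above is a classic producer-consumer arrangement. The sketch below models it with one queue per encoder standing in for the shared buffer and its data-available signals; the function names and queue-based signaling are illustrative assumptions, not the patent's implementation:

```python
import queue
import threading

END = None  # sentinel: the recording has ended (steps 820/838)

def route_audio(read_block, encoder_sinks):
    """Sketch of the DTRouter hand-off: a consumer thread reads audio
    blocks and fans them out to one queue per encoder thread."""
    queues = [queue.Queue(maxsize=8) for _ in encoder_sinks]

    def consumer():
        while True:
            block = read_block()
            if not block:              # no data left: end every encoder
                for q in queues:
                    q.put(END)
                return
            for q in queues:           # steps 822-826: buffer and signal
                q.put(block)

    def encoder(q, sink):
        while True:                    # steps 828-838: drain and forward
            block = q.get()
            if block is END:
                return
            sink(block)

    threads = [threading.Thread(target=consumer)]
    threads += [threading.Thread(target=encoder, args=(q, s))
                for q, s in zip(queues, encoder_sinks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In the patent's design the sinks would be TCP/IP sockets to the encoders; here any callable accepting a block will do.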
  • Each executing consumer thread is identified by the XML profile that it is processing, which can be thought of as a unique fingerprint.
• the current XML profile is compared with the XML profiles being processed by the plurality of consumer threads running within the DTRouter. If the system determines that an equivalent XML profile (e.g., containing identical corporate and user id codes) is being processed by a currently running thread, a check is made 840 to see if the newly received XML profile has a higher value identified by its <PRIORITY> tag 704. If the received XML profile does not have a higher priority, the data is discarded and the DTRouter returns to step 804 and waits for a new connection. If the XML profile has a higher priority, the preexisting consumer thread is suspended 842 and a new consumer thread is created 844.
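The fingerprint comparison and priority check (steps 812 and 840-844) can be sketched as follows; the Job structure and function names are hypothetical stand-ins for the DTRouter's thread bookkeeping:

```python
from dataclasses import dataclass

@dataclass
class Job:
    company: str
    user_id: str
    priority: int
    suspended: bool = False

    def fingerprint(self):
        """Profiles are 'equivalent' when company and user id codes match."""
        return (self.company, self.user_id)

def dispatch(new_job, running_jobs):
    """Accept a new job unless an equivalent one is already running
    with equal or higher priority."""
    for job in running_jobs:
        if job.fingerprint() == new_job.fingerprint() and not job.suspended:
            if new_job.priority > job.priority:
                job.suspended = True        # step 842: suspend old thread
                running_jobs.append(new_job)
                return True                 # step 844: start new thread
            return False                    # lower priority: discard data
    running_jobs.append(new_job)            # step 814: brand-new job
    return True
```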
  • Fig. 9 presents the process executed by a server configured to encode data in either the QuickTime, Real, or Windows Media formats.
• When the server process is first started, it initializes a listening socket 902, which waits for an incoming connection from the DTRouter 904. When a connection is initiated, the data is read from the socket 906 by a connection specific thread that is spawned by the main thread 908. The main thread then returns to step 904 to wait for the next incoming connection.
  • the connection specific thread receives the XML profile that is appended to the audio data as a header 910.
  • the XML profile is parsed 912 and the encoder determines whether the data contained therein is comprised of valid parameters to encode the incoming audio data 914. Where the XML data contained in the profile is invalid, the thread is killed and the connection maintained by it closed 916. If the data is valid, the encoder is initialized 918 and the audio data is read from the socket 920. As long as there is audio data being transmitted from the DTRouter 922, it is read from the socket and encoded by the codec 924. If the encoder compressing the audio data is the QuickTime encoder, a hint track is added to the encoded movie file 926 once all encoding is complete.
  • Fig. 10 presents the process of initializing the QuickTime encoder to compress incoming audio data.
  • the codec is first set 1002.
  • the audio track is then prepared to write the incoming data to 1004.
  • the codec is then set to a state where it is ready to accept data to encode 1006.
  • Initialization is then complete and the subroutine ends 1008, returning control to the main routine.
  • Fig. 11 presents the process of initializing the RealAudio encoder to compress incoming audio data.
  • the encoder first determines whether the XML profile is instructing the encoder to encode the audio data with more than one codec 1102. If a single codec is indicated, it is initialized 1106. Where multiple codecs are indicated, SureStream is initialized for each codec 1104. SureStream is a proprietary technology developed by Real Networks that allows the incoming audio data to be encoded with multiple codecs and stored within a single file.
  • the SureStream enabled server and client negotiate to deliver the stream with the best encoding that can be supported by the link maintained by the client.
  • the process continues with a determination as to whether the audio is going to be broadcast live 1108, once again determined by the XML profile received by the encoder. If a live broadcast is indicated, the server executing the streaming media server software is initialized to receive and broadcast audio data directly as it is encoded 1110. The encoder then determines whether the audio is to be archived to a storage device 1112. If the audio is to be archived, the output file location is initialized 1114, the encoding engine is set such that it is prepared to begin encoding 1116, and initialization is complete 1118. If archival is not indicated, the codec is set such that it is prepared to begin encoding 1116 and initialization is complete 1118.
  • Fig. 12 presents the process of initializing the Windows Media encoder to compress incoming audio data.
  • the codec is initialized at step 1202.
  • the encoder determines whether the audio is going to be broadcast live 1204. If a live broadcast is indicated, the server executing the streaming media server software has an alias added to its directory of files that the server is serving 1206.
• The encoder determines whether the audio is to be archived to a storage device 1208. If the audio is to be archived, the output file is created 1210, the codec is set such that it is prepared to begin encoding 1212, and initialization is complete 1214. If archival is not indicated, the codec is set such that it is prepared to begin encoding 1212 and initialization is complete 1214.
• Turning to Fig. 13, a socket is initialized 1302 that waits to accept connections from the DTRouter 1304.
  • the data is accepted across the socket 1306 by a connection specific thread that is spawned by the main thread 1308.
  • the main thread returns to step 1304 to wait for the next incoming connection.
  • the connection specific thread receives the XML profile that is appended to the audio data as a header 1310.
• The XML profile is parsed 1312 and the encoder determines whether the data contained therein contains valid parameters to encode the incoming audio data 1314. Where the XML profile is invalid, the thread is killed and the connection maintained by it is closed 1316. If the data is valid, the encoder prepares the .wav header, but does not supply a value for the length of the data contained in the file 1318.
  • the encoder reads data from the open socket 1320. As long as there is audio data available 1322, it is written to the .wav data file 1324. Once all the data has been received, the encoder writes the data length to the .wav header and closes the file 1326.
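The deferred-length technique of steps 1318-1326 can be sketched as follows: the RIFF and data chunk sizes are written as placeholders, then patched once the recording ends. This is a generic .wav illustration under assumed defaults (8 kHz, 16-bit mono telephone audio), not the patent's code:

```python
import struct

def open_wav(path, channels=1, rate=8000, bits=16):
    """Write a .wav header with placeholder length fields (step 1318);
    the real sizes are not known until the recording ends."""
    f = open(path, "wb")
    block_align = channels * bits // 8
    f.write(b"RIFF" + b"\x00\x00\x00\x00" + b"WAVE")      # size patched later
    f.write(b"fmt " + struct.pack("<IHHIIHH", 16, 1, channels, rate,
                                  rate * block_align, block_align, bits))
    f.write(b"data" + b"\x00\x00\x00\x00")                # size patched later
    return f

def close_wav(f):
    """Patch the deferred length fields and close the file (step 1326)."""
    size = f.tell()
    f.seek(4)
    f.write(struct.pack("<I", size - 8))                  # RIFF chunk size
    f.seek(40)
    f.write(struct.pack("<I", size - 44))                 # data chunk size
    f.close()
```

Between the two calls, each block read from the socket is simply appended with `f.write(block)` (step 1324).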
  • the encoded audio data is stored on a data storage device or streaming media server according to the XML profile 1328.
  • the connection specific thread dies and its connection with the DTRouter is closed 1316.
  • the present invention provides the ability to schedule a live audio telephone broadcast across a computer network.
  • the process performed by the Scheduler and Loader system (S/L system), presented in Fig. 14, is responsible for broadcasting audio "hold" data in the minute or minutes immediately preceding a live broadcast.
  • the system database stores information regarding scheduled events, particularly the time that the hold audio data is scheduled to begin and when it is to end. For example, if a user is scheduled to begin a live broadcast at 2:00 PM, the system begins to broadcast hold audio until the author begins his or her broadcast. There will be instances, however, where the author is unable to make the scheduled broadcast. If the author fails to begin the live transmission after a particular number of minutes have elapsed, data associated with the event instructs the system to terminate the transmission of the hold audio.
• The process is a continuous loop that polls the system to determine whether a live broadcast is scheduled to begin and, if one is, plays a hold audio broadcast until the live author is ready to begin the broadcast.
  • the loop begins with the S/L system querying the system database to determine the events that are scheduled to begin within the next minute 1402. If a user has a live broadcast scheduled to begin during the current minute 1404, the S/L system generates a copy of the user's XML profile 1406 and transmits it to the DTRouter 1408.
  • the system database stores the audio data associated with the scheduled event, e.g., a generic "waiting to begin transmission" message or a message personalized to the particular user.
• The S/L system accesses either a local or networked file system 1412 to open the audio file associated with this scheduled event 1410. Once opened, the audio data is sent to the DTRouter 1414.
• The system queries the database to determine if another event is scheduled to begin within the current minute 1416. If there is another event scheduled to begin within the minute 1418, the process of broadcasting hold music is re-executed from step 1406.
  • the system once again queries the database to retrieve data for events that are scheduled to end within the current minute 1422. Where no event is ending within the current minute 1424, the system waits one minute 1420 and begins the process over 1402. If the database returns information indicating that the hold audio is to end within the minute 1424, the system generates an XML profile for the event with a negative priority level, which indicates that the broadcast is to be terminated. The XML profile is then transmitted to the DTRouter 1428. A query is generated to retrieve data for additional events that are scheduled to end within the current minute 1430.
• Where another event is scheduled to end within the minute, the process of terminating the hold audio broadcast is re-executed from step 1426. Where no event is ending within the current minute 1432, the system waits one minute 1420 and begins the process over 1402.
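One iteration of the polling loop above might be sketched as follows; the `db` interface (`events_starting`/`events_ending`) and the dictionary profile are illustrative stand-ins for the system database and the XML profile:

```python
def scheduler_tick(db, now_minute, send_profile, send_hold_audio):
    """One pass of the S/L polling loop (steps 1402-1432), run once
    per minute."""
    # Steps 1402-1418: events starting now get a profile and hold audio.
    for event in db.events_starting(now_minute):
        profile = dict(event.profile)          # step 1406: copy the profile
        send_profile(profile)                  # step 1408: to the DTRouter
        send_hold_audio(event.hold_audio_file) # steps 1410-1414
    # Steps 1422-1432: events ending now get a negative-priority profile,
    # which signals the DTRouter to terminate the hold broadcast.
    for event in db.events_ending(now_minute):
        profile = dict(event.profile)
        profile["priority"] = -1               # step 1426: negative priority
        send_profile(profile)                  # step 1428
```

The surrounding loop would call this once per minute (step 1420) with the current minute.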
  • the DIVR initiates workflow software that executes a series of pre-defined actions or events that are tailored to each individual user.
  • the workflow consists of a series of scripts, written in any of a variety of scripting languages such as PERL or PYTHON, a series of compiled executable programs, written in a language such as C/C++, or a combination of the two.
  • the workflow is executed in response to a completed call by an audio author (see Fig. 6, step 620).
  • Fig. 15 presents a generic workflow, according to one embodiment of the present invention.
  • the system executing the workflow passes a series of parameters to the workflow software 1504.
  • the workflow software receives the parameters and uses them to execute a query in the system database 1506 and retrieve the events associated with the workflow 1502.
• The execution state of the workflow is also set 1508, which prevents multiple copies of the same workflow from being simultaneously executed.
  • the workflow software also retrieves additional meta data 1510 and user data 1512 regarding the recording that is needed for the workflow to execute properly.
  • the software executes the events comprising the workflow 1514 and 1516 until all events have been executed. Upon completion of all events, the workflow terminates 1520.
  • Fig. 16 presents a specific embodiment of a workflow, according to one embodiment of the present invention.
  • This exemplary workflow is executed when an author, who must have an administrator approve his or her recording, records a call through the system.
  • the workflow is executed after a recording is made 1602.
  • An email message is sent to an administrator explaining that a recording has been made with a hyperlink to a web page 1604.
  • the workflow generates the message by retrieving data from the system database.
  • the administrator navigates to the administrative web page 1605 by selecting the hyperlink contained in the email message 1604.
  • the administrative web page allows the administrator to set properties for the recorded audio data, such as titling it and setting its "approved" status.
  • the administrator titles the recording 1606, which is checked for validity 1608. If the title is invalid, the administrator is returned to the administrative page presented in step 1605 and is provided with an opportunity to retitle the recording. If the title is valid, the recording's record is updated in the system database 1610.
• The approver then sets the recording's "approved" attribute 1614. If the recording is not approved 1616, the approver is presented with a screen summarizing the changes made to the recording's record in the database 1618 and the workflow terminates 1620.
  • the database is updated to reflect the change in approval status 1622.
  • Three actions are then executed simultaneously.
  • a copy of the recording is sent as an email attachment to a special archivist 1626.
  • the administrator approving the recording sees a screen summarizing the changes made to the recording's record in the database 1618.
• A publisher, explained hereinafter, is also executed 1624.
  • the results of the publisher are transmitted via FTP to a specific web server to be included as part of a web page 1628. After all three concurrently executing steps complete, the workflow terminates 1620.
• The publisher is also a software subsystem initiated by the DIVR to generate textual data indicating either the location of the audio data on a networked storage device, the address of a streaming server performing a live broadcast, or both.
  • the publisher consists of a series of scripts written in any of a variety of scripting languages such as PERL or PYTHON, a series of compiled executable programs, written in a language such as C/C++, or a combination of the two.
  • the data can be emailed to a user or passed to a web server and embedded in a web page.
  • Fig. 17 presents the steps executed by a generic publisher, according to one embodiment of the present invention.
  • the system executing the publisher passes it a series of parameters 1702. These parameters are set according to the publisher's internal data structures 1704. Where one of the parameters passed to the publisher is a query id 1706, additional parameters are retrieved from the system database 1708. Whether or not additional parameters are retrieved, the software constructs a SQL query from the data that it has received 1712, which is executed in the system database 1714. The results of the SQL query are formatted according to the received parameters 1716, which defines a different formatting depending on how the results are going to be used. The formatted results are then returned to the calling system 1718.
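The query-and-format path of a publisher might be sketched as follows, here against an in-memory SQLite database; the table and column names are invented for illustration, not taken from the patent:

```python
import sqlite3

def publish(db, media_id, fmt):
    """Sketch of Fig. 17: query the database for the location of a
    piece of encoded audio and format the result for its destination."""
    row = db.execute(
        "SELECT title, location FROM recordings WHERE id = ?",
        (media_id,)).fetchone()                   # steps 1712-1714
    if row is None:
        return None
    title, location = row
    if fmt == "html":                             # step 1716: format for web
        return f'<a href="{location}">{title}</a>'
    return f"{title}: {location}"                 # plain text, e.g. email
```

The formatted string could then be emailed to a user or transmitted via FTP for embedding in a web page, as the text describes.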
  • Administrators and users with advanced access permissions may view and modify data contained within the system database.
  • An administrator may have access privileges limited to one company in the system, allowing, for example, the addition or editing of users, workflows, publishers and scheduled events within that company.
• An administrator with greater access privileges, e.g., a "superuser", can add and edit servers, server clusters, and companies. Additional functionality supplied by this web-based database front-end allows administrators to browse publishers, scheduled events, servers, and server clusters.
  • Figure 18 presents the generic programmatic flow involved in adding, editing, and browsing.
• When adding data, an administrator first selects the appropriate form or template that is needed to add the desired data 1802. For example, the "add user" form would be selected if the administrator wanted to add a new user to the system.
• The data supplied is checked for conformity 1806. If the data does not conform to predefined patterns 1808, e.g., supplying at least one encoder address, an error message is displayed 1810. The user is provided with an opportunity to submit a corrected form 1812, which is again checked for accuracy 1806.
• Where the supplied data is correct 1808, the record is added to the database 1814.
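The conformity check of steps 1806-1812 could be sketched with simple patterns; the field names and regular expressions shown are assumptions for illustration, not the system's actual rules:

```python
import re

# Illustrative conformity patterns (step 1806).
PATTERNS = {
    "user_id":  re.compile(r"^[A-Za-z0-9]{1,16}$"),
    "encoders": re.compile(r"^\S+(,\S+)*$"),  # at least one encoder address
}

def check_form(form):
    """Return error messages for fields that do not conform to the
    predefined patterns (steps 1806-1812); empty list means the form
    may be written to the database (step 1814)."""
    errors = []
    for name, pattern in PATTERNS.items():
        value = form.get(name, "")
        if not pattern.match(value):
            errors.append(f"invalid or missing value for {name}")
    return errors
```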
• When editing data, the system retrieves a list of "items" from the database.
• Exemplary items would be users, companies, or servers.
  • the administrator selects an item from the list of returned items 1818.
  • the user selects an element from the selected item 1819. For example, if an administrator wanted to edit a user, he or she would first select "users" from the list of items returned from the database. The administrator would then select a particular user from the list of all users.
  • the system retrieves information regarding the selected element from the database 1820 and presents it to the administrator in an editable form 1822. Edits are made to the data through the form 1824, which are then checked to ensure that the new data conforms to pre-defined patterns 1826. If the data doesn't conform to pre-defined patterns 1828, an error message is displayed 1830 and the administrator is provided an opportunity to correct the mistakes 1832. The corrected data is then checked again for conformity 1826. When the data is formatted correctly 1828, the data from the form is written back to the database, thereby updating the desired records 1834.
  • a list of items is retrieved from the database 1836.
  • the administrator selects an item 1838 and an element grouped under the item 1839.
  • Information regarding the selected element is retrieved from the database 1840 and the action executed 1842.
  • the results generated are displayed to the administrator 1844, who can choose to modify the element via the edit interface.

Abstract

The instant invention is a system for capturing audio data and broadcasting it across a computer network, either on demand or in real time. The invention is composed of the DOTELL Interactive Voice Response server (DIVR) (108), which is outfitted with an interface card (106) through which it can access the telephone network and receive calls placed using standard telephone equipment (104). The DIVR (108) acts according to user profile and other assorted data contained in a system database (112) and can execute workflow and publishing software located on the DIVR or on a physically distinct computer. Incoming audio data is passed to a DTRouter (118), which uses data retrieved from the system database (112) to send the audio to one or more encoders (120). Once encoded, the audio data is either broadcast live by a streaming media server or servers (124) or stored on one or more data storage devices (122) for playback on-demand.

Description

SYSTEM AND METHOD FOR DISTRIBUTION OF TELEPHONE AUDIO DATA VIA A COMPUTER NETWORK
Applicant hereby claims the benefit of provisional application no. 60/154,769, filed September 20, 1999 and provisional application no. 60/179,831, filed February 2, 2000, which applications are hereby incorporated by reference into this application in their entirety.
COPYRIGHT NOTICE A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION The invention disclosed herein relates generally to audio streaming systems. More particularly, the present invention relates to a system and method for streaming audio data received from a telephone network over a computer network to client workstations.
The Internet has been heralded as perhaps the most important communication achievement since the advent of the printing press. The Internet has provided people across the world with a new medium by which to publish their writings, be they educational or entertainment. The publishing industry has traditionally been associated with a variety of barriers to entry, including the demands of editors and corporate publishing houses. With the rise of the Internet, however, authors have the ability to publish their works in any manner that they feel fit, allowing them to tackle controversial or other issues that would not be deemed acceptable by a publisher.
In the broadcast arena, similar restraints are placed on audio and video authors regarding the scope of issues that they are allowed to present. Audio and video authors can use the Internet in the same manner as authors to disseminate their works without restrictions imposed by third parties. Instead of utilizing the public airwaves, individuals can distribute their message across the public Internet, thereby bypassing the traditional broadcasting infrastructure and its variety of limitations. With the elimination of these barriers to entry, applications such as personal content delivery become possible. When this flexibility is combined with the ubiquity of the public telephone network, individuals and corporations have the ability to deliver any type of audio to disparate computer clients across the globe using simple telephone equipment. Indeed, the applications presented by the marriage of computer and audio technology are limitless.
While there are many different ways through which a computer can receive audio data for retransmission across a computer network, existing systems fail to either provide both live and on-demand broadcasting or offer unacceptable audio quality. One way for a computer system to receive audio data from a telephone network is through the use of a "Gentner Box", which is a type of hybrid audio coupler. The box acts as an interface between the computer and the telephone network. One input on the box carries data to and from the computer, while a separate input is connected to the telephone network. Data is received from the telephone network and passed to the computer, via the Gentner Box, for encoding and distribution. This solution, however, suffers from the significant shortcoming of being able to host only one connection at a time and, therefore, encodes only one session at a time.
Other solutions to the problem involve simultaneously recoding a plurality of audio streams from a telephone network. As audio data is received, it is stored in a file and dropped to a pre-defined directory. On a regular basis, for example, through the use of UNIX cron job, the audio data files are read and passed to an encoder or encoders. While this configuration allows for multiple callers to record audio data at the same time, it fails to allow these data to be broadcast across a computer network in real-time. At most, this system is suitable for on demand broadcast of audio data.
Commercial vendors have also offered solutions to this problem. For example, TellSoft, Inc., of Colorado Springs, Colorado, has developed a system for encoding and broadcasting audio data received from a telephone network. TellSoft offers a "one box" solution to the problem, with no isolation of the several services that must interoperate to perform the desired functionality. This lack of isolation prevents the scalability that would allow the system to receive and encode multiple audio signals with optimal quality. This solution, therefore, is not suitable when the quality of the audio recording is an important consideration.
U.S. Patent No. 5,675,507 ('507), entitled "Message Storage and Delivery System", offers another solution to the problem. The '507 patent discusses a system and method for receiving and distributing facsimile, voice, and data messages. The system retrieves a message stored in VOX or AD/PCM format, which are compressed forms of PCM data. This compressed data is then converted to AU or WAV format, depending on the preferences of the user. By using intermediate file formats such as VOX or AD/PCM as the basis for the conversion, the '507 patent provides inferior audio quality.
There is thus a need for a system and method for encoding and broadcasting telephone audio data in an efficient manner, in real time and on-demand, with audio quality that is a revolutionary improvement over that offered by existing systems.
BRIEF SUMMARY OF THE INVENTION It is an object of the present invention to provide a system and method for broadcasting telephone audio data across a computer network. It is another object of the present invention to provide a system and method that disposes of audio data in a customizable manner according to a set of pre-defined preferences.
It is another object of the present invention to provide a web-based client to manage the system and method of broadcasting audio data across a computer network. The above and other objects are achieved by a system that receives telephone audio data and directly encodes it to a destination file format without an intermediate transformation of the audio data, e.g., it is never saved in an intermediary file format. The process of direct encoding produces significantly improved audio quality.
Some of the above and other objects of the present invention are also achieved by a system and method that accepts a telephone call from a user using telephone equipment and either broadcasts it live over a computer network, such as the Internet, stores it to a networked data storage device, or both. When a call is received, the system retrieves a user's profile data from a system database. The user's profile includes information that is used by the system to determine the disposition of the audio data, such as whether it should be broadcast live by a specific streaming audio server and/or archived on a storage device, the encoders that should be utilized to compress the data, file names for the compressed audio, etc. This retrieved data is formatted according to an XML (Extensible Markup Language) schema that defines the format that the data must adhere to in order for system processing to continue. Alternatively, other proprietary message formatting schemes could be utilized by the present invention.
The XML profile and audio data is channeled to audio routing software that sends the incoming data stream to the appropriate encoder or encoders as defined by the user's XML profile. The data is routed to the appropriate encoder where it is compressed according to one of several available codec algorithms. Exemplary codecs used by the system include QuickTime and RealAudio. The data is compressed and placed on a file system for archiving or broadcast by a streaming audio server, all according to the data contained in the user's XML profile.
The system also has workflow and publisher software. Working together, these two subsystems can retrieve the textual address of audio data from the system database, according to the user id contained in a workflow, and deliver it to any number of destinations by any number of means. For example, the location can be written to an email message and delivered to a user. Alternatively, the location text can be written to a file and transmitted via FTP (File Transfer Protocol) to a particular directory of a web server. The server reads the text out of the file and embeds it as a link in a web page, which can be requested by users. When the requested page is received, the link is activated and the audio data is transmitted to the requesting party for playback. System performance is enhanced because the use of an intermediate file format between the receipt and encode points has been eliminated. Superior audio quality is achieved by receiving raw PCM data from a telephone network and directly encoding it into a destination file format, such as RealAudio, QuickTime, or Windows Media formats. These directly encoded audio files are then transmitted to streaming audio clients across a computer network, either live or on-demand.
Performance is also enhanced by distributing system duties between software running on physically different computers, thereby reducing the workload on any one computer. This modularity also allows for scalability because computers running redundant software can be easily brought into the process by directing data to them. For example, if an overwhelming number of users are simultaneously encoding in the QuickTime format, more QuickTime encoders can be brought on line. Furthermore, the process is expedited by simultaneously routing the incoming audio stream to a plurality of encoders. By directing audio files to specific server clusters, the system can encode locally, where large data streams are being generated, and stream globally, distributing the encoded and compressed files to servers anywhere on the network. The combination of these benefits provides increased performance and the optimal utilization of hardware resources. BRIEF DESCRIPTION OF THE DRAWINGS
The invention is illustrated in the figures of the accompanying drawings, which are meant to be exemplary, and not limiting, in which like references are intended to refer to like or corresponding parts, and in which: Fig. 1 is a diagram presenting a configuration of the various hardware and software components of the system, according to one embodiment of the present invention;
Fig. 2 is a diagram presenting a hardware and software embodiment of the present invention configured to run as an Application Service Provider (ASP);
Fig. 3 is a diagram presenting a hardware and software embodiment of the present invention configured to run as an in-house service at a corporation;
Fig. 4 is a flow diagram presenting a high-level overview of the steps involved in encoding and presenting telephone audio data, according to one embodiment of the present invention;
Fig. 5 is a flow diagram presenting a process executed by an interactive audio system in accepting and processing a call, according to one embodiment of the present invention;
Fig. 6 is a flow diagram presenting a continuation of the process executed by the interactive audio system in accepting and processing a call, according to one embodiment of the present invention;
Fig. 7 is an XML profile, according to one embodiment of the present invention;
Fig. 8 is a flow diagram outlining a process for accepting audio data and passing it off to an appropriate encoder for compression, according to one embodiment of the present invention;
Fig. 9 is a flow diagram outlining a process of encoding audio data in either
QuickTime™, Real™, or Windows Media™ formats, according to one embodiment of the present invention;
Fig. 10 is a flow diagram outlining a subroutine of initializing the QuickTime™ encoder;
Fig. 11 is a flow diagram outlining a subroutine of initializing the RealAudio™ encoder;
Fig. 12 is a flow diagram outlining a subroutine of initializing the Windows Media™ encoder;
Fig. 13 is a flow diagram outlining a process of encoding audio data in the .wav format, according to one embodiment of the present invention;
Fig. 14 is a flow diagram outlining a process of scheduling a live broadcast;
Fig. 15 is a flow diagram presenting a high level overview of a generic workflow process, according to one embodiment of the present invention;
Fig. 16 is a flow diagram presenting a specific workflow for approving the publication of audio data, according to one embodiment of the present invention;
Fig. 17 is a flow diagram outlining a process of retrieving and formatting text indicating the location of an encoded audio data file, according to one embodiment of the present invention; and
Fig. 18 is a flow diagram outlining a process of using a web based front end to add, edit, and browse data stored in the system database, according to one embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The instant invention is a system for capturing audio data and broadcasting it across a computer network, either on-demand or in real time. The invention is composed of the DOTELL Interactive Voice Response Server (DIVR) 108, which is outfitted with an interface card 106 through which it can access the telephone network and receive calls placed using standard telephone equipment 104. The DIVR 108 acts according to user profile and other assorted data contained in a system database 112 and can execute workflow and publishing software located on the DIVR or on a physically distinct computer. The DIVR 108 also comprises scheduler and loader software that can be used to broadcast hold audio data in anticipation of the start of a scheduled broadcast. Incoming audio data is passed to a DTRouter 118, which uses data retrieved from the system database 112 to send the audio to one or more encoders 120. Once encoded, the audio data is either broadcast live by a streaming media server or servers 124 or stored on one or more data storage devices 122 for playback on-demand. As will be clear to one skilled in the art, the components comprising the system can reside on a single computer or can be distributed across a plurality of physically separate computers electrically coupled through the use of a computer network, such as the Internet.

With reference to Fig. 1, one configuration of the present invention is presented. Audio data author 102 uses a telephone 104 connected to a telephone network to place a call to the system. Telephone systems typically transmit data according to the Pulse Code Modulation (PCM) format. The system is connected to the telephone network through an interface device 106. The interface device allows the system to receive PCM data from the telephone network. The interface device also decodes telephone touch-tone signals, also known as Dual Tone Multi Frequency (DTMF) signals.
An exemplary interface would be the line of interface cards manufactured by Dialogic™, which are available in versions to connect to T-1 audio networks as well as POTS (Plain Old Telephone Service) networks. The interface is supplied in the form of a card that is installed within the computer accessing the phone network.
The interface card is installed within the DOTELL Interactive Voice Response Server (DIVR) 108. Software residing in the DIVR 108 receives and validates an audio author attempting to access the system. The DIVR 108 also validates the identity of an audio author by accessing user profiles stored on a system database 112. The system database 112 is an ODBC (Open Database Connectivity) compliant database that is accessible by the DIVR 108 via a network, and is controlled by a Relational Database Management System (RDBMS) such as those available from Sybase™ or Oracle™.
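The validation lookup the DIVR performs against the system database can be sketched roughly as follows. This is a minimal illustration under stated assumptions: sqlite3 stands in for the ODBC-compliant RDBMS, and the table name, field names, and sample values are hypothetical, not taken from the actual embodiment.

```python
import sqlite3

def make_db():
    """Build an in-memory stand-in for the system database 112."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE user_profiles (
        company_id TEXT, user_id TEXT,
        encoders TEXT,          -- comma-separated encoder hosts
        audio_format TEXT,      -- e.g. 'real', 'wma', 'qt', 'wav'
        live_allowed INTEGER,   -- may the user broadcast live?
        needs_approval INTEGER  -- must a manager approve links first?
    )""")
    db.execute("INSERT INTO user_profiles VALUES (?,?,?,?,?,?)",
               ("ACME", "1001", "enc1:9000,enc2:9000", "real", 1, 0))
    return db

def fetch_profile(db, company_id, user_id):
    """Look up the per-user preference record the DIVR consults."""
    row = db.execute(
        "SELECT encoders, audio_format, live_allowed, needs_approval "
        "FROM user_profiles WHERE company_id=? AND user_id=?",
        (company_id, user_id)).fetchone()
    if row is None:
        return None  # invalid author: the DIVR would drop the call
    return {"encoders": row[0].split(","), "format": row[1],
            "live_allowed": bool(row[2]), "needs_approval": bool(row[3])}
```

A missing row models an author who fails validation; a returned dictionary carries the preferences that later drive encoding and publishing decisions.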
The DIVR 108 initiates workflow and publishing software 110. This software 110 is responsible for executing a series of steps or rules in response to an audio recording by an audio author. Each user's profile defines the particular steps to be executed after a recording is complete. Exemplary actions that could be executed in response to a newly recorded message include emailing a link to the recorded audio data to a manager for approval or sending the text of the link to a web server 126 via FTP for inclusion in a web page. The web page is presented to a user via a web browser running on a client PC 128.
The system database 112 is also connected via a network to an administrative web server 114. The administrative web server 114 is configured to issue queries and updates to the system database and format the result sets for presentation on a web page 116. Through this web based front end 114, system administrators can perform actions such as adding users and companies, editing existing user profiles and configuration, and previewing the output of a workflow. The system database 112 holds all the data that describes the system's state and the preferences of individual users. User-centric data includes the name and location of the encoders 120 that are to be used to compress the user's audio data, the format of the audio data that is being encoded, whether the user is authorized to conduct a live broadcast, and whether the user needs approval before links to the recorded audio data may be posted to clients on a computer network.

After the DIVR 108 has validated an author and the recording session begins, the
DIVR accepts audio data and passes it off to the DTRouter 118. The DTRouter 118 receives the audio data in discrete blocks from the DIVR 108. Depending on the user profile of the particular audio author, one or more encoders 120 can be specified to compress the audio data. For each audio encoder 120 specified, the DTRouter 118 creates a copy of the received
audio data in a buffer. The DTRouter 118 opens TCP/IP (Transport Control Protocol/Internet Protocol) socket connections to each encoder 120 and passes the data off each time a block of data is received, or as the buffer reaches capacity. The DTRouter 118 makes these decisions based on the user's preference data, which is passed to it as a profile containing XML data generated by the DIVR 108 from data contained within the system database 112. A process
for generating this XML profile is described in greater detail below. By allowing the dynamic routing of audio data to a plurality of encoders based on user preferences, the system can be configured to optimally utilize its resources, e.g., perform load balancing on each encoder. Indeed, the system can also be configured to comprise multiple DTRouters 118 with the user's preference data or load balancing requirements determining which DTRouter
118 is used.
The audio data is passed to one or more encoders 120 for compression. Compression encoders include the RealAudio™ codecs (compressor/decompressor) created by RealNetworks™ Inc., the QuickTime™ codecs developed by Apple Computer™ Inc., the Windows Media™ Format from Microsoft™ Corp, or any other compression encoders
known to those of skill in the art. Although it lacks the compression benefits of the other encoders, the system can also save data in the WAV format. Indeed, the system can be configured to utilize any number or types of encoders, once again allowing system resources to be optimally utilized.
After the audio data has been encoded, it is passed to streaming media servers
124, data storage devices 122, or both. Both data storage devices 122 and streaming media servers 124 can be located anywhere on the network. The decision as to whether, and on which device, to store the audio data, e.g., an archival storage device 122 or a streaming media server 124, is determined by the author's XML profile. The profile also indicates the name of the file or files that the data is to be saved to. The name and location of the stored data is written to the system database 112 so it can be accessed by the workflow and publishing systems 110. The workflow and publishing system 110 delivers the location of the audio data to an end user. In most instances, the publisher retrieves and formats text containing the location of the audio data. The workflow system takes the formatted link and transmits it via FTP to a web server 126. The web server 126 dynamically incorporates the link into a web page that is being served to client browsers 128. When a user accessing the web site 126 through a client browser 128 clicks on a link to an audio file, the web server retrieves the data from its location or issues instructions to the appropriate streaming media server 124 to begin streaming the audio data to the client.
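The "retrieve and format text" step performed by the publisher might be as simple as the following sketch. The function name and the URL schemes are illustrative assumptions; the text says only that formatted link text is transmitted via FTP to the web server for inclusion in a page.

```python
def format_audio_link(title, location, live=False):
    """Render a stored or live file location as anchor text for a web page.

    The rtsp:// scheme for live streams and http:// for archived clips
    are assumptions for illustration; the actual scheme would depend on
    the streaming media server in use.
    """
    scheme = "rtsp" if live else "http"
    return f'<a href="{scheme}://{location}">{title}</a>'
```

The workflow system would then transmit the resulting anchor text to the web server 126, which incorporates it into the served page.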
Fig. 2 presents another embodiment of the present invention. In this configuration, all the system components are operated by a single entity 202. As in the other configuration, the system components can be distributed geographically so as to achieve the best performance possible. This arrangement is also known as an Application Service Provider or ASP. By hosting all the pieces of the system, the hosting company can provide a full audio streaming service to any number of individuals or organizations.
Fig. 3 presents a third configuration of the present invention 302. There will arise situations where a corporation wants to host the audio capturing functionality of the present invention in-house 302. One advantage of this arrangement is ease of maintenance and the ability to customize all facets of the system to the needs of the particular corporation, such as unique privacy considerations. Another benefit of this configuration is that while the core functionality of the system is run in-house and under the direct control of the company, the company need not bother itself with the details of delivering the audio content to the end users.
The corporation hosting the system in-house has control over the DIVR, system database, encoders, and administrative web server 302. The system connects with storage devices and streaming media servers hosted by the licensor of the system or any other trusted third party with the capability to stream and archive audio sufficient to satisfy the needs of the corporation. These devices may be located on any network accessible by the system.

Referring to Fig. 4, a high level overview of the programmatic flow of the system is presented. An audio author places a call to the system using standard telephone equipment 402, which is answered by the DIVR 404. The caller's identity is authenticated against his or her record in the system database 406. If the author has supplied data that is invalid, the DIVR drops the call and hangs up the connection 418. If the author's identity is authenticated 406, the DIVR opens a connection to the DTRouter and sends recording meta data 408. The DIVR begins reading audio data received from the telephone network via a telephony interface board 410 and streams the received data to the DTRouter as it arrives 412. The author signals the system to stop recording through the use of pre-defined
DTMF tones 414, which completes the audio recording. Once complete, the DIVR updates the system database with data regarding the recording and its disposition 416. The system disconnects the call 418, while simultaneously initiating a workflow 420 that is executed according to a set of user defined rules and parameters 422.

Returning to step 412, as data is received by the DIVR it is channeled to the
DTRouter. The DTRouter receives meta and audio data 424 from either the DIVR or the scheduler and loader system. As the data is received, the DTRouter opens sockets to the appropriate encoders 426, as defined by the audio meta data. The data is passed off over the open connections to the encoders 428, which carry out the encoding of audio data according to a variety of codec algorithms 430. Encoded audio is transmitted to either a streaming media server for live broadcast 432 or an archival storage device for on-demand broadcast at a later date 434. The media server then makes the encoded data available on the network 436. The end result of the process is the availability of audio data created by an author through the use of a telephone 438, broadcast either live or on-demand.

Turning to Fig. 5, an overview of the operations performed by the DIVR in answering and processing an incoming call is presented. An audio author (not pictured) places a call using typical Customer Premises Equipment (CPE), e.g., a telephone, connected to a telephone network 502. The DIVR, also connected to the telephone network by way of an interface device, is waiting to accept calls 504. When a call is received, the DIVR acquires the dialed number 506 and attempts to validate it 510. If the number is invalid, the call is terminated and the system waits for a new call 504. If the number is valid, the author is prompted to enter a company identification code 512. The author supplies the id data by pressing the telephone keypad to generate DTMF tones 514 that upon receipt are converted into the numbers represented by the tones. The DIVR uses the input data to formulate a query 516 that is executed in the system database 528. A determination is made whether the author has provided a valid company code 518. If the author has entered an invalid company code, program flow returns to step 512, where the author has another opportunity to provide a valid code.
If the author fails to enter a valid company id after three attempts, the system plays a "sorry" message indicating that the author has failed to provide a valid company code 520 and terminates the call, returning to step 504 to await the receipt of another call.
If the author has supplied a valid company id, the system prompts the user to provide a user id 522, which is input via DTMF tones 524. The supplied id code is checked against the system database 526 to determine if the author is a valid user 530. As before, the user is allowed three attempts to enter a valid user id. If a valid id is not provided, the system
plays a message indicating that the author has failed to provide a valid user code 520 and terminates the call, returning to step 504 to await the receipt of another call.
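The three-attempt prompt logic used for both the company id (step 512) and the user id (step 522) can be sketched as one generic loop. The callables standing in for the telephony interface and the database query, and the constant name, are illustrative assumptions.

```python
MAX_ATTEMPTS = 3  # hypothetical limit; the text specifies three attempts

def collect_valid_id(prompt, read_dtmf, is_valid):
    """Prompt up to MAX_ATTEMPTS times for an id entered via DTMF tones.

    read_dtmf stands in for the telephony interface (play prompt, decode
    tones); is_valid stands in for the query against the system database.
    Returns the id on success, or None, at which point the DIVR would
    play the "sorry" message 520 and terminate the call.
    """
    for _ in range(MAX_ATTEMPTS):
        entered = read_dtmf(prompt)
        if is_valid(entered):
            return entered
    return None
```

The same routine then serves both validation steps, differing only in the prompt played and the database query supplied.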
If a valid user id is supplied, the system checks the database to determine if the user has exceeded the maximum number of calls allowed 532. If the maximum number of calls has been exceeded, the system plays a "sorry" message indicating this fact 520 and
terminates the call, returning to step 504 to await the receipt of another call. If the maximum number of calls has not been exceeded, the system updates the author's record in the system database to reflect the new total number of calls placed 534. The system also determines whether the user is going to be conducting a live distribution of the audio message 536 and issues the appropriate prompts 538 and 540. General recording instructions are then played
542 and the system waits for the DTMF tone 544 indicating that recording should begin.
When the author provides the DTMF tone 544 indicating that recording is to begin, the DIVR establishes a connection with the DTRouter 546. The DIVR also queries the system database to build an XML profile containing all the parameters needed to encode, store, and stream the telephonic audio data 548. Once the database returns the author's
preferences and the XML profile is created, it is sent to the DTRouter.
The system checks to determine if audio data is being generated, e.g., determine whether the author has begun talking 550. If the author is speaking, the DIVR records a packet of PCM data received from the telephone network 552. The system is also listening for DTMF tones 574 to indicate the end of the message. A copy of the raw PCM data is stored in a local cache 554 so the author may review the message without having to retrieve it from a storage device across the network. The data packet is sent off to the DTRouter 556 and control returns to step 550 where the system once again checks for the next packet of audio data.
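The record loop of steps 550-556 can be sketched as follows, with the telephony read and the DTRouter socket abstracted into callables. The END_TONE sentinel is an illustrative stand-in for the end-of-recording DTMF tone; the actual tone is not specified in the text.

```python
END_TONE = "#"  # hypothetical end-of-recording DTMF tone

def record(read_packet, send_to_router):
    """Sketch of the DIVR record loop (steps 550-556).

    Each PCM packet is cached locally (step 554) so the author can
    replay the message without a network round trip, and forwarded to
    the DTRouter (step 556). read_packet returns raw PCM bytes, or
    END_TONE when the author keys the end-of-recording tone.
    """
    cache = []
    while True:
        packet = read_packet()
        if packet == END_TONE:
            return cache  # local copy retained for review/playback
        cache.append(packet)
        send_to_router(packet)
```

On return, the cached copy backs the replay option offered in the disposition menu of Fig. 6.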
When the system fails to receive audio data 550, recording ends 558. The system determines if the author has paused the message 560. If the author has indeed paused the message, the system waits 562 for a DTMF tone indicating that recording is to resume
564 or a timeout 566 that forces the resumption of the recording process at step 550. If recording has not been paused, the system determines if recording is complete 568. If the recording ended without completing, e.g., the "end recording" DTMF tone was not received in step 552, a hang-up or line drop has most likely occurred 572 and the system returns to its wait state 504 until another call is received. If the end recording tone has been received, the
system plays prompts for the author to choose the recording disposition 570.
Fig. 6 presents the process of determining the disposition of a completed recording. The system presents several disposition options for the author to select from 602. The author makes a selection by generating a DTMF tone 604 that corresponds to the desired choice. If the author has chosen to replay the recording 606, the system plays back the PCM data previously stored in its local cache and then replays the disposition options 608. If the author chooses to discard the recording and re-record the message 610, the local cache is discarded and the message re-recorded 612, passing control back to step 542. If the author decides to save the recording 614, the data indicating the number of recordings made is incremented in the database 616, the name and location of the recording are stored in the database 618, and the author's workflow is executed 620. The system hangs up the connection 622 and returns control to step 504. Finally, where the author indicates a desire to simply disregard the recording and exit 624, the local cache is discarded and the system hangs up the connection 626, returning control to step 504.
The XML profile is generated when the DIVR queries the system database and retrieves a set of pre-defined audio author parameters at step 548 in Fig. 5. This data is responsible for driving nearly all of the system's decisions regarding the recording and storage of the author's audio data.

Fig. 7 presents an exemplary XML profile, as contemplated by one embodiment of the present invention. All data is contained within a set of <ENCODELIST> tags 702. Immediately within the opening <ENCODELIST> tag are tags to identify the company and user id codes for the author of the audio data 704. There are also tags to set standby and priority values, discussed hereinafter, which are used when scheduling a live broadcast 704. The <SOURCE> block 706 describes the format of the audio data being received by the system. The source format, or what the raw audio data looks like, is described in terms of its coding, sampling rate, and size (how many bits). While most telephone systems broadcast data using the compressed μLaw and aLaw formats, the system is configured to accept this data and uncompress it, resulting in Linear PCM data. The remainder of the XML profile is composed of a plurality of <ENCODE> blocks 708. The exemplary XML profile presented in Fig. 7 is comprised of three encode blocks, one for Sun Audio, one for Real Audio, and one for Windows Media, respectively. Each block begins with the name of the media that the encoder is going to generate, followed by the name for the file that is to be used to perform a live broadcast. There is also a <CODECLIST> block 710 that contains the name of the codec or codecs that are to be used for the compression encoding. In the case of RealAudio, a single stream may be encoded at multiple rates, using multiple codecs. The streaming server and client then negotiate the fastest rate (e.g., highest quality) that can successfully be transmitted over the user's connection.
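A profile of the kind described above might look roughly like the following. The tag names <ENCODELIST>, <SOURCE>, <ENCODE>, <CODECLIST>, and <PRIORITY> come from the text; everything else (the element names for the id codes, the codec labels, and the server addresses) is an illustrative assumption, not the actual schema.

```xml
<ENCODELIST>
  <COMPANYID>ACME</COMPANYID>
  <USERID>1001</USERID>
  <STANDBY>0</STANDBY>
  <PRIORITY>1</PRIORITY>
  <SOURCE>
    <CODING>ulaw</CODING>
    <RATE>8000</RATE>
    <BITS>8</BITS>
  </SOURCE>
  <ENCODE>
    <MEDIA>real</MEDIA>
    <LIVEFILE>acme1001.rm</LIVEFILE>
    <CODECLIST>
      <CODEC>16kbps</CODEC>
      <CODEC>32kbps</CODEC>
    </CODECLIST>
    <SERVER>stream1.example.com</SERVER>
    <ARCHIVE>store1.example.com/acme/1001</ARCHIVE>
  </ENCODE>
</ENCODELIST>
```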
The address of the streaming server to use to perform a live broadcast, as well as the storage device to store an archive copy of the audio data, are also provided. The flexibility provided by the use of XML allows the audio data generated by particular authors to be routed to specific servers, possibly ones that are closer to them geographically, located on network segments carrying less traffic, or capable of better performance, based on the identity of the author.

Turning to Fig. 8, the process of the DTRouter routing received audio data to an encoder or encoders is presented. Once the server is initialized 802, it sits in a wait state 804 until a connection is received. When a connection from the DIVR is received 806, the XML profile describing the audio data and encoding parameters is received first. The DTRouter parses the XML profile 807 and determines whether it adheres to the specifications of the schema 808. If the XML profile contains malformed data, the session is killed and the DTRouter returns to step 804 and waits for another connection. If the XML profile is valid, the data is loaded into an XMLInfo object 810, which is a data structure used to pass the data to the encoders. A copy of the XMLInfo object is created for each encoder specified in the XML profile. A check is performed to determine if the received XML profile matches that of an existing job that is currently being processed by any of the threads running within the DTRouter processing a plurality of audio calls 812. If the XML profile does not match that of an existing job, control is passed to a newly created consumer thread 814 that is responsible for accepting the incoming audio data. The DTRouter also creates a thread or threads to send received audio data to the encoder or encoders specified by the author's XML profile 816. After the appropriate threads have been created and control passed to them, the main thread returns to step 804 and waits for the next connection.
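The parse-and-validate step (807-810 in Fig. 8) might be sketched as follows, with Python's xml.etree standing in for the parser. The XMLInfo field names and the particular validation checks are assumptions for illustration; the actual schema is not given in the text.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class XMLInfo:
    """Stand-in for the data structure handed to each encoder thread."""
    company_id: str
    user_id: str
    priority: int
    encodes: list = field(default_factory=list)

def parse_profile(xml_text):
    """Parse and validate an author profile (steps 807-810 in Fig. 8).

    Returns None for malformed profiles, mirroring the killed session;
    otherwise returns an XMLInfo populated from the profile.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    if root.tag != "ENCODELIST":
        return None
    try:
        info = XMLInfo(
            company_id=root.findtext("COMPANYID"),
            user_id=root.findtext("USERID"),
            priority=int(root.findtext("PRIORITY", "0")),
        )
    except (TypeError, ValueError):
        return None
    for enc in root.findall("ENCODE"):
        info.encodes.append(enc.findtext("MEDIA"))
    return info
```

One XMLInfo copy per listed <ENCODE> block would then be passed to the corresponding encoder thread.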
The consumer thread attempts to read data from the open TCP/IP socket, starting with the XML profile 818. A check is made to see if data is available 820, e.g., that the DIVR is sending PCM data that is being captured from the audio author. If data is present, it is read from the socket 822. The data is read and placed in a buffer 824, and a signal is sent to the encoder threads that data is available 826. Control passes back to step 820 to see if another data block is available. If no data is available, the recording has ended and the consumer thread is killed 838.
Operating in parallel are a group of encoder threads, one thread for each encoder that has been specified by the XML profile to encode the audio data. The encoder thread opens a TCP/IP socket to an encoder 828. Once the connection has been opened, it checks to see if the consumer thread has generated the data available signal 830. If data is available, it is read from the buffer 832 and written out to the encoder via the socket 834. Once the data has been written, the encoder thread issues a signal to the consumer thread that the data buffer is available 836. Control returns to step 830 where a check is made to see if the next data block is available. If no data is available, the recording has ended and the encoder thread is killed 838.
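The hand-off between the consumer thread and an encoder thread (steps 824-836) is a classic producer-consumer exchange, which can be sketched with a single-slot buffer and a condition variable. Real sockets are replaced here by an in-memory source and sink; the class and method names are assumptions for illustration.

```python
import threading

class AudioRelay:
    """Minimal sketch of the Fig. 8 thread hand-off: one thread fills a
    single-slot buffer and signals "data available" (steps 824-826); the
    other drains it to an encoder sink and signals "buffer available"
    (steps 832-836)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.buf = None
        self.done = False

    def consume(self, source):
        for block in source:                  # read from the DIVR socket
            with self.cond:
                while self.buf is not None:   # wait for "buffer available"
                    self.cond.wait()
                self.buf = block              # signal "data available"
                self.cond.notify_all()
        with self.cond:                       # recording has ended
            self.done = True
            self.cond.notify_all()

    def encode(self, sink):
        while True:
            with self.cond:
                while self.buf is None and not self.done:
                    self.cond.wait()
                if self.buf is None:          # no more data: thread dies
                    return
                block, self.buf = self.buf, None
                self.cond.notify_all()        # buffer available again
            sink.append(block)                # write to the encoder socket

relay = AudioRelay()
out = []
t1 = threading.Thread(target=relay.consume, args=([b"a", b"b", b"c"],))
t2 = threading.Thread(target=relay.encode, args=(out,))
t2.start(); t1.start(); t1.join(); t2.join()
```

With one encoder thread per entry in the profile's encode list, each thread would hold its own buffer and run this same exchange independently.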
Each executing consumer thread is identified by the XML profile that it is processing, which can be thought of as a unique fingerprint. Returning to step 812, the current XML profile is compared with the XML profiles being processed by the plurality of consumer threads running within the DTRouter. If the system determines that an equivalent XML profile (e.g., containing identical corporate and user id codes) is being processed by a currently running thread, a check is made 840 to see if the newly received XML profile has a higher value identified by its <PRIORITY> tag 704. If the received XML profile does not have a higher priority, the data is discarded and the DTRouter returns to step 804 and waits for a new connection. If the XML profile has a higher priority, the preexisting consumer thread is suspended 842 and a new consumer thread is created 844.
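The fingerprint-matching and priority decision (steps 812 and 840-844) reduce to a small routing rule, which might look like the following sketch. The job-table layout and return labels are assumptions for illustration.

```python
def route_incoming(jobs, profile):
    """Decide how the DTRouter treats a newly received profile.

    `jobs` stands in for the table of running consumer threads, keyed by
    the (company id, user id) fingerprint, mapping to each thread's
    priority. Returns which of the Fig. 8 branches applies.
    """
    key = (profile["company_id"], profile["user_id"])
    if key not in jobs:
        return "new_thread"               # step 814: fresh consumer thread
    if profile["priority"] > jobs[key]:
        return "preempt"                  # steps 842-844: suspend and replace
    return "discard"                      # lower priority: drop the data
```

As explained in connection with Fig. 14, the preempt branch is what lets a live author displace the hold-audio broadcast carrying the same fingerprint.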
Because the existing consumer thread was working in cooperation with an encoder thread, the newly created consumer thread 844 continues passing data to it. A check is made 846 to determine if the newly received XML profile specifies a different archive location than the XML profile being processed by the pre-existing thread that was previously suspended at step 842. If a new archive location is specified, an additional encoder thread is created to accept data from the consumer thread to be encoded to the specified archive 848. If a new archive is not specified 846, the newly created consumer thread continues using only the existing encoder thread or threads to encode the incoming audio. As explained later, this routine is used in the scheduling and loading of live broadcast audio.

Fig. 9 presents the process executed by a server configured to encode data in either the QuickTime, Real, or Windows Media formats. When the server process is first started, it initializes a listening socket 902, which waits for an incoming connection from the DTRouter 904. When a connection is initiated, the data is read from the socket 906 by a connection specific thread that is spawned by the main thread 908. The main thread then returns to step 904 to wait for the next incoming connection.
The connection specific thread receives the XML profile that is appended to the audio data as a header 910. The XML profile is parsed 912 and the encoder determines whether the data contained therein is comprised of valid parameters to encode the incoming audio data 914. Where the XML data contained in the profile is invalid, the thread is killed and the connection maintained by it closed 916. If the data is valid, the encoder is initialized 918 and the audio data is read from the socket 920. As long as there is audio data being transmitted from the DTRouter 922, it is read from the socket and encoded by the codec 924. If the encoder compressing the audio data is the QuickTime encoder, a hint track is added to the encoded movie file 926 once all encoding is complete. After the encoding is complete and the encoder is shut down 928, the encoded audio data is stored on a data storage device or streaming media server according to the XML profile 930. The connection specific thread dies and its connection with the DTRouter is closed 916.

Fig. 10 presents the process of initializing the QuickTime encoder to compress incoming audio data. The codec is first set 1002. The audio track is then prepared to write the incoming data to 1004. The codec is then set to a state where it is ready to accept data to encode 1006. Initialization is then complete and the subroutine ends 1008, returning control to the main routine.
Fig. 11 presents the process of initializing the RealAudio encoder to compress incoming audio data. The encoder first determines whether the XML profile is instructing the encoder to encode the audio data with more than one codec 1102. If a single codec is indicated, it is initialized 1106. Where multiple codecs are indicated, SureStream is initialized for each codec 1104. SureStream is a proprietary technology developed by Real Networks that allows the incoming audio data to be encoded with multiple codecs and stored within a single file. When a client requests a connection, the SureStream enabled server and client negotiate to deliver the stream with the best encoding that can be supported by the link maintained by the client. The process continues with a determination as to whether the audio is going to be broadcast live 1108, once again determined by the XML profile received by the encoder. If a live broadcast is indicated, the server executing the streaming media server software is initialized to receive and broadcast audio data directly as it is encoded 1110. The encoder then determines whether the audio is to be archived to a storage device 1112. If the audio is to be archived, the output file location is initialized 1114, the encoding engine is set such that it is prepared to begin encoding 1116, and initialization is complete 1118. If archival is not indicated, the codec is set such that it is prepared to begin encoding 1116 and initialization is complete 1118.
Fig. 12 presents the process of initializing the Windows Media encoder to compress incoming audio data. The codec is initialized at step 1202. The encoder then determines whether the audio is going to be broadcast live 1204. If a live broadcast is indicated, the server executing the streaming media server software has an alias added to its directory of files that the server is serving 1206. The encoder then determines whether the audio is to be archived to a storage device 1208. If the audio is to be archived, the output file is created 1210, the codec is set such that it is prepared to begin encoding 1212, and initialization is complete 1214. If archival is not indicated, the codec is set such that it is prepared to begin encoding 1212 and initialization is complete 1214.

Turning to Fig. 13, the process of encoding data in the standard, uncompressed .wav format is presented. When the server process storing the audio data as a .wav file is first started, a socket is initialized 1302 that waits to accept connections from the DTRouter 1304. When a connection is initiated, the data is accepted across the socket 1306 by a connection specific thread that is spawned by the main thread 1308. The main thread returns to step 1304 to wait for the next incoming connection.
The connection specific thread receives the XML profile that is appended to the audio data as a header 1310. The XML profile is parsed 1312 and the encoder determines whether the data contained therein contains valid parameters to encode the incoming audio data 1314. Where the XML profile is invalid, the thread is killed and the connection maintained by it closed 1316. If the data is valid, the encoder prepares the .wav header, but does not supply a value for the length of the data contained in the file 1318. The encoder reads data from the open socket 1320. As long as there is audio data available 1322, it is written to the .wav data file 1324. Once all the data has been received, the encoder writes the data length to the .wav header and closes the file 1326. After the encoding is complete, the encoded audio data is stored on a data storage device or streaming media server according to the XML profile 1328. The connection specific thread dies and its connection with the DTRouter is closed 1316.
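The deferred-length trick of steps 1318-1326 comes from the RIFF/WAVE layout: both the RIFF chunk size and the data chunk size live at fixed offsets in the header, so they can be written as placeholders and patched once the stream ends. A minimal sketch, assuming 8 kHz linear PCM defaults:

```python
import struct

def write_wav(stream, blocks, rate=8000, bits=16, channels=1):
    """Stream PCM blocks of unknown total length into a .wav container.

    Mirrors Fig. 13: emit the header with zeroed size fields (step 1318),
    append data as it arrives (step 1324), then seek back and patch both
    length fields (step 1326). `stream` must be a seekable binary file.
    """
    byte_rate = rate * channels * bits // 8
    block_align = channels * bits // 8
    stream.write(struct.pack("<4sI4s", b"RIFF", 0, b"WAVE"))      # size TBD
    stream.write(struct.pack("<4sIHHIIHH", b"fmt ", 16,
                             1, channels, rate, byte_rate, block_align, bits))
    stream.write(struct.pack("<4sI", b"data", 0))                 # size TBD
    data_len = 0
    for block in blocks:                  # audio arriving from the socket
        stream.write(block)
        data_len += len(block)
    stream.seek(4)
    stream.write(struct.pack("<I", 36 + data_len))  # RIFF chunk size
    stream.seek(40)
    stream.write(struct.pack("<I", data_len))       # data chunk size
```

The seek-and-patch step is why the file cannot simply be written to a non-seekable pipe; the encoder must hold the file open until the recording ends.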
The present invention provides the ability to schedule a live audio telephone broadcast across a computer network. The process performed by the Scheduler and Loader system (S/L system), presented in Fig. 14, is responsible for broadcasting audio "hold" data in the minute or minutes immediately preceding a live broadcast. The system database stores information regarding scheduled events, particularly the time that the hold audio data is scheduled to begin and when it is to end. For example, if a user is scheduled to begin a live broadcast at 2:00 PM, the system begins broadcasting hold audio at that time and continues until the author begins his or her broadcast. There will be instances, however, where the author is unable to make the scheduled broadcast. If the author fails to begin the live transmission after a particular number of minutes have elapsed, data associated with the event instructs the system to terminate the transmission of the hold audio.

The process is a continuous loop that polls the system to determine whether a scheduled live broadcast requires hold audio to be played until the live author is ready to begin. The loop begins with the S/L system querying the system database to determine the events that are scheduled to begin within the next minute 1402. If a user has a live broadcast scheduled to begin during the current minute 1404, the S/L system generates a copy of the user's XML profile 1406 and transmits it to the DTRouter 1408.
The system database stores the audio data associated with the scheduled event, e.g., a generic "waiting to begin transmission" message or a message personalized to the particular user. After the XML profile is sent 1408, the S/L system accesses either a local or networked file system 1412 to open the audio file associated with the scheduled event 1410. Once opened, the audio data is sent to the DTRouter 1414. The system then queries the database to determine if another event is scheduled to begin within the current minute 1416. If there is another event scheduled to begin within the minute 1418, the process of broadcasting hold music is re-executed from step 1406.
If no events are scheduled to begin 1418 and 1404, the system once again queries the database to retrieve data for events that are scheduled to end within the current minute 1422. Where no event is ending within the current minute 1424, the system waits one minute 1420 and begins the process over 1402. If the database returns information indicating that the hold audio is to end within the minute 1424, the system generates an XML profile for the event with a negative priority level, which indicates that the broadcast is to be terminated 1426. The XML profile is then transmitted to the DTRouter 1428. A query is generated to retrieve data for additional events that are scheduled to end within the current minute 1430. If the database returns information indicating that the hold audio is to end within the minute 1432, the process of terminating the hold audio broadcast is re-executed from step 1426. Where no event is ending within the current minute 1432, the system waits one minute 1420 and begins the process over 1402.
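The negative-priority termination profile of step 1426 might be generated as below. The element names (`profile`, `user`, `priority`) are illustrative assumptions; the disclosure specifies only that the profile is XML and that a negative priority level signals termination.

```python
import xml.etree.ElementTree as ET

def build_profile(user, priority):
    # A negative priority level signals the DTRouter that the
    # hold-audio broadcast for this event is to be terminated.
    root = ET.Element("profile")
    ET.SubElement(root, "user").text = user
    ET.SubElement(root, "priority").text = str(priority)
    return ET.tostring(root, encoding="unicode")

stop_msg = build_profile("author1", -1)
```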
As previously explained, the DIVR initiates workflow software that executes a series of pre-defined actions or events that are tailored to each individual user. The workflow consists of a series of scripts, written in any of a variety of scripting languages such as PERL or PYTHON, a series of compiled executable programs, written in a language such as C/C++, or a combination of the two. The workflow is executed in response to a completed call by an audio author (see Fig. 6, step 620). Fig. 15 presents a generic workflow, according to one embodiment of the present invention. The system executing the workflow passes a series of parameters to the workflow software 1504. The workflow software receives the parameters and uses them to execute a query in the system database 1506 and retrieve the events associated with the workflow 1502. The execution state of the workflow is also set 1508, which prevents multiple copies of the same workflow from being executed simultaneously. The workflow software also retrieves additional metadata 1510 and user data 1512 regarding the recording that is needed for the workflow to execute properly. The software executes the events comprising the workflow 1514 and 1516 until all events have been executed. Upon completion of all events, the workflow terminates 1520.
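The execution-state guard and event loop of the generic workflow can be sketched as follows. This is a simplified single-process model under assumed names; the disclosed system may track execution state in the database rather than in memory.

```python
running = set()  # execution-state flags, one entry per active workflow id

def run_workflow(workflow_id, workflow_events):
    # Setting the execution state first (step 1508) prevents a second
    # copy of the same workflow from running concurrently.
    if workflow_id in running:
        return False
    running.add(workflow_id)
    try:
        for event in workflow_events:   # execute each event (1514/1516)
            event()
    finally:
        running.discard(workflow_id)    # workflow terminates (step 1520)
    return True

log = []
ok = run_workflow("wf-1", [lambda: log.append("send_email"),
                           lambda: log.append("publish")])
```

Clearing the state in a `finally` block ensures a failed event does not leave the workflow permanently marked as running.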
Fig. 16 presents a specific embodiment of a workflow, according to one embodiment of the present invention. This exemplary workflow is executed when an author, who must have an administrator approve his or her recording, records a call through the system. The workflow is executed after a recording is made 1602. An email message, containing a hyperlink to a web page, is sent to an administrator explaining that a recording has been made 1604. The workflow generates the message by retrieving data from the system database.
The administrator navigates to the administrative web page 1605 by selecting the hyperlink contained in the email message 1604. The administrative web page allows the administrator to set properties for the recorded audio data, such as titling it and setting its "approved" status. The administrator titles the recording 1606, which is checked for validity 1608. If the title is invalid, the administrator is returned to the administrative page presented in step 1605 and is provided with an opportunity to retitle the recording. If the title is valid, the recording's record is updated in the system database 1610. The approver then sets the recording's "approved" attribute 1614. If the recording is not approved 1616, the approver is presented with a screen summarizing the changes made to the recording's record in the database 1618 and the workflow terminates 1620.
Where the administrator approves the recording 1616, the database is updated to reflect the change in approval status 1622. Three actions are then executed simultaneously. A copy of the recording is sent as an email attachment to a special archivist 1626. The administrator approving the recording sees a screen summarizing the changes made to the recording's record in the database 1618. A publisher, explained hereinafter, is also executed 1624. The results of the publisher are transmitted via FTP to a specific web server to be included as part of a web page 1628. After all three concurrently executing steps complete, the workflow terminates 1620. The publisher is also a software subsystem initiated by the DIVR to generate textual data indicating the location of the audio data on a networked storage device, the address of a streaming server performing a live broadcast, or both. Like the workflow software, the publisher consists of a series of scripts written in any of a variety of scripting languages such as PERL or PYTHON, a series of compiled executable programs, written in a language such as C/C++, or a combination of the two. Through the application of a workflow, the data can be emailed to a user or passed to a web server and embedded in a web page.
Fig. 17 presents the steps executed by a generic publisher, according to one embodiment of the present invention. The system executing the publisher passes it a series of parameters 1702. These parameters are set according to the publisher's internal data structures 1704. Where one of the parameters passed to the publisher is a query id 1706, additional parameters are retrieved from the system database 1708. Whether or not additional parameters are retrieved, the software constructs a SQL query from the data that it has received 1712, which is executed in the system database 1714. The results of the SQL query are formatted according to the received parameters 1716, which define different formatting depending on how the results will be used. The formatted results are then returned to the calling system 1718.
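The query-and-format cycle of the generic publisher might look like the sketch below. The table, column names, and `format` parameter are illustrative assumptions, and an in-memory SQLite database stands in for the system database.

```python
import sqlite3

def run_publisher(params):
    # In-memory stand-in for the system database; the schema and
    # sample row are illustrative only.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE recordings (author TEXT, location TEXT)")
    db.execute("INSERT INTO recordings VALUES ('author1', '/audio/1.wav')")
    # Construct the SQL query from the received parameters (step 1712)
    # and execute it in the database (step 1714).
    rows = db.execute("SELECT location FROM recordings WHERE author = ?",
                      (params["author"],)).fetchall()
    # Format the results according to the parameters (step 1716), e.g.
    # as an HTML fragment destined for inclusion in a web page.
    if params.get("format") == "html":
        return "".join('<a href="%s">%s</a>' % (r[0], r[0]) for r in rows)
    return [r[0] for r in rows]

html = run_publisher({"author": "author1", "format": "html"})
```

The same query thus yields either raw locations (for email) or markup (for FTP to a web server), matching the use-dependent formatting the text describes.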
Administrators and users with advanced access permissions may view and modify data contained within the system database. An administrator may have access privileges limited to one company in the system, allowing, for example, the addition or editing of users, workflows, publishers and scheduled events within that company. An administrator with greater access privileges, e.g., a "superuser", can add and edit users, workflows, publishers and scheduled events for any company. Additionally, a superuser can add and edit servers, server clusters and companies. Additional functionality supplied by this web-based database front-end allows administrators to browse publishers, scheduled events, servers, and server clusters. Figure 18 presents the generic programmatic flow involved in adding, editing, and browsing.
When adding data, an administrator first selects the appropriate form or template that is needed to add the desired data 1802. For example, the "add user" form would be selected if the administrator wanted to add a new user to the system. When the data fields on the form are completed 1804, the data supplied is checked for conformity 1806. If the data does not conform to predefined patterns 1808, e.g., supplying at least one encoder address, an error message is displayed 1810. The user is provided with an opportunity to submit a corrected form 1812, which is again checked for accuracy 1806. When the supplied data is correct 1808, the record is added to the database 1814. When editing data, the system retrieves a list of "items" from the database
1816. Exemplary items would be users or companies or servers. The administrator then selects an item from the list of returned items 1818. The administrator then selects an element from the selected item 1819. For example, if an administrator wanted to edit a user, he or she would first select "users" from the list of items returned from the database. The administrator would then select a particular user from the list of all users. The system retrieves information regarding the selected element from the database 1820 and presents it to the administrator in an editable form 1822. Edits are made to the data through the form 1824, which are then checked to ensure that the new data conforms to pre-defined patterns 1826. If the data does not conform to pre-defined patterns 1828, an error message is displayed 1830 and the administrator is provided an opportunity to correct the mistakes 1832. The corrected data is then checked again for conformity 1826. When the data is formatted correctly 1828, the data from the form is written back to the database, thereby updating the desired records 1834.
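The conformity check applied in both the add and edit paths can be sketched as a set of per-field patterns. The rules below are illustrative assumptions (the disclosure names only "at least one encoder address" as an example pattern), and the `host:port` form for encoder addresses is likewise assumed.

```python
import re

# Illustrative conformity rules: a user record must supply a non-empty
# name and at least one encoder address in host:port form.
RULES = {
    "name": re.compile(r"\S+"),
    "encoder": re.compile(r"^[\w.-]+:\d+$"),
}

def check_form(form):
    # Returns the fields that fail their pattern; an empty list means
    # the record may be written to the database. A non-empty list
    # corresponds to displaying an error and re-presenting the form.
    return [field for field, pattern in RULES.items()
            if not pattern.match(form.get(field, ""))]

errors = check_form({"name": "author1", "encoder": "bad address"})
```

The calling page would loop on this check, redisplaying the form until `check_form` returns no errors, matching the correct-and-recheck cycle of steps 1806-1812 and 1826-1832.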
When browsing, a list of items is retrieved from the database 1836. The administrator selects an item 1838 and an element grouped under the item 1839. Information regarding the selected element is retrieved from the database 1840 and the action executed 1842. The results generated are displayed to the administrator 1844, who can choose to modify the element via the edit interface.
While the invention has been described and illustrated in connection with preferred embodiments, many variations and modifications as will be evident to those skilled in this art may be made without departing from the spirit and scope of the invention, and the invention is thus not to be limited to the precise details of methodology or construction set forth above as such variations and modification are intended to be included within the scope of the invention.

Claims

WHAT IS CLAIMED IS:
1. A method for streaming telephone audio data over a computer network, the method comprising: receiving the audio data via a telephone call from an author of the audio data; directly encoding the audio data into a destination file format; and storing the encoded audio data on a file system accessible by a computer executing audio streaming software.
2. The method of claim 1, the method comprising routing the audio data to one of a plurality of encoders.
3. The method of claim 2, wherein the step of routing the audio data to one of a plurality of encoders is conducted according to a set of preferences associated with the author of the audio data.
4. The method of claim 1, the method comprising: assigning a location for the encoded audio data; and transmitting the location of the audio data to an administrator for approval before storing the audio data on the file system.
5. The method of claim 1, the method comprising making the location of the encoded audio data available to streaming audio clients in accordance with a set of rules associated with the author of the audio data.
6. The method of claim 1, wherein the step of encoding is performed by a RealAudio™ codec.
7. The method of claim 1, wherein the step of encoding is performed by a QuickTime™ codec.
8. The method of claim 1 wherein the step of encoding is performed by a Windows Media™ codec.
9. The method of claim 1 wherein the step of encoding is conducted according to the .wav format.
10. A system for streaming telephone audio data over a computer network, the system comprising: a first computer containing software to receive audio data transmitted over a telephone network; a second computer containing encoding software to directly encode the received audio data into a destination file format; a data storage device to store the encoded audio data; and a third computer containing software to stream the stored encoded audio data.
11. The system of claim 10 wherein the second computer comprises a plurality of computers containing a plurality of different encoding software; and a fourth computer executing software to route audio data to a selected one of the second computers.
12. The system of claim 11, the system comprising a computer containing software to route audio data to one or more of a plurality of computers containing different encoding software according to data associated with the author.
13. The system of claim 11, the system comprising a computer containing software to route audio data to one or more of a plurality of computers containing different encoding software according to author preferences collected in real-time.
14. The system of claim 10, wherein the encoding software comprises the RealAudio™ codec.
15. The system of claim 10, wherein the encoding software comprises the QuickTime™ codec.
16. The system of claim 10 wherein the encoding software comprises the Windows Media™ codec.
17. The system of claim 10 wherein the encoding software encodes according to the .wav format.
18. A system for streaming telephone audio data over a computer network, the system comprising: a computer comprising: first software to receive audio data transmitted over a telephone network; second software to directly encode the received audio data into a destination file format; third software to stream the encoded audio data; and a data storage device to store the encoded audio data; and the computer being electrically coupled to a network accessible by a client computer executing client software to receive the streaming audio data.
19. A method for streaming telephone audio data over a computer network, the method comprising: receiving a telephone call by a computer, the telephone call placed by the author of the audio data; retrieving personal profile data based on the author; directly encoding the audio data into a destination file format based on the data contained in the author's personal profile; and storing the encoded audio data on a file system accessible by a computer executing audio streaming software.
20. The method of claim 19, the method comprising: routing the audio data to a plurality of encoders based on the data contained in the user's preference profile; and encoding the audio data by a plurality of encoders.
21. The method of claim 20 wherein the step of encoding the audio data takes place at substantially the same time.
22. The method of claim 19 wherein the personal profile data is formatted according to an XML schema.
23. The method of claim 19 wherein the step of encoding the audio data comprises setting encoding characteristics according to data contained in the author's personal profile.
24. The method of claim 19 wherein the step of storing the encoded audio data comprises storing the encoded audio data on a data storage device specified by the author's personal profile.
25. The method of claim 19 comprising executing a workflow based on the author's personal profile.
PCT/US2000/025688 1999-09-20 2000-09-20 System and method for distribution of telephone audio data via a computer network WO2001022711A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU75926/00A AU7592600A (en) 1999-09-20 2000-09-20 System and method for distribution of telephone audio data via a computer network

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US15476999P 1999-09-20 1999-09-20
US60/154,769 1999-09-20
US17983100P 2000-02-02 2000-02-02
US60/179,831 2000-02-02
US66481100A 2000-09-18 2000-09-18
US09/664,811 2000-09-18

Publications (2)

Publication Number Publication Date
WO2001022711A1 true WO2001022711A1 (en) 2001-03-29
WO2001022711A9 WO2001022711A9 (en) 2002-11-21

Family

ID=27387636

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/025688 WO2001022711A1 (en) 1999-09-20 2000-09-20 System and method for distribution of telephone audio data via a computer network

Country Status (2)

Country Link
AU (1) AU7592600A (en)
WO (1) WO2001022711A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1569406A1 (en) * 2004-02-27 2005-08-31 Web.De AG A system and method to transmit stored audio data to a telephone

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5740230A (en) * 1996-05-31 1998-04-14 Octel Communications Corporation Directory management system and method
US5870549A (en) * 1995-04-28 1999-02-09 Bobo, Ii; Charles R. Systems and methods for storing, delivering, and managing messages
US5872926A (en) * 1996-05-31 1999-02-16 Adaptive Micro Systems, Inc. Integrated message system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JONAS ET AL.: "Audio streaming on the internet. Experiences with real-time streaming of audio streams", IEEE, vol. 1, July 1997 (1997-07-01), pages SS71 - SS76, XP002936284 *


Also Published As

Publication number Publication date
WO2001022711A9 (en) 2002-11-21
AU7592600A (en) 2001-04-24


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: C2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)