WO2001058165A2

WO2001058165A2 - System and method for integrated delivery of media and associated characters, such as audio and synchronized text transcription

Info

Publication number: WO2001058165A2
Application number: PCT/US2001/003499
Authority: WO
Inventors: Philip S. Angell; Mohammad A. Haque; Jason D. Levine; Frank Gary Bivings, Jr.; Matthew B. Benson
Original assignee: Fair Disclosure Financial Network, Inc.
Priority date: 2000-02-03
Filing date: 2001-02-02
Publication date: 2001-08-09
Also published as: AU2001233269A1; WO2001058165A3

Abstract

a system and corresponding method deliver live media content to two or more client computers over a computer network. An encoded media signal representing a live event is received, where the live event includes at least a language portion. The encoded media signal is encoded for transmission over the computer network. The language portion is converted into a character signal. The character signal and encoded media signal are then synchronized. The synchronized character signal and encoded media signal are then provided to client computers over the computer network in near real time. Recorded media having an aural component may likewise be employed. Users may search and retrieve archived content and associated transcribed text or character string signals.

Description

SYSTEM AND METHOD FOR INTEGRATED DELIVERY OF MEDIA AND ASSOCIATED CHARACTERS, SUCH AS AUDIO AND SYNCHRONIZED

TEXT TRANSCRIPTION

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Patent

Application No. 60/180,143, filed February 3, 2000, and U.S. Patent Application No. 09/498,233, filed February 3, 2000, both currently pending.

TECHNICAL FIELD

The invention relates to the field of communications and, more particularly, to providing audio or other media having an aural component with associated textual streams.

BACKGROUND

The robust growth in demand for both media content and delivery channels has increased the need for novel types of information, news, financials and other services. The Internet and other network technologies have enabled a variety of multipoint media streams, such as news web sites containing streamable video clips, audio clips and other media combinations. One frequent type of news source or media event is a collective meeting or proceeding, in which one or a few speakers discuss information of interest to a wide audience. Those types of settings include sessions of Congress, presidential and other news conferences, corporate analysts' meetings, media conferences and other group events.

Timely delivery of information content may be particularly valuable, such as with sessions of Congress and other governmental bodies. Many interested parties could benefit from prompt knowledge of pending provisions in legislation, rulings in court cases and other deliberations. For instance, individuals or organizations that would be affected by the enactment of pending legislation may want to furnish input to their representatives or constituents may want to take other actions to contribute or adjust to new statutory, regulatory or other programs. The federal government deploys a host of communications facilities situated at a variety of sources, often issuing permits for access to those resources. For instance, Congress permits press access to its chambers and hearing rooms, from which live video and audio feeds are generated for delivery to commercial networks and to news and other organizations.

However, in the case of legislative reporting, there is a particular demand for written records of the legislature's activities. Public and private organizations exist that take down and transcribe the activities of both chambers. Those congressional transcripts are in some cases made available for a subscription fee in hard copy or electronic format within 48 hours from the time of the legislative sessions. This is in contrast to audio or visual feeds for network TV or other delivery, which are often contemporaneously broadcast. The media, the public, interest groups as well as the government bodies themselves would benefit from more timely and robust delivery of both live media and concurrent textual streams of the dialogue.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 illustrates an overall network architecture or system for delivering media and text under one embodiment of the invention.

Figure 2 illustrates an example of a subscriber interface used to view the output produced by the system of Figure 1.

Figures 3 and 4 together illustrate a flow chart of media and textual processing under the system of Figure 1.

Figure 5A is a block diagram illustrating two alternative methods of implementing aspects of the invention described, for example, with respect to Figure 1.

Figure 5B is a flow diagram illustrating a routine for providing simultaneous transmission of audio and synchronized text to a subscriber over a network, using software tools provided by RealNetworks of Seattle, Washington. Figure 5C is a block diagram illustrating a data flow and processing diagram for delivering audio voice and associated text to a subscriber under a Microsoft Windows Media environment.

Figure 5D is a block diagram illustrating a system front end under the embodiment of Figure 5C.

Figure 5E is a flow diagram illustrating a routine for delivering voice and associated text to a subscriber under the system of Figure 5C.

Figure 6 is a schematic diagram illustrating production and development environments for implementing aspects of the invention. Figure 7 is a process flow diagram illustrating the flow of data and functions performed by aspects of the invention.

Figure 8 is a data flow diagram illustrating flow of data through the system of Figure 1.

Figure 9 is a schematic diagram illustrating timing calculations performed by the system of Figure 1.

Figure 10 is an example of a home web page for use by the system of Figure 1.

Figure 11 is an example of a login dialogue box for use by the system in Figure 1. Figure 12 is an example of a customized web page for an individual subscriber.

Figure 13 is an example of a hearings calendar web page.

Figure 14A is an example of the web page of Figure 13 showing Senate hearings for Thursday, May 25, 2000. Figure 14B is an example of a user settings web page.

Figure 15 is a web page showing a selection of a breakdown of Energy and Natural Resources hearings from the web page of Figure 14.

Figure 16 is an example of a hearing subscription web page.

Figure 17 shows the web page of Figure 16, whereby a subscriber has selected live streaming receipt of a hearing.

Figure 18 is an example of the web page of Figure 15 with a keywords selection box. Figure 19 is an example of a web page showing a subscriber's input of keywords after selecting the keywords box of Figure 18.

Figure 20 is an example of an email message provided to the subscriber based on the selected keywords. Figure 21 is an example of a web page showing live receipt of audio and associated transcribed text from a hearing.

Figure 22 is an example of a web page listing background resources for an associated Senate hearing.

Figure 23 is an example of a web page listing committee members. Figure 24 is an example of a web page listing a press release.

Figure 25 is an example of a web page for perπύtting a subscriber to search for keywords within previously transcribed hearings.

Figure 26 is an example of a web page provided as a result of the query entered in the web page of Figure 25. Figure 27 is an example of a web page providing summaries of the context in which the query term is located in the transcribed hearing.

Figure 28 is an example of a web page providing the full hearing transcript with the query term highlighted.

Figure 29 is an example of a web page providing the hearing text with associated audio.

Figure 30 is an example of a web page showing subscriber selection of a block of text.

Figure 31 is an example of an email message generated by the subscriber that includes the block of text selected in Figure 30. Figure 32 is a data model relationship diagram that may be employed with, for example, an SQL database.

Figure 33 is a data model relationship diagram representing an alternative embodiment that may be employed by, for example, an Oracle database. Figure 34 is a flow diagram illustrating a suitable routine for encoding content for query by a user. Figure 35 is a flow diagram illustrating a suitable routine for generating recorded content and character files, such as for storage on a CD-ROM.

In the drawings, identical reference numbers identify identical or substantially similar elements or blocks. To easily identify the discussion of any particular element or block, the most significant digit or digits in a reference number refer to the Figure number in which that element is first introduced (e.g., element 1104 is first introduced and discussed with respect to Figure 11).

A portion of this disclosure contains material to which a claim for copyright is made. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure (including

Figures) as it appears in the Patent and Trademark Office patent files or records, but reserves all other copyrights whatsoever.

The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the claimed invention.

DETAILED DESCRIPTION OF THE DEPICTED EMBODIMENTS

The following description provides specific details for a thorough understanding of, and enabling description for, embodiments of the invention. However, one skilled in the art will understand that the invention may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the invention.

Overview

The invention relates to a system and method for the integrated delivery of media and associated characters such as synchronized text transcription, in which a computing system or dedicated network may collect, process and deliver unified audio, video and textual content on a live basis to subscribers. In one embodiment, front-end audio or video servers receive and encode the audible or video activities, for example, of a legislature, press conference, musical/multimedia composition, corporate analyst meeting, town meeting or other event. The raw, digitized media feeds from the event are transmitted to a centralized distribution server, which, in turn, delivers the digitized stream of the event to a transcription facility, where automated and/or human transcription facilities decode the language component. After speech recognition and editing take place, the textual content is synchronized with the original audio, video or other media and delivered to subscribers, for instance via a Web site interface. Subscribers may configure the delivery modes according to their preference, for instance, to silently parse the textual stream for keywords, triggering full-screen, audible, wireless or other delivery of the audio or video content when a topic of interest is discussed. Subscribers may alternatively choose to view and hear the media and textual output continuously, and they may access archives for the purpose of reproducing text for research or editorial activities. Furthermore, the system stores or archives all encoded audio/video content and associated transcriptions (when such transcriptions are created). Thus, subscribers may retrieve the archived content and transcriptions after the live event. Subscribers may perform queries of archived content and transcriptions to retrieve portions of content files and associated transcriptions that match the query, so that a subscriber may view portions of transcriptions and listen to associated audio/video content related to a query. First described below is a suitable computing platform for implementing aspects of the invention. Second, audio and text processing performed by the system are described. Third, details regarding the system are provided. Fourth, a suitable user interface is described. Thereafter, a suitable data model is described. Finally, enhancements and some alternative embodiments are presented.

Suitable Computing Platform

A suitable computing environment or platform for implementing aspects of the invention will now be described with respect to Figures 1 and 2.

Figures 1 and 2 and the following discussion provide a brief, general description of a suitable computing environment in which aspects of the invention can be implemented. Although not required, embodiments of the invention will be described in the general context of computer-executable instructions, such as routines executed by a general purpose computer, e.g., a server or personal computer. Those skilled in the relevant art will appreciate that aspects of the invention can be practiced with other computer system configurations, including Internet appliances, hand-held devices, wearable computers, cellular or mobile phones, multiprocessor systems, microprocessor-based or programmable consumer electronics, set-top boxes, network PCs, minicomputers, mainframe computers and the like. Aspects of the invention can be embodied in a special purpose computer or data processor that is specifically programmed, configured or constructed to perform one or more of the computer-executable instructions explained in detail below. Indeed, the term "computer," as used generally herein, refers to any of the above devices, as well as any data processor.

Aspects of the invention can also be practiced in distributed computing environments, where certain tasks or modules are performed by remote processing devices, that are linked through a communications network, such as a Local Area Network (LAN), Wide Area Network (WAN) or the Internet. In a distributed computing environment, program modules or subroutines may be located in both local and remote memory storage devices. Aspects of the invention described herein may be stored or distributed on computer-readable media, including magnetic and optically readable and removable computer discs, as well as distributed electronically over the Internet or over other networks (including wireless networks). Those skilled in the relevant art will recognize that portions of the invention reside on a server computer, while corresponding portions reside on a client computer. Data structures and transmission of data particular to aspects of the invention are also encompassed within the scope of the invention.

In the embodiment of Figure 1, a congressional session or other event is intended to be recorded and delivered to subscribers with a corresponding textual stream. In the illustrative embodiment, an audio signal transducer, such as a microphone array 102, is installed in a congressional chamber, auditorium or other event site. The microphone array 102 is connected directly or indirectly to an audio server or encoder 104 located at the event site. Often, the encoder is connected to receive the audio signal by way of multi-patch terminals, pool feeds, web casts, and the like.

The audio encoder 104 may be or include a computer workstation having one or more high-resolution audio digitizer boards along with sufficient CPU, memory and other resources to capture raw sounds and other data for processing in digital form. In one embodiment, the audio encoder 104 may use as an encoding platform the commercially available RealProducer™ software, by RealNetworks to produce a digitized audio stream. The audio encoder 104 may include or be coupled to a multiport central hub that receives inputs from microphones in one or more hearing rooms. As an example, one audio encoder may be located in a Senate office building and receive 16 inputs associated with microphones located in various hearing rooms, while two similar servers may be located in House office buildings and be coupled to 24 inputs associated with microphones in House hearing rooms. The audio encoder may include a single interface for managing all encoding sessions or all input audio streams from the microphones. In one embodiment, the audio encoder is a quad Pentium III 550 MHz computer with two gigabytes of RAM, and running the Windows NT™ operating system. Unnecessary applications that consume processing overhead, such as screensavers and power management features are deleted. The audio encoder also includes an analog-to-digital converter, such as the Audi/o analog-to- digital converter by Sonorus of Newburg, New York, that can accommodate eight audio inputs simultaneously. Additionally, the audio encoder may include a sound card, such as the Studi/o sound card by Sonorus, that may accommodate 16 digital audio channels. Further details may be found at http://www.sonorus.com. In the embodiment illustrated in Figure 1, after capturing the audio and spoken words of the event, the resulting raw, digitized audio stream is transmitted over a communications link 106 to a remote distribution server 108 acting as a distribution and processing hub. (The terms "transmitted", "transmission" and similar terms include posting or storing information or an information signal or file to be retrieved by an external device.) The communications link 106 joining the audio encoder 104 and the distribution server 108 may be or include any one or more of, for instance, the Internet, an intranet, a LAN (Local Area Network), a WAN (Wide Area Network) or a MAN (Metropolitan Area Network), a frame relay connection, an Advanced Intelligent Network (AIN) connection, a synchronous optical network (SONET) connection, a digital TI, T3 or El line, Digital Data Service (DDS) connection, DSL (Digital Subscriber Line) connection, an Ethernet connection, an ATM (Asynchronous Transfer Mode) connection, a FDDI (Fiber Distributed Data Interface) connection or a CDDI (Copper Distributed Data Interface) connection. Also, the Communications link 106 may be or include any one or more of a WAP (Wireless Application Protocol) link, a GPRS (General Packet Radio Service) link, a GSM (Global System for Mobile Communication) link, or other wired or wireless, digital or analog interfaces or connections.

The distribution server 108 incorporates a database 110 for the mass storage of synchronized collections of audio, video and textual information related to individual media events collected by one or more audio servers 104 or other front-end sources. In one embodiment, such additional sources may include a portable text-scanning or OCR device, such as the Hewlett-Packard CapShare™ device, to capture and transmit textual information, such as press releases, schedules, transcripts or other data, from the event site along with other media using infrared or other connections to communications link 106. Any type of scanner may be employed, such as a flatbed scanner for scanning documents related to an event and converting them into an electronic format (such as ASCII.pdf, etc.), which can then be electronically forwarded to the distribution server. Such devices may also provide raster, bitmapped or vector-based images, in addition to, or in lieu of, text generated from scanning a document. In one embodiment, the distribution server 108 includes an Apache web server and RealG2 server by RealNetworks. The system may run the Linux 6.1 operating system by RedHat and the MySQL database. The distribution server 108 employs PERL, C/C++ and Personal Home Page (PHP) by Zend Technology. Of course, the distribution server 108 may be or include, for instance, a workstation ranning the Microsoft Windows NT , Unix, Linux, Xenix, Solaris™, OS/2™, BeOS™, Mach, Apache, OpenStep™ or other operating system or platform software. In the illustrative embodiment of Figure 1, the distribution server 108 directs the raw, digitized audio stream via a communications link 112, which may be or include similar connections as communications link 106, to a processing facility 140.

The processing facility 140 may be a separate facility or other internal, local or remote engine dedicated to transcribing the raw media input into character or other format, such as ASCII English or other text or other forms. The processing facility 140 may incorporate a voice recognition server 114 to receive the digitized audio or other media streams for processing and conversion. The voice recognition server 114 may in one embodiment include one or more speech recognition modules 146, such as the commercially available Dragon™ Professional or IBM ViaVoice™ products. In one embodiment, a separate interface to the Dragon speech recognition module may be developed using a Software Development Kit ("SDK"). The interface or overlay may be similar to existing court-reporting system interfaces. Thus, if a transcription agent is familiar with and performs court reporting functions, they may readily employ the system. The interface may further allow agents to create speaker macros to facilitate transcription processes, such as identifying speakers. For example, a transcription agent may say "one Mac", which will pull previously prepared text that was entered when preparing for the hearing that will identify Speaker 1 as "Senator Orin Hatch". This permits the transcription agent to avoid having to provide speakers' titles and possibly spell their names out each time they speak. The interface also automatically wraps and time stamps the generated text according to a format needed to encode and display it.

The speech recognition module 146 may be capable of speaker- independent operation. Different or specialized versions of the speech recognition module 146 may be employed within the voice recognition server 114 to enhance accuracy, upgrade functionality or provide special foreign language or other features, according to the transcription needs. The voice recognition server 114 may be attended by a human transcription agent or "voice writer" to monitor and operate the speech recognition module 146 and other components in order to ensure the smooth flow of first-stage conversion from voice to text. It may be advantageous to program the speech recognition module 146 with particular vocabulary words likely to be spoken at the event and with the speech profile of the human transcription agent before processing the media stream. The voice recognition server 114 may be a dual motherboard computer having approximately one-half a gigabyte of RAM, and where unnecessary utilities that consume processing overhead (such as screensavers, energy management functions, and the like) are removed to make the computer more stable. Peripheral inputs unnecessary to the voice writing function are eliminated.

In another embodiment, the audio server 104, speech recognition module 146 and other elements may cooperate to recognize and split different voices or other audible sources into separate channels, which, in turn, are individually processed to output distinct textual streams.

The voice recognition server 114 thus invokes one or more speech recognition modules 146 preferably with oversight or monitoring by a human transcription agent to resolve the digitized verbal content generated by the audio server 104 into a raw textual stream, for instance ASCII-coded characters. Output in other languages and formats, such as 16-bit Unicode output, is also possible. The role of the transcription agent may include the maintenance and operation of the speech recognition module 146, monitoring the raw textual stream and other service tasks. The human transcription agent may listen to the mcoming audio stream via headphones and substantially simultaneously repeat the received audio into a microphone. The transcription agent's voice is then digitized and input to the speech recognition module 146, where the speech recognition module has been trained to the particular transcription agent's voice. Thus, the speech recognition module 145 has improved accuracy because the mcoming audio stream is converted from the speech patterns and speech signatures of one or more speakers to the trained voice of the transcription agent. The transcription agent's role, however, is intended to be comparatively limited, and, generally, is not or not frequently to involve semantic judgments or substantive modifications to the raw textual stream. It may be noted that the role of or need for the transcription agent may be reduced or eliminated in implementations of the invention, depending on the sophistication and accuracy of the speech recognition module 146, as presently known or as developed in the future. Once the initial conversion from original media is done, the raw textual stream may be delivered over a local connection 118, such as an RS232 serial, FireWire™ or USB cable, to a scopist workstation 120, which may also be located within the processing facility 140 or elsewhere. The scopist workstation 120 may be a personal or server computer executing text editing software presented on a Graphical User Interface (GUI) 122 for review by a human editorial agent, whose role is intended to involve a closer parsing of the raw textual stream and comparison with the received audio stream.

The tasks of the editorial agent stationed at scopist workstation 120 include review of the raw textual stream produced by the voice recognition server 114 to correct mistakes in the output of the speech recognition module 146, to resolve subtleties, such as foreign language phrases, to make judgments about grammar and semantics, to add emphasis or other highlights and generally to increase the quality of the output provided by the system. The editorial agent at the scopist workstation 120 may be presented with the capability, for instance, on the agent GUI 122 to stop/play/rewind the sfreaming digitized audio or other media in conjunction with the text being converted to compare the audible event to the resulting text.

In one embodiment, data compression technology known in the art may be employed to fast-forward the media and textual stream for editing or other actions while still listening to audible output at a normal or close to normal pitch.

The editorial agent at the scopist workstation 120 may attempt to enhance textual accuracy to as close to 100% as possible. The system outputs the audio and text streams with as little lag time from event to reception as is possible to provide an experience akin to a "live" radio (or television broadcast) for the subscriber. However, some degree of delay, including that resulting from processing time in the servers, network lag and human response time of the transcriber, editorial agent or other attendants, may be inevitable. The total amount of delay from event to reception may vary according to the nature of the input, network conditions, amount of human involvement and other factors, but may generally be in the range of 15 minutes or less. The voice recognition server 114 may provide timestamps to the received audio or generated text to synchronize the audio with the text, as described herein. Alternatively, time stamps may be added to the audio by the distribution server before forwarding it to the processing facility, and to the text after it is received from the processing facility. Furthermore, the voice recognition server 114 and/or scopist workstation 120 scan the text for increased accuracy. The Dragon software allows a user to provide electronic files (Word, notepad, etc.) to help increase the accuracy of the voice recognition. When a spoken word is input to the software, the word is compared with an internal word list. Such input files help to create the internal word list. If the word is not found in the internal word list, the software will prompt the agent to train that word into the system. The next time that word is input, the system will recognize the word and text will be generated correctly. The processing facility 140 may provide automated task assignment functions to individual transcription and editorial agents. For example, the processing facility receives a notification from the distribution server of hearings to be transcribed for each day. The processing facility may automatically assign hearings to individual agents, or post all hearing transcription assignments to a central web page or file that all agents may access and choose to accept to transcribe/edit for that day. After all editorial corrections and enhancements are entered at the scopist workstation 120, the edited textual stream is delivered via a communications link 124, which may likewise be or include a similar link to the communications link 106, to a text encoder module 126 incorporated within the distribution server 108. The communications link 124 may also be or include, for instance, a telnet connection initiated over the Internet or other network links.

The text encoder module 126 receives the corrected textual stream and converts the stream into, in an illustrated embodiment, a RealText™ stream adhering to the commercially known Real encoding standard for further processing. The converted RealText™ stream may be transmitted back to the distribution server 108 via a connection 128, which may be, for instance, a lOObaseT connection to a processor 142. The finished, edited, corrected, converted RealText™ stream representing the audible, visible or other events being transcribed is then sent to the distribution server 108, synchronized and stored in database 110 with the raw digitization of the media from the event for delivery to subscribers.

The synchronization may be implemented, for instance, using the Wall Clock function of the commercially available Real software. The Wall Clock function allows multiple media streams to be synchronized using internal timestamps encoded into each stream. As the streams are received on the client or recipient side, they are buffered until all streams are at the same internal time in relation to each other. Once the streams are aligned in time using timestamps and other information, the player within the client workstation 136 may start playing the streams simultaneously. The distribution server 108 may store the finished composite stream or portions thereof in database 110 in RealText™ or a variety of other formats, for instance, in XML, HTML, ASCII, WAV, AIFF, MPEG, MP3, Windows™ Media or others. The distribution server 108 may include web serving functionality, such as the ability to deliver web pages requested by other client computers. For example, a distribution server may be or include a G2 RealServer produced by RealNetworks. Alternatively, a separate web server may be employed. The distribution server 108 may have the ability to broadcast, rebroadcast or hold one or more media streams simultaneously. The distribution server 108 may accept any type of media stream, including audio, video, multimedia or other data stream capable of being sensed by a human or processed by a computer. Additionally, the distribution server 108 may synchronize any combination of streams and types of media. The arrival of finished RealText or other stream into the database

110 may trigger a start code that releases the synchronized media and processed textual streams for delivery to subscribers to the service of the invention over a dissemination link 130. The dissemination link 130 may, again, be or include a similar link to communications link 106, such as a single or multiple digital TI or other communications channel.

The dissemination link 130 may furthermore be or include a Personal Area Network (PAN), a Family Area Network (FAN), a cable modem connection, an analog modem connection such as a V.90 or other protocol connection, an Integrated Service Digital Network (ISDN) or Digital Subscriber Line (DSL) connection, a BlueTooth wireless link, a WAP (Wireless Application Protocol) link, a Symbian™ link, a GPRS (General Packet Radio Service) link, a GSM (Global System for Mobile Communication) link, a CDMA (Code Division Multiple Access) or TDMA (Time Division Multiple Access) link such as a cellular phone channel, a GPS (Global Positioning System) link, a CDPD (Cellular Digital Packet Data), a RIM (Research in Motion, Limited) duplex paging type device, an IEEE 802.11-based radio frequency link, or other wired or wireless links.

The dissemination link 130 includes TCP/IP connections over the Internet 132 to one or more subscriber link 134, which in turn may be or include links similar to communications link 106, for delivery to one or more client workstations 136. In one embodiment, any one or more of the communications link 106, communications link 112, communications link 124, communications link 130, communications link 134 or other communications links may be or include self-healing or self-adjusting communication sockets that permit dynamic allocation of bandwidth and other resources according to local or global network conditions. The client workstation 136 may be or include, for instance, a personal computer running the Microsoft Windows 95, 98, 2000, Millenium , NT™, Windows CE™, Palm™ OS, Unix, Linux, Solaris™, OS/2 ™, BeOS ™, MacOS™ or other operating system or platform. The client workstation 136 may also be or include any microprocessor-based machine such as an Intel x86-based device or Motorola 68K or PowerPC device, microcontroller or other general or special purpose device operating under programmed control.

The client workstation 136 may also include electronic memory such as RAM (random access memory) or EPROM (electronically programmable read only memory), storage such as a hard drive, CD-ROM or rewritable CD-ROM or other magnetic, optical or other media, and other associated components connected over an electronic bus (not shown), as will be appreciated by persons skilled in the art. In the modern pervasive computing environment, the client workstation 136 may also be or include a network-enabled appliance such as a WebTV unit, radio-enabled Palm Pilot or similar palm-top computer, a set- top box, a game-playing console such as Sony Playstation or Sega Dreamcast , a browser-equipped cellular telephone, other TCP/IP client or other wireless appliance or communication device.

The combined, synchronized media and finished textual stream arriving over the subscriber links 134 from the database 110 may be viewed on a client GUI 144 in conjunction with a browser and/or an administrative module 138 running on the client workstation 136 pennitting authentication of subscribers, and access to and manipulation of the information content delivered by the invention. More particularly, a subscriber may use the client GUI 144 on the client workstation 136 to invoke or log on to a web site for his or her information subscription, and enter password and other information to view the synchronized output stream according to his or her delivery preference. Schedules of different types of media events, in searchable database or other form, may, in another embodiment, be presented on the client GUI 144 to assist in event selection, as described herein.

For instance, a subscriber may choose to view the entire information stream produced by the system, including audio, video and synchronized textual output on the client GUI 144 using speakers 148, headphones and other output devices for further review, as shown in Figure 2.

Conversely, a subscriber may enter commands using the administrative module 138 and client GUI 144 to have the information stream delivered silently or in a background process, with an alert function activated. The alert function may scan the mcoming textual stream at the point of the distribution server 108 or client workstation 136 for the presence of keywords chosen by the subscriber, upon the detection of which a full screen may pop up showing the surrounding text, video or other content. Alternatively, upon detection of a keyword, the alert function may deliver other information such as a message or notice via email, an Inbox message in Microsoft Outlook™, an online instant message, an IRC (Internet Relay Chat) message, a pager message, a telephone call or other electronic notification.

In another embodiment, the user may choose to receive the informational content in a silent mode while viewing the entire textual stream, but with the ability to highlight portions of the textual stream to hear the audio output associated with that portion. This, for instance, may be useful for a subscriber wishing to discern emphasis, inquiry, irony or other inflections or subtleties that may not be evident in textual form.

A subscriber operating the client workstation 136 may likewise choose to highlight, cut, paste, stream to hard or removable drive or otherwise store or archive one or more portions of the information content delivered for later processing, word processing, retransmission or other uses. In another regard, subscriber access via the subscriber links 134 may permit a web site or other entry portal to allow a subscriber to access prior audio events or content for archival or research purposes. Likewise, the subscriber may manipulate the administrative module 138 to schedule the delivery of the sfreaming service according to specified dates and times, events of interest and associated delivery modes, as well as other settings.

In this respect, the database 110 within distribution server 108 may be configured to be searchable according to discrete search terms, particular fields related to header descriptions of the event, or on other bases. In this regard, the database 110 may be configured with a decision support or data mining engine to facilitate the research functions. An example of subscriber choices for manipulating the client GUI 144 and associated administrative choices is illustrated in Figure 2.

Audio and Text Processing

General media and translation processing according to the above embodiment will now be described with reference to the flow charts of Figures 3 and 4. Unless described otherwise herein, the blocks depicted in Figure 3 and 4 and the other Figures are well known or described in detail in the above-noted and cross-referenced provisional patent application. Indeed, much of the detailed description provided herein is explicitly disclosed in the provisional patent application. Most of the additional material for aspects of the invention will be recognized by those skilled in the relevant art as being inherent in the detailed description provided in such provisional patent application, or well known to those skilled in the relevant art. Those skilled in the relevant art can implement aspects of the invention based on the flow charts of Figures 3 and 4 and the detailed description provided in the provisional patent application.

In block 300, processing begins. In block 302, audio or other input from an event is collected and delivered to the audio server 104. In block 304, the raw audio, video or other signals are digitized. In block 306, the digitized audio data is transmitted to the distribution server 108. In block 308, the digitized audio stream, in RealAudio™ format, or otherwise, is transmitted to the processing facility 140. In block 310, the speech recognition module 146 is invoked to output an ASCII text or other stream corresponding to the audio content. In block 312, the ASCII text stream is output to the scopist workstation 120. In block 314, the ASCII text stream is edited by an editorial agent at the scopist workstation 120 using the agent GUI 122. In block 316, the edited or corrected textual stream is transmitted to the text encoder module 126. In block 318, the corrected or edited ASCII text is converted to an advanced text format, such as RealText™, using software tools provided by RealNetworks.

In block 320, the reformatted textual stream is stored and synchronized with the audio or other media source within the database 110. The integrated media/textual information is then prepared for subscriber access. In block 322, one or more subscribers access the distribution server 108 and are validated for use. In block 324, a subscriber's delivery profile is checked to set delivery mode, such as full sfreaming content, background execution while searching for alert terms, or other formats or modes described herein.

In block 326, the integrated audio or other media along with the textual stream are delivered according to the subscriber's service profile, whether triggering an alert or other mode. In block 328, subscriber requests for archival linking to related sources or other nonsfreaming services may be processed as desired. In block 330, processing ends. System Details

Referring to Figure 5A, further details regarding two alternative methods of implementing aspects of the above embodiments will now be described. The left-hand portion of Figure 5A depicts a system for receiving mcoming audio and adding to its synchronized text using software tools provided by RealNetworks. The right-hand side of Figure 5A depicts an alternative or additional embodiment using software tools provided by Microsoft Corporation of Redmond, Washington. A vertical dashed line schematically differentiates between the two embodiments. Referring first to the left-hand side of Figure 5A, the audio server

104 includes a RealAudio™ encoder 502 that receives the mcoming audio stream from an audio source and converts it into an encoded audio stream to be transmitted over the communications link 106 to a RealServer 504. To ensure synchronization between the audio and text generated therefrom, the encoded audio is broken into discrete subfiles at, for example, one minute intervals represented by file segments 106 in Figure 5 A (numbers 1, 2 . . .), which are stored in the archive or database 110. As described below, the Wall Clock tool provided by RealAudio employs timestamps in each of the file segments 506, and in the RealText™ encoded text stream (provided over the communications link 124) to synchronize them.

The Wall Clock function is supposed to maintain synchronism between a RealText™ file and a RealAudio file, regardless of the length or duration of each file. However, tests have shown that the function does not work and thus synchronism is lost. Therefore, as described herein, a solution employed by an aspect of the invention creates several individual one minute files and stitches them together to provide appropriate synchronism. The distribution server 108 retrieves each stored file segment 506, stitches them together, and combines them with the encoded text stream, to generate a synchronously combined audio and text file representing a single audio event or discrete media content. A human operator may associate a title or name to the combined file, which forms one of several titles or event names of a playlist 508. The client workstation 136 includes a browser or communication software to establish a communication link with the distribution server 108 via the network 132. The client workstation 136 also includes a Real Player application for receiving a subscriber-selected title or audio event from the playlist 508 for replay and display, as described herein.

Referring to Figure 5B, details regarding synchronizing audio with text is shown as a routine 520. Beginning in block 522, the RealServer receives the encoded audio stream and, under block 524, stores the encoded audio stream in discrete one-minute RealAudio format files in the database 110. In other words, the system receives the mcoming encoded audio stream, parses it into discrete blocks, and stores each block as a separate file. "Parsing" simply refers to breaking the mcoming information signal into smaller chucks. While one-minute chucks are described, the system may divide the stream into larger chucks, however the resulting synchronism will have a correspondingly greater granularity. If shorter than one minute blocks are used, then the system may provide even closer synchronism between the audio and text streams; however, greater processing is required to store and retrieve a resulting greater number of discrete files.

In block 526, the RealServer constructs the playlist 508 that identifies each consecutive one-minute file segment. In block 528, the RealServer receives the initial line of text (such as in ASCII format) from the scopist workstation 120 during a telnet session.

In block 530, a RealServer executes a script to initiate two substantially simultaneous processes. The first process starts the RealText process to encode the received text into RealText format. The second starts a G2SLTA process to simulate a live broadcast of audio by identifying from the playlist and retrieving the first one-minute file. The G2SLTA, RealText and other functions are provided by the software development kit (the RealG2 SDK) distributed by RealNetworks. In block 532, the RealServer transmits the RealText encoded text and RealAudio file to a requesting subscriber. In block 534, the RealServer determines whether an end of hearing flag is received in the current RealAudio file. If it is, then the routine ends in block 536. If not, then in block 538, the RealServer retrieves the next audio file identified in the playlist and receives the next lines of ASCII text. The routine then moves back to block 530 where the RealServer encodes the received text and retrieves for transmission the next audio file.

The stitching together of the consecutive one-minute audio files may be performed to generate a single complete stored audio file representing the complete hearing. Of course, if the hearing is lengthy, the hearing may be broken into two or more separate files, where each file is considerably longer than the single minute. Thus, a lengthy hearing may be broken into separate parts, e.g. to ease transcription. Each audio file (and associated text or other character transcription file) includes a start of hearing flag indicating the beginning of the hearing or other content and an end of hearing flag indicating the end of the hearing/content.

Referring to the right-hand side of Figure 5A, under the alternative embodiment, a Windows Media™ audio encoder 510 receives the mcoming audio stream and converts it into an encoded audio stream, which is transmitted over the communication link 106 to a Windows Media™ server 512. The Windows Media™ server 512 routes the encoded audio for storage in the archive or database 110 over a communications link 514, and forwards the audio over the communications link 112 to the processing facility 140. The distribution server 108 then retrieves the stored audio (represented by line 516) and combines it with the text received over the communication links 124. The client workstation 136, employing not only a browser or communication facility, but also a Windows Media™ player, requests and receives the audio and associated and synchronized text for replay on the workstation.

Refeixing to Figure 5C, the mcoming voice audio is split and sent to the transcription agent at the voice recognition server 114, and the raw text is then transmitted to an editing agent at the scopist workstation 120. The incoming audio is also delayed for approximately five minutes under a delay block 540. Such a delay may simply involve storing the audio in the archive 110 for five minutes and then retrieving it. An encoder block 542 receives the delayed audio and combines it with the edited text from the workstation 120 to create an encoded ASF format stream. The ASF (Advanced Sfreaming Format) refers to an open file format specification for sfreaming multimedia files containing text, graphics, sound, video and animation. ASF does not define the format for any media streams within the file. Rather, it defines a standardized, extensible file "container" that is not dependent on a particular operating system or communications protocol, or on a particular method (such as HTML) used to compose the data stream in the file. The Windows Media™ server 512 receives the merged audio and text ASF stream and streams it to a client browser 544, such as Internet Explorer or Netscape Navigator, running on the subscriber workstation 136. The client browser 544 plays the stream with a media player, such as Windows Media™ Player, and displays the text via Java scripts. The encoder 542 may be implemented in Visual Basic. The encoder receives the audio input from a sound card in the audio server 104, and receives the text from a TCP/IP connection with the processing facility 140.

Referring now to Figure 5D, data flow will be described. A voice or other audio stream, after being digitized, is input to a voice recorder block 546 running on the Windows Media™ audio encoder 510 of the audio server 104. First, the voice recorder encodes the voice signal, through a sound card, into an ASF voicestream. Second, the ASF voice stream is received by the media server 512 and broadcast to the encoder 542. The encoder may be a Visual Basic program that encodes voice and text, so that the client browser, using the Media Player, may play the voice ASF broadcast. The encoder uses a TCP/IP connection to get the text input from the scopist workstation 112. The encoder then merges the text with the delayed voice into a combined ASF stream and sends it to the media server 512 for broadcast.

Referring to Figure 5E, a basic process flow is shown as beginning in a block 550 where the mcoming voice audio is encoded by the voice recorder 546 into an audio ASF stream. In block 552, the media server 512 receives the encoded stream and broadcasts it to the encoder 542. In block 554, the encoder combines the audio with a received text stream to create a combined audio and text ASF stream. In block 556, the media server 512 receives the combined ASF stream and broadcasts it to a client browser. In block 558, the client browser 544 receives the combined audio and text ASF stream and replays it on the subscriber workstation 136. While two methods of provided encoded media synchronized with corresponding character streams are described herein, those skilled in the relevant art will recognize that other methods are possible for providing such synchronized content to subscribers within a packet switching network (e.g. the Internet). Referring to Figure 6, an example of a hardware platform for developing and distributing stored media and associated text or character files to subscribers includes a development environment 602 as a separate computer platform for testing content distribution. A Redundant Array of Intelligent Inexpensive Disks ("RAID") 604 stores the encoded media (audio, video, etc.) and associated text or character files. In one embodiment, a RAID 5 system is employed. A RAID 5 system employs large sector striping data with a rotating parity stripe. As a result, a RAID 5 system does not employ a dedicated parity drive and therefore write performance is better than other RAID systems. The RAID 5 system also provides better storage efficiency than other RAID systems. Fmlher details regarding RAID systems may be found, for example, at http://www.vogon.co.uk. The RAID systems store archives of transcripts and associated audio for hearings or other content. While RAID systems are depicted, alternative embodiments may employ any known data storage system, such as automated tape cartridge facilities produced by Storage Technology of Louisville, Colorado or by ADIC of Redmond, Washington. Additionally, some subscribers may maintain their own secure database and archives. The data model (described below) and other aspects of the system may be licensed or sold to third parties or subscribers, particularly those wishing to maintain their own secure database of archived content. The system functionality described herein may be provided by a secure subscriber server, independent from the publicly accessible system described herein. Users of such a secure system may simply access the secure database and receive synchronized sfreaming media and text after being first authorized by the system. .Indeed, the development environment of Figure 6 represents such an alternative embodiment of a separate, secure system. A G2 server (sold by RealNetworks) and database server 606 provide an interface with the RAID system 604, and communication with a web server 610, through a firewall 608. The firewall 608 may be implemented by a secure server or computer running any variety of firewall or network security products. The web server 610 may include an authentication server or implement authentication services. The web server 610 communicates with the network 132 to receive requests for web pages and files stored in the RAID system 604, and delivers results of such requests to a test base of workstations or computers operated by test subscribers who are coupled to the network.

A separate production system 614 is quite similar to the development environment 602, but it includes substantially more storage and processing capabilities. As shown in Figure 6, the production system employs three RAID systems 604. Two G2 servers and a separate single database server 606 are coupled between the firewall computer 608 and the RAID systems 604. A Domain Name Server ("DNS") 616 is coupled between the web and authentication server 610 and the network 132. The DNS converts domain names into Internet protocol ("IP") addresses. IP addresses consist of a string of four numbers up to three digits each, which are associated with IP address text names (e.g., www.hearing.com). A separate DNS helps speed resolving IP addresses and facilitates content exchange with subscribers' workstations. The DNS, web and authentication servers and/or G2 and database servers may be located close to an Internet backbone, such as at Uunet facilities or facilities provided by Akami. The production environment 614 represents a suitable implementation of the distribution server 108. As shown, a separate web server, such as an Apache web server, is provided for handling web page delivery functions, while a separate server handles audio and text file retrieval and sfreaming, such as G2 servers by RealNetworks. Of course, in an alternative embodiment, a single server may perform all the functions, or even more servers may be employed to furtlier compartmentalize various functions performed by the system.

Functional aspects of the above hardware will now be described with respect to Figures 7 through 9. Referring to Figure 7, a programming schematic illustrates functionality performed by the above hardware and represents how the system starts processes, manages functions and stores data. The system includes five main processes: broadcasting under block 702, audio encoding under block 704, processing under block 706, archiving and synchronizing under block 708 and web interfacing under block 710. Beginning with the broadcasting block 702, live audio (or other media) is streamed from an event, such as from Senate and House hearing rooms, where the audio is controlled through audio technicians or operators and multiport patch panels, which may be positioned at or near the hearing rooms.

Under the audio encoding block 704, an audio encoder 712 receives the live audio feed, converts it from analog to digital, and then encodes the digitized audio for distribution or routing to the processing block 706. The audio encoder may be the RealMedia server described above. A G2 server 714 in the processing block 706 receives the encoded audio and rebroadcasts the live encoded audio stream in RealMedia format to a voice writing block 716. The voice writing block 716 receives the live audio stream and, using voice recognition software, creates a text sfream that is routed to a scoping or editing block 718. The scoping block 718 cleans up or edits the text stream to fix errors and provide additional content or value to the text. The scoping block 718 then feeds or sfreams the accurate or edited text stream to an RT receiver 720.

The RT receiver receives the edited text stream and routes it to a keyword alerts block 722, a RealText™ encoder 724 and a database server 726. The keyword alerts block 722, scans the text sfream for keywords that have been previously submitted by subscribers, as described herein. The RealText encoder block 724 employs the RealNetworks RealText encoder to encode the text into RealMedia text stream for use by Real Player client software residing on subscriber workstations. The database server block 726 stores the text sfream in a predefined data structure, which forms part of the archiving process.

An aichiving block 728 forming part of the archiving and synchronizing process 708 stores the audio and text streams for all processed hearings or other events received from the processing block 706. The database server block 726 interfaces with the archiving block 728, stores text and media streams for accurate retrieval and subscriber querying, and maintains an updated index of all stored and archived audio, text and other content files (such as a "playlist" in the RealAudio embodiment). A synchronizing block 730 receives the audio and text streams from the archiving block 728, and synchronizes text and audio streams for transmission to the web interface process 710. Audio and text stream blocks 732 in the web interface process 710 receive the audio text streams from the synchronizing block 730 and provide such synchronized streams to requesting subscribers' workstations.

Referring to Figure 8, a schematic block diagram illustrates data flow between processes under the system of Figure 1. After the mcoming raw media is encoded by the media encoder or audio server 104 at a time equal x, then the encoded media is routed by the media server or distribution server 108 to the transcription generation facility or processing facility 140. The distribution server stores the encoded media and retrieves it after some delayed time y. The time delay corresponds to the time it takes the processing facility to generate the transcription for the encoded media.

As shown in Figure 9, the voice writing process 716 receives the encoded media at the time x where x corresponds to the start time of the hearing, audio event or beginning of the aural media. The scoping process 718 receives the raw text produced by the voice writing process, edits it and provides it to the RT receiver process 720. The mcoming encoded media includes a hearing ID that uniquely identifies each hearing or other audio content, which may be provided by the G2 server 714. In block 902, the RT receiver process sets the time x to the current system time. In block 904, the RT receiver process receives a line from the scopist workstation and sets a delay value y to the current system time. With hearings, for example, the incoming spoken words are parsed into lines, where each line may represent, for example, approximately 85 character spaces, approximately 15 words or an average line of text on a document, although many other methods of parsing mcoming audio and transcribed text may be performed. For example, if the system is transcribing audio into text, where the text is to be displayed on a small device (such as a palm-top computer or cell phone), then each line will represent parsing of the mcoming audio into smaller increments. In block 906, the RT receiver process 720 determines a delta value indicating how long from the beginning of the hearing the currently received line was spoken. Delta represents the current system time y minus the initial time x. In block 908, the RT receiver process stores in the database the currently transcribed line, the value y and/or the value delta. Thereafter, the process loops back to block 904, and blocks 904 through 908 are repeated until the transcription is completed. As the encoded audio is being transcribed, the transcribed text is routed back to the distribution server 108, from the processing facility 140, with the timestamp indications (such as the value delta), as shown in Figure 8. The distribution server then combines the received text with the audio or multimedia content that has been delayed by the value y to thereby synchronize the text with the original media, and delivers it over the communication line 130. Suitable User Interface

Referring to Figures 10 through 31, representative computer displays or web pages will now be described with respect to requesting and receiving audio and corresponding associated character streams, in particular, a congressional hearing and associated transcript. The web pages may be implemented in Personal Home Page ("PHP") as PHTML scripts. PHP is a scripting language similar to JavaScript or Microsoft's VB script. Like Microsoft's Active Server Page, PHTML contains programming executed at a web server rather than at a web client (such as a browser). A file suffix ".phtml" (or ".php3") indicates an HTML page having a PHP script. PHP is an embedded, server-side scripting language similar in concept to Microsoft's ASP, but with a syntax similar to Perl. PHP is used on many web sites, particularly with Apache web servers. PHTML files may represent user interface pages, such as web pages described below, while PHP files represent files and executables running on the server. Interaction between different PHTML scripts in web page forms provided from subscribers via subscriber workstations 136 are handled by the distribution server 108 (or web servers 610) with "categories functions." Categories, as generally referred to herein, refer to character codes (such as two-character codes) that distinguish functional code groups. Functions, as generally referred to herein for pages, are actions within a categoiy. Most functions originate as either a hidden item in a form (e.g., automatically executed when the page is served) or when the submit buttons on web page forms are selected. Categories need only be unique within one PHTML file. Functions need only be unique within a category. Within each PHTML file, functions specific to a category are prepended with a file identifier and category identifier. For example, all functions in a company administration file (cau^nin.phtml) for a Contact Information category are prefixed with "CAO." Each category has a dispatch function that takes the function variable (e.g., "$func") and passes it to the particular function that handles it. The dispatch function is prefixed with the file's prefix code and suffixed with the category's code (e.g., "function CADispatchCI($func)"). Each file also has a main category dispatch function that calls the individual dispatch functions, such as in the form "function CACatDispatch($cat, $func)," where "$cat" refers to a category variable.

The web pages may also be implemented in XML (Extensible Markup Language) or HTML (HyperText Markup Language) scripts that provide information to a subscriber or user. The web pages provide facilities to receive input data, such as in the form of fields of a form to be filled in, pull-down menus or entries allowing one or more of several entries to be selected, buttons, sliders or other known user interface tools for receiving user input in a web page. Of course, while one or more ways of displaying information to users in pages are shown and described herein, those skilled in the relevant art will recognize that various other alternatives may be employed. The terms "screen," "web page" and "page" are generally used interchangeably herein. While PHP, PHTML, XML and HTML are described, various other methods of creating displayable data may be employed, such as the Wireless Access Protocol ("WAP").

The web pages are stored as display descriptions, graphical user interfaces or other methods of depicting information on a computer screen (e.g., commands, links, fonts, colors, layouts, sizes and relative positions, and the like), where the layout and infoπnation or content to be displayed on the page is stored in a database. In general, a "link" refers to any resource locator identifying a resource on a network, such as a display description provided by an organization having a site or node on the network. A "display description," as generally used herein, refers to any method of automatically displaying information on a computer screen in any of the above-noted formats, as well as other formats, such as email or character/code-based formats, algorithm-based formats (e.g., vector generated), or matrix or bit-mapped formats. While aspects of the invention are described herein using a networked environment, some or all features may be implemented within a single-computer environment. To more easily describe aspects of the invention, the depicted embodiments and web pages are at times described in terms of a subscriber's interaction with the system. In implementation, however, data input by a subscriber is received by the subscriber workstation 136, where the workstation then transmits such input data to the distribution server 108. The distribution server 108 then performs computations or queries or provides output back to the subscriber workstation, typically for visual display to the subscriber, as those skilled in the relevant ait will recognize.

Referring to Figure 10, an example of a home page 1000 is shown for beginning a session or interaction with a web site for accessing congressional hearings online (such as at the URL http://www.hearingroom.com). From the home page, a new user may navigate to various additional pages (not shown) that provide information regarding the system, without requiring the user to become a subscriber. Such information may include information regarding subscription levels. Such information may include the name of the subscription level (gold, silver, platinum, and the like), maximum number of users per subscription level, maximum number of live hearings per week per subscription level, and the like. The subscription levels may include a "View all Live" premium subscription level that would permit such a subscriber to view all hearings live (at a significantly higher annual fee than other subscription levels). Clicking a login button 1002 causes the distribution server 108 to provide or serve up to the requesting computer (such as the subscriber workstation 136) a login dialog box 1002 shown at Figure 11. The subscriber enters a user name and password in input fields 1002 and 1004 before clicking an okay button to be authorized to log on to the system.

Each time a subscriber logs on to the system, a new session begins. Under one embodiment, the system uses session management routines provided by the personal home page version 4 software ("php4"). Under an alternative embodiment, a script may provide similar functionality. When the subscriber connects, the subscriber's browser running on his or her computer (e.g., workstation 130) sends a cookie to the distribution server 108. If the cookie does not exist, the server creates one. The cookie contains a unique identifier that the web server software (such as php) reads in a local file, which contains a session state for that session. Subscriber authentication may be handled by a web server, such as an Apache web server.

Referring to Figure 12, an example of a custom web page as displayed to the logged-in subscriber is shown. The page displays the subscriber's name in a name portion 1202 ("Demonstration, Inc."), and an introductory message 1204 ("Hearing lineup for Thursday, July 20th, 2000"). Displayed below in a portion 1206 are short descriptions of the hearings requested to be delivered to the subscriber. A background link 1208 allows the subscriber to click thereon to receive background information with respect to each listed hearing, as described below. An archive button 1210 links to a page to permit a subscriber to search for and/or order an archived transcript (such as the page of Figure 25). A subscribe button 1212 allows the subscriber to subscribe to and receive a transcript of an upcoming hearing (such as by providing a page as shown in Figure 13).

Referring to Figure 13, an example of a web page 1300 for accessing information regarding hearings and to subscribe to receive a transcription of the hearing is shown. A hearings calendar link 1302 allows the subscriber to click thereon to receive a chronological list of upcoming hearings. A future hearings by committee link 1304 allows the subscriber to click thereon to receive a list of future hearings sorted by committee. A calendar 1306 displays the days of the current month, with links for each day. Clicking on the link for a particular day allows the subscriber to receive a web page displaying a list of all hearings for that day. For example, clicking on the day May 20^th in the calendar 1306 causes the system to display a day listing 1308 that provides the subscriber with access to Senate and House of Representative hearings information.

By clicking on the "+" sign 1310 or a committee details link 1312, the Senate hearings for the following committees are displayed, as shown in Figure 14A: Commerce, Science and Transportation; Energy and Natural Resources; and Health, Education, Labor and Pensions. A Committee Details link 1312 provides information on the committee, although this link may be omitted. By clicking on the "+" sign 1402, for example, the system provides a web page 1500, shown in Figure 15, that displays particular Energy and Natural Resources hearings for that day. As shown in Figure 15, each hearing is listed with its title, the Senate committee or subcommittee conducting the meeting, and its time and date. An order button 1502 allows the subscriber to order a transcript of the hearing, while a background link 1504 allows the subscriber to receive background information, as described below.

Referring to Figure 14B, an example of a settings page 1450 is shown, which may be retrieved by clicking on the settings button 1214 (Figure 12). A subscriber may edit contact information, change passwords, change keyword groups (described below), determine account information (under "Assets"), and edit company information as shown.

Clicking on the order button 1502 causes the system to provide a subscription web page, such as the page 1600 shown in Figure 16. With respect to each hearing, the subscriber may click one of three radio buttons: a live sfreaming hearing button 1602 to receive the transcript in near real time, a two-hour transcript button 1604 to receive the transcript with a two-hour time delay, and a next day transcript button 1606 to receive a transcript the day after the hearing. Alternatively, the two-hour option may be replaced by a "same day" option. A buy button 1608 allows the subscriber to purchase or subscribe to the selected hearing, while a cancel button 1610 cancels the transaction and returns the subscriber to the previous screen. Clicking the buy button causes the system to deduct the appropriate amount from the subscriber's account, while clicking the cancel button credits the subscriber's account.

After clicking the live sfreaming hearing button 1602, and clicking the buy button 1608, as shown in Figure 17, the subscriber is returned to the hearing listings page 1500 as shown in Figure 18. As shown, the hearing ordered by the subscriber no longer has associated with it the order button 1502, but now has associated with it a keywords button 1802. By clicking on the keywords button, the system provides a keywords page 1900, as shown in Figure 19. Keyword entry fields 1902 allow the subscriber to enter one or more keywords that the subscriber wishes the system to identify within a hearing, and to provide notification to the subscriber when such terms are uttered during the hearing. As shown in Figure 19, five keywords are entered in five keyword fields. The subscriber may enter as few as one word, and the system can provide more than five fields. An update keywords button 1904 allows the subscriber to select the keywords and return to the previous screen. A duplicate group button 1906 allows the subscriber to copy the words in these fields for use as keywords for other hearings. A delete group button 1908 allows the subscriber to delete all entries within the keyword entry fields 1902. As shown in Figure 20, when a select number of keywords are identified within a hearing transcript, the system provides an email message 2000 to the subscriber, providing the subscriber with notification that one or more of the subscriber's keywords have been uttered during a selected hearing. As shown in Figure 20, the email message includes a subject line 2002 that identifies the email message as an alert ("LiveWireAlert"), and the title of the hearing. The body of the message indicates to the subscriber which keywords have been uttered (in this case, "Richardson" and "Exxon"), and provides a link 2006 to allow the subscriber to click thereon to immediately listen to the hearing in progress. In this example, the system provides an email notification when two of the subscriber's keywords were identified within the transcript of the hearing. In other embodiments, as few as one, and as great as all fields entered by the subscriber may be required to be located within the transcript before the system provides an email notification to the subscriber. Additionally, in alternative embodiments, more detailed query constructs may be created, such as using Boolean connectors, and the like. In operation, the keyword alerts process 722 scans the text generated by the processing facility 140 for any keywords entered by one or more subscribers. The system accumulates multiple mentions of the same keyword until a threshold is exceeded. For example, the system may require five mentions of one keyword, and a single mention of a second keyword in a subscriber's list of keywords before providing an alert to the subscriber. Thus, the system does not provide alerts every time a keyword is mentioned, but only for occurrences where a keyword is mentioned a sufficient number of times to indicate that the substance of the hearing may correspond to that keyword.

The system provides not only an alert to the subscriber, but also a portion of the franscript that includes the one or more keywords and associated text to provide a context for the located keywords within the transcript. For example, the system may provide the line of transcript before and after the line containing the one or more keywords (similar to that shown in Figure 27). Based on this alert and portion of transcript text, the subscriber may wish to purchase or obtain the entire transcript. Thus, the alert may include additional information to permit the subscriber to obtain the hearing and transcript (such as the link 2006). Thus, future hearings not specifically ordered by a subscriber may be scanned, and the system may provide corresponding alerts. Of course, many other methods of establishing a query, scanning or analyzing results from a transcription, notifying a subscriber and presenting a summary portion of the transcript may be employed. If the alert is provided to a subscriber's portable computer (such as cell phone, palm-top computer, etc.), the system may provide an augmented version of alert notification to that described above. For example, the system may provide a smaller context for the alert terms (such as only the five words that precede and follow the keyword). The subscriber may order a full copy of the transcript to be received over the portable device, or receive a much greater summary or larger context of the proceedings that include the search terms. The subscriber may be able to listen to the audio over a cell phone, for example. The subscriber may have to pay additional amounts for such service.

Referring to Figure 21, if the user wishes to simply listen to a hearing live, while viewing the corresponding transcript, a web page or window, such as a window 2100 shown in Figure 21, is provided to the subscriber. The window includes a heading 2102 that identifies the hearing, time and date, and provides the background link 1504 for additional information. The conventional Real Player controls 2104 provide a graphical user interface for the user to control the delivery of audio (such as pausing, stopping, rewinding, adjusting volmne, etc.), although speed, pause and other functions may not be available for the delivery of live audio. A transcription portion 2106 displays the text transcription previously created by the processing facility 140.

Referring to Figure 22, if the subscriber clicks on any of the background information links 1502 in Figure 15, the system displays detailed information regarding the associated hearing, such as that shown on page 2200. A hearing details section 2202 provides details regarding the date, time, committee, location and full name of the hearing. A member list link 2204 links to a list of committee members, while a supporting materials section 2206 provides related information regarding the particular hearing identified in the hearing details section, such as provided by a link 2208. As shown in Figure 23, by clicking the member list link 2204, a list of committee members is presented as organized by Republicans and Democrats. Each committee member has an associated link 2302, whereby clicking on such link causes the system to display details regarding that particular committee member. Clicking on any of the supporting materials links 2206, such as a link 2208, causes the system to display the associated materials, such as a press release displayed in a window 2400 shown in Figure 24.

Human researchers perform research based on the hearing and who is scheduled to testify at the hearing. All background research is preferably from a primary source, and includes original material retrieved from well-respected and rehable sources. The research provided by the system may be in-depth public information with numerous citations. General background information may be avoided, but instead information that is relevant and not redundant is provided to subscribers in a single format.

Before each hearing, researchers procure primary research materials, including government reports, press releases and opinion reports, from the Internet or other sources, where such research is posted to the system and associated with a particular upcoming healing, to be retrieved when a subscriber clicks the background link 1504. Researchers are taught to select only the most insightful and pertinent research materials for a hearing. In one embodiment, only three to six pieces of background material are posted for each hearing (or more if it is a high-profile hearing). The researcher performs the sifting of information for the subscriber to edit out superfluous or general background information. As a result, a subscriber retrieves only the most important material when accessing the background link.

Researchers may also be responsible for retrieving prepared testimony from hearing rooms and scanning that testimony into electronic form to be posted with other background information on the system. Such researchers may also be charged with the task of making physical audio connections in certain hearing rooms and testing functionality of these connections. Such researchers or "legislative interns" who have need to access hearing rooms are issued a Capitol Hill Press Pass from the Senate Radio/TV Gallery to carry out such duties. Furthermore, researchers may also do internal research for the system that does not get to published to subscribers. For example, researchers may train the voice recognition software, where such fraining focuses on jargon, abbreviations, scientific terms, foreign language usage, proper names and any other terms that may be received in the mcoming audio sfream. Such fraining may be particular to an upcoming healing

Referring to Figure 25, if a subscriber clicks on the archive button 1210, the system displays an archive page 2500 to permit the subscriber to search through stored hearing transcripts for keywords. A keyword search field 2504 allows the subscriber to enter a keyword to be searched, while a find button 2506 initiates the query. An advanced search link 2508 allows the user to access enhanced search tools, such as multifield Boolean searching, and the like.

Figure 26 shows an example of a query result screen 2600 that lists three hearings that include the search term "AOL." A hearing listed in the query results without a order button indicates that the subscriber has previously ordered that hearing, and thus need not pay additional costs to access the full transcript. A listen link 2602 allows the subscriber to listen to the hearing as it has been previously stored in the database of the system. A read link 2604 allows the subscriber to retrieve from the database only the text portion of the transcript to view. A view results button 2606 allows the subscriber to view the line of the transcript containing the search term, and the single lines that precede and succeed it, as shown in a screen 2700 of Figure 27. If the subscriber selects the read link 2604, the entire stored franscript is displayed to the subscriber, such as that shown on page 2800 of Figure 28. Alternatively, the subscriber may click the listen link 2602 to view both the transcript text and listen to the audio, as shown in Figure 29. As shown in Figure 30, the subscriber may select portions of the franscript text, such as a block of text 3000. The subscriber may then cut and paste the selected text, such as pasting the text into an email message 3100 shown in Figure 31. As a result, the subscriber may readily employ such transcribed hearings for use in other documents (such as word processing documents), for routing to other individuals (such as via email), or many other purposes.

When a subscriber receives a live stream of media and associated transcription, a client-side search routine may provide local searching capability to the subscriber. For example, the workstation 136 may provide a search module, with a user interface similar to that in Figure 25 or 19, that permits the subscriber to enter key words or query search terms. The workstation, in turn, analyzes the sfreaming text received for the key words. When a key word or query string is found in the mcoming text, the system provides an indication or alert to the subscriber. Thus, the subscriber may perform additional tasks on the workstation, and then receive a pop up dialog box or other notification to permit the subscriber to redirect his or her attention back to the sfreaming audio and transcription.

The subscriber may also employ other known search tools, such as the common "Find" function (often available under a key combination of "Alt"- "F"). Thus, the subscriber may search in retrieved text for one or more search terms.

Figure 26 shows an example of a listing of three hearings and important information regarding each hearing, such as the hearing title, its date, time, and the like. This information is captured and stored in the database. The information is stored in the database as separate fields for each hearing to permit subscribers to readily search and retrieve archived hearings. Of course, a greater number of fields may be stored for each hearing to permit greater searching capabilities of the database, although fewer fields may be employed. Suitable Data Model

Shown below are examples of a few tables defining data structures for data objects stored in the database 110 and employed by the distribution server 108 and other portions of the system of Figure 1. The tables below identify various data objects (identified by names within single quotation marks preceding each table), as well as variables or fields within each table (under the "Column Name"). The tables also identify the "Data Type" for the variable (e.g., integer ("int"), small integer ("tinyint"), variable character ("varchar"), etc.). The "Constraints" column represents how the column number is generated. A "Description" column provides a brief description regarding some of the fields. Each table includes a "Primary Key" that is a unique value or number within each column to ensure no duplicates exist, and one or more "Key" values representing how extensively the tables are indexed. In general, the data objects in the database may be implemented as linked tables as those skilled in the relevant art will appreciate.

Table structure for table 'hearings*

Description of the Table:

Contains a list of the hearings.

The "Excuse" field in the hearings table above identifies why audio could not be received or recorded from a hearing, such as lack of working microphones in the hearing room, problems with the audio server 104, and the like. The "Keywords" field allows the system to identify keywords, such as metadata, that can be used by search engines to identify a particular hearing. Rather than employing the Keywords field, the system may simply use the "Name" field for searches, where the name represents the title of the hearing. The "ResearchDone" field represents whether a human intern or other individual has perfoπned the required research regarding a hearing, such as obtaining the members' names, any list of witnesses for the hearing, and background research regarding any relevant documents for the hearing (such as scans in .pdf or other format of previously prepared testimony). This research is used when the subscriber clicks on the background link 1504. The "Status" field represents one of eight flags indicating status of a particular hearing: (1) whether audio is to be recorded and stored for a hearing; (2) whether a transcript is to be obtained from the recorded audio; (3) whether the audio recording is in progress; (4) whether transcription of the audio is in progress; (5) whether the audio encoding storage is complete; (6) whether the audio files have been stitched together to complete a complete single audio file (as described above); (7) whether transcription of the audio has been complete; and (8) whether the hearing is complete. Table structure for table "infoTypes'

Description of the Table:

Types of background information.

Table structure for table "keywords'

Description of the Table:

User keywords for LiveWireAlerts. Keywords should be deleted after a hearing is concluded.

Table structure for table "membercomm'

Description of the Table:

Associates members with one or more committees.

Table structure for table "members'

Description of the Table:

Lists the members of Congress.

Other tables may include the following. An "Assets" table may store a list of a company's or subscriber's total assets, where a company may have multiple assets with different expiration dates due to different subscription programs (as described herein). Higher level logic may ensure that the assets list makes sense, whereby if one asset has a negative value, assets of a positive value are applied to offset the negative one. A "Billing" table allows subscribers to create separate billing entries for separate billing events for themselves. A zero or negative hearing ID value means that the billing event was canceled, and that the funds were restored to the company's assets; entries should not be removed from the billing table.

A "BillingNotes" table stores notes regarding a subscriber's billing entry. It is a separate table from the billing table to allow the billing table to use a fixed row size. A "Building" table may keep track of building abbreviations and names, such as "RHOB" representing an abbreviation for the Rayburn House Office Building. A "ClientCodes" table may represent subscriber code numbers. A "CmteLabei" table may store labels for committees, so that committee names will not have to all start with "Committee on" or "Subcommittee on." A "CmteRanks" table may store rankings of members in a committee. A "Cmtes" table provides a list of committees, including fields to indicate whether the committee is for the House or Senate, an internally determined committee number, and an internally determined subcommittee letter.

A "Company" table describes a company or subscriber to the system. A "Contacts" table stores internal contact lists for the system, including names, phone numbers, addresses, etc. A "Costs" table may store a pricing structure for the system, and may include an access field representing an access level of the subscriber. A "FileTypes" table may store hearing background information to allow the system to properly display it to subscribers. A "Hearcmte" table may associate hearings to one or more committees. The hearcmte table stores the associations between hearings and committees. A similar table is used to store the relations between members and committees. The hearcmte table structure is: row number, hearing number, committee number. The row number is unique within the table; hearing number points to a hearing, and committee number points to a committee. In this way, a hearing can easily have many committees associated with it. A "Hearinglnfo" table may store background information for hearings, including links to appropriate stored documents or other background materials described herein.

A "MoreAccess" table may provide finer-grained access controls to an administrative section of the system. A "MoreAccessNames" table may contain English names for the MoreAccess table's field names. "Politparty" and "states" tables may list political party affiliations and states, respectively. A "Rooms" table may list buildings and room numbers in which a hearing may be located. A "SubscriptionLevels" table may list subscription levels for subscribers. A "Subscriptions" table may list all the subscriptions made on the system. A "Tasks" table may list tasks for processing by the system. A "TransactionTypes" table may list different purchase options for hearings provided to subscribers by the system. An "UnavailableReasons" table may list reasons why a hearing cannot be covered. A "Users" table may list users or subscribers to the system. A "WhatsNew" table may list information new to the web site or provided by the system, which may be displayed to subscribers. Figure 32 is a data model relationship diagram that shows the relationship between the various tables described above. In general, the system constructs a dynamically expandable database that links audio or other media, associated transcriptions with respect to that media, and other associated content, such as the background information described herein, and subscriber and business information. The system overall acts as a production, storage and retrieval system. Those skilled in the relevant ait can create a suitable database from the schema described herein using any known database, such as MySQL, Access by Microsoft or any Oracle or Sequel database. For example, Figure 33 shows an example of a database schema for implementing the data model described herein within an Oracle database.

Enhancements and Some Alternative Embodiments

While the system has been described above in one configuration, various enhancements or modifications may be made. The hardware platform, and associated software and software tools, may, of course, employ future versions or upgrades thereto. A more robust search capability may be provided than that described above. For example, subscribers may be able to search not only hearing transcripts, but also search through all background materials associated with such hearings. Users may be able to provide additional refinements of searches, such as by searching sessions of Congress (e.g., "second session of the 106 congress").

In addition to receiving audio from microphones positioned within hearing rooms, individuals may attend hearings with a laptop and microphone to locally receive and encode the audio from the hearing. The laptop may encode the audio and then transmit it via a wired or wireless connection to the distribution server 108. Rather than receiving and replaying each one-minute audio file to create a stitched complete audio file, production may be enhanced or expedited, whereby each one-minute audio file need not be replayed for each one-minute of audio to create the resulting complete file. The system may provide simply the audio of a hearing to a subscriber, without the associated transcription. Of course, providing such audio will be at a lower price, and may be offered on a pay per listen basis.

Rather than employing Apache web servers that perform authentication, the system may be more cookie-based for each session, whereby a password may be used only once. Live-wire alerts, or alerts regarding the system's recognition of a subscriber's key term in received audio may be performed using various telecommunications means, such as paging a subscriber, placing a prerecorded call to the subscriber's cell phone, sending an email message over a wireless link to a subscriber's palm-top or hand-held computer, and the like. Audio may be split so that one audio source may be effectively duplicated in real time to be sent to two or more locations, such as to the archiving facility and to the processing facility. The system may require subscribers to indicate at a specified time before a hearing whether the subscriber wishes to receive the transcription of the hearing to thereby provide sufficient time to gather background information. The audio server 104 may include automated or manual gain control to adjust line level and improve signal-to-noise ratio of audio input thereto. A digital signal processor may be employed to help split voices apart when multiple speakers are talking at once. The processing facility 140 may include speech recognition modules trained for individual voices of House or Senate speakers (such as fraining separate modules to the voices of separate senators). In other words, the archive stores "famous" voice files relating to speeches or other recorded audio with respect to particular people. These files may be used to train the voice recognition system. As a result, fewer transcription agents would be required. Fmthermore, these files may be sold to others. Voice-over Internet protocol (IP) functionality may be employed by the distribution server 108 and processing facility.

The distribution server may employ data mining techniques and audio mining. For example, Dragon™ provides search tools for searching for words or phrases within audio files. As a result, subscribers may be able to search for keywords in audio files that have not been transcribed. The audio mining tools may review an audio file and generate an index. The index could then be published to subscribers who may perform text searches of the index and retrieve archived audio files. Known data mining techniques may be employed to search for patterns of desired data in stored transcripts. Such data mining tools and techniques may be provided to subscribers over a web interface.

Various subscription and revenue models may be employed by the system. In one embodiment, subscription to the system is sold on a yearly, declining-balance subscription model. Subscribers choose the hearing coverage and the timing for receiving transcripts (real time, two-hour delay, next day, etc.). The cost of each service ordered is deducted from the annual subscription fee. For example, live sfreaming of a congressional hearing may cost $500 for a "silver subscriber" (who pays a $5,000-a-year subscription fee), $400 for a "gold subscriber" (who pays a $10,000-a-year subscription fee) and only $300 for a "platinum subscriber" (who pays a $15,000-a-year subscription fee).

Subscribers or other users of the system may employ a pay-per-view (or pay-per-listen) model where access to a single hearing and associated transcript may cost, for example, $750. As described above, the database schema includes customer administrative functions and fields to ensure that subscribers are billed correctly. The system may also employ additional fields to permit subscribers to track client and matter numbers and time spent for a matter to a subscriber's clients. For example, a subscriber to the system may be a law firm that, in turn, may be required to track and bill its clients on a client and matter level.

System administrators and other individuals within an organization operating the system must ensure that the database fields are kept up to date. The database is used to populate web pages and provide information regarding hearings to subscribers. For example, the database must keep an accurate list of committees, including committee names, member names, committee descriptions, a list of subcommittees and links to all committee web sites. With respect to committee members, the database must include accurate information regarding the title (such as senator, representative, etc.), party affiliation, full name, role (e.g., chairman, secretary, etc.), member web sites, member biographies, etc. Hearing locations must also be maintained, such as location date, location description, room number, etc. Furthermore, background information regarding the hearing must be maintained by the database, including opening statements, prepared testimony, member list, witness list, and related materials (e.g., web sites, charts, diagrams, scanned text, etc.). Furthermore, the database must maintain accurate information regarding subscribers, such as the subscriber's name (typically a company name and logo), company address, account contact information, technical contact information (such as a technical support person at the subscriber's location), subscription level, subscription length (such as in months), etc. In addition to corporate subscribers, the database may furthermore maintain individual subscriber accounts and associated information. While the transcription of congressional hearings are generally described herein, aspects of the invention have broad applicability to various other types of content or media. For example, aspects of the system in Figure 1 may be used to search and recall recorded music that contains sung lyrics, specific recorded events derived from original audio proceedings (or video proceedings having an audio component), such as plays and other performances, speeches, motion pictures and the like. Live media events may be improved by providing sfreaming text together with the live event. Each of these media events or "contenf ' is digitally encoded and stored in archives or the database of the system. Such stored archives may be accessible to subscribers via the Internet. Users may search databases of text or other characters associated with the stored original content, receive and display variable length "blocks" of text corresponding to their search terms, and click on such recalled terms to hear and/or see the original content. The system described above creates an interactive link between the text and the original content.

Figure 34 shows an example of such an alternative embodiment as a routine 3400. Beginning in block 3402, the system receives and encodes original content, such as any recorded or live proceeding or performance in which at least a portion of that content is to be encoded into a character stream. For example, any event having spoken, sung, chanted or otherwise uttered word may be used. In block 3404, the encoded content is stored in discrete files in the database 110 by the distribution server 108. Two or more files may be created for a particular event, such as separately recording not only the event itself, but individual components of the events (such as recording a concert, and recording individual files representing vocals, guitar, drums and other microphone outputs). In block 3406, a database index is updated to reflect the new file or files stored therein. In block 3408, the system creates and stores a character file or files based on the newly stored content file (such as a text transcription of lyrics in a recorded song). In block 3410, the system links the newly created character file with the original content file. In block 3412, the system updates the database index to reflect the newly created character file and its link with the original content file. The system may also, or alternatively, create a new index representing the character file, such as an index reflecting lyrics of the song. All this information is stored by the system.

In block 3414, the system receives a user query, such as several words from the song's lyrics. The system may also provide additional search tools or filters (such as in a web page) that allow the user to further refine a search. For example, if the user is searching for a particular song, additional filters may indicate the type of music (e.g., rock 'n roll, blues, jazz, soundtrack, etc.) and/or artist. Such additional search or query refinements or filters helps speed the user's search and provide more relevant search results. In block 3416, the system searches the database based on the query and, in block 3418, provides the search results: one or more character files with linked and associated content files. The user may view the character file to see, for example, the lyrics, and/or receive the content, such as sfreaming audio for the song. In block 3420, the system may also permit the user to request or order a copy of the content file. For example, if the content file is a song, the user may, after providing payment or authorization, receive a download version of the song, or order a physical version (e.g., CD, cassette, etc.) of the song. As a result, in this example, if a user knows several words of a song, he or she may use search tools provided by the system to identify songs containing those words within the lyrics of the song, listen to some or all of the song, and view associated text (lyrics). While music has been used as an example, various other types of content may similarly be stored, encoded, linked, searched and requested/purchased.

The above processing system may be modified to create or produce content and associated character files for storage and distribution on permanent recordable medium, such as CD-ROMs. Such an application may apply to a broad variety of text materials, such as poetry, plays, speeches, language learning audio recordings, literature and other types of multimedia content from which text was originally derived under the voice recognition process described above. The associated text may be read from a computer screen or other device, and such text file may be searched so that a user may click on a search line of text to hear the actual audio associated with that portion of text. Figure 35 is an example of a routine 3500 for producing such CD- ROMs. Beginning in block 3502, the system receives and encodes original content as one or more new files. If the original content is already encoded, then the system need not perform any encoding functions. In block 3504, the system creates a character file from the original content file. For example, if the original content is a speech, movie or play, then the system creates a text transcription of the words spoken. In block 3506, the system links the character file with the content file and, in block 3508, creates an index for the file or files. In block 3510, the content file, character file and index are recorded on a recordable medium, such as a CD-ROM. Of course, various other recordable mediums are possible, such as magnetic tape, or other optical, magnetic or electrical storage devices. The index may include not only an association between the spoken word and the time it occurred during the original content, but other information, such as discrete portions of the original content. If the original content is a play, then the index may include act and scene designations. Thus, the user may employ search tools to retrieve not only specific lines in a screenplay but certain portions of the play. For music, the index may include not only lyrics, but also refrains, bridges, movements or other portions within the music.

A subset of CD-ROMs or other created physical medium may be used for teaching, such as teaching a foreign language. A foreign language audio file may be linked with two text files: a foreign language text file and an English (or primary language) file. The text may be presented to a student in English and the equivalent or comparable foreign language text simultaneously presented with audio content. By clicking on segments of the foreign language text, a student may hear the actual spoken word corresponding to the foreign language text. In an automated mode, the foreign language spoken word may be output to the student, together with scrolling or sfreaming text simultaneously provided in synchronism. The text and audio files may become part of a larger text document or file, such as an entire curriculum or larger book with associated audio. While audio and linked text files may be provided to students on a CD-ROM, under an alternative embodiment, students may log into a system and receive such files via the Internet or other network. Aspects of the invention may be applied to interactive text and audio technology in distance learning or fraining via the Internet. As a result, the system can be used to create interactive lectures, classroom discussions, etc., that are accessible by students via the Internet, both in real time and as on-demand lectures archived by the system. Students may receive not only the audio and associated text but also video with respect to the event. Furthermore, static displays of information, such as Powerpoint presentations, diagrams from an electronic whiteboard, slides and other images can be included and linked with the event. The audio, text and any additional content, such as video or static images, may all be stored in not only the database 110 but also on recordable media such as CD- ROMs. Students or users may search for relevant issues and subjects by performing text searches in the text files, or searches of the audio files (as noted above).

The above description of the system and method for integrated delivery of media and synchronized transcription is illustrative, and variations in configuration and implementation will occur to persons skilled in the art. For instance, while the processing facility 140 has been illustrated in terms of a single remote site handling all the sfreaming media content distributed by the distribution server 108, transcription and other processing services could be distributed to a variety of locations having different computing, communications and other resources. Moreover, the finishing and synching of the integrated text and media stream could be executed within the processing facilityl40 when provisioned with sufficient processing, storage and other resources. Also, an event site could have one or more audio servers 104 or other front-end media capture facilities to process source media. Moreover, multiple events could be processed at the same time, to generate a combined output stream.

Various communication channels may be used, such as a LAN, WAN, or a point-to-point dial-up connection, instead of the Internet. The server system may comprise any combination of hardware or software that can support these concepts. In particular, a web server may actually include multiple computers. A client system may comprise any combination of hardware and software that interacts with the server system. The client systems may include television-based systems, Internet appliances and various other consumer products through which auctions may be conducted, such as wireless computers (palm-top, wearable, mobile phones, etc.).

Conclusion

As generally described in detail above, a system for capturing audio, video or other media from events or recordings combines digitized delivery of the media with accompanying high-accuracy textual or character streams, synchronized with the content. Live governmental, corporate and other group events may be captured using microphones, video cameras and other equipment, whose output is digitized and sent to a transcription facility containing speech recognition workstations. Human transcription agents may assist in the initial conversion to text data, and human editorial agents may further review the audio and textual streams contemporaneously, to make corrections, add highlights, identify foreign phrases and otherwise increase the quality of the transcription service. Subscribers to the service may access a web site or other portal to view the media and text in a real time or near real time to the original event, and access archival versions of other events for research, editing and other purposes. Subscribers may configure their accounts to deliver the sfreaming content in different ways, including full content delivery and background execution that triggers on keywords for pop-up text, audio, video or other delivery of important portions in real time. Subscribers may set up their accounts to stream different events at different dates and times, using different keywords and other settings. Various live media events may be archived for later transcription. All archived content files are indexed in a database for rapid or accurate retrieval by subscribers, who may order transcriptions of such archived files if no transcription had been performed earlier. Likewise, all transcription files associated with content files are indexed to permit efficient access and retrieval. Subscribers may construct a query to search the database, and receive as query results, a list of one or more files. Subscribers may access portions of such files to view franscriptions associated with the queiy, and listen to associated audio or other content corresponding to and synchronized with the transcription. The term "transcription", as generally used herein, refers to converting any aural content into a corresponding character file. The character file is generated from and relates to the original aural file. In general, "characters" refers not only to text (such as ASCII characters), but also pictographs (such as pictures representing a word or idea, hieroglyph, etc.), ideograms (e.g., a character or symbol representing an idea or thing without expressing a particular word or phrase for it) or the like. The term "characters", as generally used herein, includes any symbols, ciphers, symbolizations, phonograms, logograms, and the like. The term "language" generally refers to any organized information communication system employing characters or series of characters, both human and machine- readable characters. Machine readable characters include computer codes such as ASCII, Unicode, and the like, computer languages and scripts, as well as computer readable symbols such as bar codes. The term "content" refers to any information, while "media" refers to any human generated content, including the human generated content described herein.

The system may be modified to receive audio music files or streams, and convert the music into notes and other musical notation reflecting the music. The system may generate separate audio signals corresponding to each instrument in the music, measure frequency and duration of each note, and compose a representation of a musical score associated with the music.

Of course, any oral audio component may be transcribed to create a corresponding (and synchronized) character file that may be in any language (not necessarily the language of the speaker). The system may accept as input a group conversation between several individuals, each speaking a different language. The system may create separate transcription files for each speaker, so that separate transcription files in each of the different languages is created for each of the speakers.

Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise," "comprising" and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of "including, but not limited to." Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words "herein," "above," and words of similar import, when used in this application, shall refer to this application as a whole, and not to any particular portions of this application.

The above description of illustrated embodiments of the invention is not intended to be exhaustive or to limit the invention to the precise form disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize. The teachings of the invention provided herein can be applied to other media delivery systems, not necessarily for the audio and text delivery system described above. The elements and acts of the various embodiments described above can be combined to provide further embodiments.

Any above references and U.S. patents and applications are incorporated herein by reference. Aspects of the invention can be modified, if necessary, to employ the systems, functions and concepts of the various patents and applications described above to provide yet further embodiments of the invention.

These and other changes can be made to the invention in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the invention to the specific embodiments disclosed in the specification and the claims, but should be construed to include all media delivery systems that operate under the claims to provide a method for providing link character sfreams with associated aural content. Accordingly, the invention is not limited by the disclosure, but instead the scope of the invention is to be determined entirely by the claims.

While certain aspects of the invention are presented below in certain claim forms, the inventors contemplate the various aspects of the invention in any number of claim forms. For example, while only one aspect of the invention is recited as embodied in a computer-readable medium, other aspects may likewise be embodied in a computer-readable medium. Accordingly, the inventors reserve the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the invention.

Claims

CLAIMSWe claim:

1. A method for delivering live media content to two or more client computers over a computer network, comprising: receiving an encoded media signal representing a live event, wherein the live event includes at least a language portion, and wherein the encoded media signal is encoded for transmission over the computer network; converting at least the language portion of the received encoded media signal into a character signal that represents the language portion; synchronizing the created character signal with the encoded media signal; and providing the synchronized character signal and encoded media signal to the client computers over the computer network after a delay with respect to when the live event transpired, wherein the delay is of a substantially short duration so as to approximate live fransmission of the synchronized character signal and encoded media signal to the client computers.

2. The method of claim 1 wherein the language portion comprises human speech, wherein the character signal comprises a text sfream, and wherein converting the language portion comprises automatically converting the human speech language portion to corresponding text characters in the text stream.

3. The method of claim 1 wherein converting the language portion comprises converting the language portion to corresponding raw text characters under direction of a transcription agent and editing the raw text characters under direction of an editorial agent to resolve subtleties and foreign language phrases, to make judgments about grammar and semantics, or to add emphasis.

4. The method of claim 1 wherein the computer network is the Internet, wherein the method further comprises providing at least a first web page to the client clients to permit ordering of the encoded media signal of the live event and the synchronized character signal, and a second web page to display the character signal and replay the encoded media signal.

5. The method of claim 1 wherein the language portion comprises human speech from two or more speakers, wherein the character signal comprises a text stream, and wherein converting the language portion comprises converting the human speech language portion to corresponding text characters in the text sfream for each of the two or more speakers.

6. The method of claim 1 wherein the encoded media signal is a multimedia media signal.

7. The method of claim 1, further comprising: receiving a search queiy from one of the client computers; searching at least the synchronized character signal based on the search query; and providing a notification where the search query is found in the synchronized character signal, wherein the notification comprises displaying a notification window on the one client computer, sending an email message to the one client computer, sending a page message over a paging network or placing a prerecorded notification call to a telephone.

8. The method of claim 1, further comprising: receiving an order to receive the synchronized character signal and encoded media signal from one of the client computers associated with a subscriber, wherein the order indicates whether to receive the synchronized character signal and encoded media signal live or delayed; deducting a first amount from an annual subscription amount paid by the subscriber if the order indicates live reception; and deducting a second amount from the annual subscription amount if the order indicates delayed reception.

9. The method of claim 1 wherein the encoded media signal comprises music, wherein the language portion comprises human singing, and wherein the character signal comprises a text stream of song lyrics.

10. The method of claim 1 wherein the encoded media signal comprises a dramatic work, wherein the language portion comprises human speech, and wherein the character signal comprises a text stream.

11. The method of claim 1 wherein the encoded media signal is a lecture by a teacher, and wherein the method further comprises fransntitting to the client computers static images produced from an electronic whiteboard, slides or computer generated images associated with the lecture.

12. The method of claim 1 wherein the language portion comprises human speech, including speech from a notable person, wherein the character signal comprises a text sfream, and wherein converting the language portion comprises: training a speech-to-text program to the speech of the notable person, and automatically converting the human speech language portion to conesponding text characters in the text stream.

13. A system for delivering encoded content to client computers, comprising: means for receiving an encoded media signal representing either a live event or pre-recorded content, wherein the live event or pre-recorded content includes at least an aural portion; means for transcribing at least the aural portion of the received encoded media signal into a character signal; means, coupled to the means for receiving and transcribing, for synchronizing the created character signal with the encoded media signal; and means for storing the synchronized character signal and encoded media signal to permit searching and retrieval of a portion of the character signal in response to a query, wherein the retrieved portion contams a portion of the character signal and a corresponding synchronized portion of the encoded media signal.

14. The system of claim 13, further comprising means for encoding an audio signal produced from the live event to create the encoded media signal.

15. The system of claim 13 wherein the means for transcribing comprises voice recognition means for creating a raw transcription stream, and editing means for editing the raw transcription stream to create the character signal.

16. The system of claim 13, further comprising means interface means for providing display descriptions to client computers, wherein the display descriptions present the synchronized character signal and encoded media signal to a user.

17. A computer-readable medium whose contents cause a computer system to provide content to a client computer, comprising: receiving a search query having a character string; searching a stored character file for the character string, wherein the character file has a synchronous association with an encoded media file; and retrieving a portion of the character file that contains the character string for display by the client computer, and retrieving a corresponding synchronized portion of the encoded media file for replay by the client computer, wherein the retrieved portion of the character file and corresponding synchronized portion of the encoded media file are not at a beginning or end of the character file and encoded media file, respectively.

18. The computer-readable medium of claim 17 wherein the stored character file and encoded media file are stored on an optically readable disk.

19. The computer-readable medium of claim 17 wherein the computer-readable medium is a logical node in a computer network receiving the contents.

20. The computer-readable medium of claim 17 wherein the computer-readable medium is a computer-readable disk.

21. The computer-readable medium of claim 17 wherein the computer-readable medium is a data fransmission medium fransnfrtting a generated data signal containing the contents.

22. The computer-readable medium of claim 17 wherein the computer-readable medium is a memoiy of a computer system.

23. The computer-readable medium of claim 17 wherein the encoded media file is music, and the character file comprises lyrics.

24. The computer-readable medium of claim 17 wherein the encoded media file comprises an audio media event and the character file comprises a text transcription thereof.

25. The computer-readable medium of claim 17 wherein the encoded media file comprises a foreign language audio file and the character file comprises a text transcription thereof in English and in the foreign language.

26. The computer-readable medium of claim 17 wherein the encoded media file comprises a music file and the character file comprises a musical score thereof.

27. A method for providing encoded media and associated text from a server computer to a client computer over a public computer network, comprising: receiving an encoded media sfream representing a media event having a language component; parsing and storing the encoded media stream into a series of consecutive files, wherein each file has a size substantially smaller than a size of the encoded media sfream representing the media event; retrievmg and converting the language of each of the series of consecutive files into an associated text sfream; substantially simultaneously encoding the text stream for transmission and retrieving the series of consecutive files; and transmitting to the client computer over the public computer network the encoded text stream and the retrieved series of consecutive files.

28. The method of claim 27 wherein the media event is a live media event, wherein parsing and storing comprises constructing a file list that identifies each of the consecutive files and an order of the files, and wherein retrieving and converting comprises associating timestamps with separate lines of text in the text stream, wherein each timestamp is related to an offset from a beginning of the encoded media stream representing the media event.

29. A system for generating synchronized media and textual streams, comprising: a first interface to at least one sfreaming media source; a distribution server, communicating with the first interface, the distribution server storing the sfreaming media; and a second interface to a franscription engine, the second interface receiving the sfreaming media and outputting a textual sfream from the transcription engine corresponding to the sfreaming media, the textual sfream being synchronized with the sfreaming media for output to a recipient.

30. The system of claim 29 wherein the at least one sfreaming media source comprises an audio server outputting digitized audio to the first interface.

31. The system of claim 29 wherein the distribution server comprises a database for storing the sfreaming media and the textual sfream, and the distribution server synchronizes the textual stream and the streaming media for storage in the database.

32. The system of claim 29 wherein the transcription engine comprises a voice recognition server executing a speech recognition module outputting a raw franscription sfream, the voice recognition server presenting the raw franscription sfream to a transcription agent to monitor the generation of the raw franscription sfream.

33. The system of claim 29 wherein the transcription engine comprises an editorial workstation, the editorial workstation receives a raw franscription sfream and presents the raw franscription stream to an editorial agent to edit and output as the textual sfream.

34. The system of claim 29, further comprising a third interface, communicating with the distribution server, the third interface outputting the sfreaming media and textual sfream to a client workstation.

35. The system of claim 29, further comprising a third interface communicating with the distribution server, wherein the third interface comprises an Internet connection.

36. The system of claim 29, further comprising a client workstation comprising an administrative module, the aα ninistrative module managing the delivery of the sfreaming media and the textual sfream.

37. The system of claim 29, further comprising a client workstation comprising an administrative module, wherein the adminisfrative module comprises delivery configurations, the delivery configurations comprising at least one of full delivery of the sfreaming media and the textual stream, background delivery of the sfreaming media and the textual stream, scheduling of the delivery of the sfreaming media and the textual stream, delivery of an alert based upon detection of a keyword in the textual sfream, and delivery of the sfreaming media and the textual sfream based upon detection of a keyword in the textual sfream.

38. The system of claim 29 wherein the at least one sfreaming media source comprises a video server outputting digitized video to the first interface.

39. The system of claim 29 wherein the textual sfream comprises textual output in a plurality of languages.

40. The system of claim 29 wherein the textual stream comprises textual output coπesponding to a plurality of speakers.

41. The system of claim 29 wherein the textual stream comprises a plurality of channels, each coπesponding to one of a plurality of speakers.

42. A method of generating synchronized media and textual streams, comprising: a) receiving sfreaming media from at least one sfreaming media source; b) storing the sfreaming media in a distribution server; c) outputting the sfreaming media to a franscription engine; d) generating a textual sfream coπesponding to the streaming media in the franscription engine; and e) synchronizing the streaming media with the textual sfream for output to a recipient.

43. The method of claim 42 wherein the at least one sfreaming media source comprises an audio server outputting digitized audio.

44. The method of claim 42 wherein the distribution server comprises a database for storing the sfreaming media and textual sfream and wherein synchronizing the sfreaming media and the textual sfream is performed by the distribution server for storage in the database.

45. The method of claim 42, further comprising executing a speech recognition module and outputting a raw transcription sfream, and presenting the raw franscription steam to a franscription agent to monitor the generation of the raw franscription sfream.

46. The method of claim 42, further comprising receiving the raw transcription sfream and presenting the raw transcription stream to an editorial agent to edit and output as the textual stream.

47. A computer-readable medium containing a data structure having information for display and to provide content to a user of a client computer coupled to a computer network, the information comprising: a first display description defining a media order electronic page for ordering over the computer network encoded media and synchronized character signal generated from the encoded media, wherein the first display description includes at least one user-selectable field to indicate whether to receive the synchronized character signal and encoded media live or delayed; a second display description defining a query electronic page having at least one user input field for inputting at least one search term to be identified in the synchronized character signal or encoded media; and a third display description defining a presentation electronic page for displaying the synchronized character signal and providing encoded media to the user over the client computer.

48. A system for delivering audio and text media to client computers over a public computer network, comprising: at least one audio server computer configured to receive live audio media and encode the live audio into an encoded audio sfream; at least one processing computer system coupled to receive the encoded audio sfream and configured to convert the encoded audio sfream into a coπesponding franscribed textual sfream; at least one media distribution server computer coupled to a database and coupled to receive the encoded audio sfream from the audio server computer and the coπesponding transcribed textual sfream from the media distribution server computer, wherein the media distribution server computer is configured to synchronize and store in a database file the encoded audio sfream with the coπesponding transcribed textual stream, and to provide the synchronized encoded audio and franscribed textual sfreams to at least one of the client computers over the public computer network after a delay with respect to when the live audio media was generated, wherein the delay is of a substantially short duration so as to approximate live fransmission of the audio media to the client computer; and wherein the media distribution server computer is further configured to: receive a search query from the client computer; and to search at least the encoded audio and transcribed textual stream, and provide a notification where the search query is found in the textual sfream, or to search the stored database file of the encoded audio and franscribed textual sfreams in the database and to provide a portion of the encoded audio and transcribed textual sfreams from the database based on the search query.

49. The system of claim 48 wherein the public computer network is the World Wide Web, wherein the client computer includes a web browser, wherein the audio server computer is further configured to digitize the live audio media before encoding the digitized live audio media into the encoded audio sfream; wherein the at least one processing computer system comprises at least one transcription computer and at least one editorial computer, wherein the franscription computer is coupled to receive the encoded audio stream from the media distribution server computer and includes a speech-to-text converter to convert the encoded audio sfream into a coπesponding raw transcribed textual sfream under direction of a transcription agent, and wherein the editorial computer is coupled to receive the raw franscribed text from the franscription computer and includes a text editing module to edit the raw transcribed textual sfream to produce the corresponding transcribed textual stream under direction of an editorial agent; and wherein the media distribution server computer is coupled to receive the coπesponding transcribed textual sfream from the editorial computer, and includes a web server to provide two or more web pages to the client computer to permit subscribers to request live or archived audio media and associated text.