WO2004036543A2 - Methods and apparatus for audio data monitoring and evaluation using speech recognition - Google Patents
- Publication number
- WO2004036543A2 (PCT/US2003/033040)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio segment
- audio
- searching
- data
- operable
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
- H04M3/5183—Call or contact centers with computer-telephony arrangements
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/22—Arrangements for supervision, monitoring or testing
- H04M3/2227—Quality of service monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/42221—Conversation recording systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/22—Arrangements for supervision, monitoring or testing
- H04M3/2218—Call detail recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
- H04M3/5175—Call or contact centers supervision arrangements
Definitions
- the present invention relates to the field of audio data monitoring, such as the monitoring of telephone calls and, more specifically, to leveraging voice recognition technology to provide new and improved features and functionality for use in audio data monitoring.
- Such new and improved features and functionality include user programmable rules-based quality monitoring of telephone calls, speech and data SQL integration for fast and efficient searches of audio data for spoken words, phrases, or sequences of words, the provision of speech cursors indicating the location of words or phrases in audio data, automated quality monitoring, as well as other features and functions described herein.
- Prior art telephone call monitoring typically consisted of recording telephone calls and the manual monitoring of only a select few (e.g., 5%) of the recorded calls by a call center employee or supervisor. Searching for particular words or phrases must be performed manually by listening to segments of audio recordings. Such manual call monitoring is tedious, time consuming, laborious, and costly.
- CTI Computer Telephony Integration
- CTI middleware provides a software bridge between computers and telephone systems in contact centers.
- CTI functions to bring together computer systems and telephone systems so that their functions can be coordinated.
- Functionality made possible by core CTI technology includes Interactive Voice Response (IVR) integration, which transfers caller-entered IVR information to Customer Support Representative (CSR) desktop PCs; screen pop; and coordinated call-data transfer between CSRs.
- IVR Interactive Voice Response
- CSR Customer Support Representative
- CTI applies computer-based intelligence to telecommunications devices, blending the functionality of computers and computer networks with the features and capabilities of sophisticated telephone systems over an intelligent data link to gain increases in CSR productivity, customer satisfaction and enterprise cost savings.
- CTI combines the functionality of programmable computing devices with the telephony network through the exchange of signaling and messaging data between the switching systems and a computer.
- CTI's principal undertaking is to integrate various call center systems and platforms, including PBXs, LANs, IVR/VRU systems, predictive dialers, the desktop PC and Internet-based applications.
- a common CTI function is the "screen pop" or "smart call handling".
- The screen pop uses telephony-supplied data, typically ANI (automatic number identification), DNIS (dialed number identification service) and/or IVR-entered data, to automatically populate a CSR's desktop application screen with information related to the transaction, such as a customer's profile or account information, scripts or product information.
- ANI automatic number identification
- DNIS dialed number identification service
- Closely related to the screen pop application is an application often referred to as "coordinated call-data transfer."
- a typical scenario for this application might proceed as follows.
- a Tier 1 CSR receives a customer call.
- the Tier 1 CSR realizes that the customer will have to be transferred to a Tier 2 CSR to satisfy the customer inquiry.
- coordinated call-data transfer functionality allows the transferring CSR to send both the call and the updated screen data to the receiving CSR.
- the receiving CSR has more data and is able to more efficiently and effectively conduct the next customer interaction.
- IVR integration typically rounds out most basic CTI implementations.
- With IVR integration, information a customer enters into an IVR system is automatically displayed on a CSR's desktop PC when the customer elects to speak directly to a CSR.
- information collected by the IVR system can be used to trigger a screen pop.
- customers are relieved from having to repeat basic information when transferring to a live CSR.
- the customer is able to carry on with the live CSR where he or she left off with the IVR system.
- CTI functionality has four principal benefits including (i) increased CSR productivity; (ii) more competent customer service; (iii) faster access to customer information; and (iv) long distance cost savings. With CTI, CSR productivity increases significantly.
- CSRs are relieved from having to ask customers for routine information or for information the customer has already provided, either to another CSR or to another call center device. Time spent keying in database access information and waiting for resulting information is eliminated. With these process improvements, the overall call processing time is reduced, allowing CSRs to process more calls more efficiently in the course of a typical day.
- With screen pop functionality alone, the typical call center should be able to realize a 10 to 15 second reduction in average call processing times.
- the screen pop functionality offers a significant savings to a contact center when implementing "core" CTI functionality. When there are frequent transfers of customer's calls, either from an IVR system or between CSRs, the reduction in average call processing times can be even greater.
- Another benefit of CTI is the ability to deliver more competent customer service.
- customers are recognized by name as soon as they reach a live CSR.
- customers are relieved from having to repeat routine information every time they are transferred to a different call center location.
- CTI is transparent to the customer: it provides a seamless interaction and gives the customer a favorable impression of the organization as a competent, customer-focused operation.
- CTI further supports upselling and cross-selling existing customers. Having fast access to customer information is a critical requirement to being able to upsell and cross-sell effectively. By allowing CSRs to access customer information as they make voice contact with the customer, CSRs are better able to plan up-sale and cross-sale proposals.
- An additional benefit of CTI is reduced long distance charges per call. Because CTI allows the call center to process calls faster, the technology can result in considerable reductions of long distance charges.
- A typical call or Contact Center 100 may include a switch 102 such as an Automatic Call Distributor (ACD) and/or Private Branch Exchange (PBX) connected to a communications network, such as the Public Switched Telephone Network (PSTN), for receiving calls from and making calls to customer telephones 101.
- Switch 102 is connected to and cooperates with Interactive Voice Response system 103 for automatically handling calls (e.g., playing messages to and obtaining information from callers, etc.) and with CTI Server 104 for routing calls to CSRs.
- CTI Server 104 is also connected to Switch 102 for receiving call information such as DNIS and ANI, and to CSR Workstation 105 for providing information to a CSR.
- CSR Workstation 105 may connect to Database 106 directly and/or receive information from Database 106 through CTI Server 104 when an appropriate connection (not shown) is available.
- a CSR has access both to CSR Workstation 105 and to CSR Telephone 107 for conversing with customers and retrieving data from and inputting data into Database 106 and performing other call handling actions using CTI Server 104, IVR 103 and Switch 102.
- a typical call processing session may proceed as follows.
- a customer call from telephone 101 comes into ACD/PBX switch 102.
- the call gets routed to IVR 103.
- Switch 102 sends ANI, DNIS to CTI Server 104.
- IVR 103 requests call data from CTI Server 104.
- The call data is sent to IVR 103 from CTI Server 104.
- IVR 103 sends call data to the CTI Server 104.
- IVR 103 transfers the call back to Switch 102.
- CSR Workstation 105 requests data and the CTI Server 104 sends it.
- Data sent to CSR Workstation 105 triggers a call to Customer Database 106.
- The caller data triggers a call to the Customer Database 106 to populate the CSR Screen 105 with the customer data as the voice arrives.
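The call processing session above can be illustrated with a minimal sketch. It is not taken from the disclosure: the class and function names (`CtiServer`, `CallSession`, `screen_pop`), the `"account"` key, and the use of a plain dictionary as the customer database are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CallSession:
    """Call data a CTI server might hold for one call (hypothetical shape)."""
    ani: str
    dnis: str
    ivr_data: dict = field(default_factory=dict)


class CtiServer:
    """Hypothetical stand-in for CTI Server 104: stores call data by call id."""

    def __init__(self):
        self.calls = {}

    def on_switch_event(self, call_id, ani, dnis):
        # The switch reports ANI and DNIS when the call arrives.
        self.calls[call_id] = CallSession(ani, dnis)

    def update_from_ivr(self, call_id, data):
        # The IVR sends back whatever the caller entered.
        self.calls[call_id].ivr_data.update(data)

    def call_data(self, call_id):
        return self.calls[call_id]


def screen_pop(cti, customer_db, call_id):
    """Look up the customer record using IVR-entered data, falling back to ANI."""
    session = cti.call_data(call_id)
    key = session.ivr_data.get("account") or session.ani
    return customer_db.get(key)
```

In this sketch the workstation would call `screen_pop` when the call is transferred back from the IVR, so the customer record is available as the voice arrives.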
- One of the tasks in running a call or Contact Center is to ensure that the system is properly operating and that each CSR is trained and efficiently handles interactions with customers. Such quality assurance tasks are often supported by call monitoring systems and methods.
- A method of analyzing audio data includes steps of processing an audio segment into a format suitable for rapid searching; determining an appropriate set of rules to apply to the audio segment; and searching the audio segment in accordance with the rules.
- the method may include a step of referencing the audio segment wherein the audio segment has been previously stored in an electronic media or a step of recording the audio segment.
- the step of processing may include processing the audio segment into a format suitable for rapid phonetic searching.
- the step of processing may include a step of identifying symbols corresponding to discrete portions of the audio segment, which symbols may represent respective phonemes of a set of phonemes characteristic of speech.
- The step of searching may include the steps of: attempting to find a match within the audio segment of a target phrase; and in response, determining whether the target phrase is present within the audio segment, at or above a specified confidence level.
- a step of triggering an event may occur in response to the step of determining.
- a step of triggering an event as a result of the searching step resulting in matching a given phrase at or above a specified confidence level and/or in not finding a match for a given phrase at or above a specified confidence level may result in incrementing a statistical parameter.
- searching may include a combination present (or absent) in a specified order and/or temporal relationship (with respect to each other and/or within the audio segment) within the audio segment.
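As a hedged illustration of the searching and triggering steps above, the following sketch decides whether a target phrase is present at or above a confidence level and increments a statistical parameter either way. The `Hit` record, the function names, and the idea that a search engine returns scored hits as a list are assumptions, not details from the disclosure.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    """One candidate match reported by a search (hypothetical shape)."""
    phrase: str
    offset_s: float
    confidence: float


def is_present(hits, phrase, min_confidence):
    """True if any hit for the phrase meets the confidence threshold."""
    return any(h.phrase == phrase and h.confidence >= min_confidence for h in hits)


def apply_phrase_rule(hits, phrase, min_confidence, stats):
    """Record a 'found' or 'missing' count for the phrase and return the outcome."""
    found = is_present(hits, phrase, min_confidence)
    stats[(phrase, "found" if found else "missing")] += 1
    return found
```

A low-confidence hit is treated the same as no hit, which is why both outcomes are counted separately.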
- a method may further include analyzing CTI data associated with the audio segment; and providing an indication of satisfaction of a criteria in response to the steps of searching and analyzing.
- The CTI data may include (i) called number (DNIS), (ii) calling number (ANI) and/or (iii) Agent Id (a unique identifier of the agent that handled the call).
- the method may further include a step of performing order validation. Order validation may include comparing a parameter of an order associated with the audio segment with a content of the audio segment resulting from the searching step.
- the step of searching may include a step of searching for a target phrase, the method further comprising a step of performing order validation including determining whether an order associated with the audio segment is consistent with a result of the step of searching for the target phrase.
- a step of entering data for the order may also be included wherein the step of performing order validation includes validating whether the data is reflected within the audio segment.
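Order validation as described above might be sketched as follows. This is an assumption-laden illustration: `validate_order`, the dictionary representation of an order, and the exact (case-insensitive) comparison against located phrases are hypothetical choices, not the patented method.

```python
def validate_order(order, spoken_phrases):
    """Return the names of order fields whose values were not found in the audio.

    order: mapping of field name to entered value (hypothetical shape).
    spoken_phrases: phrases the search located in the audio segment.
    """
    spoken = {p.lower() for p in spoken_phrases}
    return [name for name, value in order.items() if str(value).lower() not in spoken]
```

An empty result would indicate that every entered value is reflected in the audio segment; a non-empty result lists the fields that need review.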
- A method of processing audio data may include the steps of importing call data; selectively, responsive to the call data, analyzing an audio segment associated with the call data, the step of analyzing including processing the audio segment into a format suitable for rapid searching; determining an appropriate set of rules to apply to the audio segment; and searching the audio segment in accordance with the rules.
- a system for analyzing audio data may include an audio processor operable to process an audio segment into a format suitable for rapid searching; logic operable to determine an appropriate set of rules to apply to the audio segment; and a search engine operable to search the audio segment in accordance with the rules.
- the system may further include an electronic media having stored therein the audio segment and circuitry for retrieving the audio segment from the memory and providing the audio segment to the audio processor.
- the system may further include an audio recorder operable to store the audio segment.
- The audio processor may be operable to process the audio segment into a format suitable for rapid phonetic searching and the search engine is operable to search the audio segment for phonetic information.
- the search engine may be operable to identify symbols corresponding to discrete portions of the audio segment.
- the symbols may represent respective phonemes of a set of phonemes characteristic of speech.
- Figure 1 is a diagram of a Contact Center
- Figure 2 is a block diagram of a system for processing, storing and searching speech
- Figure 3 is a block diagram of a computer integrated telephony (CTI) system incorporating audio processing according to an embodiment of the invention
- Figure 4 is a dataflow diagram of the embodiment depicted in Figure 3;
- Figure 5 is a screen shot of a workstation display depicting an application manager used to access CTI system components including systems and functionalities according to embodiments of the invention
- Figure 6 is a screen shot of a workstation display depicting a speech browser main display used to browse and filter calls, playback audio, search for and retrieve audio associated with calls, and implement speech-processing of audio;
- Figure 7 is a screen shot of a workstation display depicting a system control or commander feature used to start and stop system operations and to provide system status information;
- Figure 8 is a screen shot of a workstation display depicting a speech resources feature used to display system utilization information
- Figure 9 is a screen shot of a workstation display depicting a speech mining browser used to implement simplified searching of audio segments
- Figure 10 is a screen shot of a workstation display depicting a speech mining browser used to implement advanced searching of audio segments;
- Figure 11 is a screen shot of a workstation display depicting rules implemented by a rules engine defining action to be taken upon receipt of a call
- Figure 12 is a screen shot of a workstation display depicting speech processor functions used for the batch processing of audio files;
- Figure 13 is a screen shot of a workstation display depicting a progress indicator showing batch processing of audio files
- Figure 14 is a screen shot of a workstation display depicting a speech statistics setup feature used to configure real-time graphic display of system statistics including statistics indicating the occurrence and/or non-occurrence of particular target phrases in associated audio segments and/or associated with selected categories of calls;
- Figure 15 is a screen shot of a workstation display depicting a sample graph of system statistics including the counts of specified target phrases identified at or associated with particular agent workstations;
- Figure 16 is a screen shot of a workstation display depicting a speech reporting feature used to create selected reports
- Figure 17 is a screen shot of a workstation display depicting a sample report generated by the system including speech-related statistics
- Figure 18 is a block diagram of a contact center according to an embodiment of the invention.
- Figure 19 is a flow diagram depicting a method of collecting, processing, organizing, and searching speech segments according to an embodiment of the invention.
- A need exists for an automated call monitoring system capable of automatically analyzing all telephone calls as they are recorded, which is also capable of reviewing and monitoring previously recorded calls. It would be further advantageous to be able to easily search for spoken words, phrases or word sequences in the recorded audio using speech recognition technology.
- In a modern contact center, there is more to voice logging than just recording audio. There are many reasons why a contact center has a voice, or call, logger: liability, training, and quality are some examples. To be useful, logged conversations must be located by some reasonable criteria in a timely manner.
- a contact center manager may receive a call from a caller who may be dissatisfied with service provided by a CSR during a recent call.
- the contact center manager may ask for the caller's name, time and date of the call, and the name of the agent they spoke to.
- The task of locating the call recording in any voice logger is daunting.
- While it may be approximately known when the caller called (or at least when they think they called, given time zone differences), it may be difficult to identify the CSR handling the call.
- the manager must search for the recording, knowing that it will take hours to locate the right one, and that the correct recording may never be found.
- A voice logger is more than a simple tape recorder: with sufficient data, recordings can be quickly located and played back.
- the voice logger may be integrated into a contact center's infrastructure, preferably to the ACD/PBX switch.
- the voice logger may be integrated with the IVR and CSR workstation software.
- One arrangement to integrate a call logger is to merge data from the billing output of the switch (SMDR) into the logged call records.
- SMDR billing output of the switch
- the SMDR (The term SMDR is used generically to encompass all billing outputs) output of a switch contains the time / day of the call, the phone number of the party in the PSTN, the extension of the party on the switch, and the involved trunk ID.
- An advantage of SMDR integration is its relative ease of implementation and low cost.
- Many commercially available switches include a SMDR port by default.
- the SMDR port is usually an RS232 port that outputs billing records at the completion of calls.
- the SMDR port may already be in use by the billing system such that, to share the data, an RS232 splitter device may be employed.
- CSR ID may not be included as an output field such that, in a free seating environment, it may be difficult to directly identify and locate calls for a particular CSR. Further, recorded call segments that span conferences and transfers may be difficult to account for accurately. Another problem sometimes encountered is caused by systems using some form of proprietary fixed data format. In such cases, it may be difficult to obtain assistance from the switch manufacturer to update its SMDR format to accommodate advanced voice logging features.
- CTI Computer Telephony Integration
- ACD/PBX Computer Telephony Integration
- CTI can include the use of CTI middleware.
- Commercially available ACD/PBX switches typically include such CTI capability.
- An advantage to the use of CTI is that almost any available data can be collected and stored with the recording. In its simplest form DNIS, ANI/CLID, collected digits, and agent ID can be obtained and stored. Additionally, more complicated integrations can be performed.
- CSR entered data, data from a CRM system, and data from an IVR can be collected and attached to recordings. Contacts that span multiple agents can be retrieved together.
- PBX/ACD features such as free seating are easily accommodated. As new sources of data become available, they can be integrated into the CTI solution.
- a CTI based system is not dependent on the clock settings of the switch.
- the CTI system receives the event messages in realtime and records the data in the call logger as the data becomes available. If there is no current CTI solution in a center, many of the other benefits of CTI (such as screen pop and cradle to grave reporting) can be realized at the same time. That is, the installed system becomes a base upon which other advanced contact center features can be built and provide for more efficient operations.
- a supervisor simply asks the caller for their account number (or for any other data used to uniquely identify callers) and executes a search in the call logging system. The supervisor is quickly given access to the call recording and can evaluate and handle the situation.
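To make the lookup described above concrete, here is a minimal sketch of recordings stored with CTI data and searched by any attached field. The `Recording` fields and the `find_recordings` helper are hypothetical; an actual call logger would presumably use a database rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    """A logged call with CTI data attached (hypothetical fields)."""
    path: str
    dnis: str
    ani: str
    agent_id: str
    account: str


def find_recordings(recordings, **criteria):
    """Return recordings whose attributes equal every given criterion."""
    return [
        r for r in recordings
        if all(getattr(r, name) == value for name, value in criteria.items())
    ]
```

With the account number attached at recording time, `find_recordings(recordings, account="A-17")` would return every segment of that customer's contact, including segments handled by different agents.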
- Embodiments of the present invention include audio data monitoring using speech recognition technology and business rules combined with unrestricted, natural speech recognition to monitor conversations in a customer interaction environment, literally transforming the spoken word to a retrievable data form.
- VIP VorTecs Integration Platform
- embodiments of the present invention enhance quality monitoring by effectively evaluating conversations and initiating actionable events while observing for script adherence, compliance and/or order validation.
- SESIS, Inc. is the successor in interest to VorTecs, Inc., and provided improved systems, Sertify providing a feature rich embodiment of the Spot It! system by VorTecs, and Sertify-Mining providing enhanced features to the Mine It! product.
- Embodiments of the present invention use programming language to instruct a computer to search audio data, such as a recorded telephone conversation, and take certain actions as a result of detecting or not detecting desired spoken words, phrases, or sequences of words.
- a command set may be used to enable the search that includes, but is not limited to Said, SaidNext, SaidPrev, and Search.
- a set of objects may be used for manipulating search results, including but not limited to SpeechResults (an enumerator), and SpeechResult (physical results of search).
- the embodiments of the present invention can enable searches for sequences of spoken words, rather than just words or phrases.
- The present invention can locate a particular word (e.g., Said<supervisor>), a phrase (e.g., Said<talk to your supervisor>), or a sequence (e.g., Said<talk>; SaidNext<supervisor>; SaidNext<complaint>), where the words in the sequence are not necessarily adjacent.
- a virtual index may also be provided that points to time offsets within a voice communication. For example, when searching for a sequence of words, a speech cursor may be automatically advanced to the time offset when a word or phrase in the sequence is searched for and located. Subsequent searches for subsequent words within the sequence can then continue, leaving off from the location of the previous search as indicated by the speech cursor. Speech cursors may also be used to place a constraint on the portion of the audio data that is to be searched. For example, a speech cursor may be advanced to 15 seconds before the end of a call to monitor whether the agent says "thank you" at the end of the call.
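The speech cursor behavior described above can be sketched as follows. The method names echo the Said and SaidNext commands, but the semantics shown here (Said searches from the start, SaidNext searches strictly after the cursor) and the representation of located words as `(phrase, offset)` pairs are assumptions for illustration only.

```python
class SpeechSearch:
    """Toy sequence search over located phrases, tracking a speech cursor."""

    def __init__(self, hits, duration_s):
        # hits: iterable of (phrase, offset_s) pairs, assumed already located.
        self.hits = sorted(hits, key=lambda h: h[1])
        self.duration_s = duration_s
        self.cursor = 0.0  # time offset of the last match, in seconds

    def _find(self, phrase, start, inclusive):
        for text, offset in self.hits:
            after_start = offset >= start if inclusive else offset > start
            if text == phrase and after_start:
                self.cursor = offset  # advance the cursor to the match
                return True
        return False

    def said(self, phrase):
        """Search the whole segment from the beginning."""
        return self._find(phrase, 0.0, inclusive=True)

    def said_next(self, phrase):
        """Continue searching strictly after the current cursor position."""
        return self._find(phrase, self.cursor, inclusive=False)

    def seek_from_end(self, seconds):
        """Constrain later searches to the last `seconds` of the segment."""
        self.cursor = max(0.0, self.duration_s - seconds)
```

A failed search leaves the cursor where it was, so a rule can try an alternative phrase from the same position.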
- Embodiments of the present invention significantly decrease the amount of manual involvement that is required for monitoring agent activity. They provide a facility to actively monitor for script adherence by scoring key performance indicators, ensure compliance by identifying that required statements are made in the context of the conversation, and support order validation by lifting entered data from an order, creating a variable rule and comparing the entered data to a structured confirmation. Of equal importance is the ability to identify required words or phrases that were omitted in an interaction with a customer. Flexible rule implementation provides the ability to define, create, track, act on, and report monitored results.
- Embodiments of the present invention examine both sides of every call and, using customer-defined business rules, reduce speech to data in a fraction of the time it takes the actual conversation to occur, combining it with traditional data forms to administer monitoring sessions by scoring agents, determining compliance and identifying the most important calls for further examination. Performance statistics may be delivered to the agent desktop, which provides near real time self evaluation and motivation.
- call center managers can electronically assess agent script adherence, determine regulatory compliance, perform order validation and potentially eliminate third party verification costs.
- marketing information can be gathered by mining the audio data to test the effectiveness of campaigns, and evaluate product, price and promotion strategies.
- Embodiments of the present invention provide the following features and functions: • Automates the quality monitoring process;
- Embodiments of the present invention may be implemented using the following standards and technology: XML
- Embodiments of the present invention may integrate speech recognition software with audio recording equipment and CTI links.
- CTI or recording events signal the end of a recording
- the system executes business rules to determine if the contact should be monitored.
- the system sends the audio into a queue to be processed by call center employees.
- The system executes business rules that analyze the recorded speech.
- The business rules enable searches for words or phrases, and take actions upon locating (or not locating) the words or phrases, such as collecting statistics, displaying alerts, and generating reports.
- The business rules are flexible and customizable, and support if/then/else handling, as in Microsoft's™ VBA.
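A business rule with if/then/else handling might look like the sketch below. The text refers to VBA; Python is used here purely for illustration, and the `closing_rule` name, the `said` callable, and the `(kind, detail)` action tuples are hypothetical.

```python
def closing_rule(said):
    """Return the actions to take depending on whether a closing phrase was spoken.

    said: callable taking a phrase and returning True if it was located.
    """
    actions = []
    if said("thank you"):
        actions.append(("stat", "closing_ok"))
    else:
        # Not locating a required phrase is itself an actionable event.
        actions.append(("alert", "closing phrase missing"))
        actions.append(("stat", "closing_missing"))
    return actions
```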
- Embodiments of the present invention are particularly applicable to financial services markets, outsourcers, insurance carriers, health services, correctional facilities, and any other market segments where telephone call monitoring is applicable.
- The embodiments of the present invention may be modified to provide the following applications: compliance assurance (e.g., with a script or rules), order validation (e.g., to assure that a telephone order was properly entered into a computer system), marketing (e.g., gathering of customer data and opinions), quality control, security, evaluation, service level guarantees (e.g., to check whether an agent/operator says "thank you", "have a nice day", etc.), training, rewards and incentives, as well as other applications.
- Embodiments of the present invention may be incorporated into and invoked as part of a CTI system.
- An embodiment of the present invention for the retrieval of audio data is exemplified by a product of VorTecs, Inc. known as "Spot It!" Spot It! may be used in connection with VorTecs, Inc.'s Mine It! product, the latter incorporating features of embodiments of the invention which is the subject of the above-referenced concurrently filed application.
- SER Solutions, Inc., the successor in interest to VorTecs, Inc., provides improved systems including Sertify, a feature rich embodiment of Spot It!, and Sertify-Mining, providing enhanced features to that of the Mine It! product.
- A block diagram of Mine It! is presented in Figure 2.
- Sertify is a rules-based call monitoring application embodying aspects and features of the present invention. It is designed to be compatible with customer interaction infrastructures, listens to calls, and automatically executes actionable events based on the result. Sertify augments existing recording systems to provide a greater level of automation, enhanced operational flexibility, and a comprehensive electronic analysis of customer contacts including spoken word.
- A system configuration is shown in Figure 3 including a Server 301 connected to and receiving data from Data Sources 302, Voice Information Processor (VIP) 305, and Audio Source 307.
- PBX 304 is connected to VIP 305 which, in turn, is connected to TagIT! 306 which supplies its output to Audio Source 307.
- Server 301 includes both Core and Application Services.
- the Core Services include Configuration Manager 308, Node Manager 309 and State Manager 310.
- the Application Services include Voice Server 311, Speech Queue 312, Speech Worker 313, Rules Engine 314, Xml Database 315, and Report Server 316.
- a dataflow for processing audio data is depicted in Figure 4.
- Audio from Audio Source 401 and VIP 402 is supplied to Voice Server 403.
- the combined audio files from Voice Server 403 are made available to Rules Engine 404 which applies one or more Rules 405 to selectively provide appropriate audio segments to Xml Database 406 and Speech Queue 407.
- Xml Database 406 associates the audio segments with Call Data, CTI Data and Customer 410.
- Speech Queue 407 makes the audio segments available to Speech Worker(s) 408 which processes the audio segments to provide Searchable Audio Format 409.
- the searchable format may convert the audio into a series of symbols, such as phonemes, that represent the speech and can be searched and otherwise handled as discrete data.
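As a rough illustration of searching audio that has been reduced to discrete symbols, the sketch below looks for a phrase in a time-stamped phoneme track. The toy lexicon, the ARPAbet-like symbols, and exact matching are all assumptions; a practical phonetic search would score approximate matches rather than require equality.

```python
# Hypothetical toy pronunciations; a real lexicon would be far larger.
LEXICON = {
    "thank": ["TH", "AE", "NG", "K"],
    "you": ["Y", "UW"],
}


def phrase_to_phonemes(phrase):
    """Expand a phrase into its phoneme symbols using the toy lexicon."""
    return [p for word in phrase.lower().split() for p in LEXICON[word]]


def find_phrase(track, phrase):
    """Return the time offsets at which the phrase begins.

    track: list of (phoneme, offset_s) pairs representing the audio segment.
    """
    target = phrase_to_phonemes(phrase)
    symbols = [sym for sym, _ in track]
    n = len(target)
    return [
        track[i][1]
        for i in range(len(symbols) - n + 1)
        if symbols[i:i + n] == target
    ]
```

Because the track is ordinary data, it can be stored, indexed, and searched repeatedly without reprocessing the audio.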
- Figures 5 - 17 depict screen shots of a speech processing interface according to an embodiment of the present invention.
- an initial screen of an application manager provides a single, integrated interface for accessing all components of a suite of programs including those providing for the capture of audio and data and mining of the captured data.
- Figure 6 depicts a speech browser providing an interface for (i) browsing calls, (ii) filtering calls, (iii) audio playback and cueing to the exact moments when phrases are detected, (iv) speech mining, and (v) speech-processor (batch processing). When an item is selected in any one viewport, all others may be configured to automatically filter their results to match the selection.
- A speech resources component depicted in Figure 8 displays current system utilization. It may be used to observe the rate of requests and how well the system is keeping up with them, together with other system information.
- the speech mining interface depicted in Figure 9 can be invoked from the Speech Browser toolbar.
- the speech mining interface includes a Simple ( Figure 9) and Advanced ( Figure 10) dialog for selecting the records of phrases that are to be located.
- A speech-query and a database-query can be performed together and the unified result presented to a user in the main Alerts, Call History, and Speech viewports. The audio can then be navigated in the same way that regular historical data can be navigated.
- Figure 10 depicts the advanced tab of the speech mining interface, allowing users to build more complex queries against their data.
- The advanced tab allows users to create SQL and speech-queries that are integrated into a single query.
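One way to picture an integrated SQL and speech query is sketched below. The table layout, the `combined_query` function, and the `speech_hits` lookup (which stands in for a real phrase search) are all assumptions for illustration; the source does not describe the actual query syntax.

```python
import sqlite3


def combined_query(conn, sql, params, phrase, speech_hits):
    """Return (call_id, offset_seconds) pairs that satisfy both the SQL filter
    and the phrase search.

    speech_hits maps call_id -> {phrase: offset_seconds} and stands in for a
    real speech search over processed audio.
    """
    results = []
    for (call_id,) in conn.execute(sql, params):
        offset = speech_hits.get(call_id, {}).get(phrase)
        if offset is not None:
            results.append((call_id, offset))
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_id TEXT, agent_id TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [("c1", "a7", 310), ("c2", "a7", 45), ("c3", "a9", 600)],
)
hits = {"c1": {"cancel my account": 12.5}, "c3": {"cancel my account": 88.0}}
matches = combined_query(
    conn,
    "SELECT call_id FROM calls WHERE agent_id = ? ORDER BY call_id",
    ("a7",),
    "cancel my account",
    hits,
)
```

Here the database constraint narrows the calls to one agent, and the speech constraint keeps only the call in which the phrase was located, together with the offset at which playback could start.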
- Definition of rules is supported by the interface depicted in Figure 11.
- The rules that the rules engine maintains determine what actions are to be taken when a call is presented to the system.
- two important functions have been implemented: StartCall() and Speech().
- the StartCall() rule determines if a call should be monitored by the system.
- The Speech() rule determines what actions to take when a piece of audio has been processed by the system and is ready to be searched. In this case, the rule displays a warning each time the user mentions the phrase "application", "manager", "engineer", or "tabby cat".
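The two rule functions might look something like the sketch below. The source names only StartCall() and Speech(); the Python signatures, the inbound-only monitoring criterion, the dictionary layout of a call, and the `toy_search` helper are hypothetical.

```python
WATCH_PHRASES = ["application", "manager", "engineer", "tabby cat"]


def start_call(call):
    """Decide whether a call should be monitored.

    The inbound-only criterion is an assumption chosen for the example.
    """
    return call.get("direction") == "inbound"


def speech(call, search):
    """Run once a call's audio is searchable; return one warning per phrase found.

    `search(call, phrase)` is an assumed callable that returns True when the
    phrase is detected in the call.
    """
    return [
        f"Warning: '{phrase}' mentioned on call {call['id']}"
        for phrase in WATCH_PHRASES
        if search(call, phrase)
    ]


def toy_search(call, phrase):
    """Stand-in for a speech search: substring match on a text transcript."""
    return phrase in call["transcript"]


call = {
    "id": "c42",
    "direction": "inbound",
    "transcript": "my tabby cat chewed the application form",
}
warnings = speech(call, toy_search) if start_call(call) else []
```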
- a dialog displayed upon start of the speech processor is depicted in Figure 12.
- The speech processor is a feature of the speech browser that is used for monitoring calls that have not yet been processed by the system. Normally, calls are automatically processed by the system as they take place. This feature allows users to process calls that were purposely not processed automatically or to process old calls that existed prior to system availability.
- the speech processor will process the set of calls that are currently being displayed in the speech browser.
- A typical use of the system is to first use the speech mining feature to constrain the calls to the ones that have been selected for processing, and then invoke the speech processor for the calls that have been selected.
- Speech processor progress may be displayed by an appropriate progress indicator as depicted in Figure 13, showing calls as processed by the system. Once processed, the calls can be searched at high-speed. Processing may include conversion of the audio into a series of symbols representing the speech, e.g., phonetic information.
- Figure 14 depicts a speech statistics setup display.
- the speech statistics component is used for displaying real-time graphics of statistics that are maintained by the business-rules of the system. For instance, a statistic can be created to count the number of times that a specific phrase is heard, is missing, or to calculate statistics based on any other measures.
- A graph such as that depicted in Figure 15 may be displayed and updated in real time. A user can watch as the graph dynamically changes over time to observe trends, not only with speech-related statistics, but with statistics that can be calculated from speech, CTI, system, and user data.
- Reports may be defined using, for example, the speech reports setup screen depicted in Figure 16.
- The speech reports component is used to report on statistics that are maintained by the business rules of the system. For instance, a statistic can be created to count the number of times that a specific phrase is heard or found to be missing, or to calculate statistics based on any other measure. An example of a resulting report is shown in Figure 17. Once the speech reports are set up, such a report will be displayed. A user can examine the report to observe performance trends, not only with speech-related statistics, but with statistics that can be calculated from speech, CTI, system, and user data.
- A speech mining interface is invoked from a speech browser tool bar within an application such as Sertify.
- the interface offers a simple and advanced dialog box for implementing search criteria.
- the tool allows for analysis of words, phrases and the ability to combine audio searches with other available data collections (such as CTI data or call-related data).
- the interface accesses a database query tool that includes speech as data, as well as traditional data forms.
- the unified content is presented as an inventory of audio files that are indexed and point to the exact location in the dialogue where the target utterance resides.
- Embodiments of the present invention provide the following features and functions: • Treats voice as data;
- a contact center 1800 includes:
- Audio data monitoring (this component may be incorporated into various ones of the platforms depicted, as appropriate) - A system that uses speech processing and automated rules to analyze calls for quality monitoring purposes and order validation.
- Public Switched Network 1801 - This is the public switched telephone network that provides a high quality voice connection between a customer and a call center.
- Workforce scheduling 1802 - This is a system that uses historical call data to create a staffing forecast in order to meet a specified service level for how long it will take before a call is answered.
- ACD 1803 - Automatic Call Distributor is a voice switching platform that connects to PSTN 1801 and to local extensions. Call center agents log in to ACD 1803, which associates a set of skills with each agent. When calls come in for a given skill, normally determined by the dialed number, ACD 1803 will distribute the calls to the set of agents that have the appropriate skill, normally in a round robin fashion.
- Agent reports contain agent specific information such as time on the system, calls handled, avg talk time, longest talk time, etc.
- Dialer 1805 - A system for predictive dialing. In predictive dialing, calls are launched on behalf of a group of agents. Because not all calls may result in a live connect, the number of calls dialed is normally higher than the number of available agents. This system enhances productivity because the system only connects live answers and agents do not have to dial calls or listen to call progress such as ringing or busy signals.
- IP 1806 - This is an IP gateway so that VOIP calls can be handled by ACD 1803 in the same fashion as calls that arrive over PSTN 1801.
- IVR 1807 - Interactive Voice Response (aka VRU or voice response unit) - a system that allows automated call handling.
- the system can accept touch tone input, access data, and using text to speech, speak the data to the caller.
- a common example is a bank application where you can call and get your balance.
- SR 1808 - Speech Recognition is an add-on to IVR 1807 that allows IVR 1807 to accept voice input in addition to touch tone input.
- CTI 1809 - A computer telephony interface middleware server that interfaces to the proprietary CTI interface of ACD 1803 and allows CTI clients to receive events and exert control over contacts.
- Router 1810 - An add-on application to the CTI middleware for intelligent call routing. When a call arrives, CTI data from the call is used to access information and route the call appropriately, for example putting a high value customer at the head of the queue.
- Call Recording 1811 - A system that makes digital recordings of calls within the contact center.
- Agent Groups 1812 - The human employees of the contact center that handle voice calls.
- Agent Desktop 1813 - A computer interface that runs programs which support the agent interactions with callers.
- Legacy Apps and Data 1814 - Computer systems that contain data about the callers and the business. Used for routing decisions and to provide information to the callers.
- Email 1815 - A server for processing email messages. Properly skilled agents can handle email interactions as well as voice interactions.
- WWW 1816 - A web server that can host self service applications. Self service web applications can be used to offload work from contact center agents by providing information.
- Audio Processor 1817 - An audio server according to an embodiment of the invention, providing for the processing of audio from Call Recording 1811, generation of searchable audio segments, and supporting data mining.
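As an illustration of the skill-based, round robin distribution described for ACD 1803 in the list above, the following sketch maps dialed numbers to skills and rotates through the agents holding each skill. The `ToyACD` class and its data layout are assumptions for the example, and agent availability is deliberately ignored.

```python
from itertools import cycle


class ToyACD:
    """Round robin call distribution by skill (agent availability is ignored)."""

    def __init__(self, agent_skills, number_to_skill):
        # agent_skills: agent name -> list of skills; number_to_skill: dialed number -> skill
        self.number_to_skill = number_to_skill
        by_skill = {}
        for agent, skills in agent_skills.items():
            for skill in skills:
                by_skill.setdefault(skill, []).append(agent)
        # One endless rotation per skill.
        self._rotations = {skill: cycle(agents) for skill, agents in by_skill.items()}

    def route(self, dialed_number):
        """Return the next agent in rotation for the skill tied to the dialed number."""
        skill = self.number_to_skill[dialed_number]
        return next(self._rotations[skill])


acd = ToyACD(
    {"ann": ["sales"], "bob": ["sales", "support"], "cy": ["support"]},
    {"800-555-0100": "sales", "800-555-0199": "support"},
)
sales_order = [acd.route("800-555-0100") for _ in range(3)]
```

A production distributor would also track which agents are logged in and idle; the sketch shows only the rotation itself.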
- a method for capturing and searching audio associated with respective calls is depicted in the flow chart of Figure 19.
- a telephone conversation occurs at step 1901.
- This conversation may be carried over the public switched telephone network, or it may be over a data network using Voice over IP technology, or it may be a hybrid where some of the voice transmission is over the PSTN and some uses VOIP.
- audio is captured from the conversation of step 1901 and a digital representation is made and stored within a computer system. If the recording is done through a digital PBX or a VOIP switch, then the capture may be accomplished through a direct data stream. Another option is an analog tap of a phone, in which case the voice is digitized as part of the process of making the recording. It is common for devices which record audio to compress the digital representation to conserve computer storage.
- Step 1903 includes functionality provided by a CTI middleware product that can connect to a digital PBX or ACD and receive information associated with a call from the digital PBX or ACD. Although not a required component, it provides additional functionality. Examples of information that can be associated with a call are the caller's number (CLID/ANI), the number dialed (DNIS), the local extension that received the call, and, in the case of an ACD, the agent id of the person that handled the call.
- CLID/ANI the caller's number
- DNIS number dialed
- ACD agent id of the person that handled the call.
- Speech processing 1905 is alerted when a reference to an audio segment is added to the queue; it then invokes the speech engine to pre-process the audio into an intermediate format.
- the intermediate format is a representation of the audio that is optimized for rapid searching. Some representations that are suitable for rapid searches are a statistical model of the phonemes or a text representation of the contents of the audio.
- Data entry occurs at 1909.
- agents often enter data about a call into a computer system during the call.
- An example could be the length of a subscription. This is also not a required element.
- this data is also associated with an audio file and can be used to create dynamic rules at 1906.
- A process for offline rules creation is provided at 1910.
- Such rules can be static or dynamic.
- Static rules are fully defined at rule creation time and do not involve any data elements that are only known at run time.
- An example of a static rule would be "generate an alert if at any time on the call there is at least a 70% confidence that the audio contains 'Take your business elsewhere'".
- Dynamic rules contain some template information, and the rule can only be fully formed when the audio and its associated data are known.
- An example of a dynamic rule would be "Generate an alert if the audio does not contain 'Thank you for calling my name is ⁇ agentid ⁇ how may I help you'", where the name of the agent that is handling the call is substituted for ⁇ agentid ⁇.
- A set of individual rules is then gathered into a rule set, and further logic is defined for a rule set to control when that set is applied. This logic can use any information that is known about an audio segment.
- Rules may contain some phrase that is to be used to search the audio, and this phrase is entered by typing into an interface. It should be noted that other methods of entering phrases, such as speaking them into the system, may be employed in the future.
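The distinction between static rules, dynamic rules, and rule sets can be sketched as below. The `Rule` class, the `{agentid}` placeholder syntax, the 0.70 threshold on the dynamic rule, and the rule-set predicates are assumptions chosen for illustration; the source does not specify how templates are written or stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    phrase: str                # may contain {field} placeholders if the rule is dynamic
    min_confidence: float
    alert_when_present: bool   # False means "alert when the phrase is missing"

    def bind(self, call_data):
        """Return a fully formed rule by substituting run-time call data."""
        return Rule(self.phrase.format(**call_data),
                    self.min_confidence, self.alert_when_present)


# Static rule: fully defined at creation time.
STATIC_RULE = Rule("take your business elsewhere", 0.70, True)
# Dynamic rule: the agent's name is only known once the call data is available.
DYNAMIC_RULE = Rule(
    "thank you for calling my name is {agentid} how may I help you", 0.70, False
)

# Each rule set pairs an applicability test with its rules (hypothetical logic).
RULE_SETS = [
    (lambda data: True, [STATIC_RULE]),
    (lambda data: data.get("direction") == "inbound", [DYNAMIC_RULE]),
]


def rules_for(call_data):
    """Collect and bind every rule from the rule sets that apply to this call."""
    return [
        rule.bind(call_data)
        for applies, rules in RULE_SETS
        if applies(call_data)
        for rule in rules
    ]


bound = rules_for({"direction": "inbound", "agentid": "Dana"})
```

Binding a static rule leaves it unchanged, so static and dynamic rules can pass through the same code path.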
- The logic processing according to 1906 is executed when an intermediate file is created. Rules determination considers the information known about the audio and determines which rule sets to apply to the audio. More than one rule set may be applied to a single instance of audio. If any of the applicable rule sets contain dynamic rules, then, at 1906, the data substitutions are made to create a rule applicable to the audio segment. There is a loop between steps 1906, 1907 and 1908. Since rules execution contains branching logic, the rules are executed in step 1906, but as part of that execution searches may be performed (step 1907) and corresponding actions may be initiated (step 1908). A speech queue is used to allow search requests (step 1907) to be performed by any available speech worker. At step 1907 any searches required to support the rules execution are performed.
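The loop between steps 1906, 1907 and 1908 might be sketched as follows. The tuple layout of a rule and the `search` and `act` callables are assumptions for the example, and a single in-process loop stands in for the pool of speech workers.

```python
from collections import deque


def execute_rules(rules, search, act):
    """Run each rule's search and trigger its action when the rule fires.

    rules:  list of (phrase, min_confidence, alert_when_present) tuples
    search: assumed callable, phrase -> confidence in [0, 1]      (step 1907)
    act:    assumed callable, (phrase, confidence) -> None        (step 1908)
    """
    speech_queue = deque(rules)              # pending search requests
    while speech_queue:
        phrase, min_conf, alert_when_present = speech_queue.popleft()
        confidence = search(phrase)
        found = confidence >= min_conf
        # Branching logic of step 1906: fire on presence or on absence.
        if found == alert_when_present:
            act(phrase, confidence)


alerts = []
scores = {"take your business elsewhere": 0.82, "thank you for calling": 0.30}
execute_rules(
    [
        ("take your business elsewhere", 0.70, True),   # alert if heard
        ("thank you for calling", 0.70, False),         # alert if missing
        ("cancel", 0.70, True),                         # not heard, no alert
    ],
    lambda phrase: scores.get(phrase, 0.0),
    lambda phrase, confidence: alerts.append(phrase),
)
```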
- Searches are performed against the intermediate file created at step 1905. If the intermediate format is a statistical model of the phonemes, then the search string must be represented as a set of probable phonemic representations of each word in the search string. If the search string was entered as text, a mapping of the text to a plurality of possible phoneme strings is performed in this step. (Note that a single text phrase may map to more than one symbolic representation.) If the intermediate file is text, then no format conversion is required. Once the intermediate file and search string are in a common format, a pattern match is performed, and a confidence is returned that the search pattern exists within the processed audio. When a search is performed for a specific phrase by a speech process, a list of result hypotheses is returned from the speech recognition engine.
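A toy version of the text-to-phoneme mapping and pattern match is sketched below. The `PRONUNCIATIONS` dictionary and function names are hypothetical, and the similarity ratio from Python's `difflib.SequenceMatcher` is only a stand-in for the statistical confidence a real speech engine would compute.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical pronunciation dictionary: each word maps to one or more phoneme strings.
PRONUNCIATIONS = {
    "either": [["IY", "DH", "ER"], ["AY", "DH", "ER"]],
    "way": [["W", "EY"]],
}


def phrase_variants(text):
    """Expand a text phrase into every combination of its words' pronunciations."""
    per_word = [PRONUNCIATIONS[word] for word in text.lower().split()]
    return [
        [phoneme for word in combo for phoneme in word]
        for combo in product(*per_word)
    ]


def search(text, audio_phonemes):
    """Return (confidence, offset) for the best-matching window of the audio.

    Confidence is a similarity ratio in [0, 1]; offset is the index of the first
    phoneme of the window, or -1 if the audio is shorter than the phrase.
    """
    best = (0.0, -1)
    for variant in phrase_variants(text):
        n = len(variant)
        for start in range(len(audio_phonemes) - n + 1):
            window = audio_phonemes[start:start + n]
            score = SequenceMatcher(None, variant, window, autojunk=False).ratio()
            if score > best[0]:
                best = (score, start)
    return best


audio = ["S", "OW", "AY", "DH", "ER", "W", "EY", "N", "OW"]
confidence, offset = search("either way", audio)
```

Because "either" has two pronunciations, the phrase expands to two phoneme strings, and the search reports whichever variant matches the audio best.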
- Each result in the list is given an associated "confidence score" that indicates the probability that the result is, in fact, a correct result.
- The distribution of confidence scores is typically not uniform across all search phrases, and therefore a "confidence threshold" value is determined for each search phrase that indicates the lowest confidence score a search result may have in order to be considered by the system to be a correct result.
- The process of threshold determination is performed by first determining a set of calls that represent a test or training set. A specific phrase is selected, a search is performed, and the resulting list of result hypotheses is returned. A human listener then listens to the list of result hypotheses and determines at what point in the result distribution the confidence scores fail to be accurate. As the listener inspects search results, the listener is cued to the exact point in each call at which the candidate result was located, so that only a small portion of each call needs to be heard in order to determine the appropriate threshold.
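Once a listener has marked each hypothesis as correct or incorrect, a per-phrase threshold could be derived along these lines. The `choose_threshold` function and its precision target are assumptions: the source leaves the judgment of where scores "fail to be accurate" to the human listener, and this sketch simply makes one such criterion explicit.

```python
def choose_threshold(labeled_results, min_precision=0.9):
    """Pick a confidence threshold from human-labeled search results.

    labeled_results: (confidence, is_correct) pairs for one search phrase.
    Returns the lowest confidence such that the results at or above it still
    meet min_precision, or None if no cutoff qualifies. Tied confidence values
    are not treated specially.
    """
    ranked = sorted(labeled_results, key=lambda r: r[0], reverse=True)
    threshold, correct = None, 0
    for seen, (confidence, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        if correct / seen >= min_precision:
            threshold = confidence
    return threshold


labeled = [
    (0.95, True), (0.90, True), (0.80, True),
    (0.70, False), (0.60, True), (0.50, False),
]
strict = choose_threshold(labeled, min_precision=0.9)
lenient = choose_threshold(labeled, min_precision=0.75)
```

A stricter precision target yields a higher threshold (fewer false alerts, more missed phrases), which is the trade-off the listener is effectively deciding for each phrase.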
- alerts and statistics may be stored in a relational database.
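As a small illustration of keeping alerts and statistics in a relational database, the sketch below uses SQLite from the Python standard library. The table names, columns, and helper functions are hypothetical; the source does not describe a schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (if needed) the hypothetical alerts and statistics tables."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts "
        "(call_id TEXT, phrase TEXT, confidence REAL, offset_seconds REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statistics "
        "(name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    return conn


def record_alert(conn, call_id, phrase, confidence, offset_seconds):
    """Store one alert and increment the per-phrase counter."""
    conn.execute(
        "INSERT INTO alerts VALUES (?, ?, ?, ?)",
        (call_id, phrase, confidence, offset_seconds),
    )
    updated = conn.execute(
        "UPDATE statistics SET value = value + 1 WHERE name = ?", (phrase,)
    )
    if updated.rowcount == 0:          # first alert for this phrase
        conn.execute("INSERT INTO statistics VALUES (?, 1)", (phrase,))
    conn.commit()
```

Storing the offset alongside each alert lets a reviewer jump straight to the point in the recording where the phrase was located.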
- the present invention provides advantageous methods and apparatus for audio data analysis and data mining using speech recognition.
- In this disclosure there are shown and described only the preferred embodiments of the invention and but a few examples of its versatility. It is to be understood that the invention is capable of use in various other combinations and environments and is capable of changes or modifications within the scope of the inventive concept as expressed herein.
- Although embodiments of the invention have been described in connection with contact centers, CTI and other telephony-based applications, embodiments of the invention are equally applicable to other environments wherein speech, audio, and other real-time information may be collected, stored and processed for rapid searching.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2003282940A AU2003282940C1 (en) | 2002-10-18 | 2003-10-20 | Methods and apparatus for audio data monitoring and evaluation using speech recognition |
CA2502533A CA2502533C (en) | 2002-10-18 | 2003-10-20 | Methods and apparatus for audio data monitoring and evaluation using speech recognition |
EP03774874A EP1565907A4 (en) | 2002-10-18 | 2003-10-20 | Methods and apparatus for audio data monitoring and evaluation using speech recognition |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US41973702P | 2002-10-18 | 2002-10-18 | |
US60/419,737 | 2002-10-18 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2004036543A2 (en) | 2004-04-29 |
WO2004036543A3 (en) | 2004-07-22 |
Family
ID=32108132
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2003/033040 WO2004036543A2 (en) | 2002-10-18 | 2003-10-20 | Methods and apparatus for audio data monitoring and evaluation using speech recognition |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1565907A4 (en) |
AU (1) | AU2003282940C1 (en) |
CA (1) | CA2502533C (en) |
WO (1) | WO2004036543A2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8751240B2 (en) | 2005-05-13 | 2014-06-10 | At&T Intellectual Property Ii, L.P. | Apparatus and method for forming search engine queries based on spoken utterances |
US7752043B2 (en) | 2006-09-29 | 2010-07-06 | Verint Americas Inc. | Multi-pass speech analytics |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5535256A (en) | 1993-09-22 | 1996-07-09 | Teknekron Infoswitch Corporation | Method and system for automatically monitoring the performance quality of call center service representatives |
US6263049B1 (en) | 1996-10-10 | 2001-07-17 | Envision Telephony, Inc. | Non-random call center supervisory method and apparatus |
US20010040942A1 (en) | 1999-06-08 | 2001-11-15 | Dictaphone Corporation | System and method for recording and storing telephone call information |
US6408064B1 (en) | 1998-02-20 | 2002-06-18 | Genesys Telecommunications Laboratories, Inc. | Method and apparatus for enabling full interactive monitoring of calls to and from a call-in center |
US6542602B1 (en) | 2000-02-14 | 2003-04-01 | Nice Systems Ltd. | Telephone call monitoring system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5901243A (en) * | 1996-09-30 | 1999-05-04 | Hewlett-Packard Company | Dynamic exposure control in single-scan digital input devices |
US6185527B1 (en) * | 1999-01-19 | 2001-02-06 | International Business Machines Corporation | System and method for automatic audio content analysis for word spotting, indexing, classification and retrieval |
-
2003
- 2003-10-20 CA CA2502533A patent/CA2502533C/en not_active Expired - Fee Related
- 2003-10-20 WO PCT/US2003/033040 patent/WO2004036543A2/en not_active Application Discontinuation
- 2003-10-20 AU AU2003282940A patent/AU2003282940C1/en not_active Ceased
- 2003-10-20 EP EP03774874A patent/EP1565907A4/en not_active Withdrawn
Non-Patent Citations (3)
Title |
---|
CLEMENS ET AL.: "Broadcast Engineering Conference Proceedings, Las Vegas", 2001, FAST-TALK COMMUNICATIONS, INC., article "Phonetic Searching of Digital Audio" |
See also references of EP1565907A4 |
STEVEN WARTIK ET AL.: "Information Retrieval Data Structures & Algorithms", 1992, PRENTICE-HALL, article "Boolean Operations", pages: 264 - 268 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008109501A1 (en) | 2007-03-05 | 2008-09-12 | Calabrio, Inc. | Monitoring quality of customer service in customer/agent calls over a voip network |
EP2119108A4 (en) * | 2007-03-05 | 2015-05-27 | Calabrio Inc | Monitoring quality of customer service in customer/agent calls over a voip network |
US20100161604A1 (en) * | 2008-12-23 | 2010-06-24 | Nice Systems Ltd | Apparatus and method for multimedia content based manipulation |
Also Published As
Publication number | Publication date |
---|---|
WO2004036543A3 (en) | 2004-07-22 |
EP1565907A4 (en) | 2006-01-18 |
AU2003282940C1 (en) | 2009-07-16 |
EP1565907A2 (en) | 2005-08-24 |
CA2502533C (en) | 2012-12-11 |
AU2003282940A1 (en) | 2004-05-04 |
AU2003282940B2 (en) | 2009-03-05 |
CA2502533A1 (en) | 2004-04-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7076427B2 (en) | Methods and apparatus for audio data monitoring and evaluation using speech recognition | |
US7133828B2 (en) | Methods and apparatus for audio data analysis and data mining using speech recognition | |
US8055503B2 (en) | Methods and apparatus for audio data analysis and data mining using speech recognition | |
US9992336B2 (en) | System for analyzing interactions and reporting analytic results to human operated and system interfaces in real time | |
US9565310B2 (en) | System and method for message-based call communication | |
US9680998B2 (en) | Call center services system and method | |
US9407764B2 (en) | Systems and methods for presenting end to end calls and associated information | |
US20100158237A1 (en) | Method and Apparatus for Monitoring Contact Center Performance | |
US20230273930A1 (en) | Data Processing System for Automatic Presetting of Controls in an Evaluation Operator Interface | |
US9210264B2 (en) | System and method for live voice and voicemail detection | |
AU2003282940C1 (en) | Methods and apparatus for audio data monitoring and evaluation using speech recognition | |
AU2003301373B9 (en) | Methods and apparatus for audio data analysis and data mining using speech recognition | |
KR20230156599A (en) | A system that records and manages calls in the contact center |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| WWE | Wipo information: entry into national phase | Ref document number: 2003282940 Country of ref document: AU |
| WWE | Wipo information: entry into national phase | Ref document number: 1-2005-500696 Country of ref document: PH |
| WWE | Wipo information: entry into national phase | Ref document number: 2502533 Country of ref document: CA Ref document number: 282/MUMNP/2005 Country of ref document: IN |
| WWE | Wipo information: entry into national phase | Ref document number: 2003774874 Country of ref document: EP |
| WWP | Wipo information: published in national office | Ref document number: 2003774874 Country of ref document: EP |
| NENP | Non-entry into the national phase | Ref country code: JP |
| WWW | Wipo information: withdrawn in national office | Ref document number: JP |