WO2009084000A1 - Method and system for searching preferred multimedia content - Google Patents

Info

Publication number: WO2009084000A1
Authority: WIPO (PCT)
Prior art keywords: text, query, search, search results, search query
Application number: PCT/IN2007/000628
Other languages: French (fr)
Inventors: Cattamanchi Indivar Reddy; Sanjay Pandey; Shriganesh Krishna Khandagar
Original assignee: Onmobile Global Limited
Application filed by Onmobile Global Limited
Priority to PCT/IN2007/000628
Publication of WO2009084000A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/42017 Customized ring-back tones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/432 Query formulation
    • G06F16/433 Query formulation using audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A method and system for searching multimedia content stored in an index store are disclosed. The disclosed method and system enable a caller to access a particular multimedia object in the multimedia content through a search query. The search query is an utterance of a few words related to the metadata of the multimedia object. The system converts the voice search query into a text search query. Further, the system converts the text search query into a phonetic equivalent query and an n-tuple equivalent query. The system then searches for the multimedia object in the index store based on the phonetic equivalent query, the n-tuple equivalent query and the text search query.

Description

METHOD AND SYSTEM FOR SEARCHING PREFERRED MULTIMEDIA CONTENT
FIELD OF THE INVENTION
The present invention relates to a system and a method for searching content in a database. More specifically, the invention relates to a method of searching for a desired multimedia object in an index store based on a search query.
DEFINITIONS
Caller: The mobile phone user who initiates a call is called a 'caller'.
Callee: The mobile phone user to whom the call is made is called a 'callee'.
BACKGROUND OF THE INVENTION
An individual uses telecommunication services, like a mobile telephony service, to communicate with other individuals. Communication through a telecommunication service is established using telecommunication terminals like a fixed landline telephone, a mobile telephone or some other wireless communication device. An individual can make or receive calls on her wired/wireless device after registering with a communication service provider. Any individual registered with a telecommunication service provider is hereinafter referred to as a subscriber.
Typically, several service providers operate simultaneously within a geographical area. This has led to stiff competition among the service providers for increasing their subscriber base. Service providers are therefore providing more and more Value Added Services (VAS) to attract customers. VAS is provided either directly by the telecommunication service provider or through a third-party provider in collaboration with the telecommunication service provider. The VAS includes, for example, MMS, caller ring-back tone (RBT), music on demand and the like. The subscriber may need to register separately with the VAS provider, which may be the telecommunication service provider or a third-party provider, for availing the VAS. A subscriber registered for VAS is hereinafter referred to as a VAS subscriber.
Typically, there are circumstances when a VAS subscriber may want to access multimedia content stored in a multimedia content database set up by the VAS provider. For example, the VAS subscriber may want to access the songs provided by the VAS provider to choose one of the songs as her RBT. Another example could be accessing the songs provided by the VAS provider for listening to a favorite song. A multimedia content database generally contains a plurality of multimedia objects, like songs, movie clips etc.
VAS providers provide the VAS subscriber with a choice of selecting a song from a plurality of songs. The existing systems provide the choice of songs by playing a list of songs to the subscriber and receiving a selection of a particular choice from the subscriber. However, this method of providing the selection of songs suffers from the following drawbacks: Firstly, only a limited number of songs can be played to the subscriber. This method is extremely ineffective when a selection from a large number of songs needs to be provided to the subscribers. Secondly, the subscriber may have to wait for a very long time before the song of the subscriber's interest is played.
Certain existing systems overcome the above-mentioned limitations by providing directory browsing to the subscribers for selection of a song. In directory browsing, the subscriber is provided with a list of categories of songs, for example the genres of the songs, and a selection of a category is then received from the subscriber. On selection of a category, a choice of sub-categories may be presented to the subscriber. Similarly, the subscriber may browse through several levels of categories before arriving at the desired song. This method of receiving a subscriber's selection of a song may be very time consuming and laborious for the subscribers.
To overcome the above drawbacks and enable the subscribers to search and access the desired multimedia content from the VAS provider's database, the VAS providers typically use a voice recognition search system.
Through a voice recognition system, the subscriber searches for the desired multimedia object by uttering the name of the multimedia object or other metadata associated with the multimedia object. For example, the VAS subscriber may search for the desired multimedia object stored in the database through voice by interacting with a voice portal. The voice portal receives further information related to the requested multimedia object by asking the VAS subscriber some questions. The voice portal may, for example, ask the subscriber to utter some information related to the multimedia content, such as a few words of a song, if the multimedia content is a song. The subscriber then utters a few words related to the song to the voice portal. The voice portal then searches for the song based on the words uttered by the subscriber and delivers the search results. The search results typically depend on how efficiently the words uttered by the subscriber are recognized by the voice recognition system.
One such method of providing information to subscribers of a mobile communication network has been disclosed in WIPO application number WO/2002/103559, titled "Method and system for providing information services to subscribed users of a cellular network using an interface device", assigned to Cellesense Technologies Ltd. The patent application discloses a method for providing information to a subscriber in response to a request from the subscriber. The subscriber sends the request in a voice format. The system converts the request into a text format. Further, the system searches for the required information in a database based on the converted text. The retrieved information is delivered to the subscriber.
However, there are certain limitations associated with voice recognition systems. Different subscribers have different accents, which lead to different inputs into the voice recognition system for the same word. Further, the words uttered by the subscriber carry noise from the environment, which makes voice recognition difficult. If the subscriber's utterances are incomplete or the environment is noisy, the voice recognition system may not be able to analyze the utterance correctly. Therefore, there is a need for a system that is tolerant to subscribers' accents, pronunciations, and ambient noise.
The existing methods of searching multimedia content have the above-mentioned drawbacks. A method and system are required to overcome these limitations. There is a need for a search method which can support search requests from a communication device through multiple protocols and deliver the requested information. There is a need for a system that receives a request from the subscriber in voice format and delivers the requested multimedia content more accurately.
SUMMARY OF THE INVENTION
The aforementioned needs are addressed by the present invention. Accordingly, a method and a system for searching multimedia content stored in an index store are disclosed. Further, the disclosed method and system enable a caller to access a particular multimedia object in the multimedia content. According to an embodiment of the invention, the caller sends a search query for searching a particular multimedia object in the index store. The search query may be sent by making a phone call to a specified number through her communication terminal. The search query is in the form of a voice search query, i.e. an utterance of a few words related to the metadata of the multimedia object. The system converts the voice search query into a text search query. Further, the system converts the text search query into a phonetic equivalent query and an n-tuple equivalent query. The system then searches for the multimedia object in the index store based on the phonetic equivalent query and generates a first set of search results. The system also searches for the multimedia object in the index store based on the n-tuple equivalent query and generates a second set of search results. The system then searches for the multimedia object in the index store based on the text search query and generates a third set of search results. The different sets of search results are produced on the basis of the confidence factor of matching as well as other parameters. The system then generates a final set of search results based on the three sets of search results. The final set of search results is calculated by giving different weights to the different sets of search results. The final search results are a plurality of text identifiers of the multimedia objects. The multimedia objects associated with the text identifiers in the final set of search results are provided to the caller.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic illustrating the environment of the invention;
FIG. 2 is a schematic illustrating a system for providing access to a content database to a caller in accordance with an embodiment of the invention;
FIG. 3 is a schematic illustrating a system for searching a content database in accordance with an embodiment of the invention;
FIG. 4 is a schematic illustrating an organization of a content database in accordance with an embodiment of the invention;
FIG. 5 is a flow diagram illustrating a method for providing a multimedia content to a caller in accordance with an embodiment of the invention; and
FIG. 6 is a flow diagram illustrating a method for delivering a preferred multimedia content to a caller in accordance with an embodiment of the invention.
DETAILED DESCRIPTION
In the following description, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of the invention. However, it will be apparent that the invention may be practiced without these specific details. Various aspects and features of example embodiments of the invention are described in more detail hereinafter.
FIG. 1 is a schematic depicting an environment for the disclosed invention, in accordance with an embodiment of the invention. A telecommunication service provider enables a caller 102 to connect to a telecommunication infrastructure 104 for making a call. Caller 102 may use a wired/wireless device like a fixed landline telephone, a mobile telephone, etc. to connect to telecommunication infrastructure 104. Telecommunication infrastructure 104 comprises switching centers, for example a Mobile Switching Center (MSC) 106, for enabling a call connection between caller 102 and the destination.
Telecommunication infrastructure 104 further comprises an information storage module used to store subscription information related to subscribers of the telecommunication service. An example of the information storage module is a central database 108. For each subscriber, the subscription information comprises a unique identifier for the subscriber, the telephone number of the subscriber, the current location of the subscriber, the various services the subscriber has registered for, etc. Telecommunication infrastructure 104 further comprises a VAS system 112 for providing VAS like RBT services, voice mail box service, access to multimedia content and the like. According to an embodiment of the invention, caller 102 makes a call to VAS system 112 by dialing a specified number. VAS system 112 is used for providing caller 102 access to multimedia content stored in a database. VAS system 112 has been described in detail in conjunction with FIG. 2.
FIG. 2 is a block diagram representing VAS system 112, in accordance with an embodiment of the invention. VAS system 112 stores the multimedia content and enables caller 102 to search and access any desired multimedia object in the multimedia content. VAS system 112 comprises an information handling system 202, a message receiver 204, a voice recognition system 206, a phonetic search system 208 and an index store 210. Telecommunication infrastructure 104 transfers the search query of caller 102 to information handling system 202 for searching a particular multimedia object stored in index store 210. According to an embodiment of the invention, the search query is a voice search query. The voice search query includes words uttered by the caller, which may be related to the metadata of the multimedia object. The voice search query has been discussed further in conjunction with FIG. 3.
Information handling system 202 comprises an information receiving module and a plurality of information exchange cards. The information exchange cards may be, for example, media cards and signaling cards. Signaling cards are used for processing signals to and from communication network 104 and provide specific information related to a call. For example, communication network 104 sends signals regarding initiation of a call by caller 102, termination of a call by caller 102, and receipt of a message from caller 102. The signals are transmitted using standard protocols. Examples of protocols for signal handling are the SS7 protocol, the PRI protocol, etc. An example of signaling cards is the NMS TX-4000 card.
Media cards are used for processing media, for example music playback, DTMF, voice, etc. to and from communication network 104. Any voice search query, for example 'next', by caller 102 is recognized by the media cards. Any multimedia content played to caller 102 is played via the media cards. An example of media cards is the NMS AG-4040 card. Signaling cards and media cards comprise software components used for signal and media processing and handling, respectively. The software components may be written in C/C++, Java or any other programming language.
The information receiving module, in information handling system 202, receives the request of caller 102 and transfers the request to voice recognition system 206 or message receiver 204 based on the request format. According to an embodiment, the information receiving module is software, and may be written in C/C++, Java or any other programming language.
Index store 210 stores various documents in a compressed format. Index store 210 is recreated and loaded whenever the multimedia content is refreshed. Index store 210 comprises a plurality of multimedia objects. As used herein, the term multimedia object refers to any form of stored information that can be transferred through a communication network. Examples of multimedia objects may include movies, songs, music, ring tones or other multimedia information that can be stored on Compact Discs (CDs), Digital Versatile Discs (DVDs) or any other form of media storage. It will be apparent to a person skilled in the art that the invention is not limited to any particular type of multimedia content and can be extended to any new or existing items which can be delivered to caller 102 through communication network 104. A multimedia object stored in index store 210 is hereinafter referred to as an object.
According to an embodiment of the invention, objects in index store 210 comprise multimedia objects, for example songs. The songs can be in various languages and can be stored in various formats like MP3, WAV etc. Further, the name of each song in index store 210 is also stored in a text identifiers file in text format. The name of a song comprises a few words related to the song. Additionally, the names of the songs are converted into phonetic equivalent strings and stored in a phonetic equivalent string file in index store 210. The organization of index store 210 is further explained in conjunction with FIG. 4.
According to an embodiment of the invention, caller 102 dials a specified telephone number through her wired/wireless communication terminal and requests a particular song through a voice search query. The voice search query comprises the utterance of a few words related to the song. The voice utterance is related to the metadata of the multimedia object. The metadata of the multimedia object could be the name of the song, the name of the singer of the song etc. Further, it will be apparent to a person skilled in the art that the metadata of a multimedia object is not limited to the above-mentioned aspects. Metadata of the multimedia object can be any information associated with the multimedia object that may help in identification of the multimedia object in index store 210. When caller 102 dials the specified telephone number, communication network 104 forwards the call to VAS system 112. Further, when caller 102 utters her request, information handling system 202 in VAS system 112 forwards the voice search query to voice recognition system 206.
Voice recognition system 206 identifies the voice search query of caller 102 and converts the request into text format. Voice recognition system 206 comprises at least one processor used to process software codes to recognize the request of caller 102. The software codes may be written in C/C++, Java or any other programming language. To give improved results, voice recognition system 206 is able to identify various utterances of the title of the multimedia object or any other metadata associated with the multimedia objects of index store 210. Voice recognition system 206 is trained with respect to the metadata related to the songs stored in index store 210. The training of voice recognition system 206 makes it tolerant to the accents and pronunciations of caller 102 and other ambient noises. For example, a song named "Chand Taare" is stored in index store 210 and three different callers request the same song "Chand Taare". Due to different accents or noise, voice recognition system 206 may identify the sound of the first word uttered by the three different callers as "Chand", "Chaanda", and "Chanda". However, as voice recognition system 206 is trained with respect to the names of the songs stored in index store 210, it gives the text output for all three requests as "Chand". Therefore, the voice signals of "Chand Taare", "Chaanda Taare", "Chanda Taare", "Chand Thaare", and "Chaanda Thaare" are all identified as "Chand Taare".
According to another embodiment of the invention, caller 102 sends her request for the multimedia object in text format. Caller 102 can send her request in text format through, for example, SMS. The request in text format comprises words related to the song. Information handling system 202, then, sends the request message to message receiver 204 instead of voice recognition system 206. For example, caller 102 requests for a song named "Chand Taare" in text format through an SMS. Caller 102 types the song name on her wired/wireless device and sends the SMS to a pre-specified telephone number.
The output from voice recognition system 206 or the message receiver 204, as the case may be, is a text search query. The text search query comprises a few words in text format and essentially represents the search query of caller 102 in text format. Phonetic search system 208 receives text search query from message receiver 204 or voice recognition system 206 as the case may be. Phonetic search system 208 searches for the requested song in index store 210 based on the text search query.
The text search query is typically different from the name of the song as stored in index store 210. The difference can arise due to various reasons. In case caller 102 has sent a request in text format, different callers 102 may send different requests for the same song. For example, if the name of the song as stored in index store 210 is "Chand Taare", different callers can send the text request as "Chaand Taare", "Chand Thaare", "Chaanda Taare", or "Chaanda Thaare". Similarly, in case caller 102 has sent a voice search query, errors may occur when voice recognition system 206 converts the request into a text search query.
Phonetic search system 208 identifies a plurality of songs from index store 210 which, according to phonetic search system 208, best match the text search query. Phonetic search system 208 has been described in detail in conjunction with FIG. 3. The songs found by phonetic search system 208 are then played to caller 102 one by one.
According to an embodiment of the invention, information handling system 202 may receive further response from caller 102 for the played song. If the desired song is played to caller 102, then, for example, caller 102 may request for setting the song as her ring-back tone. If the desired song is not played to caller 102, then caller 102 may request for another song to be played.
FIG. 3 is a block diagram illustrating phonetic search system 208 in accordance with an embodiment of the invention. Phonetic search system 208 comprises a text receiver 302, a phonetic converter 304, an N-tuples converter 306 and a search engine 308.
Text receiver 302 receives text search query from either voice recognition system 206 or message receiver 204 as the case may be. Text receiver 302 comprises a memory unit which stores the request of caller 102 in text format. Text receiver 302 forwards the text format request to phonetic converter 304 and N-tuples converter 306. According to an embodiment of the invention, text receiver 302 may also receive and store requests in any other format, for example, ASCII coded format.
Phonetic converter 304 comprises at least one processor used to process software codes for converting the text format request received from text receiver 302 into a phonetic equivalent string of the received text. The software codes can be written in C/C++, JAVA or any other programming language. Phonetic converter 304 converts the text search query into a phonetic equivalent query. The phonetic equivalent query comprises phonemes of the words in the text search query. According to an embodiment of the invention, phonetic converter 304 removes the vowels from the name of the song in the process of converting the text search query into the phonetic equivalent query. For example, "Chand Taare" may be converted into a phonetic equivalent query "XND TR" such that the phonetic equivalent query has no vowels.
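A vowel-dropping conversion of this kind can be sketched as follows. This is only an illustrative Python sketch: the digraph mappings (e.g. 'ch' to 'X', in the style of Metaphone-like phonetic keys) and the collapsing of repeated consonants are assumptions, not the patent's actual algorithm.

```python
def phonetic_key(word):
    """Return a crude phonetic key: map a few digraphs, then drop vowels."""
    w = word.lower()
    w = w.replace("ch", "x")   # assumption: Metaphone-like 'ch' -> 'X'
    w = w.replace("th", "t")   # assumption: aspirated 't' folds to plain 't'
    w = "".join(c for c in w if c not in "aeiou")  # remove the vowels
    out = []
    for c in w:                # collapse runs of the same consonant
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out).upper()

# "Chand Taare" and its variant spellings reduce to the same keys
print(phonetic_key("Chand"), phonetic_key("Taare"))  # XND TR
```

With such a key, variant spellings like "Chaanda" and "Thaare" collapse onto the same phonetic strings as the stored name, which is what makes the phonetic pass tolerant to spelling differences.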
N-tuples converter 306 has at least one processing unit used to convert the text format request into various N-tuples of the received text. A tuple is a finite sequence of objects, each of a specified type, occurring in a certain order. A tuple of N objects is usually described as an N-tuple, where N is a non-zero positive integer. For example, a word can be broken into different N-tuple strings of its letters. N-tuples converter 306 identifies the different words of the text format request and breaks each word into N-tuple strings of the letters of that word. The processing unit of N-tuples converter 306 is used to process software codes which convert the text format request into various N-tuples of the text. The software codes may be written in C/C++, JAVA or any other programming language. N-tuple strings have been further discussed in conjunction with FIG. 5.
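The word-to-N-tuple breakdown described above can be sketched as a short Python illustration. The lower-casing and the handling of words shorter than N are assumptions made for the sketch.

```python
def ntuples(word, n=3):
    """Break a word into its overlapping n-letter substrings (n-tuples)."""
    w = word.lower()
    if len(w) <= n:
        return [w]
    return [w[i:i + n] for i in range(len(w) - n + 1)]

print(ntuples("Chand"))  # ['cha', 'han', 'and']
print(ntuples("Taare"))  # ['taa', 'aar', 'are']
```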
Search engine 308 searches for the song requested by caller 102 based on inputs from text receiver 302, phonetic converter 304, and N-tuples converter 306. Search engine 308 includes a processor and a memory unit. Search engine 308 finds a plurality of songs from index store 210 which, according to search engine 308, best match the search query given by caller 102. The methodology followed by search engine 308 in arriving at the best-matched songs is discussed in conjunction with FIG. 5.
FIG. 4 is a schematic illustrating the organization of index store 210. Index store 210 comprises a storage unit 402, a text identifier file 404, a phonetic equivalent string file 406 and an index file 408. Storage unit 402 contains a plurality of multimedia objects like songs. Storage unit 402 may be a collection of a plurality of memory storage devices. In an embodiment of the invention, storage unit 402 is a hard disk of a computer. Further, each multimedia object stored in storage unit 402 has a name for identification. The names of the multimedia objects are text identifiers. The text identifiers are stored in text identifier file 404. Text identifier file 404 may be part of the hard disk of a computer. The multimedia objects are identified based on their text identifiers. The text identifiers are further converted into phonetic equivalent strings. Phonetic equivalent strings comprise phonemes of the words in the text identifiers. The phonetic equivalent conversion may be executed by software programs. The software codes of the program may be written in C/C++, JAVA or any other language. The phonetic equivalent strings are stored in phonetic equivalent string file 406. Further, index file 408 is a file with keys and pointers for each multimedia object in storage unit 402. Every entry in index file 408 is associated with some entry in text identifier file 404 and phonetic equivalent string file 406. In an embodiment of the invention, index file 408 stores documents in compressed format. A document is a set of stored fields. Each field has a name and a textual value. On searching, these fields are returned from index file 408. Index file 408 is accessible to phonetic search system 208.
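A minimal in-memory analogue of this three-file organization might look as follows. All field names and the toy phonetic conversion here are hypothetical illustrations of the linkage between index entries, text identifiers, and phonetic equivalent strings, not the patent's actual file format.

```python
def build_index(song_names):
    """Build a toy index: one entry per song, linking its key, its
    text identifier, and its phonetic equivalent string."""
    def toy_phonetic_key(word):
        # stand-in for the real phonetic conversion program
        w = word.lower().replace("ch", "x")           # assumed digraph mapping
        return "".join(c for c in w if c not in "aeiou").upper()

    index = []
    for obj_id, name in enumerate(song_names):
        index.append({
            "key": obj_id,                             # pointer into the storage unit
            "text_identifier": name,                   # entry in the text identifier file
            "phonetic": " ".join(toy_phonetic_key(w) for w in name.split()),
        })
    return index

idx = build_index(["Chand Taare"])
```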
FIG. 5 is a flow diagram depicting a method for searching and delivering multimedia content to caller 102. According to an embodiment of the invention, at step 502, VAS system 112 receives a search query in the form of a voice search query from caller 102. However, it will be apparent to a person skilled in the art that the invention is not limited to receiving requests only in the form of a voice search query. At step 504, the search query is received by voice recognition system 206. At step 506, voice recognition system 206 converts the search query into a text search query.
Phonetic search system 208 receives the text search query from voice recognition system 206. At step 508, phonetic converter 304 converts the text search query into a phonetic equivalent query. For example, "chand taare" is converted into "XND TR". Further, n-tuples converter 306 converts the text search query into an n-tuple equivalent query of the name of the song. For example, "Chand Taare" is converted into 3-tuple equivalents for all the words of the name of the song. The 3-tuples of "Chand" are 'cha', 'han', and 'and'. Similarly, the 3-tuples of "Taare" are 'taa', 'aar', and 'are'. N-tuple conversion improves the search results by accommodating spelling mistakes in the text search query. Spelling mistakes in the text search query may arise due to the difference in the accents of different callers. Further, due to external environmental noise, voice recognition may fail to pick up the utterance of caller 102 correctly.
At step 510, search engine 308 searches index store 210 based on the phonetic equivalent query and the n-tuple equivalent query. According to an embodiment of the invention, search engine 308 has three types of inputs for the requested song. Firstly, phonetic search system 208 has the original received search query, in voice or text format, as input. Secondly, phonetic search system 208 has the phonetic equivalent query of the requested song as input. Thirdly, phonetic search system 208 has the n-tuple equivalent query of the requested song as input. Based on these three types of inputs for the requested multimedia object, index store 210 is searched for the requested multimedia object.
Search engine 308 searches the phonetic equivalent string file and the text identifiers file through the index fields of index store 210 for entries matching the phonetic equivalent query and the n-tuple equivalent query. In matching the phonetic equivalent query, the processor of search engine 308 returns the matched entries in the phonetic equivalent string file from the index fields of index store 210 based on a confidence factor of matching. The confidence factor is a numerical value representing the degree of closeness between the phonetic equivalent query and the phonetic equivalent strings of the songs in index store 210. A high confidence factor implies a greater degree of matching. The text identifiers of the matched entries returned for the phonetic equivalent query are stored in the memory unit of search engine 308 as a first set of search results. The first set of search results thus comprises text identifiers of songs whose phonetic equivalent strings have matched the phonetic equivalent query. Similarly, the matched entries returned for the n-tuple equivalent query are stored in the memory unit of search engine 308 as a second set of search results. The second set of search results comprises text identifiers of songs whose n-tuple equivalent strings have matched the n-tuple equivalent query. The processor of search engine 308 produces a final set of search results based on the first set of search results and the second set of search results. The final set of search results is produced by assigning different weightages to the first set of search results and the second set of search results. The final set of search results is forwarded to caller 102. A vector space search method is used for matching the phonetic equivalent query and the n-tuple equivalent query.
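As an illustration of vector-space matching over n-tuples, a misspelled query can still score highest against the intended text identifier. This is a simplified sketch: representing each string as a trigram frequency vector and scoring by cosine similarity is an assumption about how such a vector space match might be computed, not the patent's stated formula.

```python
import math
from collections import Counter

def trigrams(text, n=3):
    """Overlapping n-letter substrings of the text, spaces removed."""
    t = text.lower().replace(" ", "")
    return [t[i:i + n] for i in range(max(len(t) - n + 1, 1))]

def cosine_similarity(query, candidate):
    """Cosine of the angle between the two trigram frequency vectors."""
    qv, cv = Counter(trigrams(query)), Counter(trigrams(candidate))
    dot = sum(qv[t] * cv[t] for t in qv)
    norm = (math.sqrt(sum(v * v for v in qv.values()))
            * math.sqrt(sum(v * v for v in cv.values())))
    return dot / norm if norm else 0.0

# A query with spelling variations still ranks the intended song first
songs = ["Chand Taare", "Dil Se", "Roja"]
scores = {s: cosine_similarity("Chaand Thaare", s) for s in songs}
```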
According to another embodiment of the invention, search engine 308 further searches the text identifier file of index store 210 based on the text search query. The processor of search engine 308 returns the text identifiers of the multimedia objects that match the text search query; these are stored in the memory unit of search engine 308 as a third set of search results. The processor of search engine 308 then produces a final set of search results by assigning different weightages to the first, second, and third sets of search results. The final set of search results is provided to caller 102. A vector space search method is used for matching the phonetic equivalent query, the n-tuple equivalent query, and the text search query.
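The weighted combination of the result sets can be sketched as below. The `merge_results` helper and the weight values are illustrative assumptions; the patent leaves the actual weightages unspecified.

```python
def merge_results(result_sets, weights):
    """Combine per-query result sets, each mapping text identifier to a
    confidence factor, into one final ranking by weighted confidence.
    The weights are illustrative; the patent does not fix their values."""
    final = {}
    for results, w in zip(result_sets, weights):
        for text_id, conf in results.items():
            final[text_id] = final.get(text_id, 0.0) + w * conf
    # Highest weighted confidence first.
    return sorted(final.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical confidences from the phonetic, n-tuple, and text searches.
phonetic = {"song_A": 0.9, "song_B": 0.4}
ntuple = {"song_A": 0.7, "song_C": 0.8}
text = {"song_B": 0.6}
print(merge_results([phonetic, ntuple, text], weights=[0.5, 0.3, 0.2]))
```

A song matched by more than one query form (such as song_A above) accumulates weight from each set, which is what pushes the best overall candidate to the top of the final set.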
According to an embodiment of the invention, the sets of search results can further be based on a plurality of caller parameters, in addition to the confidence factor. These caller parameters include the time and location of caller 102. For example, during a festival, words and multimedia objects related to that festival are given higher weightage. Similarly, callers 102 from a particular place typically prefer songs in a particular language, so search engine 308 gives more weightage to multimedia objects in that language in the search results.
It will be apparent to a person skilled in the art that the parameters are not limited to those mentioned above, and many more parameters may be considered when assigning weightages to the search results.
Based on the three sets of search results, a plurality of matched multimedia objects may be stored as matched entries with varying levels of confidence factors. For each matched multimedia object, a cumulative average confidence factor is calculated. The matched multimedia objects whose cumulative average exceeds a pre-specified threshold level are delivered to caller 102, beginning with the object that has the highest cumulative confidence factor. Thereafter, based on caller 102's response, other multimedia objects may be delivered. The method of delivering the matched multimedia objects to caller 102 is explained in detail in conjunction with FIG. 6.
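The thresholding and ranking by cumulative average confidence factor can be sketched as follows; the `deliverable` helper and the threshold value are illustrative assumptions, not part of the patent.

```python
def deliverable(matches, threshold=0.5):
    """matches maps each text identifier to the list of confidence factors
    it received across the result sets. Objects whose cumulative average
    exceeds the threshold are returned best-first; the threshold value is
    illustrative."""
    averaged = {tid: sum(cs) / len(cs) for tid, cs in matches.items()}
    kept = [(tid, avg) for tid, avg in averaged.items() if avg > threshold]
    return sorted(kept, key=lambda kv: kv[1], reverse=True)

# song_A averages 0.8 and passes the threshold; song_B averages 0.25 and is dropped.
print(deliverable({"song_A": [0.9, 0.7], "song_B": [0.3, 0.2]}))
```

The first entry of the returned list corresponds to the best matched multimedia object delivered at step 602 of FIG. 6.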
FIG. 6 is a flow diagram illustrating a method for delivering multimedia objects to caller 102 and receiving her responses to the delivered objects. At step 602, the best matched multimedia object, i.e., the one with the highest cumulative confidence factor, is delivered to caller 102. For example, if caller 102 has requested the song "Chand Taare" and phonetic search system 208 finds four songs in index store 210 with cumulative confidence factors above the pre-specified threshold, the song with the highest cumulative confidence factor is delivered first.
Caller 102, on receiving the best matched multimedia object, may find that it is not the object she requested. In that case, caller 102 may wish to receive other multimedia objects as well, and gives a response indicating whether another multimedia object should be delivered. She may respond either by pressing a set of DTMF keys on her wired/wireless device or by speaking further instructions. At step 604, VAS system 112 receives the response from caller 102 regarding further delivery of multimedia objects.
If further delivery of multimedia objects is not required, then at step 608, VAS system 112 receives caller 102's response for the delivered multimedia object. For example, caller 102 may download the multimedia object as a caller tone or as a ring tone, responding either by pressing a set of DTMF keys on her wired/wireless device or by speaking further instructions. According to an embodiment of the invention, caller 102 gives different responses for downloading the delivered multimedia content and for using it for different purposes. For example, caller 102 may press '1' to download the multimedia object and '2' to use it as a ring back tone.
If another multimedia object is required, then at step 606, the next best matched multimedia object is delivered to caller 102 after her response is received. After step 606, step 604 is executed again; a cycle comprising steps 604 and 606 may repeat until caller 102 is satisfied with the delivered multimedia object, after which step 608 is executed.
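The delivery cycle of steps 602 through 608 can be sketched as a simple loop over the ranked results. The `get_response` and `deliver` callbacks are hypothetical stand-ins for the DTMF/voice interface of VAS system 112.

```python
def delivery_loop(ranked_results, get_response, deliver):
    """Walk the ranked results best-first (steps 602/606): deliver a match,
    then advance on each 'next' response (step 604) until the caller accepts
    an object (step 608) or the list is exhausted. The callbacks are
    illustrative stand-ins for the caller interface."""
    for text_id, conf in ranked_results:
        deliver(text_id)              # step 602 (first pass) / step 606
        if get_response() != "next":  # step 604: does the caller want another?
            return text_id            # step 608: caller accepts this object
    return None                       # no candidate satisfied the caller

# Demo with scripted caller responses: reject the first song, accept the second.
responses = iter(["next", "accept"])
chosen = delivery_loop([("song_A", 0.9), ("song_B", 0.8)],
                       get_response=lambda: next(responses),
                       deliver=lambda tid: None)
print(chosen)  # → song_B
```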
The foregoing description of the exemplary embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above discussion. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims set forth below.

Claims

1) A method for searching an object in a set of objects in response to a search query, the set of objects being stored in an index store, the index store further comprising a text identifier for each object and a phonetic equivalent string for each text identifier, the method comprising: a) converting the search query into a text search query; b) converting the text search query into a phonetic equivalent query; c) converting the text search query into an n-tuple equivalent query; d) searching the set of phonetic equivalent strings based on the phonetic equivalent query to generate a first set of search results, wherein the first set of search results consists of associated text identifiers for the searched phonetic equivalent strings; e) searching the set of text identifiers based on the n-tuple equivalent query to generate a second set of search results; and f) generating a final set of search results based on the first set of search results and the second set of search results.
2) The method as recited in claim 1 further comprising the steps of: a) searching the set of text identifiers based on the text search query to generate a third set of search results; and b) modifying the final set of search results based on the third set of search results.
3) The method as recited in claim 1 further comprising the step of identifying the objects associated with the text identifiers in the final set of search results.
4) The method as recited in claim 1, wherein the step of searching the phonetic equivalent string comprises: a) comparing the phonetic equivalent query with index field entries of database; and b) calculating a degree of matching confidence factor for each comparison.
5) The method as recited in claim 1 further comprising the step of modifying the final set of search results based on caller parameters.
6) The method as recited in claim 1 wherein the search query is a voice search query.

7) A method for delivering a desired object from a set of objects to a user in response to a search query from the user, the set of objects being stored in a database, the database further comprising a text identifier for each object and a phonetic equivalent string for each text identifier, the method comprising: a) converting the search query into a text search query; b) converting the text search query into a phonetic search query string; c) converting the text search query into an n-tuple equivalent; d) searching the set of phonetic equivalent strings based on the phonetic search query string to generate a first set of search results, wherein the first set of search results consists of associated text identifiers for the searched phonetic equivalent strings; e) searching the set of text identifiers based on the n-tuple equivalent to generate a second set of search results; f) generating a final set of search results based on the first set of search results and the second set of search results; and g) identifying at least one object associated with the text identifiers in the final set of search results.
8) The method as recited in claim 7 further comprising the steps of: a) receiving a user input for a desired action from the user based on the final set of search results; and b) performing the desired action based on the user input.
9) A system for searching an object in a set of objects in response to a search query, the set of objects being stored in an index store, the index store further comprising a text identifier for each object and a phonetic equivalent string for each text identifier, the system comprising: a) a voice recognition system for converting the search query into a text search query; b) a phonetic converter for converting the text search query into a phonetic equivalent query; c) an n-tuple converter for converting the text search query into an n-tuple equivalent query; d) a search engine for generating a set of search results by searching the index store based on the text search query, the phonetic equivalent query and the n-tuple equivalent query.
PCT/IN2007/000628 2007-12-31 2007-12-31 Method and system for searching preferred multimedia content WO2009084000A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IN2007/000628 WO2009084000A1 (en) 2007-12-31 2007-12-31 Method and system for searching preferred multimedia content


Publications (1)

Publication Number Publication Date
WO2009084000A1 (en) 2009-07-09

Family

ID=40823807

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2007/000628 WO2009084000A1 (en) 2007-12-31 2007-12-31 Method and system for searching preferred multimedia content

Country Status (1)

Country Link
WO (1) WO2009084000A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4975959A (en) * 1983-11-08 1990-12-04 Texas Instruments Incorporated Speaker independent speech recognition process
US20070061335A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Multimodal search query processing



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 07870576; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 07870576; Country of ref document: EP; Kind code of ref document: A1