US20150095320A1 - Apparatus, systems and methods for scoring the reliability of online information - Google Patents

Apparatus, systems and methods for scoring the reliability of online information Download PDF

Info

Publication number
US20150095320A1
US20150095320A1 (application US14/039,333; US201314039333A)
Authority
US
United States
Prior art keywords
score
multimedia documents
multimedia
documents
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/039,333
Inventor
Stanislas Motte
Ramon Ruti
Arnaud Jacolin
Pierre-Albert Ruquier
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Myslinski Lucas J
Original Assignee
Trooclick France
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Trooclick France filed Critical Trooclick France
Priority to US14/039,333 priority Critical patent/US20150095320A1/en
Assigned to Trooclick France reassignment Trooclick France ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JACOLIN, Arnaud, MOTTE, Stanislas, RUQUIER, Pierre-Albert, RUTI, Ramon
Priority to PCT/EP2014/070331 priority patent/WO2015044179A1/en
Priority to US15/024,574 priority patent/US10169424B2/en
Publication of US20150095320A1 publication Critical patent/US20150095320A1/en
Assigned to MYSLINSKI, LUCAS J reassignment MYSLINSKI, LUCAS J ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Trooclick France
Priority to US16/190,824 priority patent/US10915539B2/en
Priority to US17/141,720 priority patent/US11755595B2/en
Priority to US18/123,584 priority patent/US20230252034A1/en
Abandoned legal-status Critical Current

Links

Images

Classifications

    • G06F17/3053
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/40 Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • G06F17/30011
    • G06F17/30017
    • G06F17/30598
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Definitions

  • the present invention relates to the field of apparatus, systems and methods for big data analysis, in order to score the reliability of online information with high efficiency and real-time availability of the processed information.
  • U.S. Patent Publication No. 2009/0125382 provides an indication of a data source's accuracy with respect to past expressed opinions.
  • the data source is assigned prediction scores based on the verified credibility of historical documents.
  • a reputation score is assigned to a new document as a function of the prediction scores from the historical documents, data source affiliations, document topics and other parameters.
  • U.S. Pat. No. 7,249,380 provides a model to evaluate trust and the transitivity of trust of online services.
  • the trust attributes are categorized in three categories, which relate to contents, owner of the web document and the relationships between the web document and certificate authorities.
  • U.S. Pat. No. 7,809,721 describes a system for ranking data including three calculations: firstly, the quantitative semantic similarity score calculation which shows the qualitative relevancy of the particular location to the query; secondly, the general quantitative score calculation which comprises a semantic similarity score, a distance score and a rating score; thirdly, the addition of the quantitative semantic similarity score and the general quantitative score to obtain a vector score.
  • the present invention dynamically provides the reliability of multimedia documents by applying a series of intrinsic criteria and extrinsic criteria. All the pre-calculated reliability scores of a subset of the existing multimedia documents are stored in a database, so customers only need to retrieve the already pre-calculated scores, which is less time-consuming than triggering the calculation process over the full set of existing documents. These scores can be updated according to the publishing of various sources, including the comments of social networks and communities. Additionally, all the subsets of multimedia documents are cross-checked among different sources.
  • the inventive subject matter provides apparatus, systems and methods in which multimedia documents are attributed a reliability score.
  • One aspect of the inventive subject matter includes a method to provide a customer at least one multimedia document associated with a reliability score calculated by applying a first category of intrinsic criteria and a second category of extrinsic criteria. It comprises first steps of pre-calculating the reliability score for at least a set of multimedia documents of at least one pre-selected source of documents, second steps of updating this reliability score by applying the extrinsic criteria, and a last step for providing, in response to a customer's request, the multimedia documents from the pre-selected sources associated with the updated score and the multimedia documents from the other sources associated with a score conditionally calculated.
  • a calculation of the score for the multimedia documents from the other sources is activated by an action from the customer. It is also activated by the detection of at least one request coming from the customer device in the case of getting multimedia documents that do not have a pre-calculated score. In addition, the scores of multimedia documents coming from the other sources are pre-calculated when a threshold of interest is reached. The processing of pre-calculation of the score of the multimedia documents within a pre-selected source is prioritized according to an interest indicator, where the interest indicator is weighted by the number of requests and the measure of engagements.
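The conditional score calculation described above can be sketched in Python as follows; the class name, the threshold value and the `calculate` callback are illustrative assumptions, since the patent does not prescribe an implementation:

```python
# Sketch only: serve pre-calculated scores, and trigger calculation for
# documents from other sources once a threshold of interest is reached.

class ScoreCache:
    def __init__(self, threshold=3):
        self.scores = {}          # doc_id -> pre-calculated reliability score
        self.request_counts = {}  # doc_id -> number of customer requests seen
        self.threshold = threshold

    def get_score(self, doc_id, calculate):
        if doc_id in self.scores:
            return self.scores[doc_id]
        # No pre-calculated score: record the customer's interest.
        self.request_counts[doc_id] = self.request_counts.get(doc_id, 0) + 1
        if self.request_counts[doc_id] >= self.threshold:
            # Threshold of interest reached: run the (expensive) calculation.
            self.scores[doc_id] = calculate(doc_id)
            return self.scores[doc_id]
        return None  # score not yet available
```

In this sketch a `None` result tells the caller that the document has been queued but not yet scored, matching the "conditionally calculated" behaviour described above.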
  • the reliability scores are time-stamped and associated with the related time-stamped version of the documents. So when a discrepancy is detected between the time-stamp of the reliability score and the time-stamp of the related document, the pre-processing of the document is updated. This detection of the discrepancies could be performed by the customer's device.
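A minimal sketch of the time-stamp discrepancy check, under the assumption that time-stamps are comparable values such as epoch seconds (function and field names are hypothetical):

```python
def needs_rescoring(document_ts, score_ts):
    # The score is stale when the document was updated after it was scored.
    return score_ts < document_ts

def stale_documents(documents, scores):
    # documents: {doc_id: document time-stamp}; scores: {doc_id: score time-stamp}.
    # A document with no recorded score time-stamp is treated as stale.
    return [doc_id for doc_id, doc_ts in documents.items()
            if needs_rescoring(doc_ts, scores.get(doc_id, -1))]
```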
  • the representation of the pre-calculated scored documents is computed by the customer device through an aggregation of the multimedia documents coming from the source's servers and the related scores coming from the score processing server.
  • the document acquisition can be performed by both the score processing server and the customer device.
  • the method of scoring the reliability of online information comprises additional steps of computing, for at least one source, a global reliability score of the source, based on the reliability score of its multimedia documents and the number of its visionary documents.
  • the method for scoring the reliability of online information comprises a step of filtering according to a white list, so as to exclude the checking of an existing reliability score for multimedia documents coming from sources not belonging to the white list; or according to a black list, so as to exclude the checking of an existing reliability score for multimedia documents coming from sources belonging to the black list.
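The white-list/black-list filtering step might look like the following sketch; the function name and the assumption that only one list is consulted at a time are illustrative:

```python
def should_check_score(source, whitelist=None, blacklist=None):
    # Skip the reliability-score lookup for sources outside the white list,
    # or for sources inside the black list.
    if whitelist is not None:
        return source in whitelist
    if blacklist is not None:
        return source not in blacklist
    return True  # no filtering configured
```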
  • FIG. 1 is a flow chart that describes the general steps for scoring the reliability of online information.
  • FIG. 2 is a flow chart that describes in detail the various steps for creating, updating and distributing the reliability score.
  • FIG. 3 is the diagram of the hardware for the implementation of the function for document retrieval.
  • FIG. 4 a is a flow chart that describes in detail the identification, consolidation and weighting of relevant words.
  • FIG. 4 b is an example of document partitioning, showing the different parts of a typical news document on the World Wide Web.
  • FIGS. 5 a and 5 b are flow charts that describe the document association using respectively technical classification and clustering.
  • FIG. 6 is the diagram of the hardware for the distribution of the reliability score through different clients.
  • FIG. 7 illustrates a display of a reliability score to customers.
  • FIG. 8 illustrates an exemplary computer (electronic circuit) hardware diagram.
  • FIG. 1 provides an overview of the general computer-implemented process for creating and distributing a reliability score.
  • the document is retrieved by computer from a media source (from a web site, or collected via a proprietary API, for example).
  • the document can be any multimedia information on the World Wide Web, e.g. a text, an image, an audio and/or a video recording.
  • this retrieved document is computer-analyzed according to the type of document, e.g. text, audio, video, etc.
  • a reliability score is calculated by a computer according to the analyzed result of the documents.
  • a series of intrinsic criteria and extrinsic criteria stored in memory of the computer are applied successively to calculate the reliability score to qualify the information.
  • this reliability score is electronically distributed to different clients.
  • FIG. 2 describes in more detail this multimedia document scoring process.
  • the steps 110, 120 and 130, each computer-implemented, constitute the computer-implemented step 100 in FIG. 1.
  • a document selection step 110 is performed. The selected document can be any kind of information, for example a news article.
  • the step 120 shows the document retrieval, which is the process of automatic collection of information from the World Wide Web or other sources. This could be achieved by connecting to proprietary APIs, by using existing RSS news feeds or by crawling the World Wide Web.
  • a multimedia document (for example, a news article) can be updated by the source after the reliability analysis has been performed.
  • since the pre-calculated scores are time-stamped, customers are notified that the multimedia document has been updated but the score has not yet been re-calculated.
  • the processing server is informed of this update in order to trigger a new cycle of multimedia document retrieval, analysis and reliability score calculation.
  • the step 130 illustrates the process of multimedia document cleaning, formatting and classifying according to its type (text, image, audio or video) before the next step of multimedia document analysis.
  • FIG. 3 presents the diagram of the hardware for the implementation of the function for document retrieval.
  • 110 a, 110 b, 110 c and 110 d represent different document source servers, e.g. the server of the “Washington Post” or the server of “Fox News”.
  • Different applications like RSS feeds, crawling and APIs collect the information from the servers via the network, as indicated by step 120 a.
  • the collected information goes through the multimedia document retrieval and dispatcher server as indicated by step 140 and is then classified either in the normal documents queue as shown in step 150 a or in the priority documents queue as shown in step 150 b .
  • This information serves to calculate the reliability score in the processing server as indicated in step 160 .
  • this reliability score is saved in the databases as indicated in step 170 .
  • For additional discussion of exemplary computer-implemented hardware configurations, see the discussion with reference to FIG. 8, below.
  • the document retrieval process is configured according to the frequency and the number of times a source can be solicited. Information that has already been processed must still be updated frequently, over an indefinite period of time.
  • this multimedia document retrieval tool is adapted so as to collect breaking news emails in order to quickly retrieve urgent information from email alerts sent by newspapers. This will allow a quick response to frequent changes of that information. Documents collected via breaking news will be automatically dispatched into the priority queue displayed in FIG. 3 . Information is then assigned a new pre-calculated reliability score with every update.
  • the computer-implemented step 200 of multimedia document content analysis contains steps 210 , 220 , 230 and 240 , each being computer-implemented. This analysis is performed in different ways depending on the type of document. If the content is a text, then morphology, syntax and semantic analysis are performed as indicated in step 220 . If the content is in the form of an image, the analysis is performed by clone, masking, etc. as indicated in step 210 . For the audio documents, the content is transformed to text via speech-to-text technologies before applying the text analysis as indicated in step 240 . For the video documents, the analysis is a combination of the image and audio analysis previously explained as indicated by steps 230 and 231 .
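The type-dependent dispatch of step 200 can be sketched as follows; the analysis bodies are placeholders that return the names of the analyses that would run, since the patent describes the steps but not their implementation:

```python
def analyze_document(doc):
    # Route a document to the analysis matching its type (steps 210-240).
    kind = doc["type"]
    if kind == "text":
        return ["morphology", "syntax", "semantics"]      # step 220
    if kind == "image":
        return ["clone_detection", "masking_detection"]   # step 210
    if kind == "audio":
        # Speech-to-text first, then the text analysis (step 240).
        return ["speech_to_text"] + analyze_document({"type": "text"})
    if kind == "video":
        # Combination of image and audio analysis (steps 230-231).
        return analyze_document({"type": "image"}) + analyze_document({"type": "audio"})
    raise ValueError(f"unknown document type: {kind}")
```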
  • Morphology and syntax analyses are used to determine the category each word belongs to: whether a verb, an adjective, a noun, a preposition, etc. To do this it is often necessary to disambiguate between several possibilities.
  • the word ‘general’ can be either a noun or an adjective.
  • the context helps disambiguate between the two meanings.
  • a named entity recognition is performed to locate and classify atomic elements in the text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. as indicated in FIG. 4 a.
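A toy illustration of the named entity recognition step; the patent does not prescribe a method, production systems use trained models, and these regular expressions are deliberately naive:

```python
import re

# Naive patterns for a few of the categories named above: monetary values,
# percentages, and capitalized entity candidates (persons, organizations, ...).
PATTERNS = [
    ("MONEY",   re.compile(r"\$\d+(?:\.\d+)?(?:\s?(?:million|billion))?")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?\s?%")),
    ("ENTITY",  re.compile(r"(?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*")),
]

def extract_entities(text):
    found = []
    for label, pat in PATTERNS:
        for m in pat.finditer(text):
            found.append((label, m.group()))
    return found
```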
  • FIG. 4 a illustrates how to identify the weight of the words found in the document which is to be scored.
  • Steps 221 a, 221 b, 221 c and 221 d allow the selection of the most relevant named entities and events for a given news document.
  • the selection method relies on the contrast between two types of textual units: on the one hand, the named entities that denote referential entities well-identified in the specific document (e.g. Organization, Location, Person), and on the other hand the terms that represent events.
  • weighting takes into account the number of occurrences in the text. The more often a word appears in the text, the greater its weight. For documents comprised of a title and a header, weighting will be heavier for:
  • Words that are semantically close to the title: for example, the concept “prison” is close to “justice”.
  • step 221 f shows a multimedia document's detailed definition based on the relevance weighting calculated in the step 221 e.
  • FIG. 4 b represents a typical online news article. Different parts of this article, which are represented in different colors, have different degrees of importance.
  • the grey parts, e.g. title, date, author and text, are used to calculate the reliability score of the document.
  • the words found in the title and in the first paragraph are considered more important than those in other paragraphs.
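The relevance weighting of step 221 e can be sketched as follows; the boost factor is an assumption, since the patent only states that title-related words weigh heavier:

```python
from collections import Counter

def weight_terms(title, body, title_boost=2.0):
    # A term's weight grows with its occurrence count in the body, and terms
    # also present in the title receive a boost (factor is illustrative).
    title_terms = set(title.lower().split())
    counts = Counter(body.lower().split())
    return {term: count * (title_boost if term in title_terms else 1.0)
            for term, count in counts.items()}
```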
  • in step 300, the computer receives as input the original document and the metadata of the detailed document definition obtained in the process of FIG. 4 a. It represents the calculation of the reliability score of a document by applying a first category of intrinsic criteria and a second category of extrinsic criteria.
  • the intrinsic criteria are related to the document information itself: the content, the context in which the information is published, its date of publication/update, the use of conditional tense, lack of precision while providing the sources, etc.
  • the extrinsic criteria include users-related criteria like the comments in related social networks, the cross-checking with other sources dealing with the same information so as to detect inconsistencies, etc.
  • the step 310 for the intrinsic criteria computer calculation depends on the criteria linked to the document itself, like the detection of the conditional tense use, inconsistencies between the title and the body of the text, spelling mistakes and so on, which are described by the steps 311-314.
  • the step 311 of fact checking is an extrinsic criterion that can be calculated by the computer without document classification or clustering. All facts identified inside the document are verified against a knowledge database, e.g. created from corporate reporting, Wikipedia's infoboxes and governmental figures, amongst others (a non-exhaustive list).
  • the step 312 illustrates the detection of the conditional tense, an intrinsic criterion related to the document itself.
  • if an author is unsure of the reliability of a piece of information, he may use the conditional tense to protect himself.
  • it is necessary to analyze the conjugation of the verbs linked to the most relevant words. Identifying the verb carrying the meaning is crucial. It is necessary to simultaneously look for textual clues in the document (such as “it seems that”) bearing the same conditional function. If the verb tenses of the principal information are different in the title, header and text, but especially if the title is in the present or present perfect while the header and text are in the conditional, the document's reliability score will be lowered.
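Step 312 can be illustrated with a clue-based sketch; English has no conditional tense morphology, so textual clues such as "would" or "it seems that" stand in for the verb-conjugation analysis, and the clue list and penalty value are illustrative, not from the patent:

```python
# Naive substring matching; a real system would analyze verb conjugation.
HEDGING_CLUES = ["would", "could", "might", "reportedly", "it seems that",
                 "allegedly", "according to rumors"]

def conditional_penalty(title, body, penalty=0.1):
    # Lower the reliability score when the body hedges but the title does not,
    # mirroring the title-vs-text tense comparison described above.
    body_hedged = any(clue in body.lower() for clue in HEDGING_CLUES)
    title_hedged = any(clue in title.lower() for clue in HEDGING_CLUES)
    return penalty if body_hedged and not title_hedged else 0.0
```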
  • the step 313 shows other criteria such as inconsistencies between the title and the content inside a text document.
  • if the document is not a text one, other criteria are applied, such as the detection of digitally modified images, as illustrated in step 314.
  • if a news article contains a photo, it is necessary to check that it has not undergone substantial alterations such as cloning, masking, or the adding or deleting of fictional elements.
  • for event-related news (for example a terrorist act, strike or accident), the computer thus lowers the reliability score of the article if the photo was taken a long time before the events it illustrates or comments on. This may be relevant, for example, when photos are used to illustrate a strike.
  • a reliable document should contain the 5 “W” relating to its principal piece of information: Who, What, Where, When and Why about an event. If several “W” are missing, the document's reliability score will be lowered.
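The 5 "W" completeness check above can be sketched as follows; in practice the answers would come from the named-entity step, so here the caller passes the answers it found, and the per-missing penalty is an assumption:

```python
FIVE_WS = ("who", "what", "where", "when", "why")

def five_w_penalty(answers, per_missing=0.05):
    # answers: dict mapping each W to the extracted answer, or None if absent.
    # The penalty grows with each missing W, lowering the reliability score.
    missing = [w for w in FIVE_WS if not answers.get(w)]
    return len(missing) * per_missing, missing
```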
  • the step 320 represents the multimedia document classification and clustering.
  • a cluster contains all multimedia documents coming from one or several sources dealing with the same information. For example, two multimedia documents created by different sources and describing the arrest of the same person will be integrated in the same cluster.
  • the aim of this task is to identify multimedia documents dealing with the same topic in order to group them together.
  • Various elements will be used to achieve this. It is crucial, for example, that a multimedia document announcing the launch of a new product by a company not be associated with another multimedia document describing the acquisition of this company by another one.
  • the detailed document definition described on FIG. 4 a is needed.
  • for 2 documents to be grouped together, they must have a close date of creation. For example, for some topics such as sports, a 12 hour window can be adopted, while for other “long trend” topics the grouping window could reach several days.
  • a verification and consolidation process of existing clusters is necessary.
  • a process of background consolidation of clusters e.g. to find and unify multimedia documents that are isolated in different clusters but deal with the same topic
  • if this process succeeds in unifying several clusters, some of the extrinsic criteria will be recalculated and thus the reliability score will also be updated.
  • Each cluster contains a representative multimedia document that defines most adequately the association. This avoids having to compare new entries with all the multimedia documents that are already present in a cluster. Of course when a cluster contains a single multimedia document, that multimedia document will be the representative. Multimedia documents using different languages are also clustered together. Inside a cluster, different sub-clusters can be created depending on their language, country of origin, sources and so on.
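The clustering flow of steps 344-345 can be sketched as follows; the similarity measure, thresholds and the choice of the first member as representative are simplifying assumptions (the patent says the representative is the document that defines the association most adequately):

```python
def jaccard(a, b):
    # Simple set-overlap similarity between two term lists.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def assign_to_cluster(doc, clusters, sim_threshold=0.5, window_hours=12):
    # doc: {'terms': [...], 'ts': creation time in hours}.
    # A new document joins a cluster when it is close enough to the cluster's
    # representative AND falls inside the topic's grouping window.
    for cluster in clusters:
        rep = cluster[0]  # representative: here simply the first member
        if (jaccard(doc["terms"], rep["terms"]) >= sim_threshold
                and abs(doc["ts"] - rep["ts"]) <= window_hours):
            cluster.append(doc)   # step 344: add to the existing cluster
            return cluster
    new = [doc]                   # step 345: create a new cluster
    clusters.append(new)
    return new
```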
  • step 341 describes the end of document intrinsic analysis.
  • the document intrinsic score, which is represented by the reliability score, is stored in step 342.
  • an assessment is done in step 343 to classify the document into the relevant cluster.
  • if a relevant cluster exists, the flow goes to “Yes” and the new multimedia document is added to this cluster as indicated in step 344. If there is no relevant cluster, the flow goes to “No” and a new cluster is created as indicated in step 345.
  • in step 344, the newly added multimedia document is compared with the other multimedia documents already existing in the cluster. Extrinsic document criteria like omission and inconsistency are calculated as indicated by step 346. Finally, a decision is made in step 347: whether to change the reliability score of the existing cluster multimedia documents. If “Yes”, the updated score is stored for all the multimedia documents in this cluster as indicated in step 348; if “No”, only the score of the new multimedia document is saved as shown in step 349.
  • FIG. 5 b represents an asynchronous cluster consolidation process to detect and merge equivalent clusters. It starts with the comparison of step 350: if there are equivalent clusters, the flow goes to “Yes” and on to step 351 to perform cluster consolidation and unification; if “No”, it goes to step 352 to end the process.
  • in step 351, the reliability scores of all the multimedia documents in the new unified cluster are re-calculated based on the inconsistencies between the documents.
  • the decision to change the scores of the documents in the new cluster is made in step 354: if “Yes”, the concerned multimedia document ratings are updated by storing the updated reliability score in step 355; if “No”, only the score of the new multimedia document is stored in step 356.
  • the step 330 for the extrinsic criteria calculation depends on the criteria linked to other documents, like the reliability score of the sources where the news comes from, the inconsistency relative to other documents in the same cluster, the comments on other social media and networks and so on, which are represented by the steps 331-334.
  • a source can be the publisher of a newspaper; here it can also refer to a recognized author.
  • the score of a source will depend on the score obtained by the multimedia documents created by that source.
  • the score of a news website is constituted by the weighted average of the scores of the newspaper's various sections, and the score of a section is constituted by the weighted average of the scores of the section's various multimedia documents over a period of time.
  • Each source has a reliability score that evolves according to the weighted ratings obtained by its various multimedia documents.
  • the documents that were right from the beginning are called visionary documents and are given a better reliability.
  • the reliability score of sources and authors evolves in time and is not based solely on the values manually assigned to them during the launching period.
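The source-level score described above can be sketched as a two-level weighted average; the weights themselves are assumptions, as the patent leaves them unspecified:

```python
def weighted_average(pairs):
    # pairs: list of (score, weight) tuples.
    total = sum(w for _, w in pairs)
    return sum(s * w for s, w in pairs) / total if total else 0.0

def source_score(sections):
    # sections: {name: (section_weight, [(doc_score, doc_weight), ...])}.
    # A section's score is the weighted average of its documents' scores;
    # the site's score is the weighted average of its sections' scores.
    return weighted_average([
        (weighted_average(docs), section_weight)
        for section_weight, docs in sections.values()
    ])
```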
  • the aim of this criterion is to detect factual information that varies from one document to another. If an inconsistency is detected, a warning is triggered on the reliability score.
  • several types of inconsistencies can be detected: the first one verifies the information's meaning in order to detect, for example, if the tone of one document is positive about an event while another is negative.
  • Another type of inconsistency concerns the facts relating to the words that have been identified in the text (non-exhaustive list):
  • There are many other criteria, represented by step 333; one of them is rumor detection, which is another important criterion. Although rumors sometimes end up being true, the fact that a piece of information is a rumor necessarily raises doubts. The idea is not just to look for the word rumor/hoax in the text but to detect whether the principal piece of information is centered around a rumor, and whether the author is sure of its reliability or not.
  • step 340 represents the final multimedia document scoring, the process that takes into account and weights all the editorial criteria (both intrinsic and extrinsic) analyzed during the process.
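The final weighted scoring of step 340 can be sketched as follows; the criteria names, their weights and the neutral default are illustrative assumptions, since the patent states only that all editorial criteria are taken into account and weighted:

```python
# Illustrative weights for a few of the intrinsic and extrinsic criteria
# discussed above; weights sum to 1.0 so the score stays in [0, 1].
CRITERIA_WEIGHTS = {
    "fact_checking": 0.3, "conditional_tense": 0.1, "title_consistency": 0.1,
    "image_integrity": 0.1, "source_score": 0.2, "cross_source_consistency": 0.2,
}

def final_score(criteria):
    # criteria: {name: value in [0, 1]}; missing criteria count as neutral 0.5.
    return sum(weight * criteria.get(name, 0.5)
               for name, weight in CRITERIA_WEIGHTS.items())
```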
  • the step 400 illustrates the reliability score distribution.
  • the main goal is to display a pre-calculated reliability score associated with a multimedia document.
  • a tool to distribute and display the reliability score must retrieve this score and associated metadata from a distant database and display them on a device.
  • This tool can be a web browser extension or any other multimedia application compatible with devices such as a PC, mobile phone, tablet, TV, radio, etc. These tools can also integrate the functions of update, retrieval and aggregation of multimedia documents.
  • FIG. 7 illustrates an example of such a web browser extension or “add-on” 410 rating the content in a webpage 420 .
  • the content of the news is chosen arbitrarily as an example and has no importance for understanding this invention.
  • the add-on 410 is composed of several parts: the header 411, the overall reliability score 412 and the set of intrinsic and extrinsic criteria 413. Through the element indicated as 415, the user is able to quickly give his opinion regarding the reliability of the document. Additionally, 416 indicates the related multimedia documents as well as related social network messages such as tweets.
  • an icon is automatically positioned in the document to notify the customer that the pre-calculated score is available.
  • a pop-up window appears and provides a first level of information on the reliability of the multimedia document. This score is always time-stamped.
  • a “more information” link redirects the customer to a website where extra information, links to related documents and the community's comments are available.
  • once the add-on is installed on the customer's browser, it is easily accessible thanks to an icon dynamically positioned in the multimedia document as well as a button in one of the browser's menus.
  • customers can manually request its processing via the add-on.
  • the add-on filters up-front the document sources that have already been rated by the system, using both a whitelist and a blacklist.
  • customers can optionally configure the sources they want to be processed.
  • the process of score distribution in the form of an interactive widget displaying the reliability scores of multimedia documents and sources to be included in websites.
  • the display will consist of a listing of the reliability scores of the different multimedia documents, offering the possibility to see progressions and regressions in the scores or rankings over time.
  • the process of score distribution in the form of a daily, weekly or monthly newsletter showing the reliability scores and rankings of multimedia documents and sources.
  • the process of real-time score distribution in the form of an alerting service informing of the reliability scores and rankings of multimedia documents and sources.
  • the diagram of the hardware for the distribution of the reliability score to different customers is represented in FIG. 6 .
  • on the one hand, the documents are retrieved from document source servers 1, 2 . . . N; on the other hand, the reliability scores are read from the processing server.
  • the reliability score processing server contains sub-databases: user databases, white/blacklists databases, reliability score databases and knowledge databases.
  • FIG. 8 illustrates some of these.
  • the processor 500 of a multimedia document retrieval computer or server is coupled to communicate with a network, such as the Internet 502 .
  • Attached to processor 500 is a memory circuit (e.g., RAM memory) in which has been stored web crawler engine code 506 .
  • Such code may be written in a variety of different computer languages, such as Python, C++, Java, and the like.
  • a publicly available web crawler system such as PolyBot, UbiCrawler, C-proc, Dominos, or the like may be used.
  • Processor 500 is thus programmed to crawl the network (Internet 502 ) to retrieve multimedia documents 507 , which processor 500 stores in an attached storage device 508 .
  • Storage device 508 may be configured as a database as discussed above.
  • each multimedia document is scored as discussed above.
  • a processor 500 a (which could be the same physical processor as processor 500 , or a different processor) executes score calculation code 514 stored in the memory 504 a attached to processor 500 a . If processor 500 a and processor 500 are the same device, memory 504 a may be an allocated portion of memory 504 .
  • the processor 500 a is configured to access the database of multimedia documents 507 .
  • processor 500 a may either access the same storage device 508 as attached to processor 500 , or it may have its own attached storage device 508 a .
  • In FIG. 8 , one graphical representation has been provided in association with both reference numerals 508 and 508 a to illustrate that the storage device functionality may be implemented using the same physical device or implemented as separate physical devices.
  • the processor 500 a uses intrinsic criteria and extrinsic criteria. These criteria are both stored in memory 504 a , as at 510 and 512 , respectively. As each multimedia document is scored, its calculated reliability score 516 is associated with that multimedia document and stored as part of the database record for that document within a storage device 508 b . Similar to the explanation above, if desired, the functionality of storage device 508 b can be implemented using the same physical storage device as connected to processor 500 a (and/or processor 500 ).
  • processor 500 b is coupled to the network (e.g. Internet 502 ).
  • Processor 500 b may be physically separate from processors 500 a and 500 , or it may be the same physical device as processors 500 and/or 500 a .
  • Attached to processor 500 b is memory 504 b in which executable web server code is stored. Suitable web server code may be implemented using publicly available Apache HTTP web server code, for example.
  • Processor 500 b is attached to storage device 508 b , which may be the same physical storage device as devices 508 a and/or 508 , or which may be a separate device, storing a copy of the data transferred from device 508 a .
  • the user or customer accesses the web site established on the network by processor 500 b and through this connection the user or customer enters his or her requests for data, specifying any special criteria as discussed above.
  • the processor 500 b delivers selected multimedia content from the corpus of multimedia documents 507 that meet the user or customer's requirements, as more fully explained above.
  • the executable instructions for some or all of the functions described above may be stored in non-transitory computer readable media.

Abstract

The apparatus, systems and methods dynamically provide the reliability of multimedia documents by applying a series of intrinsic criteria and extrinsic criteria by pre-calculating a reliability score for at least a set of multimedia documents of at least one pre-selected source of multimedia documents, and by providing, in response to a request, the multimedia documents from the pre-selected sources associated with the score and the multimedia documents from the other sources associated with a score conditionally calculated.

Description

    FIELD
  • The present invention relates to the field of apparatus, systems and methods for big data analysis, in order to score the reliability of online information with a high efficiency and a real-time availability of the processed information.
  • BACKGROUND
  • Consumers regularly consult information relating to almost any topic on the World Wide Web. However, a large volume of information of unpredictable quality is returned to the consumer. In order to qualify the information, different technologies have been developed.
  • For example, U.S. Pat. No. 2009/0125382 provides an indication of a data source's accuracy with respect to past expressed opinions. The data source is assigned predication scores based on the verified credibility of historical documents. A reputation score is assigned for a new document as a function of the predication scores from the historical documents, data source affiliations, document topics and other parameters.
  • Another example is U.S. Pat. No. 7,249,380 providing a model to evaluate trust and transitivity of trust of online services. The trust attributes are categorized in three categories, which relate to contents, owner of the web document and the relationships between the web document and certificate authorities.
  • U.S. Pat. No. 7,809,721 describes a system for ranking data including three calculations: firstly, the quantitative semantic similarity score calculation which shows the qualitative relevancy of the particular location to the query; secondly, the general quantitative score calculation which comprises a semantic similarity score, a distance score and a rating score; thirdly, the addition of the quantitative semantic similarity score and the general quantitative score to obtain a vector score.
  • However, all the existing methods work in a passive mode: the calculation is carried out only when a query is launched. These technologies take a long time for the calculation and are not optimized for the dynamic update of information on the World Wide Web. In the context of the fast development of social networks in particular, all users can constantly update information by broadcasting comments in all types of multimedia.
  • The technical difficulty is that a huge number of data and information sources have to be taken into account in order to calculate a relevant score, with the additional difficulty that the scope of information changes continuously. Calculating the score on the fly for a document requested by a user would need too many resources and too much calculation time. Furthermore, attributing a score to each document that may be requested by a user and refreshing all these scores every time a new document or piece of information becomes available is also too complicated.
  • SUMMARY
  • The present invention dynamically provides the reliability of multimedia documents by applying a series of intrinsic criteria and extrinsic criteria. All the pre-calculated reliability scores of a subset of the existing multimedia documents are stored in a database, so customers just need to retrieve the pre-calculated scores, which is less time-consuming than triggering the calculation process for the full set of existing documents. These scores can be updated according to the publishing of various sources, including the comments of social networks and communities. Additionally, all the subsets of multimedia documents are cross-checked among different sources.
  • The inventive subject matter provides apparatus, systems and methods in which multimedia documents are attributed a reliability score. One aspect of the inventive subject matter includes a method to provide a customer at least one multimedia document associated with a reliability score calculated by applying a first category of intrinsic criteria and a second category of extrinsic criteria. It comprises first steps of pre-calculating the reliability score for at least a set of multimedia documents of at least one pre-selected source of documents, second steps of updating this reliability score by applying the extrinsic criteria, and a last step for providing, in response to a customer's request, the multimedia documents from the pre-selected sources associated with the updated score and the multimedia documents from the other sources associated with a score conditionally calculated.
  • A calculation of the score for the multimedia documents from the other sources is activated by an action from the customer. It is also activated by the detection of at least one request coming from the customer device to get multimedia documents that do not have a pre-calculated score. In addition, the score of multimedia documents coming from the other sources is pre-calculated when a threshold of interest is reached. The pre-calculation of the score of the multimedia documents within a pre-selected source is prioritized according to an interest indicator, where the interest indicator is weighted by the number of requests and the measure of engagements.
  • The reliability scores are time-stamped and associated with the related time-stamped version of the documents. So when a discrepancy is detected between the time-stamp of the reliability score and the time-stamp of the related document, the pre-processing of the document is updated. This detection of discrepancies could be performed by the customer's device.
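The time-stamp discrepancy check described above can be sketched in a few lines of Python; the record fields and the `request_update` callback are illustrative assumptions, not part of the disclosure:

```python
def score_is_stale(document_timestamp, score_timestamp):
    """A pre-calculated score is stale when the document was
    updated after the score was computed."""
    return document_timestamp > score_timestamp

def check_document(document, score_record, request_update):
    """Compare the two time-stamps; on a discrepancy, notify the
    processing server via the callback and report no usable score."""
    if score_is_stale(document["updated_at"], score_record["computed_at"]):
        request_update(document["url"])
        return None  # the displayed score would be out of date
    return score_record["score"]
```

In this sketch the customer's device would call `check_document` before displaying a pre-calculated score.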
  • The representation of the pre-calculated scored documents is computed by the customer device through an aggregation of the multimedia documents coming from the source's servers and the related scores coming from the score processing server. The document acquisition can be performed by both the score processing server and the customer device.
  • The method of scoring the reliability of online information comprises additional steps of computing, for at least one source, a global reliability score of the source, based on the reliability score of its multimedia documents and the number of its visionary documents. The method for scoring the reliability of online information comprises a step of filtering according to a white list, so as to exclude the checking of an existing reliability score for multimedia documents coming from sources not belonging to the white list; or according to a black list, so as to exclude the checking of an existing reliability score for multimedia documents coming from sources belonging to the black list.
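The white/black list filtering step above might be sketched as follows; the function name and the list representation are assumptions made for illustration only:

```python
def needs_score_lookup(source, whitelist=None, blacklist=None):
    """Skip the check for an existing reliability score when the
    source is outside a configured white list or inside a
    configured black list, as described in the filtering step."""
    if blacklist and source in blacklist:
        return False
    if whitelist and source not in whitelist:
        return False
    return True
```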
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flow chart that describes the general steps for scoring the reliability of online information.
  • FIG. 2 is a flow chart that describes in detail the various steps for creating, updating and distributing the reliability score.
  • FIG. 3 is the diagram of the hardware for the implementation of the function for document retrieval.
  • FIG. 4 a is a flow chart that describes in detail the identification, consolidation and weighting of relevant words.
  • FIG. 4 b is an example of document partitioning, showing the different parts of a typical news document on the World Wide Web.
  • FIGS. 5 a and 5 b are flow charts that describe the document association using respectively technical classification and clustering.
  • FIG. 6 is the diagram of the hardware for the distribution of the reliability score through different clients.
  • FIG. 7 illustrates a display of a reliability score to customers.
  • FIG. 8 illustrates an exemplary computer (electronic circuit) hardware diagram.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • An example with detailed description is given hereafter, but the realization of this invention is not limited to the example illustrated.
  • FIG. 1 provides an overview of the general computer-implemented process for creating and distributing a reliability score. At step 100, the document is retrieved by computer from a medium (from a web site or collected via a proprietary API, for example). The document can be any multimedia information on the World Wide Web, e.g. a text, an image, an audio and/or a video recording. Then at step 200, this retrieved document is computer-analyzed according to the type of document, e.g. text, audio, video, etc. Afterwards at step 300, a reliability score is calculated by a computer according to the analyzed result of the documents. In this step, a series of intrinsic criteria and extrinsic criteria stored in the memory of the computer are applied successively to calculate the reliability score that qualifies the information. Finally at step 400, this reliability score is electronically distributed to different clients.
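Steps 100 through 400 can be pictured as a simple pipeline; the function parameters below are placeholders for the per-step components, not names used by the disclosure:

```python
def score_pipeline(retrieve, analyze, calculate_score, distribute, url):
    """Chain the four general steps of FIG. 1: document retrieval
    (100), content analysis (200), reliability score calculation
    (300) and score distribution (400)."""
    document = retrieve(url)            # step 100
    analysis = analyze(document)        # step 200
    score = calculate_score(analysis)   # step 300
    distribute(url, score)              # step 400
    return score
```

Each stage could run on a different server, as the hardware diagrams discussed later suggest.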
  • FIG. 2 describes in more details this multimedia document scoring process. The steps 110, 120 and 130, each computer-implemented, constitute the computer-implemented step 100 in FIG. 1. According to the interest of customers, a document selection step 110 is performed. It can be all kinds of information, as for example a news article.
  • The step 120 shows the document retrieval, which is the process of automatic collection of information from the World Wide Web or other sources. This could be achieved by connecting to proprietary APIs, by using existing RSS news feeds or by crawling the World Wide Web.
  • A multimedia document (for example, a news article) can be updated by the source after the reliability analysis has been performed. As the pre-calculated scores are time-stamped, customers are notified that the multimedia document has been updated but the score has not yet been re-calculated. The processing server is informed of this update in order to trigger a new cycle of multimedia document retrieval, analysis and reliability score calculation.
  • The step 130 illustrates the process of multimedia document cleaning, formatting and classifying according to its type (text, image, audio or video) before the next step of multimedia document analysis.
  • FIG. 3 presents the diagram of the hardware for the implementation of the function for document retrieval. 110 a, 110 b, 110 c and 110 d represent different document source servers, e.g. the server of the “Washington Post” or the server of “Fox News”. Different applications like RSS feeds, crawling and APIs collect the information from the servers via the network, as indicated by step 120 a. The collected information goes through the multimedia document retrieval and dispatcher server, as indicated by step 140, and is then classified either in the normal documents queue, as shown in step 150 a, or in the priority documents queue, as shown in step 150 b. This information serves to calculate the reliability score in the processing server, as indicated in step 160. Finally, this reliability score is saved in the databases, as indicated in step 170. For additional discussion of exemplary computer-implemented hardware configurations, see the discussion with reference to FIG. 8, below.
  • The document retrieval process is configured according to the frequency and the number of times a source can be solicited. Information that has already been processed must be frequently updated, over an undefined period of time. For example, this multimedia document retrieval tool is adapted so as to collect breaking news emails in order to quickly retrieve urgent information from email alerts sent by newspapers. This will allow a quick response to frequent changes of that information. Documents collected via breaking news will be automatically dispatched into the priority queue displayed in FIG. 3. Information is then assigned a new pre-calculated reliability score with every update.
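The dispatching into normal and priority queues might look as follows; the `breaking_news` flag is an assumed field used only for illustration:

```python
def dispatch(document, normal_queue, priority_queue):
    """Route a collected document: breaking-news items go to the
    priority queue (step 150b), everything else to the normal
    queue (step 150a)."""
    if document.get("breaking_news"):
        priority_queue.append(document)
    else:
        normal_queue.append(document)
```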
  • The computer-implemented step 200 of multimedia document content analysis contains steps 210, 220, 230 and 240, each being computer-implemented. This analysis is performed in different ways depending on the type of document. If the content is a text, then morphology, syntax and semantic analyses are performed, as indicated in step 220. If the content is in the form of an image, the analysis looks for cloning, masking, etc., as indicated in step 210. For audio documents, the content is transformed to text via speech-to-text technologies before applying the text analysis, as indicated in step 240. For video documents, the analysis is a combination of the image and audio analyses previously explained, as indicated by steps 230 and 231.
  • Morphology and syntax analyses are used to determine the category each word belongs to: whether a verb, an adjective, a noun, a preposition, etc. To do this it is often necessary to disambiguate between several possibilities. For example, the word ‘general’ can be either a noun or an adjective. The context helps disambiguate between the two meanings. During the relevant words identification and consolidation step 221 a, a named entity recognition is performed to locate and classify atomic elements in the text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc., as indicated in FIG. 4 a.
  • Here, the text analysis is taken as an example. FIG. 4 a illustrates how to identify the weight of words found in the document which is to be scored. Steps 221 a, 221 b, 221 c and 221 d serve to select the most relevant named entities and events for a given news document. The selection method relies on the contrast between two types of textual units: on the one hand the named entities that denote referential entities that are well-identified in the specific document (e.g. Organization, Location, Person), and on the other hand the terms that represent events.
  • Two levels of relevant named entities and events are extracted:
      • The first level relies on the relevant named entities and events that appear in the title and in the first paragraph of the document.
      • The second level relies on words co-occurring with the identified relevant named entities and events. The co-occurrence is the connection of two words together in the same sentence.
  • Different named entities and events have different relevance weightings according to their relevance to the content, as represented by step 221 e. The weighting takes into account the number of occurrences in the text: the more often a word appears in the text, the greater its weighting. For documents comprised of a title and a header, weighting will be heavier for:
  • Words that are found in the title.
  • Words that are found in the header.
  • Words that are semantically close to the title (for example, the concept “prison” is close to “justice”).
  • Finally, the step 221 f shows a multimedia document's detailed definition based on the relevance weighting calculated in the step 221 e.
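A toy version of the relevance weighting of step 221 e could look like this; the bonus values for title and header words are arbitrary illustrations, not values from the disclosure:

```python
def relevance_weight(word, title, header, body):
    """Count occurrences in the body, then add bonuses for words
    that also appear in the title or the header, reflecting the
    heavier weighting described above."""
    w = word.lower()
    weight = body.lower().split().count(w)
    if w in title.lower().split():
        weight += 3  # assumed title bonus
    if w in header.lower().split():
        weight += 2  # assumed header bonus
    return weight
```

A real implementation would weight named entities and events after morphological analysis, not raw whitespace-separated tokens.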
  • FIG. 4 b represents a typical online news article. Different parts of this article, which are represented in different colors, have different degrees of importance. For the document cleaning and formatting step 130, only the grey parts, e.g. title, date, author and text, are used to calculate the reliability score of the document. To carry out the named entities and events relevance weighting step 221 e, the words found in the title and in the first paragraph are considered more important than those in other paragraphs.
  • In the reliability score calculation, step 300, the computer receives as input the original document and the metadata of the detailed document definition obtained in the process of FIG. 4 a. It represents the calculation of the reliability score of a document by applying a first category of intrinsic criteria and a second category of extrinsic criteria. The intrinsic criteria are related to the document information itself: the content, the context in which the information is published, its date of publication/update, the use of conditional tense, lack of precision while providing the sources, etc. The extrinsic criteria include users-related criteria like the comments in related social networks, the cross-checking with other sources dealing with the same information so as to detect inconsistencies, etc.
  • The step 310 for the intrinsic criteria computer calculation depends on the criteria linked to the document itself, like the detection of conditional tense use, inconsistencies between the title and the body of the text, spelling mistakes and so on, which are described by the steps 311-314.
  • The step 311 of fact checking is an extrinsic criterion that can be calculated by the computer without document classification or clustering. All facts identified inside the document are verified with a knowledge database, e.g. created from corporate reporting, Wikipedia's infoboxes, and governmental figures, amongst others. Here is a non-exhaustive list:
  • Geographical data: Town T belongs to this or that country, River R goes through this or that continent, there are Y inhabitants in this or that country and so on.
  • Dates: when a public figure was born, when a company was founded, beginning and end of a political phase, when a place/invention was discovered, when an album was released and so on.
  • Corporate figures: stock values, forecasts, unemployment figures, taxes and so on.
  • Public figures: unemployment figures, taxes and so on
  • Characters/directors in films, authors, musicians.
  • People belonging to a political party, government, company, organization and so on.
  • If a multimedia document mentions a fact that contradicts the knowledge base, the computer lowers its reliability score.
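A minimal sketch of this fact-checking step, assuming facts are extracted as (subject, attribute, value) triples and the penalty per contradiction is fixed (both are assumptions for illustration):

```python
def fact_check_penalty(facts, knowledge_base, penalty=0.1):
    """Sum a fixed penalty for every extracted fact whose value
    contradicts the knowledge base; facts absent from the
    knowledge base are ignored rather than penalized."""
    total = 0.0
    for subject, attribute, value in facts:
        known = knowledge_base.get((subject, attribute))
        if known is not None and known != value:
            total += penalty
    return total
```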
  • The step 312 illustrates the detection of the conditional tense, an intrinsic criterion related to the document itself. When an author is unsure of the reliability of a piece of information, he may use the conditional tense to protect himself. Once the morphological, syntactic and semantic analyses have been carried out, it is necessary to analyze the conjugation of the verbs linked to the most relevant words. Identifying the verb carrying the meaning is crucial. It is necessary to simultaneously look for textual clues in the document (such as “it seems that”) bearing the same conditional function. If the verb tenses of the principal information are different in the title, header and text, but especially if the title is in the present or present perfect while the header and text are in the conditional, the document's reliability score will be lowered. Though titles may not contain a verb, some nouns can replace a verb. For example, in the title “looting in Montpellier supermarket”, the computer replaces the word “looting” with the expression “was looted”. Similarly, in a document with the title “Mr. X, possible candidate for town hall”, the word “possible” implies the information is in the conditional tense.
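A crude stand-in for this detection, in which the morphological analysis is replaced by a lookup of textual clues (the clue list is an illustrative assumption and far from exhaustive):

```python
CONDITIONAL_CLUES = ("it seems that", "would", "could", "might",
                     "possible", "reportedly")  # assumed, non-exhaustive

def uses_conditional(sentence):
    """Flag a sentence that carries a conditional verb form or a
    textual clue such as 'it seems that'."""
    s = sentence.lower()
    return any(clue in s for clue in CONDITIONAL_CLUES)
```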
  • The step 313 shows other criteria such as inconsistencies between the title and the content inside a text document.
  • If the document is not a text one, other criteria are applied, such as the detection of digitally modified images, as illustrated in step 314. When a news article contains a photo, it is necessary to check that it has not undergone substantial alterations such as cloning, masking, adding or deleting of fictional elements and so on. In the case of event-related news (for example, a terrorist act, strike or accident), it is necessary to verify when the photo was taken. The computer thus lowers the reliability score of the article if the photo was taken a long time before the events it illustrates. This may be relevant, for example, when photos are used to illustrate a strike.
  • Document thoroughness is another intrinsic criterion. A reliable document should contain the 5 “W” relating to its principal piece of information: Who, What, Where, When and Why about an event. If several “W” are missing, the document's reliability score will be lowered.
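The five-W thoroughness criterion could be scored as follows, with an assumed per-missing-W penalty (the disclosure does not fix a value):

```python
FIVE_WS = ("who", "what", "where", "when", "why")

def thoroughness_penalty(answered, per_missing=0.05):
    """`answered` maps each W to a boolean; every missing W
    lowers the document's reliability score by a fixed,
    illustrative penalty."""
    missing = sum(1 for w in FIVE_WS if not answered.get(w))
    return missing * per_missing
```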
  • After the step 310, the multimedia document individual analysis is finished. The step 320 represents the multimedia document classification and clustering. A cluster contains all multimedia documents coming from one or several sources dealing with the same information. For example, two multimedia documents created by different sources and describing the arrest of the same person will be integrated in the same cluster.
  • The aim of this task is to identify multimedia documents dealing with the same topic in order to group them together. Various elements will be used to achieve this. For example, a multimedia document announcing the launch of a new product by a company must not be associated with another multimedia document describing the acquisition of that company by another one. In order to identify the principal topic of a multimedia document, the detailed document definition described in FIG. 4 a is needed. In some cases, for two documents to be grouped together they must have close dates of creation. For example, for some topics such as sports, a 12-hour window can be adopted, while for other “long trend” topics the grouping window could reach several days.
  • There is no limit to the number of multimedia documents that can be associated in the same cluster. When the identification and consolidation of relevant named entities (persons, organizations, locations, etc.) and events of a multimedia document are finished, the computer system must classify them among the existing clusters:
      • If the multimedia document contains the same relevant named entities and events found in an existing cluster, it will be classified into that cluster. If there are inconsistencies between the new multimedia document and the already existing multimedia documents in the cluster, the processing of some extrinsic criteria will be triggered and thus the reliability score will be updated. If no inconsistencies are detected between the new multimedia document and the already existing multimedia documents in the cluster, the reliability score of the existing multimedia documents will not be affected.
      • If no existing cluster is identified as being relevant, the multimedia document will constitute a new cluster.
  • To avoid having isolated multimedia documents because they are being processed simultaneously and therefore cannot be associated in real time, a verification and consolidation process of existing clusters is necessary. A process of background consolidation of clusters (e.g. to find and unify multimedia documents that are isolated in different clusters but deal with the same topic) is triggered to improve the precision of results. When this process succeeds in unifying several clusters, some of the extrinsic criteria will be recalculated and thus the reliability score will also be updated.
  • Each cluster contains a representative multimedia document that defines most adequately the association. This avoids having to compare new entries with all the multimedia documents that are already present in a cluster. Of course when a cluster contains a single multimedia document, that multimedia document will be the representative. Multimedia documents using different languages are also clustered together. Inside a cluster, different sub-clusters can be created depending on their language, country of origin, sources and so on.
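The classification against cluster representatives might be sketched as below; the entity-overlap threshold is an assumption, since the disclosure does not fix one:

```python
def classify(doc_entities, clusters, min_shared=2):
    """Compare a new document only against each cluster's
    representative entities; join the first cluster with enough
    shared named entities, otherwise start a new cluster."""
    for cluster in clusters:
        if len(doc_entities & cluster["representative"]) >= min_shared:
            cluster["documents"].append(doc_entities)
            return cluster
    new_cluster = {"representative": set(doc_entities),
                   "documents": [doc_entities]}
    clusters.append(new_cluster)
    return new_cluster
```

Comparing only against the representative keeps the cost of classifying a new entry independent of cluster size, which is the motivation stated above.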
  • The steps mentioned above are presented in FIGS. 5 a and 5 b with flow charts for multimedia document classification process and clustering consolidation process respectively. The step 341 describes the end of document intrinsic analysis. Then the document intrinsic score, which is represented by the reliability score, is stored in the step 342. Subsequently, an assessment is done in step 343 to classify the document into the relevant cluster. When such a cluster already exists, the flow goes to “Yes”, the new multimedia document is added to this cluster as indicated in step 344. If there is no relevant cluster, the flow goes to “No” and a new cluster is created as indicated in step 345.
  • After the step 344, the newly added multimedia document is compared with the other multimedia documents already existing in the cluster. Extrinsic document criteria like omission and inconsistency are calculated, as indicated by step 346. Finally, a decision is made in step 347: whether to change the reliability score of the existing cluster multimedia documents. If “Yes”, the updated score is stored for all the multimedia documents in this cluster, as indicated in step 348; if “No”, only the score of the new multimedia document is saved, as shown in step 349.
  • FIG. 5 b represents an asynchronous cluster consolidation process to detect and merge equivalent clusters. It starts with the comparison of step 350: if there are equivalent clusters, the answer is “Yes” and the flow goes to step 351 to perform cluster consolidation and unification. If “No”, it goes to step 352 to end the process.
  • After step 351, the reliability scores of all the multimedia documents in the new unified cluster are re-calculated based on the inconsistencies between the documents. Finally, the decision whether to change the scores of the documents in the new cluster is made in step 354: if “Yes”, the concerned multimedia document ratings are updated by storing the updated reliability score in step 355; if “No”, only the score of the new multimedia document is stored in step 356.
  • For the steps 346 and 353 for the re-calculation of the reliability scores of the multimedia documents in a cluster, there are several possibilities, such as the omission of some information, inconsistencies, etc. The step 330 for the extrinsic criteria calculation depends on the criteria linked to other documents, like the reliability score of the sources the news comes from, the inconsistency relative to other documents in the same cluster, the comments on other social media and networks and so on, which are represented by the steps 331-334.
  • As shown in the step 331 of source score, a source can be the publisher of a newspaper, but it can also refer to a recognized author. In general, the score of a source will depend on the score obtained by the multimedia documents created by that source. For example, the score of a news website is constituted by the weighted average of the newspaper's various sections, and the score of the sections is constituted by the weighted average of the section's various multimedia documents over a period of time.
  • Each source has a reliability score that evolves according to the weighted ratings obtained by its various multimedia documents. The documents that were right from the beginning are called visionary documents and are given a better reliability. The reliability score of sources and authors evolves in time and is not based solely on the values manually assigned to them during the launching period.
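The weighted-average construction of a source score can be sketched as follows; the (score, weight) pair layout and the visionary bonus parameter are illustrative assumptions:

```python
def source_score(section_scores, visionary_bonus=0.0):
    """`section_scores` is a list of (score, weight) pairs, one
    per section; the source score is their weighted average,
    optionally raised by a bonus credited for visionary
    documents."""
    total_weight = sum(weight for _, weight in section_scores)
    if total_weight == 0:
        return 0.0
    average = sum(score * weight
                  for score, weight in section_scores) / total_weight
    return average + visionary_bonus
```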
  • As represented in the step 332 regarding inconsistencies between multimedia documents, the aim of this criterion is to detect factual information that varies from one document to another. If an inconsistency is detected, a warning is triggered on the reliability score.
  • Different types of inconsistencies can be detected: the first one verifies the information's meaning in order to detect, for example, if the tone of one document is positive about an event when another is negative. Another type of inconsistency concerns the facts relating to the words that have been identified in the text (non-exhaustive list):
      • Different figures (company forecasts, unemployment rate, and number of people on strike, etc.). The difference must be sufficiently large to be relevant.
      • Different locations.
      • Different names of people.
      • Different dates.
      • Different brand names.
      • Different genders (male/female).
        The application of this inconsistency criterion requires that at least two documents be associated in the same cluster.
  • The following is an example of inconsistency detection through the comparison of two different news articles. Let's say that in the first article we have the sentence “23 Egyptian policemen killed in Sinai Peninsula by suspected militants” and in the second article we have “Militants kill at least 24 police officers in Egypt”. The tool associates together both articles as it understands that “Sinai Peninsula” is compatible with “Egypt” and it detects there is an inconsistency between the “23 Egyptian policemen” and the “at least 24 police officers” properties extracted from the two sentences.
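For the numeric case, the requirement that a difference be “sufficiently large to be relevant” suggests a relative tolerance; the default value below is an assumption, not a figure from the disclosure:

```python
def figures_inconsistent(value_a, value_b, tolerance=0.05):
    """Flag two figures as inconsistent only when their relative
    gap exceeds the tolerance, so small reporting differences
    (e.g. rounding) are ignored."""
    if value_a == value_b:
        return False
    return abs(value_a - value_b) / max(abs(value_a), abs(value_b)) > tolerance
```

With a tight enough tolerance, the 23 vs. “at least 24” policemen example above would be flagged, while figures differing only marginally would not.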
  • There are many other criteria, represented by step 333; one of them is rumor detection, which is another important criterion. Although rumors sometimes turn out to be true, the fact that a piece of information is a rumor necessarily raises doubts. The idea is not just to look for the word rumor or hoax in the text but to detect whether the principal piece of information is centered on a rumor, and whether the author is sure of its reliability.
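One naive way to approximate the distinction drawn above, between a document centered on a rumor and one that merely mentions a rumor, is to look for rumor markers only in the headline and lead sentence. The marker list and the headline-plus-lead heuristic are assumptions for illustration, not the detection method actually claimed.

```python
RUMOR_MARKERS = ("rumor", "rumour", "hoax", "unconfirmed", "allegedly",
                 "reportedly", "sources say")

def rumor_centered(headline, body):
    """Return True when the principal information appears centered on a rumor.

    Heuristic proxy: markers in the headline or the first sentence suggest
    the main claim itself is rumor-based; markers buried later in the body
    merely mention a rumor and are ignored here.
    """
    lead = body.split(".", 1)[0]
    principal = (headline + " " + lead).lower()
    return any(marker in principal for marker in RUMOR_MARKERS)
```

A headline such as "Merger rumor swirls around Acme" is flagged, whereas a factual article that later denies a rumor is not.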
  • User-related criteria calculation is part of the extrinsic criteria. Users here are meant in the general sense: social networks, comments written on information websites, or the opinion of a community. The idea behind these criteria is to gauge the temperature of a community regarding the reliability of a multimedia document, as represented by step 334. In conclusion, and returning to FIG. 2, step 340 represents the final multimedia document scoring: the process that takes into account and weights all the editorial criteria (both intrinsic and extrinsic) analyzed during the process.
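The final weighted scoring of step 340 can be sketched as a normalized weighted mean over per-criterion scores. The criterion names and weight values below are hypothetical stand-ins; the patent does not disclose the actual weighting.

```python
def final_score(criteria, weights):
    """Weighted combination of intrinsic and extrinsic criterion scores.

    `criteria` maps criterion name -> score in [0, 1]; `weights` maps the
    same names -> relative importance. All names and values here are
    illustrative assumptions, not those of the actual system.
    """
    total_w = sum(weights[name] for name in criteria)
    return sum(criteria[name] * weights[name] for name in criteria) / total_w

# Hypothetical criteria echoing steps 331-334: source reliability,
# inconsistency warnings, rumor detection and community temperature.
weights = {"source": 3.0, "inconsistency": 2.0, "rumor": 1.0, "community": 1.0}
criteria = {"source": 0.8, "inconsistency": 1.0, "rumor": 1.0, "community": 0.6}
score = final_score(criteria, weights)   # weighted mean of the four criteria
```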
  • Returning to FIG. 1, step 400 illustrates the reliability score distribution. The main goal is to display a pre-calculated reliability score associated with a multimedia document. A tool to distribute and display the reliability score must retrieve this score and its associated metadata from a distant database and display them on a device. This tool can be a web browser extension or any other multimedia application compatible with devices such as a PC, mobile phone, tablet, TV, radio, etc. These tools can also integrate the functions of update, retrieval and aggregation of multimedia documents.
  • FIG. 7 illustrates an example of such a web browser extension or “add-on” 410 rating the content in a webpage 420. The content of the news is chosen arbitrarily as an example and is of no importance to understanding this invention.
  • The add-on 410 is composed of several parts: the header 411, the overall reliability score 412 and the set of intrinsic and extrinsic criteria 413. Via the element indicated as 415, the user is able to quickly give his or her opinion regarding the reliability of the document. Additionally, 416 indicates the related multimedia documents as well as related social network messages such as tweets.
  • Every time a customer browses a web page that contains a multimedia document with a pre-calculated score in the database, an icon is automatically positioned in the document to notify the customer that the pre-calculated score is available. When the customer clicks on this icon, a pop-up window appears and provides a first level of information on the reliability of the multimedia document. This score is always time-stamped. A “more information” link redirects the customer to a website where extra information, links to related documents and the community's comments are available.
  • Once the add-on is installed on the customer's browser, it is easily accessible thanks to an icon dynamically positioned in the multimedia document as well as a button in one of the browser's menus. When the pre-calculated score is not available for a multimedia document, customers can manually request its processing via the add-on.
  • Every time a customer browses a web page, a request is made to the reliability score database to check whether a pre-calculated score is available. In order to avoid querying the database for multimedia documents that are of no interest to analyze (web mail pages, e-commerce websites, etc.), the add-on filters up-front, using both a whitelist and a blacklist, the document sources that have already been rated by the system. In addition, customers can optionally configure the sources they want to be processed.
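The up-front filter described above can be sketched as a host check against both lists before any database request is made. The suffix-matching rule and the list contents are illustrative assumptions; the add-on's actual matching logic is not specified.

```python
from urllib.parse import urlparse

def should_query(url, whitelist, blacklist):
    """Decide whether the add-on should query the score database at all.

    Pages from blacklisted hosts (web mail, e-commerce, ...) are skipped
    up-front; only hosts matching the whitelist of already-rated sources
    are looked up. Hosts on neither list fall through to a manual request.
    """
    host = urlparse(url).netloc.lower()
    if any(host.endswith(b) for b in blacklist):
        return False
    return any(host.endswith(w) for w in whitelist)
```

A news page on a rated source is queried; a web-mail inbox or an unknown shop is not.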
  • Score distribution can also take the form of an interactive widget, included in websites, displaying the reliability scores of multimedia documents and sources. The display consists of a listing of the reliability scores of the different multimedia documents, offering the possibility to see progressions and regressions in the scores or rankings over time. Score distribution can likewise take the form of a daily, weekly or monthly newsletter showing the reliability scores and rankings of multimedia documents and sources, or of a real-time alerting service informing of those scores and rankings.
  • The diagram of the hardware for the distribution of the reliability score to different customers is represented in FIG. 6. On one hand, the documents are retrieved from document source servers 1, 2 . . . N; on the other hand, the reliability scores are read from the processing server. On the different client applications, e.g. PC, smartphone, tablet or TV, the documents are displayed with the corresponding reliability scores. Additionally, the reliability score processing server contains several sub-databases: user databases, white/blacklist databases, reliability score databases and knowledge databases.
  • The methods and apparatus or system to provide a customer with multimedia documents tagged with a reliability score are computer-implemented, as discussed above. While a variety of different computer hardware (electronic circuit) embodiments are envisioned, FIG. 8 illustrates some of these.
  • As shown in FIG. 8, the processor 500 of a multimedia document retrieval computer or server is coupled to communicate with a network, such as the Internet 502. Attached to processor 500 is a memory circuit 504 (e.g., RAM memory) in which web crawler engine code 506 has been stored. Such code may be written in a variety of different computer languages, such as Python, C++, Java, and the like. Alternatively, a publicly available web crawler system such as PolyBot, UbiCrawler, C-proc, Dominos, or the like may be used.
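The core step such crawler code repeats, extracting the hyperlinks of one fetched page to extend the frontier of URLs to visit, can be sketched with the Python standard library alone. This is a minimal illustration of the crawl loop's link-extraction step, not the engine stored at 506.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute hyperlinks from one page -- the step a crawler
    repeats while walking its frontier of URLs to visit."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

parser = LinkExtractor("http://example.com/news/")
parser.feed('<a href="story1.html">One</a> <a href="/about">About</a>')
```

After the `feed` call, `parser.links` holds the two absolute URLs, ready to be queued for retrieval.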
  • Processor 500 is thus programmed to crawl the network (Internet 502) to retrieve multimedia documents 507, which processor 500 stores in an attached storage device 508. Storage device 508 may be configured as a database as discussed above.
  • Once a relevant corpus of multimedia documents 507 has been collected and stored, each multimedia document is scored as discussed above. To perform such scoring, a processor 500 a (which could be the same physical processor as processor 500, or a different processor) executes score calculation code 514 stored in the memory 504 a attached to processor 500 a. If processor 500 a and processor 500 are the same device, memory 504 a may be an allocated portion of memory 504.
  • The processor 500 a is configured to access the database of multimedia documents 507. Thus processor 500 a may either access the same storage device 508 attached to processor 500, or it may have its own attached storage device 508 a. In FIG. 8, one graphical representation has been provided in association with both reference numerals 508 and 508 a to illustrate that the storage device functionality may be implemented using the same physical device or as separate physical devices.
  • In executing the score calculation code, based on the score calculation discussion above, the processor 500 a uses intrinsic criteria and extrinsic criteria. These criteria are both stored in memory 504 a, as at 510 and 512, respectively. As each multimedia document is scored, its calculated reliability score 516 is associated with that multimedia document and stored as part of the database record for that document within a storage device 508 b. Similar to the explanation above, if desired, the functionality of storage device 508 b can be implemented using the same physical storage device as connected to processor 500 a (and/or processor 500).
  • With the corpus of multimedia documents 507 now each having an associated reliability score 516, they are ready to be accessed by a user or customer. This may be effected by providing access via a web server. To implement this, processor 500 b is coupled to the network (e.g. Internet 502). Processor 500 b may be physically separate from processors 500 a and 500, or it may be the same physical device as processors 500 and/or 500 a. Attached to processor 500 b is memory 504 b in which executable web server code 518 is stored. Suitable web server code may be implemented using publicly available Apache HTTP web server code, for example.
  • Processor 500 b is attached to storage device 508 b, which may be the same physical storage device as devices 508 a and/or 508, or which may be a separate device storing a copy of the data transferred from device 508 a. The user or customer accesses the web site established on the network by processor 500 b, and through this connection enters his or her requests for data, specifying any special criteria as discussed above. The processor 500 b delivers selected multimedia content from the corpus of multimedia documents 507 that meets the user or customer's requirements, as more fully explained above. If desired, the executable instructions for some or all of the functions described above (e.g., the executable code stored at 506, 514 and 518), as well as the data structures and schema of the database configured within storage device(s) 508, 508 a and 508 b, and the data structure definitions in which the intrinsic criteria 510 and extrinsic criteria 512 are stored, may be stored in non-transitory computer readable media.
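The request path served by processor 500 b, looking a document up by URL and returning its time-stamped score or a "not scored" response, can be sketched as a payload builder over an in-memory stand-in for the score database. The dictionary contents, field names and URL key are all hypothetical.

```python
import json

# Illustrative in-memory stand-in for the reliability score database
# held in storage device 508b, keyed by document URL.
SCORES = {
    "http://example.com/article-1": {"score": 0.82,
                                     "timestamp": "2013-09-27T12:00:00Z"},
}

def score_response(url):
    """Build the JSON payload a web server (processor 500b) would return
    for a customer's request; a "not_scored" payload when no score exists,
    which is where the add-on's manual-processing request would kick in."""
    record = SCORES.get(url)
    if record is None:
        return json.dumps({"url": url, "status": "not_scored"})
    return json.dumps({"url": url, "status": "ok", **record})
```

A browser add-on would issue this lookup for each eligible page and render the score and timestamp in its pop-up.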
  • The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.

Claims (22)

What is claimed is:
1. A method to provide a customer with at least one multimedia document associated with a reliability score calculated by applying a first category of intrinsic criteria in real time and then a second category of extrinsic criteria, the method comprising initial steps of pre-calculating the reliability score for at least a set of multimedia documents coming from at least one pre-selected source of multimedia documents, and later steps of providing, in response to a request, the multimedia documents from the pre-selected sources associated with the score and the multimedia documents from the other sources associated with a score conditionally calculated.
2. The method according to claim 1 further comprising updating this reliability score by applying the extrinsic criteria.
3. The method according to claim 1 further comprising activating a calculation of the score for the multimedia documents from the other sources by an action from the customer.
4. The method according to claim 1 further comprising activating a calculation of the score for the multimedia documents from the other sources by the detection of at least one request coming from the customer's device in the case of getting multimedia documents without a pre-calculated score.
5. The method according to claim 3 wherein the scores of multimedia documents coming from the other sources are pre-calculated when a threshold of interest is reached.
6. The method according to claim 1, further comprising prioritizing the processing of pre-calculation of the reliability score of the multimedia documents within a pre-selected source according to an interest indicator.
7. The method according to claim 6 further comprising weighting the interest indicator by the number of requests.
8. The method according to claim 6 further comprising weighting the interest indicator by the measure of engagements.
9. The method according to claim 1 further comprising time-stamping the reliability scores and associating the scores to the related time-stamped version of the multimedia documents.
10. The method according to claim 9 wherein when a discrepancy is detected between the time-stamp of the reliability score and the time-stamp of the related document, the pre-processing of the multimedia document is updated.
11. The method according to claim 10 further comprising performing the detection of the discrepancy by the customer's device.
12. The method according to claim 1 further comprising computing the representation of the scored documents by the customer's device by an aggregation of the multimedia documents coming from sources servers and the related scores coming from the score processing server.
13. The method according to claim 1 further comprising performing the multimedia document acquisition by the customer's device and the score processing server.
14. The method according to claim 1 further comprising computing, for at least one source, a global reliability score for the source, based on the reliability score of its multimedia documents.
15. The method according to claim 1 further comprising computing, for at least one source, a global reliability score for the source, based on the number of visionary multimedia documents that it has published.
16. The method according to claim 1 wherein the global reliability score of the source dynamically evolves in time.
17. The method according to claim 1 further comprising filtering in order to block the displaying of multimedia documents with a reliability score below a certain threshold.
18. The method according to claim 1 further comprising filtering, according to a white list, to exclude the checking for an existing reliability score for the multimedia documents coming from the sources not belonging to the white list.
19. The method according to claim 1 further comprising filtering, according to a black list, to exclude the checking for an existing reliability score for documents coming from the sources belonging to the black list.
20. A computer readable medium operable to execute the following steps on a processor of a computer, the computer readable medium comprising:
pre-calculating the reliability score for at least a set of multimedia documents coming from at least one pre-selected source of multimedia documents; and
providing, in response to a request, the multimedia documents from the pre-selected sources associated with the updated score and the multimedia documents from the other sources associated with a score conditionally calculated.
21. The computer readable medium according to claim 20 wherein it is operable to execute at least an additional step of updating this reliability score by applying the extrinsic criteria.
22. A system for the processing of information to provide a customer with at least one multimedia document associated with a reliability score calculated by applying a first category of intrinsic criteria and a second category of extrinsic criteria, comprising:
a source of a machine-readable specification of intrinsic criteria and extrinsic criteria; and
at least one server engine, coupled with a user interface engine and with the source of a machine-readable specification, the server engine including resources to browse document sources in order to get newly published multimedia documents and to process the first steps for the pre-calculating the reliability score for at least one set of multimedia documents coming from at least one pre-selected source of multimedia documents, and steps of providing, in response to a request, the multimedia documents from the pre-selected sources associated with the updated score and the multimedia documents from the other sources associated with a score conditionally calculated.
US14/039,333 2013-09-27 2013-09-27 Apparatus, systems and methods for scoring the reliability of online information Abandoned US20150095320A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US14/039,333 US20150095320A1 (en) 2013-09-27 2013-09-27 Apparatus, systems and methods for scoring the reliability of online information
PCT/EP2014/070331 WO2015044179A1 (en) 2013-09-27 2014-09-24 Apparatus, systems and methods for scoring and distributing the reliability of online information
US15/024,574 US10169424B2 (en) 2013-09-27 2014-09-24 Apparatus, systems and methods for scoring and distributing the reliability of online information
US16/190,824 US10915539B2 (en) 2013-09-27 2018-11-14 Apparatus, systems and methods for scoring and distributing the reliablity of online information
US17/141,720 US11755595B2 (en) 2013-09-27 2021-01-05 Apparatus, systems and methods for scoring and distributing the reliability of online information
US18/123,584 US20230252034A1 (en) 2013-09-27 2023-03-20 Apparatus, systems and methods for scoring and distributing the reliablity of online information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/039,333 US20150095320A1 (en) 2013-09-27 2013-09-27 Apparatus, systems and methods for scoring the reliability of online information

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US15/024,574 Continuation-In-Part US10169424B2 (en) 2013-09-27 2014-09-24 Apparatus, systems and methods for scoring and distributing the reliability of online information
PCT/EP2014/070331 Continuation-In-Part WO2015044179A1 (en) 2013-09-27 2014-09-24 Apparatus, systems and methods for scoring and distributing the reliability of online information

Publications (1)

Publication Number Publication Date
US20150095320A1 true US20150095320A1 (en) 2015-04-02

Family

ID=51619171

Family Applications (3)

Application Number Title Priority Date Filing Date
US14/039,333 Abandoned US20150095320A1 (en) 2013-09-27 2013-09-27 Apparatus, systems and methods for scoring the reliability of online information
US17/141,720 Active 2034-01-16 US11755595B2 (en) 2013-09-27 2021-01-05 Apparatus, systems and methods for scoring and distributing the reliability of online information
US18/123,584 Pending US20230252034A1 (en) 2013-09-27 2023-03-20 Apparatus, systems and methods for scoring and distributing the reliablity of online information

Family Applications After (2)

Application Number Title Priority Date Filing Date
US17/141,720 Active 2034-01-16 US11755595B2 (en) 2013-09-27 2021-01-05 Apparatus, systems and methods for scoring and distributing the reliability of online information
US18/123,584 Pending US20230252034A1 (en) 2013-09-27 2023-03-20 Apparatus, systems and methods for scoring and distributing the reliablity of online information

Country Status (2)

Country Link
US (3) US20150095320A1 (en)
WO (1) WO2015044179A1 (en)

US8666961B1 (en) * 2010-03-19 2014-03-04 Waheed Qureshi Platform for generating, managing and sharing content clippings and associated citations
US9002700B2 (en) 2010-05-13 2015-04-07 Grammarly, Inc. Systems and methods for advanced grammar checking
US8625907B2 (en) * 2010-06-10 2014-01-07 Microsoft Corporation Image clustering
US8775400B2 (en) 2010-06-30 2014-07-08 Microsoft Corporation Extracting facts from social network messages
US20120023145A1 (en) * 2010-07-23 2012-01-26 International Business Machines Corporation Policy-based computer file management based on content-based analytics
US9659313B2 (en) * 2010-09-27 2017-05-23 Unisys Corporation Systems and methods for managing interactive features associated with multimedia content
US20120078691A1 (en) * 2010-09-27 2012-03-29 Johney Tsai Systems and methods for providing multimedia content editing and management tools
US20120078712A1 (en) * 2010-09-27 2012-03-29 Fontana James A Systems and methods for processing and delivery of multimedia content
US20120078899A1 (en) * 2010-09-27 2012-03-29 Fontana James A Systems and methods for defining objects of interest in multimedia content
US9332319B2 (en) * 2010-09-27 2016-05-03 Unisys Corporation Amalgamating multimedia transcripts for closed captioning from a plurality of text to speech conversions
US20120102405A1 (en) 2010-10-25 2012-04-26 Evidence-Based Solutions, Inc. System and method for matching person-specific data with evidence resulting in recommended actions
US9449024B2 (en) * 2010-11-19 2016-09-20 Microsoft Technology Licensing, Llc File kinship for multimedia data tracking
US8301640B2 (en) * 2010-11-24 2012-10-30 King Abdulaziz City For Science And Technology System and method for rating a written document
US8396876B2 (en) * 2010-11-30 2013-03-12 Yahoo! Inc. Identifying reliable and authoritative sources of multimedia content
US20120191753A1 (en) 2011-01-20 2012-07-26 John Nicholas Gross System & Method For Assessing & Responding to Intellectual Property Rights Proceedings/Challenges
US20120198319A1 (en) 2011-01-28 2012-08-02 Giovanni Agnoli Media-Editing Application with Video Segmentation and Caching Capabilities
US9244818B1 (en) * 2011-03-29 2016-01-26 Amazon Technologies, Inc. Automated selection of quality control tests to run on a software application
US20120272143A1 (en) 2011-04-22 2012-10-25 John Gillick System and Method for Audience-Vote-Based Copyediting
WO2012167365A1 (en) * 2011-06-07 2012-12-13 In Situ Media Corporation System and method for identifying and altering images in a digital video
US9087048B2 (en) 2011-06-10 2015-07-21 Linkedin Corporation Method of and system for validating a fact checking system
US9176957B2 (en) 2011-06-10 2015-11-03 Linkedin Corporation Selective fact checking method and system
US20120317046A1 (en) 2011-06-10 2012-12-13 Myslinski Lucas J Candidate fact checking method and system
US9015037B2 (en) 2011-06-10 2015-04-21 Linkedin Corporation Interactive fact checking system
US8185448B1 (en) 2011-06-10 2012-05-22 Myslinski Lucas J Fact checking method and system
US20130159127A1 (en) 2011-06-10 2013-06-20 Lucas J. Myslinski Method of and system for rating sources for fact checking
US20130036353A1 (en) * 2011-08-05 2013-02-07 At&T Intellectual Property I, L.P. Method and Apparatus for Displaying Multimedia Information Synchronized with User Activity
JP5367031B2 (en) * 2011-08-11 2013-12-11 株式会社ソニー・コンピュータエンタテインメント Information processing method and information processing apparatus
US20130110748A1 (en) 2011-08-30 2013-05-02 Google Inc. Policy Violation Checker
EP2769540B1 (en) * 2011-10-20 2018-11-28 Dolby Laboratories Licensing Corporation Method and system for video equalization
US9069648B2 (en) * 2012-01-25 2015-06-30 Martin Kelly Jones Systems and methods for delivering activity based suggestive (ABS) messages
US9449089B2 (en) * 2012-05-07 2016-09-20 Pixability, Inc. Methods and systems for identifying distribution opportunities
US8861932B2 (en) * 2012-05-18 2014-10-14 At&T Mobility Ii Llc Video service buffer management
US20130317891A1 (en) * 2012-05-24 2013-11-28 Rawllin International Inc. Content rating and weighting system
US10303723B2 (en) * 2012-06-12 2019-05-28 Excalibur Ip, Llc Systems and methods involving search enhancement features associated with media modules
US20130346160A1 (en) 2012-06-26 2013-12-26 Myworld, Inc. Commerce System and Method of Using Consumer Feedback to Invoke Corrective Action
JP6120169B2 (en) * 2012-07-25 2017-04-26 パナソニックIpマネジメント株式会社 Image editing device
US9292552B2 (en) * 2012-07-26 2016-03-22 Telefonaktiebolaget L M Ericsson (Publ) Apparatus, methods, and computer program products for adaptive multimedia content indexing
US9461876B2 (en) * 2012-08-29 2016-10-04 Loci System and method for fuzzy concept mapping, voting ontology crowd sourcing, and technology prediction
US10515412B2 (en) 2012-09-11 2019-12-24 Sage Decision Systems, Llc System and method for calculating future value
KR20140038577A (en) * 2012-09-11 2014-03-31 한국과학기술연구원 Recommendation for multimedia contents by using metadata
US20140098899A1 (en) * 2012-10-05 2014-04-10 Cheetah Technologies, L.P. Systems and processes for estimating and determining causes of video artifacts and video source delivery issues in a packet-based video broadcast system
US20140113258A1 (en) * 2012-10-20 2014-04-24 Elizabethtown College Electronic Hand Assessment Tool and Method of Using the Same
US9258353B2 (en) * 2012-10-23 2016-02-09 Microsoft Technology Licensing, Llc Multiple buffering orders for digital content item
US8805114B2 (en) * 2012-11-27 2014-08-12 Texas Instruments Incorporated Content adaptive edge and detail enhancement for image and video processing
US9436517B2 (en) * 2012-12-28 2016-09-06 Microsoft Technology Licensing, Llc Reliability-aware application scheduling
CN104937844B (en) * 2013-01-21 2018-08-28 杜比实验室特许公司 Optimize loudness and dynamic range between different playback apparatus
US20140255003A1 (en) * 2013-03-05 2014-09-11 Google Inc. Surfacing information about items mentioned or presented in a film in association with viewing the film
WO2014138115A1 (en) * 2013-03-05 2014-09-12 Pierce Global Threat Intelligence, Inc Systems and methods for detecting and preventing cyber-threats
US9177072B2 (en) 2013-03-14 2015-11-03 Facebook, Inc. Social cache
US8937686B2 (en) * 2013-03-14 2015-01-20 Drs Rsta, Inc. Ultra low latency video fusion
US8990638B1 (en) * 2013-03-15 2015-03-24 Digimarc Corporation Self-stabilizing network nodes in mobile discovery system
US9378065B2 (en) * 2013-03-15 2016-06-28 Advanced Elemental Technologies, Inc. Purposeful computing
US20140281012A1 (en) * 2013-03-15 2014-09-18 Francois J. Malassenet Systems and methods for identifying and separately presenting different portions of multimedia content
US10133816B1 (en) * 2013-05-31 2018-11-20 Google Llc Using album art to improve audio matching quality
US20150020106A1 (en) * 2013-07-11 2015-01-15 Rawllin International Inc. Personalized video content from media sources

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100332583A1 (en) * 1999-07-21 2010-12-30 Andrew Szabo Database access system
US20100023525A1 (en) * 2006-01-05 2010-01-28 Magnus Westerlund Media container file management
US8225164B2 (en) * 2006-01-05 2012-07-17 Telefonaktiebolaget Lm Ericsson (Publ) Media container file management
US20080109285A1 (en) * 2006-10-26 2008-05-08 Mobile Content Networks, Inc. Techniques for determining relevant advertisements in response to queries

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200053409A1 (en) * 2009-12-18 2020-02-13 Crossbar Media Group, Inc Systems and Methods for Automated Extraction of Closed Captions in Real Time or Near Real-Time and Tagging of Streaming Data for Advertisements
US20150302316A1 (en) * 2014-04-22 2015-10-22 Google Inc. System and method for determining unwanted phone messages
WO2017047876A1 (en) * 2015-09-18 2017-03-23 충북대학교 산학협력단 Reliability evaluation method and system on basis of user activity analysis on social media
US10546034B2 (en) 2015-09-18 2020-01-28 Chungbuk National University Industry Academic Cooperation Foundation Method and system for evaluating reliability based on analysis of user activities on social medium
US10762122B2 (en) * 2016-03-18 2020-09-01 Alibaba Group Holding Limited Method and device for assessing quality of multimedia resource
CN107229624A (en) * 2016-03-23 2017-10-03 Baidu Online Network Technology (Beijing) Co., Ltd. Page providing method and page providing apparatus
US20210286988A1 (en) * 2018-07-31 2021-09-16 Claus Eichmann Computer-implemented method for detecting document content from a document
US20210019304A1 (en) * 2019-07-15 2021-01-21 fakeOut Ltd. System and method retrieving, analyzing, evaluating and concluding data and sources

Also Published As

Publication number Publication date
US20230252034A1 (en) 2023-08-10
US20210182301A1 (en) 2021-06-17
WO2015044179A1 (en) 2015-04-02
US11755595B2 (en) 2023-09-12

Similar Documents

Publication Publication Date Title
US11755595B2 (en) Apparatus, systems and methods for scoring and distributing the reliability of online information
US10915539B2 (en) Apparatus, systems and methods for scoring and distributing the reliability of online information
US9923931B1 (en) Systems and methods for identifying violation conditions from electronic communications
Castillo Big crisis data: social media in disasters and time-critical situations
US9535911B2 (en) Processing a content item with regard to an event
US10146878B2 (en) Method and system for creating filters for social data topic creation
US9323826B2 (en) Methods, apparatus and software for analyzing the content of micro-blog messages
US20190286676A1 (en) Contextual content collection, filtering, enrichment, curation and distribution
JP6538277B2 (en) Identify query patterns and related aggregate statistics among search queries
KR20160021110A (en) Text matching device and method, and text classification device and method
US11443006B2 (en) Intelligent browser bookmark management
Andrews et al. Organised crime and social media: a system for detecting, corroborating and visualising weak signals of organised crime online
Troudi et al. A new mashup based method for event detection from social media
US11423223B2 (en) Dynamic creation/expansion of cognitive model dictionaries based on analysis of natural language content
CN109933709B (en) Public opinion tracking method and device for video text combined data and computer equipment
Gopal et al. Machine learning based classification of online news data for disaster management
Bastin et al. Media Corpora, Text Mining, and the Sociological Imagination: A free software text mining approach to the framing of Julian Assange by three news agencies using R.TeMiS
AU2018273369A1 (en) Automated classification of network-accessible content
US20230090601A1 (en) System and method for polarity analysis
US8195458B2 (en) Open class noun classification
CN109902099B (en) Public opinion tracking method and device based on graphic and text big data and computer equipment
KR101487297B1 (en) Web page contents confirmation system and method using category classification
Kim et al. Predicting the scale of trending topic diffusion among online communities
van Hoof et al. Googling Politics? The Computational Identification of Political and News-related Searches from Web Browser Histories
Fujino et al. Finding similar tweets and similar users by applying document similarity to twitter streaming data

Legal Events

Date Code Title Description
AS Assignment

Owner name: TROOCLICK FRANCE, FRANCE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MOTTE, STANISLAS;RUTI, RAMON;JACOLIN, ARNAUD;AND OTHERS;REEL/FRAME:032064/0789

Effective date: 20130927

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: MYSLINSKI, LUCAS J, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TROOCLICK FRANCE;REEL/FRAME:038096/0250

Effective date: 20160323

AS Assignment

Owner name: TROOCLICK FRANCE, FRANCE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MOTTE, STANISLAS;RUTI, RAMON;JACOLIN, ARNAUD;AND OTHERS;REEL/FRAME:047515/0680

Effective date: 20130927