CA2443036A1

CA2443036A1 - System and method for improved searching on the internet or similar networks and especially improved metanews and/or improved automatically generated newspapers.

Info

Publication number: CA2443036A1
Application number: CA002443036A
Authority: CA
Inventors: Yaron Mayer
Original assignee: Individual
Current assignee: Individual
Priority date: 2003-09-14
Filing date: 2003-09-14
Publication date: 2005-03-14

Abstract

Google has recently made available at http://news.google.com an automated "newspaper", which continuously searches about 4,500 news sources, and lets users view automatically generated headlines in a few general areas or lets users search for news by keywords. The automatic determination of which news items or news stories are most important is done by 3 main criteria: in how many sources the news item appeared, how important the news sources in which it appeared are, and how close it is to the top in each of these news sources. However, many problems still remain, such as for example: a. The choice of a single main news source and a single image for each item seems arbitrary to the user and limits the user. b. If the user clicks on the "related items" link for that item, the user always gets a linear list of typically hundreds or even more than a thousand links to related news items, sorted either by relevance or by time. However, the new list is now without any images and without any clustering, so that many times news stories that are about the same event, or are even identical, may appear at different positions in the list of related links, and various other news items may appear between them and are typically also dispersed in various places. This makes it very hard for the user to take advantage efficiently of the list of related items. The present invention solves the above problem by creating recursive clustering, so that preferably at any level in the tree the user can preferably either choose a specific news item from the cluster or from the shown sub-clusters or continue in the tree. Another improvement is that searching the Meta News by keywords can generate an automatic newspaper in a way similar to the original automatically generated newspaper. Many additional improvements to the concept of automated newspapers and/or news MetaSearch are also shown. Other improvements are suggested for improved shareware MetaSearch and for improved Web pages search.

Description

14/09/03 Yaron Mayer 2/50
Background of the invention
Field of the invention:
The present invention relates to improved searching on the Internet or similar networks and especially Meta News and/or improved automatically generated newspapers, and more specifically to a system and method for improved automatic collection and displaying of news items on the Internet.
Background
The Internet makes it possible for users to access vast amounts of information, thus becoming effectively the world's largest library and the world's largest database. This opens up fascinating new possibilities, such as for example automatically accessing a huge amount of news sources in order to present to the user for example an automatically edited "newspaper", which automatically selects the most important events or news items according to various criteria. However, one of the biggest problems is integrating efficiently vast amounts of information and analyzing it.
Google has recently made available at http://news.google.com an automated "newspaper", which continuously searches about 4,500 news sources, and lets users view automatically generated headlines in one of a few general areas (which are currently: Top Stories, World, US, Business, Sci/Tech, Sports, Entertainment and Health), or one newspaper divided into the above sections, or lets users search for news by keywords. In addition, users can choose between a number of possible countries (which are currently: Australia, Canada, France, Deutschland, India, Italia, New Zealand, U.K., US), and thus news items can change according to the chosen country.
The automatic determination of which news items or news stories are most important is done by 3 main criteria: In how many sources the news item appeared, how important are the news sources in which it appeared, and how close is it to the top in each of these news sources.
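The three criteria above can be illustrated with a minimal Python sketch. It is not Google's actual formula, which the text does not give. The names `Appearance` and `story_importance`, the 0..1 source weight, and the `1 / (1 + position)` discount are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Appearance:
    """One occurrence of a story in one news source (hypothetical structure)."""
    source_weight: float  # assumed importance of the news source, e.g. 0..1
    position: int         # 0 = top of that source's front page


def story_importance(appearances):
    """Combine the three criteria into one number.

    Summing over appearances rewards stories carried by many sources,
    the source weight rewards important sources, and dividing by
    (1 + position) rewards stories placed near the top.
    """
    return sum(a.source_weight / (1 + a.position) for a in appearances)
```

Under these assumptions, a story that leads three major sources scores higher than one that appears once, low down, in a minor source.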
However, many problems still remain, such as for example:
1. The current system chooses for each headline just one of the possible sources (including the first sentence in that news item) and also a photo from one of the possible sources (typically from another source), and typically indicates below it in smaller print a few additional related headline links, and then a few additional names of news sources below, which also link to related items, and then there is a final link to typically a few hundred additional related links.
This leads to the following problems:
a. The choice of a single main news source and a single image for each item seems arbitrary to the user and leads him to prefer this source for reading the full news item, since he has much less information about the other links.
b. Similarly, the choice of the additional smaller links below also seems arbitrary to the user.
c. Due to space limitations the clustering possibilities in the first page are limited, so if for example there is room for only 2-4 main news items in each category, then very broad, loosely related items might be presented as a single news item.
d. If the user clicks on the final "related items" link, he typically gets hundreds or even more than a thousand links to related news items (with the headline, source, time, and the first 2 lines), sorted either by relevance or by time, however, the new list is now without any images and without any clustering, so that many times news stories that are about the same event or even identical (for example due to two or more news sources using exactly the same item from a news agency), may appear at different positions in the list of related links, and various other news items which are more different appear between them and are typically also dispersed in various places.
This makes it very hard for the user to take advantage efficiently of the list of related items. (Although clicking on the next 30 links each time may eventually show for example only 25-30% of the actual links due to removing some very similar entries, like Google does also with normal web pages results, this still leaves the shown items un-clustered, as explained above).

2. Allowing the user to choose between a few top categories is very limited by nature and does not even come close to the true potential of such systems. On the other hand, when searching by keywords, the user immediately reaches a list of results that is similar to the list that he reaches when clicking on the final list of "related items", as explained below, and thus is subject to the same limitations.
Although many times this first list shows for some of the items, especially in the beginning, a few additional sub-items and a link that says "and more", clicking on the "and more" link apparently always generates only a completely linear and non-clustered list again, like in the case of clicking on the "related items" links in the automatic newspaper front page, as explained above. For example, searching for the word "Israel" in Google news shows that there are 12,600 items, and the 2nd result has the headline Israel Wants to Exile Arafat - - But Not Yet, with a few additional smaller links and the "and more" link. But clicking on the "and more" link brings up a linear list that says that there are 1,010 items, and now there is no clustering at all (except for deleting entries as explained above). Also, sorting by date always seems to create only a linear list with no clustering at all, even when it is the first list generated by searching for the keywords.
Thus, it would be highly desirable to have an improved News MetaSearch or improved automatically generated "Newspaper" which solves the above problems and preferably adds also many additional useful features. Other problems with other types of searches are also explained and solved below.
Summary of the invention
The present invention tries to solve the above problems by at least one of the following ways:
1. Preferably instead of one constant headline in each position the user can click on something and switch between similar headlines (preferably those that are automatically generated as most important within the specific news item), and/or for example the chosen news source changes automatically, preferably at the same position on the screen (for example changes instantly at the same position, or for example changes by using effects such as fade-in and fade-out or scrolling). This automatic switching can be for example between the top 1-30 automatically chosen related headlines (preferably showing each time also the first sentence or more), and when the user clicks anywhere on that position, he is preferably transferred immediately to the news item that is at the position at the time that he clicks on it. Preferably each such headline (preferably with its first sentence or part of it) is kept long enough for an average user to read it (for example 30-60 seconds), and preferably even if this switching is automatic the user can intervene for example by clicking on the item or next to it, and thus move the switching for example backwards or forwards. Another possible variation is for example to allow the user to click on something near the main item in order to expand the list of switching items next to each other, preferably without changing the rest of the layout, or for example to open a menu window which allows choosing any one of them in the window. Similarly, the image preferably keeps changing (for example in correspondence with the current source that is in that place in the textual part, or independently), preferably automatically, for example every few seconds, thus switching between the sources and letting the user view for example 10-30 relevant images instead of just one, which makes the whole experience already more similar to TV.
This changing of the image can again be for example instant, or for example with fade-in and fade-out, or any other effects. Another possible variation is to use similar preferably automatic changes also for example in the smaller links below the main link. Again, preferably if the user clicks on the image area, he is preferably instantly transferred to the relevant news item in the relevant news source for the image that is visible at that position at the time of clicking. Another possible variation is showing for example simultaneously more than one main link and/or more than one image for that item. Another possible variation is, when available, showing instead of still images or in addition to them, also streaming video from these news sources, however in this case the automatic switching of images is preferably either disabled so that for example the user has to click on something in order to view related streaming data from a different source or other still images, or for example each streaming source preferably remains in the position for a longer time than still images until switching to the next streaming source (or for example to the next still image).
2. Preferably if the user clicks on the "additional related items" link or searches for keywords, instead of receiving a problematic linear list as explained above in the background, he preferably receives a clustered list, so that the related links or the keyword search results are preferably again clustered according to the similarity of the items, thus enabling preferably recursive clustering, preferably like a tree (However, since the same news item or sub-cluster might belong to more than one cluster or sub-cluster, preferably it is shown and/or can be reached from preferably all the sufficiently relevant clusters or sub-clusters to which it belongs or is related). Preferably the user can indeed choose at least between the options of ordering by time & date and ordering by relevance, but preferably this helps to create order between and/or within the sub-clusters, but preferably without interfering with the cluster structure itself.
In other words, even sorting by date preferably does not contradict the clustering, unless for example the user requests explicitly to sort by date without any additional sub-clustering. Another possible variation is to allow for example also a combined sorting, so that for example the items or sub-clusters are sorted by days or by hours, and for example within each hour frame or within each day frame they are sorted for example by relevance (for example within and/or between the sub-clusters). Another possible variation is to allow the user for example to request to sort the items by the country of the source, so that for example the news items are clustered in addition or instead also according to the country of the news source, so that for example the user can see if there are clear differences in the way the same news story is depicted in different countries. Instead or in addition, preferably the user can choose in this list if he/she wants to see the list with at least one photo near each item, when available (preferably from the same item in the same source), or without photos. Preferably by clicking on a certain cluster the user can again view a list generated for that cluster, preferably again divided into smaller clusters, however at each stage preferably the user can also simply view specific news items of the cluster.
Another possible variation is to let the user view for example a graphical or textual hierarchical representation which preferably shows for example at least one typical headline for each sub-cluster or for example all of its individual headlines, and preferably shows multiple levels of the hierarchy at the same time, or for example the entire hierarchy from the first general cluster down to the final nodes or down to the lowest sub-clusters, so that the user can simultaneously view the multi-level structure of related types of items and choose directly to focus on the sub-cluster or sub-clusters that most interest him. Preferably the user can also switch for example between a graphic or textual tree mode and the mode of just seeing the clusters at each stage. This is very important, since, unlike normal web pages, news items typically refer to specific events, so if for example 500 news items refer to about 10 different but related news items, it is much more meaningful to show the various sub-clusters than to just sort them for example by relevance or by the exact time and date, since if for example 50 of them deal with the same event, it is less meaningful to define which of them is more "relevant". These improvements can have the following fascinating implications:
a. It means that by searching for interesting keywords or keyword combinations (for example "homeland security", "rain forests", "science fiction", or any other subject, common or less common), preferably the user can instantly view an automatic "newspaper" that deals with the requested subject (since clustering the first list generated according to the keywords and requesting an image near each cluster or each item can cause the list to look like the default initial automatic newspaper front page). Preferably these images are represented in the MetaNews system as links to these images in the actual news sources, in order to save space on the MetaSearch system's own servers. The images can be displayed on the results page for example in the original size that they have on the source news page where they appear. Another possible variation is that for example in order to save bandwidth and/or in order to keep the size of the images under control for more regularity in the layout of the results page, preferably the html protocol and/or the html command set is expanded to allow any image to be requested with a given size limit, so that preferably if the original image is bigger it is either truncated automatically to fit in the allowed window, or is for example automatically downscaled in order to fit completely into the allowed space (preferably this is done by the user's browser or for example by the original server). If truncation is used then preferably the improved html protocol allows the web programmer for example to specify for each image the x-y coordinates of its central point of interest, so that the truncation can automatically be around that central point.
Another possible variation is that for example various heuristics are used by the browser (or by the server) in order to find the central point of interest automatically, such as for example finding the human face in the image, starting automatically from the geometrical center, etc.
Another possible variation is that the Metanews system for example automatically chooses only images that are within a certain reasonable range of sizes.
b. It means that by using the same or similar rules recursively, the user can preferably zero in on a specific type of news item and see in an organized way for example the same event from different angles. This can be used for example in order to read about all the implications of a certain event, and/or for example in order to analyze for example the types of responses of the world press to certain events. So for example, a news item about Israel's intent to expel Arafat, which in the prior art Google News system leads to a large variety of 827 related and partially related news items, will instead lead to a page which leads to a hierarchical tree of related types or sub-clusters of items, for example some dealing with what Israeli leaders say, some about what world leaders are saying, some about the new Palestinian Cabinet, some representing views in favor of the expulsion, some against, etc. The clusters can be for example shown all the way down to the final leaves through multiple levels of the hierarchy, or for example only for the current level, which means that preferably simply the same or similar algorithm that was used for selecting the first page is now applied for example to the selected group of 827 related items. Preferably the automatic switching between images and/or between the main items on focus (which preferably includes at least the 1st sentence or part of it), is also applied similarly on each displayed page in the recursive sub-clustering.
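The recursive clustering described in this item can be sketched as follows. The text does not specify a similarity measure or a clustering algorithm, so this sketch assumes word-overlap (Jaccard) similarity between headlines and a simple greedy grouping, with a stricter threshold at each deeper level. The function names, the thresholds and the nested-dictionary tree format are all hypothetical.

```python
def similarity(a, b):
    """Jaccard word overlap between two headlines (an assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def cluster(headlines, threshold):
    """Greedy grouping: a headline joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for h in headlines:
        for c in clusters:
            if similarity(h, c[0]) >= threshold:
                c.append(h)
                break
        else:
            clusters.append([h])
    return clusters


def build_tree(headlines, depth=0, max_depth=3):
    """Apply the same clustering rule recursively, tightening the
    threshold at each level, and return a nested tree of sub-clusters."""
    node = {"items": list(headlines), "children": []}
    if len(headlines) < 2 or depth >= max_depth:
        return node
    groups = cluster(headlines, threshold=0.2 + 0.2 * depth)
    if len(groups) == 1:
        # Nothing separated at this level; retry with a stricter threshold.
        return build_tree(headlines, depth + 1, max_depth)
    node["children"] = [build_tree(g, depth + 1, max_depth) for g in groups]
    return node
```

Every node keeps its full item list, so at any level the user could either pick a specific item or descend into a sub-cluster, as the text requires. A real system would presumably compare full article text and allow an item to belong to more than one cluster, which this sketch does not attempt.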

3. If streaming video is used for example in a few or more of the news sources that deal with or are related to the same event (i.e. the same cluster or same sub-cluster), then preferably the user can also request for example an automatic formation of a group of these sources on the same screen so that they can be viewed simultaneously, for example like a split screen in cable TV, except that the group is preferably automatically generated dynamically according to the item of interest and according to current availability. So preferably the user can see for example a few or more preferably small streaming media images on the same screen at the same time and preferably can also for example switch the sound each time to one of them and/or for example there is a volume control near each of them. By clicking for example on or near one of them the user is preferably transferred to that source to view it normally there. Preferably the user can switch to the multi-view of the streaming images next to each other for example by clicking on something near the original preferably automatically switching image.

4. Preferably as additional new related news items come in, the headlines and/or images can be automatically updated even if the user does not click on any refresh button. For example if there is a report on a new suicide bombing in Israel, as additional details come in and the same items in the various sources become more updated or new items are added, preferably this is also automatically updated in the automatic news page that the user has in front of him (for example if the headline or the first sentence have changed or the images have changed). This is preferably done by automatic partial refresh on a need basis, as explained already in Canadian application no. 2,432,817 of Jul. 4, 2003 (and in subsequent continuations of that application in the US and Canada) by the present inventor, as explained below, and preferably by grouping identical data packets in groups so that each group contains a single copy of the identical data packet together with a multiple list of targets, so that each group preferably goes to a certain general area, and when it reaches that general area the data is preferably duplicated back into the individual packets, or into smaller groups with fewer targets, which are later split up into the individual packets, as explained for example in PCT application PCT/IL01/01042 of Nov. 8, 2001 and US application 10/375,208 by the present inventor. Similarly all the data and especially for example any streaming video images are preferably distributed this way to the large number of the automatic news viewers (for example from the original servers to any mirror sites of the service and from any original server or mirror site to the users). However, since, as explained above, headlines and images preferably keep changing anyway between items of the relevant cluster or sub-clusters, preferably the user gets a different indication when the items and/or images themselves have changed (for example the same item has been updated on the news source where it resides or the image has changed) or new items or images are brought in, such as for example some sound indication, preferably accompanied with a visual indication of the new item or the item that has changed, such as for example some red frame around it, and/or for example the words "Fresh update" near it, etc. The vocal indication has a further advantage, since the user can be alerted for example even if he is currently working on another window.
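The grouping of identical data packets can be pictured with a small data-structure sketch. It shows only the bookkeeping (one copy of the payload plus a list of targets per general area) and not any real network protocol. The tuple layout and the names `group_packets` and `expand_group` are assumptions, not taken from the referenced applications.

```python
from collections import defaultdict


def group_packets(deliveries):
    """Collapse identical payloads bound for the same general area.

    `deliveries` is an iterable of (payload, area, target) tuples.
    The result maps (area, payload) to the list of targets, so each
    payload is carried once per area instead of once per target.
    """
    groups = defaultdict(list)
    for payload, area, target in deliveries:
        groups[(area, payload)].append(target)
    return dict(groups)


def expand_group(payload, targets):
    """On reaching the general area, duplicate the single copy back
    into one individual packet per target."""
    return [(target, payload) for target in targets]
```

For example, an update sent to two viewers in one area and one viewer in another travels as two grouped packets rather than three individual ones.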
Of course various combinations of the above and other variations can also be used.
The detailed embodiments below show in more detail also various implementation issues that can help solve various additional problems involved in supplying the above features.
Similar methods, but with the appropriate relevant adjustments, can be used for example for creating more sophisticated shareware meta-search service: For example shareware programs should appear in higher places in the meta search results according to at least one of the following:
a. How many of the included shareware sites list them.
b. In which position they are listed for the given searched keywords.
c. How important the shareware site is (so that for example larger or more central major shareware search sites are preferably given at least some higher weight).
d. How many times they were already downloaded (in each site that gives this data, except that preferably the data is normalized by the general amount of listed downloads in that shareware site, for example by comparing it to the other sharewares that are listed on the same search results page, or by keeping such data for example in general for each shareware site across multiple searches).
e. The shareware site's rating for the shareware, if available (for example based on user votes and/or on their own editorial staff). If based on user votes, the rating of that shareware site for the shareware is preferably given higher weight than an editorial decision in another site, if the number of votes is given and is sufficiently large. (This rule is preferably used both across sites and within a site, so that if for example the same site shows both editorial rating and user votes for the same shareware, then preferably the user votes are preferred if a sufficiently large number of users have voted).
If the same shareware appears for example in different versions in various shareware sites, then preferably the system can for example use also the rankings of the previous versions (for example according to one or more of the above criteria) for determining the score for that shareware in general, or for example the system uses in this case clusters and sub-clusters like in the meta-news, or for example the system treats each version independently like any other shareware. Of course, various combinations of the above and other variations can also be used.
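One possible way of combining criteria a-e into a single score is sketched below. The text only lists the criteria and does not say how to combine them, so the additive formula, the field names and the `MIN_VOTES` cutoff are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

MIN_VOTES = 50  # assumed cutoff for a "sufficiently large" number of votes


@dataclass
class Listing:
    """One shareware program as listed on one shareware site (hypothetical)."""
    site_weight: float                        # (c) importance of the site
    position: int                             # (b) 0 = first result there
    downloads: Optional[int] = None           # (d) raw count, if reported
    page_downloads: Optional[int] = None      # (d) total for the results page
    editorial_rating: Optional[float] = None  # (e) scaled to 0..1
    user_rating: Optional[float] = None       # (e) scaled to 0..1
    user_votes: int = 0


def listing_rating(l):
    """(e) Prefer user votes over the editorial rating when enough users voted."""
    if l.user_rating is not None and (
        l.user_votes >= MIN_VOTES or l.editorial_rating is None
    ):
        return l.user_rating
    return l.editorial_rating if l.editorial_rating is not None else 0.0


def listing_score(l):
    score = l.site_weight / (1 + l.position)                  # (b) and (c)
    if l.downloads is not None and l.page_downloads:
        # (d) downloads normalized by the other programs on the same page
        score += l.site_weight * l.downloads / l.page_downloads
    return score + l.site_weight * listing_rating(l)          # (e)


def shareware_score(listings):
    """(a) Summing over listings rewards programs listed on more sites."""
    return sum(listing_score(l) for l in listings)
```

Different versions of the same program could be handled by passing all of their listings to `shareware_score` together, or by scoring each version separately, matching the alternatives in the preceding paragraph.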
In the normal Google web pages search engine there are also a few improvements that can be made in order to solve various problems as explained below.
Preferably at least one of the following improvements is done:
a. According to the thorough review of Google technology at http://pr.efactory.de, the normal Google PageRank algorithm, which takes into account how many incoming links each page has and how important or authoritative each linking page is (this is defined by how high the general PageRank of the linking page is), also takes into account the number of outbound links for each page, but in a negative way: pages that have more outbound links lose from their own PageRank score, and incoming links from other pages are given lower weight the more other links there are on the linking page. So for example if page A has incoming links from pages X, Y and Z (from other sites), the PageRank score of A is considerably higher if pages X, Y, Z each have on average for example 3 outgoing links than if they have on average for example 10 outgoing links each. However, this has the consequence of weakening the principle of giving more weight to links from more important or more authoritative pages, since for example a link from a directory page in Yahoo or in Open Directory would thus have a lowered value since each linking page there typically has a large number of outgoing links.
On the other hand, reducing the value of the link according to the number of other outgoing links on the linking page does have the advantage that it can reduce for example the effects of submitting a web page to multiple giant junk directories just in order to increase the number of links to that page. But on the other hand, such giant junk directories might be for example artificially created in a way that works around this anyway: For example by automatically creating a special page for each linked page so that there is only one outgoing link on that page. Therefore, preferably the reduction in the weight of a link according to the number of other links on that page is eliminated or significantly reduced. Instead, preferably other algorithms are used in order to automatically discover specially designed junk directories and ignore them or give them much lower weight. (This can be done for example by identifying automatically certain recurring patterns in such junk pages, or for example by using usage data on the linking page in order to determine the value of the links, so that if for example the linking page is in some junk directory that is hardly ever visited, then the link will naturally have a much lower weight). On the other hand, the position of the link on the page is preferably taken into account, so that a link in a higher place in the linking page is preferably given higher weight, except that preferably the system automatically notices if the links are sorted alphabetically on that page (for example if it is a page in a web directory, such as for example Yahoo or OpenDir), and in that case preferably the position is ignored since a higher position is merely the result of the linked Web page having a name that appears higher in the alphabet. In addition, it does not make sense at all to reduce the PageRank of page A just because page A has more outgoing links.
On the contrary, typically the more important a page is, the more outgoing links it has, since pages with no outgoing links are typically end nodes that deal with more limited content. Also, the more important a site is, the more pages it typically has, but by reducing the rank due to outgoing links the Google PageRank algorithm actually punishes web sites for containing more pages. Therefore, another possible variation is to increase the PageRank in general for sites that have more pages and more outgoing links, except that of course incoming links from independent sites should remain much more important than outgoing links since otherwise people might add outgoing links just to boost their rank.
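The link-weighting variation proposed in this item can be sketched as follows. The weight a page passes along is not divided by its number of outgoing links. Instead it is discounted by the link's position on the page, unless the links appear to be sorted alphabetically. The linear decay from 1.0 to 0.5 and the rule that at least three links are needed to detect sorting are assumptions, and junk-directory detection is left out.

```python
def is_alphabetical(anchor_texts):
    """Heuristic: treat a page as an alphabetical directory listing when
    it has at least three links and their texts are already sorted."""
    lowered = [t.lower() for t in anchor_texts]
    return len(lowered) > 2 and lowered == sorted(lowered)


def link_weights(anchor_texts, page_rank):
    """Weight passed along each outgoing link of one linking page.

    `anchor_texts` lists the links in page order and `page_rank` is the
    rank of the linking page. Unlike the scheme described above as the
    normal PageRank, nothing is divided by the number of links.
    """
    n = len(anchor_texts)
    if n == 0:
        return []
    if is_alphabetical(anchor_texts):
        # Position only reflects the alphabet here, so it is ignored.
        return [page_rank] * n
    # Assumed decay: full weight at the top, half weight at the bottom.
    return [page_rank * (1.0 - 0.5 * i / max(n - 1, 1)) for i in range(n)]
```

With this rule a directory page listing hundreds of sites alphabetically passes the same weight to each of them, while an ordinary page favors the links it places first.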
b. Another problem with PageRank is that it automatically gives higher scores to older pages simply due to the fact that they have been around long enough to have gathered more links to them, and, conversely, new pages might take a long time to get a high listing in Google simply because at the beginning they have no or too few links to them from other sites. In fact Google have themselves noticed this problem and tried to solve it in US patent application 20020123988, filed March 2, 2001 and published Sep. 5, 2002, by incorporating also automatic usage statistics for each page (from various sources). However, first of all this does not solve the original problem, since older pages with more links, which are therefore already listed higher on the Google directory, will typically also have by definition more visitors than the new page even if the new page is indeed more relevant to the search query.
Secondly, simply incorporating usage statistics into the score creates the danger of a classical "Matthew effect" of the rich getting richer and the poor getting poorer. In other words, if usage statistics are simply incorporated mathematically into the final score, then for pages which currently have high usage (a high number of visitors) for any reason (for example because they gathered links to them over time and are therefore listed high in the Google search results, or for example because some new site managed to convince some journalist to write about it), the increased usage can create a snowballing effect of higher rank in Google, and therefore more usage, etc., and vice versa, good pages which have initially low usage can enter a negative cycle of decreasing usage and being listed lower. In order to correct this dangerous problem, preferably usage statistics are used only with one or more thresholds, so that for example usage lower than a certain factor preferably does not continue to lower the score, and usage higher than a certain factor preferably does not continue to increase the score. This improvement is extremely important since it allows using usage data while using at the same time a mechanism for preventing it from causing vicious cycles (negative or positive). Another possible variation is that usage statistics are used only for modifying the value of the link in the linking page but not for modifying directly the ranking of a page. In addition, the problem of how long the page has existed is probably solved by taking into account also historical data, so that preferably for example a page that has existed for example for 3 months and has already for example 20 valid links to it might have for example a higher score than a page that has existed for 3 years and has for example 30 valid links to it. So preferably the time factor is taken into account for determining the weight given to the number of links.
(Of course the same algorithm can be used whether any valid links are taken into account or for example only links that seem to be related to the searched keywords are taken into account).
Again, preferably at least some threshold is used, so that 0 links or too few links are not compensated by the fact that the page is new, but if the new page has already sufficient valid links, for example at least 10 links (or any other reasonable threshold number) from other sites that preferably do not reside on the same IP address and their domain is not owned by the same person or organization, then the newness of the page is preferably taken into account in requiring fewer links at that stage. From the point of view of older sites this also makes sense, since this means that if a page for example has 50 valid links to it since it has existed for a number of years but the number of links does not continue to increase over time then probably the site is really not so important, whereas a really important site would continue to gather more links over time, thus compensating for the fact that more time has passed. However the system preferably has to use historical data to determine how long a page has existed, since it obviously cannot rely for that on any info on the page itself or on the site where the page resides. Archives such as for example the Internet archives at http://www.archive.org cannot be relied upon since not every page is indexed there, and also they contain much more data that is not necessary for this, such as for example the historical content of each page for example in 1-month jumps or any other temporal jumps. Instead, preferably the system itself, for example Google, preferably keeps historical records which can contain for example just the URL of each page and the time when it started to appear.
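The two thresholding ideas in this item, clamped usage statistics and an age adjustment that applies only above a minimum number of links, can be sketched as follows. The text gives no formulas, so the visit thresholds, the 36-month reference age and the square-root bonus are assumptions chosen only so that the example from the text comes out as described.

```python
import math

REFERENCE_AGE_MONTHS = 36  # assumed age at which a page gets no newness bonus


def usage_factor(visits, low=100, high=10_000):
    """Map raw usage to 0..1, clamped at both ends.

    Usage below `low` cannot lower the score any further and usage above
    `high` cannot raise it any further, which is what breaks the
    rich-get-richer cycle described above.
    """
    clamped = min(max(visits, low), high)
    return (clamped - low) / (high - low)


def age_adjusted_links(valid_links, age_months, min_links=10):
    """Weight the number of valid links by how new the page is.

    Pages below the `min_links` threshold get no compensation for
    being new, so 0 links or too few links are never boosted.
    """
    if valid_links < min_links:
        return float(valid_links)
    bonus = max(1.0, math.sqrt(REFERENCE_AGE_MONTHS / max(age_months, 1)))
    return valid_links * bonus
```

Under these assumptions a 3-month-old page with 20 valid links scores higher than a 3-year-old page with 30, as in the example above. The `age_months` value would come from the system's own record of when each URL first appeared, not from the page itself.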
c. In addition, Google typically uses also the anchor text of inbound links to determine the relevance of the linked page to the searched keywords, so that for example if the user is searching for the keywords "free sex", instead of being fooled by numerous not-really-free pages that use these words extensively to fool search engines into giving them a high rank for these popular search keywords, Google in fact relies on the fact that if links in other independent sites state in the link itself that this is indeed a free sex page, then probably the human who made the link checked and found out that the linked page is really free, for example. In fact, Google itself did not invent this idea, since in the basic Google US patent 6,285,999, originally filed in a provisional application on Jan. 10, 1997, and issued on Sept. 4, 2001, Larry Page indicates that this basic idea was already used before by the "World Wide Web Worm" and by "Hyperlink Search Engine", developed by IDD Information Services. On the other hand, this idea is preferably further improved to include at least some semantic analysis of the anchor href text and/or preferably also at least the surrounding nearby text, or at least for example the immediate text preceding the link. This is important since in the above example if for example the text of the link or the text preceding the link says that the following linked pages are not really free sex pages or are for example only partially free, and the system only analyzes the fact that both the word free and the word sex appeared in the anchor text or near it, then the system can still be easily misled. So preferably the analysis of the href text and/or for example the surrounding near text preferably at least takes into account some basic language structures such as for example negation words, or modifying words, such as for example "really", "partially", etc., and thus is preferably at least able to identify at least part of the meaning and/or avoid certain pitfalls that are relevant to the interpretation of the real meaning of the link.
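A very rough sketch of such a check on the anchor text and the text preceding the link is given below. It is only an illustration of the idea of looking for negation or modifying words near a keyword; the word lists, the window of 3 preceding words and the function name are assumptions, and a real system would need a much richer analysis.

```python
import re

# Small illustrative word lists (assumptions, not an exhaustive vocabulary).
NEGATIONS = {"not", "no", "never", "isn't", "aren't"}
WEAKENERS = {"partially", "partly", "almost", "supposedly"}


def anchor_supports(context: str, keyword: str, window: int = 3) -> bool:
    """Return True only if `keyword` occurs in the anchor text (plus nearby
    text) and no occurrence is preceded, within `window` words, by a negation
    or weakening word."""
    tokens = re.findall(r"[a-z']+", context.lower())
    found = False
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        found = True
        preceding = tokens[max(0, i - window):i]
        if any(w in NEGATIONS or w in WEAKENERS for w in preceding):
            return False
    return found
```

Under these assumptions, the text "a really free page" supports the keyword "free", whereas "these are not really free pages" and "only partially free" do not.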
d. Another possible improvement, which can be used also in other types of search engines or metasearch engines, is to include for example in the keywords search (for example in the general web search or in the news Meta-Search or in the newsgroups search and/or in other types of search) also synonyms, so that for example if the user searches for the keywords "deport Arafat" and the system's synonym database suggests that deport is a close synonym of expel and the system for example finds that there would be for example more or much more relevant results if the user had used the keywords "expel Arafat"
instead, then the system can for example automatically include in the displayed search results also the pages that contain the keywords "expel Arafat", or for example the system asks the user if he would like to consider also for example close synonyms (and preferably remembers that as default for that user for following searches, for example in a browser cookie file), or for example the system responds in a way similar to the way that Google responds today if there is a typing error. So for example if the words "deport Arafat" lead to for example 200 relevant pages (for example in the recent news search) but the words "expel Arafat" lead to for example 470 pages (or for example any number larger than the exemplary first 200 or any number larger by a certain minimal difference or minimal factor), then preferably the search results page can for example display the results and ask the user at the top "did you mean expel Arafat?" in this example. In this case, preferably the system also indicates to the user already with this message how many results there would be in the other search. More preferably, the system can ask the user for example "would you like to include also results with expel Arafat?", and in this case this message preferably indicates the number of results that would be in the combined search results, and then if the user clicks on that link then both types of results are preferably integrated, as explained above. In summary, preferably the system can do at least one of the following: 1. Automatically include in the search results also pages that contain synonyms or close synonyms of the requested keywords. 2. Ask the user if he would like to include in the search results automatically also pages that contain close synonyms of the requested search keywords and remember that as default for that user for following searches. 3. Check at least close synonyms of the user's search keywords, and if there are more and/or better results with the synonyms then the system preferably asks the user for example if he wants to switch over to the results of the search that was based on the synonyms, and/or asks the user for example if he wants to integrate the current results with the results of the search that was based on the synonyms. This is a most significant improvement that can help users and significantly enhance the efficiency of searches, since many times the biggest problem of users is that they don't know the most appropriate keywords to search for or don't know all the most relevant ones. Similar principles can be used for example while searching for patents for example at the USPTO, since many times users can miss relevant patents for example because they are not searching properly for all the relevant keywords.
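The third option in the summary above (checking close synonyms and suggesting the alternative query only when it gives sufficiently more results) might be sketched as follows. The synonym table, the result-counting callback and the minimal factor of 1.5 are hypothetical placeholders chosen for the sketch; the description above only requires "a certain minimal difference or minimal factor".

```python
from typing import Callable, Dict, List, Optional, Tuple


def best_synonym_query(query: str,
                       synonyms: Dict[str, List[str]],
                       count_results: Callable[[str], int],
                       min_factor: float = 1.5) -> Optional[Tuple[str, int]]:
    """Return (alternative query, result count) for the single-word synonym
    substitution that yields the most results, or None if no substitution
    beats the original query by at least `min_factor`."""
    words = query.split()
    base = count_results(query)
    best: Optional[Tuple[str, int]] = None
    for i, word in enumerate(words):
        for alt in synonyms.get(word.lower(), []):
            candidate = " ".join(words[:i] + [alt] + words[i + 1:])
            n = count_results(candidate)
            if n > base and n >= base * min_factor and (best is None or n > best[1]):
                best = (candidate, n)
    return best
```

In the example from the text, if "deport Arafat" gives 200 results and "expel Arafat" gives 470, the function returns the second query together with its count, which the results page can then show in a "did you mean" style message.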
e. Another possible variation is for example to allow the user to define various parameters for scoring the results, preferably within certain allowed ranges, such as for example the relative weight of usage statistics, the amount of reduction of the importance of a link as a result of the total number of links on the linking page, the amount of taking into consideration the newness of a web page so that fewer links to it are required, etc. These values are preferably remembered for example in a browser cookie, and the system preferably displays to the user on each search the parameters that are currently effective.
This can give users an additional important flexibility and control, instead of being dependent on sometimes arbitrary decisions by the search engine.
f. In addition, if usage statistics are collected, preferably from the browser or from a plug-in in the user's browser, preferably they include additional information, such as for example the typical link-clicking sequence when a user enters a site and starts going over its links, the average time the user spends on each site altogether or on each page in the site until moving to another site, etc. Such a measure is problematic since the user might for example open additional links in new windows but keep browsing the original page, so preferably the browser itself (or the plug-in) for example checks if the user is still actively moving within the page. This is why it is preferably done by the browser or by a browser plug-in, since for example routers on the way can provide statistics of requested pages for each requesting IP, but cannot know what really happens on the side of the client. In addition, preferably the browser or plug-in also requests from the user, preferably during installation, at least minimal background data, such as for example at least sex, age and education, and the user's country is preferably known automatically according to his IP or his Operating System settings.

Of course, various combinations of the above and other variations can also be used.
Also, at least some of the above improvements can be used also in various meta-search engines (in addition of course to News meta search engines), so that for example a web meta search engine such as for example Metacrawler can similarly apply for example the above variations of including synonyms to the collected search results of other search engines.
Definitions and clarification
Throughout the patent whenever variations or various solutions are mentioned, it is also possible to use various combinations of these variations or of elements in them, and when combinations are used, it is also possible to use at least some elements in them separately or in other combinations. These variations can be in different embodiments, or different versions of the software, or sometimes different options available to choose from. In other words: certain features of the invention, which are described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination.

Brief description of the drawings
Fig. 1 is an example of the look of a typical Google automatic "newspaper" front page (prior art).
Fig. 2 is an example of the look of a typical list generated in http://news.google.com after clicking on the list of related items of a given item (prior art).
Fig. 3a is an example of a preferable way that the list of related items (or the list generated by searching for news by keywords) can look after clustering it again like the automatically generated front page.
Fig. 3b is an example of a preferable way that the list of related items or the list generated by searching news by keywords can look when showing multilevel sub-clustering at the same page.
Figs. 4a-b are examples of a preferable way in which the headlines and/or the image of each item can scroll automatically between a number of sources.
Fig. 5 is an example of a preferable way in which multiple streaming video images of the same event from various Online news sources can appear on the screen side by side.
Fig. 6 is an example of a condensed packet for much more efficient distribution of the same data to multiple users.
Detailed description of the preferred embodiments
All of the descriptions in this and other sections are intended to be illustrative examples and not limiting.
Referring to Fig. 1, I show an example of the look of a typical Google automatic "newspaper" front page (prior art). As can be seen, the prior art system chooses for each headline just one of the possible sources as the main item (including the first sentence in that news item) and usually also a photo from one of the possible sources (typically from another source), and typically indicates below in smaller print a few additional related headline links, and then a few additional names of news sources below, which also link to related items, and then there is a final link to typically a few hundred additional related links.
Referring to Fig. 2, I show an example of the look of a typical list generated in http://news.google.com after clicking on the list of related items (prior art). In this case the item that was clicked on was the item about the talks about deporting Arafat.
As can be seen, this generates a linear list with no clustering at all, and various items that should clearly be in the same sub-clusters are dispersed in different places.
Referring to Fig. 3a, I show an example of a preferable way that the list of related items (or the list generated by searching for news by keywords) can look after clustering it again like the automatically generated front page. As can be seen, preferably this can be very similar or even identical to the front page in any of the general areas, except that there might be for example fewer sub-clusters and fewer photos, since only some of the individual news items contain photos that can be used, so for example sometimes an entire sub-cluster might be without a photo. As explained above in the patent summary, preferably the user can switch between a mode that shows photos and a mode without, and preferably the photos and/or the main news items and/or the related smaller items below can switch for example automatically, for example every 30-60 seconds within the same area on the page and/or the user can move backwards and forwards with them. Since this is a recursion, any of the improvements described for the main page can preferably also be implemented here, such as for example all the improvements shown in Figs. 4a & 4b. Preferably the recursive clustering continues for example until there are sufficiently few items in the final sub-category or until the items are too different to group further. As can be seen in this example, the general items about talks about expelling Arafat are now preferably divided into reasonable sub-clusters, such as for example the response of Arafat's supporters, the US response, talks about killing Arafat instead of deporting him, etc. In order to enable the smarter multi-level sub-clustering, first of all, in general, the same or similar principles are preferably applied similarly at all levels, except that in each step they are preferably applied now to the items of the previous cluster or sub-cluster in order to further divide them into additional sub-clusters.
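The recursion and its two stopping conditions (sufficiently few items, or items too different to group further) can be illustrated with the following minimal sketch. The word-overlap similarity, the greedy grouping against the first item of each group, the rising threshold per level and all parameter values are simplifying assumptions made only for this illustration; they stand in for whatever clustering principles the system actually applies at each level.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    items: List[str]
    children: List["Cluster"] = field(default_factory=list)


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two headlines (a stand-in similarity)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def group(items: List[str], threshold: float) -> List[List[str]]:
    """Greedy grouping: an item joins the first group whose first item is
    similar enough, otherwise it starts a new group."""
    groups: List[List[str]] = []
    for item in items:
        for g in groups:
            if jaccard(item, g[0]) >= threshold:
                g.append(item)
                break
        else:
            groups.append([item])
    return groups


def cluster_recursively(items: List[str], threshold: float = 0.2,
                        step: float = 0.2, min_size: int = 3) -> Cluster:
    node = Cluster(list(items))
    if len(items) <= min_size:          # sufficiently few items: stop
        return node
    while threshold <= 1.0:
        groups = group(items, threshold)
        if len(groups) == len(items):   # items too different to group further
            return node
        if len(groups) > 1:
            break
        threshold += step               # one big group: demand more similarity
    else:
        return node
    node.children = [cluster_recursively(g, threshold + step, step, min_size)
                     for g in groups]
    return node
```

Each level applies the same grouping step to the items of the previous cluster only, which mirrors the description above of applying the same principles at all levels.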
In order to improve the clustering ability, preferably at least one or more of the following methods are used:
1. Preferably the time each item was published is taken into account, preferably with the assumption that the closer the time of publication between them, the higher the chance that two items are dealing with the same event. Another possible variation is to analyze also the temporal words or phrases used within the item itself (preferably mainly in the headline and/or in the first few sentences), since if for example some event has occurred 30 minutes ago, then any news items that are older than that cannot be reporting about the same event (although they might have mentioned it even before the event for example in case of a prescheduled event, such as for example a sports event or press conference or a ceremony, these items will typically be different from items that describe the event itself after it has already happened). In other words, the system preferably uses this analysis to decide when the event occurred, and this time can be used for example to separate between news items that were published before this time and items that were published after this time and/or to help decide the similarity between items that might be referring to the same event. In order to enable this, preferably the system is able to perform also at least some minimal type of semantic analysis and/or preferably has at least knowledge of the relevant temporal nouns (such as for example month names, weekday names, relative terms, such as for example yesterday, today, tomorrow), and relevant prepositions (such as for example before, after, during, on), etc. Preferably this includes also various different ways of writing the same dates or times, such as for example with numbers, with names or with abbreviated names (for example Sep. 9 instead of September 9, etc).
2. Similarly, preferably the system has at least a knowledge base of geographic areas, such as for example at least country names and city names, so that for example when the same place appears in two different news items, preferably in the headline and/or for example in the first 1 or 2 sentences, the system can give it more weight than ordinary keywords. The headline and the first 1 or 2 sentences are most important, since according to common journalistic rules, all the important information of the 5 W's should already be in there (Who, What, Where, When, and sometimes also Why). Again, preferably this includes also different ways of writing the same names, if they exist.
3. In addition, preferably the system has a knowledge base of at least the most common or most important verbs that typically appear for example in headlines and/or in the first one or two sentences of news items (or even in entire news items). (The original verb list can be for example generated statistically automatically by analyzing a large number of news items, and then human experts preferably define the knowledge base at least for these most common or most important words). Preferably the knowledge base uses for example semantic trees and/or semantic graphs and/or various rules, so that for example the system knows that killing is much more severe than expelling or deporting, and preferably knows for example that the words "said" or "accepted" or "opposes" or "demands" refer to transfer of information (and preferably also the differences between them on various dimensions, such as for example giving each word a score on the level of negativity, level of severity, level of urgency, etc.), and that for example words like "expel" or "kill" refer to physical actions, etc. So for example each verb might be characterized by scores (for example between 0-10 or any other suitable range, or at least a binary characterization) on a number of relevant variables or dimensions, for example:

Present  Past      Physical  Information  Pos/Neg   Reversible  Typically done by        Typically done to
say      said      No        Yes          Undef     Yes         Humans                   Humans/Animals
tell     told      No        Yes          Undef     Yes         Humans                   Humans/Animals
accept   accepted  No        Yes          Pos       Yes         Humans                   Anything
agree    agreed    No        Yes          Pos       Yes         Humans                   Anything
oppose   opposed   No        Yes          Neg       Yes         Humans                   Humans/Rules
expel    expelled  Yes       No           Neg       Yes         Humans                   Humans
deport   deported  Yes       No           Neg       Yes         Humans                   Humans
kill     killed    Yes       No           Very-Neg  No          Humans/Animals           Humans/Animals
murder   murdered  Yes       No           Very-Neg  No          Humans                   Humans/Animals
execute  executed  Yes       No           Very-Neg  No          Humans                   Humans
execute  executed  Yes       No           Undef     Yes         Humans                   Action/Document
die      died      Yes       No           Very-Neg  No          Humans/Animals/Abstract  Self
break    broken    Yes       No           Neg       No          Humans/Animals           Anything

On the other hand, a more hierarchical structure has the advantage that the words themselves can be divided into various clusters and sub-clusters and for example inherit various qualities from their parents in the tree (for example "kill", "murder", "execute" and "die" are all related to ceasing to exist). In addition or instead preferably the system includes also a thesaurus (which can be for example based on existing databases and/or learned automatically from various statistical analyses of a large number of relevant texts). This way for example the system can know that killing Arafat is something much more negative and irreversible compared to expulsion or deporting, or at least something that is not a synonym of deporting.

4. Another possible variation is to include at least a database of synonyms for the comparisons of nouns and/or of verbs, so that the system can know if two words are different or similar even without "understanding" their meaning.


5. Another possible variation is to supply the system for example in addition or instead with a knowledge base of major known political names and organizations. Preferably all or at least one or more of the above methods are also used at least for the most important other languages (such as for example Spanish, German, French, Chinese, and Arabic) preferably with links between the corresponding words between these languages, so that the clustering can preferably work properly also across languages. However, this is less important since typically the users will want to view news items only in one language.

6. Another possible variation is to analyze the similarity between two news items not only by counting the number of occurrences of the same keywords (according to a detailed article in http://pr.efactory.de, Google currently relies mainly on counting the occurrence of keywords after deleting the most common and the most uncommon keywords), but also the similarity in the occurrence of word combinations, for example how many same 2-word combinations or same 3-word combinations exist in both items (or for example the same 2 words with any 1 or 2 other words between them), or for example same 4-word combinations or same 5-word combinations, etc. Another possible variation is that this analysis is preferably done only or mainly on the headline and/or on the first 1 or 2 sentences, which should be the most informative, or the results of the analysis of the headline and/or first 1 or 2 sentences are given higher weight than the analysis of the rest of each item, or for example the importance of each next sentence is decreased according to its position.
Another possible variation is for example to generate for the user also a summary of the relevant cluster or of the relevant sub-cluster for example by generating automatically the list of sentences or for example the list of first or 2nd sentences that appeared most often in the items of the cluster or of the sub-cluster, or for example the sentences which have the largest number of sub-combinations (for example 3-word combinations) that repeat in other items of the cluster or of the sub-cluster. Another possible variation is to use this method for example to highlight the most important sentences in a given article (for example by highlighting sentences which appeared in whole or in part more than other sentences also in other items of the cluster or of the sub-cluster or for example by deleting the sentences that are not highlighted, however deleting is less preferable since it can lead to loss of context). However, since the user preferably reads the article itself in the relevant news source site, this highlighting can be added for example dynamically by a browser plug-in.

7. Another possible variation is to take into account similarity in words even if they are not exactly identical, especially for example in the headline, so that for example if a name can be spelled in more than 1 way the system will note the similarity, especially for example if the two names appear in a similar structure in two similar headlines.
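The word-combination comparison of method 6, with a higher weight for the headline than for the rest of the item, can be sketched roughly as follows. The use of contiguous n-word combinations only, the overlap ratio (shared combinations divided by all distinct combinations) and the headline weight of 0.7 are assumptions made for this illustration; the variation with 1 or 2 other words between the matching words is not covered here.

```python
import re
from typing import Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """All contiguous n-word combinations in `text`, lower-cased."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(a: str, b: str, n: int = 2) -> float:
    """Share of n-word combinations that the two texts have in common."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def item_similarity(head_a: str, body_a: str, head_b: str, body_b: str,
                    n: int = 2, headline_weight: float = 0.7) -> float:
    """Weighted similarity in which the headline counts more than the body."""
    return (headline_weight * ngram_overlap(head_a, head_b, n)
            + (1.0 - headline_weight) * ngram_overlap(body_a, body_b, n))
```

For example, "Israel decides to expel Arafat" and "Israel decides to deport Arafat" share 2 of their 6 distinct 2-word combinations, so their overlap is one third; adding a synonym database as in method 4 would be one way of raising that score.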
Referring to Fig. 3b, I show an example of a preferable way that the list of related items or the list generated by searching news by keywords can look when showing multilevel sub-clustering at the same page. As can be seen, this has the advantage that the user can preferably see the entire tree structure with multiple levels of hierarchy and click directly on any final node (i.e. an individual news item at a certain news source), however this has the disadvantage of too much detail for clusters that might interest the user less, and altogether it is less visually appealing than the variation of Fig. 3a.
Referring to Figs. 4a-b, I show examples of a preferable way in which the headlines and/or the image of each item can switch automatically between a number of sources.
For example, the CBS news image of Arafat shown in Fig. 4a can switch automatically between for example 3-20 other related images (preferably determined automatically according to the number of relevant images available), so that for example each image stays for example for 5 or 10 seconds (or any other reasonable time) and the switch is for example instant or for example by fade-in and fade-out. As explained in the summary, the images or some of them might be for example also sources of streaming data, in which case an image which is a source of streaming data preferably stays longer before switching over to the next image. Similarly, the main item, and/or for example the sub-items or sub-headlines of the main item or main headline, can also preferably switch automatically between a number of items, for example the entire 27 items that exist in this example in the main sub-cluster of the larger cluster of 877 related items, or for example only among the for example 10 most important or most recent or most relevant of the 27 (or any other reasonable number or percent). However, this switch is preferably without scrolling effects and can be for example instant or with some fade-in and fade-out, and preferably each such text remains for the time needed to read it comfortably (for example seconds). Another possible variation is to allow the user also to manually switch between the images and/or between the specific items within the main sub-cluster and/or within the sub-clusters represented by the sub-headlines, for example by adding the blue arrows for "Prev" and "Next" near the text and/or near the image, as seen in Figures 4a and 4b.
In addition, as shown in these examples, preferably clicking on the sub-headline, for example "Arafat dares Israel to kill him after cabinet vote", will lead to the relevant specific news item, and the sub-headlines themselves preferably each have a separate link to related items next to it, so that for example each such sub-cluster has a smaller number of links related to it. For example in the example about Arafat's suggested deportation in Fig. 4b there are 5 related links to the sub-headline "Israeli defence minister says 'kill Arafat'", 6 related links to the sub-headline about the response of Arafat's supporters, 5 related links to "US opposes Arafat expulsion", and at the bottom there is the link to the list of 877 related items, which means the entire set of items that belong to the wider cluster (however, as explained above, even clicking on this link will preferably show the list of 877 items clustered again into sub-clusters and sub-sub-clusters, etc.). Another possible variation is to add for example a similar link also next to the main item, so that it will say for example in this case "and 2~ related »" for example next to the first sentence of the main item, which is preferably the biggest sub-cluster, as shown in Fig. 4a.
Of course, this is just an example and other similar configurations could also be used to display such clusters and sub-clusters, preferably together with their related links.
Preferably the system determines which item to use as the main item of the general cluster (for example this general cluster of 877 items) by first picking the sub-cluster that has the largest number of items (and/or for example the most recent sub-cluster that is big enough relative to other sub-clusters) and then picking for example the item within this largest sub-cluster (or otherwise chosen first sub-cluster) which has for example the highest average similarity to other items in that sub-cluster and/or for example belongs to the largest sub-cluster of that sub-cluster and/or for example is most relevant within the cluster or within the sub-cluster and/or for example is most recent within the cluster or within the sub-cluster, etc. So if for example the entire large cluster of clusters that relates to Arafat's suggested deportation has 877 items, and for example there are 27 items in the cluster about Israel deciding to deport Arafat, and other sub-clusters have fewer items, then this naturally becomes the main sub-cluster from which the main item or items are chosen, and for example the next two largest sub-clusters become the next two sub-headlines, etc. Another possible variation is for example to put first the more recent sub-cluster for example if it is large enough or for example if the difference in size between it and a larger less recent sub-cluster is small enough.
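One of the selection rules above (largest sub-cluster first, then the item with the highest average similarity to the other items of that sub-cluster) can be sketched as follows. The word-overlap similarity used as the default and the function names are assumptions for illustration only; the recency and relevance criteria mentioned above are not modelled here.

```python
from typing import Callable, List


def word_overlap(a: str, b: str) -> float:
    """Stand-in similarity: shared words divided by all distinct words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def pick_main_item(sub_clusters: List[List[str]],
                   similarity: Callable[[str, str], float] = word_overlap) -> str:
    """Pick the most 'central' item of the largest sub-cluster."""
    main = max(sub_clusters, key=len)

    def average_similarity(index: int) -> float:
        others = [item for j, item in enumerate(main) if j != index]
        if not others:
            return 0.0
        return sum(similarity(main[index], o) for o in others) / len(others)

    best_index = max(range(len(main)), key=average_similarity)
    return main[best_index]


def order_sub_headlines(sub_clusters: List[List[str]]) -> List[List[str]]:
    """Largest sub-cluster first; the next ones supply the sub-headlines."""
    return sorted(sub_clusters, key=len, reverse=True)
```

A recency-aware variant could, for example, sort by a combination of size and publication time instead of size alone.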
Referring to Fig. 5, I show an example of a preferable way in which multiple streaming video images of the same event from various Online news sources can appear on the screen side by side. If streaming video is used for example in a few or more of the news sources that deal with the same event, then preferably the user can also request for example an automatic formation of a group of these sources on the same screen, like a split screen in cable TV for example, except that the group is preferably automatically and dynamically generated according to the item of interest and according to availability in the various sources. So preferably the user can see for example 4 or 9 (or any other reasonable number of) small streaming media images on the same screen and preferably for example switch the sound each time to one of them (or for example the sound is not enabled in order to force the user to go to the actual site if he wants also the sound), and then by clicking for example on one of them the user is preferably transferred to that source to view it normally there.
Preferably the user can switch to the multi-view of the streaming images next to each other for example by clicking on something near the original preferably automatically switching image, for example the icon of a split screen or the words "Split Screen", shown next to the images in the example of Fig. 4a, so that preferably the split screen is created automatically by expanding the switching available still images and/or streaming images to appear together side by side. Preferably the split screen can contain for example also some normal images instead of just streaming data. If there are for example 20 available images for a certain cluster or sub-cluster, out of which for example 5 images contain streaming data, then preferably the system organizes first of all the streaming data images next to each other, and adds afterwards the still images. Since 20 images in this example might not fit on one screen, then either the user can use for example the browser's scroll bar on the side to view the rest of the images, or for example only 9 or 12 images are shown and the others for example continue to switch automatically or the user can for example press some button to switch between more than 1 split screen that were created. Preferably the streaming data or any other data is supplied to the users more efficiently by the same mechanisms explained in the reference to Fig. 6. Preferably if one of the sources for example stops broadcasting the relevant streaming data, it can automatically be removed from the split screen or for example is replaced with a relevant still image, and if for example a new relevant data stream becomes available from another source, it can preferably be automatically added by the system to the split screen.
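The ordering rule described above (streaming images first, still images afterwards, divided into several split screens when they do not fit on one) can be illustrated with this small sketch. The `Image` record and the default of 9 images per screen are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Image:
    source: str       # name of the news source (hypothetical field)
    streaming: bool   # True if the image is a live stream


def split_screens(images: List[Image], per_screen: int = 9) -> List[List[Image]]:
    """Put streaming images first, keep the original order otherwise,
    and cut the result into screens of at most `per_screen` images."""
    # sorted() is stable, so images keep their relative order within each kind.
    ordered = sorted(images, key=lambda image: not image.streaming)
    return [ordered[i:i + per_screen] for i in range(0, len(ordered), per_screen)]
```

With 20 images of which 5 are streaming, this gives three screens of 9, 9 and 2 images, with the 5 streams at the start of the first screen. Removing a source that stops broadcasting, or adding a new stream, would simply mean calling the function again on the updated list.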
Referring to Fig. 6, I show an example of a condensed packet for much more efficient distribution of the same data to multiple users. As explained in the patent summary, preferably as additional new related news items come in, the headlines are automatically updated even if the user does not request any refresh. For example if there is a report on a new suicide bombing in Israel, as additional details come in and the same items in the various sources become more updated or new items are added, preferably this is also automatically updated in the automatic news page that the user has in front of him (for example if the headline or the first sentence have changed or the images have changed). This is preferably done by automatic partial refresh on a need basis, as explained already in Canadian application no. 2,432,817 of Jul. 4, 2003 (and in subsequent continuations of that application in the US and Canada) by the present inventor, as explained below, and preferably by grouping identical data packets in groups so that each group contains a single copy of the identical data packet together with a multiple list of targets, so that each group preferably goes to a certain general area or direction, and when it reaches that general area the data is preferably duplicated and split up into the individual packets, or into smaller groups with fewer targets, which are later split up into the individual packets, as explained for example in PCT application PCT/IL 01/01042 of Nov. 8, 2001 and US application 10/375,208 by the present inventor. This is preferably done in combination with using a preferably hierarchical system of routers and Physical (geographical) IP addresses (preferably for example GPS based), as explained also in these applications.
Similarly preferably all the data and especially for example any streaming video images are preferably distributed this way to the large number of the automatic news viewers. As explained in these applications, this efficient distribution can be used for example both when sending data to users and when sending data to various proxies or mirror sites such as for example Akamai servers. However, since, as explained above, headlines and images preferably keep changing anyway between items of the relevant cluster or sub-clusters, preferably the user gets a different indication when the items themselves have changed or new items or images are added, such as for example some sound indication, preferably accompanied with a visual indication of the new item, such as for example some red frame around it, and/or for example the words "Fresh update" near it, etc. The sound indication has a further advantage, since the user can be alerted for example even if he is currently working on another window.
The automatic partial refresh is preferably done as follows: In order to save bandwidth for example the html protocol is preferably changed so that it is possible to define for example "refresh on a need basis", which means that the refresh command is initiated automatically by the site when there is any change in the page (so that the browser can get a refresh even if it didn't ask for it), or for example the browser asks for refresh more often (for example every 20 seconds or even less), but if nothing has changed then the browser gets just for example a code that tells it to keep the current page or window as is. The first of these two variations is more preferable since it also saves the bandwidth wasted by unnecessary refresh requests from the browsers. In addition, when the refresh is sent, preferably it can be a smart refresh, which tells the browser preferably only what to change on the page instead of having to send the entire page again. Another possible variation is to implement this "refresh on need" for example by ActiveX and/or Java and/or JavaScript and/or some plug-in or other dynamic code that is updated only when there is a need for it. Another possible variation is for example to keep the page open like a streaming audio or video so that the browser always waits for new input but preferably knows how to use the new input for updating the page without having to get the whole page again and preferably doesn't have to do anything until the new input arrives. Of course, like other features in this invention, the above features or variations can be used also independently of any other features of this invention, for example also independently of any Metasearch or automatic "newspaper" application.
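As an illustration only, the "smart refresh" variation described above can be sketched in Python as follows. The sketch assumes that a page is modelled as a mapping from region names to their content; the names smart_refresh, apply_refresh and the "keep"/"patch" status codes are hypothetical and are not part of any existing protocol or of the applications cited above.

```python
from typing import Dict

# Hypothetical model: a page is a mapping from region id to its content.
Page = Dict[str, str]


def smart_refresh(old: Page, new: Page) -> dict:
    """Server side: describe only what must change to turn `old` into `new`."""
    changed = {key: value for key, value in new.items() if old.get(key) != value}
    removed = [key for key in old if key not in new]
    if not changed and not removed:
        # Nothing changed: send just a short code telling the browser
        # to keep the current page as is.
        return {"status": "keep"}
    return {"status": "patch", "set": changed, "remove": removed}


def apply_refresh(page: Page, message: dict) -> Page:
    """Browser side: update the displayed page from a refresh message."""
    if message["status"] == "keep":
        return page
    updated = {k: v for k, v in page.items() if k not in message["remove"]}
    updated.update(message["set"])
    return updated
```

In this sketch a changed headline results in a message that carries only the new headline, while an unchanged page results in the short "keep" code, so the entire page does not have to be sent again in either case.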
The structure of automatically condensed identical packets is illustrated in Fig. 6.
Preferably the condensed packet (61) contains just a single copy of the identical data (62) and an extended header (63), which contains a normal header (65) (preferably with a mark that indicates that this is actually a condensed packet), and a list (64) of the preferably physical (geographic) IP target addresses of the original packets that contained the same identical data in their body and were condensed in this group. So, for example, when sending the same streaming data (or any other same data) for example to millions of users at the same time, preferably one or more such condensed packets are created, preferably by the sending web server, and each condensed packet goes to a certain general target area, and as it reaches the general target area the condensed packet is preferably replicated and regrouped into smaller groups, each containing fewer target addresses, and eventually replicated back to single packets with a single target address each, as the packet nears its final destination. As explained in the above mentioned applications, this can lead to huge savings both in terms of bandwidth and in terms of the number of routing decisions that have to be made on the way.
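As an illustration only, the regrouping of a condensed packet on its way to the targets can be sketched in Python as follows. The sketch assumes that each physical (geographic) address is modelled as a tuple of area names from the most general to the most specific, that all addresses are distinct and have the same depth, and that each router level regroups by the next component; the names Address, CondensedPacket, split_by_area and deliver are hypothetical and are not taken from the above mentioned applications.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical stand-in for a hierarchical physical address,
# e.g. ("eu", "fr", "host1"): general area first, final target last.
Address = Tuple[str, ...]


@dataclass
class CondensedPacket:
    data: bytes              # single copy of the identical data (62)
    targets: List[Address]   # list of target addresses (64)


def split_by_area(packet: CondensedPacket, level: int) -> List[CondensedPacket]:
    """Regroup the targets by their address component at `level`."""
    groups: Dict[str, List[Address]] = defaultdict(list)
    for address in packet.targets:
        groups[address[level]].append(address)
    return [CondensedPacket(packet.data, group) for group in groups.values()]


def deliver(packet: CondensedPacket, level: int = 0) -> Dict[Address, bytes]:
    """Split recursively until every packet has a single target address."""
    if len(packet.targets) == 1:
        return {packet.targets[0]: packet.data}
    delivered: Dict[Address, bytes] = {}
    for smaller in split_by_area(packet, level):
        delivered.update(deliver(smaller, level + 1))
    return delivered
```

In this sketch the data is duplicated only where the routes to the targets diverge, so a group of targets in the same general area shares a single copy of the data until the packet reaches that area.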
While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications, expansions and other applications of the invention may be made which are included within the scope of the present invention, as would be obvious to those skilled in the art.

Claims

I claim:

1. A system for improved News Meta-Search over a large number of Online news sources on the Internet or similar networks, comprising at least one of:
a. A system for switching between news items from the same cluster or sub-cluster displayed in a given position in an automatically generated newspaper page, wherein said switching is done automatically or with user intervention.
b. A system for switching between news images from the same cluster or sub-cluster displayed in a given position in an automatically generated newspaper page, wherein said switching is done automatically or with user intervention, and wherein said images are at least one of still images and streaming data.
c. A system for recursively creating sub-clusters of the displayed clusters or sub-clusters of news items that are related to a certain event, so that at least one of:
1. For each sub-cluster shown the user can either click on a chosen item from that sub-cluster or click on a link for seeing a list of additional items that belong to the sub-cluster.
2. When the user requests to see the list of additional items of the chosen sub-cluster, the new list can be again clustered similarly.
3. When the user requests to see the list of additional items of the cluster, the new list can be again clustered similarly.

2. The system of claim 1 wherein the recursive sub-clustering continues until there are sufficiently few items in the final sub-category or until the items are too different to group further.

3. The system of claim 1 wherein if the user searches for keywords in the News Meta Search, the results are displayed recursively in clusters and sub-clusters in a way similar to the automatically generated newspaper page.

4. The system of claim 1 wherein if the user searches for keywords in the News Meta Search, the results can have all the features that exist in the automatically generated newspaper page.

5. The system of any of the above claims wherein the user can switch between a mode that displays also images and a mode without images.

6. The system of any of the above claims wherein the same news item or same sub-cluster might belong to more than one cluster or sub-cluster, and thus it is shown and/or can be reached from all the sufficiently relevant clusters or sub-clusters to which it is related.

7. The system of any of the above claims wherein sorting a list of related items by relevance and/or by time and date can be used to create order between and/or within the sub-clusters, without interfering with the cluster structure itself.

8. The system of any of the above claims wherein the user can request to sort the items by the country of the source, so that the news items are clustered in addition or instead also according to the country of the news source.

9. The system of any of the above claims wherein the user can view a graphical or textual hierarchical representation which shows simultaneously the multi-level structure of clusters and sub-clusters, showing more than two levels of the hierarchy at the same time, or showing the structure down to the end-nodes.

10. The system of any of the above claims wherein the html protocol and/or the html command set is expanded to allow any image to be requested with a given size limit, so that if the original image is bigger it is either truncated automatically to fit in the allowed window, or is automatically downscaled in order to fit completely into the allowed space.

11. The system of claim 10 wherein for truncation the improved html protocol allows the web programmer to specify for each image the x-y coordinates of its central point of interest, and/or various heuristics are used by the browser or by the server in order to find the central point of interest automatically.

12. The system of any of the above claims wherein the Meta News system automatically chooses only images that are within a certain reasonable range of sizes.

13. The system of any of the above claims wherein the user can request to automatically spread still images and/or streaming images of the same cluster or sub-cluster together next to each other so that they can be viewed simultaneously.

14. The system of claim 13 wherein by clicking on or near one of the simultaneous streaming data images the user is transferred to that source to view it normally there.

15. The system of claim 13 wherein the user can switch the sound between any of the simultaneous streaming data sources.

16. The system of claim 13 wherein the group of images is automatically and dynamically generated according to the item of interest and according to availability in the various sources, so that images or streaming data can be automatically added or removed accordingly.

17. The system of any of the above claims wherein as additional new related news items come in, the headlines and/or the images can be automatically updated even if the user does not click on any refresh button.

18. A system for improved News Meta-Search over a large number of Online news sources on the Internet or similar networks wherein as additional new related news items come in, the headlines and/or the images can be automatically updated even if the user does not click on any refresh button.

19. The system of any of claims 17 and 18 wherein said automatic updating is done by partial refresh on a need basis by at least one of the following ways:
a. The refresh command is initiated automatically by the site when there is any change in the page, so that the browser can get a refresh even if it didn't ask for it.
b. The browser can ask for refresh, but if nothing has changed then the browser gets just a code that tells it to keep the current page or window as is.
c. When the refresh is sent, it is a smart refresh, which tells the browser only what to change on the page instead of having to send the entire page again.

20. The system of any of the above claims wherein the user gets a different indication when the items or images themselves have changed or new items or images are brought in, and said indication is at least one of sound indication and visual indication of the item that has changed or the new item that has been inserted.

21. An Online Shareware Meta Search system wherein shareware programs appear in higher places in the search results according to how many of the included shareware sites list them, and at least one of the following:
a. In which position they are listed for the given searched keywords.
b. How important the shareware site is.
c. How many times they were already downloaded.
d. The shareware site's rating for the shareware.

22. The system of claim 21 wherein at least one of the following features exists:
a. The number of downloads data is normalized by the general amount of listed downloads in that included shareware site.
b. The included shareware site's rating for the shareware is based on user votes and/or on their own editorial staff.
c. If the shareware site's rating of a given shareware is based on user votes, the shareware site's rating is given higher weight than ratings based on editorial decision, if the number of votes is given and is sufficiently large.

23. The system of claim 21 wherein if the same shareware appears in different versions in various shareware sites then the system does at least one of:
a. Uses also the rankings of the previous versions for determining the score for that shareware in general.
b. Uses in this case clusters and sub-clusters like in the meta-news.
c. Treats each version independently like any other shareware.

24. An improved Online web pages search engine comprising at least one of the following:
a. A system that takes into account the link relations between web pages for scoring the page but does not reduce the value of a link according to the number of other outgoing links in the linking pages, or reduces the value of a link according to the number of other outgoing links in the linking pages only slightly.
b. A system that improves slightly the rank for a page that has many outgoing links.
c. A system that takes into account usage statistics but uses it only for modifying the value of the link in the linking page but not for modifying directly the ranking of a page.
d. A system that takes into account usage statistics but uses it with one or more thresholds, so that usage lower than a certain factor does not continue to lower the score, and/or usage higher than a certain factor does not continue to increase the score.
g. A system that uses also the anchor text of inbound links to determine the relevance of the linked page to the searched keywords and includes at least some semantic analysis of the anchor href text and/or also at least the surrounding nearby text, in order to be able to identify at least part of the meaning and/or avoid certain pitfalls that are relevant to the interpretation of the real meaning of the link.
h. A system that uses also the anchor text of inbound links to determine the relevance of the linked page to the searched keywords and at least takes into account some basic language structures such as negation words or modifying words.
i. A system that allows the user to define various parameters for scoring the results, wherein said parameters are at least one of: The relative weight of usage statistics, the amount of reduction of the importance of a link as a result of the total number of links on the linking page, and, the amount of taking into consideration the newness of a web page so that less links to it are required.

25. An improved Online web pages search engine which takes into account the number of incoming links for each page but the time factor of how long the page has existed is taken into account for determining the weight given to the number of links.

26. The system of claim 25 wherein at least some threshold is used, so that 0 links or too few links are not compensated by the fact that the page is new, but if the new page has already sufficient valid links, then the newness of the page is taken into account in requiring fewer links at that stage.

27. An improved Online search engine or Meta Search engine which enhances the user's keywords search by checking also synonyms of the requested keywords, comprising at least one of:
a. A system that can automatically include in the search results also pages that contain synonyms or close synonyms of the requested keywords.
b. A system that asks the user if he would like to include in the search results automatically also pages that contain close synonyms of the requested search keywords and remembers that as default for that user for following searches.
c. A system that checks at least close synonyms of the user's search keywords, and if there are more and/or better results with the synonyms then the system asks the user if he wants to switch over to the results of the search that was based on the synonyms, and/or asks the user if he wants to integrate the current results with the results of the search that was based on the synonyms.

28. The system of any of the above claims wherein in order to enable the multi-level sub-clustering the same or similar principles are applied similarly at all levels, except that in each step they are applied now to the items of the previous cluster or sub-cluster in order to further divide them into additional sub-clusters.

29. The system of any of the above claims wherein in order to improve the clustering ability, the time each item was published is taken into account, with the assumption that the closer the time of publication between them, the higher the chance that two items are dealing with the same event.

30. The system of any of the above claims wherein the temporal words or phrases used within the item itself are used to decide when the event occurred, and this time is used to separate between news items that occurred before this time and items that occurred after this time and/or to help decide the similarity between items that might be referring to the same event.

31. The system of claim 30 wherein in order to analyze the temporal phrases within the item, the system is able to perform also at least some minimal type of semantic analysis and/or has at least knowledge of the relevant temporal nouns and relevant verbs.

32. The system of any of the above claims wherein the system has at least one of:
a. A knowledge base of at least one of: country names, city names, and other geographical areas.
b. A knowledge base of at least the most common or most important verbs that typically appear in headlines and/or in the first one or two sentences of news items and/or in entire news items.
c. A knowledge base of verbs that uses semantic trees and/or semantic graphs and/or various rules, so that each verb can be characterized by scores on a number of relevant variables or dimensions.
d. A database of synonyms for the comparisons of nouns and/or of verbs, so that the system can know if two words are different or similar even without "understanding" their meaning.
e. A knowledge base of major known political names and organizations.
f. The ability to take into account also similarity in words at least in the headlines, even if they are not exactly identical.

33. The system of any of claims 29-32 wherein at least one of these methods is also used at least for the most important other languages, preferably with a link between the corresponding words between these languages, so that the clustering can work OK also across languages.

34. The system of any of the above claims wherein for clustering the system analyses the similarity in the occurrence of combinations of two or more words in the headline and/or in the first 1 or 2 sentences and/or in the entire item.

35. The system of any of the above claims wherein when the switching images contain also streaming data, at least one of the following features exists:
a. The automatic switching of images is disabled so that the user has to click on something in order to view related streaming data from a different source or other still images.
b. Each streaming source remains in the position for a longer time than still images until switching to the next streaming source or to the next still image.

36. The system of any of the above claims wherein the system determines which item to use as the main item of the general cluster by at least one of:
a. First picking the sub-cluster that has the largest number of items and/or the most recent cluster that is big enough relative to other sub-clusters.
b. Picking the item within the chosen first sub-cluster which has the highest average similarity to other items in that sub-cluster and/or belongs to the largest sub-cluster of that sub-cluster and/or is most relevant within the cluster or within the sub-cluster and/or is most recent within the cluster or within the sub-cluster, etc.

37. The system of any of the above claims wherein when sending the same data to many users or to many servers or mirror sites at the same time, the identical data packets are grouped in groups so that each group contains a single copy of the identical data packet together with a multiple list of targets, so that each group goes to a certain general area, and when it reaches that general area the data is duplicated and split up into the individual packets, or into smaller groups with less targets, which are later split up into the individual packets.

38. A method for improved News Meta-Search over a large number of Online news sources on the Internet or similar networks, comprising at least one of the following steps:
e. Switching between news items from the same cluster or sub-cluster displayed in a given position in an automatically generated newspaper page, wherein said switching is done automatically or with user intervention.
f. Switching between news images from the same cluster or sub-cluster displayed in a given position in an automatically generated newspaper page, wherein said switching is done automatically or with user intervention, and wherein said images are at least one of still images and streaming data.
g. Recursively creating sub-clusters of the displayed clusters or sub-clusters of news items that are related to a certain event, so that at least one of:
1. For each sub-cluster shown the user can either click on a chosen item from that sub-cluster or click on a link for seeing a list of additional items that belong to the sub-cluster.
2. When the user requests to see the list of additional items of the chosen sub-cluster, the new list can be again clustered similarly.
3. When the user requests to see the list of additional items of the cluster, the new list can be again clustered similarly.

39. The method of claim 38 wherein the recursive sub-clustering continues until there are sufficiently few items in the final sub-category or until the items are too different to group further.

40. The method of claim 38 wherein if the user searches for keywords in the News Meta Search, the results are displayed recursively in clusters and sub-clusters in a way similar to the automatically generated newspaper page.

41. The method of claim 38 wherein if the user searches for keywords in the News Meta Search, the results can have all the features that exist in the automatically generated newspaper page.

42. The method of any of the above claims wherein the user can switch between a mode that displays also images and a mode without images.

43. The method of any of the above claims wherein the same news item or same sub-cluster might belong to more than one cluster or sub-cluster, and thus it is shown and/or can be reached from all the sufficiently relevant clusters or sub-clusters to which it is related.

44. The method of any of the above claims wherein sorting a list of related items by relevance and/or by time and date can be used to create order between and/or within the sub-clusters, without interfering with the cluster structure itself.

45. The method of any of the above claims wherein the user can request to sort the items by the country of the source, so that the news items are clustered in addition or instead also according to the country of the news source.

46. The method of any of the above claims wherein the user can view a graphical or textual hierarchical representation which shows simultaneously the multi-level structure of clusters and sub-clusters, showing more than two levels of the hierarchy at the same time, or showing the structure down to the end-nodes.

47. The method of any of the above claims wherein the html protocol and/or the html command set is expanded to allow any image to be requested with a given size limit, so that if the original image is bigger it is either truncated automatically to fit in the allowed window, or is automatically downscaled in order to fit completely into the allowed space.

48. The method of claim 47 wherein for truncation the improved html protocol allows the web programmer to specify for each image the x-y coordinates of its central point of interest, and/or various heuristics are used by the browser or by the server in order to find the central point of interest automatically.

49. The method of any of the above claims wherein the Meta News system automatically chooses only images that are within a certain reasonable range of sizes.

50. The method of any of the above claims wherein the user can request to automatically spread still images and/or streaming images of the same cluster or sub-cluster together next to each other so that they can be viewed simultaneously.

51. The method of claim 50 wherein by clicking on or near one of the simultaneous streaming data images the user is transferred to that source to view it normally there.

52. The method of claim 50 wherein the user can switch the sound between any of the simultaneous streaming data sources.

53. The method of claim 50 wherein the group of images is automatically and dynamically generated according to the item of interest and according to availability in the various sources, so that images or streaming data can be automatically added or removed accordingly.

54. The method of any of the above claims wherein as additional new related news items come in, the headlines and/or the images can be automatically updated even if the user does not click on any refresh button.

55. A method for improved News Meta-Search over a large number of Online news sources on the Internet or similar networks wherein as additional new related news items come in, the headlines and/or the images can be automatically updated even if the user does not click on any refresh button.

56. The method of any of claims 54 and 55 wherein said automatic updating is done by partial refresh on a need basis by at least one of the following ways:
d. The refresh command is initiated automatically by the site when there is any change in the page, so that the browser can get a refresh even if it didn't ask for it.
e. The browser can ask for refresh, but if nothing has changed then the browser gets just a code that tells it to keep the current page or window as is.
f. When the refresh is sent, it is a smart refresh, which tells the browser only what to change on the page instead of having to send the entire page again.

57. The method of any of the above claims wherein the user gets a different indication when the items or images themselves have changed or new items or images are brought in, and said indication is at least one of sound indication and visual indication of the item that has changed or the new item that has been inserted.

58. An Online Shareware Meta Search method wherein shareware programs appear in higher places in the search results according to how many of the included shareware sites list them, and at least one of the following:
a. In which position they are listed for the given searched keywords.
b. How important the shareware site is.
c. How many times they were already downloaded.
h. The shareware site's rating for the shareware.

59. The method of claim 58 wherein at least one of the following features exists:
a. The number of downloads data is normalized by the general amount of listed downloads in that included shareware site.
b. The included shareware site's rating for the shareware is based on user votes and/or on their own editorial staff.
c. If the shareware site's rating of a given shareware is based on user votes, the shareware site's rating is given higher weight than ratings based on editorial decision, if the number of votes is given and is sufficiently large.

60. The method of claim 58 wherein if the same shareware appears in different versions in various shareware sites then the method does at least one of:
d. Uses also the rankings of the previous versions for determining the score for that shareware in general.
e. Uses in this case clusters and sub-clusters like in the meta-news.
f. Treats each version independently like any other shareware.

61. An improved Online web pages search method comprising at least one of the following steps:
e. Taking into account the link relations between web pages for scoring the page but not reducing the value of a link according to the number of other outgoing links in the linking pages, or reducing the value of a link according to the number of other outgoing links in the linking pages only slightly.
f. Improving slightly the rank for a page that has many outgoing links.
g. Taking into account usage statistics but using it only for modifying the value of the link in the linking page but not for modifying directly the ranking of a page.
h. Taking into account usage statistics but using it with one or more thresholds, so that usage lower than a certain factor does not continue to lower the score, and/or usage higher than a certain factor does not continue to increase the score.
j. Using also the anchor text of inbound links to determine the relevance of the linked page to the searched keywords and including at least some semantic analysis of the anchor href text and/or also at least the surrounding nearby text, in order to be able to identify at least part of the meaning and/or avoid certain pitfalls that are relevant to the interpretation of the real meaning of the link.
k. Using also the anchor text of inbound links to determine the relevance of the linked page to the searched keywords and at least taking into account some basic language structures such as negation words or modifying words.
l. Allowing the user to define various parameters for scoring the results, wherein said parameters are at least one of: The relative weight of usage statistics, the amount of reduction of the importance of a link as a result of the total number of links on the linking page, and, the amount of taking into consideration the newness of a web page so that less links to it are required.

62. An improved Online web pages search method which takes into account the number of incoming links for each page but the time factor of how long the page has existed is taken into account for determining the weight given to the number of links.

63. The method of claim 62 wherein at least some threshold is used, so that 0 links or too few links are not compensated by the fact that the page is new, but if the new page has already sufficient valid links, then the newness of the page is taken into account in requiring fewer links at that stage.

64. An improved Online search method or Meta Search method which enhances the user's keywords search by checking also synonyms of the requested keywords, comprising at least one of:
d. Automatically including in the search results also pages that contain synonyms or close synonyms of the requested keywords.
e. Asking the user if he would like to include in the search results automatically also pages that contain close synonyms of the requested search keywords and remembering that as default for that user for following searches.
f. Checking at least close synonyms of the user's search keywords, and if there are more and/or better results with the synonyms then the system asks the user if he wants to switch over to the results of the search that was based on the synonyms, and/or asks the user if he wants to integrate the current results with the results of the search that was based on the synonyms.

65. The method of any of the above claims wherein in order to enable the multi-level sub-clustering the same or similar principles are applied similarly at all levels, except that in each step they are applied now to the items of the previous cluster or sub-cluster in order to further divide them into additional sub-clusters.

66. The method of any of the above claims wherein in order to improve the clustering ability, the time each item was published is taken into account, with the assumption that the closer the time of publication between them, the higher the chance that two items are dealing with the same event.

67. The method of any of the above claims wherein the temporal words or phrases used within the item itself are used to decide when the event occurred, and this time is used to separate between news items that occurred before this time and items that occurred after this time and/or to help decide the similarity between items that might be referring to the same event.

68. The method of claim 67 wherein in order to analyze the temporal phrases within the item, the system is able to perform also at least some minimal type of semantic analysis and/or has at least knowledge of the relevant temporal nouns and relevant verbs.

69. The method of any of the above claims wherein the method uses also at least one of:
g. A knowledge base of at least one of: country names, city names, and other geographical areas.
h. A knowledge base of at least the most common or most important verbs that typically appear in headlines and/or in the first one or two sentences of news items and/or in entire news items.
i. A knowledge base of verbs that uses semantic trees and/or semantic graphs and/or various rules, so that each verb can be characterized by scores on a number of relevant variables or dimensions.
j. A database of synonyms for the comparisons of nouns and/or of verbs, so that the system can know if two words are different or similar even without "understanding" their meaning.
k. A knowledge base of major known political names and organizations.
l. The ability to take into account also similarity in words at least in the headlines, even if they are not exactly identical.

70. The method of any of claims 66-69 wherein at least one of these methods is also used at least for the most important other languages, preferably with a link between the corresponding words between these languages, so that the clustering can work OK also across languages.

71. The method of any of the above claims wherein for clustering the system analyses the similarity in the occurrence of combinations of two or more words in the headline and/or in the first 1 or 2 sentences and/or in the entire item.

72. The method of any of the above claims wherein when the switching images contain also streaming data, at least one of the following features exists:
c. The automatic switching of images is disabled so that the user has to click on something in order to view related streaming data from a different source or other still images.
d. Each streaming source remains in the position for a longer time than still images until switching to the next streaming source or to the next still image.

73. The method of any of the above claims wherein the system determines which item to use as the main item of the general cluster by at least one of:
c. First picking the sub-cluster that has the largest number of items and/or the most recent cluster that is big enough relative to other sub-clusters.
d. Picking the item within the chosen first sub-cluster which has the highest average similarity to other items in that sub-cluster and/or belongs to the largest sub-cluster of that sub-cluster and/or is most relevant within the cluster or within the sub-cluster and/or is most recent within the cluster or within the sub-cluster, etc.

74. The method of any of the above claims wherein when sending the same data to many users or to many servers or mirror sites at the same time, the identical data packets are grouped in groups so that each group contains a single copy of the identical data packet together with a multiple list of targets, so that each group goes to a certain general area, and when it reaches that general area the data is duplicated and split up into the individual packets, or into smaller groups with less targets, which are later split up into the individual packets.