WO2001011444A2 - System and method for searching and indexing world-wide-web pages

Info

Publication number: WO2001011444A2
Application number: PCT/US2000/021770
Authority: WO
Inventors: Jonathan K. Kilberg; Christopher E. Seline
Original assignee: 2Wrongs.Com, Inc.
Priority date: 1999-08-10
Filing date: 2000-08-09
Publication date: 2001-02-15
Also published as: WO2001011444A3; AU6627700A

Abstract

A system and method for indexing, searching and performing related operations on the Internet uses an automated browser (10) to find Internet web pages that are bookmark files (20). These may be identified by being of a particular type of file, such as a 'NETSCAPE' bookmark file, or as being a list of URL's found outside the server where the bookmark resides, or otherwise. The web pages (30) corresponding to the URL's contained within the bookmark files (20) are parsed for relevant information and a database is constructed based thereon. Another database (40) may be constructed to associate categories contained within found bookmark files with URL's contained within the categories. The databases are searched by a user who inputs a query, and relevant URL's and other information are displayed.

Description

SYSTEM AND METHOD FOR SEARCHING AND INDEXING WORLD-WIDE-WEB PAGES

FIELD OF THE INVENTION

The present invention relates to a system and method for searching world-wide-web

(WWW) pages. The method also includes indexing, databasing, categorizing, and ranking web

pages.

BACKGROUND

Use of the Internet, and more particularly the world-wide-web, has increased dramatically

in recent years. Web pages may be found on any conceivable subject for many purposes, such as

to advertise commercial products and serve as a conduit between companies and their customers;

to provide services such as news reporting; to provide direct entertainment such as Internet radio

stations; to provide educational opportunities; and to provide information of general interest on a

particular subject.

As is well known, the WWW is made up of numerous web pages that include text,

graphics, and sometimes other multimedia information. Web pages are uniquely identified by an

address called the URL (uniform resource locator). A typical URL is: http://www.2wrongs.com,

wherein http:// specifies the type of transfer protocol, and 2wrongs.com is the Internet domain

name. Other typical URL's may have directories, subdirectories, and file names specified. A

web browser, such as "NETSCAPE NAVIGATOR" or "MICROSOFT INTERNET

EXPLORER," retrieves and displays the contents of a web page using the URL.
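As a brief illustration of the URL parts named above, the following Python sketch splits a URL into its transfer protocol, domain name, and path using the standard library. The helper name `describe_url` and the path `/docs/index.html` are assumptions introduced here for illustration and do not appear in the original text.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts named in the text (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # type of transfer protocol, e.g. "http"
        "host": parts.hostname,    # Internet host/domain name
        "path": parts.path,        # directories, subdirectories, file name
    }


# The path below is a made-up example of directories and a file name.
info = describe_url("http://www.2wrongs.com/docs/index.html")
```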

A problem faced by Internet users is finding the URL's of web pages that are of interest.

Users currently have several options to search for desired web pages. Perhaps the simplest method is to enter a URL that seems to correspond to the subject of interest, and see what web

page is retrieved, if any. For instance, a user who is interested in patent law might enter

http://patents.com or http://www.patents.com, and see what appears. There are a number of

limitations to this technique. For example, it only retrieves one web page, instead of providing a

list of the many pages that may be of interest. Also, it is only of use for the simplest searches,

that can be represented by one word that accurately describes the subject of interest. The fact

that this method is used at all is perhaps a reflection of the weaknesses of some of the other

methods described below.

Another method to search the web is to use an index, which is a collection of URL's that

have been categorized by subject matter, generally by human compilation and in a hierarchical

arrangement. A problem with indexes is that human compilation is time consuming, and is thus

expensive and not likely to include current web pages. This is a particular problem in light of the

fast growth of the Internet.

Another method to search the web is to use a search engine, which is a searchable

database that organizes, in some sense, information on at least a portion of the Internet. A search

engine system consists of a spider (an automated browser or agent) which traverses the Internet

gathering information on web pages (by following links of URL's), a database to store the

gathered information, and a search tool for searching the database. Search engines extract and

index information using a number of parameters. In general, they index some or all words found

in documents, and may also index a document's size, title, headings and subheadings, and other

information. While search engines are useful, they have several problems that have not been

overcome. The Internet contains such an enormous amount of information that it is not feasible to index every web page, thus current search engines only index a relatively small portion of the

Internet. Information that is not indexed is ignored by the search engine. Also, the Internet

includes a large number of web pages that are probably of very little interest to anyone other than

their creator. Search engines nevertheless will, in general, index such web pages. While some

search engines attempt to rank, in some sense, the popularity of URL's, none do so similarly to

that of the present invention as disclosed below. Yet another search engine problem is that the

creators of some web pages will insert words into a web page simply for the purpose of

"tricking" search engines into retrieving the web pages, even when the subject of the web page

has nothing to do with the subject of the search. Taken together, these and other problems have

limited the usefulness of search engines. While search engines are widely used, a search engine

will typically retrieve the URL's of many web pages that are not of interest along with web pages

that are, and, as discussed above, may not find relevant web pages simply because a given web

page may not reside on a portion of the web that has been indexed by a particular search engine.

All documents, including any technical protocols such as http and HTML, referred to in

this document are hereby incorporated by reference in their entirety, although no documents are

admitted to render any of the claims unpatentable either alone or in combination with any other

references known to the applicant.

The present invention is described in the context of the Internet and the WWW.

However, it will be readily understood that the utility of the invention is not limited to the

Internet as that term is used to describe a particular computer network, but instead may be used

by any other computer network that utilizes hyperlinks and addresses similar to URL's, wherein

users collect lists of URL's.

SUMMARY

The present invention relates to a method of searching a computer network comprising a

number of pages. The method includes the steps of searching the network for pages that are

bookmark files; and creating a database based upon the contents of the bookmarked web pages. A

word database is created based upon the words contained within the bookmarked web pages, and

relates the words to the URL's and other information contained within the web pages. Another

database may be created based upon the categories contained within the bookmark files.

A user may query the database(s) and retrieve web pages of interest that are ranked

according to a desired ranking scheme. A detailed description is provided below, along with

other aspects of the invention.

BRIEF DESCRIPTION OF THE DRAWING

The FIGURE is a schematic representation of the architecture of an embodiment of the

present invention.

DETAILED DESCRIPTION

The present invention is a system and method for searching, indexing and otherwise

classifying web pages on the Internet, and related computer networks.

Bookmarks

The invention uses an automated browser to search for files (web pages) that are

identified as primarily lists of URL's. Files of this type are referred to as "bookmark files"

herein. In general, bookmarks are a feature supported by most web browsers that allow a user to

save important links (as selected by the user) in a bookmark file so they can be found

immediately without the user having to look up the URL and type it into the browser. The user simply views the bookmark file and selects a link therefrom to go to the desired web page.

Bookmark files may be recognized as such by the automated browser in several ways.

The files may be recognized as being "NETSCAPE" bookmark files. Or, the files may be

recognized as being formatted similarly to a "NETSCAPE" bookmark file, even if they are not

formatted identically. Or, the files may be recognized as being a bookmark file supported by

another browser, such as Microsoft's "INTERNET EXPLORER" (called "favorites"), or as files

being similarly formatted. Similarly formatted files may include, for example, lists of hyperlinks

to URL's found outside the server where the bookmark file resides, which can be determined by

comparing the URL of the bookmark file with the listed URL's.
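The two recognition strategies described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicants' implementation: the function name `looks_like_bookmark_file`, the thresholds `min_links` and `min_offsite_share`, and the simple regular expression for hyperlinks are all hypothetical choices made for illustration. The DOCTYPE string is the one used by "NETSCAPE" bookmark files.

```python
import re
from urllib.parse import urlsplit

# Matches double-quoted HREF attributes; a simplification for this sketch.
HREF_RE = re.compile(r'href\s*=\s*"([^"]+)"', re.IGNORECASE)
NETSCAPE_DOCTYPE = "<!doctype netscape-bookmark-file-1>"


def looks_like_bookmark_file(page_url, html, min_links=5, min_offsite_share=0.8):
    """Guess whether a page is a bookmark file (thresholds are assumptions)."""
    # Strategy 1: the page declares itself a "NETSCAPE" bookmark file.
    if NETSCAPE_DOCTYPE in html.lower():
        return True
    # Strategy 2: the page is mostly a list of links to other servers,
    # found by comparing the page's own host with each linked host.
    links = HREF_RE.findall(html)
    if len(links) < min_links:
        return False
    own_host = urlsplit(page_url).hostname
    offsite = 0
    for link in links:
        host = urlsplit(link.strip()).hostname
        # Relative links have no host and count as on-server links.
        if host is not None and host != own_host:
            offsite += 1
    return offsite / len(links) >= min_offsite_share
```

A production crawler would likely use a real HTML parser and tune the thresholds empirically; the sketch only shows the comparison of the bookmark file's URL with the listed URL's.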

Another useful feature of bookmark files is that they are, in general, categorized by the

user into folders related to a particular subject matter so that the user places the URL's of related

web pages into particular folders. The user creates as many folders as desired, names the folders,

and places URL's within the folders. For example, a computer user who is a patent attorney and

a rock climbing enthusiast might create: a folder titled "Patent and Trademark Office", wherein

URL's of frequently accessed web pages of the PTO are placed; another folder titled "Patents -

Non-PTO", wherein URL's of other frequently accessed web pages related to patents are placed;

and another folder titled "Rock Climbing", wherein URL's of web pages related to rock climbing

are placed. While the use of categories is not necessary, it assists the user in locating a specific

URL. Since a user may have a potentially unlimited number of URL's saved in the user's

bookmark file, it may be very time consuming and tedious to inspect a complete list of URL's to

find the one the user has in mind.

Downloading Bookmark Files and Other Web Pages

As discussed above, the automated browser searches the Internet to find bookmark files.

The automated browser downloads and locally stores each bookmark file. The automated

browser also locally downloads the web pages stored at each URL contained in each bookmark.

The automated browser continues searching the Internet to find additional bookmark

files. This can be done in several ways. It can be done by selecting an IP address at random, or

according to any predefined search criteria, determining if the selected address is a bookmark

file, downloading if so, and repeating the process. It can also be done by feeding the automated

browser a certain URL and having the browser follow links attached to the URL, and then follow

links attached to these URL's, and so on. Both of these methods, or potentially other methods

may be used. In practice, automated browsers are well understood and are used by current

generation search engines (although not for the purpose of the present invention), so that one

skilled in the art will understand how an automated browser can search the Internet and

download bookmark files and web pages whose URL's are bookmarked in the bookmark files.
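The link-following approach described above can be sketched as a breadth-first traversal. The sketch below is illustrative only: `crawl`, `fetch`, `extract_links`, `is_bookmark_file`, and `max_pages` are hypothetical names, and the "web" is a small in-memory dictionary with made-up addresses so that the example runs without network access.

```python
import re
from collections import deque


def crawl(seed_url, fetch, extract_links, is_bookmark_file, max_pages=100):
    """Follow links outward from seed_url and collect bookmark files.

    fetch(url) returns page text or None; extract_links(page) returns URLs;
    is_bookmark_file(url, page) returns True or False. All three are supplied
    by the caller (assumptions of this sketch).
    """
    seen = {seed_url}
    queue = deque([seed_url])
    found = {}  # URL of bookmark file -> locally stored contents
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        page = fetch(url)
        if page is None:
            continue
        if is_bookmark_file(url, page):
            found[url] = page
        # Follow links attached to this page, then links attached to those.
        for link in extract_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return found


# A tiny in-memory stand-in for the network (all addresses are made up).
web = {
    "http://a.example/": "see http://b.example/bm.html",
    "http://b.example/bm.html": "BOOKMARKS http://c.example/",
    "http://c.example/": "",
}
found = crawl(
    "http://a.example/",
    fetch=web.get,
    extract_links=lambda page: re.findall(r"http://\S+", page),
    is_bookmark_file=lambda url, page: page.startswith("BOOKMARKS"),
)
```

The same loop could be seeded with randomly selected addresses instead of a single URL, which corresponds to the first method mentioned in the text.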

While it is not necessary to download all (or substantially all) of the bookmark files on

the Internet and the corresponding bookmarked web pages, it can be appreciated that an

embodiment of the present invention can download all or substantially all of those web pages

using far fewer resources than are presently employed by search engines, which attempt to

download as much of the Internet as possible (at least the portion of the Internet containing user

input text, as opposed to graphics files, Internet resource files, or the like). Further, the

downloaded information of the present invention only corresponds to web pages that have been

bookmarked by Internet users, and so are deemed to be useful. Thus, "junk" that is of little or no

interest to anyone other than the creator is not likely to be included in the downloaded information.

Database Compilation

A. Word Database

A word database is compiled by cross-referencing each word of each downloaded page to

a file specific to each unique word. Thus, each unique word has a file associated with it. (If

desired, the database can be constructed to exclude unique or very rare words, as defined by

determining the relative frequency of the words, dictionary entries, or otherwise, in order to

conserve resources.) The file has multiple entries, each entry corresponding to: the URL of a

page that has at least one reference to the word; the HTML title of the page; the length of the

page in both number of words and bytes; the word number of each instance of the word with data

specific to each instance including font, case, and HTML type; the date the file was retrieved;

and the frequency that the URL was found within the downloaded bookmark files (i.e., the same

URL may appear in many users' bookmark files, and this would tend to indicate a more popular

web page). A sample database entry of a word file for the word "wrongs" appears below. The

entry would be one entry stored in a file named, for example, "wrongs.txt."

a | b | c | d | e | f | g | h |

45.23|www.2wrongs.com|2wrongs.com Homepage|45a123f145a178f232a|1050|4|75|43|

a. Precompiled ranking of word

b. URL

c. Title

d. Occurrences and font of each word (45a = 45th word with font type a)

e. Number of words in page

f. Size of page in kilobytes (rounded)

g. Day, from zeroth day, the file was retrieved

h. # occurrences of URL in bookmarks

Subpart d labeled above is interpreted to mean that in the web page found at

www.2wrongs.com, the word "wrongs" appears in the text of that web page as the 45th word

(having font type a which can be any predefined font type), the 123rd word (having font type f

which can be another predefined font type), the 145th word (having the font type a), the 178th

word (having the font type f), and the 232nd word (having the font type a). As another example,

if a downloaded URL included the text: "The dog is eating the sock" the "the.txt" file would

have a subpart d entry of |1a5f| (i.e., the word "the" is the first word of the sentence with font

type a (here normal font), and is the fifth word of the sentence with font type f (here italics)); the

"dog.txt" file would have a subpart d entry of |2a|; the "is.txt" file would have a subpart d entry of

|3a|; and so on.
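The subpart d encoding can be illustrated with a short Python sketch that builds the occurrence field for each word of a page and decodes it again. The function names `encode_occurrences` and `decode_occurrences` are hypothetical, and the sketch assumes that each font type is a single lowercase letter and that words are matched without regard to case, as the "the.txt" example suggests.

```python
import re
from collections import defaultdict


def encode_occurrences(tokens):
    """Map each word to its subpart d string.

    tokens is a list of (word, font_code) pairs in page order; positions
    are counted from 1, as in the "wrongs" example.
    """
    fields = defaultdict(str)
    for position, (word, font) in enumerate(tokens, start=1):
        fields[word.lower()] += f"{position}{font}"
    return dict(fields)


def decode_occurrences(field):
    """Turn a subpart d string such as '45a123f' into (position, font) pairs."""
    return [(int(num), font) for num, font in re.findall(r"(\d+)([a-z])", field)]


# "The dog is eating the sock", with the second "the" in italics (font type f).
tokens = [("The", "a"), ("dog", "a"), ("is", "a"),
          ("eating", "a"), ("the", "f"), ("sock", "a")]
fields = encode_occurrences(tokens)
decoded = decode_occurrences("45a123f145a178f232a")
```

Here `fields["the"]` corresponds to the entry for "the.txt" and `decoded` lists the five occurrences of "wrongs" described for www.2wrongs.com.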

The word "wrongs" would have a similar entry for each URL that is included within the

database that contains at least one instance of the word "wrongs." Thus, the database has a file

for each unique word; and the file for each unique word has an entry for each URL that includes

that unique word in the text of the web page (or domain name, title, etc.) identified by the URL.

It should be understood that the classification of information contained in the downloaded

web pages as described is a specific embodiment of the invention, and could be modified either

by including more or less information. For example, it may be desirable to limit the information to

the first 100 (for example) words of text in each web page, to conserve space. Or, only headings and subheadings could be indexed. More generally, indexing web pages for the purpose of

constructing a database for use with a search engine is a known procedure, and any procedure

currently used or that becomes known may be incorporated as an aspect of the present invention.

B. Categories

In addition to the word database, the present invention may also include a category

database that includes categories of URL's. These categories may be determined in a first step by

using each (or any desired subset) of categories that are included within each of the downloaded

web pages. For example, in the sample bookmark file of a typical Internet user named Yu Wu

attached hereto as the Appendix, the file includes the categories: Reference, Search Engine, and

News & TV. Another sample bookmark file might include the categories: Search Engine, Law,

and MP3. The invention would then include the categories: Reference, Search Engine, News &

TV, Law, and MP3. The category Search Engine would have the bookmark files of both Yu Wu (here

Excite; AOL NetFind; AltaVista Search: Main Page; Infoseek; and Yahoo!) and the other user,

since each user used that category. Each other category would have bookmark files of either Yu

Wu or the other user. Thus, the concept can easily be extended to include each of the categories

that are included in each downloaded bookmark file.
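The merging of categories from several bookmark files can be sketched as a union of folder names, each mapped to the set of URL's placed under it by any user. This is a minimal sketch: `build_category_db` is a hypothetical name, each bookmark file is assumed to have already been parsed into a dictionary of folder name to URL list, and the second user's URL's are made up for illustration.

```python
from collections import defaultdict


def build_category_db(bookmark_files):
    """Combine per-user folders into one category -> set-of-URLs mapping."""
    db = defaultdict(set)
    for folders in bookmark_files:
        for category, urls in folders.items():
            db[category].update(urls)
    return dict(db)


# A subset of Yu Wu's folders from the Appendix.
yu_wu = {
    "Reference": ["http://library.gmu.edu/lib/dbase/local.html"],
    "Search Engine": ["http://www.infoseek.com/", "http://www.yahoo.com/"],
    "News & TV": ["http://www.washingtonpost.com/"],
}
# A second, hypothetical user (URLs invented for this example).
other_user = {
    "Search Engine": ["http://www.yahoo.com/", "http://search.example/"],
    "Law": ["http://law.example/"],
    "MP3": ["http://mp3.example/"],
}
category_db = build_category_db([yu_wu, other_user])
```

Because both users have a Search Engine folder, that category ends up holding URL's from both files, while every other category holds URL's from only one of them.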

The categories can be further combined or otherwise manipulated by human or automatic

intervention in order to overcome some potential complications. In particular, the categories can

be manipulated to prevent multiple similar categories which might otherwise result because of

users' use of synonymous terms. For example, three different users might use three different

categories such as Bicycling, Biking, or Cycling to denote web pages having the same activity.

Or, categories could be translated from one language to another. The significant point is that the resulting category database uses the categorization of the downloaded bookmark files to create a

database of categories and associated URL's.
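One simple way to combine synonymous categories automatically is a lookup table that maps each variant to a canonical name. The sketch below assumes such a table has been prepared by hand; `SYNONYMS`, `merge_synonyms`, and the example URL's are hypothetical and introduced only for illustration.

```python
# Hand-made synonym table (an assumption of this sketch); keys are lowercase.
SYNONYMS = {
    "bicycling": "Bicycling",
    "biking": "Bicycling",
    "cycling": "Bicycling",
}


def merge_synonyms(category_db, synonyms):
    """Fold categories with synonymous names into one canonical category."""
    merged = {}
    for category, urls in category_db.items():
        canonical = synonyms.get(category.lower(), category)
        merged.setdefault(canonical, set()).update(urls)
    return merged


raw = {
    "Bicycling": {"http://bikes.example/"},
    "Biking": {"http://trails.example/"},
    "Cycling": {"http://bikes.example/", "http://tour.example/"},
    "Rock Climbing": {"http://climb.example/"},
}
merged = merge_synonyms(raw, SYNONYMS)
```

Translation between languages could be handled the same way, with the table mapping foreign-language folder names to a canonical name.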

Searching the Database

A. Word Database

A user can search the database described above to search and rank web pages. It is noted

at the outset that search engines are presently known to search databases to find and rank web

pages in response to a user input search criteria, and the present invention can use any search

method known or developed as well as the particular methods disclosed herein. In general, a

user enters one or more keywords that are intended to describe the information that the user

wants to find. The database is searched based on the keyword(s), and results are returned in

HTML pages. The results are generally ranked according to some ranking system. The results

generally include the URL's of the relevant web pages, and often include the first several sentences

of the web pages and/or the title of the web pages.

If one keyword is selected by the user, then the database may retrieve a desired number of

results, such as ten, and display them to the user. The results may be retrieved by the

precompiled ranking associated with each entry of the relevant word file. Referring to the entry

for the word file "wrongs" stored in the file wrongs.txt, the entry for the URL 2wrongs.com has a

precompiled ranking that is stored in the field denoted by reference character "a", and is an

arbitrary ranking to order the entries composing the word files. The user may select additional

results after viewing the initially retrieved results. Such a display is simple to implement and

requires relatively little computing power.
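Retrieval by precompiled ranking can be sketched as reading the entries of a word file, sorting on field a, and returning the results a page at a time. The sketch assumes that a higher value in field a means a better rank and that titles contain no "|" character; `parse_entry`, `top_results`, and the two entries other than the 2wrongs.com entry are hypothetical.

```python
def parse_entry(line):
    """Parse one pipe-delimited word-file entry (fields a to h)."""
    fields = line.strip().rstrip("|").split("|")
    rank, url, title, occurrences, words, kilobytes, day, bookmark_count = fields
    return {
        "rank": float(rank),                    # a. precompiled ranking
        "url": url,                             # b. URL
        "title": title,                         # c. title
        "occurrences": occurrences,             # d. occurrences and fonts
        "words": int(words),                    # e. number of words in page
        "kilobytes": int(kilobytes),            # f. size of page
        "day": int(day),                        # g. day retrieved
        "bookmark_count": int(bookmark_count),  # h. occurrences in bookmarks
    }


def top_results(lines, limit=10, offset=0):
    """Return entries ordered by precompiled rank, highest first (assumed)."""
    entries = [parse_entry(line) for line in lines if line.strip()]
    entries.sort(key=lambda entry: entry["rank"], reverse=True)
    return entries[offset:offset + limit]


# Contents of a hypothetical "wrongs.txt"; only the first entry is from the text.
wrongs_txt = [
    "45.23|www.2wrongs.com|2wrongs.com Homepage|45a123f145a178f232a|1050|4|75|43|",
    "12.50|www.example.org/a|Hypothetical page A|7a|300|2|80|5|",
    "30.00|www.example.net|Hypothetical page B|3a19f|800|3|78|11|",
]
first_page = top_results(wrongs_txt, limit=2)
next_page = top_results(wrongs_txt, limit=2, offset=2)
```

The `offset` argument models the user selecting additional results after viewing the initial ones.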

Alternatively, a ranking system can be used. A presently preferred ranking system is described below.

For single word queries:

RANK_i = T + Rf + U + D + S + RP

For multiple word queries:

RANK_{total} = \sum_{i=1}^{I} RANK_i * Rf_i * A

where, for words with common URL's

word_ab = the word number of the b-th occurrence of the a-th word from the query

I = number of words in query

X₁ = number of instances of word i in the URL

X₂ = number of instances of word q in the URL

else, if URL's are not common

A = 1

The definitions of the independent variables are as follows:

T = w₁ * (#t in title)

if PageSize > 3000

Rf = w₂ * (#t in Page) / (PageSize/3000)

if 2000 < PageSize < 3000

Rf = w₂ * (#t in Page) / (PageSize/2000)

if 1000 < PageSize < 2000

Rf = w₂ * (#t in Page) / (PageSize/1000)

if PageSize < 1000

Rf = w₂ * (#t in Page) / (PageSize/500)

U = w₃ * (#t in URL)

D = w₄ * (#t in Domain name)

S = w₅ * (# forward slashes in URL)

if x <= 10

RP = w₆ * P / (40 - ( 1.1 * (10 - x) + (1.2 / (10 - x) ) ) )

if x > 10

RP = P * w₆

P = relative number of times the URL occurs in all bookmarks

#t = number of times the word occurs

w₁-w₆ = variables to adjust for customized ranking (i.e., weighting indexes).

Briefly discussing the above, it can be seen that the ranking for a single word query

reviews each entry for the word file for the single word, and increases the ranking of an entry

depending upon: the number of times the word appears in the title of the web page; the relative

frequency that the word appears in the text of the web page; the number of times the word

appears in the URL; the number of times the word appears in the domain name; the number of

forward slashes in the URL (which gives the front page of a domain priority, rather than a page several directories deep); and the relative number of times the URL appears in the downloaded

bookmark files, which corresponds to a relatively more popular web page. The ranking system

for multiple queries uses generally the same parameters, and also ranks web pages more highly

depending upon the proximity of the queried words to each other in a given web page.
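The single word ranking can be sketched in Python as a sum of the six terms. Several points are assumptions of this sketch rather than statements of the text: Rf is read as the word count divided by a scaled page size; page sizes falling exactly on 1000, 2000, or 3000 are assigned to the lower band; the unit of PageSize is left to the caller; the popularity term RP is passed in already computed, because the text does not define the variable x on which it depends; and the weight values, including a negative w5 so that deeper pages rank lower, are chosen only for the example. The names `Weights`, `relative_frequency`, and `single_word_rank` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    """Weighting indexes w1 to w5 (w6 is folded into the supplied RP term)."""
    w1: float = 1.0
    w2: float = 1.0
    w3: float = 1.0
    w4: float = 1.0
    w5: float = -0.5  # negative so that more slashes lower the rank (assumed)


def relative_frequency(count_in_page, page_size, w2):
    """Rf: word count scaled by page size, using the four size bands."""
    if page_size > 3000:
        divisor = 3000
    elif page_size > 2000:
        divisor = 2000
    elif page_size > 1000:
        divisor = 1000
    else:
        divisor = 500
    return w2 * count_in_page / (page_size / divisor)


def single_word_rank(in_title, in_page, page_size, in_url, in_domain, url, rp, w):
    """RANK = T + Rf + U + D + S + RP for one word and one URL."""
    t = w.w1 * in_title
    rf = relative_frequency(in_page, page_size, w.w2)
    u = w.w3 * in_url
    d = w.w4 * in_domain
    s = w.w5 * url.count("/")  # forward slashes in the URL
    return t + rf + u + d + s + rp


w = Weights()
front = single_word_rank(1, 5, 1050, 1, 1, "www.2wrongs.com", rp=0.5, w=w)
deep = single_word_rank(1, 5, 1050, 1, 1, "www.2wrongs.com/about/team.html", rp=0.5, w=w)
```

With these example weights, the page two directories deep scores exactly one point lower than the front page, which illustrates the priority given to the front page of a domain.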

B. Categories

A user can also search the database to retrieve URL's (and any associated information) by

searching the category database described above. The search can retrieve all URL's associated

with a user query that is associated with a category that is stored in the database. The search

results may be displayed arbitrarily, or may be ranked by the number of times a URL is included

in the downloaded bookmark files. The category database may be searched independently from

the word database. Or, a search may be performed in both the word database and the category

database, with the results being determined by combining the results of each database search in

any desired way. By way of illustration, the results could be ranked as for the word database,

with additional weighting given to URL's that are also found in the categories

database. It will be apparent that any ranking system including the results of searching both the

word and the category database may be used.
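Both search modes described above can be sketched briefly. The first function ranks the URL's of a category by how often each appears in the downloaded bookmark files; the second boosts word database scores for URL's that also appear in the category results. The multiplicative `bonus`, the function names, and the sample data are assumptions made for illustration, since the text leaves the combination method open.

```python
def category_search(category_db, bookmark_counts, query):
    """Return URLs filed under the queried category, most bookmarked first."""
    urls = category_db.get(query, ())
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(urls, key=lambda url: (-bookmark_counts.get(url, 0), url))


def combined_search(word_scores, category_urls, bonus=1.5):
    """Rank word-database hits, boosting URLs also found by category."""
    boosted = {
        url: score * (bonus if url in category_urls else 1.0)
        for url, score in word_scores.items()
    }
    return sorted(boosted, key=lambda url: (-boosted[url], url))


# Hypothetical data: category contents, bookmark counts, and word scores.
category_db = {"Rock Climbing": {"climb.example", "crag.example"}}
bookmark_counts = {"climb.example": 3, "crag.example": 12}
by_category = category_search(category_db, bookmark_counts, "Rock Climbing")

word_scores = {"a.example": 2.0, "crag.example": 1.5, "c.example": 1.0}
combined = combined_search(word_scores, set(by_category))
```

In this example "crag.example" overtakes "a.example" in the combined ranking because its boosted score (2.25) exceeds 2.0.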

Schematic Representation

With reference to the FIGURE, a schematic representation of an embodiment according

to the present invention is described. It should be understood that the schematic representation is

provided solely for the purpose of explaining an embodiment of the invention, and that the

invention is not limited to any particular architecture.

The Internet, denoted I, includes web pages, denoted W_1 to W_n (representing all of the web pages on the WWW). The automated browser, denoted with reference numeral 10, crawls

the Internet I and examines the pages W_1 to W_n in order to find which pages are bookmark files,

denoted with reference numeral 20. These bookmark files are downloaded and stored locally.

The URL's contained within the bookmark files 20 are used to download the web pages

associated with those URL's, identified with reference numeral 30.

The bookmark files 20 are parsed to create a categories database, denoted with reference

numeral 40, as described above. The categories database 40 associates user defined categories

with the URL's stored in the user defined categories, subject to human or automatic manipulation

to combine synonyms or to perform other actions. The downloaded web pages 30 are parsed to

create a word database, denoted with reference numeral 50, as described above. The word

database 50 contains a file for each unique word, each file having an entry for each web page

containing at least one instance of the word.

A user U interacts with a search engine, denoted with reference numeral 60, to

search either one or both of the databases 40 and 50. The user U queries the search engine 60

using one or more search terms, and results from the databases 40 and 50 are returned to the user.

While the Internet I, user U, and components 10 - 60 are shown as being connected in a certain

configuration, it should be understood that that is simply one configuration and that any other

configuration could be used as an alternative embodiment of the invention.

Conclusion

It should be understood that a representative embodiment of the invention is disclosed,

and that the scope of the invention should not be unduly limited to the disclosed embodiment.

For example, certain operations are described as being performed locally, after downloading information. It will be understood by those skilled in the art that the method disclosed herein can

be performed by software and hardware at any location. It also will be understood that a number

of useful features are disclosed herein, and that not every feature need be incorporated into a

useful product in order to fall within the scope of the present invention. For example, both the

category and word databases have been described. However, a useful product could include only

one or the other database. Additional modifications will be obvious to one skilled in the art.

Appendix

The following is a typical user bookmark file, shown first as it would be displayed upon a

user's screen and then with the HTML tags shown.

As displayed by Web Browsers

Yu Wu's Bookmarks

Reference

Free On-line Dictionary of Computing from FOLDOC

GMU Patron databases

Eric's Treasure Trove of Mathematics

Hypertext Webster Interface

XLibris on the Web

Search Engine

Excite

AOL NetFind

AltaVista Search: Main Page

Infoseek

Yahoo!

News & TV ...

CNN Interactive

Yahoo! - Reuters Hourly News Summary

Welcome to WashingtonPost.com

Plain Text

<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Yu Wu's Bookmarks</TITLE>
<H1>Yu Wu's Bookmarks</H1>
<DL><p>
<DT><H3 FOLDED ADD_DATE="869594090">Reference</H3>
<DL><p>
<DT><A HREF="http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?Free+On-line+Dictionary" ADD_DATE="869603871" LAST_VISIT="870100656" LAST_MODIFIED="869603866">Free On-line Dictionary of Computing from FOLDOC</A>
<DT><A HREF="http://library.gmu.edu/lib/dbase/local.html" ADD_DATE="869595248" LAST_VISIT="870099395" LAST_MODIFIED="869595207">GMU Patron databases</A>
<DT><A HREF="http://www.astro.virginia.edu/~eww6n/math/ghindex.html" ADD_DATE="869436954" LAST_VISIT="869437657" LAST_MODIFIED="869437657">Eric's Treasure Trove of Mathematics</A>
<DT><A HREF="http://work.ucsd.edu:5141/cgi-bin/http_webster" ADD_DATE="869436954" LAST_VISIT="870101208" LAST_MODIFIED="869594227">Hypertext Webster Interface</A>
<DT><A HREF="http://bluehaze.wrlc.org/webpac-bin/wgbroker?new+-access+top" ADD_DATE="850331540" LAST_VISIT="855072896" LAST_MODIFIED="850331532">XLibris on the Web</A>
</DL><p>
<DT><H3 FOLDED ADD_DATE="869593680">Search Engine</H3>
<DL><p>
<DT><A HREF="http://www.excite.com" ADD_DATE="870302095" LAST_VISIT="870302084" LAST_MODIFIED="870302084">Excite</A>
<DT><A HREF="http://nearnet.gnn.com/search/" ADD_DATE="869611014" LAST_VISIT="869611002" LAST_MODIFIED="869611002">AOL NetFind</A>
<DT><A HREF="http://www.altavista.digital.com/" ADD_DATE="869605211" LAST_VISIT="869605190" LAST_MODIFIED="869605190">AltaVista Search: Main Page</A>
<DT><A HREF="http://www.infoseek.com/" ADD_DATE="869604884" LAST_VISIT="870720350" LAST_MODIFIED="869604879">Infoseek</A>
<DT><A HREF="http://www.yahoo.com/" ADD_DATE="858107417" LAST_VISIT="870720901" LAST_MODIFIED="858107414">Yahoo!</A>
</DL><p>
<DT><H3 FOLDED ADD_DATE="869594585">News & TV ...</H3>
<DL><p>
<DT><A HREF="http://www.yahoo.com/headlines/current/news/summary.html" ADD_DATE="869611179" LAST_VISIT="869611146" LAST_MODIFIED="869611146">Yahoo! - Reuters Hourly News Summary</A>
<DT><A HREF="http://www.washingtonpost.com/" ADD_DATE="862020505" LAST_VISIT="866206162" LAST_MODIFIED="862020495">Welcome to WashingtonPost.com</A>
</DL><p>
</DL><p>
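A bookmark file in this format can be parsed into categories and URL's with Python's standard `html.parser` module. The following is a minimal sketch, not a complete parser: the class name `BookmarkParser` is hypothetical, nested folders are not tracked (each link is simply filed under the most recent H3 heading), and the sample is a shortened excerpt of the file above.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect folder name -> list of URLs from a bookmark file (sketch)."""

    def __init__(self):
        super().__init__()
        self.categories = {}   # folder name -> list of URLs
        self._current = None   # folder currently being filled
        self._in_h3 = False
        self._h3_text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "h3":
            self._in_h3 = True
            self._h3_text = []
        elif tag == "a" and self._current is not None:
            href = dict(attrs).get("href")
            if href:
                self.categories[self._current].append(href)

    def handle_data(self, data):
        if self._in_h3:
            self._h3_text.append(data)

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
            self._current = "".join(self._h3_text).strip()
            self.categories.setdefault(self._current, [])


SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Yu Wu's Bookmarks</TITLE>
<H1>Yu Wu's Bookmarks</H1>
<DL><p>
<DT><H3 FOLDED ADD_DATE="869593680">Search Engine</H3>
<DL><p>
<DT><A HREF="http://www.infoseek.com/" ADD_DATE="869604884">Infoseek</A>
<DT><A HREF="http://www.yahoo.com/" ADD_DATE="858107417">Yahoo!</A>
</DL><p>
</DL><p>
"""

parser = BookmarkParser()
parser.feed(SAMPLE)
parser.close()
categories = parser.categories
```

The resulting dictionary has the shape needed to populate the category database, and the collected URL's are the pages the automated browser would download next.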

Claims

THE INVENTION CLAIMED IS:

1. A method of searching a computer network comprising a number of pages, the

method comprising the steps of:

(a) searching the network for pages that are bookmark files, each of the bookmark files having a

number of bookmarked URL's that identify corresponding bookmarked web pages; and

(b) creating a database based upon the contents of the bookmarked web pages.

2. The method of claim 1, further comprising the step of: searching the database in

response to a user query.

3. The method of claim 2, wherein step (a) uses an automated browser.

4. The method of claim 3, wherein step (a) identifies bookmark files by comparing

the searched pages with a known bookmark file format.

5. The method of claim 3, wherein step (a) identifies bookmark files by searching

each page having a server for a list of hyperlinks to URL's found outside the server of the

searched page.

6. The method of claim 3, wherein step (b) downloads the contents of each page that

is a bookmark file.

7. The method of claim 6, wherein the database includes a list of files corresponding

to words that are contained within the downloaded pages of step (b).

8. The method of claim 7, wherein each file corresponding to a word has an entry

corresponding to each downloaded web page that contains the word.

9. The method of claim 8, wherein each entry corresponding to each web page

includes the URL of the web page.

10. The method of claim 9, wherein each entry corresponding to each web page includes at least one of a group of parameters related to that page selected from the group

consisting of an HTML title of the page, a length of the page in number of words and bytes, a

word number of each instance of the word; a date the page was retrieved, and a frequency that

the URL was found within the downloaded pages.

11. The method of claim 10, wherein each entry corresponding to each web page

includes more than one of the parameters.

12. The method of claim 11, wherein the searching step (c) retrieves web page URL's

and ranks the retrieved URL's based upon the parameters.

13. The method of claim 12, wherein the user query is a single word query and the

ranks of the retrieved URL's are based upon: the number of times the word appears in a title of

the web page; the relative frequency that the word appears in a text of the web page; the

number of times the word appears in a URL; the number of times the word appears in a domain

name; the number of forward slashes in the URL; and the relative number of times the URL

appears in the downloaded bookmark files.

14. The method of claim 12, wherein the user query is a multiple word query

consisting of a group of words, and the ranks of the retrieved URL's are based upon: the number

of times that each of the group of words appears in a title of the web page; the relative frequency

that each of the group of words appears in a text of the web page; the number of times that

each of the group of words appears in a URL; the number of times that each of the group of

words appears in a domain name; the number of forward slashes in the URL's; and the relative

number of times the URL's appear in the downloaded bookmark files and a proximity of the group

of queried words to each other in a given web page.

15. The method of claim 3, wherein at least some of the bookmark files include categories, and further comprising the step of creating a database based upon the categories.

16. The method of claim 14, wherein step (c) includes searching the database based

upon the categories.

17. The method of claim 16, wherein step (c) includes ranking web pages based upon

the searching of both databases.

18. The method of claim 1, wherein step (a) searches the world-wide-web.

19. A method of searching a computer network comprising a number of pages, the

method comprising the steps of:

(a) searching the network for pages that are bookmark files, each of the bookmark

files having a number of bookmarked URL's that identify corresponding bookmarked web pages,

at least some of the bookmark files having categories categorizing at least some of the URL's;

and (b) creating a database based upon the contents of the bookmarked web pages including the

categories.

20. The method of claim 19, further comprising the step of: (c) searching the

database in response to a user query.

21. A system for searching a computer network comprising a number of pages, the

system comprising:

(a) an automated program for fetching bookmark files, the bookmark files

referencing web pages, the program crawling the web and the pages referenced by the bookmark

files;

(b) a database including information based upon the contents of the bookmarked web

pages; and

(c) means for searching the database in response to a user query.