
Method of propagating data through a distributed information network

Info

Publication number
CA2170564A1
Authority
CA
Canada
Prior art keywords
server
servers
list
data
documents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002170564A
Other languages
French (fr)
Inventor
Frank Michael Kappe
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CA002170564A priority Critical patent/CA2170564A1/en
Priority to US08/608,302 priority patent/US5805824A/en
Publication of CA2170564A1 publication Critical patent/CA2170564A1/en
Abandoned legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24564Applying rules; Deductive queries
    • G06F16/24565Triggers; Constraints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity

Abstract

A method of propagating data through a distributed information system is disclosed wherein, in a computer implemented distributed information system, a method of maintaining referential integrity of a plurality of links and documents by propagating updates from a server to a plurality of servers comprises the steps of: i) the server maintaining an ordered list of the plurality of servers in the distributed information system; ii) the server maintaining a link database containing the plurality of links for locating remote documents stored remotely which are referenced by documents stored locally at the server; iii) the server maintaining an update list including messages reflecting changes to local documents and links and remote documents and links; iv) selecting a priority value (p) with which to transmit the update list wherein the priority value is a real number greater than or equal to 1; v) on a predetermined clock cycle, the server transmitting the update list according to the priority value wherein the server transmits the update list to a receiving server located adjacent to it on the ordered list, to an integer portion of p-1 other receiving servers selected at random from the ordered list, and to another receiving server selected at random from the ordered list with a probability equal to a fractional portion of p; vi) the receiving servers updating their link databases and the locally stored documents with messages from the update list and appending the receiving servers' respective lists of object data with the received list of object data; and vii) repeating steps v) through vii).

Description

A METHOD OF PROPAGATING DATA THROUGH A
DISTRIBUTED INFORMATION SYSTEM
BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
The present invention relates to distributed information systems.
More specifically, the present invention relates to a method for propagating data objects such as documents and referential link updates through a distributed information system.
DESCRIPTION OF THE PRIOR ART
As would be apparent to those familiar with the Internet, occasionally when a user activates a link, as in the case of the World-Wide Web (WWW), or a menu item, as in the case of Gopher, the resource the link or menu item referenced cannot be retrieved. In some instances, this situation is the result of a temporary problem with the system or the server on which the resource resides. For the purposes of this discussion, system will generally mean computer networks and/or a plurality of servers interconnected for the purposes of transmitting information. Furthermore, for the purposes of this discussion, servers will generally mean file servers, news servers and the like as would be understood by those of skill in the art. A server is further understood to mean a central processor or computer responsible for maintaining and distributing information such as files, documents and/or references and the like to terminals or computers in communication with the server. However, it is often the case that this situation indicates that the resource has been permanently removed. Since the systems mentioned above rely on Uniform Resource Locators (URLs) for accessing information, this situation may also indicate that the resource has been moved to a new location. It is further possible that a resource is eventually replaced by a different one under the same name, at the same location.


Consequently, a significant percentage of references are invalid. It is reasonable to assume that this percentage will rise as time goes by due to the following factors: an increase in the amount of documents which become outdated and are eventually removed; server services becoming discontinued; server services being moved to different server addresses (URLs); or URLs being re-used and new resources being given identical link or menu names. As will be apparent to those of skill in the art, it would be desirable to have some support system for automatically removing such "dangling" references to a resource which is deleted, or at least to inform the maintainers of those resources.
For the purpose of clarity throughout the following discussion, conventional hypertext terminology will be used with respect to documents and links, as opposed to the more general object terms of resources and references respectively. However, as would be understood by those of skill in the art, the method of the present invention will work for any relationship between any objects.

An important issue that needs to be considered when dealing with distributed information systems and/or protocols is that of scalability.
Preferably, the behaviour of a scalable system should not, in most cases, depend on variables such as: the number of servers; documents; links; or concurrent users of the system. In a distributed information system such as the Internet environment, scalability is a significant aspect of system design when considering the rate at which the Internet is currently expanding. Scalability can be further distinguished by four related characteristics, namely performance, traffic, robustness and management.

The performance, which is typically measured by the response time perceived by the user, should not depend on the number of concurrent users or documents. However, it is difficult to meet this requirement in a centralized system. Therefore the use of a distributed system, where users and documents are more or less evenly distributed over a number of servers which are connected by the system, is preferred.

Unfortunately, situations can occur where a large number of users access a small set of documents residing on a single server. Under such circumstances the performance of the distributed information system is similar to the centralized system, in which the majority of the load is placed on a single computer at a certain location in the system. As will be apparent to those of skill in the art, it is preferable to avoid such situations. One solution to this situation is through the use of replication, in which copies of the document are placed on a plurality of servers. For example, a well known scalable system which relies heavily on replication is the USENET news service. When the user reads news, they are connected to a local news server which maintains copies of the recently posted news articles. As a result, when a replicated article is accessed, it does not need to be retrieved from the originating site. Therefore, the response time does not depend on how many other Internet users access the same article at the same time. However, the response time does depend on the number of users connected to the local news server.

When searching through a plurality of documents, the response time will increase with the number of documents searched. Good search engines use data structures giving O(log n) (where O is understood to mean "on the order of") access performance, whereby there exists a constant c such that the time t taken to search n documents is smaller than c·log n. Intuitively, this implies that for large n, a further increase of n will have less effect on t, which is a preferable situation. Therefore it is generally accepted that logarithmic performance qualifies as being scalable in performance.
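(The following is an illustrative sketch, not part of the patent disclosure. It demonstrates the O(log n) behaviour described above using Python's standard bisect module over a hypothetical sorted document index; the index contents are invented for the example.)

import bisect

# A sorted index gives O(log n) lookups: doubling the number of
# documents n adds only about one comparison to each search.
index = sorted("doc-%05d" % i for i in range(100000))

def lookup(name):
    # Binary search over the sorted index; roughly log2(n) comparisons.
    pos = bisect.bisect_left(index, name)
    return pos < len(index) and index[pos] == name

print(lookup("doc-04217"))   # True; about 17 comparisons for n = 100,000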

Replication typically requires that additional traffic be sent via the distributed information system. As is apparent from the foregoing discussion, every news article has to be sent to every news server so that it can be accessed locally. However, it may well be that a significant number of articles sent to the user's local news server are never read by any users from that news server. Care should be taken to ensure that total traffic increases not more than linearly with the number of available servers. For example, a solution wherein every server periodically sends server status information directly to all other servers is not scalable, since it requires O(N²) messages to be sent, where N is the number of servers.

For the purposes of this discussion, "robustness" means that the distributed information system should not rely on a single server or a single system connection to work at all times, nor should it assume that all servers of a given set are available at a given time. Prior art services such as Multi-Server Transaction, Master-Slave and Distributed Update Control systems, as will be described in greater detail below, are all examples that do not scale in this respect.

Preferably, the operation of the system should not rely on a single management entity. For example, the Internet's Domain Name Service is effective because its management is distributed. When one considers that the current Internet growth rate is approximately 3 million hosts per year, or approximately 10,000 hosts per work day, centralized registration is not feasible. The scalable management requirement also suggests that configuration and reconfiguration of server-to-server communication paths should be automatic, as opposed to being managed by a single central service.

In the WWW data scheme documents are connected by links. The links are stored directly inside the documents, which offers the advantage of a relatively simple server implementation. However, the absence of a separate link database not only limits the set of linkable document types and prohibits advanced user interfaces such as overview maps, three-dimensional navigation, and the like, it also makes it difficult to ensure referential integrity of the WWW. For example, removing a document requires parsing of all other documents to find all links pointing to that document. This is required to ensure that all the links are removed, or at least that the owners of the other documents are informed. While such a tool would be conceivable for a local server, it is simply impossible to scan all WWW documents on all servers in the world without the aid of pre-indexed link databases. Consequently there is no referential integrity in today's WWW, even between documents stored on the same server.

Interestingly, the more primitive, older technology Gopher system does maintain referential integrity in the local server case. When a document which is an ordinary file on the server's file system is deleted, moved or modified, the menu item that refers to it, which is a directory entry, is updated as well. In the Gopher system, this referential integrity is automatically taken care of by the underlying operating system. However, references to remote servers remain unsecured.
While both Gopher and WWW scale adequately with respect to the number of servers, documents, and links, there is a scalability problem with respect to the number of users. For example, when a large number of users access the same document at the same time, the affected server and the system region around it become overloaded. This phenomenon, known as a "flash crowd", was observed during the 1994 Winter Olympics in Lillehammer, when the Norwegian Oslonett provided the latest results and event photographs over the Web and was overloaded with information requests. Similar but smaller flash crowds often appear when a new service is announced on the National Center for Supercomputing Applications (NCSA) "What's New" page or in relevant newsgroups.

One strategy for alleviating this problem is to use cache servers, which keep local copies of documents which have been recently requested.
Repeated requests for the document result in users receiving the local copy instead of the original document at the originating site. However, this strategy does not work in two cases: when users access many different documents from a large data set such as an encyclopedia or a reference database; and when the documents are updated frequently.

When users access many different documents from a large data set, replication of the whole dataset is helpful. However, this would generally require: (a) moving from URLs to Uniform Resource Names (URNs), which identify the document by its name or ID rather than its location; and (b) when the documents are updated frequently, some update protocol which ensures that caches are updated so that the latest version of the document is delivered.

In the Hyper-G system as implemented by the present inventor, a database engine is employed to maintain meta-information about documents as well as their relationships to each other, including, but not restricted to, links. Referential integrity can easily be maintained for local documents as the links are stored in a link database and not in the documents themselves.
Modifications of documents or their relationships are only possible via the Hyper-G server.

One advantage of the link database is that links are bidirectional, such that users can find the document from the destination and vice versa. In order to maintain bidirectionality, when a link spans more than one physical server boundary, both servers store the link information as well as replicas of the linked document's meta-information. This implies that all updates related to the documents and the link in question have to be performed on both servers in order to maintain referential integrity, therefore requiring a form of update protocol between servers.

A prior art attempt at a solution for this update problem is what is typically known as the multi-server transaction method or collaborating servers method. This method has been implemented in the Xerox Distributed File System and an early version of the Hyper-G system. When a document on one server has to be modified or deleted, the server storing the document acts as a coordinator by contacting and updating all other recipient servers which store replicas of the document. Once all the recipient servers have acknowledged the receipt of the update, the coordinator then instructs them to make the change permanent, thereby completing the transaction.
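(An illustrative sketch, not part of the patent disclosure: the coordinator-driven multi-server transaction described above, in Python. The server objects and their send_prepare, send_commit and send_abort methods are hypothetical stand-ins for the actual protocol messages.)

def multi_server_transaction(replica_servers, update):
    # The server storing the document acts as coordinator: every
    # recipient server must acknowledge before the change is permanent.
    acknowledged = []
    for server in replica_servers:
        if not server.send_prepare(update):   # fails if the server is off-line
            # One unreachable server aborts the whole transaction,
            # which is the scalability problem discussed below.
            for s in acknowledged:
                s.send_abort(update)
            return False
        acknowledged.append(server)
    for s in acknowledged:
        s.send_commit(update)                 # make the change permanent
    return True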

However, in some situations the multi-server transaction method has scalability problems. For example, when a large number of servers reference a specific document by pointing a link to it or by replicating it, all of the servers must be informed and must acknowledge the update before it can be performed. This is a significant disadvantage in that this method considerably increases system traffic. Further, this method also requires that all servers involved be on-line or the transaction cannot be completed. As the number of participating servers increases, the probability that all of them are on-line at any given moment approaches zero. Therefore, by employing this method for maintaining referential integrity, it becomes practically impossible to modify a heavily-referenced document.

Another prior art attempt at maintaining referential integrity is what is known as a Master/Slave System, comprising one primary server (the master) and a plurality of secondary servers (the slaves). The master server holds a master copy of the replicated document and services all update requests. The plurality of slave servers are updated by receiving notification of changes from the master server or by downloading copies of the master copy. Users may read the document from both the master and slave servers, but write only to the master copy of the document.

This scheme is well-suited to applications where documents are read frequently and updates happen only infrequently. For example, the Sun Yellow Pages (YP) service, now known as the Network Information Service (NIS), is a master/slave system.

The master server simplifies the resolution of conflicts between update requests and maintains referential integrity. However, one disadvantage is that the master server has to be up and on-line in order to perform updates.

Yet another prior art attempt at maintaining referential integrity is what is known as the Distributed Update Control method. This method allows any server holding a copy of a document to perform updates on it, without the use of a single coordinating server, even when some servers are unreachable, and without the possibility of conflicts.

One requirement of this method is that any one server is knowledgeable of all the other servers, known as a server-set, maintaining copies of the document. Ideally, all the document copies should be identical, but due to system failures and performance problems, it may not be possible or desirable to immediately notify all servers of an update. Instead, a weak consistency is adopted in which all copies eventually converge to the same value at some time interval upon completion of the updates.

However, a basic requirement of the method is to ensure that all read requests are based on up-to-date copies and all updates are performed on the latest version. This is accomplished by a majority consensus in which updates are written to some majority (greater than 50%) of the servers, selected at random, in the server-set. Prior to each read or write operation, the server that is in charge of performing the request polls the servers and requests the document's version number or last modification time to identify the current version. When the majority has answered, it is assumed that at least one server holds the most up-to-date version. This is based on the principle that the two majorities of successive polls have at least one common member.
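(An illustrative sketch, not part of the patent disclosure: the majority-consensus poll described above, in Python. The version() and fetch() methods are hypothetical. The correctness rests on the fact that any two majorities of the same server-set share at least one member, so some polled server always holds the latest version.)

import random

def majority_read(server_set, doc_id):
    quorum = len(server_set) // 2 + 1           # e.g. 501 of 1000 servers
    polled = random.sample(server_set, quorum)  # a random majority
    # At least one polled server holds the most up-to-date copy.
    newest = max(polled, key=lambda s: s.version(doc_id))
    return newest.fetch(doc_id)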

The advantage of the Distributed Update Control method is its robustness, as there is no single point of failure even if approximately 50% of the server-set fails. However, the primary disadvantage of this method is scalability, because the server-set for any document must be known to all servers in the set. Furthermore, greater than 50% of the servers in the set have to be contacted before every write or read operation. For example, if the server-set contains 1000 servers, a response from 501 servers is required. This requirement may be relaxed for read operations if the weak consistency approach is acceptable. However, it is mandatory for write operations to ensure that no conflicting updates can occur.
Harvest, as taught by Danzig et al. in "Replicating Services in Autonomously Managed Wide-Area Internetworks", 1994, is a relatively new Internet-based resource discovery system which supports a distributed "information gathering" architecture. "Gatherers" collect indexing information from a resource, while "Brokers" provide an indexed query interface to the gathered information. Brokers retrieve information from one or more Gatherers or other Brokers, and incrementally update their indexes. The idea is that Gatherers should be located close to the resources they index, while Brokers are located close to the users.

Harvest relies heavily on replication to achieve acceptable performance. The indexes created by the Gatherers are periodically replicated to the Brokers, and as the indexes tend to be large, this has to be done efficiently. Harvest uses a technique called flooding for this purpose. Rather than having a Gatherer send its indexes to all Brokers, they are sent to only k of them, for example k = 2. It is then the responsibility of the k chosen Brokers to distribute the indexes to another k each, and so on. While the total number of indexes that have to be transferred remains the same, flooding provides the advantage of distributing the system and server load over the whole system.

The particular flood method employed by Harvest is called flood-d or flood-daemon. Flood-d attempts to minimize the system cost and propagation time of the flood by computing a "cheap", k-connected logical update topology based on bandwidth measurements of the underlying physical system. An important requirement is that this topology should not need manual configuration, but shall be computed and updated automatically. However, determining a good approximation of the optimal topology is computationally expensive, especially when a replication group becomes very large. A replication group is a plurality of servers which replicate the same data. Danzig et al. therefore suggest the use of a hierarchical scheme of smaller replication groups. However, Danzig et al. do not suggest a means for determining and updating this hierarchy automatically.

The method of the present invention provides a solution to the above-identified problems by operating on the principle that a link database is maintained at every server. The function of the link database is to track all the links pointing to its associated server, i.e. links emanating from and/or pointing to a document(s) residing on the server. By employing this approach, the present invention offers several advantages. In particular, storing the links outside of the documents in a link database provides an efficient solution for the dangling link problem, as will be described in greater detail below. Further, a link database also enables more advanced user interfaces for navigation in the information space, such as local maps and location feedback.

Furthermore, each of the plurality of servers maintains an ordered list of all servers in the distributed information system arranged in a predefined orientation. An update list is also maintained at each of the plurality of servers which comprises a list of messages which reflect changes to the documents and/or links.

SUMMARY OF THE INVENTION
It is an object of the present invention to provide a novel method of propagating data through a distributed information system which obviates or mitigates at least one of the disadvantages of the prior art systems.

According to one aspect of the present invention, there is provided a method of propagating data through a plurality of servers, comprising the steps of: i) each of the servers individually maintaining an ordered list of the servers in the distributed information system; ii) each of the servers individually maintaining a database containing address data for locating remote data stored remotely which is referenced by local data stored locally at each of the servers; iii) each of the servers individually maintaining a list of data reflecting changes to the local data and the remote data of other servers; iv) selecting a priority value (p) with which to transmit the lists of data wherein the priority is greater than or equal to 1; v) on a predetermined clock cycle, the servers each transmitting their respective lists of data to at least one server selected from the ordered list according to the priority value and deleting their lists of data; vi) each of the at least one servers receiving and processing the lists, and appending the list to each of the at least one servers' list of local surface data; and vii) repeating steps v) through vii).

Preferably, the method of the present invention includes, wherein in step i), the ordered list is arranged by an approximate geographical orientation of the plurality of servers.

Also preferably, the approximate geographical orientation is circular.

Also preferably, the priority value is a real number in the range from about 1 to about 3.

Also preferably, when p = 1, each of the servers transmits its respective list to a server located adjacent to it on the ordered list.

Also preferably, when p is an integer number greater than one, each of the servers transmits its respective list to the adjacent server and to p-1 other servers selected at random from the ordered list.

Also preferably, when p is a real number greater than one, each of the servers transmits its respective list to the adjacent server, to the integer portion of p-1 other servers selected at random from the ordered list, and to one other server selected at random with a probability equal to the decimal portion of p.

According to another aspect of the present invention there is provided a method of propagating object data from a server to a plurality of servers, comprising the steps of: i) the server maintaining an ordered list of the plurality of servers in the distributed information system; ii) the server maintaining a database containing address data for locating remote objects stored at some of the plurality of file servers which are referenced by local objects stored at the server; iii) the server individually maintaining a list of object data including changes to the address data, local objects and remote objects; iv) selecting a priority value with which to transmit the list of object data wherein the priority is greater than or equal to 1; v) on a predetermined clock cycle, the server transmitting the list of object data to at least one server selected from the ordered list according to the priority value; vi) the server deleting the list of object data once acknowledgement of the transmission is received from the at least one server; vii) the at least one server receiving the list of object data and updating the database and the local objects with the object data, and appending the at least one server's list of object data with the received list of object data; and viii) repeating steps v) through viii).

According to another aspect of the present invention there is provided a method of maintaining referential integrity of a plurality of links and documents by propagating updates from a server to a plurality of servers, comprising the steps of: i) the server maintaining an ordered list of the plurality of servers in the distributed information system; ii) the server maintaining a link database containing the plurality of links for locating remote documents stored remotely which are referenced by documents stored locally at the server; iii) the server maintaining an update list including messages reflecting changes to local documents and links and remote documents and links; iv) selecting a priority value (p) with which to transmit the update list wherein the priority value is a real number greater than or equal to 1; v) on a predetermined clock cycle, the server transmitting the update list according to the priority value wherein the server transmits the update list to a receiving server located adjacent to it on the ordered list, to an integer portion of p-1 other receiving servers selected at random from the ordered list, and to another receiving server selected at random from the ordered list with a probability equal to a decimal portion of p; vi) the receiving servers updating their link databases and the locally stored documents with messages from the update list and appending the receiving servers' respective lists of object data with the received list of object data; and vii) repeating steps v) through vii).

BRIEF DESCRIPTION OF THE DRAWINGS
A presently preferred embodiment of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 shows a link map of a system which exemplifies the relationship of a plurality of servers, server boundaries, and links between a plurality of documents;

Figure 2 shows a portion of the method of the present invention for flooding servers with data such as update lists in accordance with an embodiment of the present invention;

Figure 3a shows a plot which measures propagation delay by plotting percentage of servers versus cycles, for different values of p, for an ideal condition simulation according to the method of the present invention;

Figure 3b shows a plot which measures the effect of average update list size on cycle time by plotting number of messages versus cycles for different values of p according to the simulation of Figure 3a;

Figure 3c shows a plot which measures system traffic by plotting the number of messages versus cycles for different values of p according to the simulation of Figure 3a;

Figure 4 shows a plot of cycles versus number of servers for updating 50% and 99% of the servers on the system according to the simulation of Figure 3a;

Figure 5 shows a logarithmic plot of propagation delay versus priority p for updating 50% and 99% of the servers according to the simulation of Figure 3a;
Figure 6a shows a plot exemplifying the effect on propagation delay by plotting cycles versus soft error rate under a "real world" condition simulation according to the method of the present invention;

Figure 6b shows a plot exemplifying the effect on system traffic by plotting % optimum versus soft error rate according to the simulation conditions of Figure 6a;

Figure 7a shows a plot of time cycles versus MTTR cycles illustrating the effect on propagation delay for updating 50% and 99% of the servers in accordance with the simulation of Figure 6a;

Figure 7b shows a plot of % optimum versus MTTR cycles illustrating the effect on system traffic for updating 50% and 99% of the servers in accordance with the simulation of Figure 6a;

Figure 8a shows a plot of % of update lists processed versus time in cycles illustrating the effect on the number of updates processed according to a second "real world" simulation condition for p = 1.5, N = 1000, m = 1000;

Figure 8b shows a plot of number of messages in an update list versus time in cycles illustrating the effect on average list size according to the simulation of Figure 8a; and

Figure 8c shows a plot of number of messages in an update list (in thousands) versus time in cycles illustrating the effect on system traffic for updates sent and updates sent and acknowledged according to the simulation of Figure 8a.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A method of propagating data in a computer implemented distributed information system, in accordance with an embodiment of the present invention, is described in detail below. While the method of the present invention may be used to propagate various types of data, data will hereinafter be referred to as updates, update messages or update lists. However, prior to a discussion of the method of the present invention, a brief description of surface topology as relating to servers and documents is required. Figure 1 shows a link map of a portion of a system, indicated generally at 10, which exemplifies the relationship of a plurality of servers, server boundaries, and links between a plurality of documents.

For purposes of illustration, three servers 14, 18 and 22 are shown in Figure 1, their boundaries indicated by the dashed lines. However, as would be apparent to those of skill in the art, the number of servers on a distributed system such as the Internet is virtually unlimited. The plurality of links which span server boundaries are indicated at 26a through 26g, hereinafter referred to as surface links. Documents having links which span server boundaries are called surface documents and are indicated for example at 30, 34 and 38, shown as solid dots. A plurality of core documents 42, 46 and 50 and their respective core links 54, 58 and 62 reside within and maintain links locally within servers 14, 18 and 22 respectively. Although not apparent from Figure 1, a server's surface will typically be small compared to its core.

In general, a link is an address or path description from one document to another. In some instances a document may have a link attached which addresses or references an identical copy or replication of the same document stored at another location (another server). In other instances, a link may reference another document stored locally or remotely and therefore require an address path between documents for the purposes of establishing a connection when either document is retrieved by a user. For the purposes of this discussion, addresses or address paths will be hereinafter referred to as surface links, core links or simply links when referring to both, as previously described.

In order to maintain the property of bidirectional links, surface link information of surface links 26a through 26g is stored in a link database (not shown) attached to each of servers 14, 18 and 22. The link database stores all surface link information for surface documents stored locally at the server. For example, the link database at server 18 would store surface link information for surface links 26a through 26d as associated with surface documents 34, 35 and 36. For increased performance, servers 14, 18 and 22 also keep replicas of the other surface documents' meta-information. For example, server 14 stores surface document 30 plus a replica of surface document 34's meta-information and the surface link 26b between them. Server 18 stores surface document 34 plus replicas of surface documents 30 and 38 meta-information and the surface links. Surface link 26b, from surface document 34 to 30, is stored in the link database at each of servers 14 and 18. Surface link 26c, from surface document 34 to 38, is stored in the link databases at each of servers 18 and 22.
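(An illustrative sketch, not part of the patent disclosure: one possible shape for the per-server link database described above, in Python. The field and class names are invented for the example. A surface link record is stored in the link database of both the source document's server and the destination document's server, which is what makes links bidirectional.)

from dataclasses import dataclass

@dataclass(frozen=True)
class SurfaceLink:
    source_doc: str      # e.g. surface document 34 on server 18
    dest_doc: str        # e.g. surface document 30 on server 14
    source_server: str
    dest_server: str

class LinkDatabase:
    def __init__(self):
        self.links = set()

    def add(self, link):
        self.links.add(link)

    def affected_by_removal(self, doc_id):
        # Everything touching doc_id: what the server can report to a
        # user before a surface document is removed.
        return [l for l in self.links
                if doc_id in (l.source_doc, l.dest_doc)]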

By employing this approach, surface documents residing at different servers are interconnected as tightly as the core documents residing on a single server. This bidirectional nature of the surface links enables more advanced navigation techniques, as surface links 26a through 26g can be computed and displayed to a user in the form of a link map similar to the illustration of Figure 1. Furthermore, the method of the present invention also simplifies maintenance of the distributed system. For example, when a user chooses to remove surface document 34, the link database can inform the user that removal of surface document 34 will affect surface document 30 on server 14 and surface document 38 on server 22. The user may either employ this information to manually modify the affected surface documents 30 and 38 and surface links 26b, 26c respectively, or permit server 18 to automatically ensure that at least surface links 26b and 26c are removed.

Referential integrity is herein understood to mean that documents and links on the distributed system are maintained up-to-date. The underlying problem with regard to maintaining referential integrity is the method with which affected servers are propagated with update data informing them of changes to surface documents and/or surface links. As previously mentioned, an earlier implementation of the Hyper-G system employed the multi-server transaction method and knowledge about the affected surface documents to directly engage other servers in order to remove surface document 34 and all surface links 26 to and from it. However, this approach was problematic when many servers participate in the transaction. Therefore, the present invention adopts a weak consistency approach, whereby it is accepted that the distributed system may not display referential integrity for a certain duration of time, but is intended to converge to a consistent state eventually. Of course, it is preferable that the duration of the inconsistency is kept to a minimum.

It is also preferable for consistent updates that data propagation may only take place from a pre-defined server. However, in contrast to the previously described master/slave method, this pre-defined server is not a single server for the entire distributed system. The server selected for performing the data propagation depends on the surface document or surface link being modified, inserted and/or removed. For example, with regard to surface documents, updates and subsequent propagation of update data are performed by the server which maintains the original surface document; for surface links, update data propagation is performed by the server which holds the surface document from which the link originates. With respect to partial system 10 of Figure 1, server 18 is responsible for updates of surface document 34, while the surface link 26b from surface document 30 to surface document 34 would be updated by server 14 (assuming that surface link 26b originated from surface document 30). This reduces the problem of overloading any given server, while eliminating the problem of conflicting updates, as updates are sequential.
For security reasons, users wishing to update surface document 34 must have write permission for that surface document. Permission is therefore checked by server 18, which maintains the originating document 34.

Updates of core documents 42, 46 and 50 or core links 54, 58 and 62 require no further action, as integrity of the referential information is maintained at the local link database. However, to maintain referential integrity, other servers need to be notified of updates of surface documents and surface links or, in other words, changes to the server's surface.

The present inventors have developed a novel flood type method for propagating data, thereby performing updates and maintaining referential integrity over a distributed system, which exhibits the following advantages: good scalability, as the traffic generated does not depend on the number of surface links to the surface document requiring updating; recipient servers receiving updates are not required to be available at update time; and other types of data can be distributed, such as server addresses and statistics, maintaining the consistency of replicas and caches, etc.

In contrast with the conventional, previously described flood-d method, which is optimized for minimizing the cost of the flood due to the large volumes of data handled, the flood method of the present invention can send update notifications which can be encoded in a few bytes. The preferred parameters of the flood method of the present invention, herein referred to as the p-flood method, are as follows: speed; robustness; scalability; and automation.

Speed is a preferred parameter because, as previously mentioned, the weak consistency approach is a condition whereby it is accepted that the distributed system may not display referential integrity for a certain duration of time. Therefore, speed is required in order to minimize the duration of inconsistencies.
Robustness is a preferred parameter, as a conventional protocol employed for transmission should ensure eventual delivery of every update to every server, even if some servers are temporarily unavailable. When a server that has been unavailable comes on-line, it should receive all the updates that were missed during its time off-line.

Scalability is a preferred parameter, as the time taken to inform all servers of the update should not depend heavily on the number of servers requiring the update. Similarly, the amount of traffic generated by the propagation of the update should not depend heavily on the number of servers.
However, as every update must be sent to every server at least once, O(N) is a lower bound for the total traffic generated, where N is the number of servers on the distributed system.

Automation is a preferred parameter because it is undesirable to configure flood paths manually, as is currently performed with conventional Internet news services.

Priority is also a preferred parameter because it is intended that the protocol be used for other purposes, such as attaching a priority parameter to every update that determines its acceptable propagation delay and bandwidth consumption.

The p-flood method as implemented by the present inventors is a probabilistic method which fulfils the above-described parameters. Figure 2 shows the first few steps of the method of the present invention. A plurality of servers are arranged in an approximate geographically circular arrangement as indicated at 104a through 104h. As will be understood by those of skill in the art, the present invention is not limited to the number of servers shown in Figure 2, and it is contemplated that other arrangements may be employed. The approximate geographically circular arrangement can be determined by sorting the servers according to their Internet address, as will be described in greater detail below. Each server 104 is aware of all other servers via a server list which is updated regularly and propagated throughout the distributed system to each of the plurality of servers 104. The server list is arranged in the order determined by the approximate geographically circular configuration.

Servers 104a through 104h accumulate update messages which are generated either by the servers themselves, as a result of modification of a surface document or a surface link, or are received from other servers belonging to the server list. On a predetermined clock cycle, an update list formed of the accumulated update messages is sent to p other servers, where p ≥ 1. Priority p is a parameter which determines the propagation rate throughout the distributed system. For p = 1, the update list is sent only to the server successor (hereinafter simply referred to as successor), as determined by the adjacent position on the server list. The update list is further propagated to p - 1 other servers that are chosen at random. For example, when p = 3 the update list is propagated to the successor and to two (3 - 1) other servers on the server list, selected at random. If p has a fractional or decimal portion, the update is propagated to one further server selected at random, with a probability equal to the fractional portion of p. For example, p = 1.3 results in one update list sent to the successor, and another update transmitted with a probability of 0.3 to a server selected at random. p = 3.2 means that the update is sent to the successor, two other servers at random, plus one other server selected at random with a probability of 0.2.
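(An illustrative sketch, not part of the patent disclosure: the recipient selection just described, in Python. The server list is assumed to be the ordered, circular list each server maintains; the server names are invented for the example.)

import math
import random

def choose_recipients(server_list, self_index, p):
    n = len(server_list)
    successor_index = (self_index + 1) % n
    recipients = [server_list[successor_index]]   # always the successor
    others = [s for i, s in enumerate(server_list)
              if i not in (self_index, successor_index)]
    random.shuffle(others)
    whole = int(math.floor(p - 1))                # integer portion of p - 1
    frac = (p - 1) - whole                        # fractional portion of p
    recipients += others[:whole]                  # random extra recipients
    if random.random() < frac and whole < len(others):
        recipients.append(others[whole])          # one more, with probability frac
    return recipients

# p = 1.5: the successor always, plus one random server half the time.
print(choose_recipients(["srv%d" % i for i in range(8)], 0, 1.5))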

One iteration of propagation based on a predetermined value of p is called a cycle. Figure 2 shows one cycle of the p-flood method with p = 1.5. Note that at every cycle, the operation described above is performed by all servers 104a through 104h, maintaining their own update lists, in parallel.
Within the cycle time period every server performs one cycle; however, there is no requirement that the clocks of each of servers 104a through 104h be synchronized. Also, at every step, p·N update lists are distributed among the servers on the server list.

The larger the value of p, the shorter the time it takes to reach all servers on the server list. However, larger values of p result in greater traffic being generated. This is because of the random nature of the propagation, whereby the same update may be received more than once by a single server. The method allows for the assignment of different values of p to individual update lists; therefore, p identifies the priority of the update list.

After an update list has been successfully transmitted from a sending server to its immediate successor, the receiving servers update their link databases or surface documents based on the update messages in the update list. Deletion of the update list by the sending server is not dependent on successful receipt by a randomly selected server. The sending server then deletes the update message from its update list. For example, with regard to servers 104a and 104b, once the update list is propagated, the message is removed from the sending server 104a's update list. This prevents the update message from being retransmitted in future cycles. Updates are time-stamped using a per-server sequence number, so that duplicates can be discarded and updates can be processed in the correct order by the receiving server. This ensures that update messages are removed after they have been received by all servers and maintains a relatively short update list.
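(An illustrative sketch, not part of the patent disclosure: duplicate detection using the per-server sequence numbers described above, in Python. It assumes, as the text does, that updates from a given originator arrive in order along the successor chain, so a per-originator high-water mark suffices; the process_message hook into the link database is hypothetical, and sequence numbers are assumed to start at 1.)

class UpdateReceiver:
    def __init__(self, process_message):
        self.process_message = process_message
        self.last_seq = {}            # originating server -> highest seq applied

    def receive(self, origin, seq, message):
        if seq <= self.last_seq.get(origin, 0):
            return False              # duplicate via a random flood path; discard
        self.last_seq[origin] = seq
        self.process_message(message) # update link database / surface documents
        return True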

When a server is down or unreachable, the update message(s) are not discarded from the update list until they have successfully been sent to the successor. It is assumed that a reliable transport protocol like the Transmission Control Protocol (TCP) is used and that receipt of the message is acknowledged. The message will remain on the update list until the successor comes on-line, whereafter the accumulated update list is sent by the originator.
Consequently, every sender server is responsible for delivering messages to its successor. The penalty is that when a successor is down for an extended period of time, the update list accumulates.

Setting the priority p = 1, whereby the sending server only transmits the update list to the successor, effectively blocks update messages in the event of a successor server being off-line, and is therefore an undesirable priority value. Greater values of p not only increase the propagation rate of the messages significantly, but also contribute to the robustness of the method.
For example, in Figure 2, a crash of server 104b would not inhibit update lists sent with p > 1 from server 104a being propagated to the other servers 104c through 104h on the system.

The present inventors have implemented the p-flood method in several extensive simulations which model various "perfect world" and "real world" situations and conditions, the results of which are outlined below. For the purposes of the simulations, the behaviour of the parallel server processes was modelled on a single computer. Under these conditions, for each cycle, the servers propagated updates serially as opposed to operating in parallel.
However, the simulations indicated that the serialization of the servers could lead to wrong results. For example, for the situation shown in Figure 2, processing the servers in the order 104a through 104h sequentially would always propagate all update lists to the servers in a single cycle. Subsequently, reality was modelled more closely by processing the servers in an order determined by a randomly chosen permutation of servers.
This enabled mimicking the random expiration of the servers' timers.

However, it is reasonable to expect that under real world situations, update lists will propagate slightly slower, especially when the cycle time is not large compared to the average transmission time of the update lists.

The first simulation conducted using the p-flood method modelled a "perfect world" situation in which all servers are on-line and reachable. This simulation was conducted to determine exactly how the weak consistency philosophy performed. For example, one variable observed was how long it would take to arrive at a consistent state after update list propagation was terminated. Another variable observed was how this time depends on the number of servers and the priority p, and how much traffic is generated over time.

Figures 3a through 3c illustrate the performance of p-flood under the above described conditions. It is assumed that m update messages have been generated at the N different servers 104a through 104N prior to commencing the simulation. As the propagation of update lists from sending servers to the successors and random servers was observed, it was determined that it does not matter whether all m updates are made on a single server or whether they are distributed randomly over the N servers.

Figure 3a shows the propagation delay for flooding 1000 servers using different values of p. As shown in the Figure, a higher value of p gives faster propagation; for example, when p = 2 and N = 1000, 50% of the servers are reached after about 4 cycles, 99% after 7 cycles, and the last server is typically updated after 10-13 cycles. However, the cost of a faster propagation rate is a higher load on the servers and system. Figure 3b shows the average size of the update list held at each server, and Figure 3c shows the traffic in messages that is sent at each cycle.


Figure 3c shows a graph of messages sent by all servers measured against time in cycles. As indicated in the Figure, as the number of messages increases, the best performance is achieved with p = 2 (i.e. 600,000 messages in approximately 4 cycles).

As every message in the update list has to be sent to every server at least once, optimum traffic is achieved when each of the N servers transmits m messages, i.e. optimum = N·m. Under ideal simulation conditions, as indicated by the experiments above, the total traffic generated by p-flood is p·N·m messages, or p·optimum. This illustrates the fact that the p-flood method distributes traffic evenly over time and over the entire system, as opposed to the conventional solution wherein every server simply sends all its updates to other servers. The lower the value of p, the more "system friendly" the update list performs. However, as Figures 3a through 3c suggest, there exists a tradeoff between fast propagation rates and peak system load. Consequently, the results of the simulation indicate that an acceptable value of p lies somewhere between the values of 1 and 2.

Figure 4 shows the scalability parameter of the p-flood method with respect to the number of servers in a given server list. As indicated in the figure, for p = 1.5, the time in cycles for the update list to reach 50% and 99% of the servers is plotted against the number of servers. Figure 4 shows that the p-flood method displays logarithmic performance, and further indicates that this method is well suited for use in very large server groups.
Figure 5 is a plot of propagation delay versus the priority p for an update list to reach 50% and 99% of the servers, with the number of servers held constant at 1,000. For example, when p = 1, it takes approximately 500 cycles to reach all 1,000 servers. When p = 2, it takes approximately 4 cycles to reach the first 500 servers. As indicated by this simulation, the clocks in each of the plurality of servers, which determine the cycle time, are not synchronized. When one sending server's timer expires, the update list is transmitted to the successor and other servers at random. The successor's timer will in general not expire a full cycle time later, but rather after some random time interval between zero and the cycle time of the sending server.
On average, the successor's timer will expire after half the sending server's cycle time, which explains the observed effect.

As previously mentioned, one of the parameters of the p-flood method is robustness with respect to system and server failures. Prior to a discussion of robustness, a distinction should be made between soft or transient errors and hard or persistent errors. Soft errors are temporary system errors due to routing problems or system overload. This type of error occurs rather frequently on systems such as the Internet, but usually for a short period of time. On the other hand, hard errors last longer and usually occur as a result of a hardware failure.

Figure 6a shows the effect of soft errors on propagation delay and the transmission traffic generated. The propagation delay is measured as the time for an update list to reach 50% and 99% of the servers. As indicated in the figure, propagation delay increases slowly with an increasing soft error rate.
A soft error rate of 10% indicates that for every cycle, 10% of the update propagation will fail, whereby 10% of the servers, selected at random, will be unreachable. During the next cycle, another random set of 10% will fail.
However, it is unlikely that the two sets will be identical.

As shown in Figure 6b, system traffic increases with an increased soft error rate. The graph indicates that as the number of messages in the update lists sent increases, the number of acknowledged messages remains constant. The results of Figures 6a and 6b were acquired under the condition whereby p = 1.5 and N = 1,000. As the p-flood method detects duplicate messages and messages that arrive out of order, it is contemplated that a conventional protocol such as the User Datagram Protocol (UDP) could be used to transmit update lists and server lists. However, UDP is susceptible to soft errors such as packet dropping. Therefore, it is preferred that the TCP transfer protocol be used, which is capable of repairing a large number of soft errors by itself. During a situation in which a server is temporarily unavailable, the unavailable state is usually detected during the opening of the connection and the update list will not be sent. This indicates that, for a protocol such as TCP, the messages sent plot shown in Figure 6b is not significant, as the number of messages actually sent remains constant with respect to an increasing soft error rate.

Hardware errors are usually described by two variables: mean time between failure (MTBF) and mean time to repair (MTTR). On-line time is defined as the fraction of time the server is on-line and is given by the equation:

On-line time = MTBF / (MTBF + MTTR)

During "real world" simulations, MTBF and MTTR were measured in terms of cycles. MTTR = 1 indicates soft errors, whereas larger values of MTTR indicate that a server is not available for a longer period of time and therefore a hard error. An assumption was made that all servers are on-line over 90% of the time. At the beginning of the simulation, MTTR / (MTBF + MTTR) of the servers are assumed to be off-line, and the time off-line is chosen randomly between 0 and MTTR. The remaining servers are assigned an on-line time between 0 and MTBF. A further assumption was made that the servers which are off-line also carry update lists which could have been accumulated prior to their off-line state. Throughout the "real world" simulation, servers that come on-line remain up for MTBF cycles and those that go off-line remain down for MTTR cycles.

Figure 7a shows the effect on the propagation delay when cycles are plotted against the MTTR, under the conditions that on-line time remains constant at 90% and MTTR varies between 1 and 100. Due to the probabilistic nature of the p-flood method, there is very little effect on the remaining servers. This result is indicated in the figure, in which the time to reach 50% of the servers remains relatively constant. However, there is an impact on the time to reach 99% of the servers, because only 90% of the servers are available at any given time. Therefore the propagation delay increases approximately linearly with MTTR, and the number of messages sent also grows approximately linearly. Figure 7b further shows the effect on system traffic when % optimum is plotted against MTTR.

Figures 8a through 8c show more detailed views of the effect of hard errors on the performance of the p-flood method. In order to make the effects clearly visible, 10% of the servers (100) are held off-line until cycle 50, at which point the servers are brought on-line simultaneously. The graphs of Figures 8a through 8c can be divided into three distinct phases. The first phase occurs up to approximately cycle 17. During phase 1, updates propagate only slightly slower than the p = 1.5 curve of Figures 3a through 3c under the "perfect world" simulation. Updates then level off when approximately 81% of the updates are processed. This is due to the fact that 90% of the updates are then processed by 90% of the servers (0.9 × 0.9 = 0.81), keeping in mind that the 10% of the servers which are unreachable also generate 10% of the updates.

The second phase occurs between cycles 18 and 49, during which the system is in a relatively stable state. As this state persists until the remaining 10% of the servers come on-line again, it is desirable to analyse the events occurring during this phase. The present inventor has observed that the set of servers can be partitioned into three disjoint groups:

Group 1 comprises the servers which are off-line. If the fraction of these is represented by the variable d, then the number of servers in this group is d · n. Throughout the presently described simulation, d = 0.1 and n = 1,000, therefore d · n = 100.

Group 2 comprises the predecessors of the servers that are down. As the probability of a predecessor also being down equals d, in which case the predecessor belongs to the previous group, the number of members of this group is (d − d²) · n. Again, throughout the presently described simulation, the number of servers in group 2 is 90.

Group 3 comprises the remaining servers, those which are on-line and are not predecessors of any unavailable server. The number of such servers in group 3 is (1 − d)² · n, or in the case of the present example, 810.

The servers of group 3 are able to eliminate their entire update lists during the first phase, because a message is deleted from the sender's update list once received by the successor. Therefore their update list size during phase 2 is zero. Group 2 servers retain messages destined for the group 1 members. The number of such messages is (1 − d) · m at each server, and the unavailable servers carry m/n messages each. The average update list size during phase 2, as plotted in Figure 8b, is therefore

((d − d²) · n · (1 − d) · m + d · n · (m/n)) / n = m · (d − 2d² + d³ + d/n)

The servers of group 2 continuously attempt to send messages to their successors, which are group 1 members, and to other random servers.
When p = 1.5, each group 2 server sends, on average, 0.5 update lists per cycle to random servers, each of which succeeds with probability (1 − d), plus one update list per cycle to the server's successor, which is guaranteed to fail. The servers belonging to groups 1 and 3 do not send messages. The total number of messages sent during each cycle of phase 2 can therefore be calculated as p · (d − d²) · n · (1 − d) · m, or in the case of the present example, 1.5 × 90 × 900 = 121,500. Of these messages, (p − 1)/p are successful, or one third (40,500). This calculation corresponds well with the simulated results shown in the graph of Figure 8c. Once again, if TCP is employed as the transport protocol, the number of messages sent but not acknowledged is insignificant, due to the fact that a server which is off-line is discovered when attempting to open its connection. No message is sent if the open request fails.
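As a check on the arithmetic above, the following Python sketch recomputes the group sizes, the average update list size, and the per-cycle message counts for the example values. It is an illustrative calculation only; the variable names (d, n, m, p) are taken from the text, and m = 1,000 is assumed for this simulation (one update per server).

# Phase-2 bookkeeping for the p-flood hard-error example (d = 0.1, n = 1000).
d, n, m, p = 0.1, 1000, 1000, 1.5

group1 = d * n                  # off-line servers             -> 100
group2 = (d - d**2) * n         # predecessors of down servers -> 90
group3 = (1 - d)**2 * n         # unaffected on-line servers   -> 810

list_size_g2 = (1 - d) * m      # messages retained per group-2 server -> 900
avg_list = ((d - d**2) * n * (1 - d) * m + d * n * (m / n)) / n

sent_per_cycle = p * (d - d**2) * n * (1 - d) * m   # 1.5 * 90 * 900 = 121,500
successful = (p - 1) / p * sent_per_cycle           # one third = 40,500

print(group1, group2, group3)       # 100.0 90.0 810.0
print(round(avg_list, 1))           # 81.1 messages per server on average
print(sent_per_cycle, successful)   # 121500.0 40500.0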

The third phase occurs at cycle 50, after the unavailable servers come on-line. The members of group 2 immediately succeed in propagating their update lists to their successors, which causes the dramatic effects shown in Figures 8a through 8c. After cycle 50, only a few more cycles are required to propagate the updates that were kept by the unavailable servers to all remaining servers.

In attempting to estimate the additional amount of system traffic that would occur when the described architecture is applied to a distributed information system, the present inventor assumed the following values for the previously described variables:
1,000 servers (n);

100 surface objects (documents and links) per server. An assumption is made that a total of 100,000 surface objects exist;
10% of the surface objects are updated per day. Therefore, a total of 10,000 update messages (m) need to be propagated each day.

Note that while the traffic generated is dependent on the number of servers and surface documents, traffic does not depend on the number of users on the system.

Therefore, the total number of messages sent per day is p × optimum, with optimum = n · m, or 10^7 messages, where every message is delivered to every server. One message is approximately 10 bytes in length. When p = 1.5, the system-wide traffic generated would be approximately 1.5 × 10^8 bytes (150 MB) per day, or 4.5 GB per month.
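For concreteness, this daily-traffic estimate can be reproduced with a few lines of Python. This is a back-of-the-envelope check using only the figures given above, nothing more.

# Traffic estimate for the p-flood method with the assumed values.
n = 1_000            # servers
m = 10_000           # update messages per day (10% of 100,000 objects)
p = 1.5              # priority parameter
msg_bytes = 10       # approximate size of one message

optimum = n * m                        # 10**7 message deliveries per day
daily_bytes = p * optimum * msg_bytes  # 1.5e8 bytes = 150 MB per day
monthly_gb = daily_bytes * 30 / 1e9    # ~4.5 GB per month

print(daily_bytes / 1e6, "MB/day;", monthly_gb, "GB/month")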

On the other hand, recent statistics based on the National Science Foundation network (NSFnet) as of November 1994 reveal that approximately 22,462 GB are transmitted per month. If a further assumption is made that 25% of the entire (non-local) Internet traffic passes through the NSFnet, whereby the entire traffic on the Internet is approximately 90,000 GB/month, the update messages resulting from implementing the p-flood method would generate an additional 0.005% of system traffic. Therefore the effect of applying such a method is negligible with regard to traffic.

Throughout an actual implementation of the p-flood method there are several details which should be addressed. Due to the random selection of flood paths, updates propagate faster than with the previously described cost-based selection, but at a higher cost. However, the p-flood method chooses its propagation paths in both non-probabilistic (successor) and probabilistic (random) ways, where the amount of randomness is controlled by the p parameter. For reasonable values of p, as exemplified by the real-world simulation previously described, the majority of traffic travels across the static circle of servers depicted in Figure 2. Optimal arrangement of servers in this circle can vastly reduce system cost and delay, without sacrificing the advantages of fast propagation and robustness afforded by the random choice of some of the flood paths.
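The mixture of deterministic and random path selection can be sketched as follows. This Python fragment is a simplified illustration of the selection rule set out in the claims (the successor, plus the integer portion of p − 1 random servers, plus one further random server with probability equal to the fractional portion of p); the function name and data layout are assumptions made for illustration.

import random

def flood_targets(servers, me, p):
    """Pick the servers to receive this cycle's update list.

    servers: the ordered (circular) server list; me: this server's index;
    p: the priority, a real number >= 1.
    """
    n = len(servers)
    targets = [servers[(me + 1) % n]]          # deterministic successor
    others = [s for i, s in enumerate(servers) if i != me]
    whole, frac = int(p - 1), (p - 1) - int(p - 1)
    targets += random.sample(others, whole)    # floor(p - 1) random picks
    if random.random() < frac:                 # one more with prob frac(p)
        targets.append(random.choice(others))
    return targets

# With p = 1.5: always the successor, plus one random server half the time.
print(flood_targets([f"server{i}" for i in range(10)], me=3, p=1.5))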

Computing the best approximate geographical circle using actual bandwidth measurements would be difficult, as it implies gathering a fully connected matrix of bandwidths between servers. Furthermore, the measurements would have to be repeated quite frequently, because global utilization of the Internet changes with the time of day.

Therefore, by applying a heuristic approach, the plurality of servers is sorted according to their reversed fully-qualified domain names. For example, server i is the successor of server i − 1, and the first server is the successor of the last one. Sorting by reverse domain name (last character first) results in all servers in, for example, Belgium (domain .be) being neighbours in the circle, followed by the servers in Germany (domain .de), and so forth. Within Germany, the servers located in, e.g., Berlin will be neighbours (domain -berlin.de). In the majority of cases, local connectivity is cheaper and faster than international connections. Therefore, this scheme will result in improved use of the available bandwidth. Furthermore, no computations or bandwidth measurements are necessary other than sorting.

When a server is added to or removed from the server list, the p-flood method, operating with a high priority p, is used to notify all servers. The servers then modify their server lists accordingly, using the sort order previously described.

During propagation of server list updates in which servers are added, removed or moved to a different host, it is preferred that a server employ its old server list for flooding until the message has been acknowledged by its successor. However, simple modifications of server attributes such as the description, Internet address, e-mail of the administrator, etc. do not require such precautions.

When operating a large number of servers, a catastrophic event such as a head crash on a server's disk may occur, which results in loss of information. In such an event, operation is resumed from a backup copy of the information base. If the backup copy is x days old, then the restarted server has lost all of its updates over the last x days. Other servers may also have a now-obsolete picture of the server's surface. For example, a new document may have been created less than x days ago on the server, having a link pointing to (or away from) another document on another server. This document has now disappeared and consequently, the link has to be destroyed in order to keep a consistent state. In other words, the servers have to roll back to the state in existence x days ago.

In such a situation, the server may propagate a special message that contains a description of its entire surface (documents and links), requesting all servers to check this picture against their view of that server and adjust their information about the server accordingly.

Under certain conditions an inconsistency on the Internet may occur. For example, referring back to Figure 1, assume that a new surface link is placed from surface document 30 on server 14 to a core document 46 on server 18, thereby changing the state of that document from core to surface. At about the same time, before the update message reflecting this operation arrives at server 18, server 18 deletes the core document 46 which the surface link was intended to point to. As core document 46 is not on the surface, there is no need to inform other servers about the deletion. Consequently, server 14 will not be notified and will keep its link.

Server 18 can detect this inconsistency when the update message from server 14 eventually arrives, since the message is a request to create a link to a surface document 34. Server 18 will then propagate a "document removed" message for this non-existing document, as if it had been a surface document 34.
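A sketch of this repair rule, under assumed message and store structures (the document set, link list and message dictionaries below are invented for illustration), might look like this:

def handle_create_link(server, msg, flood):
    """On a 'create link' update naming a local target document that no
    longer exists, answer with a flooded 'document removed' message so
    that the stale link is destroyed everywhere (the repair described
    above)."""
    target = msg["target_doc"]
    if target not in server["documents"]:
        # The document was deleted while the link-creation update was in
        # transit; tell everyone, as if it had been a surface document.
        flood({"type": "document removed", "doc": target,
               "origin": server["name"]})
    else:
        server["links"].append((msg["source_doc"], target))

# Example: server 18 receives the late link-creation request from server 14.
events = []
server18 = {"name": "server18", "documents": set(), "links": []}
handle_create_link(server18,
                   {"source_doc": "doc30", "target_doc": "doc46"},
                   events.append)
print(events)  # [{'type': 'document removed', 'doc': 'doc46', ...}]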

Alternatively, a choice may be made to endure such relatively rare inconsistencies for a while, and to have all servers periodically propagate their entire surfaces, as is done after catastrophic events. This would serve as a fall-back mechanism that deals with various types of inconsistencies and errors, including unforeseeable hardware and software errors in the update server. As these types of updates may be rather long, they should be sent infrequently and with low priority.

The present invention has been described with reference to a presently preferred embodiment. Other variations and embodiments of the present invention may be apparent to those of ordinary skill in the art.


Accordingly, the scope of protection sought for the present invention is only limited as set out in the attached claims.

Claims (23)

What is claimed is:
1. In a computer implemented distributed information system, a method of propagating data through a plurality of servers, comprising the steps of;
i) each of the servers individually maintaining an ordered list of the servers in the distributed information system;
ii) each of the servers individually maintaining a database containing address data for locating remote data stored remotely which is referenced by local data stored locally at each of the servers;
iii) each of the servers individually maintaining a list of data reflecting changes to the local data and remote data;
iv) selecting a priority value (p) with which to transmit the list of data wherein the priority is a real number greater than or equal to 1;
v) on a predetermined clock cycle, each of the servers transmitting their respective list of data to at least one server selected from the ordered list according to the priority value;
vi) each of the servers deleting their respective list of data once acknowledgement of the transmission is received from the at least one server;
vii) each of the at least one server receiving their respective list of data and updating their database and their local data with the received list of data, and appending their list of data with the received list of data;
viii) repeating steps v) through viii).
2. The method according to claim 1 wherein in step i), the ordered list is arranged by an approximate geographical orientation of the plurality of servers.
3. The method according to claim 2 wherein the approximate geographical orientation is circular.
4. The method of claim 1 wherein in step iv) the priority value is a real number in the range from about 1 to about 3.
5. The method of claim 1 wherein in step v), when p = 1 each of the servers transmits their respective list to a server located adjacent to each of the servers on the ordered list.
6. The method of claim 5 wherein in step v), when p is an integer number greater than one, each of the servers transmits their respective lists to the adjacent server and to p-1 other servers selected at random from the ordered list.
7. The method of claim 6 wherein in step v), when p is a real number greater than one, each of the servers transmits their respective lists to the adjacent server, to the integer portion of p-1 other servers selected at random from the ordered list and to one other server selected at random with a probability equal to the decimal portion of p.
8. In a computer implemented distributed information system, a method of propagating object data from a server to a plurality of servers, comprising the steps of;
i) the server maintaining an ordered list of the plurality of servers in the distributed information system;
ii) the server maintaining a database containing address data for locating remote objects stored at some of the plurality of file servers which are referenced by local objects stored at the server;
iii) the server individually maintaining a list of object data including changes to the address data, local objects and remote objects;
iv) selecting a priority value with which to transmit the list of object data wherein the priority is greater than or equal to 1;

v) on a predetermined clock cycle, the server transmitting the list of object data to at least one server selected from the ordered list according to the priority value;
vi) the server deleting the list of object data once acknowledgement of the transmission is received from the at least one server;
vii) the at least one server receiving the list of object data and updating the database and the local objects with the object data, and appending the at least one servers' list of object data with the received list of object data;
viii) repeating steps v) through viii).
9. The method according to claim 8 wherein the plurality of servers individually perform the steps of i) through viii) in parallel.
10. The method according to claim 9 wherein in step v) each of the plurality of servers individually maintains their own predetermined clock cycle.
11. The method according to claim 10 wherein the individually maintained clock cycles are not synchronized across the distributed information system.
12. The method according to claim 9 wherein in step i), the ordered list is arranged by an approximate geographical orientation of the plurality of servers.
13. The method according to claim 12 wherein the approximate geographical orientation is circular.
14. The method of claims 8 or 9 wherein in step iv) the priority value is a real number in the range from about 1 to about 3.
15. The method of claim 14 wherein in step v), when p = 1 the server transmits the list to a server located adjacent to it on the ordered list.
16. The method of claim 15 wherein in step v), when p is an integer number greater than one, the server transmits the list to the adjacent server and to p-1 other servers selected at random from the ordered list.
17. The method of claim 16 wherein in step v), when p is a real number greater than one, the server transmits the list to the adjacent server, to the integer portion of p-1 other servers selected at random and to another server selected at random with a probability equal to the decimal portion of p.
18. In a computer implemented distributed information system, a method of maintaining referential integrity of a plurality of links and documents by propagating updates from a server to a plurality of servers, comprising the steps of;
i) the server maintaining an ordered list of the plurality of servers in the distributed information system;
ii) the server maintaining a link database containing the plurality of links for locating remote documents stored remotely which are referenced by documents stored locally at the server;
iii) the server maintaining an update list including messages reflecting changes to local documents and links and remote documents and links;
iv) selecting a priority value (p) with which to transmit the update list wherein the priority value is a real number greater than or equal to 1;
v) on a predetermined clock cycle, the server transmitting the update list according to the priority value wherein, the server transmits the update list to a receiving server located adjacent to it on the ordered list, to an integer portion of p-1 other receiving servers selected at random from the ordered list and to another receiving server selected at random from the ordered list with a probability equal to a fractional portion of p;
vi) the receiving servers updating their link databases and the locally stored documents with messages from the update list and appending the receiving servers' respective update lists with the received update list;
vii) repeating steps v) through vii).
19. The method according to claim 18 wherein each of the plurality of servers individually perform the steps of i) through vii) in parallel.
20. The method according to claim 19 wherein in step v) each of the plurality of servers individually maintains their own predetermined clock cycle.
21. The method according to claim 20 wherein the individually maintained clock cycles are not synchronized across the distributed information system.
22. The method according to claim 18 wherein in step i), the ordered list is arranged by an approximate geographical orientation of the plurality of servers.
23. The method according to claim 22 wherein the approximate geographical orientation is circular.