WO2013071428A1 - System and method for data synchronization over a network - Google Patents

System and method for data synchronization over a network

Info

Publication number
WO2013071428A1
WO2013071428A1 PCT/CA2012/050784 CA2012050784W WO2013071428A1 WO 2013071428 A1 WO2013071428 A1 WO 2013071428A1 CA 2012050784 W CA2012050784 W CA 2012050784W WO 2013071428 A1 WO2013071428 A1 WO 2013071428A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
file
version
files
difference
Prior art date
Application number
PCT/CA2012/050784
Other languages
French (fr)
Other versions
WO2013071428A8 (en
Inventor
Ram Sudama
Brad Moore
Balash Akbari
Charles Elliott
Michael Ye
Sam Demooy
Original Assignee
Dassault Systemes Geovia Inc., Dba Gemcom Software International Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dassault Systemes Geovia Inc., Dba Gemcom Software International Inc.
Priority to AU2012339532A priority Critical patent/AU2012339532B2/en
Priority to CN201280065691.2A priority patent/CN104272649A/en
Publication of WO2013071428A1 publication Critical patent/WO2013071428A1/en
Publication of WO2013071428A8 publication Critical patent/WO2013071428A8/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Definitions

  • the following relates generally to data communication over a network.
  • synchronizing information between these devices may be difficult if the network connection has low bandwidth or is only intermittently available.
  • the method comprises a first node establishing a connection with a second node, the first node having stored thereon one or more data files, each being associated with a version identifier, and a first synchronization list of data files to be synchronized.
  • the second node has stored thereon one or more corresponding data files, each being associated with a version identifier.
  • the first node determines, based on the version identifiers, whether a more recent version of each of the one or more data files on the first synchronization list exists on the second node. Upon determining that the second node comprises a more recent version, the first node obtains, from the second node, a difference file to update the data file on the first node.
  • the difference file represents the difference between the version of the data file on one node and the version of the data file on the other node.
  • at least two of a sequence of difference files are used to update the file on the first node; one of the at least two difference files representing the difference between the version of the data file on the second node and an intermediate version.
  • the synchronization list may comprise data files selected by a user. Data files associated with those selected by the user may also be included in the synchronization list.
  • the second node may further comprise a second synchronization list of data files to be synchronized.
  • the second node may determine whether the first node comprises a more recent version of each of the files on the second synchronization list.
  • the second node obtains, from the first node, a difference file to update the data file on the second node.
  • a priority ranking may be associated with each of the files on the synchronization list, wherein the data files are synchronized according to the priority ranking.
  • the priority ranking may be generated based on metadata associated with each of the data files.
  • the priority ranking may be generated based on the magnitude of difference between the version identifier of the data file on the first node and the more recent version of the data file on the second node.
  • a reference file for each of the one or more data files is also stored on the first node.
  • a modification detection module on the first node determines the degree to which each of the data files on the first synchronization list differs from its respective reference file.
  • the priority ranking is generated based on the magnitude of difference between the data file and the reference file.
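The version-based priority ranking described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `SyncEntry` type, the function name, and the choice to synchronize the largest version gap first are all assumptions, since the text only says the ranking is based on the magnitude of the difference.

```python
from dataclasses import dataclass


@dataclass
class SyncEntry:
    """Hypothetical record for one file on the synchronization list."""
    name: str
    local_version: int   # version identifier held by the first node
    remote_version: int  # version identifier reported by the second node


def rank_by_version_gap(entries):
    """Return the entries that are out of date, largest version gap first.

    Assumption: a bigger gap means higher priority. The source text does not
    fix the direction of the ranking.
    """
    stale = [e for e in entries if e.remote_version > e.local_version]
    return sorted(
        stale,
        key=lambda e: e.remote_version - e.local_version,
        reverse=True,
    )


entries = [
    SyncEntry("a.dat", local_version=3, remote_version=4),
    SyncEntry("b.dat", local_version=1, remote_version=5),
    SyncEntry("c.dat", local_version=2, remote_version=2),
]
ranked = rank_by_version_gap(entries)
```

Files whose version identifiers already match (here `c.dat`) drop out of the ranking, since no difference file needs to be obtained for them.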
  • FIG. 1 is a block diagram illustrating a system in accordance with the present invention
  • FIG. 2 is a block diagram illustrating a node
  • FIG. 3 is a flow diagram illustrating a process of a node updating a file on a server
  • FIG. 4 is a flow diagram illustrating generating a difference file
  • FIG. 5 is a flow diagram illustrating a version history of an example file
  • FIG. 6 is a flow diagram illustrating an example process of a node obtaining, from a server, a more recent version of a file stored on the node
  • FIG. 7 is an example diagram illustrating various types of data and links therebetween relevant to a mining operation.
  • the file management system enables transfer and synchronization of files over a network to enable data communication between two or more devices.
  • the network may be one that is unreliable or reliable and may exhibit characteristics comprising low bandwidth and/or low quality of service (QoS).
  • a node 10 may be a computer device, such as a desktop computer, a laptop computer, a mobile device such as a smartphone, a network-enabled piece of industrial equipment (e.g. an automated drill), a network-enabled piece of sensing equipment (e.g. an aerial gravimeter), a rack-mount server, a cloud-based server, or any other network-enabled computing device.
  • the node 10 may comprise, or be linked to, a processor 9 and a memory 8.
  • the memory 8 may have stored thereon computer instructions which, when executed by the processor 9, provide the functionality of the file management system as described herein.
  • the node 10 may further comprise, or be linked to, a transceiver 15 for communicating with a network 14, for example, an intranet or the Internet. Further nodes 10, being substantially similar to the aforementioned node 10, can also be linked to the network. Each node 10 may be operable to communicate with one or more other nodes 10 via the network.
  • the node 10 may be user-controllable or automatically controlled by computer executable instructions.
  • Each node 10 may further comprise, or be linked to, a data store 16.
  • the data store 16 is operable to have stored thereon at least one file.
  • the at least one file may comprise at least one reference file, and at least one difference file associated with each reference file.
  • a reference file is a file that can be considered a complete file, in that if a node 10 provides the reference file to another node 10, the other node 10 can receive and read the reference file for the purposes of operating upon it (e.g., opening the file, modifying the file, etc.).
  • a difference file is a file providing information sufficient to generate a file that can be operated upon when used in conjunction with a reference file.
  • a difference file may map the differences between a first file and a second file.
  • a node 10 having received a difference file and a reference file can recover, from the difference file, information relating to modifications that have been made relative to the reference file.
  • the node 10, having the reference file and the difference file can generate a modified file corresponding to the modifications made relative to the reference file and can operate upon the modified file.
  • a differencing file, which embodies the differences between two or more generations of file revisions, may also be used to generate a modified file based on a reference file.
  • a differencing file is a type of difference file that can be generated by combining two or more difference files in a sequence.
  • a difference file may be substantially smaller than the size of a reference file.
  • Transferring only a difference file, rather than the entire reference file, between nodes 10 over the network 14 may substantially reduce the amount of data that must be transferred over the network 14, which may enable faster and more efficient synchronization of files.
  • one of the nodes 10 may be referred to as a "server" for the purposes of storing one or more particular reference files and associated difference files.
  • the first node may act as a server for a first file or first group of files while the second node may act as a server for a second file or a second group of files.
  • the term "server" or "server node" identifies which node is acting as the server in a particular exchange. It will be understood that the server may act as a node other than the server in a different exchange.
  • a plurality of nodes 10 may each be referred to as a server, wherein each node 10 may be a server of particular reference files and associated difference files.
  • a first reference file may be provided on a first node 10 which is the server of the first reference file
  • a second reference file may be provided on a second node 10 which is the server of the second reference file, and so on for any number of reference files.
  • a third node may act as a server for a third file but act as a node other than the server for the first and second files.
  • any particular node 10 could also be the server of a plurality of such reference files.
  • the server may track and store data associated with files, for example, the version number of the file. Newer versions of a file may comprise incremental changes from an older version of the file. Version information controlled by the server may be accessed by the nodes 10. By centralizing storage of the version number for a particular file, the server is operable to determine whether each node 10 is operating upon the most recent version of that file.
  • At least one of the nodes 10 further comprises, or is linked to, a database 17.
  • the database 17 may be used to store information relating to files stored on the data store 16.
  • a server may store on its database 17 a version number for each reference file stored on its data store 16 (which may correspond to the number of associated difference files for that reference file), as well as one or more annotations for each such reference file and/or difference file. These annotations may be implemented by metatags associated with such files.
  • Nodes other than the server may also comprise or be linked to a database for storing information relating to files stored on their respective data stores 16, for example version numbers of such files.
  • each node 10 includes a differencing module 20 and a synchronization module 22.
  • Node 10 may further comprise a modification detection module 28, a compression module 24, a search module 26 and a file monitor 30.
  • the differencing module 20 is operable to generate a difference file from a reference file and modifications made relative to the reference file, which may be embodied in a permanently or temporarily stored modified file.
  • the modified file need not be stored on the data store at any time, though it may be stored in memory 8 while the node 10 is operating upon it. However, it will be appreciated that the modified file may also be stored on the data store in some embodiments.
  • the differencing module 20 is also operable to apply the difference file to the corresponding reference file (or a differencing file) to generate a modified file to be operated upon.
  • the synchronization module 22 may be operable to facilitate the synchronization of files between a node 10 and a server.
  • the synchronization module 22 may cooperate with the node's transceiver 15 to enable a node 10 to communicate (receive and transmit) reference files, difference files and differencing files with another node 10.
  • the node's modification detection module 28 is operable to detect whether modifications have been made to a file on node 10 regardless of whether the node 10 is connected to a server over network 14.
  • the node 10 may comprise a compression module 24 to compress and decompress reference files and difference files to reduce the size of such files.
  • Compression of files may be provided to reduce the amount of data that must be sent over the network 14, thereby reducing the time required to send a file.
  • the node's search module 26 may be operable to perform searches for files regardless of the connectivity state of the node 10 to the network 14. For example, the node's search module 26 may use metadata to enable searching for files on other nodes despite an unreliable connection, as will be further described herein. Typically, performing a search from a node 10 for information on a server requires a connection between the server and the node 10. Over an unreliable network, the connection may not always be available, which may cause long search times and timed-out searches.
  • In operation, in one example, a node 10 (or a user of the node 10) may request to access a particular file. The node's synchronization module 22 determines whether a version of the requested file is stored on the node's data store 16.
  • in FIG. 3, an example process is shown for a node obtaining a file from a server, updating the file, and providing the updated version of the file to the server.
  • access to a file at the node is requested.
  • the synchronization module on the node determines that the file does not exist in the node's data store.
  • the node further determines the server for the requested file and requests the file from that server in 114.
  • the server accesses its database 17 to determine the version number of the requested file, and correspondingly accesses its data store 16 to retrieve the requested file.
  • the synchronization module of the server, in 116, provides to the node the file that the node has requested, as well as the associated version information.
  • the node stores the file in the node's data store 16 as a reference file and stores the corresponding version information in the database 17 of the node.
  • the node generates a modified file by making modifications to the reference file and stores, in the database 17, version information associated with the file.
  • 120 could comprise a user making an addition to the file and storing the modified file as a new version.
  • differencing module 20 generates a difference file based on the modified file and the reference file stored in 118.
  • the node provides the server with the difference file and associated version information. Upon obtaining the difference file, the server updates the reference file based on the difference file and saves the updated file as a new version in 126.
  • the server node's synchronization module 22 accesses its database 17 to determine the most recent version number of the requested file, and correspondingly accesses its data store 16 to retrieve the reference file and one or more associated difference files.
  • the number of difference files may be derived from the version number. For example, a modified file generated from a reference file associated with four difference files may have a version number of five.
  • the server then provides the reference file and the associated difference files to the node, where the differencing module 20 of the node generates a first modified file based on the reference file and the associated difference files and stores the first modified file in 118.
  • Upon the node further modifying the file to produce a second modified file, the differencing module of the node would then generate an additional difference file which maps the differences between the first modified file and the second modified file. Similarly to the above description, the node would then provide the server with the difference file in 124. The server may then apply the difference file to a reference file or simply store the difference file to be shared with other nodes.
  • the first node's synchronization module 22 determines its corresponding version number, for example, by accessing the first node's database 17, which stores the version number for each file stored on the first node's data store 16. The first node's synchronization module 22 transmits the version number to the node functioning as a server.
  • the server node's synchronization module 22 accesses its database 17 to determine the version number corresponding to the requested file on the server's data store 16. If the version numbers of the first node and server node are identical, the server node need not transmit any file to the first node 10. If the version numbers differ, the server node's synchronization module 22 directs its differencing module 20 to generate a differencing file corresponding to the set of difference files for the intervening version numbers. The server's synchronization module 22 may transmit the differencing file and version number to the node's synchronization module 22.
  • the node's differencing module 20 then generates a modified file by applying the differencing file to its version of the file.
  • the first node 10 stores the modified file in its data store 16 and stores the version number in its database 17.
  • the first node 10 may overwrite its previous version of the requested file with the modified file.
  • the server node may store in its database 17 an indicator that the node has accessed the file.
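The version comparison above can be sketched as a small selection routine. This is an illustration under stated assumptions: difference files are modelled as a dictionary keyed by the version number each one produces, version one is the bare reference file, and the function names are hypothetical. The rule that the version number is one more than the number of difference files follows the example given earlier in the text (four difference files, version five).

```python
def current_version(difference_files):
    """Version number of the newest file: reference file plus one per difference file."""
    return len(difference_files) + 1


def difference_files_for(difference_files, node_version):
    """Return the difference files covering the versions the node has not seen.

    difference_files maps the version a difference file produces (2, 3, ...)
    to its payload. An empty list means the node is already up to date.
    """
    server_version = current_version(difference_files)
    if node_version == server_version:
        return []  # identical version numbers: nothing to transmit
    if not 1 <= node_version < server_version:
        raise ValueError("node reports a version the server does not know")
    return [
        difference_files[v]
        for v in range(node_version + 1, server_version + 1)
    ]


# Hypothetical payloads standing in for real difference files.
stored = {2: "diff 1->2", 3: "diff 2->3", 4: "diff 3->4", 5: "diff 4->5"}
pending = difference_files_for(stored, node_version=3)
```

The selected difference files could then be sent individually or combined into a single differencing file before transmission, as the text describes.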
  • the node 10 may operate upon a file to create a further modified file. Once the node 10 has finished modifying the file, resulting in a further modified file, the differencing module 20 constructs a difference file based on the further modified file and the (received) modified file. The node 10 stores the further modified file on its data store 16 and an updated version number on its database 17. The node 10 may overwrite the (received) modified file with the further modified file on its data store 16. The node's synchronization module 22 transmits the difference file to the server node's synchronization module 22. The server node's differencing module 20 may save the difference file to its data store 16 and update the version number corresponding to that file on its database 17.
  • the file management system may further enable a first node 10 and a second node 10 to synchronize a file through an intermediary server. Once modifications are made on a first node 10, the updated file may be provided to the second node 10 via the intermediary server.
  • this may be done, for example, by the node 10 providing the difference file to a server in accordance with the foregoing, the server determining from its database other nodes that have accessed the file (in this case, the second node), the server updating the version number of the file in accordance with the foregoing, the server requesting that the second node 10 provide it with its version number for the file, and the server correspondingly generating a differencing file to enable the second node 10 to generate a modified file in accordance with the foregoing.
  • the difference file is provided to the second node in a push model, rather than as a response to a request by the second node.
  • Such a request from the node acting as a server to any one or more nodes may be provided as follows.
  • the server's synchronization module 22 transmits a notification to other nodes 10 that have previously accessed the reference file associated with a received difference file (from a node 10). Any such other node's synchronization module 22 may correspondingly determine the version number of its corresponding copy of the file and send the version number to the server.
  • the server's synchronization module 22 transmits to each node 10 a differencing file to update the node's file to the current version.
  • any two or more of such nodes 10 may receive a distinct differencing file as they may not have been previously updated to the same version number, for example if any such nodes 10 were offline at the previous update.
  • the differencing module 20 of each such node may update the file on its data store, along with the corresponding version number on its database, when it receives the differencing file.
  • the file management system also enables a node 10, or a user operating a node 10, to view the actual contents of the file, as well as the metatags associated with the state of the locally modified file and the state of the file on a server.
  • the server may be out of contact for a period of time while the file is being modified at the node 10.
  • a reference file preferably corresponds to the file on the server at the time that the file was most recently synchronized.
  • a difference file may be generated on the first node based on the reference file and a version of the file that was modified on the first node. For example, if the first node obtained a reference file from the server, the first node may modify the reference file to form a modified file. A difference file may then be generated on the first node based on the modified file and the reference file.
  • the modification detection module 28 may be operable to determine whether the reference file differs from a more recent version of the file on the node 10.
  • the modification detection module 28 may provide updated information regarding the state of the local file with respect to the state of a reference file in the temporary absence of a network connection.
  • the modification detection module 28 may also provide an indication to a user that the reference file differs from a modified file, as is described below.
  • although the file on the server may have been modified since the last synchronization by a second node, the first node or a user at the first node may be able to determine, from the indication, which of the files have been modified at the node since the time of the last synchronization.
  • the node may be provided with a display, the display being operable to provide a displayed list of files on a node.
  • the list of files may be provided with a visual indication that the version of the file on the node differs from the reference file, based on information received from the modification detection module 28.
  • the indication that the file on the node differs from the reference file may comprise, for example, a pair of arrows that point in opposite directions when the files differ.
  • the indication may also comprise further details, for example, the date and time that the file was last modified as well as the date and time that the last synchronization occurred, or a percentage outlining what percent of the blocks in the file are identical. Other indications that provide the user with a sensory experience based on differences in the files may also be possible.
  • the file management system may further provide compression of files.
  • the amount of time required to transfer the file between nodes 10 over the network 14 may be inconveniently or prohibitively long.
  • a node's compression module 24 may compress the file that is being transmitted by applying a compression algorithm.
  • the compression algorithm may comprise, for example, VCDIFF, another format for encoding compressed and/or differencing data, or any appropriate compression algorithms for the types of files being transmitted.
  • the differencing module 20 segments a reference file into a plurality of blocks.
  • the size of the blocks may be determined based on preconfigured segmenting parameters which the differencing module 20 may adaptively adjust. Examples of block sizes may be 4096 bytes, 8192 bytes, etc.
  • the differencing module 20 may compute a hash of each block, and assign an identifier (e.g. a number) to each of the blocks in step 38.
  • the differencing module 20 may segment the corresponding modified file in step 40, assign an identifier (e.g. a number) and compute the hash of each of its blocks in step 42.
  • the hash of the blocks of the modified file may then be compared with the hash of the blocks of the reference file in step 44.
  • the differencing module 20 determines which blocks have been modified between the modified file and the reference file in step 46.
  • the differencing module 20 may generate a difference file comprising the modified blocks and modified block identifiers in step 48. It will be appreciated that the differencing module 20 may alternatively compare the blocks without hashing. It will also be appreciated that certain of the foregoing steps can be performed in different sequences without affecting functionality.
  • the differencing module 20 may segment blocks by setting markers based on the contents of the file, enabling a modified file to be compared to a reference file even if it comprises significant rearrangement. For example, each block may be hashed. The hashes of one file may then be compared with the hashes of another file to determine which blocks are identical and which blocks must be transmitted in a difference file over the network 14, to synchronize the files.
  • the hashes are compared locally by a node's differencing module 20 rather than at a server; however, the hashes may be transmitted to the server for comparison by its differencing module 20. Since only those blocks that have been changed may need to be transferred over the network 14, the generated difference file may comprise the data located in these blocks.
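The segment, hash, compare, and collect steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: it uses fixed-size blocks rather than content-defined markers, SHA-256 as the hash, block indices as the block identifiers, and a plain dictionary as the difference file format. The function names and the `length` field (used to handle files that grow or shrink) are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the text says it may be adjusted


def split_blocks(data, size=BLOCK_SIZE):
    """Segment a byte string into consecutive blocks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_hashes(data, size=BLOCK_SIZE):
    """Hash each block; the list index serves as the block identifier."""
    return [hashlib.sha256(block).digest() for block in split_blocks(data, size)]


def make_difference(reference, modified, size=BLOCK_SIZE):
    """Build a difference file holding only blocks whose hash changed."""
    ref_hashes = block_hashes(reference, size)
    changed = {}
    for idx, block in enumerate(split_blocks(modified, size)):
        is_new = idx >= len(ref_hashes)
        if is_new or hashlib.sha256(block).digest() != ref_hashes[idx]:
            changed[idx] = block
    return {"block_size": size, "length": len(modified), "blocks": changed}


def apply_difference(reference, diff):
    """Rebuild the modified file from the reference file and a difference file."""
    size = diff["block_size"]
    n_blocks = (diff["length"] + size - 1) // size
    blocks = split_blocks(reference, size)[:n_blocks]
    blocks += [b""] * (n_blocks - len(blocks))  # room for appended blocks
    for idx, block in diff["blocks"].items():
        blocks[idx] = block
    return b"".join(blocks)[:diff["length"]]


reference = b"a" * 10000
modified = reference[:5000] + b"X" + reference[5001:] + b"tail"
diff = make_difference(reference, modified)
rebuilt = apply_difference(reference, diff)
```

With fixed-size blocks, an insertion near the start of a file shifts every later block and defeats the comparison, which is presumably why the text also mentions setting block markers based on file contents.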
  • the differencing module 20 may generate and cache the difference file locally before the synchronization process is initiated. For example, if the differencing module 20 generates the difference file on a node 10, the difference file may be cached on the node's memory 8 prior to the synchronization process being initiated.
  • the difference files may be generated by the node prior to re-establishing the network connection.
  • the node may generate the difference files in the background while performing other tasks.
  • the difference files may thus already have been generated in full or in part when the network connection is re-established, expediting transmission over the network.
  • the computational steps required once the network connection is re-established may be significantly fewer than if the difference files had been generated at the time that the network connection was re-established.
  • the synchronization process may be completed more quickly, with a lower risk of again losing the connection during synchronization, and with less interruption to other activities that require use of the node's processor and must take place while the network is established.
  • the differencing module 20 may also generate a difference file as the modifications to the file are taking place.
  • the synchronization module 22 may apply a break and resume transfer algorithm to continue synchronization when the network connection is re-established.
  • the break and resume transfer algorithm may be any algorithm enabling a file to be transferred where it has previously not been transferred or only partially transferred.
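A break and resume transfer of the kind described above might look like the following sketch. The text does not specify the algorithm, so everything here is an assumption: the payload is sent in fixed-size chunks, the sender remembers the byte offset already delivered, and a dropped link is modelled as a `ConnectionError` raised by a hypothetical `send_chunk` callback. `FlakyReceiver` exists only to simulate an unreliable network.

```python
def transfer_with_resume(payload, send_chunk, chunk_size=1024, max_drops=10):
    """Send payload chunk by chunk, resuming at the last delivered offset.

    send_chunk(offset, chunk) is a hypothetical callback that raises
    ConnectionError when the link drops. Returns the number of drops survived.
    """
    offset = 0
    drops = 0
    while offset < len(payload):
        chunk = payload[offset:offset + chunk_size]
        try:
            send_chunk(offset, chunk)
        except ConnectionError:
            drops += 1
            if drops >= max_drops:
                raise
            continue  # connection re-established: retry from the same offset
        offset += len(chunk)  # only advance once the chunk was delivered
    return drops


class FlakyReceiver:
    """Simulated peer that drops the connection on selected calls."""

    def __init__(self, fail_on_calls):
        self.buffer = bytearray()
        self.calls = 0
        self.fail_on_calls = set(fail_on_calls)

    def receive(self, offset, chunk):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise ConnectionError("link dropped")
        if offset != len(self.buffer):
            raise ValueError("chunk does not continue the received data")
        self.buffer.extend(chunk)


receiver = FlakyReceiver(fail_on_calls={2, 3})
payload = bytes(range(256)) * 10
drops = transfer_with_resume(payload, receiver.receive)
```

Because the offset only advances after a successful send, data already transferred before a drop is never sent again.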
  • some files may be associated with other files.
  • an executable program may require an input from a spreadsheet. If the executable file is synchronized but the spreadsheet is not, the executable file may not have access to the required input value.
  • a first file may comprise an executable that retrieves data from a comma separated value (.csv) file to perform a pre-determined calculation. If the node's synchronization module 22 is set to synchronize the executable file, all related data, including the .csv file, may also be synchronized. As the executable file may not be of use without the most recent .csv file, associating the .csv file with the executable file for the purposes of the synchronization process prevents erroneous, incomplete, or non-functional groups of files from synchronizing.
  • It will be appreciated that other examples of associations between files may exist. If one of the group of associated files cannot be synchronized, a warning message may appear to alert a user at a node 10 that a related file is missing. Alternatively, the file management system may block the file from being shown, or even delete the file, as this file may contain or initiate an error at another node 10. The newly synchronized file may otherwise, or in addition, remain hidden until all associated files may be synchronized.
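The grouping of associated files described above can be sketched as a walk over an association table. The table format, the function names, and the decision to follow associations transitively are assumptions made for illustration; the text only states that related files may be synchronized together and that an incomplete group may be flagged or hidden.

```python
def expand_sync_list(selected, associations):
    """Return the selected files plus every file reachable through associations.

    associations maps a file name to the names of files it depends on.
    Following the links transitively is an assumption of this sketch.
    """
    to_visit = list(selected)
    seen = set()
    while to_visit:
        name = to_visit.pop()
        if name in seen:
            continue
        seen.add(name)
        to_visit.extend(associations.get(name, ()))
    return sorted(seen)


def missing_from_group(group, synchronized):
    """List the files of a group that have not been synchronized yet."""
    return [name for name in group if name not in synchronized]


# Hypothetical association table for the executable and .csv example.
associations = {
    "model.exe": ["inputs.csv"],
    "inputs.csv": ["units.cfg"],
}
group = expand_sync_list(["model.exe"], associations)
missing = missing_from_group(group, synchronized={"model.exe", "units.cfg"})
```

A non-empty `missing` list corresponds to the case where the system may warn the user or keep the newly synchronized file hidden until the rest of the group arrives.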
  • only one node 10 may modify a particular file at any given time, to reduce the likelihood that two nodes 10 will simultaneously operate upon a particular version of the file and attempt to synchronize different modified files.
  • a node 10 may be restricted to updating only the most recent version of the file on the server. If a node 10 accesses a particular file, the server's synchronization module 22 may indicate in its database that the file is "checked out" by the node 10. The node 10 that has checked out the file may be given the authority to designate a file as a master copy when the synchronization module 22 synchronizes the file with the server.
  • the master copy designation may be saved on the database 17.
  • the node 10 may then check in the file to allow other nodes 10 to designate the file as a master copy.
  • the node 10 may save the file as a new version and store the version information in the database 17.
  • the version information, which may comprise information indicating whether the file is a master copy, may be stored on the database 17, whereas the reference file itself, and any associated difference files (or differencing files), may be stored on the data store 16. If another node 10 attempts to access the file, the other node 10 may be provided the file; however, because the first node 10 had already checked out the file, any modifications made by the second node 10 will not be applied as differencing files for that file. The second node 10 may save its modifications to a new file, however.
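The check-out rule above can be sketched with a small in-memory registry. The class and method names are hypothetical, and a real server would keep this state in its database 17; the sketch only shows that a single node holds the right to designate a master copy until it checks the file back in.

```python
class CheckoutRegistry:
    """Tracks which node, if any, has each file checked out (illustrative only)."""

    def __init__(self):
        self._holder = {}  # file identifier -> node identifier

    def check_out(self, file_id, node_id):
        """Record the check-out; return True only if node_id now holds the file."""
        holder = self._holder.setdefault(file_id, node_id)
        return holder == node_id

    def may_designate_master(self, file_id, node_id):
        """Only the node holding the check-out may mark its version as master."""
        return self._holder.get(file_id) == node_id

    def check_in(self, file_id, node_id):
        """Release the file so that other nodes can check it out."""
        if self._holder.get(file_id) == node_id:
            del self._holder[file_id]


registry = CheckoutRegistry()
first_got_it = registry.check_out("plan.dat", "node-1")
second_got_it = registry.check_out("plan.dat", "node-2")
```

In this sketch the second node can still read the file, but because `may_designate_master` returns False for it, its changes would have to be saved as a separate file, matching the behaviour described in the text.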
  • when the synchronization module 22 synchronizes a file on a node 10 with the corresponding file in the data store 16, the file may remain on the server. Similarly, when the synchronization module 22 updates a file on a node 10 with a newer version that is available on the data store 16 and delivered through the server, the older version of the file may not be deleted. The older version of the file may be retained and the update may be stored by way of storing one or more difference files that are associated with the file.
  • a reference file on node 10 is version one of a file which has since been modified in two successive iterations to yield modified versions two and three
  • the synchronization module 22 on node 10 provides the server node with both revisions in the form of two difference files or a single differencing file.
  • a first difference file may provide the differencing module on the server with the information necessary to construct the modified file corresponding to version two.
  • a second difference file may provide the differencing module on the server with the information necessary to construct version three of the file based on version two.
  • both the first version of the file and the difference files that enable the differencing module 20 to construct the second and third versions of the file are required to enable access to all three versions of the file.
  • the node may apply a single differencing file, corresponding to the difference between version one and version three, to obtain version three of the file.
  • access to version two of the file may not be available.
  • the synchronization module 22 on a node 10 other than the server may cause the server to update its data store 16. For example, if modifications to a reference file 50 were made on a node 10 that had checked out the file, the synchronization module 22 of that node may provide the server with the difference file that had been calculated by that node's differencing module 20. The server may store this difference file on its data store 16. The difference file may then be used by the differencing module of the server to construct the second version of the file 52.
  • the server may store the reference file on the data store 16 as well as each of the difference file updates. It may be noted that in this example, the reference file, as well as the difference files provided to the data store 16, are saved on the data store 16 of the server.
  • the reference file 50 may be uploaded to the data store 16 on the node acting as the server.
  • the synchronization module 22 of a node 10 not acting as the server then checks out the reference file 50. While the file is checked out, a second node 10, also not acting as a server, may access the reference file 50 on the data store 16 of the server by downloading the reference file. Once the second node 10 has finished modifying the file, the second node's synchronization module 22 may provide the modifications to the data store 16 on the server. To provide the modifications to the data store 16 on the server, the second node's differencing module 20 can calculate a difference file 54 locally on the second node based on the reference file 50 and the modified file 52.
  • the second node's synchronization module 22 may then provide the difference file 54 to the server to be stored on the data store. Since the reference file 50 was checked out by the first node 10, the file produced by the second node 10 may not be designated as a master copy. Hence, the difference file may be saved separately as B1. The information relating to the file's version may be stored on the database of the server.
  • a second difference file 56 may be saved on the data store 16 of the server.
  • the first node's differencing module 20 may compute the difference file, which may then be provided to the data store 16 of the server by the synchronization module 22.
  • the difference file uploaded to the data store 16 of the server corresponds to the second master version.
  • the synchronization module 22 may provide a further difference file 58 to the data store 16 of the server as a master version of the file and save the corresponding version information on the database 17 of the server.
  • the first node 10 may then check in the file once the first node 10 has completed any modifications.
  • the synchronization module 22 may provide the resulting difference file, as outlined in the process explained above, to the data store 16 of the server.
  • This difference file 62 may be stored on the data store 16 of the server with the master copy designation.
  • the information relating to the master copy designation may be saved in the database 17 of the server.
  • the modifications may be uploaded to the data store 16 of the server in the form of a difference file 60 by the synchronization module 22 but may not be saved as a master copy.
  • the synchronization module 22 may provide a copy of the most recent difference file 64 computed by the differencing module 20 to the data store 16 of the server. Hence, in this example, at each new update of the reference file, a new difference file is provided to the data store 16 and no files are deleted.
  • a node 10 may log operations performed on files stored on its data store, for example to determine the history of file updates, particularly if the node 10 is a server for such files.
  • the operations performed on the data store 16 may be identified with, for example, a timestamp, identification of the node 10 (and/or its user), location of the node 10 (and/or its user), MAC address/computer ID, etc.
  • a node 10 may request a version of the file from the server that is not the most recent version of the file stored on the server.
  • the node 10 may also request a copy of the file that is more than one version behind the most recent version.
  • the server's differencing module may be operable to generate a checkpoint file.
  • a checkpoint file is a complete file that can be accessed by a node 10 without requiring the server to apply a difference file to a reference file.
  • a checkpoint file may be saved at predetermined intervals to reduce the number of computations that must be performed by the server's differencing module 20 if there are many versions of a file.
  • the file management system may be operable to track the number of times a new version of a particular file has been saved.
  • the file management system may be also operable to save a checkpoint file based on other parameters, for example, the number of version changes, the date, time elapsed since the last checkpoint was saved, the amount of content that has changed between version updates, etc.
  • the file management system may also be operable to save checkpoint files at intervals that are based upon how often nodes 10 update files and how often nodes 10 request older versions of the file.
  • the server can transmit the checkpoint file to the node 10.
  • the server's differencing module 20 may be operable to compute the requested version of the file by applying a difference file to the most appropriate checkpoint file.
  • the tenth version of a checkpoint file may be provided to the node 10 by the server.
  • the server may calculate the eleventh version by applying a difference file to the tenth version.
  • the server may also, depending on the difference files that are stored, apply one or more difference files to the fifteenth version to obtain the tenth version.
  • the node's synchronization module 22 may be operable to provide the requested version of the file to the node 10 through the network 14.
  • the server may compute a differencing file mapping the differences between the twentieth version and the ninth version and transmit this difference file to the node 10.
  • the transmission may be completed more quickly.
  • the node's differencing module 20 may be operable to compute version nine of the file. By saving only a certain number of version files but saving enough difference files to enable the intermediate versions of the file to be calculated by the server's differencing module, the required amount of memory on the data store 16 of the server may also be reduced.
  • the differencing module may be operable to calculate relevant combinations of difference files in coordination with the checkpoint files.
  • if the server saves a checkpoint file for every ten versions of updates, and there are a total of fifty-five versions of a particular file, then forty-five difference files may be required to provide access to each of the versions in between the checkpoints.
  • most nodes 10 will have a version that is no more than ten versions old and will want to update the version of the file to the most recent version. This reduces the number of required difference files (or differencing files) to ten.
  • the difference files enabling a particular number of past versions to be updated to the most recent version can be stored in order to reply to any node's request for an update more rapidly.
  • a node 10 may be operable to generate a synchronization list to request a plurality of files from the server. All files on the synchronization list may be updated as a group, or in priority, when there is access to the network 14.
  • the node 10 may update the files on the synchronization list when the user is away from the node 10 or not using the network connection, resulting in more bandwidth being available for synchronization processes.
  • some version of each file may need to be stored on the node 10, for example, a reference file.
  • the node's synchronization module 22 requests that the server 16 provide it with corresponding difference files for each such reference file.
  • FIG. 6 a method of a first node, which is not acting as a server, receiving an updated version of a file from a server is explained.
  • an operator of the node requests access to a file.
  • the synchronization module on the node determines, from the node's database 17, the version of the file in the node's data store 16. The node determines whether the file is on the synchronization list.
  • the node determines that the file is on the synchronization list.
  • the node requests an updated version from a server and provides the server with the version identifier of the file that exists on the node in 136.
  • the server determines whether the version identifier of the file on the server is more recent than the version on the node. Upon determining that the version on the server is more recent, the server provides the node with the one or more difference files required to generate an updated version of the file from the reference file on the node in 140.
  • the difference files are generated as described above. As explained above, the difference files may be generated before, or after, the update request from the node.
  • the node stores the difference file in memory and generates a modified file based on the difference file obtained from the server and an existing reference file on the node.
  • the node 10 may request, based on the synchronization list, an extensive list of files that are to be synchronized. To conserve memory on the node 10, and bandwidth over the network 14, the files on the synchronization list may be compressed by the node's compression module 24 and stored in a compressed format prior to transmission over the network 14. Similarly, the files transmitted by the server 16 to the node 10 may be compressed prior to
  • the time required for synchronization can be reduced. This can be particularly advantageous if network access is only available for a limited number of hours in a day.
  • the utility of the network 14 may be maximized while it is available.
  • since only one file is being transferred, the transfer may be interrupted and resumed in an intermittently available network without significant loss. Since only certain versions are saved as checkpoint versions and the other versions can be calculated based on difference files by the differencing module 20, the amount of space required from the data store 16 may be significantly reduced.
  • a difference file can be used for updating future files without requiring an extra computation step. This increases the efficiency of the synchronization system and reduces the load on both the server and the nodes 10.
  • the server may distribute updates to each of the nodes 10 in the form of difference files as a more recent version of the file is created, the number of difference files that must be calculated may be reduced. Moreover, since difference files are typically smaller than reference files, there may be a lower probability of file corruption during transmission.
  • Another advantage of the synchronization process of the current invention is that compression may be applied to a particular difference file, further reducing the quantity of data that must be transmitted over the network 14.
  • the file management system enables metadata tagging of files in the data store 16 of the server or locally on the node.
  • Metadata tags may be stored in a database 17 of the node, as well as in the database of the server. By storing metadata tags on the database 17 of the node, the metadata tags may be used to perform searches during a temporary interruption of the network connection.
  • the database 17 of the node may provide metadata to the search module 26 of the node.
  • the search module 26 of the node provides search functionality to the nodes 10.
  • each file on a server may be tagged by the node 10, the server, or a search module 26, based on a class of the file.
  • Metadata searches may be performed by the search module on the server using metadata in the database 17 of the server.
  • the metadata may comprise a class.
  • Classes may be user-created or may be automatically created by the search module 26. Classes may exist for particular work sites, particular types of projects, particular employee types and/or particular file types, for example. Files may also be tagged based, for example, on the creator or editor of the file, the date that the file was created, the program used to create the file, the content in the file, the number of times that the file has been accessed, and particular information in the file. For example, in a mining operation, a certain file may be tagged as belonging to the class containing drill-hole data. Each of the files in this class may have a unique set of properties and the node's search module 26 may be operable to search for files based on their class.
  • Metadata tagging in accordance with the foregoing may optimally be applied in connection with data transmission over unreliable networks.
  • the class and tagging information may also be provided to the node 10. This may ensure that class information, as well as other metadata tags associated with the file, are available to a node's search module 26.
  • the node 10 may save the class and tagging data.
  • the server may also provide the tagging data of other files that have been tagged as being similar to the synchronized files.
  • the server may also provide a larger subset of the tagging metadata available on the data store 16 or may provide all metadata associated with the tagging to the node 10.
  • a user at a node 10 not acting as a server may apply metadata tags to search the entire body of files on the data store 16 or a subset of the files on the data store 16. For example, if the user is working at a node 10 that has downloaded the metadata tags for all the files on the data store 16 from a server, the user may search for all files of a specific class or all files tagged with particular information. For example, a user may wish to search information from all drill holes bored using tool steel bits in a particular area. A corresponding search may bring up all files in the class of drill holes bored using tool steel bits in the particular area.
  • the user may then select to have particularly relevant files incorporated into the synchronization list to enable the user to view the file and maintain the file in its most recent version. For example, if a file having an association with other files is added to the synchronization list, all associated files may be similarly added to the synchronization list.
  • files created by a node 10 can be added to the synchronization list of files that must be synchronized with the server of those files. Since no copy of the file may exist on the data store 16 of the server, the server's synchronization module 22 may upload the file to the data store 16 during the synchronization process. If modifications are made to the file either at a node 10 or at some other node, the differencing module 20 on the server may incorporate updates into the data store 16.
  • Metadata may also comprise folder information that may be relevant to the contents of a particular file. For example, if an existing folder structure is uploaded into the data store 16, the server may create metadata from the folder names or other information associated with the folders being uploaded.
  • the file management system enables off-line synchronization.
  • a node 10 may determine the synchronization status of each file on the node 10 or particular files on the node 10.
  • the system may implement a file monitor 30.
  • the file monitor is operable to determine the difference between modified files and the most recent version of the file downloaded to the node 10 from the server. Since, as explained above, the node 10 stores the most recent copy of the file downloaded from the server as a reference file, the node's modification detection module 28 may compare the modified version of the file to the reference file.
  • the node's modification detection module 28 may provide a warning to the user that the file should be synchronized when access to a network 14 becomes available.
  • the file management system may further prioritize the synchronization of files that are most different from the version that had previously been synchronized with the server. For example, if the node's synchronization module 22 is set to synchronize two files, the file that has been most heavily modified compared with the file last accessed from the server will be synchronized first.
  • the user may also provide a manual priority ranking of which of the files on the synchronization list of node 10 should be synchronized first.
  • the priority ranking of the synchronization list may also be determined based on metadata tags or classes applied to the file. Synchronizing higher priority files first may ensure that the highest-priority files are synchronized prior to an interruption in network access.
  • Files may be modified without the node's knowledge; for example, if the file comprises information gleaned during a drilling process monitored by a sensor, the file may be updated in the background on the node 10.
  • the node's modification detection module 28 may monitor for differences between the locally stored version of the most recent file during the last synchronization and the most recently updated file on the node 10.
  • the node's modification detection module 28 may be registered with the node's operating system in order to capture file changes from a plurality of programs and processes, similarly to a virus scanning program. All files that should be synchronized with the server may then be synchronized once the network 14 becomes available.
  • the file management system may be better suited for use with unreliable networks than past systems. This allows minimal data transfer and ensures that the files that should be synchronized are synchronized as soon as possible. Furthermore, by enabling a user to search for files on a node 10 when the node 10 is not acting as a server for those files, searches may be conducted in the absence of a network connection.
  • FIG. 7 an example of operation of the file manager is shown in one example implementation, in which various types of data relating to the preparation, construction and operation of a mine are shown. It shall be understood that this is but one example, and that numerous other example implementations, and processes related to this implementation, may be provided.
  • three types of data may be stored in files on a node acting as a server for those files.
  • the first type of data may be updated frequently.
  • blast-hole data 76, the ore control block model 82, and the short-term plan 90 may be updated on a daily basis.
  • the second type of data may be updated less frequently, for example, the mine design 84 may be updated on a weekly or monthly basis.
  • the third type of data may be updated infrequently, for example, the assay data 70, the drill hole data 72, the solids data 74, the block model 78, and the long-term plan 86 may be updated on a yearly basis.
  • Older versions of a file may not be replaced; however, they may be accessed, edited and saved as a new file or a new version. This ensures that the historical order of the files can always be retrieved from the data store 16.
  • When a user wishes to edit a document, the user must check out the document to make edits in the master copy. Other users may access the same document; however, changes made by these other users may not be saved as a newer version of the document. These changes may be saved as a side branch of the document, as is shown in FIG. 5. Only the user who has checked out the document may save the master version of the document.
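
The check-out behaviour described above may be illustrated with a minimal Python sketch. The class `VersionedFile`, its fields and its method names are hypothetical and are not part of the described system; the sketch assumes that a node is identified by a string and that a difference file is an opaque byte string.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedFile:
    """Hypothetical record for one file held on the server's data store."""
    name: str
    checked_out_by: Optional[str] = None           # node holding the check-out, if any
    master: list = field(default_factory=list)     # difference files on the master line
    branches: list = field(default_factory=list)   # difference files saved as side branches

    def check_out(self, node_id: str) -> bool:
        """Grant the check-out only if no other node currently holds it."""
        if self.checked_out_by not in (None, node_id):
            return False
        self.checked_out_by = node_id
        return True

    def save(self, node_id: str, difference_file: bytes) -> str:
        """Store a difference file; nothing already stored is replaced or deleted."""
        if self.checked_out_by == node_id:
            self.master.append(difference_file)
            return "master"
        self.branches.append(difference_file)
        return "branch"

    def check_in(self, node_id: str) -> None:
        """Release the check-out so that another node may edit the master copy."""
        if self.checked_out_by == node_id:
            self.checked_out_by = None
```

In this sketch, a node that has not checked out the file can still save its changes, but they are recorded as a side branch rather than as a new master version.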

Abstract

A system and method for synchronizing data files over a network is provided. The method comprises a first node establishing a connection with a second node. Stored on the first node are one or more data files, each of which is associated with a version identifier, and a synchronization list of data files that are to be synchronized. One or more corresponding data files, each being associated with a version number, are also stored on the second node. The first node determines, based on the version identifiers, whether a more recent version of each of the one or more data files on the synchronization list exists on the second node. Upon determining that the second node comprises a more recent version, the first node obtains, from the second node, a difference file to update the data file on the first node.

Description

SYSTEM AND METHOD FOR DATA SYNCHRONIZATION OVER A NETWORK
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority from U.S. Provisional Patent Application No. 61/555,999 filed November 4, 2011, the contents of which are hereby incorporated by reference.
TECHNICAL FIELD
[0002] The following relates generally to data communication over a network.
BACKGROUND
[0003] In geographically remote locations, access to a reliable network may not be possible. For example, in developing countries or mining sites, the infrastructure required to provide a stable, continuous and high-bandwidth network connection may not be available. Access to a reliable network via portable infrastructure such as a satellite receiver, for example, can be hindered by weather or other environmental conditions. Network access may be intermittently available and may potentially be unavailable for unpredictable periods of time. Limitations and fluctuations in the bandwidth of the network may also affect the performance of a network while a connection is established.
[0004] In certain applications, such as mining, research, or prospecting, for example, it may be necessary to share information between devices located at a site and devices located remotely from that site, such as an office or server site. Exchanging or synchronizing information between these devices may be difficult if the network connection has low bandwidth or is only intermittently available.
[0005] These issues can be exacerbated where transfer of relatively large files is required.
[0006] Certain disadvantages are apparent when using existing synchronization protocols. Some such protocols require a first device to communicate with a second device multiple times during synchronization. If the network is unreliable, the synchronization process might stall or fail. Additionally, some such protocols require time-consuming file processing steps to optimize data exchange to accomplish synchronization. Where the network is unreliable, the aggregate time required for communication and computation may be unreasonably long.
[0007] Many existing synchronization protocols also do not support compression. The file size of a compressed file is typically smaller than the file size of an uncompressed file. Therefore, some existing protocols depend even more heavily on maintaining a network connection between the devices. These may not be suitable for unreliable networks.
[0008] It is an object of the present invention to mitigate or obviate at least one of the above disadvantages.
SUMMARY
[0009] Provided herein is a method of synchronizing data files over a network. The method comprises a first node establishing a connection with a second node, the first node having stored thereon one or more data files, each being associated with a version identifier, and a first synchronization list of data files to be synchronized. The second node has stored thereon one or more corresponding data files, each being associated with a version identifier. The first node determines, based on the version identifiers, whether a more recent version of each of the one or more data files on the first synchronization list exists on the second node. Upon determining that the second node comprises a more recent version, the first node obtains, from the second node, a difference file to update the data file on the first node.
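
By way of illustration only, the version comparison described above may be sketched in Python as follows. The sketch assumes that version identifiers are integers that increase with each new version, and that `fetch_difference_file` is a hypothetical callback standing in for the request made to the second node; neither assumption is taken from the present description.

```python
def files_to_update(sync_list, local_versions, remote_versions):
    """Return the names on the synchronization list whose remote version is newer."""
    stale = []
    for name in sync_list:
        local = local_versions.get(name, 0)
        remote = remote_versions.get(name, 0)
        if remote > local:
            stale.append(name)
    return stale


def synchronize(sync_list, local_versions, remote_versions, fetch_difference_file):
    """Obtain one difference file per stale file and record the new local version."""
    received = {}
    for name in files_to_update(sync_list, local_versions, remote_versions):
        # The callback is given the local version so the second node can
        # select a difference file that starts from that version.
        received[name] = fetch_difference_file(name, local_versions.get(name, 0))
        local_versions[name] = remote_versions[name]
    return received
```

Files that are not on the synchronization list are ignored even when a newer version exists on the second node.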
[0010] In an embodiment, the difference file represents the difference between the version of the data file on one node and the version of the data file on the other node. In another embodiment, at least two of a sequence of difference files are used to update the file on the first node; one of the at least two difference files representing the difference between the version of the data file on the second node and an intermediate version.
[0011] The synchronization list may comprise data files selected by a user. Data files associated with those selected by the user may also be included in the synchronization list.
[0012] The second node may further comprise a second synchronization list of data files to be synchronized. Upon establishing a connection with the first node, the second node may determine whether the first node comprises a more recent version of each of the files on the second synchronization list. Upon determining that the first node contains a more recent version, the second node obtains, from the first node, a difference file to update the data file on the second node.
[0013] A priority ranking may be associated with each of the files on the synchronization list, wherein the data files are synchronized according to the priority ranking. The priority ranking may be generated based on metadata associated with each of the data files. The priority ranking may be generated based on the magnitude of difference between the version identifier of the data file on the first node and the more recent version of the data file on the second node.
[0014] In an embodiment, a reference file for each of the one or more data files is also stored on the first node. A modification detection module on the first node determines the degree to which each of the data files on the first synchronization list differs with respect to the respective reference file. The priority ranking is generated based on the magnitude of difference between the data file and the reference file.
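
A priority ranking based on the magnitude of difference may, for example, be sketched as follows. The measure used here, one minus the similarity ratio computed by Python's standard `difflib.SequenceMatcher`, is an assumption chosen for illustration; the description does not prescribe a particular measure. The function names are hypothetical, and file contents are assumed to be text strings.

```python
from difflib import SequenceMatcher


def change_magnitude(reference: str, current: str) -> float:
    """Return 0.0 for an unmodified file, approaching 1.0 as less content is shared."""
    return 1.0 - SequenceMatcher(None, reference, current).ratio()


def rank_by_change(files: dict) -> list:
    """Order file names so that the most heavily modified file comes first.

    `files` maps a file name to a (reference, current) pair of contents.
    """
    return sorted(files, key=lambda name: change_magnitude(*files[name]), reverse=True)
```

Synchronizing in the returned order means that the file that has diverged most from its reference file is transferred first if the connection is later interrupted.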
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] Embodiments will now be described by way of example only with reference to the appended drawings wherein:
[0016] FIG. 1 is a block diagram illustrating a system in accordance with the present invention;
[0017] FIG. 2 is a block diagram illustrating a node;
[0018] FIG. 3 is a flow diagram illustrating a process of a node updating a file on a server;
[0019] FIG. 4 is a flow diagram illustrating generating a difference file;
[0020] FIG. 5 is a flow diagram illustrating a version history of an example file;
[0021] FIG. 6 is a flow diagram illustrating an example process of a node obtaining, from a server, a more recent version of a file stored on the node; and
[0022] FIG. 7 is an example diagram illustrating various types of data and links therebetween relevant to a mining operation.
DETAILED DESCRIPTION OF THE DRAWINGS
[0023] Provided herein is a file management system. The file management system enables transfer and synchronization of files over a network to enable data communication between two or more devices. The network may be one that is unreliable or reliable and may exhibit characteristics comprising low bandwidth and/or low quality of service (QoS). Devices communicating with one another over the network may, for a particular implementation, benefit from a relatively higher rate of transfer and/or level of reliability (e.g. QoS) than is otherwise possible given the network characteristics.
[0024] Turning to Fig. 1, a plurality of nodes 10 are shown. A node 10 may be a computer device, such as a desktop computer, a laptop computer, a mobile device such as a smartphone, a network-enabled piece of industrial equipment (e.g. an automated drill), a network-enabled piece of sensing equipment (e.g. an aerial gravimeter), a rack-mount server, a cloud-based server, or any other network-enabled computing device. The node 10 may comprise, or be linked to, a processor 9 and a memory 8. The memory 8 may have stored thereon computer instructions which, when executed by the processor 9, provide the functionality of the file management system as described herein.
[0025] The node 10 may further comprise, or be linked to, a transceiver 15 for communicating with a network 14, for example, an intranet or the Internet. Further nodes 10, being substantially similar to the aforementioned node 10, can also be linked to the network. Each node 10 may be operable to communicate with one or more other nodes 10 via the network.
[0026] The node 10 may be user-controllable or automatically controlled by computer executable instructions.
[0027] Each node 10 may further comprise, or be linked to, a data store 16. The data store 16 is operable to have stored thereon at least one file. The at least one file may comprise at least one reference file, and at least one difference file associated with each reference file.
[0028] A reference file is a file that can be considered a complete file, in that if a node 10 provides the reference file to another node 10, the other node 10 can receive and read the reference file for the purposes of operating upon it (i.e., opening file, modifying the file, etc.).
[0029] A difference file is a file providing information sufficient to generate a file that can be operated upon when used in conjunction with a reference file. For example, a difference file may map the differences between a first file and a second file. For example, a node 10 having received a difference file and a reference file can recover, from the difference file, information relating to modifications that have been made relative to the reference file. The node 10, having the reference file and the difference file, can generate a modified file corresponding to the modifications made relative to the reference file and can operate upon the modified file.
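
The relationship between a reference file, a difference file and a modified file may be sketched in Python as follows. The encoding shown, a list of "copy" and "data" operations derived from the standard `difflib.SequenceMatcher`, is only one possible representation and is an assumption made for illustration; the function names are hypothetical.

```python
from difflib import SequenceMatcher


def make_difference_file(reference: bytes, modified: bytes) -> list:
    """Describe `modified` as ranges copied from `reference` plus new data."""
    ops = []
    matcher = SequenceMatcher(None, reference, modified, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))            # bytes shared with the reference
        elif j2 > j1:                               # 'replace' or 'insert'
            ops.append(("data", modified[j1:j2]))   # bytes found only in the modified file
        # a 'delete' opcode contributes nothing to the modified file
    return ops


def apply_difference_file(reference: bytes, ops: list) -> bytes:
    """Rebuild the modified file from the reference file and a difference file."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += reference[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

Only the "data" operations carry file content, so a small modification yields a difference file that holds far less content than the reference file.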
[0030] A differencing file, which embodies the differences between two or more generations of file revisions, may also be used to generate a modified file based on a reference file. A differencing file is a subset of a difference file which can be generated by combining two or more difference files in a sequence.
[0031] As will be appreciated, if the amount of modification made to a file is relatively small, a difference file may be substantially smaller than the reference file. Transferring only a difference file, rather than the entire reference file, between nodes 10 over the network 14 may substantially reduce the amount of data that must be transferred over the network 14, which may enable faster and more efficient synchronization of files.
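
Combining a sequence of difference files into a single differencing file may be sketched as follows. For brevity, this sketch assumes a deliberately simple delta format, namely the length of the shared prefix, the length of the shared suffix and the replacement bytes in between; this format and the function names are hypothetical and are not taken from the description.

```python
def make_delta(old: bytes, new: bytes) -> tuple:
    """Encode `new` relative to `old` as (prefix length, suffix length, middle bytes)."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[len(old) - 1 - s] == new[len(new) - 1 - s]:
        s += 1
    return (p, s, new[p:len(new) - s])


def apply_delta(old: bytes, delta: tuple) -> bytes:
    """Rebuild the newer file from the older file and a delta."""
    p, s, middle = delta
    return old[:p] + middle + old[len(old) - s:]


def combine_deltas(reference: bytes, deltas: list) -> tuple:
    """Collapse a sequence of deltas into one delta from the reference to the last version."""
    current = reference
    for delta in deltas:
        current = apply_delta(current, delta)
    return make_delta(reference, current)
```

A node holding version one can apply the combined delta once to reach version three, although the intermediate version two cannot be recovered from the combined delta alone.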
[0032] In one aspect, one of the nodes 10 may be referred to as a "server" for the purposes of storing one or more particular reference files and associated difference files. In an arrangement with a first node connected to a second node over a network, the first node may act as a server for a first file or first group of files while the second node may act as a server for a second file or a second group of files. As such, the term "server" or "server node" identifies which node is acting as the server in a particular exchange. It will be understood that the server may act as a node other than the server in a different exchange.
[0033] In a distributed embodiment of the above, a plurality of nodes 10 may each be referred to as a server, wherein each node 10 may be a server of particular reference files and associated difference files. For example, a first reference file may be provided on a first node 10 which is the server of the first reference file, while a second reference file may be provided on a second node 10 which is the server of the second reference file, and so on for any number of reference files. A third node may act as a server for a third file but act as a node other than the server for the first and second files. As will be appreciated, any particular node 10 could also be the server of a plurality of such reference files.
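
The per-file nature of the server role may be illustrated with the following sketch. The mapping `server_for`, which associates each file name with the identifier of the node acting as its server, is a hypothetical structure introduced only for this example.

```python
from collections import defaultdict


def role_of(node_id: str, file_name: str, server_for: dict) -> str:
    """Return 'server' if the node serves the given file, otherwise 'node'."""
    return "server" if server_for.get(file_name) == node_id else "node"


def requests_by_server(node_id: str, sync_list: list, server_for: dict) -> dict:
    """Group the files a node must request by the node that serves each file."""
    grouped = defaultdict(list)
    for name in sync_list:
        server = server_for[name]
        if server != node_id:   # a node does not request files that it serves itself
            grouped[server].append(name)
    return dict(grouped)
```

With three files served by three different nodes, the third node is the server for the third file and requests the first and second files from their respective servers.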
[0034] The server may track and store data associated with files, for example, the version number of the file. Newer versions of a file may comprise incremental changes from an older version of the file. Version information controlled by the server may be accessed by the nodes 10. By centralizing storage of the version number for a particular file, the server is operable to determine whether each node 10 is operating upon the most recent version of that file.
[0035] In certain embodiments, at least one of the nodes 10 further comprises, or is linked to, a database 17. The database 17 may be used to store information relating to files stored on the data store 16. For example, a server may store on its database 17 a version number for each reference file (which may correspond to the number of associated difference files for that reference file) stored on its data store 16, as well as one or more annotations for each such reference file and/or difference file. These annotations may be implemented by metatags associated with such file. Nodes other than the server may also comprise or be linked to a database for storing information relating to files stored on their respective data stores 16, for example version numbers of such files.
[0036] Referring now to Fig. 2, each node 10 includes a differencing module 20 and a synchronization module 22. Node 10 may further comprise a modification detection module 28, a compression module 24, a search module 26 and a file monitor 30.
[0037] The differencing module 20 is operable to generate a difference file from a reference file and modifications made relative to the reference file, which may be embodied in a permanently or temporarily stored modified file. The modified file need not be stored on the data store at any time, though it may be stored in memory 8 while the node 10 is operating upon it. However, it will be appreciated that the modified file may also be stored on the data store in some embodiments. When provided with a difference file (or a differencing file, which is described below), the differencing module 20 is also operable to apply the difference file to the corresponding reference file (or a differencing file) to generate a modified file to be operated upon.
[0038] The synchronization module 22 may be operable to facilitate the synchronization of files between a node 10 and a server. The synchronization module 22 may cooperate with the node's transceiver 15 to enable a node 10 to communicate (receive and transmit) reference files, difference files and differencing files with another node 10.
[0039] The node's modification detection module 28 is operable to detect whether modifications have been made to a file on node 10 regardless of whether the node 10 is connected to a server over network 14.
[0040] The node 10 may comprise a compression module 24 to compress and decompress reference files and difference files to reduce the size of such files. Compression of files may be provided to reduce the amount of data that must be sent over the network 14, thereby reducing the time required to send a file.
[0041] The node's search module 26 may be operable to perform searches for files regardless of the connectivity state of the node 10 to the network 14. For example, the node's search module 26 may enable searching for files on other nodes despite an unreliable connection using metadata, as will be further described herein. Typically, performing a search from a node 10 for information on a server requires a connection between the server and the node 10. Over an unreliable network, the connection may not always be available, which may cause long search times and timed-out searches.
[0042] In operation, in one example, a node 10 (or a user of the node 10) may request to access a particular file. The node's synchronization module 22 determines whether a version of the requested file is stored on the node's data store 16.
[0043] Referring to FIG. 3, an example process for a node obtaining a file from a server, updating the file, and providing the updated version of the file to the server is shown. In 110, access to a file at the node is requested. At 112, the synchronization module on the node determines that the file does not exist in the node's data store. The node further determines the server for the requested file and requests the file from that server in 114. Upon the server of the requested file receiving the request, the server accesses its database 17 to determine the version number of the requested file, and correspondingly accesses its data store 16 to retrieve the requested file. The synchronization module of the server, in 116, provides to the node the file that the node has requested, as well as the associated version information.
[0044] In 118, the node stores the file in the node's data store 16 as a reference file and stores the corresponding version information in the database 17 of the node. In 120, the node generates a modified file by making modifications to the reference file and stores, in the database 17, version information associated with the file. For example, 120 could comprise a user making an addition to the file and storing the modified file as a new version. In 122, differencing module 20 generates a difference file based on the modified file and the reference file stored in 118. In 124, the node provides the server with the difference file and associated version information. Upon obtaining the difference file, the server updates the reference file based on the difference file and saves the updated file as a new version in 126.
[0045] In an embodiment of 116, the server node's synchronization module 22 accesses its database 17 to determine the most recent version number of the requested file, and correspondingly accesses its data store 16 to retrieve the reference file and one or more associated difference files. The number of difference files may be derived from the version number. For example, a modified file generated from a reference file associated with four difference files may have a version number of five. The server then provides the reference file and the associated difference files to the node, where the differencing module 20 of the node generates a first modified file based on the reference file and the associated difference files and stores the first modified file in 118. Upon the node further modifying the file to produce a second modified file, the differencing module of the node would then generate an additional difference file which maps the differences between the first modified file and the second modified file. Similarly to the above description, the node would then provide the server with the difference file in 124. The server may then apply the difference file to a reference file or simply store the difference file to be shared with other nodes.
[0046] Alternatively, if a first node's data store 16 already has a version of the requested file, the first node's synchronization module 22 determines its corresponding version number, for example, by accessing the first node's database 17 which stores the version number for each file stored on the first node's data store 16. The first node's synchronization module 22 transmits the version number to the node functioning as a server. The server node's synchronization module 22 accesses its database 17 to determine the version number corresponding to the requested file on the server's data store 16. If the version numbers of the first node and server node are identical, the server node need not transmit any file to the first node 10. If the version numbers differ, the server node's synchronization module 22 directs its differencing module 20 to generate a differencing file corresponding to the set of difference files for the intervening version numbers. The server's synchronization module 22 may transmit the differencing file and version number to the node's synchronization module 22. The node's differencing module 20 then generates a modified file by applying the differencing file to its version of the file. The first node 10 stores the modified file in its data store 16 and stores the version number in its database 17. The first node 10 may overwrite its previous version of the requested file with the modified file.
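As a non-limiting sketch of the version exchange described above, the server-side decision might look as follows. It assumes that version one is the reference file, that each stored difference file advances the file by exactly one version, that a difference file maps block identifiers to replacement blocks, and that a node never reports a version newer than the server's. The class `ServerRecord` and its methods are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Optional, Tuple

DifferenceFile = Dict[int, bytes]  # assumed: block identifier -> new block contents


class ServerRecord:
    """Hypothetical per-file record kept by the node acting as server."""

    def __init__(self) -> None:
        # difference_files[i] is assumed to turn version i + 1 into version i + 2.
        self.difference_files: List[DifferenceFile] = []

    @property
    def version(self) -> int:
        # Four stored difference files correspond to version five.
        return len(self.difference_files) + 1

    def add_difference_file(self, difference_file: DifferenceFile) -> None:
        self.difference_files.append(dict(difference_file))

    def update_for(self, node_version: int) -> Optional[Tuple[DifferenceFile, int]]:
        """Return (differencing file, current version), or None if already current."""
        if node_version == self.version:
            return None  # identical version numbers: nothing needs to be sent
        differencing_file: DifferenceFile = {}
        # Combine only the difference files for the intervening versions.
        for difference_file in self.difference_files[node_version - 1:]:
            differencing_file.update(difference_file)
        return differencing_file, self.version
```

In this sketch, a node holding version two of a file that is at version four on the server would receive one differencing file covering the changes of versions three and four, together with the new version number.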
[0047] In either of the above examples, the server node may store in its database 17 an indicator that the node has accessed the file.
[0048] The node 10 may operate upon a file to create a further modified file. Once the node 10 has finished modifying the file, resulting in a further modified file, the differencing module 20 constructs a difference file based on the further modified file and the (received) modified file. The node 10 stores the further modified file on its data store 16 and an updated version number on its database 17. The node 10 may overwrite the (received) modified file with the further modified file on its data store 16. The node's synchronization module 22 transmits the difference file to the server node's synchronization module 22. The server node's differencing module 20 may save the difference file to its data store 16 and update the version number corresponding to that file on its database 17.
[0049] The file management system may further enable a first node 10 and a second node 10 to synchronize a file through an intermediary server. Once modifications are made on a first node 10, the updated file may be provided to the second node 10 via the intermediary server. This may be done, for example, by the first node 10 providing the difference file to a server in accordance with the foregoing, the server determining from its database other nodes that have accessed the file (in this case, the second node), the server updating the version number of the file in accordance with the foregoing, the server requesting that the second node 10 provide it with its version number for the file, and the server correspondingly generating a differencing file to enable the second node 10 to generate a modified file in accordance with the foregoing. As such, the difference file is provided to the second node in a push model, rather than as a response to a request by the second node.
[0050] Such a request from the node acting as a server to any one or more nodes may be provided as follows. The server's synchronization module 22 transmits a notification to other nodes 10 that have previously accessed the reference file associated with a received difference file (from a node 10). Any such other node's synchronization module 22 may correspondingly determine the version number of its corresponding copy of the file and send the version number to the server. The server's synchronization module 22 transmits to each node 10 a differencing file to update the node's file to the current version. In this way, any two or more of such nodes 10 may receive a distinct differencing file as they may not have been previously updated to the same version number, for example if any such nodes 10 were offline at the previous update. Such nodes' differencing module 20 may update such file on its data store, along with a corresponding version number on its database when it receives the differencing file.
[0051] The file management system also enables a node 10, or a user operating a node 10, to view the actual contents of the file, as well as the metatags associated with the state of the locally modified file and the state of the file on a server. With an unreliable network connection, the server may be out of contact for a period of time while the file is being modified at the node 10.
[0052] A reference file preferably corresponds to the file on the server at the time that the file was most recently synchronized. When a network connection between a first node and the server is unavailable, a difference file may be generated on the first node based on the reference file and a version of the file that was modified on the first node. For example, if the first node obtained a reference file from the server, the first node may modify the reference file to form a modified file. A difference file may then be generated on the first node based on the modified file and the reference file. The modification detection module 28 may be operable to determine whether the reference file differs from a more recent version of the file on the node 10. The modification detection module 28 may provide updated information regarding the state of the local file with respect to the state of a reference file in the temporary absence of a network connection. The modification detection module 28 may also provide an indication to a user that the reference file differs from a modified file, as is described below. Although the file on the server may have been modified since the last synchronization by a second node, the first node or a user at the first node may be able to determine, from the indication, which of the files have been modified at the node since the time of the last synchronization.
[0053] The node may be provided with a display, the display being operable to provide a displayed list of files on a node. The list of files may be provided with a visual indication that provides the user with an indication that the version of the file on the node differs from the reference file, based on information received from the modification detection module 28. The indication that the file on the node differs from the reference file may comprise, for example, a pair of arrows that point in opposite directions when the files differ. The indication may also comprise further details, for example, the date and time that the file was last modified as well as the date and time that the last synchronization occurred, or a percentage indicating what proportion of the blocks in the file are identical. Other indications that provide the user with a sensory experience based on differences in the files may also be possible.
[0054] The file management system may further provide compression of files. In certain implementations, depending on the size of a file and the bandwidth and reliability of the network 14, the amount of time required to transfer the file between nodes 10 over the network 14 may be inconveniently or prohibitively long. To reduce the amount of data that must be transferred over the network 14, a node's compression module 24 may compress the file that is being transmitted by applying a compression algorithm. The compression algorithm may comprise, for example, VCDIFF, another format for encoding compressed and/or differencing data, or any appropriate compression algorithms for the types of files being transmitted. By compressing the file, the size of the file may be reduced and, consequently, the time required to transfer the file over the network 14 may be reduced.
[0055] Referring now to FIG. 4, a method of generating a difference file is now provided. In step 36, the differencing module 20 segments a reference file into a plurality of blocks. The size of the blocks may be determined based on preconfigured segmenting parameters which the differencing module 20 may adaptively adjust. Examples of block sizes may be 4096 bytes, 8192 bytes, etc. The differencing module 20 may compute a hash of each block, and assign an identifier (e.g. a number) to each of the blocks in step 38. The differencing module 20 may segment the corresponding modified file in step 40, assign an identifier (e.g. a number) and compute the hash of each of its blocks in step 42. The hash of the blocks of the modified file may then be compared with the hash of the blocks of the reference file in step 44. The differencing module 20 determines which blocks have been modified between the modified file and the reference file in step 46. The differencing module 20 may generate a difference file comprising the modified blocks and modified block identifiers in step 48. It will be appreciated that the differencing module 20 may alternatively compare the blocks without hashing. It will also be appreciated that certain of the foregoing steps can be performed in different sequences without affecting functionality.
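The steps of FIG. 4 might be sketched, purely as an example, as follows. The sketch assumes fixed-size blocks numbered from zero, SHA-256 as the hash, and a difference file held as a dictionary containing the modified blocks, their identifiers and the block count of the modified file; none of these choices is specified in the description, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed segmenting parameter


def segment(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split a file into consecutive blocks; the list index is the block identifier."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(blocks: List[bytes]) -> List[str]:
    return [hashlib.sha256(block).hexdigest() for block in blocks]


def generate_difference_file(reference: bytes, modified: bytes,
                             block_size: int = BLOCK_SIZE) -> Dict:
    """Collect the blocks of the modified file whose hash differs from the reference."""
    ref_hashes = block_hashes(segment(reference, block_size))
    mod_blocks = segment(modified, block_size)
    mod_hashes = block_hashes(mod_blocks)
    changed = {
        block_id: block
        for block_id, (block, digest) in enumerate(zip(mod_blocks, mod_hashes))
        # Blocks beyond the end of the reference file are always new.
        if block_id >= len(ref_hashes) or ref_hashes[block_id] != digest
    }
    return {"block_count": len(mod_blocks), "blocks": changed}


def apply_difference_file(reference: bytes, difference_file: Dict,
                          block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the modified file from the reference file and a difference file."""
    ref_blocks = segment(reference, block_size)
    changed = difference_file["blocks"]
    rebuilt = [
        changed[block_id] if block_id in changed else ref_blocks[block_id]
        for block_id in range(difference_file["block_count"])
    ]
    return b"".join(rebuilt)
```

With fixed-size blocks, an insertion near the start of a file shifts every later block, so most blocks would be reported as modified; this is one reason a content-based segmentation may be preferred.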
[0056] The differencing module 20 may segment blocks by setting markers based on the contents of the file, enabling a modified file to be compared to a reference file even if it comprises significant rearrangement. For example, each block may be hashed. The hashes of one file may then be compared with the hashes of another file to determine which blocks are identical and which blocks must be transmitted in a difference file over the network 14, to synchronize the files.
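One possible way of setting markers based on file contents is sketched below. The description does not specify how markers are chosen; this sketch assumes that a block ends wherever the sum of the last `window` bytes is divisible by `divisor`, which is a deliberately simple stand-in for the rolling hash a practical implementation would likely use. The function name and both parameters are hypothetical.

```python
from typing import List


def content_defined_blocks(data: bytes, window: int = 16, divisor: int = 64) -> List[bytes]:
    """Segment data at markers derived from the bytes themselves, not from offsets."""
    blocks: List[bytes] = []
    start = 0
    for i in range(len(data)):
        end = i + 1
        # A marker is set when the current block holds at least `window` bytes
        # and the trailing window of bytes satisfies the boundary condition.
        if end - start >= window and sum(data[end - window:end]) % divisor == 0:
            blocks.append(data[start:end])
            start = end
    if start < len(data):
        blocks.append(data[start:])  # remaining bytes form the final block
    return blocks
```

Because each marker depends only on nearby bytes, inserting data near the start of a file tends to change only the blocks around the insertion, so the hashes of later blocks may still match those of the reference file.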
[0057] Preferably, the hashes are compared locally by a node's differencing module 20 rather than at a server; however, the hashes may be transmitted to the server for comparison by its differencing module 20. Since only those blocks that have been changed may need to be transferred over the network 14, the generated difference file may comprise the data located in these blocks.
[0058] To further overcome the drawbacks of existing difference file-based synchronization methods, and to increase the speed and reliability of the file management system operating over the unreliable network 14, the differencing module 20 may generate and cache the difference file locally before the synchronization process is initiated. For example, if the differencing module 20 generates the difference file on a node 10, the difference file may be cached on the node's memory 8 prior to the synchronization process being initiated.
[0059] In other words, instead of generating the difference files at the time that a network connection is re-established with the server, the difference files may be generated by the node prior to re-establishing the network connection. For example, the node may generate the difference files in the background while performing other tasks. Once the network connection is re-established, the difference files have been generated in full or in part, expediting transmission over the network. By distributing the computational task of generating a difference file over a longer period of time, including while a network connection is unavailable, the computational steps required once the network connection is re-established may be significantly fewer than if the difference files had been generated at the time that the network connection is re-established. Further, by having the difference files ready to synchronize in advance of the network connection being re-established, the synchronization process may be completed more quickly, with a lower risk of again losing the connection during synchronization, and with less interruption to other activities that require use of the node's processor and must take place while the network connection is established.
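The caching of difference files ahead of synchronization might be sketched as follows, by way of example only. The sketch assumes that difference files are opaque byte strings and that the caller supplies a `send` function which raises an exception if the connection is lost; the class `PendingDifferences` and its methods are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class PendingDifferences:
    """Hypothetical queue of difference files generated while offline."""

    def __init__(self) -> None:
        self._queue: Deque[Tuple[str, bytes]] = deque()

    def cache(self, file_id: str, difference_file: bytes) -> None:
        """Store a difference file generated before the connection is available."""
        self._queue.append((file_id, difference_file))

    def flush(self, send: Callable[[str, bytes], None]) -> int:
        """Send cached difference files in order; return how many were sent."""
        sent = 0
        while self._queue:
            file_id, difference_file = self._queue[0]
            send(file_id, difference_file)  # may raise if the connection drops
            self._queue.popleft()           # discard only after a successful send
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._queue)
```

Since an entry is removed only after `send` returns, a difference file whose transmission fails remains cached for the next attempt.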
[0060] The differencing module 20 may also generate a difference file as the modifications to the file are taking place.
[0061] If the network 14 is unreliable, for example the network connection between the node 10 and the server is lost during synchronization, the synchronization module 22 may apply a break and resume transfer algorithm to continue synchronization when the network connection is re-established. The break and resume transfer algorithm may be any algorithm enabling a file to be transferred where it has previously not been transferred or only partially transferred.
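A break and resume transfer may take many forms; one minimal sketch, under stated assumptions, is given below. It assumes that the receiving side can report how many bytes it already holds and that the hypothetical `link` callable raises `ConnectionError` when the connection is lost before a chunk is delivered. The function name `resume_transfer` is likewise hypothetical.

```python
from typing import Callable


def resume_transfer(payload: bytes, received: bytearray,
                    link: Callable[[bytes], None], chunk_size: int = 1024) -> int:
    """Send the part of payload not yet received; return the total bytes delivered."""
    offset = len(received)  # resume where the previous attempt stopped
    while offset < len(payload):
        chunk = payload[offset:offset + chunk_size]
        link(chunk)             # raises ConnectionError if the link drops
        received.extend(chunk)  # recorded only after the chunk is delivered
        offset += len(chunk)
    return offset
```

Calling the function again after a failure continues from the last delivered chunk, so bytes that were already transferred are not sent a second time.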
[0062] Additionally, some files may be associated with other files. For example, an executable program may require an input from a spreadsheet. If the executable file is synchronized but the spreadsheet is not, the executable file may not have access to the required input value. As such, it may be advantageous to define groups of files or associations between files to promote the synchronization of these files together. For example, a first file may comprise an executable that retrieves data from a comma separated value (.csv) file to perform a pre-determined calculation. If the node's synchronization module 22 is set to synchronize the executable file, all related data, including the .csv file, may also be synchronized. As the executable file may not be of use without the most recent .csv file, associating the .csv file with the executable file for the purposes of the synchronization process prevents erroneous, incomplete, or non-functional groups of files from synchronizing. It will be appreciated that other examples of associations between files may exist. If one of the group of associated files cannot be synchronized, a warning message may appear to alert a user at a node 10 that a related file is missing. Alternatively, the file management system may block the file from being shown, or even delete the file, as this file may contain or initiate an error at another node 10. The newly synchronized file may otherwise, or in addition, remain hidden until all associated files have been synchronized.
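The handling of associated files might be sketched as follows. The sketch assumes that associations are recorded as a mapping from a file name to the set of files it requires; the file names and the functions `visible_files` and `missing_associations` are hypothetical examples.

```python
from typing import Dict, List, Set


def visible_files(associations: Dict[str, Set[str]], synchronized: Set[str]) -> Set[str]:
    """Return the synchronized files whose associated files are all synchronized too."""
    return {
        name for name in synchronized
        if associations.get(name, set()) <= synchronized
    }


def missing_associations(associations: Dict[str, Set[str]],
                         synchronized: Set[str]) -> Dict[str, List[str]]:
    """Map each synchronized file to the associated files that are still missing."""
    warnings: Dict[str, List[str]] = {}
    for name in synchronized:
        missing = associations.get(name, set()) - synchronized
        if missing:
            warnings[name] = sorted(missing)
    return warnings
```

A node could use the first function to keep a newly synchronized file hidden and the second to build the warning message for the user.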
[0063] Preferably, only one node 10 may modify a particular file at any given time, to reduce the likelihood that two nodes 10 will simultaneously operate upon a particular version of the file and attempt to synchronize different modified files. Thus, a node 10 may be restricted to updating only the most recent version of the file on the server. If a node 10 accesses a particular file, the server's synchronization module 22 may indicate in its database that the file is "checked out" by the node 10. The node 10 that has checked out the file may be given the authority to designate a file as a master copy when the synchronization module 22 synchronizes the file with the server. The master copy designation may be saved on the database 17. The node 10 may then check in the file to allow other nodes 10 to designate the file as a master copy. The node 10 may save the file as a new version and store the version information in the database 17.
[0064] The version information, which may comprise information indicating whether the file is a master copy, may be stored on the database 17, whereas the reference file itself, and any associated difference files (or differencing files), may be stored on the data store 16. If another node 10 attempts to access the file, the other node 10 may be provided the file; however, because the first node 10 had already checked out the file, any modifications made by the other node 10 will not be applied as differencing files for that file. The other node 10 may, however, save its modifications to a new file.
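The check-out and check-in bookkeeping might be sketched, as an example only, as follows. The sketch assumes that the server's database records at most one holder per file; the class `CheckoutRegistry` and its method names are hypothetical.

```python
from typing import Dict


class CheckoutRegistry:
    """Hypothetical record of which node has checked out each file."""

    def __init__(self) -> None:
        self._holder: Dict[str, str] = {}

    def check_out(self, file_id: str, node_id: str) -> bool:
        """Check out the file if it is free; return True if node_id holds it."""
        return self._holder.setdefault(file_id, node_id) == node_id

    def check_in(self, file_id: str, node_id: str) -> None:
        """Release the file, but only if node_id is the current holder."""
        if self._holder.get(file_id) == node_id:
            del self._holder[file_id]

    def may_designate_master(self, file_id: str, node_id: str) -> bool:
        """Only the node that checked out the file may designate a master copy."""
        return self._holder.get(file_id) == node_id
```

In this sketch a second node is still free to obtain the file, but its modifications would be stored separately rather than as a master copy.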
[0065] In one embodiment, when the synchronization module 22 synchronizes a file on a node 10 with the corresponding file in the data store 16, the file may remain on the server. Similarly, when the synchronization module 22 updates a file on a node 10 with a newer version that is available on the data store 16 and delivered through the server, the older version of the file may not be deleted. The older version of the file may be retained and the update may be stored by way of storing one or more difference files that are associated with the file.
[0066] In an example, if a reference file on node 10 is version one of a file which has since been modified in two successive iterations to yield modified versions two and three, the synchronization module 22 on node 10 provides the server node with both revisions in the form of two difference files or a single differencing file. A first difference file may provide the differencing module on the server with the information necessary to construct the modified file corresponding to version two. A second difference file may provide the differencing module on the server with the information necessary to construct version three of the file based on version two. In this embodiment, both the first version of the file and the difference files that enable the differencing module 20 to construct the second and third versions of the file are required to enable access to all three versions of the file.
Alternatively, the node may apply a single differencing file, corresponding to the difference between version one and version three, to obtain version three of the file. In the alternative embodiment, access to version two of the file may not be available.
[0067] Turning to FIG. 5, the synchronization module 22 on a node 10 other than the server may cause the server to update its data store 16. For example, if modifications to a reference file 50 were made on a node 10 that had checked out the file, the synchronization module 22 of that node may provide the server with the difference file that had been calculated by that node's differencing module 20. The server may store this difference file on its data store 16. The difference file may then be used by the differencing module of the server to construct the second version of the file 52. Alternatively, the server may store the reference file on the data store 16 as well as each of the difference file updates. It may be noted that in this example, the reference file, as well as the difference files provided to the data store 16, are saved on the data store 16 of the server.
[0068] For example, the reference file 50 may be uploaded to the data store 16 on the node acting as the server. The synchronization module 22 of a node 10 not acting as the server then checks out the reference file 50. While the file is checked out, a second node 10, also not acting as a server, may access the reference file 50 on the data store 16 of the server by downloading the reference file. Once the second node 10 has finished modifying the file, the second node's synchronization module 22 may provide the modifications to the data store 16 on the server. To provide the modifications to the data store 16 on the server, the second node's differencing module 20 can calculate a difference file 54 locally on the second node based on the reference file 50 and the modified file 52. The second node's synchronization module 22 may then provide the difference file 54 to the server to be stored on the data store. Since the reference file 50 was checked out by the first node 10, the file produced by the second node 10 may not be designated as a master copy. Hence, the difference file may be saved separately as B1. The information relating to the file's version may be stored on the database of the server.
[0069] If the second node 10 later performs further modifications to the file, a second difference file 56 may be saved on the data store 16 of the server. Once the first node 10 has finished modifying the file, the first node's differencing module 20 may compute the difference file, which may then be provided to the data store 16 of the server by the synchronization module 22. As the file was checked out by the first node 10, the difference file uploaded to the data store 16 of the server corresponds to the second master version. Similarly, if the first node 10 then made a further modification to the file, the synchronization module 22 may provide a further difference file 58 to the data store 16 of the server as a master version of the file and save the corresponding version information on the database 17 of the server. The first node 10 may then check in the file once the first node 10 has completed any modifications.
[0070] If, for example, a third node 10 then checks out and modifies the file, the synchronization module 22 may provide the resulting difference file, as outlined in the process explained above, to the data store 16 of the server. This difference file 62 may be stored on the data store 16 of the server with the master copy designation. The information relating to the master copy designation may be saved in the database 17 of the server. However, if, while the third node 10 had checked out the file, the first node 10 made further modifications to the file corresponding to difference file 58, the modifications may be uploaded to the data store 16 of the server in the form of a difference file 60 by the synchronization module 22 but may not be saved as a master copy. If the first node 10 then modified the file further, the synchronization module 22 may provide a copy of the most recent difference file 64 computed by the differencing module 20 to the data store 16 of the server. Hence, in this example, at each new update of the reference file, a new difference file is provided to the data store 16 and no files are deleted.
[0071] A node 10 may log operations performed on files stored on its data store, for example to determine the history of file updates, particularly if the node 10 is a server for such files. The operations performed on the data store 16 may be identified with, for example, a timestamp, identification of the node 10 (and/or its user), location of the node 10 (and/or its user), MAC address/computer ID, etc.
[0072] It may also be possible for a node 10 to request a version of the file from the server that is not the most recent version of the file stored on the server. The node 10 may also request a copy of the file that is more than one version behind the most recent version. To enable a node 10 to access older versions of the file as well as update the file from an older version to the most recent version, the server's differencing module may be operable to generate a checkpoint file. A checkpoint file is a complete file that can be accessed by a node 10 without requiring the server to apply a difference file to a reference file. A checkpoint file may be saved at predetermined intervals to reduce the number of computations that must be performed by the server's differencing module 20 if there are many versions of the file. The file management system may be operable to track the number of times a new version of a particular file has been saved. The file management system may also be operable to save a checkpoint file based on other parameters, for example, the number of version changes, the date, time elapsed since the last checkpoint was saved, the amount of content that has changed between version updates, etc.
[0073] The file management system may also be operable to save checkpoint files at intervals that are based upon how often nodes 10 update files and how often nodes 10 request older versions of the file. In the case that a node 10 not acting as a server is requesting a version of the file that is identical to the checkpoint file, the server can transmit the checkpoint file to the node 10. If the node 10 already has another version of the file, it may also be possible to transmit a difference file that directly maps the difference between the file currently at the node 10 and the file that the node 10 is requesting. If, however, the node 10 is requesting a version of the file that is not identical to the checkpoint file, the server's differencing module 20 may be operable to compute the requested version of the file by applying a difference file to the most appropriate checkpoint file.
[0074] For example, if node 10 requests from the server the tenth version of a file that is currently at its twentieth version, and checkpoint files are saved for every fifth version, the checkpoint file corresponding to the tenth version may be provided to the node 10 by the server. If the node 10 requests the eleventh version of the same file, the server may calculate the eleventh version by applying a difference file to the tenth version. The server may also, depending on the difference files that are stored, apply one or more difference files to the fifteenth version to obtain the tenth version. The server's synchronization module 22 may be operable to provide the requested version of the file to the node 10 through the network 14. Alternatively, if the node 10 currently has access to the twentieth version but would like to access the ninth version, the server may compute a differencing file mapping the differences between the twentieth version and the ninth version and transmit this differencing file to the node 10.
[0075] By transmitting the differencing file rather than the entire version nine of the file, the transmission may be completed more quickly. Once the node 10 receives the differencing file, the node's differencing module 20 may be operable to compute version nine of the file. By saving only a certain number of checkpoint files but saving enough difference files to enable the intermediate versions of the file to be calculated by the server's differencing module, the required amount of memory on the data store 16 of the server may also be reduced.
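The selection of a checkpoint file for a requested version might be sketched as follows. The sketch covers only the forward case, in which difference files are applied to an older checkpoint; it assumes that difference file k transforms version k - 1 into version k and that version one (the reference file) is always available as a checkpoint. The function name `plan_reconstruction` is hypothetical.

```python
from typing import Iterable, List, Tuple


def plan_reconstruction(requested: int, checkpoints: Iterable[int]) -> Tuple[int, List[int]]:
    """Return the checkpoint to start from and the difference files to apply to it."""
    candidates = [version for version in checkpoints if version <= requested]
    if not candidates:
        raise ValueError("no checkpoint at or before the requested version")
    base = max(candidates)  # nearest checkpoint that is not newer than the request
    # Difference file k is assumed to turn version k - 1 into version k.
    return base, list(range(base + 1, requested + 1))
```

With checkpoints at every fifth version, a request for version ten is served directly from a checkpoint, a request for version eleven needs one difference file applied to the tenth version, and a request for version nine needs four difference files applied to the fifth version.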
[0076] To expedite the process of providing files that are more than one version behind the most recent version, the differencing module may be operable to calculate relevant combinations of difference files in coordination with the checkpoint files. Considering an example where the server saves a checkpoint file for every ten versions of updates, and there are a total of fifty-five versions of a particular file, then forty-five difference files may be required to provide access to each of the versions in between the checkpoints. To conserve space in the data store, it may be assumed that most nodes 10 will have a version that is no more than ten versions old and will want to update the version of the file to the most recent version. This reduces the number of required difference files (or differencing files) to ten. In situations where the difference files are very large or space on the data store 16 may be limited, the difference files enabling a particular number of past versions to be updated to the most recent version can be stored in order to reply to any node's request for an update more rapidly.
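The idea of keeping only the difference files that bring recent versions up to the latest version can be sketched as below. The function names, the `history` mapping of version numbers to full contents, and the diff format are assumptions for illustration; a real server would presumably reconstruct older contents from checkpoints rather than hold every version in memory.

```python
from difflib import SequenceMatcher


def make_diff(old: str, new: str) -> list:
    """Instructions that turn `old` into `new` (hypothetical diff format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("add", new[j1:j2]))
    return ops


def apply_diff(old: str, diff: list) -> str:
    return "".join(old[op[1]:op[2]] if op[0] == "copy" else op[1] for op in diff)


def build_update_cache(history: dict, window: int = 10) -> dict:
    """Precompute one diff per recent version, each going directly to the latest.

    `history` maps contiguous version numbers to full content.
    """
    latest = max(history)
    first = max(min(history), latest - window)
    return {v: make_diff(history[v], history[latest]) for v in range(first, latest)}


def update_to_latest(node_version: int, node_content: str, cache: dict):
    """Return the latest content, or None if the node's version is outside the window."""
    diff = cache.get(node_version)
    if diff is None:
        return None  # caller would fall back to checkpoint-based reconstruction
    return apply_diff(node_content, diff)
```

For a file at version fifty-five and a window of ten, the cache holds ten difference files, covering versions forty-five through fifty-four.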
[0077] In a further aspect, a node 10 may be operable to generate a synchronization list to request a plurality of files from the server. All files on the synchronization list may be updated as a group, or in priority, when there is access to the network 14. In one example, where the node 10 is operated by a user, the node 10 may update the files on the synchronization list when the user is away from the node 10 or not using the network connection, resulting in more bandwidth being available for synchronization processes. In order to synchronize each of the files in the synchronization list without transferring the entire file, some version of each file may need to be stored on the node 10, for example, a reference file. Thus, the node's synchronization module 22 requests that the server 16 provide it with corresponding difference files for each such reference file.
[0078] Referring to FIG. 6, a method of a first node, which is not acting as a server, receiving an updated version of a file from a server is explained. In 130, an operator of the node requests access to a file. At 132, the synchronization module on the node determines, from the node's database 17, the version of the file in the node's data store 16. The node determines whether the file is on the synchronization list. At 134 the node determines that the file is on the synchronization list. Upon determining that the file is on the synchronization list, the node requests an updated version from a server and provides the server with the version identifier of the file that exists on the node in 136. In 138, the server determines whether the version identifier of the file on the server is more recent than the version on the node. Upon determining that the version on the server is more recent, the server provides the node with the one or more difference files required to generate an updated version of the file from the reference file on the node in 140. The difference files are generated as described above. As explained above, the difference files may be generated before, or after, the update request from the node.
[0079] In 142, the node stores the difference file in memory and generates a modified file based on the difference file obtained from the server and an existing reference file on the node.
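The exchange of FIG. 6 can be sketched roughly as follows. The `Server` and `Node` classes, their method names, and the diff format are hypothetical; the step numbers in the comments refer to the reference numerals used above, and the server here returns a single difference file computed directly from the node's version to the latest version.

```python
from difflib import SequenceMatcher


def make_diff(old: str, new: str) -> list:
    """Instructions that turn `old` into `new` (hypothetical diff format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("add", new[j1:j2]))
    return ops


def apply_diff(old: str, diff: list) -> str:
    return "".join(old[op[1]:op[2]] if op[0] == "copy" else op[1] for op in diff)


class Server:
    def __init__(self):
        self.files = {}  # name -> list of contents; index 0 holds version 1

    def publish(self, name: str, content: str) -> None:
        self.files.setdefault(name, []).append(content)

    def request_update(self, name: str, node_version: int):
        versions = self.files[name]
        latest = len(versions)
        if latest <= node_version:          # 138: server copy is not more recent
            return latest, []
        diff = make_diff(versions[node_version - 1], versions[-1])
        return latest, [diff]               # 140: provide the difference file(s)


class Node:
    def __init__(self, server: Server):
        self.server = server
        self.sync_list = set()
        self.reference = {}  # name -> (version identifier, reference content)

    def open_file(self, name: str) -> str:  # 130: operator requests access
        version, content = self.reference[name]              # 132
        if name in self.sync_list:                           # 134
            latest, diffs = self.server.request_update(name, version)  # 136
            for diff in diffs:                               # 142
                content = apply_diff(content, diff)
            self.reference[name] = (latest, content)
        return content
```

A file that is not on the synchronization list is returned as stored on the node, without contacting the server.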
[0080] Although a request for updating a single file is outlined in FIG. 6, the node 10 may request, based on the synchronization list, an extensive list of files that are to be synchronized. To conserve memory on the node 10, and bandwidth over the network 14, the files on the synchronization list may be compressed by the node's compression module 24 and stored in a compressed format prior to transmission over the network 14. Similarly, the files transmitted by the server 16 to the node 10 may be compressed prior to transmission.
[0081] By performing differencing computations and file compression computations prior to any synchronization process, the time required for synchronization can be reduced. This can be particularly advantageous if network access is only available for a limited number of hours in a day. By performing the differencing calculations before and after the data transfer over the network 14, the utility of the network 14 may be maximized while it is available. Furthermore, as explained above, since only one file is being transferred, the transfer may be interrupted and resumed in an intermittently available network without significant loss. Since only certain versions are saved as checkpoint versions and the other versions can be calculated based on difference files by the differencing module 20, the amount of space required from the data store 16 may be significantly reduced.
[0082] By saving difference calculations, a difference file can be used for updating future files without requiring an extra computation step. This increases the efficiency of the synchronization system and reduces the load on both the server and the nodes 10. Moreover, since the server may distribute updates to each of the nodes 10 in the form of difference files as a more recent version of the file is created, the number of difference files that must be calculated may be reduced. Furthermore, since difference files are typically smaller than reference files, there may be a lower probability of file corruption during transmission.
[0083] Another advantage of the synchronization process of the current invention is that compression may be applied to a particular difference file, further reducing the quantity of data that must be transmitted over the network 14.
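Compressing a difference file before transmission might look like the sketch below. The JSON serialization, the copy/add instruction format, and the function names `pack_diff` and `unpack_diff` are assumptions for illustration; the application does not specify a wire format or a compression algorithm, and zlib is used here only as a readily available example.

```python
import json
import zlib


def pack_diff(diff: list) -> bytes:
    """Serialize a difference file and compress it for transmission."""
    raw = json.dumps(diff).encode("utf-8")
    return zlib.compress(raw, 9)  # 9 = highest compression level


def unpack_diff(blob: bytes) -> list:
    """Decompress and deserialize a difference file received over the network."""
    ops = json.loads(zlib.decompress(blob).decode("utf-8"))
    return [tuple(op) for op in ops]  # JSON turns tuples into lists; restore them
```

Because both steps operate only on the difference file, they can be performed before the network becomes available, leaving only the transfer itself for the connected period.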
[0084] In another aspect, the file management system enables metadata tagging of files in the data store 16 of the server or locally on the node. Metadata tags may be stored in a database 17 of the node, as well as in a database of the server. By storing metadata tags on the database 17 of the node, the metadata tags may be used to perform searches during a temporary interruption of the network connection. The database 17 of the node may provide metadata to the search module 26 of the node. The search module 26 of the node provides search functionality to the nodes 10. In a relational database, each file on a server may be tagged by the node 10, the server, or a search module 26, based on a class of the file. Similarly, metadata searches may be performed by the search module on the server using metadata in the database 17 of the server. The metadata may comprise a class.
[0085] Classes may be user-created or may be automatically created by the search module 26. Classes may exist for particular work sites, particular types of projects, particular employee types and/or particular file types, for example. Files may also be tagged based, for example, on the creator or editor of the file, the date that the file was created, the program used to create the file, the content in the file, the number of times that the file has been accessed, and particular information in the file. For example, in a mining operation, a certain file may be tagged as belonging to the class containing drill-hole data. Each of the files in this class may have a unique set of properties and the node's search module 26 may be operable to search for files based on their class.
[0086] Metadata tagging in accordance with the foregoing may optimally be applied in connection with data transmission over unreliable networks. For example, when a server's synchronization module 22 provides an updated file to a node 10, the class and tagging information may also be provided to the node 10. This may ensure that class information, as well as other metadata tags associated with the file, are available to a node's search module 26. The node 10 may save the class and tagging data. The server may also provide the tagging data of other files that have been tagged as being similar to the synchronized files. The server may also provide a larger subset of the tagging metadata available on the data store 16 or may provide all metadata associated with the tagging to the node 10.
[0087] Depending on the amount of metadata downloaded from the server, a user at a node 10 not acting as a server may apply metadata tags to search the entire body of files on the data store 16 or a subset of the files on the data store 16. For example, if the user is working at a node 10 that has downloaded the metadata tags for all the files on the data store 16 from a server, the user may search for all files of a specific class or all files tagged with particular information. For example, a user may wish to search information from all drill holes bored using tool steel bits in a particular area. A corresponding search may bring up all files in the class of drill holes bored using tool steel bits in the particular area. The user may then select to have particularly relevant files incorporated into the synchronization list to enable the user to view the file and maintain the file in its most recent version. For example, if a file having an association with other files is added to the synchronization list, all associated files may be similarly added to the synchronization list.
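A search over locally stored metadata tags, usable while the network is unavailable, could be sketched as follows. The table layout, the tag keys, the file names, and the helper names are all hypothetical; an in-memory SQLite database stands in for the node's database 17.

```python
import sqlite3


def open_metadata_db() -> sqlite3.Connection:
    """Create a local tag store: one row per (file, key, value) tag."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tags (file TEXT, key TEXT, value TEXT)")
    return db


def tag(db: sqlite3.Connection, file: str, tags: dict) -> None:
    db.executemany(
        "INSERT INTO tags VALUES (?, ?, ?)",
        [(file, key, value) for key, value in tags.items()],
    )


def search(db: sqlite3.Connection, criteria: dict) -> list:
    """Return the files that carry every key/value pair in `criteria`."""
    result = None
    for key, value in criteria.items():
        rows = db.execute(
            "SELECT file FROM tags WHERE key = ? AND value = ?", (key, value)
        )
        matches = {row[0] for row in rows}
        result = matches if result is None else result & matches
    return sorted(result or [])


def add_to_sync_list(sync_list: set, file: str, associations: dict) -> None:
    """Add a file and any files associated with it to the synchronization list."""
    sync_list.add(file)
    sync_list.update(associations.get(file, ()))
```

The search runs entirely against the local copy of the tags, so it returns names of files that may not yet be present on the node; adding a result to the synchronization list is what causes the file itself to be fetched later.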
[0088] Further, files created by a node 10, which is not acting as the server for those files, can be added to the synchronization list of files that must be synchronized with the server of those files. Since no copy of the file may exist on the data store 16 of the server, the server's synchronization module 22 may upload the file to the data store 16 during the synchronization process. If modifications are made to the file either at a node 10 or at some other node, the differencing module 20 on the server may incorporate updates into the data store 16.
[0089] To assist with tagging of documents, a specific template with relevant metadata may be recommended for each type of file. The user may combine templates as well as add or remove new tags and classes to optimize the metadata such that the file can easily be found in future searches. Metadata may also comprise folder information that may be relevant to the contents of a particular file. For example, if an existing folder structure is uploaded into the data store 16, the server may create metadata from the folder names or other information associated with the folders being uploaded.
[0090] In a further aspect, the file management system enables off-line synchronization. A node 10 may determine the synchronization status of each file on the node 10 or particular files on the node 10. In order to determine how much a file has been modified, the system may implement a file monitor 30. The file monitor is operable to determine the difference between modified files and the most recent version of the file downloaded to the node 10 from the server. Since, as explained above, the node 10 stores the most recent copy of the file downloaded from the server as a reference file, the node's modification detection module 24 may compare the modified version of the file to the reference file. If the file has not been synchronized with the server for a pre-determined period of time or if the differences between the reference file and the modified file are greater than a certain threshold, the node's modification detection module 24 may provide a warning to the user that the file should be synchronized when access to a network 14 becomes available.
[0091] The file management system may further prioritize the synchronization of files that are most different from the version that had previously been synchronized with the server. For example, if the node's synchronization module 22 is set to synchronize two files, the file that has been most heavily modified compared with the file last accessed from the server will be synchronized first. The user may also provide a manual priority ranking of which of the files on the synchronization list of node 10 should be synchronized first. The priority ranking of the synchronization list may also be determined based on metadata tags or classes applied to the file. Synchronizing higher priority files first may ensure that the highest-priority files are synchronized prior to an interruption in network access.
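One way to rank files by how far they have drifted from their reference files, and to decide when to warn the user, is sketched below. The similarity measure (one minus the `difflib` ratio), the default thresholds, and the function names are assumptions chosen for illustration; the application leaves the measure of difference and the threshold values open.

```python
from difflib import SequenceMatcher


def change_fraction(reference: str, current: str) -> float:
    """0.0 when the file matches its reference file, 1.0 when nothing matches."""
    matcher = SequenceMatcher(None, reference, current, autojunk=False)
    return 1.0 - matcher.ratio()


def prioritize(files: dict, manual_rank: dict = None) -> list:
    """Order file names for synchronization.

    `files` maps name -> (reference content, current content).
    `manual_rank` maps name -> rank; lower ranks go first and override the
    automatic ordering, which puts the most heavily modified file first.
    """
    manual_rank = manual_rank or {}

    def key(name):
        reference, current = files[name]
        return (manual_rank.get(name, float("inf")),
                -change_fraction(reference, current),
                name)

    return sorted(files, key=key)


def needs_warning(reference: str, current: str, days_since_sync: float,
                  max_change: float = 0.25, max_days: float = 7) -> bool:
    """Warn if the file is stale or has changed beyond the threshold."""
    return (days_since_sync > max_days
            or change_fraction(reference, current) > max_change)
```

Sorting on a tuple keeps the manual ranking, the magnitude of change, and a stable name-based tie-break in a single pass.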
[0092] Files may be modified without the node's knowledge. For example, if the file comprises information gleaned during a drilling process monitored by a sensor, the file may be updated in the background on the node 10. To notify the node 10 that the file must be synchronized with a server to provide the relevant difference file to the server, the node's modification detection module 24 may monitor for differences between the locally stored version of the most recent file during the last synchronization and the most recently updated file on the node 10. The node's modification detection module 24 may be registered with the node's operating system in order to capture file changes from a plurality of programs and processes, similarly to a virus scanning program. All files that should be synchronized with the server may then be synchronized once the network 14 becomes available.
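Detection of background modifications can be approximated by comparing a digest of each file against the digest recorded at the last synchronization, as in the sketch below. This polling approach is a simplified stand-in for the operating-system registration described above, and the `ModificationDetector` class and its method names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path) -> str:
    """SHA-256 digest of the file's current contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class ModificationDetector:
    """Tracks which files have changed since they were last synchronized."""

    def __init__(self):
        self.synced = {}  # path -> digest recorded at the last synchronization

    def mark_synced(self, path) -> None:
        self.synced[str(path)] = file_digest(path)

    def pending(self) -> list:
        """Paths whose contents differ from the last synchronized state."""
        return sorted(
            path for path, digest in self.synced.items()
            if file_digest(path) != digest
        )
```

The list returned by `pending` is what would be queued for synchronization once the network becomes available.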
[0093] By coupling the caching of server files with the ability to search all files on the server based on locally stored metadata, as well as the ability to monitor for file changes, the file management system may be better suited for use with unreliable networks than past systems. This allows minimal data transfer and ensures that the files that should be synchronized are synchronized as soon as possible. Furthermore, by enabling a user to search for files on a node 10 when the node 10 is not acting as a server for those files, searches may be conducted in the absence of a network connection.
[0094] Referring to FIG. 7, an example of operation of the file manager is shown in one example implementation, in which various types of data relating to the preparation, construction and operation of a mine are shown. It shall be understood that this is but one example, and that numerous other example implementations, and processes related to this implementation, may be provided.
[0095] In this example, three types of data may be stored in files on a node acting as a server for those files. The first type of data may be updated frequently. For example, blast-hole data 76, the ore control block model 82, and the short-term plan 90 may be updated on a daily basis. The second type of data may be updated less frequently, for example, the mine design 84 may be updated on a weekly or monthly basis. The third type of data may be updated infrequently, for example, the assay data 70, the drill hole data 72, the solids data 74, the block model 78, and the long-term plan 86 may be updated on a yearly basis.
[0096] For example, within a working group there may be many versions of a given file that are saved. Even for an individual user, there may be multiple iterations of a file in which each version is stored. At some point the work is "complete enough" to share with a broader audience, at which point the current version is "published" on the file's server, enabling other nodes to access the published file. This is an important concept in the mining industry in particular, as a lot is at stake when data is published, and it can often only be done by people with specific certifications.
[0097] Older versions of a file may not be replaced; however, they may be accessed, edited and saved as a new file or a new version. This ensures that the historical order of the files can always be retrieved from the data store 16. When a user wishes to edit a document, the user must check out the document to make edits in the master copy. Other users may access the same document; however, changes made by these other users may not be saved as a newer version of the document. These changes may be saved as a side branch of the document, as is shown in FIG. 5. Only the user who has checked out the document may save the master version of the document.
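The check-out rule described above can be sketched as a small model. The `Document` class, its method names, and the user names in the usage note are hypothetical, and the sketch ignores storage and networking entirely.

```python
class Document:
    """Master versions are append-only; other users' edits become side branches."""

    def __init__(self, content: str):
        self.master = [content]  # master versions, oldest first, never replaced
        self.branches = []       # (base master version, user, content)
        self.checked_out_by = None

    def check_out(self, user: str) -> bool:
        """Grant the check-out unless another user already holds it."""
        if self.checked_out_by not in (None, user):
            return False
        self.checked_out_by = user
        return True

    def save(self, user: str, content: str):
        """Save as a new master version for the holder, else as a side branch."""
        if user == self.checked_out_by:
            self.master.append(content)
            return ("master", len(self.master))
        self.branches.append((len(self.master), user, content))
        return ("branch", len(self.branches))

    def check_in(self, user: str) -> None:
        if user == self.checked_out_by:
            self.checked_out_by = None
```

For instance, if a user "ana" holds the check-out, her save produces master version 2, while a save by "ben" at the same time is recorded as a side branch based on the current master version.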
[0098] Although the above has been described with reference to certain specific example embodiments, various modifications thereof will be apparent to those skilled in the art without departing from the scope of the claims appended hereto.

Claims

1. A method of synchronizing data files over a network, the method comprising:
a first node establishing a connection with a second node, the first node having stored thereon:
one or more data files, each being associated with a version identifier, and a first synchronization list of data files to be synchronized;
the second node having stored thereon one or more corresponding data files, each being associated with a version identifier;
the first node determining, based on the version identifiers, whether a more recent version of each of the one or more data files on the first synchronization list exists on the second node; and
upon determining that the second node comprises a more recent version, the first node obtaining, from the second node, a difference file to update the data file on the first node.
2. The method of claim 1 wherein the synchronization list comprises data files selected by a user.
3. The method of claim 2 wherein data files associated with those selected by the user are also included in the synchronization list.
4. The method of claim 1, wherein:
the second node further comprises:
a second synchronization list of data files to be synchronized; and wherein the second node, upon establishing a connection with the first node, determines whether the first node comprises a more recent version of each of the files on the second synchronization list; and
upon determining that the first node contains a more recent version, the second node obtains, from the first node, a difference file to update the data file on the second node.
5. The method of claim 1, further comprising associating a priority ranking with each of the files on the synchronization list, wherein the data files are synchronized according to the priority ranking.
6. The method of claim 5 wherein the priority ranking is generated based on metadata associated with each of the data files.
7. The method of claim 5 wherein the priority ranking is generated based on the magnitude of difference between the version identifier of the data file on the first node and the more recent version of the data file on the second node.
8. The method of claim 5 wherein:
a reference file for each of the one or more data files is also stored on the first node; a modification detection module on the first node determines the degree to which each of the data files on the first synchronization list differ with respect to the respective reference file; and
the priority ranking is generated based on the magnitude of difference between the data file and the reference file.
9. The method of claim 1 wherein the difference file represents the difference between the version of the data file on one node and the version of the data file on the other node.
10. The method of claim 1 wherein at least two of a sequence of difference files are used to update the file on the first node; one of the at least two difference files representing the difference between the version of the data file on the second node and an intermediate version.
11. A system for synchronizing data files over a network, the system comprising:
a first node comprising:
a data store operable to store data files;
a database operable to store a version identifier associated with each of the data files; and
a synchronization list of data files to be synchronized;
a second node comprising: one or more corresponding data files, each being associated with a version identifier;
wherein, upon the first node establishing a connection with the second node, the first node being operable to determine, based on the version identifiers, whether a more recent version of each of the one or more data files on the first synchronization list exists on the second node; and
upon determining that the second node comprises a more recent version, the first node being operable to obtain, from the second node, a difference file to update the data file on the first node.
12. The system of claim 11 wherein the synchronization list comprises data files selected by a user.
13. The system of claim 12 wherein any data files associated with those selected by the user are also included in the synchronization list.
14. The system of claim 11, wherein:
the second node further comprises:
a second synchronization list of data files to be synchronized; and wherein the second node, upon establishing a connection with the first node, is operable to determine whether the first node comprises a more recent version of each of the files on the second synchronization list; and
upon determining that the first node contains a more recent version, the second node being operable to obtain, from the first node, a difference file to update the data file on the second node.
15. The system of claim 11, wherein a priority ranking is associated with each of the data files on the synchronization list and the data files are synchronized according to the priority ranking.
16. The system of claim 15 wherein the priority ranking is generated based on metadata associated with each of the data files.
17. The system of claim 15 wherein the priority ranking is generated based on the magnitude of difference between the version identifier of the data file on the first node and the more recent version of the data file on the second node.
18. The system of claim 15 wherein:
the data store further comprises a reference file for each of the one or more data files stored thereon;
the first node further comprises a modification detection module operable to determine the degree to which each of the data files on the first synchronization list differ with respect to the respective reference file; and
the priority ranking is generated based on the magnitude of difference between the data file and the reference file.
19. The system of claim 11 wherein the difference file represents the difference between the version of the data file on one node and the version of the data file on the other node.
20. The system of claim 11 wherein at least two of a sequence of difference files are used to update the file on the first node; one of the at least two difference files representing the difference between the version of the data file on the second node and an intermediate version.
PCT/CA2012/050784 2011-11-04 2012-11-05 System and method for data synchronization over a network WO2013071428A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2012339532A AU2012339532B2 (en) 2011-11-04 2012-11-05 System and method for data communication over a network
CN201280065691.2A CN104272649A (en) 2011-11-04 2012-11-05 System and method for data communication over a network

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201161555999P 2011-11-04 2011-11-04
US61/555,999 2011-11-04
CA2769773 2012-02-28
CA2769773A CA2769773C (en) 2011-11-04 2012-02-28 System and method for data communication over a network

Publications (2)

Publication Number Publication Date
WO2013071428A1 true WO2013071428A1 (en) 2013-05-23
WO2013071428A8 WO2013071428A8 (en) 2013-10-31

Family

ID=48222505

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2012/050784 WO2013071428A1 (en) 2011-11-04 2012-11-05 System and method for data synchronization over a network

Country Status (4)

Country Link
CN (1) CN104272649A (en)
AU (1) AU2012339532B2 (en)
CA (1) CA2769773C (en)
WO (1) WO2013071428A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017048409A1 (en) * 2015-09-15 2017-03-23 Microsoft Technology Licensing, Llc Synchronizing file data between computer systems
US20170344594A1 (en) * 2016-05-27 2017-11-30 Cisco Technology, Inc. Delta database synchronization
CN112714149A (en) * 2020-11-27 2021-04-27 北京飞讯数码科技有限公司 Data synchronization method and device, computer equipment and storage medium

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105991685B (en) * 2014-11-07 2019-06-25 天地融科技股份有限公司 Data-updating method and system
CN105279100A (en) * 2015-11-04 2016-01-27 杭州华为数字技术有限公司 Linked clone parent roll updating method and device
CN106372199B (en) * 2016-08-31 2019-07-05 镇江乐游网络科技有限公司 A kind of multi version file management system supported based on metadata
CN107172169A (en) * 2017-05-27 2017-09-15 广东欧珀移动通信有限公司 Method of data synchronization, device, server and storage medium
US10402311B2 (en) * 2017-06-29 2019-09-03 Microsoft Technology Licensing, Llc Code review rebase diffing
CN109308272A (en) * 2017-07-28 2019-02-05 同星科技股份有限公司 The method of peripheral unit and the data memory device of controllable peripheral unit are controlled by data memory device
CN108121804B (en) * 2017-12-22 2020-06-05 百度在线网络技术(北京)有限公司 Cross-region distributed data storage method, device, terminal and storage medium
CN110636090B (en) * 2018-06-22 2022-09-20 北京东土科技股份有限公司 Data synchronization method and device under narrow bandwidth condition
CN109218447B (en) * 2018-10-29 2021-09-17 中国建设银行股份有限公司 Media file distribution method and file distribution platform
CN111090835B (en) * 2019-12-06 2022-04-19 支付宝(杭州)信息技术有限公司 Method and device for constructing file derivative graph
CN111259072B (en) * 2020-01-08 2023-11-14 广州虎牙科技有限公司 Data synchronization method, device, electronic equipment and computer readable storage medium
CN113094443A (en) * 2021-05-21 2021-07-09 珠海金山网络游戏科技有限公司 Data synchronization method and device
CN114124928B (en) * 2021-09-27 2023-07-14 苏州浪潮智能科技有限公司 Method, device and system for quickly synchronizing files between devices

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2356269A (en) * 1999-06-17 2001-05-16 Ibm Multiple links to versions of a source file in a distributed computer environment
US20010048728A1 (en) * 2000-02-02 2001-12-06 Luosheng Peng Apparatus and methods for providing data synchronization by facilitating data synchronization system design
US20060161516A1 (en) * 2005-01-14 2006-07-20 Microsoft Corporation Method and system for synchronizing multiple user revisions to a shared object
US20070186069A1 (en) * 2004-08-10 2007-08-09 Moir Mark S Coordinating Synchronization Mechanisms using Transactional Memory
WO2011109049A1 (en) * 2010-03-04 2011-09-09 Alibaba Group Holding Limited Method and apparatus of backing-up subversion repository

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1249597C (en) * 2002-09-03 2006-04-05 鸿富锦精密工业(深圳)有限公司 Synchronous system in distributed files and method
CN1261877C (en) * 2002-10-11 2006-06-28 鸿富锦精密工业(深圳)有限公司 Multi-node file syn chronizing system and method
CN1756108A (en) * 2004-09-29 2006-04-05 华为技术有限公司 Master/backup system data synchronizing method
CN101142573A (en) * 2004-10-25 2008-03-12 恩鲍尔技术公司 System and method for global data synchronization
US8185495B2 (en) * 2008-02-01 2012-05-22 Microsoft Corporation Representation of qualitative object changes in a knowledge based framework for a multi-master synchronization environment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017048409A1 (en) * 2015-09-15 2017-03-23 Microsoft Technology Licensing, Llc Synchronizing file data between computer systems
US10425477B2 (en) 2015-09-15 2019-09-24 Microsoft Technology Licensing, Llc Synchronizing file data between computer systems
US20170344594A1 (en) * 2016-05-27 2017-11-30 Cisco Technology, Inc. Delta database synchronization
US10671590B2 (en) * 2016-05-27 2020-06-02 Cisco Technology, Inc. Delta database synchronization
CN112714149A (en) * 2020-11-27 2021-04-27 北京飞讯数码科技有限公司 Data synchronization method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CA2769773A1 (en) 2013-05-04
AU2012339532A1 (en) 2014-05-01
AU2012339532B2 (en) 2016-12-01
CN104272649A (en) 2015-01-07
WO2013071428A8 (en) 2013-10-31
CA2769773C (en) 2018-01-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12849812

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2012339532

Country of ref document: AU

Date of ref document: 20121105

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12849812

Country of ref document: EP

Kind code of ref document: A1