US20150301893A1 - Distributed storage and communication - Google Patents

Info

Publication number
US20150301893A1
Authority
US
United States
Prior art keywords
data
subsets
parity
recovered
recreated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/696,375
Inventor
Iskender Syrgabekov
Yerkin Zadauly
Chokan Laumulin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
QANDO SERVICES Inc
Original Assignee
QANDO SERVICES Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by QANDO SERVICES Inc
Priority to US14/696,375
Assigned to EXTAS GLOBAL LTD. reassignment EXTAS GLOBAL LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LAUMULIN, CHOKAN, SYRGABEKOV, ISKENDER, ZADAULY, YERKIN
Assigned to QANDO SERVICES INC. reassignment QANDO SERVICES INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: EXTAS GLOBAL LTD.
Publication of US20150301893A1
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F11/1088 Reconstruction on already foreseen single or plurality of spare disks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0041 Arrangements at the transmitter end
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0061 Error detection codes
    • H04L1/0063 Single parity check
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/02 Arrangements for detecting or preventing errors in the information received by diversity reception
    • H04L1/04 Arrangements for detecting or preventing errors in the information received by diversity reception using frequency diversity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/24 Multipath
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems

Definitions

  • the present invention relates to a method and system for storing and communicating data and in particular for storing data across separate storage locations, and transmitting and receiving data.
  • Data may be stored within a computer system using many different techniques. Should an individual computer system such as a desktop or laptop computer be stolen or lost the data stored on it may also be lost with disastrous effects. Backing up the data on a separate drive may maintain the data but sensitive information may still be lost and made available to third parties. Even where the entire system is not lost or stolen, individual disk drives or other storage devices may fail leading to a loss of data with similar catastrophic effects.
  • a RAID (redundant array of inexpensive drives) array may be configured to store data under various conditions.
  • RAID arrays use disk mirroring and additional optional parity disks to protect against individual disk failures.
  • a RAID array must be configured in advance with a fixed number of disks each having a predetermined capacity.
  • the configuration of RAID arrays cannot be changed dynamically without rebuilding the array and this may result in significant system downtime. For instance, should a RAID array run out of space then additional disks may not be added easily to increase the overall capacity of the array without further downtime.
  • RAID arrays also cannot easily deal with more than two disk failures and separate RAID arrays cannot be combined easily.
  • Although RAID arrays may be located at different parts of a network, configuring multiple disks in this way is difficult and it is not convenient to place the disks at separate locations. Therefore, even though RAID arrays may be resilient to one or two disk failures, a catastrophic event such as a fire or flood may result in the destruction of all of the data in a RAID array as disks are usually located near to each other.
  • Nested level RAID arrays may improve resilience to further failed disks but these systems are complicated, expensive and cannot be expanded without rebuilding the array.
  • portions of transmitted data may also be lost, corrupted or intercepted, especially over noisy or insecure channels.
  • a method of storing data comprising the steps of a) separating the data into a plurality of data subsets; b) generating parity data from the plurality of data subsets such that any one or more of the plurality of data subsets may be recreated from the remaining data subsets and the parity data; c) repeating steps a and b on each of the plurality of data subsets and parity data providing further data subsets and further parity data; and d) storing each of the further data subsets and further parity data in separate storage locations. Therefore, the data are separated recursively so that they are distributed as separate data subsets and parity data in several separate storage locations.
  • An original data set is divided into subsets.
  • a parity data set is created from the subsets.
  • the parity data set provides a mechanism for recreating any of the subsets should they be lost or corrupted.
  • the subsets and parity data may contribute to recreating the original data set should this be required. If only parity data subsets are lost then no further processing is required and the original data remains. If no subsets are lost then again, no further processing is required.
  • the process continues with any of the subsets and parity data sets being divided again in a similar way and further parity data being generated. This forms a cascade of data subsets that may be brought back together to form the original data.
  • If any subsets of data are lost then they may be recreated from the remaining subsets and parity data at that particular level in the cascade of data. Therefore, the cascade may grow dynamically and does not need to be defined in advance.
  • Each subset of data or parity data may be stored separately, for instance in neighbouring sections on a disk drive or on separate servers in different organisations or territories. Also, the actual storage locations may be of different sizes or types and so this provides further flexibility to the storage system.
  • The location of each data subset and parity data may be recorded so that the original data set may be recreated. Therefore, the location information may be used to restrict access to the original data, as access to individual data subsets may not provide a third party with access to the original data without the remaining subsets.
  • Each of the plurality of data subsets and parity data may be stored on a separate storage location to the other data subsets and parity data.
  • more than one data subset or parity data may be stored on a single storage location especially if the number of storage locations available is limited.
  • each of the further data subsets and further parity data may be stored in separate physical devices.
  • step c) may be repeated for each of the plurality of data subsets and parity data. This distributes and cascades the data more effectively and so improves resilience to data loss and also interception.
  • the method may further comprise the steps of providing additional storage locations and repeating steps a and b on any of the further data subsets or parity data stored in the separate storage locations as the additional storage locations are provided. This allows a dynamic growth of the data cascade and allows the storage capacity to be increased without rebuilding the entire system.
  • the data are separated byte-wise.
  • other separation methods may be used such as bit-wise or by different lengths of bits.
  • the data subsets may also be of different sizes.
  • the data are separated into two data subsets.
  • the data are separated according to their odd or even status.
  • the parity data are generated by performing a logical function on the plurality of data subsets.
  • the logical function may be chosen to reduce processing requirements. Parity data generation is not limited to a logical function. For instance, data duplication may also be used.
  • the logical function may be an exclusive OR.
  • This function (XOR) requires a particularly low processing overhead and so improves efficiency.
  • such a function may be carried out using straightforward hardware.
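The byte-wise odd/even separation and XOR parity described above can be sketched as follows. This is a minimal illustration rather than the applicants' implementation: the function names `split_bytewise` and `xor_bytes` are hypothetical, and padding the shorter subset with a zero byte is an assumption made so that both subsets have equal length.

```python
def split_bytewise(data: bytes):
    """Separate data byte-wise into even-indexed (A) and odd-indexed (B) bytes."""
    a = data[0::2]
    b = data[1::2]
    if len(b) < len(a):
        b += b"\x00"  # assumed zero padding so that A and B are the same size
    return a, b


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Exclusive OR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(x, y))


data = b"HELLO!"
subset_a, subset_b = split_bytewise(data)
parity = xor_bytes(subset_a, subset_b)

# Either subset can be recreated from the other subset and the parity data.
recreated_a = xor_bytes(subset_b, parity)
recreated_b = xor_bytes(subset_a, parity)
```

Because XOR is its own inverse, the same routine serves for both generating the parity data and recreating a lost subset, which is consistent with the low processing overhead noted above.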
  • the method further comprises the step of encrypting the data. This provides enhanced security and/or privacy.
  • the separate storage locations are selected from the group consisting of hard disk drive, optical disk, FLASH RAM, web server, FTP server and network file server.
  • Other storage mediums may be used and are not limited to read/write locations.
  • the method may be independent of the specific type of storage used. Many other storage types and locations may be used.
  • the data are web pages or individual files. Web pages or web sites may then be distributed or accessed more securely and reduce eavesdropping or other forms of surveillance. For instance, it may not be possible to generate or recover the original data from individual subsets of data. A minimum quantity of data subsets may be required. Even with access to all subsets of data it may be possible to restrict the ability to recover the original data, for instance by using encryption or requiring details of how the original subsets of data were separated and created.
  • the method further comprises the step of c1) applying a function to any one or more of the data subsets and parity data to generate one or more associated authentication codes.
  • the function may be a hash function.
  • the hash function may be selected from the group consisting of: checksums, check digits, fingerprints, randomizing functions, error correcting codes, and cryptographic hash functions.
  • the authentication codes may be stored with the further data subsets and/or further parity data.
  • the authentication codes may be stored as header information.
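One way to store an authentication code as header information is sketched below. The choice of SHA-256, the fixed 32-byte header layout and the names `add_header` and `check_header` are assumptions for illustration only; the text above permits many other functions (checksums, fingerprints and so on) and does not prescribe a layout.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def add_header(payload: bytes) -> bytes:
    """Prefix a data subset (or parity data) with its hash code."""
    return hashlib.sha256(payload).digest() + payload


def check_header(stored: bytes):
    """Return (is_authentic, payload) for a stored record."""
    code, payload = stored[:DIGEST_SIZE], stored[DIGEST_SIZE:]
    return hashlib.sha256(payload).digest() == code, payload


record = add_header(b"subset A")
ok, payload = check_header(record)

# A single altered byte in the payload makes the comparison fail.
tampered = record[:-1] + b"?"
tampered_ok, _ = check_header(tampered)
```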
  • a method of retrieving data stored in separate storage locations comprising the steps of: a) recovering subsets of data and parity data from the separate storage locations; b) recreating any missing subsets of data from the recovered subsets of data and parity data to form recreated subsets of data; c) combining the subsets of data and any recreated subsets of data to form a plurality of consolidated data sets, wherein the plurality of consolidated data sets include further subsets of data and further parity data; and d) recreating any missing further subsets of data from the further subsets of data and further parity data to form recreated further subsets of data; and e) combining the further subsets of data and any recreated further subsets of data to form an original set of data. Subsets of data are combined and then recombined until the original data set is recovered.
  • each data subset may have its own individual storage location remote from the others, further enhancing security and data reliability.
  • the original data may be encrypted and the method further comprises the step of: f) decrypting the original data.
  • the method may further comprise the step of receiving location information of the separate storage locations. This allows easier access and the location information may be further used to restrict access.
  • the separate storage locations are one or more selected from the group consisting of hard disk drive, optical disk, FLASH RAM, web server, FTP server and network file server.
  • the separate storage locations may be accessible over a network.
  • the network may be internal or external and may for example be the Internet.
  • the method may further comprise the steps of: recovering authentication codes associated with one or more of the subsets of data and parity data;
  • the authentication codes may be hash codes and the authentication step may comprise applying a hash function to the subsets of data and/or parity data to generate a comparison hash code and comparing this comparison hash code with the authentication codes associated with the data subsets and/or parity data.
  • apparatus for storing data comprising a processor arranged to: a) separate the data into a plurality of data subsets; b) generate parity data from the plurality of data subsets such that any one or more of the plurality of data subsets may be recreated from the remaining data subsets and the parity data; c) execute a and b on each of the plurality of data subsets and parity data providing further data subsets and further parity data; and d) store each of the further data subsets and further parity data in separate storage locations.
  • the processor may be further arranged to apply a function to any one or more of the data subsets and parity data to generate one or more associated authentication codes, and further wherein the further data subsets and further parity data may be stored with their associated authentication codes.
  • apparatus for retrieving data stored in separate storage locations comprising a processor arranged to: a) recover subsets of data and parity data from the separate storage locations; b) recreate any missing subsets of data from the recovered subsets of data and parity data to form recreated subsets of data; c) combine the subsets of data and any recreated subsets of data to form a plurality of consolidated data sets, wherein the plurality of consolidated data sets include further subsets of data and further parity data; and d) recreate any missing further subsets of data from the further subsets of data and further parity data to form recreated further subsets of data; and e) combine the further subsets of data and any recreated further subsets of data to form an original set of data.
  • the processor may be further arranged to recover authentication codes associated with the one or more of the subsets of data and parity data, authenticate one or more of the subsets of data and parity data using the associated authentication codes, and recreate any subsets of data that fail authentication from the recovered subsets of data and parity data to form recreated subsets of data.
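The authenticate-then-recreate behaviour described above can be sketched for a single level of two data subsets and XOR parity. This is a hedged illustration: SHA-256 as the hash function, the `codes` dictionary keyed by "A", "B" and "P", and the function names are all assumptions not taken from the source. A subset that is missing (`None`) or fails authentication is treated as lost and recreated from the other subset and the parity data.

```python
import hashlib


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def is_authentic(payload, code: bytes) -> bool:
    """True if the payload was recovered and matches its stored hash code."""
    return payload is not None and hashlib.sha256(payload).digest() == code


def recover_subsets(a, b, p, codes):
    """Return authentic (A, B), recreating one of them from parity if needed."""
    a_ok = is_authentic(a, codes["A"])
    b_ok = is_authentic(b, codes["B"])
    if a_ok and b_ok:
        return a, b  # parity is not needed
    if not is_authentic(p, codes["P"]):
        raise ValueError("parity data missing or failed authentication")
    if a_ok:
        return a, xor_bytes(a, p)
    if b_ok:
        return xor_bytes(b, p), b
    raise ValueError("both data subsets missing or failed authentication")


a, b = b"HLO", b"EL!"
p = xor_bytes(a, b)
codes = {name: hashlib.sha256(part).digest()
         for name, part in (("A", a), ("B", b), ("P", p))}

# Subset B has been altered in storage; it is recreated from A and P.
fixed_a, fixed_b = recover_subsets(a, b"EL?", p, codes)
```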
  • a data storage medium for storing a data file, containing data subsets, parity data and authentication codes
  • the authentication codes provide authentication of the data subsets
  • parity data are combinable with the data subsets to regenerate missing data subsets or data subsets which fail authentication.
  • the storage medium may be selected from the group consisting of: compact disc, DVD, hard drive, solid state drive, FLASH memory and digital tape.
  • the data file may be selected from the group consisting of: multimedia, audio, video, MPEG, MP3, music, database and binary file.
  • a method of transmitting data comprising the steps of: a) separating the data to be transmitted into a plurality of data subsets; b) generating parity data from the plurality of data subsets such that any one or more of the plurality of data subsets may be recreated from the remaining data subsets and the parity data; and c) transmitting the plurality of data subsets and parity data.
  • This allows data to be transmitted more securely and more reliably as lost data may be recreated.
  • any communication channel used may be more fully utilised, either reducing the error rate or allowing a lower power to be used while maintaining a similar available data rate.
  • the transmitting step may further comprise the steps of: i) repeating steps a and b on any one or more of the plurality of data subsets and parity data providing further data subsets and further parity data; and ii) transmitting the further data subsets and further parity data. This provides increased reliability and security.
  • any or all of the data subsets and further parity data are transmitted by different transmission means.
  • the different transmission means may be one or more selected from the group consisting of: wire, radio wave, internet protocol and mobile communication.
  • any or all of the data subsets are transmitted on different channels.
  • the channels may be mobile communication channels. Therefore, this may be implemented in mobile telephones to increase security and reliability of communication.
  • the different channels are different radio frequencies.
  • the choice of different channels is predetermined. This allows a receiver to receive the data successfully or more conveniently.
  • the method may further comprise the step of transmitting the choice of different channels.
  • the choice may be transmitted as a code.
  • This may be user selectable or automatic.
  • the code may be known to both transmitter and receiver or transmitted securely between them.
  • the method may further comprise the step of encrypting the data. This adds to security.
  • the code may instead or additionally be encrypted.
  • a method of receiving data comprising the steps of: a) receiving subsets of data and parity data; b) recreating any missing subsets of data from the received subsets of data and parity data to form recreated subsets of data; c) combining the subsets of data and any recreated subsets of data.
  • the recreated subsets of data form a plurality of consolidated data sets, and further wherein the plurality of consolidated data sets include further subsets of data and further parity data
  • the combining step further comprises the steps of: d) recreating any missing further subsets of data from the further subsets of data and further parity data to form recreated further subsets of data; and e) combining the further subsets of data and any recreated further subsets of data to form an original set of data.
  • the received data may be encrypted and the method further comprises the step of: f) decrypting the original data.
  • the receiving step may further comprise receiving any or all of the subsets of data and parity data from different channels.
  • the different channels may be different radio frequencies.
  • the different channels may be different cellular radio channels.
  • the method may further comprise the step of receiving channel information including details of which channels contain which data subsets and parity data.
  • the combining step may further comprise combining the subsets of data and any recreated subsets of data based on the received channel information.
  • the channels carrying any or all of the data subsets and parity data vary during reception. This channel hopping makes it more difficult for an unauthorised recipient to decode the original data or listen in to a voice call.
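The idea of a shared code that determines which channel carries which data subset, with the assignment varying over time, can be sketched as below. The source does not say how the code maps to channels, so everything here is an assumption: the code is used as a seed for Python's `random.Random` (which is not cryptographically secure and is used only to keep the sketch short), there are exactly three channels, and `schedule`, `transmit` and `receive` are hypothetical names.

```python
import random

STREAMS = ["A", "B", "P"]  # two data subsets and the parity data


def schedule(code: int, slots: int):
    """For each time slot, list which stream is carried on channel 0, 1 and 2."""
    rng = random.Random(code)  # same code gives the same schedule at both ends
    plan = []
    for _ in range(slots):
        order = STREAMS[:]
        rng.shuffle(order)
        plan.append(order)
    return plan


def transmit(frames, code: int):
    """Place each frame's A, B and P parts on the channels chosen for its slot."""
    plan = schedule(code, len(frames))
    return [[frame[name] for name in order] for frame, order in zip(frames, plan)]


def receive(on_air, code: int):
    """Use the shared code to relabel what arrived on each channel."""
    plan = schedule(code, len(on_air))
    return [dict(zip(order, payloads)) for order, payloads in zip(plan, on_air)]


frames = [
    {"A": b"a0", "B": b"b0", "P": b"p0"},
    {"A": b"a1", "B": b"b1", "P": b"p1"},
    {"A": b"a2", "B": b"b2", "P": b"p2"},
]
on_air = transmit(frames, code=1234)
decoded = receive(on_air, code=1234)
```

A receiver without the code sees three channels whose contents change role from slot to slot, which illustrates why channel hopping makes unauthorised reassembly harder.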
  • the data may be selected from the group consisting of: audio, mobile telephone, packet data, video, real time duplex data and Internet data. Additionally, this method is suitable for other data types.
  • apparatus for transmitting data comprising a processor arranged to: a) separate the data to be transmitted into a plurality of data subsets; b) generate parity data from the plurality of data subsets such that any one or more of the plurality of data subsets may be recreated from the remaining data subsets and the parity data; and c) transmit the plurality of data subsets and parity data.
  • the processor may have processing logic stored as hardware or software.
  • the processor may be further arranged to transmit by: i) repeating a and b on any one or more of the plurality of data subsets and parity data providing further data subsets and further parity data; and ii) transmit the further data subsets and further parity data.
  • apparatus for receiving data comprising a processor arranged to: a) receive subsets of data and parity data; b) recreate any missing subsets of data from the received subsets of data and parity data to form recreated subsets of data; c) combine the subsets of data and any recreated subsets of data.
  • the recreated subsets of data form a plurality of consolidated data sets, and further wherein the plurality of consolidated data sets include further subsets of data and further parity data
  • the processor is further arranged to combine the subsets of data by: d) recreating any missing further subsets of data from the further subsets of data and further parity data to form recreated further subsets of data; and e) combine the further subsets of data and any recreated further subsets of data to form an original set of data.
  • the data may be cascaded to form further subsets and further parity data.
  • the original data may be regenerated by reversing the cascade process generating any missing data subsets from parity data and the successfully received data.
  • a mobile handset comprising the previously described apparatus, i.e. a transmitter and/or a receiver.
  • the data described above i.e. the original data, transmitted data, voice data or data to be secured and stored, may be difference data relative to a reference data file and the method may further comprise the step of comparing the original data with the reference data file to obtain the difference data.
  • This allows data to be stored or transmitted securely without requiring underlying data to leave a restricted or protected environment.
  • This optional feature may be implemented in the methods or apparatus described above.
  • the processor may be further arranged or the method may include the step of applying the difference data to the reference data file to obtain underlying data.
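The difference-data option can be illustrated with a deliberately simple scheme. The source does not specify how the difference is computed; the sketch below assumes a byte-wise XOR against a reference file of the same length, and the names `make_difference` and `apply_difference` are hypothetical.

```python
def make_difference(original: bytes, reference: bytes) -> bytes:
    """Compare the original data with the reference file to obtain difference data."""
    if len(original) != len(reference):
        raise ValueError("this sketch assumes equal-length original and reference")
    return bytes(o ^ r for o, r in zip(original, reference))


def apply_difference(difference: bytes, reference: bytes) -> bytes:
    """Apply difference data to the reference file to obtain the underlying data."""
    if len(difference) != len(reference):
        raise ValueError("difference and reference must be the same length")
    return bytes(d ^ r for d, r in zip(difference, reference))


reference = b"reference-file!!"
original = b"sensitive record"
difference = make_difference(original, reference)

# Only `difference` would be split, stored or transmitted; the reference file
# stays inside the protected environment.
restored = apply_difference(difference, reference)
```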
  • the methods may be implemented in computer software running on a computing device, for instance.
  • the software may be stored on media or transmitted as a signal.
  • the computing device or devices may be a desktop personal computer or server computer running a suitable operating system such as Windows®, Apple OS X or UNIX based systems.
  • An example computing device may include a hard drive or other storage medium, input devices such as a keyboard and mouse and a display screen.
  • the method steps may be carried out on a single machine, computer or group of computers connected to a network such as an intranet or the Internet.
  • FIG. 1 shows a flowchart of a method for storing data, according to an aspect of the present invention given by way of example only;
  • FIG. 1 a shows a flowchart of an alternative method similar to that shown in FIG. 1 ;
  • FIG. 2 shows a schematic diagram of the data stored using the method of FIG. 1 ;
  • FIG. 2 a shows a schematic diagram of the data stored using the method of FIG. 1 a;
  • FIG. 3 shows a schematic diagram of data stored according to the method of FIG. 1 ;
  • FIG. 3 a shows a schematic diagram of data stored according to the method of FIG. 1 a;
  • FIG. 4 shows a schematic diagram of the data distributed as clusters stored following the method of FIG. 1 ;
  • FIG. 4 a shows a schematic diagram of the data distributed as clusters stored following the method of FIG. 1 a;
  • FIG. 5 shows a flow diagram of a method of storing data according to a further aspect of the present invention, given by way of example only;
  • FIG. 6 shows a schematic diagram of a network used to store data according to a further aspect of the present invention;
  • FIG. 7 shows a schematic diagram of a communication system according to a further aspect of the present invention, given by way of example only;
  • FIG. 7 a shows a schematic diagram of a communication system according to a further aspect of the present invention, given by way of example only;
  • FIG. 8 shows a schematic diagram of a communication system according to a further aspect of the present invention, given by way of example only.
  • FIG. 8 a shows a schematic diagram of a communication system according to a further aspect of the present invention, given by way of example only.
  • Data to be stored may be in the form of a binary file, for instance.
  • the data may be divided into subsets of data.
  • Parity data may be generated from the subsets of data in such a way that if one or more of the data subsets is destroyed or lost that missing subset may be recreated from the remaining subsets and parity data.
  • Parity or control data is generated from the original data for the purpose of error checking or to enable lost data to be regenerated.
  • the parity data does not contain any additional information over that contained by the original data.
  • XOR: exclusive OR.
  • For a more detailed description of a calculation of parity data see http://www.pcguide.com/ref/hdd/perf/raid/concepts/genParity-c.html. Once the parity data has been calculated all of the data subsets and parity data may be stored in separate or remote file locations.
  • each of the data subsets or parity data may be separated into further subsets and further parity data may be generated in order to utilise any additional storage locations.
  • a cascade of data subsets may be created until all available storage locations are utilised or a predetermined limit in the number of locations is reached.
  • the data may be recovered using a reverse process with any missing data subsets being regenerated or recreated from the remaining data subsets and parity data using a suitable regeneration calculation or algorithm. The reading process continues until the single original data is recovered.
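The cascade and its reverse process can be sketched recursively. This is an illustrative sketch under stated assumptions, not the applicants' implementation: the split is byte-wise into two subsets with XOR parity and zero padding, every subset is split at every level, a dictionary stands in for the separate storage locations, and the original length is assumed to be recorded so that padding can be stripped. The names `cascade` and `rebuild` are hypothetical; the location keys follow the AA, AB, AP naming used in the description of FIG. 3.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def split(data: bytes):
    """Return subsets A and B (B zero-padded if needed) and XOR parity P."""
    a, b = data[0::2], data[1::2]
    if len(b) < len(a):
        b += b"\x00"
    return a, b, xor_bytes(a, b)


def cascade(data: bytes, depth: int, name: str = "", store=None):
    """Split recursively; only the lowest-level files are stored."""
    if store is None:
        store = {}
    if depth == 0:
        store[name] = data
        return store
    for suffix, part in zip("ABP", split(data)):
        cascade(part, depth - 1, name + suffix, store)
    return store


def rebuild(store, depth: int, length: int, name: str = ""):
    """Reverse the cascade; return None if this branch cannot be recovered."""
    if depth == 0:
        return store.get(name)
    half = (length + 1) // 2  # size of each child after padding
    a = rebuild(store, depth - 1, half, name + "A")
    b = rebuild(store, depth - 1, half, name + "B")
    if a is None or b is None:
        p = rebuild(store, depth - 1, half, name + "P")
        if p is None or (a is None and b is None):
            return None
        if a is None:
            a = xor_bytes(b, p)
        else:
            b = xor_bytes(a, p)
    merged = bytearray(2 * half)
    merged[0::2] = a  # even positions came from subset A
    merged[1::2] = b  # odd positions came from subset B
    return bytes(merged[:length])


original = b"distributed storage"
locations = cascade(original, depth=2)
```

In this sketch one file may be lost from each group of three at every level, so a whole first-level branch (for example AA, AB and AP together) can disappear and still be recreated from the B and P branches.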
  • authentication or hash codes may be associated with any of the data subsets and/or parity data for use in confirming the authenticity of the data subsets. Authentic data subsets will not have been changed or altered, deliberately or accidentally, following creation of the data subset. This alternative embodiment or its variations are described as authentication embodiments throughout the text.
  • FIG. 1 shows a flow diagram of an example method 10 for storing data.
  • the original data 20 is split into data subsets A and B in step 30 .
  • the data may be split into two equal parts, so that the subsets A and B are of equal size.
  • Zero padding, i.e. additional zero bytes or groups of bits, may be used to ensure equal sized subsets A and B.
  • the parity data P may be generated during the splitting or separation step 30 .
  • a hashing function h(n) may be applied at step 45 .
  • This hashing function generates hash codes h(A) and h(B).
  • the parity data P may also be hashed to generate hash code h(P).
  • the hashing function may be chosen such that the computational power to perform it or compare resultant hash codes is acceptable or within system limitations.
  • the hash function may be applied to subsets A, B and/or parity data P. A reduction in computer overhead may be made by not hashing one or more of the data subsets or parity data in any combination.
  • the resultant two data subsets A and B and parity data set P may be stored at step 50 .
  • the subsets A and B and parity data may be stored in memory or a hard drive for instance.
  • the method 10 may loop at this point. It is determined whether or not there are any further storage locations available or required at step 60 . If there are then the method loops back to step 30 where any or each of the data subsets A, B and/or parity data P are further split into new subsets and a further parity data set. The loop continues with each data subset and parity data being further divided and generated until there are no further storage locations available and the method stops at step 70 .
  • the hash or authentication codes may be stored together with the data subsets A and B and/or the parity data P, stored as header information or stored separately, perhaps in a dedicated hash library or store.
  • the hash generation may optionally be deferred until the lowest level of split data is reached, i.e. only the data which is actually stored rather than any intermediate data subsets. This provides improved efficiency.
  • the first iteration of the loop of method 10 results in three separate data files (A, B and P); two full iterations results in nine separate data files and three full iterations results in 27 separate data files.
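These file counts follow from each full iteration replacing every file with three (two data subsets and one parity set). As a worked sketch, assume an even two-way split with XOR parity so that each new file is half the size of its parent, and ignore zero padding and any header information. The symbols N, s, T and S are notation introduced here, not taken from the source. For an original data set of size S, after n full iterations:

```latex
N(n) = 3^{n}, \qquad
s(n) = \frac{S}{2^{n}}, \qquad
T(n) = N(n)\, s(n) = \left(\tfrac{3}{2}\right)^{n} S
```

Here N(n) is the number of stored files, s(n) the size of each file and T(n) the total stored volume. This gives N(1) = 3, N(2) = 9 and N(3) = 27 as stated, with the total stored volume growing by a factor of 1.5 per full iteration under these assumptions (for example 2.25 S after two iterations).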
  • three separate data files are generated (A, B and P) and three hash codes are generated (A h , B h and P h ).
  • the hash codes shown in FIG. 1 a may be generated for all stored data files and/or parity data to ensure that corruption or adjustment of the data has not occurred.
  • FIG. 2 shows a schematic diagram of the data resulting from a single iteration of the method shown in FIG. 1 .
  • Like method steps have the same reference numerals.
  • the original data set 20 is split byte-wise to generate data subset A and data subset B (i.e. block size of one byte).
  • the exclusive OR operation generates parity data P. Where there are three separate storage locations available the method 10 would stop at this stage resulting in a data cluster 150 having three distributed discrete data subsets A, B and P.
  • FIG. 2 a shows an alternative schematic diagram of the data including the hash codes.
  • FIG. 3 shows the result of a further iteration of steps 30 , 40 and 50 of method 10 .
  • nine separate storage locations are available and so each of the three data subsets A, B and P may be further split into three further data subsets each.
  • the hash codes are only required for the lowest level of data subsets and/or parity data AA, AB, AP, BA, BB, BP, PA, PB and PP as these are the only files that will be stored for later regeneration, i.e. they require authentication when they are read to ensure authenticity.
  • the various hash codes may be generated for the lowest level data sets in the cascade.
  • This additional recursive splitting 230 results in data subset A being split to form further data subsets AA and AB and further parity data AP.
  • data subset B may be split into BA and BB, which together may be used to form parity data BP.
  • Parity data P may be split into PA, PB and PP.
  • each of the three data subsets has the same size.
  • the nine separate data locations used to store each of these nine data subsets may form a second level cluster 250 , which is shown in more detail as FIG. 4 (see FIG. 4 a for the authentication embodiment).
  • the first level cluster 150 has been expanded to form a second level cluster 250 .
  • the loop in the method 10 may be repeated as many times as necessary until all available storage locations are used, a predetermined limit is reached or the size of each subset has been reduced to a particular level.
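  • The repeated loop can be pictured as a recursive function that returns only the lowest level of the cascade. The sketch below is illustrative: split3 and cascade are hypothetical helper names, the fixed level count stands in for the "storage locations available" test of step 60, and the zero-byte padding is an assumption.

```python
def split3(data: bytes) -> dict[str, bytes]:
    """One iteration: two byte-wise subsets plus XOR parity."""
    a = data[0::2]
    b = data[1::2].ljust(len(a), b"\x00")  # assumption: zero-pad odd lengths
    p = bytes(x ^ y for x, y in zip(a, b))
    return {"A": a, "B": b, "P": p}


def cascade(data: bytes, levels: int, prefix: str = "") -> dict[str, bytes]:
    """Split recursively and return the leaf files keyed by name (AA, AB, ...)."""
    if levels == 0:
        return {prefix: data}
    leaves: dict[str, bytes] = {}
    for name, part in split3(data).items():
        leaves.update(cascade(part, levels - 1, prefix + name))
    return leaves


original = bytes(range(64))
first_level = cascade(original, 1)   # A, B, P
second_level = cascade(original, 2)  # AA, AB, AP, BA, BB, BP, PA, PB, PP
third_level = cascade(original, 3)
```

  • Each full iteration multiplies the number of stored files by three, which matches the 3, 9 and 27 file counts given for one, two and three iterations.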
  • FIG. 5 shows a schematic diagram of a system 300 used to store data according to the method 10 shown in FIG. 1 .
  • the system shown in FIG. 5 shows additional optional steps used to enhance the security and reliability of the system 300 according to the authentication embodiment.
  • a central server 360 administers the method and receives a request from a user to enter the system 310 .
  • the user logs on and is provided with encryption keys 320 .
  • a set of hash-codes (which may be unique) may be generated at step 45 , which serves as a unique identifier for the file, which may be used to guarantee authenticity. Encryption keys may be used to generate the hash codes.
  • a file is being stored as data 20 .
  • a database 370 is used to store log-in information and encryption keys and also the name of files to be stored.
  • the user registers with the database to create a file name at step 340 and the data file is split into subsets A and B and parity data P is created from these data subsets.
  • Each of the data subsets and parity data are assigned an identifier at step 350 , which is also administered by the database 370 .
  • Separate storage locations are accessible over a network and form a pool of available storage locations 380 .
  • the server 360 may determine the maximum level of recursive splitting to be achieved, which may be determined by predefined preferences or system parameters.
  • the server 360 also monitors the availability of each individual separate storage location within the pool 380 .
  • the server 360 may administer the storage as a processing layer invisible to the user. In other words, once they have accessed the system the storage of data appears to the user as conventional storage and retrieval. The original data 20 may be retrieved from the pool of storage locations 380 whilst any missing data may be regenerated using the parity data P from any required data layer.
  • the server 360 keeps track of the level of data cascading and each data subset.
  • the server may also store and administer the hash codes, which may be stored separately or together with the data subsets and parity data.
  • the data subsets may be encrypted using the encryption keys and a tamper or distortion prevention facility may be incorporated using the hash-code. Therefore, the system 300 shown in FIG. 5 provides additional safety to the user storing sensitive information, as a third party having access to any or all of the individual separate storage locations within this storage pool 380 cannot recreate the original data 20 without the original encryption keys administered by the server 360 .
  • no encryption key may be required but there may be a prohibitive level of computing power needed to generate an altered data subset with the same hash code as the original.
  • the encryption keys may also be used to encrypt the data subsets for added security. Intercepting the transfer of data subsets between the storage pool 380 and the user by a third party also does not result in any data becoming available to them without the encryption keys, or obtaining copies of at least a minimum number of data subsets.
  • A further embodiment of a system used to perform the method 10 is shown in FIG. 6 .
  • the system 400 shown in FIG. 6 may be used to distribute information securely over networks such as the Internet or an intranet.
  • the Internet or subsets of web pages 420 may be distributed securely to a user machine 440 via a central server 410 .
  • the central server 410 takes the web pages 420 and stores them according to the method 10 shown in FIG. 1 within separate storage locations 430 .
  • the data subsets may be encrypted and/or hashed to provide authentication, as described with reference to FIG. 5 .
  • Central server 410 supplies the user machine 440 with a decryption code or codes and information to identify and locate data subsets from particular storage locations 430 and how to recreate the data forming the original web pages 420 .
  • the web pages 420 are no longer prone to a single point of failure or attack (for instance, a single web server going down) as the original data 20 is distributed amongst separate storage locations 430 . Furthermore, any third party intercepting the network traffic of the user computer 440 would not be able to decrypt or recreate the original data forming the web pages 420 without the decryption keys and regeneration information supplied by the central server 410 .
  • Alteration may be detected by rehashing the data subsets and/or parity data and comparing the resultant hash code with that associated with the original. Where a difference is detected this data subset or parity data may be rejected and recreated using only authenticated data sets and/or parity data. Only data subsets that fail authentication by the hash codes (or are otherwise lost or unavailable) need to be recreated or regenerated.
  • Such a secure system may be suitable for banking transactions or other forms of secure data or where the system user requires additional privacy and security.
  • the central server 410 may be able to store or cache the entire available Internet or any particular individual websites and make these available only to particular subscribing users.
  • the central server 410 may also perform the function of a search engine or other central consolidator of information. Querying the search engine in this way may render search results containing decryption keys and information used to locate and regenerate the websites or other retrievable documents.
  • a further use for such a storage system according to the authentication embodiment is to store and recreate high quality media avoiding distortion and missing data. For instance, higher quality audio or video recordings may be obtained due to the high level of error checking used.
  • Each data subset may be checked for authenticity (e.g. corruption) using the authentication or hash codes. Any data subset that fails this authentication test may be rejected and regenerated using the parity data and any data subsets that pass authentication (the parity data may also be checked).
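  • The authenticate, reject and regenerate sequence described above might look like the following sketch. SHA-256 from Python's hashlib is used purely as an example of a hash function; the function names and the dictionary layout keyed by "A", "B" and "P" are assumptions for illustration.

```python
import hashlib
from typing import Optional


def digest(chunk: bytes) -> str:
    """Example authentication code: a SHA-256 hex digest."""
    return hashlib.sha256(chunk).hexdigest()


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(i ^ j for i, j in zip(x, y))


def authenticate_and_repair(parts: dict[str, Optional[bytes]],
                            codes: dict[str, str]) -> dict[str, bytes]:
    """Keep parts whose hash matches; regenerate the one that is missing or fails."""
    good = {k: v for k, v in parts.items()
            if v is not None and digest(v) == codes[k]}
    if len(good) < 2:
        raise ValueError("at least two authenticated parts are needed")
    for name in set("ABP") - set(good):
        # With XOR parity, any one of A, B, P is the XOR of the other two.
        x, y = (good[k] for k in "ABP" if k != name)
        good[name] = xor_bytes(x, y)
    return good


a, b = b"HLO", b"EL!"
p = xor_bytes(a, b)
codes = {"A": digest(a), "B": digest(b), "P": digest(p)}
repaired = authenticate_and_repair({"A": a, "B": b"EL?", "P": p}, codes)
```

  • In this example the corrupted copy of B fails the hash comparison, is rejected, and is regenerated from the authenticated A and P.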
  • this storage method may be implemented on hard drives, optical discs such as CDs, DVDs and Blu-ray (RTM) and file encoding similar to MP3 and MPEG type encoding.
  • the method may be used to generate higher quality multimedia files.
  • FIG. 7 shows a schematic diagram of a communication system.
  • Two communication devices 500 , 510 transmit and receive data to and from each other. This may be via a communication network such as a cellular network or directly as in two-way radios.
  • voice data is used as an illustration.
  • many other types of data may also be transmitted and received such as for instance, video, web or Internet and data files.
  • voice data is split into data subsets and parity data using a similar method to that described with respect to FIG. 1 for data storage.
  • These data subsets A, B and parity data P are transmitted separately across individual channels C 1 , C 2 and C 3 .
  • These data sets may be transmitted according to other schemes together or separately and may be transmitted using different mediums, for instance a mixture of wireless, cable and fibre optic transmission.
  • the splitting function may be carried out within the communication device 500 or within a transmission network facility such as a mobile base station or similar.
  • a cellular telephone may be adapted by the addition of hardware to implement the described functions.
  • the functions may be implemented as software.
  • hash codes may be generated from hash or other authentication functions and associated with the data subsets prior to transmission. This authentication embodiment is illustrated in FIG. 7 a.
  • Data subsets A and B may be combined to form the original voice data as a reverse of the splitting procedure. If either subset A or B is lost, missing from the received transmission or fails a hashing match test then parity data P may be used to regenerate the missing data in a similar way to the retrieval of stored data described above. An eavesdropper receiving only one of channels C 1 , C 2 or C 3 will therefore not be able to reconstruct the voice data. Therefore, this provides a more secure as well as more reliable communication system and method. Security may be enhanced further by differing the mode, type or frequency of each channel. Integrity may be provided by the hash function authentication checks in the authentication embodiment shown in FIG. 7 a.
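  • A receiver-side sketch of the recombination described above is given below. It assumes the byte-wise even/odd split with XOR parity, and assumes that the original length is known to the receiver so that any padding can be removed; the function name recombine and the use of None for a lost channel are illustrative choices only.

```python
from typing import Optional


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(i ^ j for i, j in zip(x, y))


def recombine(a: Optional[bytes], b: Optional[bytes], p: Optional[bytes],
              length: int) -> bytes:
    """Rebuild the original data from any two of A, B and P (None means lost)."""
    if a is None and b is not None and p is not None:
        a = xor_bytes(b, p)
    elif b is None and a is not None and p is not None:
        b = xor_bytes(a, p)
    if a is None or b is None:
        raise ValueError("more than one of A, B and P is missing")
    out = bytearray()
    for x, y in zip(a, b):
        out += bytes((x, y))  # re-interleave: A holds even offsets, B odd
    return bytes(out[:length])  # drop any padding byte


voice = b"voice"
a = voice[0::2]
b = voice[1::2].ljust(len(a), b"\x00")  # assumption: zero-pad odd lengths
p = xor_bytes(a, b)
```

  • Calling recombine(None, b, p, len(voice)) or recombine(a, None, p, len(voice)) returns the original bytes, while a single channel on its own is not sufficient.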
  • FIG. 8 shows a schematic diagram of a further embodiment similar to that shown in FIG. 7 .
  • this further embodiment implements a further cascade or layer of data splitting before transmission.
  • a further level of recombination must be used to reconstruct the voice or other transmitted data.
  • this further cascade of data splitting and parity data generation requires nine channels to communicate each data subset and parity data.
  • Such an additional cascade provides further resilience to data loss.
  • the data transmitted from five of the channels may be lost with the data fully reconstructable (lossless).
  • Further cascade may be achieved providing further resilience.
  • other numbers of channels of data may be used.
  • the data may be split three, four or five ways or more at each cascade.
  • Further cascade levels may be implemented dependent on the required level of security or reliability. This further fills the available channel capacity but in doing so reduces the power requirements of each channel to maintain the same probability of data loss (Shannon or noisy-channel coding theorem).
  • any or each of the transmitted data subsets and/or parity data may have the hash function applied to them.
  • the hash codes may be transmitted to the receiver.
  • the communication system may also comprise an additional layer of security or functionality.
  • the communication device 510 receiving the data may require information as to which data subsets and parity data are transmitted over which particular channels. In the example shown in FIGS. 8 and 8 a , channel C 1 is used to transmit data subset AA, C 2 is used for AB, etc, however, any combination may be used.
  • Such information may be exchanged between communication devices 500 , 510 before or during transmission, by for instance transmission of a code denoting a particular combination of channels and data subsets.
  • the particular combination may vary during transmission and reception. This may be according to a prearranged or predetermined scheme or the particular current combination may be transmitted to keep the transmitter and receiver synchronised. Both communication devices 500 , 510 may both transmit and receive simultaneously or in isolation.
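  • One way to picture a "code denoting a particular combination of channels and data subsets" is as an index into an agreed list of assignments. The sketch below is an assumption made for illustration (the text does not specify how the code is formed); the names SUBSETS, CHANNELS, ASSIGNMENTS and mapping_for are hypothetical.

```python
from itertools import permutations

SUBSETS = ("A", "B", "P")
CHANNELS = ("C1", "C2", "C3")

# Assumption: both devices hold the same ordered list of possible assignments,
# so a small integer code is enough to select one of them.
ASSIGNMENTS = list(permutations(SUBSETS))


def mapping_for(code: int) -> dict[str, str]:
    """Return which subset travels on which channel for a given code."""
    return dict(zip(CHANNELS, ASSIGNMENTS[code % len(ASSIGNMENTS)]))


default_mapping = mapping_for(0)
rotated_mapping = mapping_for(3)
```

  • Changing the code during a transmission, whether by a prearranged schedule or by sending the new code, changes the assignment while keeping transmitter and receiver synchronised.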
  • the data may be stored or transmitted as difference or delta data relative to a reference file. Therefore, access to or knowledge of the reference file may be required in order to retrieve or receive the data.
  • This further security precaution may be used where there are practical or legal restrictions on transmitting or storing certain types of data. For instance, the storage of banking or confidential information may be restricted to a particular organisation or site. However, it may still be necessary to store these data such that the risk of their loss is reduced. Therefore, it may not be possible to distribute or transmit these types of data across different storage locations, as described previously, even using encryption. This problem may be addressed by instead transmitting and distributing the difference or delta data instead of the underlying data. In this situation, data protection requirements are met and the data may be secured against loss or corruption.
  • file A (or signal A) may be the underlying data required to be stored or transmitted.
  • File B may be the reference file.
  • a comparison of file A and file B may be made using a comparison function similar to UNIX diff, rdiff or rsync procedures to generate file C.
  • the difference file may be generated by applying the XOR function to file A and file B, perhaps byte-wise, for example.
  • File C is therefore a representation or encoding of the difference between file A and file B; file A cannot be regenerated from file C without knowledge or access to file B.
  • File B may take many different forms and may be a randomly generated string, a document, an audio file, a video file, the text of a book or any other known or generated data set, for example.
  • the benefit of using a known data file (e.g. an MP3 file of a well known song) is that the user must simply remember which particular file they used (perhaps an MP3 file of the user's favourite song). As there are millions of options to a user, security can remain relatively high even when a well-known data file is used.
  • a function may be used to apply the difference or delta file C to the reference file B.
  • Various methods may be used for regenerating file A depending on how the difference or delta file C was generated and encoded.
  • a further XOR function may be applied to files C and B to regenerate file A. This may be done on a byte-by-byte basis, for example. It is likely that files A and B will be of different sizes. Where file A is smaller than file B then the procedure may simply stop when each byte or file chunk has been compared. Where file A is larger than file B then multiple copies of file B may be used until each byte of file A has been compared. Other variations, difference procedures and comparison functions may be used.
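  • The byte-wise XOR form of the difference procedure, including the size handling described above, can be sketched as follows. The function name xor_with_reference and the sample byte strings are illustrative assumptions; itertools.cycle is used to repeat file B when file A is longer.

```python
from itertools import cycle


def xor_with_reference(data: bytes, reference: bytes) -> bytes:
    """XOR data byte-by-byte against the reference, repeating it if needed.

    zip() stops at the end of data, so a longer reference is simply not
    used past that point, and a shorter one is reused from its start.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return bytes(d ^ r for d, r in zip(data, cycle(reference)))


file_a = b"account: 12345678"   # underlying data (example value)
file_b = b"song"                # reference file (example value)
file_c = xor_with_reference(file_a, file_b)       # difference data
regenerated = xor_with_reference(file_c, file_b)  # same function reverses it
```

  • The same function both generates file C and regenerates file A, because applying XOR with file B twice returns the original bytes.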
  • the difference data may be generated as a data stream, i.e. transmitted, received and encoded or decoded in real time.
  • the difference data may be divided into data subsets with parity data generated so that these data subsets may be stored in a distributed way or transmitted according to the methods described above.
  • the reference file (B) may again be used to sequentially encode the data stream in real-time. Should the data stream exceed the length of the reference file then the reference file may be reused until transmission ends.
  • voice communication for example, each time transmission starts, the beginning of the reference file may be used for comparison with a digitised voice or audio data stream to generate the difference data stream.
  • reuse may be reduced by continuing from the last point used in the reference file for each new transmission. This alternative may further improve security.
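  • The alternative of continuing from the last point used in the reference file can be sketched with a small stateful coder. The class name ReferenceCoder and its interface are assumptions for illustration; the sketch also assumes that both devices process the same number of bytes so that their positions stay aligned.

```python
class ReferenceCoder:
    """XOR a stream against a reference file, remembering the position used."""

    def __init__(self, reference: bytes, offset: int = 0) -> None:
        if not reference:
            raise ValueError("reference must not be empty")
        self.reference = reference
        self.offset = offset

    def process(self, chunk: bytes) -> bytes:
        """Encode or decode one chunk; XOR makes both directions identical."""
        out = bytearray()
        for byte in chunk:
            out.append(byte ^ self.reference[self.offset])
            # Wrap around only when the reference file is exhausted.
            self.offset = (self.offset + 1) % len(self.reference)
        return bytes(out)


reference = b"favourite-song-bytes"
sender = ReferenceCoder(reference)
receiver = ReferenceCoder(reference)

first = receiver.process(sender.process(b"hello"))
second = receiver.process(sender.process(b"world!"))  # continues at offset 5
```

  • Keeping the offset between transmissions means the start of the reference file is not reused for every new transmission.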
  • the data may be stored on many different types of storage medium such as hard disks, FLASH RAM, web servers, FTP servers and network file servers or a mixture of these.
  • the files are described above as being split into two data subsets (A and B) and a single parity data block (P) during each iteration, but three (A, B and C), four (A-D) or more data subsets may be generated.
  • parity data is described in the example as being generated from the XOR function but other functions may be used. For instance, Hamming, Reed-Solomon, Golay, Reed-Muller or other suitable error correcting codes may be used.
  • the data subsets may be stored in physically separate or logically separate locations even within the same hard disk drive or cluster.

Abstract

Storing, retrieving, transmitting and receiving data (20) by a) separating the data into a plurality of data subsets (A, B); b) generating parity data (P) from the plurality of data subsets (A, B) such that any one or more of the plurality of data subsets may be recreated from the remaining data subsets and the parity data (P). Steps a and b may be repeated on one or more of the plurality of data subsets and parity data providing further data subsets and further parity data; and d) storing each of the further data subsets and further parity data in separate storage locations (380) or transmitting the further data subsets and further parity data.

Description

    FIELD OF THE INVENTION
  • The present invention relates to a method and system for storing and communicating data and in particular for storing data across separate storage locations, and transmitting and receiving data.
  • BACKGROUND OF THE INVENTION
  • Data may be stored within a computer system using many different techniques. Should an individual computer system such as a desktop or laptop computer be stolen or lost the data stored on it may also be lost with disastrous effects. Backing up the data on a separate drive may maintain the data but sensitive information may still be lost and made available to third parties. Even where the entire system is not lost or stolen, individual disk drives or other storage devices may fail leading to a loss of data with similar catastrophic effects.
  • A RAID (redundant array of inexpensive drives) array may be configured to store data under various conditions. RAID arrays use disk mirroring and additional optional parity disks to protect against individual disk failures. However, a RAID array must be configured in advance with a fixed number of disks each having a predetermined capacity. The configuration of RAID arrays cannot be changed dynamically without rebuilding the array and this may result in significant system downtime. For instance, should a RAID array run out of space then additional disks may not be added easily to increase the overall capacity of the array without further downtime. RAID arrays also cannot easily deal with more than two disk failures and separate RAID arrays cannot be combined easily.
  • Although the disks that make up a RAID array may be located at different parts of a network, configuring multiple disks in this way is difficult and it is not convenient to place the disks at separate locations. Therefore, even though RAID arrays may be resilient to one or two disk failures a catastrophic event such as a fire or flood may result in the destruction of all of the data in a RAID array as disks are usually located near to each other.
  • Nested level RAID arrays may improve resilience to further failed disks but these systems are complicated, expensive and cannot be expanded without rebuilding the array.
  • Similarly, portions of transmitted data may also be lost, corrupted or intercepted, especially over noisy or insecure channels.
  • Furthermore, current data storage and/or transmission methods and devices are prone to corruption and data loss. Even small levels of corruption may affect data quality. This is especially so where the data is used to record high quality audio or visual material as corruption can lead to distortion and loss of quality during playback or from received media.
  • Therefore, there is required a storage method and system for data that overcomes these problems.
  • SUMMARY OF THE INVENTION
  • In accordance with a first aspect of the present invention there is provided a method of storing data comprising the steps of a) separating the data into a plurality of data subsets; b) generating parity data from the plurality of data subsets such that any one or more of the plurality of data subsets may be recreated from the remaining data subsets and the parity data; c) repeating steps a and b on each of the plurality of data subsets and parity data providing further data subsets and further parity data; and d) storing each of the further data subsets and further parity data in separate storage locations. Therefore, the data are separated recursively so that they are distributed as separate data subsets and parity data in several separate storage locations. An original data set is divided into subsets. A parity data set is created from the subsets. The parity data set provides a mechanism for recreating any of the subsets should they be lost or corrupted. The subsets and parity data may contribute to recreating the original data set should this be required. If only parity data subsets are lost then no further processing is required and the original data remains. If no subsets are lost then again, no further processing is required. The process continues with any of the subsets and parity data sets being divided again in a similar way and further parity data being generated. This forms a cascade of data subsets that may be brought back together to form the original data. If any subsets of data are lost then they may be recreated from the remaining subsets and parity data at that particular level in the cascade of data. Therefore, the cascade may grow dynamically and does not need to be defined in advance. Each subset of data or parity data may be stored separately, for instance in neighbouring sections on a disk drive or on separate servers in different organisations or territories. 
Also, the actual storage locations may be of different sizes or types and so this provides further flexibility to the storage system.
  • The locations of each data subset and parity data may be recorded so that the original data set may be recreated. Therefore, the location information may be used to restrict access to the original data as access to individual data subsets may not provide a third party access to the original data without the remaining subsets.
  • Each of the plurality of data subsets and parity data may be stored on a separate storage location to the other data subsets and parity data. However, more than one data subset or parity data may be stored on a single storage location especially if the number of storage locations available is limited.
  • Preferably, each of the further data subsets and further parity data may be stored in separate physical devices.
  • Preferably, step c) may be repeated for each of the plurality of data subsets and parity data. This distributes and cascades the data more effectively and so improves resilience to data loss and also interception.
  • Optionally, the method may further comprise the steps of providing additional storage locations and repeating steps a and b on any of the further data subsets or parity data stored in the separate storage locations as the additional storage locations are provided. This allows a dynamic growth of the data cascade and allows the storage capacity to be increased without rebuilding the entire system.
  • Advantageously, the data are separated byte-wise. However, other separation methods may be used such as bit-wise or by different lengths of bits. The data subsets may also be of different sizes.
  • Optionally, the data are separated into two data subsets.
  • Optionally, the data are separated according to their odd or even status.
  • Preferably, the parity data are generated by performing a logical function on the plurality of data subsets. The logical function may be chosen to reduce processing requirements. Parity data generation is not limited to a logical function. For instance, data duplication may also be used.
  • Preferably, the logical function may be an exclusive OR. This function (XOR) requires a particularly low processing overhead and so improves efficiency. Furthermore, such a function may be carried out using straightforward hardware.
  • Optionally, the method further comprises the step of encrypting the data. This provides enhanced security and/or privacy.
  • Optionally, the separate storage locations are selected from the group consisting of hard disk drive, optical disk, FLASH RAM, web server, FTP server and network file server. Other storage mediums may be used and are not limited to read/write locations. The method may be independent of the specific type of storage used. Many other storage types and locations may be used.
  • Advantageously, the data are web pages or individual files. Web pages or web sites may then be distributed or accessed more securely and reduce eavesdropping or other forms of surveillance. For instance, it may not be possible to generate or recover the original data from individual subsets of data. A minimum quantity of data subsets may be required. Even with access to all subsets of data it may be possible to restrict the ability to recover the original data, for instance by using encryption or requiring details of how the original subsets of data were separated and created.
  • Optionally, the method further comprises the step of c1) applying a function to any one or more of the data subsets and parity data to generate one or more associated authentication codes.
  • Preferably, the function may be a hash function.
  • Optionally, the hash function may be selected from the group consisting of: checksums, check digits, fingerprints, randomizing functions, error correcting codes, and cryptographic hash functions.
  • Optionally, the authentication codes may be stored with the further data subsets and/or further parity data.
  • Optionally, the authentication codes may be stored as header information.
  • In accordance with a second aspect of the present invention there is provided a method of retrieving data stored in separate storage locations comprising the steps of: a) recovering subsets of data and parity data from the separate storage locations; b) recreating any missing subsets of data from the recovered subsets of data and parity data to form recreated subsets of data; c) combining the subsets of data and any recreated subsets of data to form a plurality of consolidated data sets, wherein the plurality of consolidated data sets include further subsets of data and further parity data; and d) recreating any missing further subsets of data from the further subsets of data and further parity data to form recreated further subsets of data; and e) combining the further subsets of data and any recreated further subsets of data to form an original set of data. Subsets of data are combined and then recombined until the original data set is recovered.
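  • The retrieval steps above, applied to a two-level cascade, can be sketched as a recursive rebuild. This is an illustration under stated assumptions: the helper names are hypothetical, XOR parity with a byte-wise even/odd split is assumed, and the example data length is a multiple of four so that no padding is needed.

```python
from typing import Optional


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(i ^ j for i, j in zip(x, y))


def cascade(data: bytes, levels: int, prefix: str = "") -> dict[str, Optional[bytes]]:
    """Store side: split recursively and return leaf files keyed by name."""
    if levels == 0:
        return {prefix: data}
    a, b = data[0::2], data[1::2]
    leaves: dict[str, Optional[bytes]] = {}
    for name, part in (("A", a), ("B", b), ("P", xor_bytes(a, b))):
        leaves.update(cascade(part, levels - 1, prefix + name))
    return leaves


def rebuild(leaves: dict[str, Optional[bytes]], levels: int,
            prefix: str = "") -> Optional[bytes]:
    """Retrieve side: recreate missing subsets level by level, then combine."""
    if levels == 0:
        return leaves.get(prefix)
    a, b, p = (rebuild(leaves, levels - 1, prefix + n) for n in "ABP")
    if a is None and b is not None and p is not None:
        a = xor_bytes(b, p)
    elif b is None and a is not None and p is not None:
        b = xor_bytes(a, p)
    if a is None or b is None:
        return None  # too much lost at this level to recreate the data
    return bytes(v for pair in zip(a, b) for v in pair)


original = bytes(range(16))
stored = cascade(original, 2)
for lost in ("AA", "AB", "AP", "BA"):  # four of the nine locations unavailable
    stored[lost] = None
recovered = rebuild(stored, 2)
```

  • In this example the whole of A is lost, so B is first completed from BB and BP, P is combined from PA and PB, and A is then recreated from B and P before the final combination. Not every pattern of losses is recoverable: losing AA, AB, BA and BB, for instance, leaves neither A nor B available in this sketch.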
  • Advantageously, the subsets of data and parity data may be each recovered from separate physical devices. Therefore, each data subset may have its own individual storage location remote from the others, further enhancing security and data reliability.
  • Optionally, the original data may be encrypted and the method further comprises the step of: f) decrypting the original data.
  • Preferably, the method may further comprise the step of receiving location information of the separate storage locations. This allows easier access and the location information may be further used to restrict access.
  • Preferably, the separate storage locations are one or more selected from the group consisting of hard disk drive, optical disk, FLASH RAM, web server, FTP server and network file server.
  • Advantageously, the separate storage locations may be accessible over a network. The network may be internal or external and may for example be the Internet.
  • Optionally, the method may further comprise the steps of: recovering authentication codes associated with one or more of the subsets of data and parity data;
  • authenticating one or more of the subsets of data and parity data using the associated authentication codes; and
  • recreating any subsets of data that fail authentication from the recovered subsets of data and parity data to form recreated subsets of data.
  • Optionally, the authentication codes may be hash codes and the authentication step may comprise applying a hash function to the subsets of data and/or parity data to generate a comparison hash code and comparing this comparison hash code with the authentication codes associated with the data subsets and/or parity data.
  • In accordance with a third aspect of the present invention there is provided apparatus for storing data comprising a processor arranged to: a) separate the data into a plurality of data subsets; b) generate parity data from the plurality of data subsets such that any one or more of the plurality of data subsets may be recreated from the remaining data subsets and the parity data; c) execute a and b on each of the plurality of data subsets and parity data providing further data subsets and further parity data; and d) store each of the further data subsets and further parity data in separate storage locations.
  • Optionally, the processor may be further arranged to apply a function to any one or more of the data subsets and parity data to generate one or more associated authentication codes, and further wherein the further data subsets and further parity data may be stored with their associated authentication codes.
  • In accordance with a fourth aspect of the present invention there is provided apparatus for retrieving data stored in separate storage locations comprising a processor arranged to: a) recover subsets of data and parity data from the separate storage locations; b) recreate any missing subsets of data from the recovered subsets of data and parity data to form recreated subsets of data; c) combine the subsets of data and any recreated subsets of data to form a plurality of consolidated data sets, wherein the plurality of consolidated data sets include further subsets of data and further parity data; and d) recreate any missing further subsets of data from the further subsets of data and further parity data to form recreated further subsets of data; and e) combine the further subsets of data and any recreated further subsets of data to form an original set of data.
  • Optionally, the processor may be further arranged to recover authentication codes associated with the one or more of the subsets of data and parity data, authenticate one or more of the subsets of data and parity data using the associated authentication codes, and recreate any subsets of data that fail authentication from the recovered subsets of data and parity data to form recreated subsets of data.
  • In accordance with a fifth aspect of the present invention there is provided a data storage medium for storing a data file, containing data subsets, parity data and authentication codes,
  • wherein the data subsets are combinable to produce further data subsets and the further data subsets are combinable to produce the data file,
  • the authentication codes provide authentication of the data subsets, and
  • further wherein the parity data are combinable with the data subsets to regenerate missing data subsets or data subsets which fail authentication.
  • Optionally, the storage medium may be selected from the group consisting of: compact disc, DVD, hard drive, solid state drive, FLASH memory and digital tape.
  • Optionally, the data file may be selected from the group consisting of: multimedia, audio, video, MPEG, MP3, music, database and binary file.
  • In accordance with a sixth aspect of the present invention there is provided a method of transmitting data comprising the steps of: a) separating the data to be transmitted into a plurality of data subsets; b) generating parity data from the plurality of data subsets such that any one or more of the plurality of data subsets may be recreated from the remaining data subsets and the parity data; and c) transmitting the plurality of data subsets and parity data. This allows data to be transmitted more securely and more reliably as lost data may be recreated. Furthermore, as data is generated any communication channel used may be more fully utilised either reducing error rate or allowing a lower power to be used maintaining a similar available data rate.
  • Optionally, the transmitting step may further comprise the steps of: i) repeating steps a and b on any one or more of the plurality of data subsets and parity data providing further data subsets and further parity data; and ii) transmitting the further data subsets and further parity data. This provides increased reliability and security.
  • Optionally, any or all of the data subsets and further parity data are transmitted by different transmission means.
  • Advantageously, the different transmission means may be one or more selected from the group consisting of: wire, radio wave, internet protocol and mobile communication.
  • Preferably, any or all of the data subsets are transmitted on different channels.
  • Advantageously, the channels may be mobile communication channels. Therefore, this may be implemented in mobile telephones to increase security and reliability of communication.
  • Optionally, the different channels are different radio frequencies.
  • Preferably, the choice of different channels is predetermined. This allows a receiver to receive the data successfully or more conveniently.
  • Optionally, the method may further comprise the step of transmitting the choice of different channels.
  • Preferably, the choice may be transmitted as a code. This may be user selectable or automatic. The code may be known to both transmitter and receiver or transmitted securely between them.
  • Optionally, the method may further comprise the step of encrypting the data. This adds to security. The code may instead or additionally be encrypted.
  • In accordance with a seventh aspect of the present invention there is provided a method of receiving data comprising the steps of: a) receiving subsets of data and parity data; b) recreating any missing subsets of data from the received subsets of data and parity data to form recreated subsets of data; c) combining the subsets of data and any recreated subsets of data.
  • Optionally, the recreated subsets of data form a plurality of consolidated data sets, and further wherein the plurality of consolidated data sets include further subsets of data and further parity data, and the combining step further comprises the steps of: d) recreating any missing further subsets of data from the further subsets of data and further parity data to form recreated further subsets of data; and e) combining the further subsets of data and any recreated further subsets of data to form an original set of data.
  • Optionally, the received data may be encrypted and the method further comprises the step of: f) decrypting the original data.
  • Advantageously, the receiving step may further comprise receiving any or all of the subsets of data and parity data from different channels.
  • Preferably, the different channels may be different radio frequencies.
  • Advantageously, the different channels may be different cellular radio channels.
  • Optionally, the method may further comprise the step of receiving channel information including details of which channels contain which data subsets and parity data.
  • Advantageously, the combining step may further comprise combining the subsets of data and any recreated subsets of data based on the received channel information.
  • Optionally, the channels carrying any or all of the data subsets and parity data vary during reception. This channel hopping makes it more difficult for an unauthorised recipient to decode the original data or listen in to a voice call.
  • Preferably, the data may be selected from the group consisting of: audio, mobile telephone, packet data, video, real time duplex data and Internet data. Additionally, this method is suitable for other data types.
  • According to an eighth aspect of the present invention there is provided apparatus for transmitting data comprising a processor arranged to: a) separate the data to be transmitted into a plurality of data subsets; b) generate parity data from the plurality of data subsets such that any one or more of the plurality of data subsets may be recreated from the remaining data subsets and the parity data; and c) transmit the plurality of data subsets and parity data. The processor may have processing logic stored as hardware or software.
  • Advantageously, the processor may be further arranged to transmit by: i) repeating a and b on any one or more of the plurality of data subsets and parity data providing further data subsets and further parity data; and ii) transmit the further data subsets and further parity data.
  • According to a ninth aspect of the present invention there is provided apparatus for receiving data comprising a processor arranged to: a) receive subsets of data and parity data; b) recreate any missing subsets of data from the received subsets of data and parity data to form recreated subsets of data; c) combine the subsets of data and any recreated subsets of data.
  • Advantageously, the recreated subsets of data form a plurality of consolidated data sets, and further wherein the plurality of consolidated data sets include further subsets of data and further parity data, and the processor is further arranged to combine the subsets of data by: d) recreating any missing further subsets of data from the further subsets of data and further parity data to form recreated further subsets of data; and e) combine the further subsets of data and any recreated further subsets of data to form an original set of data. In other words, the data may be cascaded to form further subsets and further parity data. The original data may be regenerated by reversing the cascade process generating any missing data subsets from parity data and the successfully received data.
  • According to a tenth aspect of the present invention there is provided a mobile handset comprising the previously described apparatus, i.e. a transmitter and/or a receiver.
  • Optionally, the data described above, i.e. the original data, transmitted data, voice data or data to be secured and stored, may be difference data relative to a reference data file and the method may further comprise the step of comparing the original data with the reference data file to obtain the difference data. This allows data to be stored or transmitted securely without requiring underlying data to leave a restricted or protected environment. This optional feature may be implemented in the methods or apparatus described above.
  • When the difference or delta data are retrieved from storage locations or received from a transmitter, in the form of combined subsets of data (or regenerated subsets where particular subsets have been lost, corrupted or are otherwise unavailable or fail authorisation), the processor may be further arranged or the method may include the step of applying the difference data to the reference data file to obtain underlying data.
  • The methods may be implemented in computer software running on a computing device, for instance. The software may be stored on media or transmitted as a signal. For example, the computing device or devices may be a desktop personal computer or server computer running a suitable operating system such as Windows®, Apple OS X or UNIX based systems. An example computing device may include a hard drive or other storage medium, input devices such as a keyboard and mouse and a display screen.
  • Optionally, the method steps may be carried out on a single machine, computer or group of computers connected to a network such as an intranet or the Internet.
  • BRIEF DESCRIPTION OF THE FIGURES
  • The present invention may be put into practice in a number of ways and embodiments will now be described by way of example only and with reference to the accompanying drawings, in which:
  • FIG. 1 shows a flowchart of a method for storing data, according to an aspect of the present invention given by way of example only;
  • FIG. 1 a shows a flowchart of an alternative method similar to that shown in FIG. 1;
  • FIG. 2 shows a schematic diagram of the data stored using the method of FIG. 1;
  • FIG. 2 a shows a schematic diagram of the data stored using the method of FIG. 1 a;
  • FIG. 3 shows a schematic diagram of data stored according to the method of FIG. 1;
  • FIG. 3 a shows a schematic diagram of data stored according to the method of FIG. 1 a;
  • FIG. 4 shows a schematic diagram of the data distributed as clusters stored following the method of FIG. 1;
  • FIG. 4 a shows a schematic diagram of the data distributed as clusters stored following the method of FIG. 1 a;
  • FIG. 5 shows a flow diagram of a method of storing data according to a further aspect of the present invention, given by way of example only;
  • FIG. 6 shows a schematic diagram of a network used to store data according to a further aspect of the present invention;
  • FIG. 7 shows a schematic diagram of a communication system according to a further aspect of the present invention, given by way of example only;
  • FIG. 7 a shows a schematic diagram of a communication system according to a further aspect of the present invention, given by way of example only;
  • FIG. 8 shows a schematic diagram of a communication system according to a further aspect of the present invention, given by way of example only; and
  • FIG. 8 a shows a schematic diagram of a communication system according to a further aspect of the present invention, given by way of example only.
  • It should be noted that the figures are illustrated for simplicity and are not necessarily drawn to scale.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • Data to be stored may be in the form of a binary file, for instance. The data may be divided into subsets of data. Parity data may be generated from the subsets of data in such a way that if one or more of the data subsets is destroyed or lost that missing subset may be recreated from the remaining subsets and parity data. Parity or control data is generated from the original data for the purpose of error checking or to enable lost data to be regenerated. However, the parity data does not contain any additional information over that contained by the original data. There are several logical operations that may achieve the generation of such parity data. For instance, applying an exclusive or (XOR) to two binary numbers results in a third binary number, which is the parity number. Should either of the original two binary numbers be lost then it may be recovered by simply performing an XOR between the remaining original number and the parity number. For a more detailed description of a calculation of parity data see http://www.pcguide.com/ref/hdd/perf/raid/concepts/genParity-c.html. Once the parity data has been calculated all of the data subsets and parity data may be stored in separate or remote file locations.
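  • The XOR relationship described above can be illustrated with a minimal sketch. The two 8-bit values are arbitrary examples chosen for illustration and do not come from the specification.

```python
# Illustrative values only; any two binary numbers of equal width would do.
a = 0b10110010
b = 0b01101100

# The parity number is the exclusive OR of the two original numbers.
parity = a ^ b

# If either original is lost, XOR the surviving original with the parity number.
recovered_a = b ^ parity
recovered_b = a ^ parity
```

  • The parity number carries no information beyond what a and b already contain, yet either of them can be rebuilt from the other two values.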
  • However, each of the data subsets or parity data may be separated into further subsets and further parity data may be generated in order to utilise any additional storage locations. In this way a cascade of data subsets may be created until all available storage locations are utilised or a predetermined limit in the number of locations is reached. The data may be recovered using a reverse process with any missing data subsets being regenerated or recreated from the remaining data subsets and parity data using a suitable regeneration calculation or algorithm. The reading process continues until the single original data is recovered.
  • In one alternative embodiment, authentication or hash codes may be associated with any of the data subsets and/or parity data for use in confirming the authenticity of the data subsets. Authentic data subsets will not have changed or altered deliberately or accidentally following creation of the data subset. This alternative embodiment or its variations are described as authentication embodiments throughout the text.
  • FIG. 1 shows a flow diagram of an example method 10 for storing data. The original data 20 is split into data subsets A and B in step 30. The data may be split into two equal parts, so that the subsets A and B are of equal size. Zero padding may be used to ensure equal sized subsets A and B. For example, additional zero bytes (or groups of bits) may be added to the end of subsets A and B before the parity data P is generated. After the data 20 has been split into subsets A and B an exclusive OR operation may be carried out on subsets A and B, at step 40, to generate parity data set P. Alternatively, the parity data P may be generated during the splitting or separation step 30.
  • In the authentication embodiment method shown as a flow diagram 10′ in FIG. 1 a, after the generation of data subsets A and B, a hashing function h(n) may be applied at step 45. This hashing function generates hash codes h(A) and h(B). The parity data P may also be hashed to generate hash code h(P). The hashing function may be chosen such that the computational power to perform it or compare resultant hash codes is acceptable or within system limitations. The hash function may be applied to subsets A, B and/or parity data P. A reduction in computer overhead may be made by not hashing one or more of the data subsets or parity data in any combination.
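  • As a hedged illustration of step 45, the sketch below generates hash codes h(A), h(B) and h(P). The specification does not name a hashing function; SHA-256 from the Python standard library is assumed here purely as an example, and the helper name hash_code and the sample bytes are hypothetical.

```python
import hashlib

def hash_code(block):
    """Return a hex digest for one data subset or parity block (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()

# Hypothetical subsets A and B with their XOR parity P.
subset_a = b"HLO"
subset_b = b"EL\x00"
parity_p = bytes(x ^ y for x, y in zip(subset_a, subset_b))

# Any of the three codes may be omitted to reduce computing overhead.
codes = {
    "A": hash_code(subset_a),
    "B": hash_code(subset_b),
    "P": hash_code(parity_p),
}
```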
  • The resultant two data subsets A and B and parity data set P (and optional hash codes) may be stored at step 50. The subsets A and B and parity data may be stored in memory or a hard drive for instance. The method 10 may loop at this point. It is determined whether or not there are any further storage locations available or required at step 60. If there are then the method loops back to step 30 where any or each of the data subsets A, B and/or parity data P are further split into new subsets and a further parity data set. The loop continues with each data subset and parity data being further divided and generated until there are no further storage locations available and the method stops at step 70.
  • In the authentication embodiments, the hash or authentication codes may be stored together with the data subsets A and B and/or the parity data P, stored as header information or stored separately, perhaps in a dedicated hash library or store.
  • Where additional storage locations are available and further looping of the method occurs, the hash generation may optionally be deferred until the lowest level of split data is reached, i.e. only the data which is actually stored rather than any intermediate data subsets. This provides improved efficiency.
  • In the non-authentication embodiment, the first iteration of the loop of method 10 results in three separate data files (A, B and P); two full iterations result in nine separate data files and three full iterations result in 27 separate data files. Alternatively, it may not be necessary to split each data subset to the same degree. Where there are many storage locations available the subsets may be split to create further subsets until subsets of a predetermined minimum size are created. Further utilisation of storage locations may then alternatively involve simple duplication in order to improve resilience to data loss.
  • For the authentication embodiment shown in FIG. 1 a, three separate data files are generated (A, B and P) and three hash codes are generated (Ah, Bh and Ph).
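  • With the data 20 being split into nine separate locations, any three of those data sets may be lost or corrupted (detectable via optional hash code comparison) and it remains possible always to recreate the original data set 20, because each group of three tolerates the loss of one member and the top level tolerates the loss of one whole group. More than three, and up to five, may even be lost and still result in accurate regeneration of the original data set 20, but this cannot be guaranteed as it depends on which particular sets are lost; for example, losing two members from each of two groups leaves both of those groups unrecoverable.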
  • The hash codes shown in FIG. 1 a, may be generated for all stored data files and/or parity data to ensure that corruption or adjustment of the data has not occurred.
  • FIG. 2 shows a schematic diagram of the data resulting from a single iteration of the method shown in FIG. 1. Like method steps have the same reference numerals. The original data set 20 is split byte-wise to generate data subset A and data subset B (i.e. block size of one byte). The exclusive OR operation generates parity data P. Where there are three separate storage locations available the method 10 would stop at this stage resulting in a data cluster 150 having three distributed discrete data subsets A, B and P.
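  • A single iteration of the byte-wise split with zero padding and XOR parity might be sketched as follows. The function names split_with_parity and join_subsets are hypothetical, and passing the original length to the joining step so that padding can be removed is an assumption; the specification does not say how padding is stripped.

```python
def split_with_parity(data):
    """Split data byte-wise into subsets A and B and compute parity P = A XOR B."""
    a = data[0::2]                    # bytes 0, 2, 4, ...
    b = data[1::2]                    # bytes 1, 3, 5, ...
    b = b.ljust(len(a), b"\x00")      # zero-pad B so both subsets are of equal size
    p = bytes(x ^ y for x, y in zip(a, b))
    return a, b, p

def join_subsets(a, b, length):
    """Interleave A and B again and drop any padding beyond the original length."""
    out = bytearray()
    for x, y in zip(a, b):
        out += bytes((x, y))
    return bytes(out[:length])

original = b"HELLO"
a, b, p = split_with_parity(original)

# Subset A is treated as lost and regenerated from B and P before joining.
regenerated_a = bytes(x ^ y for x, y in zip(b, p))
restored = join_subsets(regenerated_a, b, len(original))
```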
  • FIG. 2 a shows an alternative schematic diagram of the data including the hash codes.
  • FIG. 3 shows the result of a further iteration of steps 30, 40 and 50 of method 10. In this case, nine separate storage locations are available and so each of the three data subsets A, B and P may be further split into three further data subsets each.
  • As shown in FIG. 3 a, in the authentication embodiment, the hash codes are only required for the lowest level of data subsets and/or parity data AA, AB, AP, BA, BB, BP, PA, PB and PP as these are the only files that will be stored for later regeneration, i.e. they require authentication when they are read to ensure authenticity.
  • The various hash codes may be generated for the lowest level data sets in the cascade.
  • This additional recursive splitting 230 results in data subset A being split to form further data subsets AA and AB and further parity data AP. Similarly, data subset B may be split into BA and BB, which together may be used to form parity data BP. Parity data P may be split into PA, PB and PP. For this particular embodiment of the method each of the three data subsets have the same size. The nine separate data locations used to store each of these nine data subsets may form a second level cluster 250, which is shown in more detail as FIG. 4 (see FIG. 4 a for the authentication embodiment).
  • In other words, the first level cluster 150 has been expanded to form a second level cluster 250. There is therefore no need to store the original three data sets A, B and P (but this may be done anyway as an alternative method for additional resilience to data loss) as these may each be recreated from the nine data subsets in the second level cluster 250. The loop in the method 10 may be repeated as many times as necessary until all available storage locations are used, a predetermined limit is reached or the size of each subset has been reduced to a particular level.
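  • The cascade and its reversal can be sketched recursively. This is a minimal illustration under stated assumptions rather than the claimed implementation: the helper names encode and decode are hypothetical, every block is split byte-wise, and the original length is supplied to the decoder so that zero padding can be discarded.

```python
def xor(x, y):
    return bytes(i ^ j for i, j in zip(x, y))

def encode(data, levels, prefix=""):
    """Return {label: bytes} for the lowest-level blocks, e.g. AA, AB, AP, ... PP."""
    if levels == 0:
        return {prefix: data}
    a = data[0::2]
    b = data[1::2].ljust(len(a), b"\x00")
    leaves = {}
    for label, part in (("A", a), ("B", b), ("P", xor(a, b))):
        leaves.update(encode(part, levels - 1, prefix + label))
    return leaves

def decode(leaves, levels, length, prefix=""):
    """Rebuild the block named by prefix, or return None if too many blocks are missing."""
    if levels == 0:
        return leaves.get(prefix)
    half = (length + 1) // 2          # A, B and P all have this size
    a, b, p = (decode(leaves, levels - 1, half, prefix + s) for s in "ABP")
    if a is None and b is not None and p is not None:
        a = xor(b, p)                 # regenerate A from B and parity
    if b is None and a is not None and p is not None:
        b = xor(a, p)                 # regenerate B from A and parity
    if a is None or b is None:
        return None
    out = bytearray()
    for x, y in zip(a, b):
        out += bytes((x, y))
    return bytes(out[:length])

data = b"distributed storage"
cluster = encode(data, 2)             # second level cluster of nine blocks
damaged = {k: v for k, v in cluster.items() if k not in ("AB", "BP", "PA")}
restored = decode(damaged, 2, len(data))
```

  • Calling encode with levels set to 1, 2 or 3 yields 3, 9 or 27 blocks respectively, matching the counts given for successive iterations of the loop.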
  • FIG. 5 shows a schematic diagram of a system 300 used to store data according to the method 10 shown in FIG. 1. The system shown in FIG. 5 shows additional optional steps used to enhance the security and reliability of the system 300 according to the authentication embodiment. A central server 360 administers the method and receives a request from a user to enter the system 310. The user logs on and is provided with encryption keys 320. Furthermore, a set of hash-codes (which may be unique) may be generated at step 45, which serves as a unique identifier for the file, which may be used to guarantee authenticity. Encryption keys may be used to generate the hash codes. In this particular embodiment a file is being stored as data 20. A database 370 is used to store log-in information and encryption keys and also the name of files to be stored. The user registers with the database to create a file name at step 340 and the data file is split into subsets A and B and parity data P is created from these data subsets. Each of the data subsets and parity data are assigned an identifier at step 350, which is also administered by the database 370. Separate storage locations are accessible over a network and form a pool of available storage locations 380. The server 360 may determine the maximum level of recursive splitting to be achieved, which may be determined by predefined preferences or system parameters. The server 360 also monitors the availability of each individual separate storage location within the pool 380.
  • In this way, individual users may back up particular files or their entire data storage system over any particular number of separate storage locations from an available pool 380. The server 360 may administer the storage as a processing layer invisible to the user. In other words, once they have accessed the system the storage of data appears to the user as conventional storage and retrieval. The original data 20 may be retrieved from the pool of storage locations 380 whilst any missing data may be regenerated using the parity data P from any required data layer. The server 360 keeps track of the level of data cascading and each data subset. The server may also store and administer the hash codes, which may be stored separately or together with the data subsets and parity data.
  • Furthermore, the data subsets may be encrypted using the encryption keys and a tamper or distortion prevention facility may be incorporated using the hash-code. Therefore, the system 300 shown in FIG. 5 provides additional safety to the user storing sensitive information, as a third party having access to any or all of the individual separate storage locations within this storage pool 380 cannot recreate the original data 20 without the original encryption keys administered by the server 360. Alternatively, no encryption key may be required but there may be a prohibitive level of computing power needed to generate an altered data subset with the same hash code as the original. The encryption keys may also be used to encrypt the data subsets for added security. Intercepting the transfer of data subsets between the storage pool 380 and the user by a third party also does not result in any data becoming available to them without the encryption keys, or obtaining copies of at least a minimum number of data subsets.
  • A further embodiment of a system used to perform the method 10 is shown in FIG. 6. The system 400 shown in FIG. 6 may be used to distribute information securely over networks such as the Internet or an intranet. The Internet or subsets of web pages 420 may be distributed securely to a user machine 440 via a central server 410. The central server 410 takes the web pages 420 and stores them according to the method 10 shown in FIG. 1 within separate storage locations 430. The data subsets may be encrypted and/or hashed to provide authentication, as described with reference to FIG. 5. Central server 410 supplies the user machine 440 with a decryption code or codes and information to identify and locate data subsets from particular storage locations 430 and how to recreate the data forming the original web pages 420. Therefore, the web pages 420 are no longer prone to a single point of failure or attack (for instance, a single web server going down) as the original data 20 is distributed amongst separate storage locations 430. Furthermore, any third party intercepting the network traffic of the user computer 440 would not be able to decrypt or recreate the original data forming the web pages 420 without the decryption keys and regeneration information supplied by the central server 410.
  • Alteration may be detected by rehashing the data subsets and/or parity data and comparing the resultant hash code with that associated with the original. Where a difference is detected this data subset or parity data may be rejected and recreated using only authenticated data sets and/or parity data. Only data subsets that fail authentication by the hash codes (or are otherwise lost or unavailable) need to be recreated or regenerated.
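  • The rehash, compare and regenerate sequence might look like the following sketch for a single cluster of A, B and P. SHA-256 and the function name authenticate_and_repair are assumptions made for illustration only.

```python
import hashlib

def xor(x, y):
    return bytes(i ^ j for i, j in zip(x, y))

def authenticate_and_repair(stored, codes):
    """stored maps 'A', 'B', 'P' to bytes or None; codes maps them to expected digests."""
    # Keep only blocks that are present and whose rehash matches the stored code.
    ok = {k: v for k, v in stored.items()
          if v is not None and hashlib.sha256(v).hexdigest() == codes[k]}
    if len(ok) < 2:
        raise ValueError("fewer than two authentic blocks; cannot regenerate")
    # Regenerate the rejected or missing block from the two authenticated ones.
    for missing in {"A", "B", "P"} - ok.keys():
        others = [ok[k] for k in ("A", "B", "P") if k != missing]
        ok[missing] = xor(others[0], others[1])
    return ok

a, b = b"HLO", b"EL\x00"
p = xor(a, b)
codes = {k: hashlib.sha256(v).hexdigest() for k, v in (("A", a), ("B", b), ("P", p))}

# Subset B has been altered in storage; it fails authentication and is recreated.
repaired = authenticate_and_repair({"A": a, "B": b"XX\x00", "P": p}, codes)
```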
  • Such a secure system may be suitable for banking transactions or other forms of secure data or where the system user requires additional privacy and security.
  • The central server 410 may be able to store or cache the entire available Internet or any particular individual websites and make these available only to particular subscribing users. The central server 410 may also perform the function of a search engine or other central consolidator of information. Querying the search engine in this way may render search results containing decryption keys and information used to locate and regenerate the websites or other retrievable documents.
  • A further use for such a storage system according to the authentication embodiment, is to store and recreate high quality media avoiding distortion and missing data. For instance, higher quality audio or video recordings may be obtained due to the high level of error checking used. Each data subset may be checked for authenticity (e.g. corruption) using the authentication or hash codes. Any data subset that fails this authentication test may be rejected and regenerated using the parity data and any data subsets that pass authentication (the parity data may also be checked).
  • For instance, this storage method may be implemented on hard drives, optical discs such as CDs, DVDs and Blu-ray (RTM) and file encoding similar to MP3 and MPEG type encoding. The method may be used to generate higher quality multimedia files.
  • FIG. 7 shows a schematic diagram of a communication system. Two communication devices 500, 510 transmit and receive data to and from each other. This may be via a communication network such as a cellular network or directly as in two-way radios. In the following example voice data is used as an illustration. However, many other types of data may also be transmitted and received such as for instance, video, web or Internet and data files.
  • As shown in FIG. 7, voice data is split into data subsets and parity data using a similar method to that described with respect to FIG. 1 for data storage. These data subsets A, B and parity data P are transmitted separately across individual channels C1, C2 and C3. These data sets may be transmitted according to other schemes together or separately and may be transmitted using different mediums, for instance a mixture of wireless, cable and fibre optic transmission. The splitting function may be carried out within the communication device 500 or within a transmission network facility such as a mobile base station or similar. A cellular telephone may be adapted by the addition of hardware to implement the described functions. Alternatively, the functions may be implemented as software.
  • As with the data storage embodiments, as an alternative authentication embodiment, hash codes may be generated from hash or other authentication functions and associated with the data subsets prior to transmission. This authentication embodiment is illustrated in FIG. 7 a.
  • Data subsets A and B may be combined to form the original voice data as a reverse of the splitting procedure. If either subset A or B is lost, is missing from the received transmission or fails a hashing match test then parity data P may be used to regenerate the missing data in a similar way to the retrieval of stored data described above. An eavesdropper receiving only one of channels C1, C2 or C3 will therefore not be able to reconstruct the voice data. Therefore, this provides a more secure as well as more reliable communication system and method. Security may be enhanced further by differing the mode, type or frequency of each channel. Integrity may be provided by the hash function authentication checks in the authentication embodiment shown in FIG. 7 a.
  • FIG. 8 shows a schematic diagram of a further embodiment similar to that shown in FIG. 7. However, this further embodiment implements a further cascade or layer of data splitting before transmission. A further level of recombination must be used to reconstruct the voice or other transmitted data. In the example shown in FIG. 8 this further cascade of data splitting and parity data generation requires nine channels to communicate each data subset and parity data. Such an additional cascade provides further resilience to data loss. The data transmitted from five of the channels may be lost with the data fully reconstructable (lossless). Further cascade may be achieved providing further resilience. Just as with the data storage example above, other numbers of channels of data may be used. For instance the data may be split three, four or five ways or more at each cascade. Further cascade levels may be implemented dependent on the required level of security or reliability. This further fills the available channel capacity but in so doing reduces the power requirements of each channel to maintain the same probability of data loss (Shannon or noisy-channel coding theorem).
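  • Splitting more than two ways at each cascade could be sketched as below. The specification does not state how parity is formed for an n-way split; a single XOR parity block over all n subsets, which allows any one lost subset to be regenerated, is assumed here, and the names split_n and recover are hypothetical.

```python
from functools import reduce

def xor_all(blocks):
    """XOR a list of equal-length byte strings together, column by column."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def split_n(data, n):
    """Split data byte-wise into n zero-padded subsets and one XOR parity block."""
    size = -(-len(data) // n)                         # ceiling division
    subsets = [data[i::n].ljust(size, b"\x00") for i in range(n)]
    return subsets, xor_all(subsets)

def recover(subsets, parity):
    """Regenerate at most one missing subset (marked None) from the rest and the parity."""
    missing = [i for i, s in enumerate(subsets) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can regenerate only one subset")
    repaired = list(subsets)
    if missing:
        present = [s for s in subsets if s is not None]
        repaired[missing[0]] = xor_all(present + [parity])
    return repaired

frame = b"voice frame 0001"
subsets, parity = split_n(frame, 4)                   # four channels plus one parity channel
received = list(subsets)
received[2] = None                                    # one channel is lost in transit
repaired = recover(received, parity)
```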
  • As shown in FIG. 8 a, any or each of the transmitted data subsets and/or parity data (the lowest levels in the cascade) may have the hash function applied to them. The hash codes may be transmitted to the receiver.
  • The communication system may also comprise an additional layer of security or functionality. The communication device 510 receiving the data may require information as to which data subsets and parity data are transmitted over which particular channels. In the example shown in FIGS. 8 and 8 a, channel C1 is used to transmit data subset AA, C2 is used for AB, etc, however, any combination may be used. Such information may be exchanged between communication devices 500, 510 before or during transmission, by for instance transmission of a code denoting a particular combination of channels and data subsets. The particular combination may vary during transmission and reception. This may be according to a prearranged or predetermined scheme or the particular current combination may be transmitted to keep the transmitter and receiver synchronised. Both communication devices 500, 510 may both transmit and receive simultaneously or in isolation.
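  • One way a single code could denote a particular combination of channels and data subsets is to treat the code as the index of a permutation. The sketch below is an assumption made for illustration; the specification does not define the code format, and the name assignment_from_code is hypothetical.

```python
from math import factorial

LABELS = ["AA", "AB", "AP", "BA", "BB", "BP", "PA", "PB", "PP"]

def assignment_from_code(code, labels=LABELS):
    """Decode an integer code into a channel-to-subset assignment (the code-th permutation)."""
    pool = list(labels)
    n = len(pool)
    code %= factorial(n)                  # keep the code within the number of permutations
    order = []
    for i in range(n, 0, -1):
        index, code = divmod(code, factorial(i - 1))
        order.append(pool.pop(index))
    return {"C%d" % ch: label for ch, label in enumerate(order, start=1)}

# Transmitter and receiver derive the same assignment from the same code.
default = assignment_from_code(0)         # C1 carries AA, C2 carries AB, and so on
hopped = assignment_from_code(12345)      # a different combination after a channel hop
```

  • With nine blocks there are 362,880 possible assignments, so a code of fewer than 19 bits is enough to identify any of them.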
  • As a further security precaution, the data may be stored or transmitted as difference or delta data relative to a reference file. Therefore, access to or knowledge of the reference file may be required in order to retrieve or receive the data.
  • This further security precaution may be used where there are practical or legal restrictions on transmitting or storing certain types of data. For instance, the storage of banking or confidential information may be restricted to a particular organisation or site. However, it may still be necessary to store these data such that the risk of their loss is reduced. Therefore, it may not be possible to distribute or transmit these types of data across different storage locations, as described previously, even using encryption. This problem may be addressed by instead transmitting and distributing the difference or delta data instead of the underlying data. In this situation, data protection requirements are met and the data may be secured against loss or corruption.
  • For example and as an illustration of this further alternative procedure, file A (or signal A) may be the underlying data required to be stored or transmitted. File B may be the reference file. A comparison of file A and file B may be made using a comparison function similar to UNIX diff, rdiff or rsync procedures to generate file C.
  • In a further alternative, the difference file may be generated by applying the XOR function to file A and file B, perhaps byte-wise, for example.
  • File C is therefore a representation or encoding of the difference between file A and file B; file A cannot be regenerated from file C without knowledge of or access to file B. File B may take many different forms and may be a randomly generated string, a document, an audio file, a video file, the text of a book or any other known or generated data set, for example. The benefit of using a known data file (e.g. an MP3 file of a well-known song) is that if the user's computer is lost, stolen or corrupted then the underlying data may be regenerated by acquiring a further copy of the known and publicly available reference file. The user must simply remember which particular file they used (perhaps an MP3 file of the user's favourite song). As there are millions of options available to a user, security can remain relatively high even when a well-known data file is used.
  • In order to regenerate file A from file C, a function may be used to apply the difference or delta file C to the reference file B. Various methods may be used for regenerating file A depending on how the difference or delta file C was generated and encoded. In the XOR example, a further XOR function may be applied to files C and B to regenerate file A. This may be done on a byte-by-byte basis, for example. It is likely that files A and B will be of different sizes. Where file A is smaller than file B then the procedure may simply stop when each byte or file chunk of file A has been compared. Where file A is larger than file B then multiple copies of file B may be used until each byte of file A has been compared. Other variations, difference procedures and comparison functions may be used.
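  • The byte-wise XOR variant described above may be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name xor_with_reference and the sample byte strings are hypothetical, and the files are assumed to be held in memory.

```python
from itertools import cycle


def xor_with_reference(data: bytes, reference: bytes) -> bytes:
    """XOR each byte of data with the reference, reusing the reference as needed.

    zip() stops at the end of data, so a longer reference is simply cut short,
    while cycle() repeats a shorter reference until every byte has been processed.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return bytes(d ^ r for d, r in zip(data, cycle(reference)))


file_a = b"confidential account data"   # underlying data (hypothetical sample)
file_b = b"reference"                   # reference file, shorter than file A here
file_c = xor_with_reference(file_a, file_b)    # difference or delta data
restored = xor_with_reference(file_c, file_b)  # the same function regenerates file A
```

  • Because XOR is its own inverse, the same function serves both to generate file C and to regenerate file A, provided that the same reference file B is used in both directions.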
  • Once the difference or delta file (or data stream) has been generated then this may be used as the original data described above and stored or transmitted (e.g. as voice data), accordingly. For the transmission and receiving embodiments, the difference data may be generated as a data stream, i.e. transmitted, received and encoded or decoded in real time. In other words, the difference data may be divided into data subsets with parity data generated so that these data subsets may be stored in a distributed way or transmitted according to the methods described above.
  • Where a data stream, in the form of difference data, is to be transmitted then the reference file (B) may again be used to sequentially encode the data stream in real-time. Should the data stream exceed the length of the reference file then the reference file may be reused until transmission ends. In voice communication, for example, each time transmission starts, the beginning of the reference file may be used for comparison with a digitised voice or audio data stream to generate the difference data stream. Alternatively, reuse may be reduced by continuing from the last point used in the reference file for each new transmission. This alternative may further improve security.
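  • The alternative of continuing from the last point used in the reference file may be illustrated by the following sketch. It assumes XOR as the difference function and assumes that both ends process the same number of bytes in the same order; the class ReferenceStreamCoder and its method names are hypothetical.

```python
class ReferenceStreamCoder:
    """Encodes or decodes a stream against a reference, remembering its position."""

    def __init__(self, reference: bytes, offset: int = 0):
        if not reference:
            raise ValueError("reference must not be empty")
        self.reference = reference
        self.offset = offset % len(reference)

    def process(self, chunk: bytes) -> bytes:
        """XOR a chunk with the reference, starting where the last chunk ended."""
        n = len(self.reference)
        out = bytes(b ^ self.reference[(self.offset + i) % n]
                    for i, b in enumerate(chunk))
        self.offset = (self.offset + len(chunk)) % n  # wrap when the reference is exhausted
        return out

    def restart(self) -> None:
        """Return to the beginning of the reference (the first option described)."""
        self.offset = 0


reference = b"shared reference file"
sender = ReferenceStreamCoder(reference)
receiver = ReferenceStreamCoder(reference)
bursts = [b"first burst of voice data", b"second burst"]
decoded = [receiver.process(sender.process(b)) for b in bursts]
```

  • Calling restart() at the start of each transmission corresponds to the first option, in which the beginning of the reference file is used each time; omitting the call corresponds to the alternative in which reuse of the reference file is reduced.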
  • It should be noted that although separate embodiments have been described, features of these embodiments may be interchanged, especially regarding data manipulations. Furthermore, features described with respect to the transmission and reception embodiments may be used with the storage embodiments and vice versa.
  • As will be appreciated by the skilled person, details of the above embodiment may be varied without departing from the scope of the present invention, as defined by the appended claims.
  • For example, the data may be stored on many different types of storage medium such as hard disks, FLASH RAM, web servers, FTP servers and network file servers, or a mixture of these. Although the files are described above as being split into two data subsets (A and B) and a single parity data block (P) during each iteration, three (A, B and C), four (A-D) or more data subsets may be generated.
  • The parity data is described in the example as being generated from the XOR function but other functions may be used. For instance, Hamming, Reed-Solomon, Golay, Reed-Muller or other suitable error correcting codes may be used.
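  • For illustration, the XOR parity arrangement with two data subsets (A and B) and a single parity block (P) may be sketched as below. The way the data is divided is an assumption made only for this example (two halves, with zero padding of the second half); the functions split_with_parity and recreate are hypothetical names.

```python
def split_with_parity(data: bytes):
    """Split data into subsets A and B of equal length and compute P = A XOR B."""
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\x00")  # pad B so that both subsets have equal length
    p = bytes(x ^ y for x, y in zip(a, b))
    return a, b, p


def recreate(known: bytes, parity: bytes) -> bytes:
    """Recreate the missing subset from the surviving subset and the parity block."""
    return bytes(x ^ y for x, y in zip(known, parity))


data = b"distributed storage"
a, b, p = split_with_parity(data)
recovered_b = recreate(a, p)               # subset B lost: A XOR P gives B
recovered_a = recreate(b, p)               # subset A lost: B XOR P gives A
rebuilt = (a + recovered_b)[:len(data)]    # drop the padding when recombining
```

  • Any one of the three blocks may be lost and the data set may still be rebuilt from the other two. Codes such as Reed-Solomon may tolerate the loss of more than one block, at the cost of a more involved calculation.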
  • The data subsets may be stored in physically separate or logically separate locations even within the same hard disk drive or cluster.
  • Many combinations, modifications, or alterations to the features of the above embodiments will be readily apparent to the skilled person and are intended to form part of the invention.

Claims (14)

1-65. (canceled)
66. A method of retrieving a data set, the method comprising:
recovering subsets of data and parity data from separate storage locations that are remote from each other and accessed over an external network;
recreating any missing subsets of data from the recovered subsets of data and recovered parity data to form recreated subsets of data, the recovered subsets of data, recovered parity data, and recreated subsets of data providing a second level cluster;
combining the recovered subsets of data and recreated subsets of data of the second level cluster to form a plurality of recombined subsets of data and recombined parity data to provide a first level cluster;
recreating any subsets of data missing from the first level cluster from the recombined subsets of data and the recombined parity data to form recreated subsets of data, the recreated subsets of data forming part of the first level cluster; and
combining the recombined subsets of data and recreated subsets of data of the first level cluster to form the data set.
67. The method of claim 66, wherein the separate physical locations are each located at separate physical devices.
68. The method of claim 66, wherein the original data is encrypted, the method further comprising decrypting the original data.
69. The method of claim 66, further comprising receiving location information of the separate storage locations.
70. The method of claim 66, wherein the separate storage locations comprise at least one of a hard disk drive, an optical disk, a FLASH RAM, a web server, an FTP server, or a network file server.
71. The method of claim 66, further comprising:
recovering authentication codes associated with one or more of the subsets of data and parity data;
authenticating one or more of the subsets of data and parity data using the associated authentication codes; and
recreating any subsets of data that fail authentication from the recovered subsets of data and parity data to form recreated subsets of data within the second level cluster.
72. The method of claim 71, wherein the authentication codes are hash codes and the authenticating comprises:
applying a hash function to the subsets of data and/or parity data to generate a comparison hash code; and
comparing the comparison hash code with the authentication codes associated with the data subsets and/or parity data.
73. An apparatus for retrieving a data set, the apparatus comprising a processor arranged to:
recover subsets of data and parity data from separate storage locations that are remote from each other and accessed over an external network;
recreate any missing subsets of data from the recovered subsets of data and recovered parity data to form recreated subsets of data, the recovered subsets of data, recovered parity data, and recreated subsets of data providing a second level cluster;
combine the recovered subsets of data and recreated subsets of data of the second level cluster to form a plurality of recombined subsets of data and recombined parity data to provide a first level cluster;
recreate any subsets of data missing from the first level cluster from the recombined subsets of data and the recombined parity data to form recreated subsets of data, the recreated subsets of data forming part of the first level cluster; and
combine the recombined subsets of data and any recreated subsets of data of the first level cluster to form the data set.
74. The apparatus of claim 73, wherein the processor is further arranged to:
recover authentication codes associated with the one or more of the recovered subsets of data and recovered parity data of the second level cluster;
authenticate one or more of the recovered subsets of data and recovered parity data using the associated authentication codes; and
recreate any recovered subsets of data that fail authentication from the recovered subsets of data and recovered parity data to form recreated subsets of data within the second level cluster.
75. The apparatus of claim 73, further comprising storage locations.
76. A data storage medium for storing a data file, the data storage medium comprising data subsets, parity data, and authentication codes, wherein:
the data subsets are combinable to produce further data subsets;
the further data subsets are combinable to produce the data file;
the authentication codes provide authentication of the data subsets; and
the parity data are combinable with the data subsets to regenerate missing data subsets or data subsets which fail authentication.
77. The data storage medium of claim 76, wherein the data storage medium comprises at least one of a compact disc, a DVD, a hard drive, a solid state drive, a FLASH memory, or a digital tape.
78. The data storage medium of claim 76, wherein the data file comprises at least one of multimedia data, audio data, video data, MPEG data, MP3 data, music data, database data, or binary data.
US14/696,375 2008-09-02 2015-04-25 Distributed storage and communication Abandoned US20150301893A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/696,375 US20150301893A1 (en) 2008-09-02 2015-04-25 Distributed storage and communication

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
GB0815959.2 2008-09-02
GB0815959A GB2463078B (en) 2008-09-02 2008-09-02 Distributed storage
GB0817804A GB2463085B (en) 2008-09-02 2008-09-29 Communication system
GB0817804.8 2008-09-29
GB0819211A GB2463087B (en) 2008-09-02 2008-10-20 Data storage and communication
GB0819211.4 2008-10-20
PCT/GB2009/002101 WO2010026366A1 (en) 2008-09-02 2009-09-01 Distributed storage and communication
US201113059348A 2011-02-16 2011-02-16
US14/696,375 US20150301893A1 (en) 2008-09-02 2015-04-25 Distributed storage and communication

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US13/059,348 Continuation US9026844B2 (en) 2008-09-02 2009-09-01 Distributed storage and communication
PCT/GB2009/002101 Continuation WO2010026366A1 (en) 2008-09-02 2009-09-01 Distributed storage and communication

Publications (1)

Publication Number Publication Date
US20150301893A1 true US20150301893A1 (en) 2015-10-22

Family

ID=39866115

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/059,348 Active 2031-03-18 US9026844B2 (en) 2008-09-02 2009-09-01 Distributed storage and communication
US14/696,375 Abandoned US20150301893A1 (en) 2008-09-02 2015-04-25 Distributed storage and communication

Country Status (5)

Country Link
US (2) US9026844B2 (en)
EP (1) EP2340489B1 (en)
JP (2) JP2012501508A (en)
GB (3) GB2463078B (en)
WO (1) WO2010026366A1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2463078B (en) 2008-09-02 2013-04-17 Extas Global Ltd Distributed storage
CN104079573A (en) * 2009-05-19 2014-10-01 安全第一公司 Systems and methods for securing data in the cloud
GB201003407D0 (en) * 2010-03-01 2010-04-14 Extas Global Ltd Distributed storage and communication
CA2795206C (en) 2010-03-31 2014-12-23 Rick L. Orsini Systems and methods for securing data in motion
US8874868B2 (en) * 2010-05-19 2014-10-28 Cleversafe, Inc. Memory utilization balancing in a dispersed storage network
GB2482112A (en) 2010-07-14 2012-01-25 Extas Global Ltd Distributed data storage and recovery
GB2555549A (en) * 2010-07-14 2018-05-02 Qando Services Inc Distributed data storage and recovery
EP3029592B1 (en) 2010-08-18 2018-07-25 Security First Corp. Systems and methods for securing virtual machine computing environments
GB2483222B (en) * 2010-08-24 2018-05-09 Qando Services Inc Accessing a web site
GB2484116B (en) * 2010-09-29 2018-01-17 Qando Services Inc Distributed electronic communication
KR101502895B1 (en) * 2010-12-22 2015-03-17 주식회사 케이티 Method for recovering errors from all erroneous replicas and the storage system using the method
KR101544485B1 (en) 2011-04-25 2015-08-17 주식회사 케이티 Method and apparatus for selecting a node to place a replica in cloud storage system
GB2492981B (en) * 2011-07-18 2014-03-26 Qando Service Inc Data reconstruction
GB201203557D0 (en) * 2012-02-29 2012-04-11 Qando Service Inc Electronic communication
WO2013151732A1 (en) 2012-04-06 2013-10-10 O'hare Mark S Systems and methods for securing and restoring virtual machines
WO2014066986A1 (en) * 2012-11-02 2014-05-08 Vod2 Inc. Data distribution methods and systems
US9075686B2 (en) * 2013-02-25 2015-07-07 GM Global Technology Operations LLC System and method to improve control module reflash time
JP6225731B2 (en) 2014-01-31 2017-11-08 富士通株式会社 Storage control device, storage system, and storage control method
WO2016081942A2 (en) 2014-11-21 2016-05-26 Security First Corp. Gateway for cloud-based secure storage
US10922292B2 (en) * 2015-03-25 2021-02-16 WebCloak, LLC Metamorphic storage of passcodes
US9760432B2 (en) * 2015-07-28 2017-09-12 Futurewei Technologies, Inc. Intelligent code apparatus, method, and computer program for memory
US20170104806A1 (en) * 2015-10-13 2017-04-13 Comcast Cable Communications, Llc Methods and systems for content stream coding
US10437480B2 (en) 2015-12-01 2019-10-08 Futurewei Technologies, Inc. Intelligent coded memory architecture with enhanced access scheduler
US11860819B1 (en) * 2017-06-29 2024-01-02 Amazon Technologies, Inc. Auto-generation of partition key
US10818314B1 (en) 2019-05-07 2020-10-27 International Business Machines Corporation Storing multiple instances of a housekeeping data set on a magnetic recording tape
DE102021000348A1 (en) 2020-02-14 2021-08-19 Sew-Eurodrive Gmbh & Co Kg Method and system for the transmission of data

Family Cites Families (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5301297A (en) * 1991-07-03 1994-04-05 Ibm Corp. (International Business Machines Corp.) Method and means for managing RAID 5 DASD arrays having RAID DASD arrays as logical devices thereof
US5432787A (en) 1994-03-24 1995-07-11 Loral Aerospace Corporation Packet data transmission system with adaptive data recovery method
JPH0816328A (en) 1994-06-28 1996-01-19 Mitsubishi Electric Corp Disk array system
JPH09297663A (en) * 1996-05-08 1997-11-18 Ekushingu:Kk Disk array device
KR100267366B1 (en) 1997-07-15 2000-10-16 Samsung Electronics Co Ltd Method for recoding parity and restoring data of failed disks in an external storage subsystem and apparatus therefor
US6243846B1 (en) * 1997-12-12 2001-06-05 3Com Corporation Forward error correction system for packet based data and real time media, using cross-wise parity calculation
US6353895B1 (en) 1998-02-19 2002-03-05 Adaptec, Inc. RAID architecture with two-drive fault tolerance
US6421803B1 (en) 1999-06-25 2002-07-16 Telefonaktiebolaget L M Ericsson (Publ) System and method for implementing hybrid automatic repeat request using parity check combining
JP2001195204A (en) * 2000-01-14 2001-07-19 Nippon Telegr & Teleph Corp <Ntt> Method for checking data validity and recording medium with recorded data validity check program
US6826711B2 (en) 2000-02-18 2004-11-30 Avamar Technologies, Inc. System and method for data protection with multidimensional parity
US6789077B1 (en) 2000-05-09 2004-09-07 Sun Microsystems, Inc. Mechanism and apparatus for web-based searching of URI-addressable repositories in a distributed computing environment
WO2001098952A2 (en) 2000-06-20 2001-12-27 Orbidex System and method of storing data to a recording medium
US20020032844A1 (en) 2000-07-26 2002-03-14 West Karlon K. Distributed shared memory management
KR100388498B1 (en) * 2000-12-30 2003-06-25 한국전자통신연구원 A Hierarchical RAID System Comprised of Multiple RAIDs
US6862692B2 (en) * 2001-01-29 2005-03-01 Adaptec, Inc. Dynamic redistribution of parity groups
US20020156974A1 (en) 2001-01-29 2002-10-24 Ulrich Thomas R. Redundant dynamically distributed file system
CN1494790A (en) 2001-03-28 2004-05-05 Cooperation method of transferring divided file under network environment
US8656246B2 (en) * 2001-04-16 2014-02-18 Qualcomm Incorporated Method and an apparatus for use of codes in multicast transmission
US6871263B2 (en) * 2001-08-28 2005-03-22 Sedna Patent Services, Llc Method and apparatus for striping data onto a plurality of disk drives
US7073115B2 (en) 2001-12-28 2006-07-04 Network Appliance, Inc. Correcting multiple block data loss in a storage array using a combination of a single diagonal parity group and multiple row parity groups
JP4685317B2 (en) * 2002-03-29 2011-05-18 株式会社富士通ソーシアルサイエンスラボラトリ Data distributed storage method, data distributed storage device, program, and backup site
JP2004086721A (en) 2002-08-28 2004-03-18 Nec Corp Data reproducing system, relay system, data transmission/receiving method, and program for reproducing data in storage
US20040177218A1 (en) 2002-11-06 2004-09-09 Meehan Thomas F. Multiple level raid architecture
US7188270B1 (en) 2002-11-21 2007-03-06 Adaptec, Inc. Method and system for a disk fault tolerance in a disk array using rotating parity
JP3951949B2 (en) * 2003-03-31 2007-08-01 日本電気株式会社 Distributed resource management system, distributed resource management method and program
JP4490068B2 (en) * 2003-09-22 2010-06-23 大日本印刷株式会社 Data storage system using network
TW200527223A (en) 2003-11-28 2005-08-16 Cpm S A Electronic computing system-on demand and method for dynamic access to digital resources
JP4485230B2 (en) 2004-03-23 2010-06-16 株式会社日立製作所 Migration execution method
GB2412760B (en) 2004-04-01 2006-03-15 Toshiba Res Europ Ltd Secure storage of data in a network
US7406621B2 (en) 2004-04-02 2008-07-29 Seagate Technology Llc Dual redundant data storage format and method
US7321905B2 (en) 2004-09-30 2008-01-22 International Business Machines Corporation System and method for efficient data recovery in a storage array utilizing multiple parity slopes
US7263583B2 (en) 2004-10-05 2007-08-28 International Business Machines Corporation On demand, non-capacity based process, apparatus and computer program to determine maintenance fees for disk data storage system
US7565569B2 (en) * 2004-10-22 2009-07-21 International Business Machines Corporation Data protection in a mass storage system
US7403945B2 (en) 2004-11-01 2008-07-22 Sybase, Inc. Distributed database system providing data and space management methodology
EP1825372A2 (en) * 2004-11-05 2007-08-29 Data Robotics Incorporated Dynamically expandable and contractible fault-tolerant storage system permitting variously sized storage devices and method
US7430701B2 (en) * 2005-06-16 2008-09-30 Mediatek Incorporation Methods and systems for generating error correction codes
US7577866B1 (en) 2005-06-27 2009-08-18 Emc Corporation Techniques for fault tolerant data storage
US8060648B2 (en) 2005-08-31 2011-11-15 Cable Television Laboratories, Inc. Method and system of allocating data for subsequent retrieval
US7574579B2 (en) 2005-09-30 2009-08-11 Cleversafe, Inc. Metadata management system for an information dispersed storage system
US9996413B2 (en) 2007-10-09 2018-06-12 International Business Machines Corporation Ensuring data integrity on a dispersed storage grid
JP2007122185A (en) * 2005-10-25 2007-05-17 Fujitsu Ltd Data storage method and data storage device
CA2629015A1 (en) * 2005-11-18 2008-05-08 Rick L. Orsini Secure data parser method and system
US7631143B1 (en) 2006-01-03 2009-12-08 Emc Corporation Data storage system employing virtual disk enclosure
EP1811378A2 (en) * 2006-01-23 2007-07-25 Xyratex Technology Limited A computer system, a computer and a method of storing a data file
JP2007299088A (en) * 2006-04-28 2007-11-15 Fujitsu Ltd Data protection system, method and program
WO2007133791A2 (en) 2006-05-15 2007-11-22 Richard Kane Data partitioning and distributing system
JP2008098894A (en) * 2006-10-11 2008-04-24 Kddi Corp System, method and program for managing information
US8230432B2 (en) 2007-05-24 2012-07-24 International Business Machines Corporation Defragmenting blocks in a clustered or distributed computing system
JP5274094B2 (en) 2007-06-04 2013-08-28 三菱電機株式会社 Communication system, transmission station, and communication method
US20090172244A1 (en) 2007-12-31 2009-07-02 Chaoyang Wang Hierarchical secondary raid stripe mapping
US8171379B2 (en) 2008-02-18 2012-05-01 Dell Products L.P. Methods, systems and media for data recovery using global parity for multiple independent RAID levels
GB2463078B (en) 2008-09-02 2013-04-17 Extas Global Ltd Distributed storage
US8458515B1 (en) * 2009-11-16 2013-06-04 Symantec Corporation Raid5 recovery in a high availability object based file system
US8762820B1 (en) * 2011-12-22 2014-06-24 Landis+Gyr Technologies, Llc Data communications via power line

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6185368B1 (en) * 1996-07-29 2001-02-06 Sony Corporation Redundant disk array with real-time lost data reconstruction
US20010056500A1 (en) * 1998-02-10 2001-12-27 Digital Island, Inc. Optimized network resource location
US20020104037A1 (en) * 2001-01-26 2002-08-01 Dell Products L.P. Replaceable memory modules with parity-based data recovery
US6950966B2 (en) * 2001-07-17 2005-09-27 Seachange International, Inc. Data transmission from raid services
US7590801B1 (en) * 2004-02-12 2009-09-15 Netapp, Inc. Identifying suspect disks
US8015440B2 (en) * 2006-12-06 2011-09-06 Fusion-Io, Inc. Apparatus, system, and method for data storage using progressive raid
US8412979B2 (en) * 2006-12-06 2013-04-02 Fusion-Io, Inc. Apparatus, system, and method for data storage using progressive raid
US8924720B2 (en) * 2012-09-27 2014-12-30 Intel Corporation Method and system to securely migrate and provision virtual machine images and content
US9218244B1 (en) * 2014-06-04 2015-12-22 Pure Storage, Inc. Rebuilding data across storage nodes

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160098315A1 (en) * 2014-10-07 2016-04-07 Airbus Operations Sas Device for managing the storage of data
US9672100B2 (en) * 2014-10-07 2017-06-06 Airbus Operations Sas Device for managing the storage of data
EP3436949A4 (en) * 2016-07-29 2020-03-25 Hewlett-Packard Development Company, L.P. Data recovery with authenticity

Also Published As

Publication number Publication date
EP2340489B1 (en) 2018-10-31
GB2463087B (en) 2013-07-31
JP5905068B2 (en) 2016-04-20
GB2463078B (en) 2013-04-17
GB0817804D0 (en) 2008-11-05
JP2012501508A (en) 2012-01-19
GB0819211D0 (en) 2008-11-26
US20110145638A1 (en) 2011-06-16
GB2463085B (en) 2013-04-17
GB0815959D0 (en) 2008-10-08
EP2340489A1 (en) 2011-07-06
US9026844B2 (en) 2015-05-05
WO2010026366A8 (en) 2011-04-14
GB2463078A (en) 2010-03-03
JP2015052806A (en) 2015-03-19
WO2010026366A1 (en) 2010-03-11
GB2463087A (en) 2010-03-03
GB2463085A (en) 2010-03-03

Legal Events

Date Code Title Description
AS Assignment

Owner name: QANDO SERVICES INC., VIRGIN ISLANDS, BRITISH

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EXTAS GLOBAL LTD.;REEL/FRAME:035495/0458

Effective date: 20130410

Owner name: EXTAS GLOBAL LTD., VIRGIN ISLANDS, BRITISH

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SYRGABEKOV, ISKENDER;ZADAULY, YERKIN;LAUMULIN, CHOKAN;SIGNING DATES FROM 20110428 TO 20110502;REEL/FRAME:035495/0450

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION