US9639427B1 - Backing up data stored in a distributed database system - Google Patents

Backing up data stored in a distributed database system

Info

Publication number
US9639427B1
Authority
US
United States
Prior art keywords
nodes
data
database system
backup
data portion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US12/277,754
Inventor
Jeremy Davis
P. Keith Muller
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Teradata US Inc
Teradata Corp
Original Assignee
Teradata US Inc


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 Saving, restoring, recovering or retrying
    • G06F 11/1446 Point-in-time backing up or restoration of persistent data
    • G06F 11/1448 Management of the data involved in backup or backup restore
    • G06F 11/1451 Management of the data involved in backup or backup restore by selection of backup contents
    • G06F 11/1458 Management of the backup or restore process
    • G06F 11/1464 Management of the backup or restore process for networked environments
    • G06F 2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F 2201/80 Database-specific techniques


Abstract

To back up data stored in a distributed database system, a backup utility is configured with information regarding locations of data stored in the distributed database system having a plurality of nodes. The backup utility retrieves, based on the information regarding locations of data stored in the distributed database system, backup data from the plurality of nodes for backup storage.

Description

BACKGROUND
A database system provides a central repository of data that can be easily accessed by one or more users. For enhanced performance, a database system can be a parallel or distributed database system that has a number of nodes, where each node is associated with a corresponding storage subsystem. Data is distributed across the storage subsystems of the associated multiple nodes. Upon receiving a query for data, the distributed database system is able to retrieve responsive data that is distributed across the nodes and return an answer set in response to the query.
The individual nodes of the distributed database system process the query independently to retrieve the portion of the answer set that is owned by the corresponding node. A benefit offered by many distributed database systems is that the originator of the request can make a database-wide query and not be concerned about the physical location of the data in the distributed database system. Different portions of the answer set are typically gathered at the nodes of the distributed database system, with the different portions collectively making up the complete answer set that is provided to the originator of the request. There can be a substantial amount of node-to-node transfers of data as the different portions of the answer set are collected at various nodes of the database system. The node-to-node transfer of data is performed over a database system interconnect that connects the nodes.
Although such an approach is efficient when retrieving data in response to queries during normal database operations, it may not be efficient when backing up or archiving data that is stored in the distributed database system. Substantial node-to-node communications between the multiple nodes of the distributed database system during a backup or archive operation can result in significant consumption of the database system interconnect bandwidth, which reduces the bandwidth available to satisfy normal database query operations.
SUMMARY
In general, a backup utility is configured with information regarding locations of data stored in the distributed database system having a plurality of nodes. The backup utility retrieves, based on the information regarding locations of data stored in the distributed database system, backup data from the plurality of nodes for backup storage.
Other or alternative features will become apparent from the following description, from the drawings, and from the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an exemplary distributed or parallel database system in which an embodiment of the invention is incorporated.
FIG. 2 is a flow diagram of a process of backing up data, according to an embodiment.
DETAILED DESCRIPTION
A technique of backing up data stored in a parallel or distributed database system involves configuring a backup utility with information regarding locations of data stored in the distributed database system. According to the information regarding locations of data, backup data can be retrieved from at least some of the nodes of the distributed database system in an intelligent manner that avoids unnecessary communication of backup data over a database interconnect that connects the nodes of the distributed database system. In some embodiments, distinct sessions or connections are established between the backup utility and each of the nodes of the database system for the purpose of transporting backup data. A “backup utility” refers to a module (implemented with software or a combination of software and hardware) that manages backing up of data in the database system. As used here, “backing up” data refers to storing a copy of the data to provide redundancy in case of failure of the primary data. “Backing up” data can also mean archiving data, which involves moving the data from a primary storage location to an alternative storage location (the archived data no longer resides in the primary storage location, but instead is moved to the alternative location).
A parallel or distributed database system refers to a database system that has multiple nodes (which are distinct processing elements) that are able to store and retrieve data in corresponding distinct storage subsystems such that the writing or reading of data can be performed in parallel for improved throughput. Establishing a “session” or “connection” between a backup utility and each of the nodes of the distributed database system refers to establishing a separately identifiable flow of data between the backup utility and the nodes; in other words, establishing multiple sessions or connections between the backup utility and the nodes means that multiple distinctly identifiable flows of data are possible.
By establishing distinct sessions based on the information regarding locations of data for the purpose of transporting backup data between each of at least some of the nodes and the backup utility, unnecessary node-to-node transfers of backup data can be avoided, such that database system interconnect bandwidth is not unnecessarily consumed by such node-to-node communications of backup data.
In the embodiment above in which distinct sessions are established to retrieve backup data, the backup utility is run on a backup server. In an alternative embodiment, an instance of the backup utility can be run on each of the nodes of the distributed database system. Based on the information regarding locations of data, each backup utility instance is able to retrieve the relevant subset of the backup data located at the corresponding node, such that unnecessary communication of backup data over the database interconnect can be avoided.
FIG. 1 illustrates an exemplary distributed database system 100 that has multiple nodes 102 that are interconnected by a database interconnect 104. The interconnect can be implemented with wired or wireless links (e.g., conductive traces, cables, radio frequency carriers, etc.). Each node 102 is connected to a corresponding database storage subsystem 106. Data stored in the database system 100 can be distributed across the database storage subsystems 106 for parallel access (read or write).
Each node 102 includes a database processing module 108 that is able to receive a database query from a client 110 over a network 112, which can be a local network or a public network (e.g., the Internet). A database query, which can be a Structured Query Language (SQL) query, received by a database processing module 108 can be forwarded to multiple nodes 102 for the multiple nodes 102 to independently process the query. Each database node 102 can then retrieve or write the corresponding data in the respective database storage subsystem 106. In the example of a read query, the nodes 102 can provide data over the database interconnect 104. In the process of retrieving the distributed data, there can be communication of data between the various nodes 102 for the purpose of gathering the data for provision in a complete answer set that can be provided back to the client 110.
However, in accordance with some embodiments, to avoid unnecessary consumption of the database interconnect bandwidth during backup operations, the node-to-node communication of backup data is reduced. In one embodiment, this is accomplished by establishing distinct backup sessions between database system nodes 102 (identified based on information 129 regarding locations of data in the distributed database system) and a backup utility 114, which can be executable in a backup server 116, as illustrated in the example of FIG. 1. Separate backup sessions are illustrated by dashed lines 150, 152, and 154, which are sessions between the backup utility 114 and the corresponding nodes 102.
As illustrated in the example of FIG. 1, backup data is communicated separately from each of the database storage subsystems 106 to the backup utility 114 (through corresponding nodes 102). For example, in FIG. 1, backup data is transported from database storage subsystem A through node A (in session 150) to the backup utility 114. Similarly, backup data from database storage subsystem B is communicated through node B to the backup utility 114, and backup data from database storage subsystem C is communicated through node C to the backup utility 114. Backup data communicated to the backup utility 114 is stored by the backup utility 114 as backup data 128 in a backup storage subsystem 130.
The communication of backup data through the nodes 102 is controlled by corresponding backup processing modules 118 that are executable in corresponding nodes 102. The database processing module 108 and backup processing module 118 are software modules that are executable on one or more central processing units (CPUs) 120 in each respective node 102. Each CPU 120 can be connected to a corresponding memory 122. Similarly, the backup utility in the backup server 116 can be executable on one or more CPUs 124 in the backup server 116. The CPU(s) 124 can be connected to a memory 126 in the backup server 116.
By using techniques according to some embodiments, the backup utility 114 does not have to rely on database processing modules 108 in the database nodes 102 to retrieve backup data.
In some embodiments, the communication of backup data can be provided over the same database interconnect 104 as for primary traffic during normal database operations. In an alternative embodiment, a dedicated backup data communication path (separate from the primary database system interconnect) can be provided for transporting backup data to the backup utility 114.
In an alternative embodiment, instead of providing the backup utility 114 in the backup server 116 that is separate from the database system 100, it is noted that an instance of the backup utility 114 can be provided in each of the nodes 102. In such an embodiment, the backup server 116 can be omitted, with direct input/output (I/O) used for writing backup data to the backup storage subsystem 130. Each backup utility instance can then retrieve the relevant subset of backup data at the corresponding node based on the information 129 relating to locations of data.
As noted above, the backup utility 114 according to some embodiments is configured with knowledge of locations of data stored in the database system 100. Such knowledge can be provided in the form of the information 129 regarding locations of data in the database system 100. The information 129 can be created based on information provided by the database processing modules 108 that execute in the nodes 102. Using the information 129, the backup utility 114 knows where data is stored in the database system 100, such that the backup utility 114 can establish corresponding sessions for transporting backup data from the database system 100 to the backup utility 114. In other words, the backup utility 114 does not have to rely upon the database processing modules 108 in the database nodes 102 for gathering and collecting the backup data into a complete set for communication to the backup utility 114. The process of collecting and gathering data by the database processing modules 108 would involve node-to-node communication of backup data over the database interconnect 104, which would consume valuable database interconnect bandwidth.
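The patent does not prescribe a concrete representation for the information 129. The following Python sketch shows one plausible in-memory form, assuming a catalog that maps each (table, partition) key to its owning node and that is refreshed from location reports sent by the database processing modules 108; the names LocationCatalog, report_locations, nodes_for, and keys_on_node are hypothetical and are used only for illustration.

```python
from collections import defaultdict

class LocationCatalog:
    """Hypothetical in-memory form of the location information (129): a mapping from
    (table, partition) keys to the node (102) that owns that data."""

    def __init__(self):
        self._owner = {}                    # (table, partition) -> node_id
        self._by_node = defaultdict(set)    # node_id -> {(table, partition), ...}

    def report_locations(self, node_id, keys):
        """Apply a location update reported by a node's database processing module (108)."""
        for key in keys:
            self._owner[key] = node_id
            self._by_node[node_id].add(key)

    def nodes_for(self, tables):
        """Nodes holding any partition of the given tables (used to pick backup sessions)."""
        return {node for (table, _), node in self._owner.items() if table in tables}

    def keys_on_node(self, node_id, tables):
        """The (table, partition) keys of the given tables stored on a single node."""
        return [key for key in self._by_node[node_id] if key[0] in tables]

# Example: two nodes report the partitions they own; the backup utility then knows it
# only needs sessions with node-A and node-B to back up the "orders" table.
catalog = LocationCatalog()
catalog.report_locations("node-A", [("orders", 0), ("orders", 2)])
catalog.report_locations("node-B", [("orders", 1), ("customers", 0)])
print(catalog.nodes_for({"orders"}))               # {'node-A', 'node-B'} (set order may vary)
print(catalog.keys_on_node("node-B", {"orders"}))  # [('orders', 1)]
```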
FIG. 2 is a flow diagram of a general process according to one embodiment performed by the database system 100 and/or backup server 116 of FIG. 1. The backup utility 114 is configured (at 202) with information (129) regarding locations of data in the distributed database system. The configuring of the backup utility 114 can involve the backup utility 114 receiving updates of data locations from the database processing modules 108 in the database system 100.
Next, the backup utility 114 receives (at 204) a request to back up data (e.g., copy a portion of the data stored in the database system 100 to a backup location for redundancy, move a portion of the data in the database system 100 to an alternative location for archiving purposes, and so forth). The request to back up data may be received from a remote console (e.g., the computer of a database administrator), or at a control interface of the backup server 116. Alternatively, the request to back up data can be an automatically generated request that is provided periodically or in response to certain predefined events.
Next, the backup utility 114 determines (at 206) locations of data to be backed up based on the location information (129). Based on such determination, the backup utility 114 then identifies (at 208) the nodes that store the data that is to be backed up.
The backup utility 114 then establishes (at 210) distinct backup sessions with the identified nodes. The backup data is then transported (at 212) in the distinct backup sessions from the corresponding nodes 102 to the backup utility 114. Upon receipt of the backup data, the backup utility stores (at 214) the backup data in the backup storage subsystem 130.
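As an illustration of the flow of FIG. 2 (blocks 206-214), the following Python sketch opens a distinct session per identified node, pulls each node's local subset in parallel, and stores the received parts in the backup storage subsystem. It reuses the hypothetical LocationCatalog from the earlier sketch; the FakeNode, session, and BackupStore interfaces are assumptions made for illustration and are not part of the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

class FakeNode:
    """Stand-in for a database node (102) reachable by the backup utility."""
    def __init__(self, rows):
        self.rows = rows  # (table, partition) -> rows held in this node's storage subsystem

    @contextmanager
    def open_session(self):
        # In a real deployment this would be a dedicated, separately identifiable
        # connection between the backup utility and this node; here it is a no-op.
        yield self

    def fetch(self, keys):
        # Return only data this node itself owns; nothing crosses node-to-node links.
        return {key: self.rows[key] for key in keys}

class BackupStore:
    """Stand-in for the backup storage subsystem (130)."""
    def __init__(self):
        self.parts = {}

    def write(self, node_id, part):
        self.parts[node_id] = part

def back_up(catalog, nodes, store, tables):
    """Blocks 206-214 of FIG. 2: locate the data, identify the owning nodes, open one
    distinct session per node, transport each local subset, and store it."""
    target_nodes = catalog.nodes_for(tables)                      # blocks 206 / 208
    def run_session(node_id):
        keys = catalog.keys_on_node(node_id, tables)
        with nodes[node_id].open_session() as session:            # block 210
            return node_id, session.fetch(keys)                   # block 212
    with ThreadPoolExecutor(max_workers=max(len(target_nodes), 1)) as pool:
        for node_id, part in pool.map(run_session, target_nodes):
            store.write(node_id, part)                            # block 214
    return store

# Usage with the LocationCatalog example above (hypothetical data):
#   nodes = {"node-A": FakeNode({("orders", 0): [...], ("orders", 2): [...]}),
#            "node-B": FakeNode({("orders", 1): [...], ("customers", 0): [...]})}
#   back_up(catalog, nodes, BackupStore(), {"orders"})
```

Opening one session per owning node is what keeps backup traffic off the node-to-node interconnect paths, which is the bandwidth saving the description emphasizes.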
In an alternative embodiment, instead of establishing distinct sessions between the backup utility 114 running on the backup server 116 and the nodes 102, an instance of the backup utility 114 can be created on each of the plurality of nodes of the distributed database system. Then, in response to a request to back up data, each backup utility instance can access the information 129 regarding locations of data to retrieve the corresponding subset of backup data, while reducing or minimizing communication of backup data over the database interconnect.
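A minimal sketch of this alternative embodiment follows, under the assumption that each node runs its own backup utility instance that consults the shared location information and writes only its locally stored subset directly to the backup storage subsystem; the local_storage and backup_store interfaces are hypothetical.

```python
def node_local_backup(node_id, catalog, local_storage, backup_store, tables):
    """Per-node backup utility instance (alternative embodiment): each node backs up
    only the subset it owns, so no backup data crosses the database interconnect 104.
    `local_storage.read` and `backup_store.write` are hypothetical interfaces to the
    node's own database storage subsystem (106) and the backup storage subsystem (130)."""
    keys = catalog.keys_on_node(node_id, tables)          # this node's subset only
    part = {key: local_storage.read(key) for key in keys}
    backup_store.write(node_id, part)                     # direct I/O, no backup server 116
    return len(part)
```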
The various tasks discussed above can be performed by software (e.g., backup utility 114, backup processing module 118, and database processing module 108). Instructions of such software are loaded for execution on a processor (such as CPUs 120 or 124 in FIG. 1). The processor includes microprocessors, microcontrollers, processor modules or subsystems (including one or more microprocessors or microcontrollers), or other control or computing devices. A “processor” can refer to a single component or to plural components.
Data and instructions (of the software) are stored in respective storage devices, which are implemented as one or more computer-readable or computer-usable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; and optical media such as compact disks (CDs) or digital video disks (DVDs).
While the invention has been disclosed with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover such modifications and variations as fall within the true spirit and scope of the invention.

Claims (18)

What is claimed is:
1. A method of backing up data stored in a distributed database system, comprising:
configuring a backup utility in a backup server with information regarding locations of data stored in the distributed database system having a plurality of nodes, wherein the distributed database system is configured to respond to a database query by gathering data from multiple ones of the plurality of nodes and providing the gathered data in an answer set responsive to the database query, wherein gathering the data comprises communication of a subset of the data between at least two of the multiple nodes;
receiving, by the backup utility, a request for backup storage of a data portion;
in response to the request, establishing, between the backup utility and respective ones of the plurality of nodes of the distributed database system, distinct corresponding sessions;
retrieving, by the backup utility based on the information regarding locations of data stored in the distributed database system, the data portion from the plurality of nodes for processing the request, wherein retrieving the data portion comprises retrieving the data portion for processing the request from the plurality of nodes in respective ones of the distinct sessions; and
communicating, by the backup utility, the retrieved data portion to a backup storage subsystem separate from the distributed database system, the communicating causing storing of the retrieved data portion at the backup storage subsystem to provide a backup copy of the data portion.
2. The method of claim 1, wherein establishing the distinct sessions comprises establishing the distinct sessions based on the information regarding the locations of the data stored in the distributed database system.
3. The method of claim 1, wherein retrieving the data portion for processing the request for backup storage from the plurality of nodes in respective ones of the distinct sessions comprises communicating the data portion from the plurality of nodes in respective ones of the distinct sessions to the backup utility over a database system interconnect, wherein the database system interconnect interconnects the plurality of nodes.
4. The method of claim 3, further comprising:
communicating data responsive to the database query over the database system interconnect to provide the answer set to a client computer in response to the database query, the client computer different from the backup server.
5. The method of claim 1, wherein the gathered data is communicated over a database system interconnect interconnecting the plurality of nodes,
wherein retrieving the data portion for processing the request for backup storage from the plurality of nodes in respective ones of the distinct sessions to the backup utility is over another interconnect separate from the database system interconnect.
6. The method of claim 1, wherein the distributed database system includes additional nodes in addition to the plurality of nodes, the method further comprising:
in response to the request, identifying, based on the information regarding locations of data, the plurality of nodes that are less than all nodes in the distributed database system, wherein the identified plurality of nodes are computer nodes that contain the data portion to be backed up for the request.
7. An article comprising at least one non-transitory computer-readable storage medium containing instructions that when executed cause a backup utility executable on a backup server including at least one processor to:
receive a single request to back up a data portion stored in a distributed database system having a plurality of nodes, each of the plurality of nodes including a central processing unit (CPU), wherein the distributed database system is configured to respond to a database query by gathering data from multiple ones of the plurality of nodes and providing the gathered data in an answer set responsive to the database query, wherein gathering the data comprises communicating a subset of the data between at least two of the multiple nodes;
in response to the single request to back up the data portion, access information identifying locations of data in the distributed database system;
identify nodes from among the plurality of nodes of the distributed database system that contain the data portion to be backed up for the single request;
establish, between the backup utility and respective ones of the identified nodes of the distributed database system, distinct corresponding sessions;
retrieve the data portion from the identified nodes for processing the single request, wherein retrieving the data portion comprises retrieving the data portion for processing the request from the identified nodes in respective ones of the distinct sessions; and
communicate the retrieved data portion to a backup storage subsystem separate from the distributed database system, the communicating causing storing of the retrieved data portion at the backup storage subsystem, to provide a backup copy of the data portion stored in the distributed database system.
8. The article of claim 7, wherein plural parts of the data portion are retrieved from corresponding ones of the identified nodes in the distinct corresponding sessions.
9. The article of claim 8, wherein the instructions when executed cause the backup utility to further receive the plural parts of the data portion in the distinct sessions over a database system interconnect that connects the plurality of nodes of the distributed database system.
10. The article of claim 8, wherein the backup server is separate from the distributed database system.
11. The article of claim 8, wherein the instructions when executed cause the backup utility to further run an instance of the backup utility on each of the plurality of nodes of the distributed database system to retrieve the data portion.
12. A system comprising:
a distributed database system having:
a plurality of nodes each including a central processing unit (CPU), and
storage subsystems associated with the plurality of nodes, wherein the distributed database system is configured to respond to a database query by gathering data from multiple ones of the plurality of nodes and providing the gathered data in an answer set responsive to the database query, wherein gathering the data comprises communication of a subset of the data between at least two of the multiple nodes;
at least one processor; and
a backup utility executable on the at least one processor to:
receive a single request for backup storage of a data portion;
in response to the single request, access information regarding locations of data stored in the distributed database system;
in response to the single request, establish distinct backup sessions between the backup utility and at least some nodes of the plurality of nodes, the at least some nodes identified based on the information regarding locations of data; and
in response to the single request, retrieve, based on the information regarding locations of data stored in the distributed database system, the data portion in the distinct backup sessions from corresponding ones of the at least some nodes for storing in a backup storage subsystem.
13. The database system of claim 12, wherein the data portion of the single request comprises a copy of data in the database system to be stored at the backup storage subsystem.
14. The database system of claim 12, wherein the data portion of the single request comprises archived data to be stored at the backup storage subsystem.
15. The article of claim 7, wherein the identifying of the nodes is based on the information identifying locations of data in the distributed database subsystem.
16. The method of claim 1, wherein retrieving the data portion comprises retrieving plural parts of the data portion from the corresponding plurality of nodes in the distinct corresponding sessions.
17. The method of claim 1, wherein the backup server is separate from the distributed database system.
18. The system of claim 12, wherein the backup utility is executable to further communicate the retrieved data portion to the backup storage subsystem that is separate from the distributed database system, the communicating causing storing of the retrieved data portion at the backup storage subsystem, to provide a backup copy of the data portion stored in the distributed database system.

Priority Applications (1)

Application Number: US12/277,754 (published as US9639427B1)
Priority Date: 2008-11-25
Filing Date: 2008-11-25
Title: Backing up data stored in a distributed database system

Publications (1)

Publication Number: US9639427B1 (en)
Publication Date: 2017-05-02

Family

ID=58615660

Family Applications (1)

Application Number: US12/277,754 (published as US9639427B1)
Priority Date: 2008-11-25
Filing Date: 2008-11-25
Title: Backing up data stored in a distributed database system
Status: Active (anticipated expiration 2035-01-02)

Country Status (1)

Country Link
US (1) US9639427B1 (en)


Patent Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5899998A (en) * 1995-08-31 1999-05-04 Medcard Systems, Inc. Method and system for maintaining and updating computerized medical records
US6023710A (en) * 1997-12-23 2000-02-08 Microsoft Corporation System and method for long-term administration of archival storage
US6154852A (en) * 1998-06-10 2000-11-28 International Business Machines Corporation Method and apparatus for data backup and recovery
US20070124349A1 (en) * 1999-12-20 2007-05-31 Taylor Kenneth J Method and apparatus for storage and retrieval of very large databases using a direct pipe
US20040117438A1 (en) * 2000-11-02 2004-06-17 John Considine Switching system
US20060129940A1 (en) * 2000-12-11 2006-06-15 Microsoft Corporation User interface for managing multiple network resources
US7752169B2 (en) * 2002-06-04 2010-07-06 International Business Machines Corporation Method, system and program product for centrally managing computer backups
US20080177994A1 (en) * 2003-01-12 2008-07-24 Yaron Mayer System and method for improving the efficiency, comfort, and/or reliability in Operating Systems, such as for example Windows
US20080126445A1 (en) * 2003-06-06 2008-05-29 Eric Michelman Method and system for reciprocal data backup
US20050131740A1 (en) * 2003-12-10 2005-06-16 Geoage, Incorporated Management tool for health care provider services
US7469274B1 (en) * 2003-12-19 2008-12-23 Symantec Operating Corporation System and method for identifying third party copy devices
US8412822B1 (en) * 2004-01-27 2013-04-02 At&T Intellectual Property Ii, L.P. Optimized job scheduling and execution in a distributed computing grid
US7330997B1 (en) * 2004-06-03 2008-02-12 Gary Odom Selective reciprocal backup
US20060026219A1 (en) * 2004-07-29 2006-02-02 Orenstein Jack A Metadata Management for fixed content distributed data storage
US7657581B2 (en) * 2004-07-29 2010-02-02 Archivas, Inc. Metadata management for fixed content distributed data storage
US20060271601A1 (en) * 2005-05-24 2006-11-30 International Business Machines Corporation System and method for peer-to-peer grid based autonomic and probabilistic on-demand backup and restore
US20070073791A1 (en) * 2005-09-27 2007-03-29 Computer Associates Think, Inc. Centralized management of disparate multi-platform media
US20070100913A1 (en) * 2005-10-12 2007-05-03 Sumner Gary S Method and system for data backup
US20070192552A1 (en) * 2006-02-16 2007-08-16 International Business Machines Corporation Dynamically determining and managing a set of target volumes for snapshot operation
US20070214196A1 (en) * 2006-03-08 2007-09-13 International Business Machines Coordinated federated backup of a distributed application environment
US20080235299A1 (en) * 2007-03-21 2008-09-25 International Business Machines Corporation Determining which user files to backup in a backup system
US20090150431A1 (en) * 2007-12-07 2009-06-11 Sap Ag Managing relationships of heterogeneous objects
US20090249005A1 (en) * 2008-03-27 2009-10-01 International Business Machines Corporation System and method for providing a backup/restore interface for third party hsm clients
US20090300137A1 (en) * 2008-05-29 2009-12-03 Research In Motion Limited Method, system and devices for communicating between an internet browser and an electronic device
US20100021127A1 (en) * 2008-07-23 2010-01-28 Hiroshi Saito Data control apparatus, data backup apparatus, and recording medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10824510B1 (en) * 2017-10-02 2020-11-03 EMC IP Holding Company LLC Environment independent data protection in a highly available data system
US11372722B2 (en) 2017-10-02 2022-06-28 EMC IP Holding Company LLC Environment independent data protection in a highly available data system
CN107741890A (en) * 2017-10-16 2018-02-27 郑州云海信息技术有限公司 A kind of backup management method and device


Legal Events

Date Code Title Description
AS Assignment

Owner name: TERADATA CORPORATION, OHIO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DAVIS, JEREMY;MULLER, P. KEITH;SIGNING DATES FROM 20090113 TO 20090119;REEL/FRAME:022223/0550

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: SURCHARGE FOR LATE PAYMENT, LARGE ENTITY (ORIGINAL EVENT CODE: M1554); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4