WO2016018400A1 - Data merge processing - Google Patents

Info

Publication number
WO2016018400A1
Authority
WO
WIPO (PCT)
Prior art keywords
update
priority
updates
processor
higher priority
Prior art date
Application number
PCT/US2014/049264
Other languages
French (fr)
Inventor
Muhuan HUANG
Kimberly Keeton
Iii Charles B. Morrey
Kevin T. Lim
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P.
Priority to PCT/US2014/049264
Publication of WO2016018400A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs

Definitions

  • FIG. 1 is a block diagram of an example merge stage
  • FIG. 2 is a process flow diagram illustrating an example method for prioritizing data to be merged
  • Fig. 3A is a diagram of an example priority processor populated with initial values
  • Fig. 3B is a diagram of an example priority processor fully populated with an ordered array of individual updates with priority values displayed;
  • Fig. 3C is a diagram of an example priority processor receiving a new individual update
  • Fig. 3D is a diagram of an example priority processor having processed a new individual update
  • Fig. 3E is a diagram of an example priority processor receiving another individual update
  • Fig. 3F is a diagram of an example priority processor having processed the individual updates
  • FIG. 4 is a process flow diagram illustrating an example method for merging SCUs
  • FIG. 5 is a block diagram of an example computing device to merge data
  • Fig. 6 is a drawing of an example machine-readable storage medium that can be used to merge data.
  • Data may be prioritized using values that indicate the relative priority of individual data. For example, in databases, data ordering may be based on the primary keys in an index. For example, the primary key values may be used to retrieve or update information in a specified order.
  • priority queues refer to data structures that are used to prioritize individual updates in batch updates to data entries.
  • the priority queues described herein may maintain a partially ordered internal structure, but allow for constant time identification of a priority element.
  • priority queues may be used to provide a fast ordering where a priority element is the item of interest.
  • a priority element in a priority queue may be an individual update with an entry having a higher primary index number than other individual updates in the queue.
  • a specialized processor such as a field-programmable gate array (FPGA) may be used to achieve parallel processing of elements to be merged.
  • pipelined refers to a set of elements being connected in series, where the output of one element is the input of the next element, and multiple operations may occur in parallel along the series.
  • the pipeline stages have the limitations of using a general purpose processor. For example, the architecture of the processor may not be well suited for the access patterns nor enable as much parallelism.
  • a priority processor of the computing device may prioritize data in a pipelined array associated with a priority queue as described in Figs. 3A-3F below. The prioritized data can then be efficiently merged using a merging processor.
  • An example merge stage is shown in Fig. 1 and an example of such computing device is shown in Fig. 5.
  • Fig. 1 is a detailed block diagram of an example merge stage according to implementations described herein.
  • the configuration of the example merge stage is generally referred to by the reference number 100.
  • the merge stage includes three primary components, including a fetch stage 102, a priority queue 104, and a data merge stage 106.
  • the data fetch stage 102 includes a Direct Memory Access (DMA) engine 108 that reads data such as individual updates 113 of self-consistent updates (SCUs) 110 from local memory into block random access memory (BRAM) buffers 112.
  • an SCU refers to a batch of updates that are received by a client node and/or a pointer to the batch of updates.
  • SCUs include a set of changes that are applied atomically to tables in a database.
  • the SCUs are stored durably onto a disk when they are first uploaded.
  • the SCUs are said to be self-consistent to the extent that each SCU is applied consistently such that all underlying tables in a system are consistent after the changes are applied.
  • the application of an SCU is isolated from other SCUs.
  • Each SCU 110 includes individual updates 113 that can each include a key 114 and one or more related fields 116.
  • one or more fields 116 may not have an update to be applied as indicated by dashes in updates 113.
  • the priority queue 104 includes a pipelined array 118 of individual updates 113 to be prioritized and a prioritized array 120 of individual updates 113.
  • the merge stage 106 includes an array 122 of merged individual updates 124.
  • the fetch stage 102 may begin with the DMA engine reading data from a local memory to BRAM buffers.
  • the BRAM buffers may be an on-FPGA memory.
  • the DMA may receive a request for a particular data in the form of a request for acknowledgment.
  • a priority processor can request an individual update from a self-consistent update (SCU) from the DMA.
  • the updates can be batched based on a specific time interval. For example, the updates can be batched once every several seconds. In some examples, updates can be batched based on a specific number of updates. For example, the updates can be batched every 100 updates.
  • the DMA can return an acknowledgment to the priority queue and retrieve the individual update from an SCU.
  • the BRAM buffers provide temporary storage for a plurality of individual updates 113 from each SCU 110. The BRAM thus enables priority queue operation to occur while overlapping the transfer from the local memory.
  • the priority queue 104 may be a high speed parallel implementation that utilizes a pipelined array.
  • a pipelined array allows replace operations to occur on the priority queue 104 every clock cycle.
  • the pipelined array may be processed by a priority processor, such as an FPGA or Application Specific Integrated Circuit (ASIC).
  • the FPGAs enable the sorting of the array implementing the priority queue to occur in parallel with the replace operation. This may result in a 10x improvement over standard software implementations. Standard software implementations take logarithmic time between replace operations, whereas the time spent between replace operations in the present implementations is fairly constant without regard to the number of operations concurrently executed.
  • the data merge stage 106 takes each prioritized individual update 113 provided by the priority queue 104 and merges the data of each prioritized individual update 113 with the current version of the specific row.
  • Each update potentially includes updates to multiple fields within the row.
  • a timestamp is compared to each field's timestamp in the current version of the row to see if the prioritized individual update 113 has more recent data. If the prioritized individual update 113 has more recent data, then the current version's data is changed to match the update 113.
  • a merge processor implemented using an FPGA is able to compare timestamps across multiple fields in parallel, allowing a very fast merging of data. Traditional software approaches often examine each field sequentially.
  • the final merged updates 124 of merged array 122 are written out to local memory once the latest higher priority update of the priority queue 104 indicates that there are no more updates to that row. For example, a new value for the primary key may be returned by the priority queue indicating an update to another row.
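The field-by-field timestamp comparison of the data merge stage can be sketched in software as follows. This is a minimal illustration, not the FPGA implementation: the names `Row` and `merge_update`, the use of one timestamp per update, and the use of `None` to stand for a dashed (unchanged) field are all assumptions made for the example. The loop visits fields one at a time, whereas the text describes the comparisons running in parallel in hardware.

```python
from typing import Dict, Optional, Tuple

# Hypothetical row layout: field name -> (value, timestamp of last change).
Row = Dict[str, Tuple[object, int]]


def merge_update(row: Row, update: Dict[str, Optional[object]], update_ts: int) -> Row:
    """Return a copy of row with every field for which the update is more recent."""
    merged = dict(row)
    for name, value in update.items():
        if value is None:
            # Assumed encoding of a dash in the update: nothing to apply.
            continue
        current = merged.get(name)
        # Apply the field only if the update carries more recent data.
        if current is None or update_ts > current[1]:
            merged[name] = (value, update_ts)
    return merged


row = {"name": ("alice", 5), "city": ("Paris", 9)}
merged = merge_update(row, {"name": "bob", "city": "Rome", "zip": None}, update_ts=7)
```

In this run `name` is overwritten because timestamp 7 is newer than 5, while `city` keeps its value because the row already holds data from timestamp 9.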
  • Fig. 2 is a process flow diagram illustrating an example method for prioritizing data to be merged.
  • the method of Fig. 2 is generally referred to by the reference number 200.
  • a processor initializes a priority queue 104.
  • higher numbers represent higher priority and lower numbers represent lower priority.
  • initial values are used to initialize the priority queue.
  • the processor may populate a priority queue 104 with values indicating infinity.
  • the processor may use lower values to indicate priority and may populate the priority queue 104 with values indicating negative infinity. In both cases, the infinity values serve as initial placeholders in the priority queue 104 that may be replaced by individual updates 113 using the replace operation.
  • the priority queue 104 receives an individual update 113 from the SCUs 110.
  • the update 113 may correspond to one of a plurality of database entries to be updated.
  • the update 113 may include the data to be updated or a pointer to the update 113.
  • update 113 may refer to one of a batch of updates in an SCU or a pointer that identifies an individual update in the SCU.
  • Each individual update also includes a priority value associated with the individual update.
  • the priority value may be assigned by the processor.
  • the priority value may be a key value such as primary key 114.
  • the key value may be used to determine a relative position in a database entry to be updated.
  • related database entries to be updated may be later merged together using the key value and updated in a more efficient manner.
  • the priority processor identifies a higher priority update 113 than all other updates 113 in the priority queue and replaces the higher priority update 113 at the root position in a priority queue with a new update 113 from SCUs 110.
  • a root position in a priority queue refers to a level in a priority queue that receives new updates 113 from the SCUs 110 and also contains higher priority updates 113.
  • the priority processor may send the identified update 113 to the other processor to indicate the corresponding update 113 to be updated.
  • the priority processor may send the identified higher priority updates 113 of prioritized array 120 to a memory for merging in the merge stage 106.
  • the root level of the priority queue may contain the higher priority update 113 after a complete clock cycle. In some examples, the root level may contain the new update 113 from SCUs 110 after a complete clock cycle. Thus, by replacing the update 113 at the root position of the priority queue 104 with a new update 113 from the SCUs 110, the priority processor may identify an update 113 with a higher priority than all other updates 113 in the array 118.
  • the priority processor swaps even-level updates 113 with consecutively higher odd-level updates 113 based on a comparison of priority values associated with the updates 113.
  • the priority values may be key values 114 associated with each update 113.
  • the updates 113 in levels 0 and 1 may be swapped
  • the updates 113 in levels 2 and 3 may be swapped
  • the updates 113 in levels 4 and 5 may be swapped.
  • the updates 113 of two levels are swapped based on their priority values.
  • the updates 113 may be swapped when the higher level update 113 has a higher priority value. For example, in a priority queue, the higher priority updates 113 will be sorted to lower levels.
  • the priority processor may simultaneously swap all the pairs of odd/even levels of updates 113 that are to be swapped.
  • block 208 may be executed at higher levels concurrently with the execution of block 206 at the lower levels.
  • the priority processor swaps odd-level updates 113 with consecutively higher even-level updates 113 based on a comparison of priority values associated with the updates 113.
  • the update 113 of level 1 might be swapped with an update 113 of level 2
  • an update 113 of level 3 might be swapped with an update 113 of level 4, and so on.
  • the updates 113 are swapped according to their priority values.
  • an update 113 in level 1 may be swapped with an update 113 in level 2 if the update 113 in level 1 has a lower priority value than the update 113 in level 2.
  • block 210 may be executed at higher levels concurrently with the execution of block 206 at the lower levels.
  • method 200 may iterate through additional received updates 113 by cycling through blocks 204-210. If no further additional updates 113 are received, then the method proceeds to diamond 214.
  • method 200 may proceed to cycle through blocks 208-210 until no further swaps are performed because all the updates 113 are sorted.
  • the priority processor may be populated with lower priority values to sort the remaining sorted updates 113 and identify higher priority updates 113 in the priority queue. If all the updates 113 are sorted, then the method ends at block 216.
  • a clock cycle may begin at block 204 and end at block 210.
  • a clock cycle may begin at block 208, proceed to block 210, then finish with blocks 204 and 206.
  • any number of additional elements not shown in Fig. 2 may be included in the method 200, depending on the details of the specific implementation.
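The cycle of method 200 (replace the root, swap even levels with the next odd level, then swap odd levels with the next even level) can be modeled in software roughly as below. This is only a sketch under stated assumptions: `PipelinedArrayQueue`, `replace`, and `sort_descending` are hypothetical names, priorities are plain numbers where a higher number means higher priority, the array has at least as many levels as there are updates, and the loops visit pairs one after another where the hardware would handle all pairs of a phase at once.

```python
import math
from typing import List


class PipelinedArrayQueue:
    """Software model of a pipelined-array priority queue (higher value = higher priority)."""

    def __init__(self, levels: int) -> None:
        # Infinity placeholders fill every level and are pushed out as updates arrive.
        self.levels: List[float] = [math.inf] * levels

    def _swap_phase(self, start: int) -> None:
        # Compare each level with the next higher level and swap when the
        # higher level holds the higher priority value.
        for i in range(start, len(self.levels) - 1, 2):
            if self.levels[i + 1] > self.levels[i]:
                self.levels[i], self.levels[i + 1] = self.levels[i + 1], self.levels[i]

    def replace(self, new_value: float) -> float:
        """Model one clock cycle: output the root, enter new_value, run both swap phases."""
        out = self.levels[0]
        self.levels[0] = new_value
        self._swap_phase(0)  # even levels with the next odd level (0-1, 2-3, ...)
        self._swap_phase(1)  # odd levels with the next even level (1-2, 3-4, ...)
        return out


def sort_descending(keys: List[float]) -> List[float]:
    """Load all keys, then drain the queue by feeding lowest-priority placeholders."""
    queue = PipelinedArrayQueue(levels=len(keys))
    for key in keys:
        queue.replace(key)  # these outputs are the initial infinity placeholders
    return [queue.replace(-math.inf) for _ in keys]


ordered = sort_descending([3, 9, 1, 7, 7, 2])
```

Each call to `replace` does a fixed amount of work at the root regardless of how many updates are queued, which is the property the text attributes to the pipelined array; the per-phase loop here costs time proportional to the number of levels only because it serializes comparisons that are independent of one another.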
  • Fig. 3A is a diagram of an example priority processor initialized with initial values.
  • the configuration of the example priority processor of Fig. 3A is referred to generally by the reference number 300A.
  • the priority processor 300A includes levels 0-7 that are labeled as levels 302-316, respectively. In the example of 300A, levels 302-316 are populated by the value infinity 318.
  • the levels 302-316 are populated by the value infinity 318 because priority is indicated by higher priority values.
  • the priority queue may be arranged to output the priority updates and store the rest of the updates in a descending order from left to right.
  • the priority processor 300A may use replace and delete functions, and not insert functions. By using the replace and delete functions on the priority processor 300A in parallel on all the levels of the priority queue, rather than using insert functions, the priority processor 300A may allow an operation following a replacement or removal in O(1), or constant time, instead of O(log n), or logarithmic time. Therefore, the priority processor 300A may efficiently process updates regardless of the total amount of updates to be processed. Furthermore, because a single array is used, the priority processor 300A may use storage space efficiently.
  • Fig. 3B is a diagram of an example priority processor fully populated with an ordered array of individual updates 113 with priority values displayed.
  • the configuration of the example priority processor in Fig. 3B is referred to generally by the reference number 300B.
  • Updates 320-334 correspond to levels 302-316 of the priority processor, respectively.
  • Fig. 3C is a diagram of an example priority processor receiving a new update.
  • the configuration of the priority processor in Fig. 3C is generally referred to by the reference number 300C.
  • new update 336 is about to replace update 320 as shown by arrow 338.
  • Update 320 is also about to be identified as a higher priority update and sent to output as shown by arrow 340. In some examples, the output may be sent to a processor or memory as discussed further in Figs. 5 and 6 below.
  • the fully populated priority processor 300C receives a new individual update 336.
  • the new update 336 is received at level 0 302, also referred to herein as the root level 302.
  • the priority processor 300C uses the replace function to replace update 320 at root level 302 with new update 336 and output update 320.
  • the update 320 may be output 340 to a processor, memory, or storage device.
  • the priority processor 300C may then swap consecutive updates using the replace operation as described in Fig. 3D.
  • Fig. 3D is a diagram of an example priority processor having processed an individual update 113.
  • the configuration of the priority processor in Fig. 3D is generally referred to by the reference number 300D.
  • a first round of swap and comparisons are indicated by arrows 342 and 344, respectively.
  • a second round of swap and comparisons are indicated by arrows 346 and 348, respectively.
  • update 336 has shifted two places to the right from root level 302 to level 306.
  • the replacement of the original update 320 with update 336 at root level 302 and the shifting of update 336 two levels to the right may be performed by the priority processor 300D within one clock cycle.
  • the priority processor 300D may perform two sets of adjacent comparisons and/or swaps. For example, a first set of a swap and comparisons of even-levels with consecutively higher odd-levels indicated by arrows 342 and 344, respectively, results in new update 336 at root level 302 swapping with higher priority update 322 at level 304.
  • update 322 is then placed into root level 302 and update 336 takes the place of update 322 at level 304.
  • the priority processor 300D does not perform any swaps because the priority values of these updates indicate that they are already ordered in a descending order of priority.
  • update 336 of level 304 is then swapped with higher priority update 324 of level 306.
  • update 336 moves up to level 306, and update 324 moves down to level 304, the final resulting order of the updates shown in the example of 300D.
  • Fig. 3E is a diagram of an example priority processor receiving another individual update 113.
  • the configuration of the priority processor in Fig. 3E is generally referred to by the reference number 300E.
  • a new update 350 is to replace update 322 as shown by arrow 352.
  • Update 322 is also to be output by the priority processor 300E as shown by arrow 354.
  • a new update 350 is to be added to the pipeline processor configuration of 300D.
  • the new update 350 is to replace the existing update 322 of root level 302, the existing update 322 to be output by the priority processor 300E as indicated by arrow 354.
  • this time two pairs of swaps will simultaneously follow the replacement of root level 302 as described in further detail with reference to Fig. 3F.
  • Fig. 3F is a diagram of an example priority processor having processed the individual updates.
  • the configuration of the priority processor in Fig. 3F is generally referred to by the reference number 300F.
  • Two pairs of swaps 342, 346 are indicated by bold dotted arrows, while comparisons 344, 348 are indicated by lightly dotted arrows.
  • In Fig. 3F, both new update 350 of 300E and update 336 of 300C have been shifted up two levels to the right.
  • the priority processor 300F executes two consecutive swaps.
  • 300F shows two pairs of consecutive swaps.
  • more than one update may simultaneously be swapped with a consecutively higher level update.
  • update 350 of root level 302 was swapped with update 324 of level 304
  • update 336 of level 306 was swapped with update 326 of level 308.
  • the odd-level updates are compared with the corresponding
  • update 350 at level 304 was compared and swapped with update 326 of level 306, and update 336 of level 308 was compared and swapped with update 328 of level 310.
  • 300F shows the final positions of the two sets of swaps.
  • update 336 may eventually reach level 316 as it is an update with a lower priority.
  • priority processor 300F may swap update 350 into level 310 and keep it there until a lower priority update is introduced at later clock cycles. Processing a pipelined array on a priority processor 300F such as an FPGA may result in a higher overall performance. For example, an implemented pipelined array priority queue on an FPGA board produced
  • The diagrams of Figs. 3A-3F are not intended to indicate that all of the elements of the configurations 300A-300F are to be included in every case. Further, any number of additional elements not shown in Figs. 3A-3F may be included in the configurations 300A-300F, depending on the details of the specific implementation. For example, in configuration 300F, more than two updates may be swapped at the same time with a consecutively higher level, depending on the priority values of the updates.
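For contrast, the software baseline that the text describes as taking logarithmic time per replace operation can be illustrated with a binary heap. The sketch below uses Python's standard `heapq` module; the helper name `software_replace` and the sample keys are assumptions for illustration, and keys are stored negated because `heapq` is a min-heap while the text treats higher numbers as higher priority.

```python
import heapq
from typing import List


def software_replace(heap: List[int], new_key: int) -> int:
    """Pop the highest-priority key and push new_key in one O(log n) heap operation."""
    # heapq.heapreplace pops the smallest stored item and pushes the new one;
    # negation turns that into "pop the largest key".
    return -heapq.heapreplace(heap, -new_key)


keys = [40, 10, 30, 20]
heap = [-k for k in keys]
heapq.heapify(heap)

first = software_replace(heap, 25)   # removes 40, enters 25
second = software_replace(heap, 5)   # removes 30, enters 5
```

Here every replace has to sift the new key through the heap before the next highest-priority key is known, whereas the pipelined array exposes the next root after a fixed number of compare-and-swap phases.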
  • Fig. 4 is a process flow diagram illustrating an example method for merging data. The method of Fig. 4 is generally referred to by the reference number 400.
  • a priority processor receives individual updates 113.
  • the individual updates 113 may be from SCUs 110 and are to be prioritized and merged with a current database row.
  • the priority processor orders the updates 113 by priority value via a priority queue.
  • the priority value may be a primary key associated with each individual update.
  • the priority processor can order the updates 113 while concurrently sending higher priority updates 113 to the merge processor.
  • the DMA can store higher priority updates 113 in the BRAM for the priority processor to access higher priority updates.
  • a merge processor receives a higher priority update from the priority queue.
  • the higher priority update may have had a primary key that is equal to or higher than all the rest of the updates of the priority queue.
  • the merge processor merges the higher priority update with a corresponding database row.
  • merging the higher priority update may include updating multiple fields of the corresponding database row based on a timestamp comparison. In some examples, multiple fields are to be updated in parallel.
  • the merge processor checks if a different database row is indicated by a priority value. For example, a primary key can be used to distinguish between members of different rows. If the primary key indicates that the higher priority update corresponds to the same database row as the prior primary key, then the method proceeds back to 408 wherein the merge processor is to receive additional higher priority updates from the priority queue. In some examples, the merge processor may then merge the additional higher priority updates with their corresponding database rows. In some examples, if the primary key indicates that a higher priority update corresponds to a different row based on the primary key, then the method proceeds to 412.
  • the merge processor sends the merged database row to memory after receiving a higher priority update corresponding to a different database row.
  • the merge processor can also store the merged database row in a storage device.
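The flow of method 400 (take updates in priority order, merge every update that carries the same primary key into one row, and write the row out once an update for a different row arrives) might look like the following in software. This is a hedged sketch: `merge_stream`, the tuple layout of `Update`, and the in-memory `table` of current rows are hypothetical choices for the example, and the input is assumed to be already ordered by the priority queue.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Field = Tuple[object, int]                   # (value, timestamp)
Row = Dict[str, Field]
Update = Tuple[int, int, Dict[str, object]]  # (primary key, timestamp, changed fields)


def merge_stream(updates: Iterable[Update], table: Dict[int, Row]) -> Iterator[Tuple[int, Row]]:
    """Yield (key, merged row) each time the incoming primary key changes."""
    current_key: Optional[int] = None
    current_row: Row = {}
    for key, ts, fields in updates:
        if key != current_key:
            if current_key is not None:
                # A different row is indicated, so the previous row is complete.
                yield current_key, current_row
            current_key = key
            current_row = dict(table.get(key, {}))
        for name, value in fields.items():
            old = current_row.get(name)
            # Keep whichever version of the field has the more recent timestamp.
            if old is None or ts > old[1]:
                current_row[name] = (value, ts)
    if current_key is not None:
        yield current_key, current_row  # flush the last row


table = {7: {"a": ("x", 1)}, 5: {"a": ("p", 10)}}
updates = [(7, 3, {"a": "y"}), (7, 2, {"b": "z"}), (5, 4, {"a": "q"})]
result = list(merge_stream(updates, table))
```

In this run the two updates for key 7 are folded into a single row before it is emitted, and the update for key 5 is ignored because the stored field already has a newer timestamp.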
  • Fig. 5 is a block diagram of an example computing device 502 to update data.
  • the computing device 502 may include a processor 504, memory 506, a machine-readable storage 508, a network interface card (NIC) 510 to connect computing system 502 to network 512, a direct memory access (DMA) engine 514, a priority processor 516, and a merge processor 518.
  • the processor 504 may be a main processor that is adapted to execute the stored instructions.
  • the processor 504 may be a single core processor, a multi-core processor, a computing cluster, or any number of other configurations.
  • the processor 504 may be implemented as Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processors, x86 Instruction set compatible processors, ARMv7 Instruction set compatible processors, multi-core, or any other microprocessor or central processing unit (CPU).
  • the memory device 506 may include random access memory (e.g., SRAM, BRAM, DRAM, zero capacitor RAM, SONOS, eDRAM, EDO RAM, DDR RAM, RRAM, PRAM, etc.), read only memory (e.g., Mask ROM, PROM, EPROM, EEPROM, etc.), flash memory, or any other suitable memory systems.
  • the memory may receive identified higher priority data from the priority processor 516.
  • machine-readable storage 508 may be any electronic, magnetic, optical, or other physical storage device that stored executable
  • machine-readable storage medium may be, for example, Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disc, and the like.
  • machine-readable storage medium 508 may be encoded with executable instructions for prioritizing data.
  • the machine-readable storage medium 508 may be encoded with executable instructions for prioritizing individual updates.
  • a NIC 510 may connect computing system 502 to a network 512.
  • the NIC 510 may connect computing system 502 to a local network 512, a virtual private network (VPN), or the Internet.
  • the NIC may include an Ethernet controller.
  • the Ethernet controller can be leveraged on a field programmable gate array (FPGA) to provide a connection of the pipeline stages using the transmission control protocol (TCP).
  • the DMA engine 514 may be an embedded DMA controller.
  • the DMA engine can be used to transport data without accessing the processor 504.
  • the direct memory access (DMA) engine 514 may be used to retrieve data from an FPGA-local dynamic random access memory (DRAM) into block random access memory (BRAM) buffers on a field programmable gate array (FPGA).
  • the DMA engine 514 may be used to retrieve an individual update 113 that is to be placed in the priority queue.
  • the priority processor 516 may be an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other type of specialized processor designed to perform the techniques described herein.
  • an FPGA may be programmed to efficiently prioritize updates 113 in a pipelined array as discussed in Figs. 3A-3F above.
  • the priority processor 516 may receive a first update 113 from the processor and output a second update residing at a root position of the priority queue 104 and send the second update to the processor and/or the memory and enter the first update at the root position of the priority queue 104.
  • the priority processor 516 may also swap the first update residing at the root position of the priority queue 104 with a third update residing at a second position of the priority queue 104 based on a comparison of a first priority value associated with the first update and a second priority value associated with the third update.
  • the priority processor 516 may be further configured to swap the first update residing at the second position of the priority queue 104 with at least a fourth update residing at least a third position of the priority queue 104 based on a comparison of a third priority value associated with the fourth update and the first priority value associated with the first update.
  • the swapping between pairs of consecutive odd and even level updates may be executed concurrently.
  • the priority queue 104 is a data structure that may receive updates 113 for sorting according to a priority.
  • the priority queue 104 may be located on priority processor 516.
  • memory 506 may be a memory associated with priority processor 516.
  • the priority queue 104 may be located on storage device 508.
  • The block diagram of Fig. 5 is not intended to indicate that the computing device 502 is to include all of the components shown in Fig. 5. Further, the computing device 502 may include any number of additional components not shown in Fig. 5, depending on the details of the specific implementation.
  • Fig. 6 is a drawing of an example machine-readable storage medium 600 that may be used to update data.
  • Machine-readable storage medium 600 is connected to processor 602 via bus 604.
  • Machine-readable storage medium 600 also contains data fetch module 606, a priority module 608, and a merge module 610.
  • the machine-readable medium is generally referred to by the reference number 600.
  • the machine-readable medium 600 may comprise Random Access Memory (RAM), a hard disk drive, an array of hard disk drives, an optical drive, an array of optical drives, a non-volatile memory, a Universal Serial Bus (USB) flash drive, a DVD, a CD, and the like.
  • the machine-readable medium 600 may be accessed by a processor 602 over a computer bus 604.
  • a first block 606 may include a data fetch module 606 to retrieve an update.
  • the update can be an individual update 113 from an SCU 110.
  • a second block 608 may include a priority module to order the updates by priority value via a priority queue.
  • the priority value can be a primary key 114 associated with each individual update 113.
  • the priority module 608 may also receive a higher priority update from the priority queue.
  • a third block 610 may include a merge module 610 to merge the higher priority update with a
  • merging the higher priority update may include instructions to update multiple fields of the corresponding database row based on a timestamp comparison.
  • the merge module 610 may receive additional higher priority updates from the priority queue.
  • the merge module 610 may merge the higher priority updates with the corresponding database row.
  • the merge module 610 may also send the merged database row to a memory after receiving a higher priority update corresponding to a different database row.
  • the instructions to update the multiple fields are to be executed in parallel.
  • the merge module 61 0 may store the merged database row to a storage device.
  • the software components may be stored in any order or configuration.
  • the computer-readable medium 600 is a hard drive
  • the software components may be stored in non-contiguous, or even overlapping, sectors.

Abstract

Techniques are described in which updates are received. The updates are ordered by priority value via a priority queue. A higher priority update is received from the priority queue. The higher priority update is merged with a corresponding database row.

Description

DATA MERGE PROCESSING
BACKGROUND
[0001] In computing, there are several examples of processes that merge large amounts of data. Data may also be prioritized using values that indicate the relative priority of individual data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Various features of the techniques of the present application will become apparent from the following description of examples, given by way of example only, which is made with reference to the accompanying drawings, of which:
[0003] Fig. 1 is a block diagram of an example merge stage;
[0004] Fig. 2 is a process flow diagram illustrating an example method for prioritizing data to be merged;
[0005] Fig. 3A is a diagram of an example priority processor populated with initial values;
[0006] Fig. 3B is a diagram of an example priority processor fully populated with an ordered array of individual updates with priority values displayed;
[0007] Fig. 3C is a diagram of an example priority processor receiving a new individual update;
[0008] Fig. 3D is a diagram of an example priority processor having processed a new individual update;
[0009] Fig. 3E is a diagram of an example priority processor receiving another individual update;
[0010] Fig. 3F is a diagram of an example priority processor having processed the individual updates;
[0011] Fig. 4 is a process flow diagram illustrating an example method for merging SCUs;
[0012] Fig. 5 is a block diagram of an example computing device to merge data; and
[0013] Fig. 6 is a drawing of an example machine-readable storage medium that can be used to merge data.
DETAILED DESCRIPTION
[0014] There are processes in computing environments that may benefit from merging data efficiently. Data may be prioritized using values that indicate the relative priority of individual data. For example, in databases, data ordering may be based on the primary keys in an index. For example, the primary key values may be used to retrieve or update information in a specified order.
[0015] Improved merging of data flows may be achieved using priority queues. As used herein, priority queues refer to data structures that are used to prioritize individual updates in batch updates to data entries. The priority queues described herein may maintain a partially ordered internal structure, but allow for constant time identification of a priority element. Thus, priority queues may be used to provide a fast ordering where a priority element is the item of interest. For example, a priority element in a priority queue may be an individual update with an entry having a higher primary index number than other individual updates in the queue. In addition, a specialized processor such as a field-programmable gate array (FPGA) may be used to achieve parallel processing of elements to be merged.
[0016] Some current designs process pipeline stages on a standard processor. As used herein, pipelined refers to a set of elements being connected in series, where the output of one element is the input of the next element, and multiple operations may occur in parallel along the series. By executing on a standard processor, the pipeline stages have the limitations of using a general purpose processor. For example, the architecture of the processor may not be well suited to the access patterns and may not enable as much parallelism.
[0017] Described herein are techniques relating to merging data using a specialized priority processor and merging processor in a computing device. In some examples, a priority processor of the computing device may prioritize data in a pipelined array associated with a priority queue as described in Figs. 3A-3F below. The prioritized data can then be efficiently merged using a merging processor. An example merge stage is shown in Fig. 1 and an example of such computing device is shown in Fig. 5.
[0018] Fig. 1 is a detailed block diagram of an example merge stage according to implementations described herein. The configuration of the example merge stage is generally referred to by the reference number 100. The merge stage includes three primary components, including a fetch stage 102, a priority queue 104, and a data merge stage 106. The data fetch stage 102 includes a Direct Memory Access (DMA) engine 108 that reads data such as individual updates 113 of self-consistent updates (SCUs) 110 from local memory into block random access memory (BRAM) buffers 112. As used herein, an SCU refers to a batch of updates that are received by a client node and/or a pointer to the batch of updates. SCUs include a set of changes that are applied atomically to tables in a database. In some examples, the SCUs are stored durably onto a disk when they are first uploaded. The SCUs are said to be self-consistent to the extent that each SCU is applied consistently such that all underlying tables in a system are consistent after the changes are applied. Moreover, the application of an SCU is isolated from other SCUs. Each SCU 110 includes individual updates 113 that can each include a key 114 and one or more related fields 116. In some examples, one or more fields 116 may not have an update to be applied as indicated by dashes in updates 113. The priority queue 104 includes a pipelined array 118 of individual updates 113 to be prioritized and a prioritized array 120 of individual updates 113. The merge stage 106 includes an array 122 of merged individual updates 124.
[0019] In some examples, the fetch stage 102 may begin with the DMA engine reading data from a local memory to BRAM buffers. In some examples, the BRAM buffers may be an on-FPGA memory. In some examples, the DMA may receive a request for a particular data in the form of a request for acknowledgment. For example, a priority processor can request an individual update from a self-consistent update (SCU) from the DMA. In some examples, the updates can be batched based on a specific time interval. For example, the updates can be batched once every several seconds. In some examples, updates can be batched based on a specific number of updates. For example, the updates can be batched every 100 updates. In some examples, the DMA can return an acknowledgment to the priority queue and retrieve the individual update from an SCU. In some examples, the BRAM buffers provide temporary storage for a plurality of individual updates 113 from each SCU 110. The BRAM thus enables priority queue operation to occur while overlapping the transfer from the local memory.
[0020] In some examples, the priority queue 104 may be a high speed parallel implementation that utilizes a pipelined array. A pipelined array allows replace operations to occur on the priority queue 104 every clock cycle. In some examples, the pipelined array may be processed by a priority processor, such as an FPGA or Application Specific Integrated Circuit (ASIC). The FPGAs enable the sorting of the array implementing the priority queue to occur in parallel with the replace operation. This may result in a 10x improvement over standard software implementations. Standard software implementations take logarithmic time between replace operations, whereas the time spent between replace operations in the present implementations is fairly constant without regard to the number of operations concurrently executed.
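For comparison, the logarithmic-time software baseline mentioned above can be sketched with a binary heap. The following Python sketch is illustrative only and is not part of the described implementation. It assumes, hypothetically, that each SCU is already sorted by key in descending order and that each update is reduced to its integer key; the function name `merge_scus` is likewise an assumption.

```python
import heapq
from typing import Iterator, List, Tuple


def merge_scus(scus: List[List[int]]) -> Iterator[int]:
    """Yield keys from several descending-sorted SCUs in descending order.

    Assumption: each inner list is already sorted from highest to lowest key.
    """
    iters = [iter(s) for s in scus]
    heap: List[Tuple[int, int]] = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            # heapq is a min-heap, so keys are negated to pop the highest first.
            heap.append((-first, idx))
    heapq.heapify(heap)

    while heap:
        neg_key, idx = heap[0]          # root holds the highest-priority key
        yield -neg_key
        nxt = next(iters[idx], None)
        if nxt is None:
            heapq.heappop(heap)         # this SCU is exhausted
        else:
            # Replace the root with the next update from the same SCU.
            # This step costs O(log n) in the number of heap entries.
            heapq.heapreplace(heap, (-nxt, idx))
```

Each `heapreplace` call restores heap order by sifting through up to log n levels one after another, which is the per-operation cost that the pipelined array is intended to avoid.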
[0021] In some examples, the data merge stage 106 takes each prioritized individual update 113 provided by the priority queue 104 and merges the data of each prioritized individual update 113 with the current version of the specific row. Each update potentially includes updates to multiple fields within the row. For each field 116, a timestamp is compared to the timestamp of the same field in the current version of the row to see if the prioritized individual update 113 has more recent data. If the prioritized individual update 113 has more recent data, then the current version's data is changed to match the update 113. In some examples, a merge processor implemented using an FPGA is able to compare timestamps across multiple fields in parallel, allowing a very fast merging of data. Traditional software approaches often examine each field sequentially. In some examples, the final merged updates 124 of merged array 122 are written out to local memory once the latest higher priority update of the priority queue 104 indicates that there are no more updates to that row. For example, a new value for the primary key may be returned by the priority queue indicating an update to another row.
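The per-field timestamp comparison described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the representation of a field as a `(timestamp, value)` pair, the use of `None` for a field with no update (the dashes in updates 113), and the name `merge_update` are all hypothetical. The loop examines fields one at a time, as a traditional software approach would; an FPGA merge processor could perform the same comparisons for all fields at once.

```python
from typing import Dict, Optional, Tuple

# Assumed representation: a field is a (timestamp, value) pair.
Field = Tuple[int, str]


def merge_update(row: Dict[str, Field],
                 update: Dict[str, Optional[Field]]) -> Dict[str, Field]:
    """Return a new row holding, for each field, the more recent version."""
    merged = dict(row)
    for name, incoming in update.items():
        if incoming is None:
            # No update supplied for this field; keep the current version.
            continue
        current = merged.get(name)
        # Take the incoming field only if it is strictly more recent.
        if current is None or incoming[0] > current[0]:
            merged[name] = incoming
    return merged
```

For example, merging an update whose field `a` carries timestamp 3 into a row whose field `a` carries timestamp 1 replaces that field, while an update field older than the stored one is ignored.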
[0022] Fig. 2 is a process flow diagram illustrating an example method for prioritizing data to be merged. The method of Fig. 2 is generally referred to by the reference number 200.
[0023] At block 202, a processor initializes a priority queue 104. As used herein, higher numbers represent higher priority and lower numbers represent lower priority. In some examples, because the replace operation is used, initial values are used to initialize the priority queue. In some examples, the processor may populate a priority queue 104 with values indicating infinity. In some examples, the processor may use lower values to indicate priority and may populate the priority queue 104 with values indicating negative infinity. In both cases, the infinity values serve as initial placeholders in the priority queue 104 that may be replaced by individual updates 113 using the replace operation.
[0024] At block 204, the priority queue 104 receives an individual update 113 from the SCUs 110. For example, the update 113 may correspond to one of a plurality of database entries to be updated. The update 113 may include the data to be updated or a pointer to the update 113. For example, update 113 may refer to one of a batch of updates in an SCU or a pointer that identifies an individual update in the SCU. Each individual update also includes a priority value associated with the individual update. The priority value may be assigned by the processor. In some examples, the priority value may be a key value such as primary key 114. For example, the key value may be used to determine a relative position in a database entry to be updated. In some examples, related database entries to be updated may be later merged together using the key value and updated in a more efficient manner.
[0025] At block 206, the priority processor identifies a higher priority update 113 than all other updates 113 in the priority queue and replaces the higher priority update 113 at the root position in a priority queue with a new update 113 from SCUs 110. A root position in a priority queue, as used herein, refers to a level in a priority queue that receives new updates 113 from the SCUs 110 and also contains higher priority updates 113. After identifying the higher priority update 113, the priority processor may send the identified update 113 to the other processor to indicate the corresponding update 113 to be updated. In some examples, the priority processor may send the identified higher priority updates 113 of prioritized array 120 to a memory for merging in the merge stage 106. In some examples, the root level of the priority queue may contain the higher priority update 113 after a complete clock cycle. In some examples, the root level may contain the new update 113 from SCUs 110 after a complete clock cycle. Thus, by replacing the update 113 at the root position of the priority queue 104 with a new update 113 from the SCUs 110, the priority processor may identify an update 113 with a higher priority than all other updates 113 in the array 118.
[0026] At block 208, the priority processor swaps even-level updates 113 with consecutively higher odd-level updates 113 based on a comparison of priority values associated with the updates 113. For example, the priority values may be key values 114 associated with each update 113. In some examples, given six levels 0-5, the updates 113 in levels 0 and 1 may be swapped, the updates 113 in levels 2 and 3 may be swapped, and the updates 113 in levels 4 and 5 may be swapped. In some examples, the updates 113 of two levels are swapped based on their priority values. For example, the updates 113 may be swapped when the higher level update 113 has a higher priority value. For example, in a priority queue, the higher priority updates 113 will be sorted to lower levels. Thus, in the priority queue, if an update 113 in level 0 has a priority value of 5 and an update 113 in level 1 has a priority value of 2, then the update 113 in level 0 will not be swapped after being compared with the update 113 in level 1 because they are already sorted correctly. In some examples, the priority processor may simultaneously swap all the pairs of odd/even levels of updates 113 that are to be swapped. In some examples, block 208 may be executed at higher levels concurrently with the execution of block 206 at the lower levels.
[0027] At block 210, the priority processor swaps odd-level updates 113 with consecutively higher even-level updates 113 based on a comparison of priority values associated with the updates 113. For example, the update 113 of level 1 might be swapped with an update 113 of level 2, an update 113 of level 3 might be swapped with an update 113 of level 4, and so on. In some examples, the updates 113 are swapped according to their priority values. For example, an update 113 in level 1 may be swapped with an update 113 in level 2 if the update 113 in level 1 has a lower priority value than the update 113 in level 2. In some examples, block 210 may be executed at higher levels concurrently with the execution of block 206 at the lower levels.
[0028] In some examples, as indicated by diamond 212, if additional updates 113 are received by the priority queue 104, then method 200 may iterate through additional received updates 113 by cycling through blocks 204-210. If no further additional updates 113 are received, then the method proceeds to diamond 214.
[0029] In some examples, as indicated by diamond 214, if no additional updates 113 are received by priority queue 104, then method 200 may proceed to cycle through blocks 208-210 until no further swaps are performed because all the updates 113 are sorted. In some examples, the priority processor may be populated with lower priority values to sort the remaining sorted updates 113 and identify higher priority updates 113 in the priority queue. If all the updates 113 are sorted, then the method ends at block 216.
[0030] It is to be understood that the process diagram of Fig. 2 is not intended to indicate that all of the elements of the method 200 are to be included in every case. For example, a clock cycle may begin at block 204 and end at block 210. In some examples, a clock cycle may begin at block 208, proceed to block 210, then finish with blocks 204 and 206. Further, any number of additional elements not shown in Fig. 2 may be included in the method 200, depending on the details of the specific implementation.
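The replace operation of block 206 and the two swap phases of blocks 208 and 210 can be modeled in software as follows. This is a minimal Python sketch, not the FPGA design itself: the class name `PipelinedArrayQueue`, the helper `sort_descending`, and the choice to fill the queue with exactly as many updates as it has levels are assumptions made for illustration. The sketch visits the pairs of a phase in a loop; because the pairs within one phase are disjoint, hardware may compare and swap them simultaneously with the same result.

```python
import math
from typing import List


class PipelinedArrayQueue:
    """Software model of a pipelined-array priority queue (higher value = higher priority)."""

    def __init__(self, levels: int) -> None:
        # Block 202: infinity placeholders, which leave the queue first.
        self.levels: List[float] = [math.inf] * levels

    def replace(self, new_value: float) -> float:
        """Model one clock cycle: output the root, insert new_value, run both swap phases."""
        out = self.levels[0]          # block 206: root holds the highest priority
        self.levels[0] = new_value
        self._swap_phase(0)           # block 208: pairs (0,1), (2,3), ...
        self._swap_phase(1)           # block 210: pairs (1,2), (3,4), ...
        return out

    def _swap_phase(self, start: int) -> None:
        a = self.levels
        for i in range(start, len(a) - 1, 2):
            # Swap when the higher level holds the higher priority value.
            if a[i] < a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]


def sort_descending(values: List[float]) -> List[float]:
    """Fill the queue, then drain it by replacing with negative-infinity placeholders."""
    queue = PipelinedArrayQueue(len(values))
    for v in values:
        queue.replace(v)              # each call outputs an infinity placeholder
    return [queue.replace(-math.inf) for _ in values]
```

Each call to `replace` performs a fixed amount of work per level regardless of how many updates have passed through the queue, which corresponds to the constant time between replace operations described above.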
[0031] Fig. 3A is a diagram of an example priority processor initialized with initial values. The configuration of the example priority processor of Fig. 3A is referred to generally by the reference number 300A. The priority processor 300A includes levels 0-7 that are labeled as levels 302-316, respectively. In the example of 300A, levels 302-316 are populated by the value infinity 318.
[0032] In Fig. 3A, the levels 302-316 are populated by the value infinity 318 because priority is indicated by higher priority values. For example, the priority queue may be arranged to output the priority updates and store the rest of the updates in a descending order from left to right. In some examples, the priority processor 300A may use replace and delete functions, and not insert functions. By using the replace and delete functions on the priority processor 300A in parallel on all the levels of the priority queue, rather than using insert functions, the priority processor 300A may allow an operation following a replacement or removal in O(1), or constant time, instead of O(log n), or logarithmic time. Therefore, the priority processor 300A may efficiently process updates regardless of the total amount of updates to be processed. Furthermore, because a single array is used, the priority processor 300A may use storage space efficiently.
[0033] Fig. 3B is a diagram of an example priority processor fully populated with an ordered array of individual updates 113 with priority values displayed. The configuration of the example priority processor in Fig. 3B is referred to generally by the reference number 300B. Updates 320-334 correspond to levels 302-316 of the priority processor, respectively.
[0034] In the diagram of Fig. 3B, the corresponding priority values of updates 320-334 have replaced the value infinity 318 one clock cycle at a time. In some examples, after eight clock cycles, the order of the updates 320-334 is from higher to lower. The original order of the updates 320-334 does not matter because of the swapping function as discussed at greater length in Fig. 3C. Thus, the priority processor 300B is able to efficiently sort updates regardless of their original order.
[0035] Fig. 3C is a diagram of an example priority processor receiving a new update. The configuration of the priority processor in Fig. 3C is generally referred to by the reference number 300C. In addition, new update 336 is about to replace update 320 as shown by arrow 338. Update 320 is also about to be identified as a higher priority update and sent to output as shown by arrow 340. In some examples, the output may be sent to a processor or memory as discussed further in Figs. 5 and 6 below.
[0036] In the diagram of Fig. 3C, the fully populated priority processor 300C receives a new individual update 336. The new update 336 is received at level 0 302, also referred to herein as the root level 302. In some examples, the priority processor 300C uses the replace function to replace update 320 at root level 302 with new update 336 and output update 320. In some examples, the update 320 may be output 340 to a processor, memory, or storage device. In some examples, the priority processor 300C may then swap consecutive updates using the replace operation as described in Fig. 3D.
[0037] Fig. 3D is a diagram of an example priority processor having processed an individual update 113. The configuration of the priority processor in Fig. 3D is generally referred to by the reference number 300D. A first round of swap and comparisons are indicated by arrows 342 and 344, respectively. A second round of swap and comparisons are indicated by arrows 346 and 348, respectively.
[0038] In the diagram of Fig. 3D, update 336 has shifted two places to the right from root level 302 to level 306. In some examples, the replacement of update 336 with the original update 320 at root level 302 and the shifting of update 336 two levels to the right may be performed by the priority processor 300D within one clock cycle. In some examples, the priority processor 300D may perform two sets of adjacent comparisons and/or swaps. For example, a first set of a swap and comparisons of even-levels with consecutively higher odd-levels indicated by arrows 342 and 344, respectively, results in new update 336 at root level 302 swapping with higher priority update 322 at level 304. Thus, update 322 is then placed into root level 302 and update 336 takes the place of update 322 at level 304. Although comparisons are made as indicated by arrows 344, the priority processor 300D does not perform any swaps because the priority values of these updates indicate that they are already ordered in a descending order of priority. In a second set of swaps and comparisons, as indicated by arrows 346 and 348, update 336 of level 304 is then swapped with higher priority update 324 of level 306. Thus, update 336 moves up to level 306, and update 324 moves down to level 304, the final resulting order of the updates shown in the example of 300D.
[0039] Fig. 3E is a diagram of an example priority processor receiving another individual update 113. The configuration of the priority processor in Fig. 3E is generally referred to by the reference number 300E. A new update 350 is to replace update 322 as shown by arrow 352. Update 322 is also to be output by the priority processor 300E as shown by arrow 354.
[0040] In the diagram of Fig. 3E, a new update 350 is to be added to the pipeline processor configuration of 300D. As in 300C, the new update 350 is to replace the existing update 322 of root level 302, the existing update 322 to be output by the priority processor 300E as indicated by arrow 354. However, this time two pairs of swaps will simultaneously follow the replacement of root level 302 as described in further detail with reference to Fig. 3F.
[0041] Fig. 3F is a diagram of an example priority processor having processed the individual updates. The configuration of the priority processor in Fig. 3F is generally referred to by the reference number 300F. Two pairs of swaps 342, 346 are indicated by bold dotted arrows, while comparisons 344, 348 are indicated by lightly dotted arrows.
[0042] In Fig. 3F, both new update 350 of 300E and update 336 of 300C have been shifted up two levels to the right. As discussed in 300D, the priority processor 300F executes two consecutive swaps. However, 300F shows two pairs of consecutive swaps. In some examples, more than one update may simultaneously be swapped with a consecutively higher level update. For example, in 300F update 350 of root level 302 was swapped with update 324 of level 304, and update 336 of level 306 was swapped with update 326 of level 308. In some examples, after the even-level updates are compared and/or swapped with consecutively higher odd-level updates, the odd-level updates are compared with the corresponding consecutively higher even-level updates. For example, in the example of 300F, update 350 at level 304 was compared and swapped with update 326 of level 306, and update 336 of level 308 was compared and swapped with update 328 of level 310. Thus, 300F shows the final positions of the two sets of swaps. As updates 330-334 are still ordered properly with respect to each other, no swaps included updates 330-334. However, in some examples, with two additional clock cycles, update 336 may eventually reach level 316 as it is an update with a lower priority. Likewise, in some examples, with an additional clock cycle, priority processor 300F may swap update 350 into level 310 and keep it there until a lower priority update is introduced at later clock cycles. Processing a pipelined array on a priority processor 300F such as an FPGA may result in a higher overall performance. For example, an implemented pipelined array priority queue on an FPGA board produced benchmarks indicating about a tenfold speedup over software implementations, and about a threefold speedup over pipelined heap designs.
[0043] It is to be understood that the diagrams of Figs. 3A-3F are not intended to indicate that all of the elements of the configurations 300A-300F are to be included in every case. Further, any number of additional elements not shown in Figs. 3A-3F may be included in the configurations 300A-300F, depending on the details of the specific implementation. For example, in configuration 300F, more than two updates may be swapped at the same time with a consecutively higher level, depending on the priority values of the updates.
[0044] Fig. 4 is a process flow diagram illustrating an example method for merging data. The method of Fig. 4 is generally referred to by the reference number 400.
[0045] At block 402, a priority processor receives individual updates 113. For example, the individual updates 113 may be from SCUs 110 and are to be prioritized and merged with a current database row.
[0046] At block 404, the priority processor orders the updates 113 by priority value via a priority queue. For example, the priority value may be a primary key associated with each individual update. In some examples, the priority processor can order the updates 113 while concurrently sending higher priority updates 113 to the merge processor. In some examples, the DMA can store higher priority updates 113 in the BRAM for the priority processor to access higher priority updates.
[0047] At block 406, a merge processor receives a higher priority update from the priority queue. For example, the higher priority update may have had a primary key that is equal to or higher than all the rest of the updates of the priority queue.
[0048] At block 408, the merge processor merges the higher priority update with a corresponding database row. In some examples, merging the higher priority update may include updating multiple fields of the corresponding database row based on a timestamp comparison. In some examples, multiple fields are to be updated in parallel.
[0049] At diamond 410, the merge processor checks if a different database row is indicated by a priority value. For example, a primary key can be used to distinguish between members of different rows. If the primary key indicates that the higher priority update corresponds to the same database row as the prior primary key, then the method proceeds back to 408 wherein the merge processor is to receive additional higher priority updates from the priority queue. In some examples, the merge processor may then merge the additional higher priority updates with their corresponding database rows. In some examples, if the primary key indicates that a higher priority update corresponds to a different row based on the primary key, then the method proceeds to 412.
[0050] At block 412, the merge processor sends the merged database row to memory after receiving a higher priority update corresponding to a different database row. In some examples, the merge processor can also store the merged database in a storage device.
[0051] It is to be understood that the process diagram of Fig. 4 is not intended to indicate that all of the elements of the method 400 are to be included in every case. Further, any number of additional elements not shown in Fig. 4 may be included in the method 400, depending on the details of the specific implementation.
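The loop formed by blocks 406-412 can be sketched in Python as follows. This is an illustrative model under stated assumptions rather than the merge processor's actual logic: updates are assumed to arrive from the priority queue already ordered by primary key, each update is modeled as a `(primary key, fields)` pair whose fields are `(timestamp, value)` pairs, and the names `merge_stream` and `table` are hypothetical. A merged row is emitted only once an update for a different primary key arrives, mirroring the check at diamond 410.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Field = Tuple[int, str]                  # assumed: (timestamp, value)
Row = Dict[str, Field]
Update = Tuple[int, Row]                 # assumed: (primary key, changed fields)


def merge_stream(updates: Iterable[Update],
                 table: Dict[int, Row]) -> Iterator[Tuple[int, Row]]:
    """Merge key-ordered updates into rows, emitting a row when the key changes."""
    current_key: Optional[int] = None
    row: Row = {}
    for key, fields in updates:
        if key != current_key:
            # Diamond 410: a different row is indicated by the primary key.
            if current_key is not None:
                yield current_key, row   # block 412: send the merged row onward
            current_key = key
            row = dict(table.get(key, {}))
        # Block 408: keep the more recent version of each field.
        for name, (ts, value) in fields.items():
            if name not in row or ts > row[name][0]:
                row[name] = (ts, value)
    if current_key is not None:
        yield current_key, row           # flush the final row


table = {7: {"a": (1, "old")}}
updates = [
    (7, {"a": (2, "new")}),
    (7, {"a": (1, "stale"), "b": (3, "x")}),
    (4, {"a": (5, "y")}),
]
merged_rows = list(merge_stream(updates, table))
```

In this usage example, the two updates for key 7 are combined into a single row before the update for key 4 triggers its emission, so each row is written once per batch.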
[0052] Fig. 5 is a block diagram of an example computing device 502 to update data. The computing device 502 may include a processor 504, memory 506, a machine-readable storage 508, a network interface card (NIC) 510 to connect computing system 502 to network 512, a direct memory access (DMA) engine 514, a priority processor 516, and a merge processor 518.
[0053] In some examples, the processor 504 may be a main processor that is adapted to execute the stored instructions. The processor 504 may be a single core processor, a multi-core processor, a computing cluster, or any number of other configurations. The processor 504 may be implemented as Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processors, x86 Instruction set compatible processors, ARMv7 Instruction set compatible processors, multi-core, or any other microprocessor or central processing unit (CPU).
[0054] In some examples, the memory device 506 may include random access memory (e.g., SRAM, BRAM, DRAM, zero capacitor RAM, SONOS, eDRAM, EDO RAM, DDR RAM, RRAM, PRAM, etc.), read only memory (e.g., Mask ROM, PROM, EPROM, EEPROM, etc.), flash memory, or any other suitable memory systems. As described below, in some examples, the memory may receive identified higher priority data from the priority processor 516.
[0055] In some examples, machine-readable storage 508 may be any electronic, magnetic, optical, or other physical storage device that stores executable instructions. Thus, machine-readable storage medium may be, for example, Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disc, and the like. As described in detail below, machine-readable storage medium 508 may be encoded with executable instructions for prioritizing data. For example, the machine-readable storage medium 508 may be encoded with executable instructions for prioritizing individual updates.
[0056] In some examples, a NIC 510 may connect computing system 502 to a network 512. For example, the NIC 510 may connect computing system 502 to a local network 512, a virtual private network (VPN), or the Internet. In some examples, the NIC may include an Ethernet controller. In some examples, the Ethernet controller can be leveraged on a field programmable gate array (FPGA) to provide a connection of the pipeline stages using the transmission control protocol (TCP).
[0057] In some examples, the DMA engine 514 may be an embedded DMA controller. The DMA engine can be used to transport data without accessing the processor 504. For example, the direct memory access (DMA) engine 514 may be used to retrieve data from an FPGA-local dynamic random access memory (DRAM) into block random access memory (BRAM) buffers on a field programmable gate array (FPGA). In some examples, the DMA engine 514 may be used to retrieve an individual update 113 that is to be placed in the priority queue.
[0058] In some examples, the priority processor 516 may be an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other type of specialized processor designed to perform the techniques described herein. For example, an FPGA may be programmed to efficiently prioritize updates 113 in a pipelined array as discussed in Figs. 3A-3F above. For example, the priority processor 516 may receive a first update 113 from the processor and output a second update residing at a root position of the priority queue 104 and send the second update to the processor and/or the memory and enter the first update at the root position of the priority queue 104. The priority processor 516 may also swap the first update residing at the root position of the priority queue 104 with a third update residing at a second position of the priority queue 104 based on a comparison of a first priority value associated with the first update and a second priority value associated with the third update. In some examples, the priority processor 516 may be further configured to swap the first update residing at the second position of the priority queue 104 with at least a fourth update residing at least a third position of the priority queue 104 based on a comparison of a third priority value associated with the fourth update and the first priority value associated with the first update. In some examples, the swapping between pairs of consecutive odd and even level updates may be executed concurrently.
[0059] In some examples, the priority queue 104 is a data structure that may receive updates 113 for sorting according to a priority. In some examples, the priority queue 104 may be located on priority processor 516. For example, memory 506 may be a memory associated with priority processor 516. In some examples, the priority queue 104 may be located on storage device 508.
[0060] The block diagram of Fig. 5 is not intended to indicate that the computing device 502 is to include all of the components shown in Fig. 5. Further, the computing device 502 may include any number of additional components not shown in Fig. 5, depending on the details of the specific implementation.
[0061] Fig. 6 is a drawing of an example machine-readable storage medium 600 that may be used to update data. Machine-readable storage medium 600 is connected to processor 602 via bus 604. Machine-readable storage medium 600 also contains a data fetch module 606, a priority module 608, and a merge module 610. The machine-readable medium is generally referred to by the reference number 600. The machine-readable medium 600 may comprise Random Access Memory (RAM), a hard disk drive, an array of hard disk drives, an optical drive, an array of optical drives, a non-volatile memory, a Universal Serial Bus (USB) flash drive, a DVD, a CD, and the like. In one implementation of the present techniques, the machine-readable medium 600 may be accessed by the processor 602 over the computer bus 604.
[0062] The various software components discussed herein may be stored on the tangible, non-transitory machine-readable medium 600 as indicated in Fig. 6. For example, a first block 606 may include a data fetch module 606 to retrieve an update. For example, the update can be an individual update 113 from an SCU 110. A second block 608 may include a priority module 608 to order the updates by priority value via a priority queue. For example, the priority value can be a primary key 114 associated with each individual update 113. The priority module 608 may also receive a higher priority update from the priority queue. A third block 610 may include a merge module 610 to merge the higher priority update with a corresponding database row. In some examples, merging the higher priority update may include instructions to update multiple fields of the corresponding database row based on a timestamp comparison. In some examples, the merge module 610 may receive additional higher priority updates from the priority queue. In some examples, the merge module 610 may merge the higher priority updates with the corresponding database row. In some examples, the merge module 610 may also send the merged database row to a memory after receiving a higher priority update corresponding to a different database row. In some examples, the instructions to update the multiple fields are to be executed in parallel. In some examples, the merge module 610 may store the merged database row to a storage device.
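As a non-limiting illustration, the cooperation of the data fetch, priority, and merge modules described above may be sketched in software as follows. The sketch is an assumption-laden example rather than the disclosed implementation: the function name merge_updates, the tuple layout (primary key, timestamp, fields), the per-field (timestamp, value) row format, and the rule that the newer timestamp wins are hypothetical. The fields are updated in a sequential loop here, whereas some examples above update them in parallel.

```python
import heapq


def merge_updates(scus, rows):
    """Order updates by primary key, then merge them into database rows.

    scus: list of SCUs; each SCU is a list of hypothetical updates
          (primary_key, timestamp, {field: value}).
    rows: dict mapping primary_key -> {field: (timestamp, value)}.
    Yields (primary_key, merged_row) once an update for a different row arrives.
    """
    queue = []
    for scu_id, scu in enumerate(scus):
        for seq, (key, ts, fields) in enumerate(scu):
            # The primary key is the priority value; (scu_id, seq) breaks ties
            # deterministically so the field dicts are never compared.
            heapq.heappush(queue, (key, scu_id, seq, ts, fields))

    current_key, current_row = None, None
    while queue:
        key, _, _, ts, fields = heapq.heappop(queue)
        if key != current_key:
            # An update for a different row: send out the finished row first.
            if current_key is not None:
                yield current_key, current_row
            current_key = key
            current_row = dict(rows.get(key, {}))
        for name, value in fields.items():
            old = current_row.get(name)
            # Timestamp comparison per field; assumption: newer value wins.
            if old is None or ts > old[0]:
                current_row[name] = (ts, value)
    if current_key is not None:
        yield current_key, current_row


# Hypothetical existing row and two SCUs carrying updates for rows 1 and 2.
rows = {1: {"name": (10, "ann"), "city": (10, "Oslo")}}
scus = [
    [(1, 12, {"city": "Rome"}), (2, 11, {"name": "bob"})],
    [(1, 11, {"name": "anna", "city": "Paris"})],
]
merged = list(merge_updates(scus, rows))
```

For this hypothetical input, row 1 keeps the city written at timestamp 12 and takes the name written at timestamp 11, and it is emitted only when the first update for row 2 is received from the queue.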
[0063] Although shown as contiguous blocks, the software components may be stored in any order or configuration. For example, if the machine-readable medium 600 is a hard drive, the software components may be stored in non-contiguous, or even overlapping, sectors.
[0064] The present techniques are not restricted to the particular details listed herein. Indeed, those skilled in the art having the benefit of this disclosure will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present techniques.

Claims

What is claimed is:
1. A computing system for updating data, comprising:
a priority processor to:
receive an individual update;
output a prioritized update residing at a root position of a priority queue and replace the prioritized update with a new individual update; and
sort the individual updates in the priority queue based on their priority values; and
a merge processor to:
receive the sorted updates from the priority processor and merge the sorted updates into merged updates.
2. The computing system of claim 1, the individual updates corresponding to database entries to be updated.
3. The computing system of claim 2, further comprising a direct memory access (DMA) engine to retrieve an individual update from a self-consistent update (SCU) to be placed in the priority queue.
4. The computing system of claim 3, the individual update comprising a primary key to be used as the priority value and self-consistent update (SCU) source ID.
5. The computing system of claim 1, further comprising a block random access memory (BRAM) to provide temporary storage for a plurality of individual updates of each SCU.
6. A method for updating data, comprising:
receiving updates;
ordering the updates by priority value via a priority queue;
receiving a higher priority update from the priority queue; and
merging the higher priority update with a corresponding database row.
7. The method of claim 6, merging the higher priority update comprising updating multiple fields of the corresponding database row based on a timestamp comparison.
8. The method of claim 7, wherein the multiple fields are to be updated in parallel.
9. The method of claim 6, further comprising:
receiving additional higher priority updates from the priority queue;
merging the additional higher priority updates with the corresponding database row; and
sending the merged database row to a memory after receiving a higher priority update corresponding to a different database row.
10. The method of claim 9, further comprising storing the merged database row to a memory after receiving a higher priority update corresponding to a different database row.
11. A non-transitory machine-readable storage medium for updating data encoded with instructions executable by a processor, the machine-readable storage medium comprising:
instructions to retrieve updates;
instructions to order the updates by priority value via a priority queue;
instructions to receive a higher priority update from the priority queue; and
instructions to merge the higher priority update with a corresponding database row.
12. The non-transitory machine-readable storage medium of claim 11, merging the higher priority update further comprising instructions to update multiple fields of the corresponding database row based on a timestamp comparison.
13. The non-transitory machine-readable storage medium of claim 12, wherein the instructions to update the multiple fields are to be executed in parallel.
14. The non-transitory machine-readable storage medium of claim 11, further comprising:
instructions to receive additional higher priority updates from the priority queue to generate a merged database row;
instructions to merge the higher priority updates with the corresponding database row; and
instructions to send the merged database row to a memory after receiving a higher priority update corresponding to a different database row.
15. The non-transitory machine-readable storage medium of claim 11, further comprising instructions to send the merged database row to a memory after receiving a higher priority update corresponding to a different database row.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2014/049264 WO2016018400A1 (en) 2014-07-31 2014-07-31 Data merge processing

Publications (1)

Publication Number Publication Date
WO2016018400A1 true WO2016018400A1 (en) 2016-02-04

Family

ID=55218114

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/049264 WO2016018400A1 (en) 2014-07-31 2014-07-31 Data merge processing

Country Status (1)

Country Link
WO (1) WO2016018400A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7016914B2 (en) * 2002-06-05 2006-03-21 Microsoft Corporation Performant and scalable merge strategy for text indexing
US20090259617A1 (en) * 2008-04-15 2009-10-15 Richard Charles Cownie Method And System For Data Management
US20100281013A1 (en) * 2009-04-30 2010-11-04 Hewlett-Packard Development Company, L.P. Adaptive merging in database indexes
US20120254173A1 (en) * 2011-03-31 2012-10-04 Goetz Graefe Grouping data
US20130226967A1 (en) * 2011-01-20 2013-08-29 John N Gross Data acquisition system with on-demand and prioritized data fetching

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11755926B2 (en) 2019-02-28 2023-09-12 International Business Machines Corporation Prioritization and prediction of jobs using cognitive rules engine
CN111935100A (en) * 2020-07-16 2020-11-13 锐捷网络股份有限公司 Flowspec rule issuing method, device, equipment and medium
CN111935100B (en) * 2020-07-16 2022-05-20 锐捷网络股份有限公司 Flowspec rule issuing method, device, equipment and medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14898692; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 14898692; Country of ref document: EP; Kind code of ref document: A1)