WO2015122925A1 - Free flow assurance of atomicity of multiple updates in persistent memory - Google Patents


Info

Publication number
WO2015122925A1
Authority
WO
WIPO (PCT)
Prior art keywords
transaction, update, cache, count, transactions
Prior art date
Application number
PCT/US2014/016634
Other languages
French (fr)
Inventor
Boris Zuckerman
Alistair Veitch
Douglas L VOIGT
Harold Woods
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Priority to PCT/US2014/016634 priority Critical patent/WO2015122925A1/en
Publication of WO2015122925A1 publication Critical patent/WO2015122925A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 Caches characterised by their organisation or structure
    • G06F12/0895 Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
    • G06F12/0804 Caches with main memory updating
    • G06F12/0866 Caches for peripheral storage systems, e.g. disk cache
    • G06F12/0868 Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • G06F12/0897 Caches with two or more cache hierarchy levels
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing
    • G06F9/467 Transactional memory
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1008 Correctness of operation, e.g. memory ordering
    • G06F2212/1016 Performance improvement
    • G06F2212/1024 Latency reduction

Definitions

  • Persistent memory (PM) may be main memory implemented using non-volatile memory (NVM) technologies, such as flash, resistive random-access memory (RRAM), phase-change RAM (PCRAM), and memristor. PM may be durable across power cycles. PM may also be directly addressable by processors at byte granularity.
  • Fig. 1 is a block diagram illustrating a device that propagates atomic updates in examples of the present disclosure
  • Fig. 2 is a block diagram illustrating components in a first cache to propagate atomic updates in examples of the present disclosure
  • Fig. 3 is a flowchart of a method for a CPU of Fig. 1 to execute a transact command to manage a monotonically incremented transaction ID and update counters in examples of the present disclosure
  • Fig. 4 is a flowchart of a method for the CPU of Fig. 1 to execute the transact command in a thread of execution to process a transaction's updates that are transferred into the first cache in examples of the present disclosure;
  • Fig. 5 is a flowchart of a method for the CPU of Fig. 1 to execute the transact command to process a transaction's updates that are being transferred out from first cache in examples of the present disclosure
  • Figs. 6 and 7 are block diagrams illustrating logically collapsing transactions in the first cache of Fig. 1 in examples of the present disclosure
  • Fig. 8 is a flowchart of a method for the CPU of Fig. 1 to collapse transactions in examples of the present disclosure
  • Fig. 9 is a flowchart of a method for the CPU of Fig. 1 to process updates of collapsed transactions being transferred out from the first cache in examples of the present disclosure
  • Fig. 10 is a block diagram of components in a persistent memory (PM) cache of Fig. 1 to propagate atomic updates in examples of the present disclosure;
  • Fig. 11 is a flowchart of a method for a PM controller of Fig. 1 to process transactions being transferred into the PM cache of Fig. 1 in examples of the present disclosure
  • Fig. 12 is a flowchart of a method for the PM cache controller of Fig. 1 to process updates being transferred into the PM cache of Fig. 1 in examples of the present disclosure
  • Figs. 13 and 14 are block diagrams illustrating processing collapsed transactions in the PM cache of Fig. 1 in examples of the present disclosure
  • Fig. 15 is a flowchart of a method for the PM controller of Fig. 1 to process updates of collapsed transactions being transferred in the PM cache of Fig. 1 in examples of the present disclosure
  • Fig. 16 is a flowchart of a method for the PM controller of Fig. 1 to process transactions left in the PM cache of Fig. 1 after a power failure or a system crash in examples of the present disclosure
  • Fig. 17 is a block diagram of a device for implementing the CPU or the PM controller of Fig. 1 in examples of the present disclosure.
  • the term “includes” means includes but is not limited to; the term “including” means including but not limited to.
  • the terms “a” and “an” are intended to denote at least one of a particular element.
  • the term “based on” means based at least in part on.
  • the term “or” is used in a nonexclusive sense, such that “A or B” includes “A but not B,” “B but not A,” and “A and B” unless otherwise indicated.
  • PM expands the ability of applications to preserve their states so they may be suspended and quickly restarted.
  • a method and apparatus are provided to assure atomicity of updates through caches to a PM.
  • the instruction set of a processor is expanded to include commands that signal explicitly or implicitly the beginning and the end of a transaction, which is a set of updates to be committed or dropped together.
  • the processor assigns a transaction ID to the transaction, which is appended to the cache elements (e.g., cache lines) that store the transaction's updates, and begins to track the transaction's update count in a volatile first cache, which is the number of the transaction's updates that are currently in the first cache.
  • the processor monotonically increments the transaction ID value for each transaction to capture their incoming sequence.
  • the processor marks update requests that transfer the updates of the transaction from the first cache to a nonvolatile second cache with the state of the transaction and the update count. After the transaction ends, the update count is equal to the number of outstanding updates from the transaction that the second cache is waiting to receive from the first cache.
  • the cache controller may save the update count in the update request as the undelivered count in the second cache, which the cache controller decrements for each update of the transaction received from the first cache.
  • the cache controller may check for errors by comparing the update counts delivered in the update requests and the undelivered count kept in the second cache.
  • a cache controller does not allow updates from the transaction to be flushed to a PM until the transaction is closed and the undelivered count is equal to zero (0). This ensures the atomicity of the updates so they are committed or dropped together. Even then, the cache controller does not allow the updates from the transaction to be flushed when prior transactions, as determined from their transaction IDs, have not been flushed. This ensures strict ordering of the transactions based on their incoming sequence.
  • the processor collapses prior transactions from the old transaction up to the newer transaction into the newer transaction by transferring the update counts in the prior transactions to the newer transaction and creating redirects (e.g., a negative index shift stored in an update counter that leads to another update counter) from the prior transactions to the newer transaction.
  • Any update from collapsed transactions that is transferred from the first cache to the second cache includes transaction IDs of the new and the old transactions in its update request.
  • the cache controller processes the collapse by pointing the new transaction to a linked list of updates of the old transaction, and then linking the other linked lists of the collapsed transactions in the incoming sequence of the transactions.
  • Fig. 1 is a block diagram illustrating a device 100 that propagates atomic updates in examples of the present disclosure.
  • Device 100 includes a central processing unit (CPU) 102, a volatile first cache 104, a PM memory controller 105, a PM cache 106, and a PM 108.
  • CPU 102 runs a thread of execution that includes transaction markers, which are commands that separate one transaction from another.
  • a transaction is a set of related updates that are to be committed together to PM 108 or dropped as a unit.
  • the transaction markers include a "transact” command and an "etransact” command.
  • the transact command “opens” a new transaction by indicating the start of a new transaction and may implicitly “close” a previous transaction by indirectly indicating the end of the previous transaction.
  • the etransact command closes a transaction by indicating the end of a transaction. Issuing the transact command, a thread of execution is requesting a contract with the caching environment that updates made after it and until the etransact command (or the next transact command) are to be committed or dropped together. Note that a single thread of execution may have only one active transaction, but multiple threads running on multiple CPUs may have many active transactions. In a single thread, the start of a new transaction means the termination of a previous transaction.
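The marker semantics above can be illustrated with a minimal, hypothetical model; the `Markers` class and its recorded event names are illustrative stand-ins, not part of the patent:

```python
# Minimal stand-in illustrating transact/etransact semantics: a transact
# marker opens a new transaction and implicitly closes the previous open
# one; etransact closes the current transaction explicitly.

class Markers:
    def __init__(self):
        self.events = []   # recorded begin/end markers, in order
        self.open = False  # whether this thread has an active transaction

    def transact(self):
        if self.open:
            self.events.append("end")    # implicit close of the previous transaction
        self.events.append("begin")
        self.open = True

    def etransact(self):
        self.events.append("end")
        self.open = False

# A thread with two transactions: the second transact implicitly ends the first.
t = Markers()
t.transact()
t.transact()
t.etransact()
```

As the model shows, a single thread never has more than one active transaction, matching the contract described above.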
  • Fig. 2 is a block diagram illustrating components in first cache 104 to propagate atomic updates in examples of the present disclosure. These components include a monotonically incremented transaction ID (M-TID) 202, an array of update counters 204-1 to 204-n (hereafter collectively “update counters 204” and an individual, generic “update counter 204”), and indices (illustrated as arrowed lines) from updates stored in cache elements (e.g., cache lines) to the array of update counters 204.
  • CPU 102 increments and assigns the value of M-TID 202 to the transaction as its transaction ID.
  • CPU 102 may append the assigned transaction ID in the cache elements where the updates are stored.
  • CPU 102 also assigns an update counter 204 to the transaction and increments update counter 204 by one for each update in the transaction.
  • CPU 102 provides indices from the cached updates to the assigned update counter 204 in the array. For example, CPU 102 appends a transaction ID of 1 to cached updates 206 and 208 of a first transaction, provides indices 210 and 212 from cached updates 206 and 208 to update counter 204-1 in the array, and increments update counter 204-1 twice.
  • CPU 102 appends a transaction ID of 2 to cached updates 214 and 216 of a second transaction, provides indices 218 and 220 from cached updates 214 and 216 to update counter 204-2 in the array, and increments update counter 204-2 twice. Furthermore, CPU 102 appends a transaction ID of n to a cached update 222 of an nth transaction, provides index 224 from cached update 222 to update counter 204-n in the array, and increments update counter 204-n once.
  • CPU 102 decrements the corresponding update counter 204 and sends the update to the next cache with its transaction ID, cache address, and the update count in its update counter 204 in an update request.
  • CPU 102 may also mark the update as open or closed. Note that once a transaction is closed, the update count kept is the number of undelivered updates to the next cache. If the update is the last update, CPU 102 may also mark the update as final.
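The first-cache bookkeeping just described can be sketched as follows. This is an illustrative model, not the patented hardware: the class name, counter-array size, and the wrap-around check (raising an error instead of flushing) are assumptions for the sketch.

```python
# Sketch of the first-cache bookkeeping: a monotonically incremented
# transaction ID (M-TID), an array of per-transaction update counters,
# and an index from each cached update to its assigned counter.

class FirstCache:
    def __init__(self, num_counters=8):
        self.m_tid = 0                      # M-TID 202
        self.counters = [0] * num_counters  # update counters 204-1 .. 204-n
        self.cache = {}                     # address -> (tid, counter index, data)
        self.current = None

    def transact(self):
        """Open a new transaction; its ID is the incremented M-TID value."""
        self.m_tid += 1
        self.current = self.m_tid
        idx = (self.current - 1) % len(self.counters)
        if self.counters[idx] != 0:
            # M-TID wrapped onto an old transaction with updates still cached;
            # the real CPU would flush the first cache until this counter is zero.
            raise RuntimeError("M-TID wrapped; flush required")
        return self.current

    def update(self, address, data):
        """Cache an update tagged with the transaction ID; bump its counter."""
        idx = (self.current - 1) % len(self.counters)
        self.cache[address] = (self.current, idx, data)
        self.counters[idx] += 1
```

For example, after one transaction with two updates, the transaction's counter reads 2, mirroring update counter 204-1 in Fig. 2.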
  • Fig. 3 is a flowchart of a method 300 for CPU 102 (Fig. 1) to execute the transact command to manage M-TID 202 and update counters 204 in examples of the present disclosure.
  • Method 300 may begin in block 302.
  • CPU 102 gets and increments the value of M-TID 202 (Fig. 2). This value is to be assigned as the transaction ID of a new transaction declared by the transact command.
  • Block 302 may be followed by block 304.
  • CPU 102 determines if the count at the update counter 204 (Fig. 2) for the new transaction is equal to zero. When update counter 204 is not equal to 0, then M-TID 202 has wrapped around to an old transaction with updates in first cache 104 (Fig. 1). In such a case, block 304 may be followed by block 306. If update counter 204 is equal to zero, method 300 may end.
  • CPU 102 flushes first cache 104 until update counter 204 is equal to zero.
  • CPU 102 may partially or fully flush first cache 104.
  • Method 300 may end at block 306.
  • FIG. 4 is a flowchart of a method 400 for CPU 102 (Fig. 1) to execute the transact command in a thread of execution to process a transaction's updates that are transferred into first cache 104 (Fig. 1) in examples of the present disclosure.
  • Method 400 may begin in block 402.
  • CPU 102 processes each of the transaction's updates that are transferred into first cache 104.
  • Block 402 may be followed by block 404.
  • CPU 102 appends a transaction ID to the cache element storing the update, provides an index from the cached update to update counter 204, and increments update counter 204 by one.
  • Block 404 may loop back to block 402 to process another update of the transaction until CPU 102 reaches the etransact command or another transact command in the thread of execution.
  • FIG. 5 is a flowchart of a method 500 for CPU 102 (Fig. 1) to execute the transact command to process a transaction's updates that are being transferred out from first cache 104 (Fig. 1) in examples of the present disclosure.
  • Method 500 may begin in block 502.
  • CPU 102 processes each of the transaction's cached updates that are being transferred out from first cache 104.
  • Block 502 may be followed by block 504.
  • CPU 102 decrements update counter 204 by one. Block 504 may be followed by block 506. In block 506, CPU 102 determines if the transaction is closed. CPU 102 knows the transaction is closed if it has processed the etransact command or another transact command in the thread of execution. If the transaction is not closed, block 506 may be followed by block 508. Otherwise block 506 may be followed by block 510.
  • CPU 102 marks the update as "open.”
  • Block 508 may be optional if an unmarked update indicates an open transaction. Block 508 may be followed by block 516.
  • CPU 102 may mark the update as "closed.” Block 510 may be followed by block 512.
  • CPU 102 determines if the update is the last update in the transaction. CPU 102 knows the update is the last update when the update count in update counter 204 is equal to zero (0). If so, block 512 may be followed by block 514. Otherwise block 512 may be followed by block 516.
  • CPU 102 sends the update with its transaction ID, cache address, cache content, update count, and possible flag (open, final, etc.) to the next cache, such as PM cache 106 (Fig. 1).
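One possible shape for the update request built in method 500 is sketched below; the field names and the `closed_tids` set are illustrative assumptions, not terms from the patent:

```python
# Illustrative eviction step of Fig. 5: decrement the transaction's update
# counter, then emit a request carrying the transaction ID, cache address,
# content, remaining (undelivered) count, and a state flag.

def evict(counters, tid, idx, address, data, closed_tids):
    counters[idx] -= 1
    remaining = counters[idx]
    if tid not in closed_tids:
        flag = "open"
    elif remaining == 0:
        flag = "final"       # last undelivered update of a closed transaction
    else:
        flag = "closed"      # closed, but more updates still to come
    return {"tid": tid, "address": address, "data": data,
            "count": remaining, "flag": flag}
```

For a closed transaction with two cached updates, the first eviction carries count 1 and flag "closed"; the second carries count 0 and flag "final", which is what lets the next cache know it has received everything.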
  • Figs. 6 and 7 are block diagrams illustrating logically collapsing transactions in first cache 104 in examples of the present disclosure.
  • transactions having transaction IDs of 601, 602, 603, 604, and 605 are assigned to respective update counters 204-1, 204-2, 204-3, 204-4, and 204-5, which have respective counts of 3, 5, 0, 1, and 0.
  • Assume the current transaction 605 is to update a cached update 650 already pointing to update counter 204-1 of old transaction 601.
  • When CPU 102 detects that the current transaction 605 is to modify cached update 650 already pointing to update counter 204-1 of old transaction 601, CPU 102 collapses all transactions from old transaction 601, pointed to by cached update 650, to current transaction 605. To record the collapse of transactions 601 to 605, CPU 102 transfers counts from update counters 204-1, 204-2, 204-3, and 204-4 of respective transactions 601 to 604 to update counter 204-5 of current transaction 605. Thus update counter 204-5 now has an update count of 9 (3 + 5 + 0 + 1 + 0). Furthermore, CPU 102 leaves negative redirects in update counters 204-1, 204-2, 204-3, and 204-4 to update counter 204-5 in the form of a negative index shift.
  • update counter 204-1 has an update count of -4
  • update counter 204-2 has an update count of -3
  • update counter 204-3 has an update count of 0 as it has no updates
  • update counter 204-4 has an update count of -1.
  • When the current transaction 605 is closed and its update counter 204-5 becomes zero (0), CPU 102 zeroes out update counters 204-1 to 204-4 of transactions 601 to 604 to remove the negative redirects and indicate all the updates in the collapsed transactions have been transferred out from first cache 104. To do so efficiently, CPU 102 may reserve another update counter 204-6 in the array with an index to the first update counter 204-1 in the range of collapsed transactions.
  • CPU 102 notifies the next cache of all the collapsed transactions so the next cache can maintain monotonicity of the transaction IDs. To do so, CPU 102 includes the transaction IDs of the oldest transaction 601 and the newest transaction 605 to describe the range of the collapsed transactions in update requests to the next cache. Additional details may be omitted for the sake of clarity.
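The collapse of Figs. 6 and 7 can be modeled with plain counter arithmetic. In this sketch (an illustration, not the patented logic), a negative value in a counter is the redirect: its magnitude is the index shift to the counter of the newest collapsed transaction.

```python
# Collapse transactions old_idx..new_idx into the one at new_idx:
# transfer positive counts into the newest counter and leave negative
# index shifts (redirects) behind, as in Figs. 6 and 7.

def collapse(counters, old_idx, new_idx):
    for i in range(old_idx, new_idx):
        if counters[i] > 0:
            counters[new_idx] += counters[i]
            counters[i] = -(new_idx - i)   # negative redirect to the new counter
        else:
            counters[i] = 0                # no updates, so no redirect needed

def resolve(counters, idx):
    """Follow redirects from a cached update's counter index to the live counter."""
    while counters[idx] < 0:
        idx += -counters[idx]
    return idx
```

Running this on the counts of Fig. 6 (3, 5, 0, 1, 0 for transactions 601-605) reproduces the redirects described above: -4, -3, 0, -1, with the newest counter holding 9.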
  • FIG. 8 is a flowchart of a method 800 for CPU 102 (Fig. 1) to collapse transactions in examples of the present disclosure.
  • Method 800 may begin in block 802.
  • CPU 102 determines if the current transaction is to modify a cached update pointing to an update counter of an old transaction. If so, block 802 may be followed by block 804. Otherwise block 802 loops back to itself.
  • CPU 102 transfers the counts of all prior transactions, from the old transaction up to the current transaction, into the update counter of the current transaction.
  • CPU 102 also modifies the update counters of the prior transactions with respective negative redirects to the update counter of the current transaction.
  • CPU 102 may also set a flag to indicate a transaction is a collapsed transaction.
  • Fig. 9 is a flowchart of a method 900 for CPU 102 (Fig. 1) to process updates of collapsed transactions being transferred out from first cache 104 (Fig. 1) in examples of the present disclosure.
  • Method 900 includes blocks performed in addition to the blocks in method 500 to process updates being transferred out from first cache 104.
  • Method 900 may begin in block 902.
  • CPU 102 determines if a cached update being transferred out is from a collapsed transaction.
  • CPU 102 may determine if a cached update being transferred out is from a collapsed transaction when the cached update points to an update counter with a negative redirect or otherwise flagged as a collapsed transaction. If so, block 902 may be followed by block 904. Otherwise block 902 loops back to itself.
  • CPU 102 determines if the newest of the collapsed transactions is closed and if the update is the last update from newest collapsed transaction. If so, block 904 may be followed by block 906. Otherwise block 904 may be followed by block 908.
  • the update is the last update from the newest collapsed transaction when the update counter of the newest collapsed transaction has an update count of zero (0).
  • Block 906 may be followed by block 908.
  • CPU 102 sends an update request with the transaction IDs of the oldest and the newest of the collapsed transactions.
  • Block 908 may loop back to block 902 to process another cached update from the collapsed transactions.
  • Fig. 10 is a block diagram of components in PM cache 106 (Fig. 1) to propagate atomic updates in examples of the present disclosure.
  • PM cache 106 includes an array 1050 of transaction elements.
  • Each transaction element includes an undelivered counter of undelivered updates for a transaction and a pointer to the first cached update of the transaction.
  • the undelivered counter may have three (3) types of values reflecting specific states of a transaction.
  • When the undelivered count is equal to zero (0), the transaction is either free, when it does not point to a cached update, or it may be flushed to PM 108 when all previous transactions have been flushed.
  • When the undelivered count is equal to negative one (-1), the transaction is open.
  • When the undelivered count is greater than zero (0), the transaction has been closed by a thread of execution and the count reflects the number of undelivered updates to PM cache 106 for the transaction.
  • transaction elements for transactions 1001, 1002, 1003, and 1004 have respective counters 1052-1, 1052-2, 1052-3, and 1052-4 with respective undelivered counts of 0, 0, 3, and -1.
  • the pointer of each transaction element points to the first cached update of a transaction.
  • the updates of the transaction are organized as a linked list with each update pointing the subsequent update.
  • the pointer 1054-2 of a transaction element for transaction 1002 (also referred to as “transaction element 1002”) points to the first cached update 1056 of transaction 1002, which points to the second cached update 1058 of transaction 1002.
  • the pointer 1054-3 of transaction element 1003 points to the first cached update 1060 of transaction 1003.
  • the pointer 1054-4 of transaction element 1004 points to the first cached update 1062 of transaction 1004, which points to the second cached update 1064 of transaction 1004.
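A minimal model of the Fig. 10 structures might look like the following; the class names and the singly linked `next` field are illustrative choices, not terms from the disclosure:

```python
# Each transaction element in the PM cache holds an undelivered counter
# (-1 = open, >0 = closed with updates outstanding, 0 = free/flushable)
# and a pointer to the head of a linked list of the transaction's updates.

class Update:
    def __init__(self, address, data):
        self.address, self.data = address, data
        self.next = None          # next cached update of the same transaction

class TransactionElement:
    def __init__(self):
        self.undelivered = 0      # 0 = free or flushable, -1 = open, >0 = closed
        self.head = None          # pointer to first cached update

    def append(self, update):
        """Link a newly arrived update onto the end of the transaction's list."""
        if self.head is None:
            self.head = update
        else:
            node = self.head
            while node.next:
                node = node.next
            node.next = update
```

This mirrors transaction element 1002 above: its pointer leads to the first cached update, which in turn points to the second.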
  • Fig. 11 is a flowchart of a method 1100 for PM controller 105 (Fig. 1) to process transactions being transferred into PM cache 106 (Fig. 1) in examples of the present disclosure.
  • Method 1100 may begin in block 1102.
  • PM cache controller 105 determines if it detects a new transaction being transferred into PM cache 106. For example, PM cache controller 105 receives an update request to PM cache 106 and the transaction ID of the update is not associated with any transaction element in PM cache 106. Such an update would be the first update of the transaction. If PM cache controller 105 detects a new transaction being transferred into PM cache 106, block 1102 may be followed by block 1104. Otherwise block 1102 may loop back to itself.
  • PM cache controller 105 associates the transaction with a transactional element in array 1050, indicates whether the transaction is open or closed by setting the undelivered count of counter 1052, and sets a pointer 1054 to the first cached update of the transaction. If the update is marked open in the update request, PM cache controller 105 sets counter 1052 equal to negative one (-1). If the update is marked closed in the update request, PM cache controller 105 sets counter 1052 equal to the count in the update request. Block 1104 may loop back to block 1102 to process another new transaction.
  • Fig. 12 is a flowchart of a method 1200 for PM cache controller 105 (Fig. 1) to process updates being transferred into PM cache 106 (Fig. 1) in examples of the present disclosure.
  • Method 1200 may begin in block 1202.
  • PM cache controller 105 determines if it detects an update of a transaction being transferred into PM cache 106. If so, block 1202 may be followed by block 1204. Otherwise block 1202 may loop back to itself.
  • PM cache controller 105 adds a pointer from the prior cached update of the same transaction to the cached update to form part of a linked list of cached updates of the transaction.
  • Block 1204 may be followed by block 1206.
  • PM cache controller 105 determines if the update is marked closed in the update request to indicate the transaction is closed. If so, block 1206 may be followed by block 1208. Otherwise block 1206 may loop back to block 1202 to process another update.
  • PM cache controller 105 determines if the transaction is marked open in the transactional element.
  • the transaction may be marked open in the transactional element if the undelivered count in the counter of the transactional element is set to negative one (-1).
  • block 1208 may be followed by block 1210. Otherwise block 1208 may be followed by block 1212.
  • PM cache controller 105 marks the transaction as closed in the transactional element.
  • PM cache controller 105 may mark the transaction as closed in the transactional element by setting the undelivered count in the counter of the transactional element equal to the count in the update request.
  • Block 1210 may be followed by block 1218.
  • PM cache controller 105 decrements the undelivered count in the counter in the transactional element by one to reflect that one of the updates for the transaction has been received.
  • Block 1212 may be followed by block 1214.
  • PM cache controller 105 determines if the undelivered count is equal to the update count in the update request. If they are not the same, then an error has occurred and block 1214 may be followed by block 1216. Otherwise block 1214 may be followed by block 1218.
  • PM cache controller 105 processes the error.
  • PM cache controller 105 may run a diagnostic test to determine and correct the error.
  • Block 1216 may loop back to block 1202 to process another update.
  • PM cache controller 105 determines if the update is the last update of the transaction. The update is the last update of the transaction when the update is marked final or the undelivered count is equal to zero (0). The final label of an update and the zero undelivered count may both be used to safeguard the independent accounting of the two caches. If the update is the last update of the transaction, block 1218 may be followed by block 1220. Otherwise block 1218 may loop back to block 1202 to process another update.
  • PM cache controller 105 determines if all previous transactions have been flushed.
  • a previous transaction is any transaction with a smaller transaction ID.
  • a previous transaction has been flushed when both its undelivered count and pointer are equal to zero (0). If all previous transactions have not been flushed, then block 1220 may be followed by block 1222. Otherwise block 1220 may be followed by block 1224.
  • PM cache controller 105 does not allow the flushing of the transaction in order to enforce strict ordering of the transactions based on their incoming sequence.
  • Block 1222 may loop back to block 1202 to process another update.
  • PM cache controller 105 allows flushing of the transaction up to any subsequent transactions that also have their undelivered count equal to zero (0). Note that after flushing the transaction, the pointer to the first cached update of the transaction is replaced with a zero (0) where zeroes in the undelivered count and the pointer indicate the transactional element is free. Block 1224 may loop back to block 1202 to process another update. This may trigger release and flush of this and consequent transactions.
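The receive-and-flush logic of methods 1100 and 1200 can be condensed into a small sketch. This is an assumption-laden illustration: the dictionary representation of transaction elements and the `flushed` set are stand-ins for array 1050 and the zeroed elements.

```python
# The controller records each transaction's undelivered count from incoming
# update requests; a transaction may flush only when its count reaches zero
# and every earlier transaction (smaller transaction ID) has been flushed.

def receive(elements, req):
    # First update of an unknown transaction creates its element (method 1100);
    # -1 marks it open until a closed/final request arrives.
    el = elements.setdefault(req["tid"], {"undelivered": -1, "updates": []})
    el["updates"].append(req["address"])
    if req["flag"] in ("closed", "final"):
        # A closed transaction's request carries the number of updates
        # still undelivered after this one (method 1200).
        el["undelivered"] = req["count"]

def flushable(elements, flushed, tid):
    """Enforce atomicity (count == 0) and strict incoming order."""
    el = elements.get(tid)
    if el is None or el["undelivered"] != 0:
        return False
    return all(t in flushed for t in elements if t < tid)
```

So a later transaction that is itself complete still waits until every earlier transaction has been flushed, which is the strict ordering the disclosure requires.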
  • Figs. 13 and 14 are block diagrams illustrating processing collapsed transactions in PM cache 106 in examples of the present disclosure.
  • Fig. 13 shows a transaction 1301 that may be flushed and three closed transactions 1302, 1303, and 1304.
  • Transactions 1301, 1302, 1303, and 1304 are assigned to transaction elements with respective counters 1350-1, 1350-2, 1350-3, and 1350-4, which have respective counts of 0, 1, 2, and 1.
  • The transaction elements also have respective pointers 1352-1, 1352-2, 1352-3, and 1352-4 to the first cached updates of respective transactions 1301, 1302, 1303, and 1304.
  • When PM controller 105 receives the update request with the collapse indicator, PM controller 105 transfers the pointer 1352-2 of the oldest collapsed transaction 1302 to the newest collapsed transaction 1305 and links the lists of collapsed transactions 1302, 1303, 1304, and 1305 in descending order by age. PM controller 105 then zeroes out all counts of the prior collapsed transactions 1302, 1303, and 1304 before the newest collapsed transaction 1305.
  • Fig. 15 is a flowchart of a method 1500 for PM controller 105 (Fig. 1) to process updates of collapsed transactions being transferred in PM cache 106 (Fig. 1) in examples of the present disclosure.
  • Method 1500 includes blocks performed in addition to the blocks in method 1200.
  • Method 1500 may begin in block 1502.

Abstract

A method is provided for a device to assure atomicity of updates in persistent memory with free flow of the updates through multiple caches. The method includes, in a first cache, assigning a transaction ID to a transaction comprising a set of updates that are to be committed or dropped together as a unit, and incrementing an update count for each update of the transaction transferred into the first cache. The method further includes, for each update of the transaction being transferred from the first cache to a second cache based on persistent memory, decrementing the update count, marking the update as closed in an update request after the transaction is closed, and including the transaction ID and the update count in the update request to transfer the update from the first cache to the second cache.

Description

FREE FLOW ASSURANCE OF ATOMICITY OF MULTIPLE UPDATES IN
PERSISTENT MEMORY
BACKGROUND
[0001] Persistent memory (PM) may be main memory implemented using non-volatile memory (NVM) technologies, such as flash, resistive random-access memory (RRAM), phase-change RAM (PCRAM), and memristor. PM may be durable across power cycles. PM may also be directly addressable by processors at byte granularity.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] In the drawings:
Fig. 1 is a block diagram illustrating a device that propagates atomic updates in examples of the present disclosure;
Fig. 2 is a block diagram illustrating components in a first cache to propagate atomic updates in examples of the present disclosure;
Fig. 3 is a flowchart of a method for a CPU of Fig. 1 to execute a transact command to manage a monotonically incremented transaction ID and update counters in examples of the present disclosure;
Fig. 4 is a flowchart of a method for the CPU of Fig. 1 to execute the transact command in a thread of execution to process a transaction's updates that are transferred into the first cache in examples of the present disclosure;
Fig. 5 is a flowchart of a method for the CPU of Fig. 1 to execute the transact command to process a transaction's updates that are being transferred out from first cache in examples of the present disclosure;
Figs. 6 and 7 are block diagrams illustrating logically collapsing transactions in the first cache of Fig. 1 in examples of the present disclosure;
Fig. 8 is a flowchart of a method for the CPU of Fig. 1 to collapse transactions in examples of the present disclosure;
Fig. 9 is a flowchart of a method for the CPU of Fig. 1 to process updates of collapsed transactions being transferred out from the first cache in examples of the present disclosure;
Fig. 10 is a block diagram of components in a persistent memory (PM) cache of Fig. 1 to propagate atomic updates in examples of the present disclosure;
Fig. 11 is a flowchart of a method for a PM controller of Fig. 1 to process transactions being transferred into the PM cache of Fig. 1 in examples of the present disclosure;
Fig. 12 is a flowchart of a method for the PM cache controller of Fig. 1 to process updates being transferred into the PM cache of Fig. 1 in examples of the present disclosure;
Figs. 13 and 14 are block diagrams illustrating processing collapsed transactions in the PM cache of Fig. 1 in examples of the present disclosure;
Fig. 15 is a flowchart of a method for the PM controller of Fig. 1 to process updates of collapsed transactions being transferred in the PM cache of Fig. 1 in examples of the present disclosure;
Fig. 16 is a flowchart of a method for the PM controller of Fig. 1 to process transactions left in the PM cache of Fig. 1 after a power failure or a system crash in examples of the present disclosure; and
Fig. 17 is a block diagram of a device for implementing the CPU or the PM controller of Fig. 1 in examples of the present disclosure.
[0003] Use of the same reference numbers in different figures indicates similar or identical elements.
DETAILED DESCRIPTION
[0004] As used herein, the term "includes" means includes but not limited to; the term "including" means including but not limited to. The terms "a" and "an" are intended to denote at least one of a particular element. The term "based on" means based at least in part on. The term "or" is used in a nonexclusive sense such that "A or B" includes "A but not B," "B but not A," and "A and B" unless otherwise indicated.

[0005] A central processing unit (CPU) accesses its main memory through a CPU cache to improve performance. Persistent memory (PM) may be used instead of volatile random access memory (RAM) for main memory. Using PM expands the ability of applications to preserve their states so they may be suspended and quickly restarted. However, it may be difficult to maintain the consistency of the states left in the PM after power is lost while an application is in the middle of performing a set of related updates that are to be committed or dropped together. It may also be difficult to maintain the consistency of the states left in the PM after power is lost while a CPU flushes its cache into a PM controller or while the PM controller moves data into PM cells, because the updates may be reordered by the CPU cache or the PM controller.
[0006] In examples of the present disclosure, a method and apparatus are provided to assure atomicity of updates through caches to a PM. The instruction set of a processor is expanded to include commands that signal explicitly or implicitly the beginning and the end of a transaction, which is a set of updates to be committed or dropped together. At the start of a transaction, the processor assigns a transaction ID to the transaction, which is appended to the cache elements (e.g., cache lines) that store the transaction's updates, and begins to track the transaction's update count in a volatile first cache, which is the number of the transaction's updates that are currently in the first cache. The processor monotonically increments the transaction ID value for each transaction to capture their incoming sequence. The processor marks update requests that transfer the updates of the transaction from the first cache to a nonvolatile second cache with the state of the transaction and the update count. After the transaction ends, the update count is equal to the number of outstanding updates from the transaction that the second cache is waiting to receive from the first cache.
[0007] When a cache controller of the second cache receives an update request that first indicates the transaction has ended, the cache controller may save the update count in the update request as the undelivered count in the second cache, which the cache controller decrements for each update of the transaction received from the first cache. The cache controller may check for errors by comparing the update counts delivered in the update requests and the undelivered count kept in the second cache.
[0008] At the second cache, a cache controller does not allow updates from the transaction to be flushed to a PM until the transaction is closed and the undelivered count is equal to zero (0). This ensures the atomicity of the updates so they are committed or dropped together. Even then, the cache controller does not allow the updates from the transaction to be flushed until prior transactions, as determined from their transaction IDs, have been flushed. This ensures strict ordering of the transactions based on their incoming sequence.
[0009] When a new transaction updates a cache element in the first cache that has been previously updated by an old transaction, it is desirable to again update the cache element without causing the first cache to hasten or delay the flushing of the cache element. Some areas of PM may be updated very frequently, so avoiding flushing individual updates may improve performance. To avoid untimely flushing, the processor collapses the prior transactions, from the old transaction up to the new transaction, into the new transaction by transferring the update counts of the prior transactions to the new transaction and creating redirects (e.g., a negative index shift stored in an update counter that leads to another update counter) from the prior transactions to the new transaction. Any update from the collapsed transactions that is transferred from the first cache to the second cache includes the transaction IDs of the old and the new transactions in its update request. At the second cache, the cache controller processes the collapse by pointing the new transaction to a linked list of updates of the old transaction, and then linking the other linked lists of the collapsed transactions in the incoming sequence of the transactions.
[0010] Fig. 1 is a block diagram illustrating a device 100 that propagates atomic updates in examples of the present disclosure. Device 100 includes a central processing unit (CPU) 102, a volatile first cache 104, a PM controller 105, a PM cache 106, and a PM 108.
[0011] CPU 102 runs a thread of execution that includes transaction markers, which are commands that separate one transaction from another. A transaction is a set of related updates that are to be committed together to PM 108 or dropped as a unit. The transaction markers include a "transact" command and an "etransact" command. The transact command "opens" a new transaction by indicating the start of a new transaction and may implicitly "close" a previous transaction by indirectly indicating the end of the previous transaction. The etransact command closes a transaction by indicating the end of a transaction. By issuing the transact command, a thread of execution requests a contract with the caching environment that updates made after it and until the etransact command (or the next transact command) are to be committed or dropped together. Note that a single thread of execution may have only one active transaction, but multiple threads running on multiple CPUs may have many active transactions. In a single thread, the start of a new transaction means the termination of the previous transaction.
[0012] Fig. 2 is a block diagram illustrating components in first cache 104 to propagate atomic updates in examples of the present disclosure. These components include a monotonically incremented transaction ID (M-TID) 202, an array of update counters 204-1 to 204-n (hereafter collectively as "update counters 204" and an individual, generic "update counter 204"), and indices (illustrated as arrowed lines) from updates stored in cache elements (e.g., cache lines) to the array of update counters 204.
[0013] When a thread of execution declares the start of a transaction, CPU 102 (Fig. 1) increments and assigns the value of M-TID 202 to the transaction as its transaction ID. CPU 102 may append the assigned transaction ID in the cache elements where the updates are stored. CPU 102 also assigns an update counter 204 to the transaction and increments update counter 204 by one for each update in the transaction. CPU 102 provides indices from the cached updates to the assigned update counter 204 in the array. For example, CPU 102 appends a transaction ID of 1 to cached updates 206 and 208 of a first transaction, provides indices 210 and 212 from cached updates 206 and 208 to update counter 204-1 in the array, and increments update counter 204-1 twice. Similarly, CPU 102 appends a transaction ID of 2 to cached updates 214 and 216 of a second transaction, provides indices 218 and 220 from cached updates 214 and 216 to update counter 204-2 in the array, and increments update counter 204-2 twice. Furthermore, CPU 102 appends a transaction ID of n to a cached update 222 of an nth transaction, provides index 224 from cached update 222 to update counter 204-n in the array, and increments update counter 204-n once.
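The bookkeeping described above can be sketched as follows. This is an illustrative software model, not the patented hardware: the class, the TID-to-slot mapping, and the count-once-per-element rule (stated later for collapsed transactions) are assumptions for the sketch.

```python
# Illustrative sketch of the first-cache bookkeeping in Fig. 2: M-TID
# assignment, per-transaction update counters, and indices from cached
# elements to their counter. The FirstCache class and the TID-to-slot
# mapping are hypothetical.

class FirstCache:
    def __init__(self, num_counters=8):
        self.m_tid = 0                      # monotonically incremented M-TID 202
        self.counters = [0] * num_counters  # array of update counters 204
        self.elements = {}                  # cache address -> (tid, counter index)

    def open_transaction(self):
        """The transact command: assign the next M-TID as the transaction ID."""
        self.m_tid += 1
        return self.m_tid

    def counter_index(self, tid):
        return tid % len(self.counters)     # assumed mapping of a TID to a slot

    def update(self, tid, address):
        """Cache an update: tag the element, index it to the transaction's
        counter, and count each cache element once per transaction."""
        idx = self.counter_index(tid)
        if self.elements.get(address) != (tid, idx):
            self.counters[idx] += 1
        self.elements[address] = (tid, idx)

cache = FirstCache()
t1 = cache.open_transaction()
cache.update(t1, 0x10)
cache.update(t1, 0x20)
cache.update(t1, 0x20)                      # same element again: not recounted
t2 = cache.open_transaction()
cache.update(t2, 0x30)
print(cache.counters[:3])                   # [0, 2, 1]
```

The two increments of the counter for the first transaction and the single increment for the second mirror the counter values described for update counters 204-1 and 204-2.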
[0014] When a cached update is to be transferred to the next cache, CPU 102 decrements the corresponding update counter 204 and sends the update to the next cache with its transaction ID, cache address, and the update count in its update counter 204 in an update request.
Depending on whether the transaction is open or closed, CPU 102 may also mark the update as open or closed. Note that once a transaction is closed, the update count kept is the number of updates not yet delivered to the next cache. If the update is the last update, CPU 102 may also mark the update as final.
[0015] Fig. 3 is a flowchart of a method 300 for CPU 102 (Fig. 1) to execute the transact command to manage M-TID 202 and update counters 204 in examples of the present disclosure. Method 300 may begin in block 302. [0016] In block 302, CPU 102 gets and increments the value of M-TID 202 (Fig. 2). This value is to be assigned as the transaction ID of a new transaction declared by the transact command. Block 302 may be followed by block 304.
[0017] In block 304, CPU 102 determines if the count at the update counter 204 (Fig. 2) for the new transaction is equal to zero. When update counter 204 is not equal to 0, then M-TID 202 has wrapped around to an old transaction with updates in first cache 104 (Fig. 1). In such a case, block 304 may be followed by block 306. If update counter 204 is equal to zero, method 300 may end.
[0018] In block 306, CPU 102 flushes first cache 104 until update counter 204 is equal to zero. CPU 102 may partially or fully flush first cache 104. Method 300 may end at block 306.
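Blocks 302 to 306 can be sketched as a wraparound guard: before an M-TID value is reused, the counter slot it maps to must have drained to zero. The modulus and the flush callback below are assumptions for illustration.

```python
# Sketch of method 300 (blocks 302-306): increment M-TID; if the counter
# slot for the new value still holds an old transaction's count
# (wraparound), flush the first cache until the slot drains. The 16-bit
# modulus is a hypothetical M-TID width.

def next_tid(m_tid, counters, flush_until_zero, modulus=2**16):
    m_tid = (m_tid + 1) % modulus          # block 302: get and increment M-TID
    slot = m_tid % len(counters)
    if counters[slot] != 0:                # block 304: old transaction lingers
        flush_until_zero(slot)             # block 306: partial or full flush
    return m_tid

counters = [0, 4]                          # slot 1 still holds an old count
drained = []

def flush_until_zero(slot):
    drained.append(slot)                   # record which slot forced a flush
    counters[slot] = 0                     # flushing drains the counter

tid = next_tid(0, counters, flush_until_zero)
print(tid, drained)  # 1 [1]
```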
[0019] Fig. 4 is a flowchart of a method 400 for CPU 102 (Fig. 1) to execute the transact command in a thread of execution to process a transaction's updates that are transferred into first cache 104 (Fig. 1) in examples of the present disclosure. Method 400 may begin in block 402.
[0020] In block 402, CPU 102 processes each of the transaction's updates that are transferred into first cache 104. Block 402 may be followed by block 404.
[0021] In block 404, CPU 102 appends a transaction ID to the cache element storing the update, provides an index from the cached update to update counter 204, and increments update counter 204 by one. Block 404 may loop back to block 402 to process another update of the transaction until CPU 102 reaches the etransact command or another transact command in the thread of execution.
[0022] Fig. 5 is a flowchart of a method 500 for CPU 102 (Fig. 1) to execute the transact command to process a transaction's updates that are being transferred out from first cache 104 (Fig. 1) in examples of the present disclosure. Method 500 may begin in block 502.
[0023] In block 502, CPU 102 processes each of the transaction's cached updates that are being transferred out from first cache 104. Block 502 may be followed by block 504.
[0024] In block 504, CPU 102 decrements update counter 204 by one. Block 504 may be followed by block 506. [0025] In block 506, CPU 102 determines if the transaction is closed. CPU 102 knows the transaction is closed if it has processed the etransact command or another transact command in the thread of execution. If the transaction is not closed, block 506 may be followed by block 508. Otherwise block 506 may be followed by block 510.
[0026] In block 508, CPU 102 marks the update as "open." Block 508 may be optional if an unmarked update indicates an open transaction. Block 508 may be followed by block 516.
[0027] In block 510, CPU 102 may mark the update as "closed." Block 510 may be followed by block 512.
[0028] In block 512, CPU 102 determines if the update is the last update in the transaction. CPU 102 knows the update is the last update when the update count in update counter 204 is equal to zero (0). If so, block 512 may be followed by block 514. Otherwise block 512 may be followed by block 516.
[0029] In block 514, CPU 102 marks the update as "final." Block 514 may be followed by block 516.
[0030] In block 516, CPU 102 sends the update with its transaction ID, cache address, cache content, update count, and possible flags (open, final, etc.) to the next cache, such as PM cache 106 (Fig. 1).
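Method 500's transfer-out path can be sketched as follows. The request fields mirror the text (transaction ID, cache address, content, update count, flags), but the dictionary layout and function name are assumptions.

```python
# A minimal sketch of method 500 (blocks 502-516): decrement the
# transaction's update counter and assemble the update request sent to
# the next cache. Counters keyed directly by TID are a simplification.

def transfer_out(tid, address, content, counters, closed_tids):
    """Build the update request for one cached update leaving the cache."""
    counters[tid] -= 1                     # block 504
    request = {"tid": tid, "address": address, "content": content,
               "count": counters[tid]}
    if tid in closed_tids:                 # block 506: etransact already seen
        request["state"] = "closed"        # block 510
        if counters[tid] == 0:             # block 512: no undelivered updates
            request["final"] = True        # block 514
    else:
        request["state"] = "open"          # block 508
    return request

counters = {7: 2}                          # closed transaction, 2 cached updates
closed_tids = {7}
r1 = transfer_out(7, 0x10, b"a", counters, closed_tids)
r2 = transfer_out(7, 0x20, b"b", counters, closed_tids)
print(r1["count"], "final" in r1)          # 1 False
print(r2["count"], "final" in r2)          # 0 True
```

Note how the count carried in the last request reaches zero exactly when the update is marked final, which is what lets the second cache cross-check delivery.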
[0031] Figs. 6 and 7 are block diagrams illustrating logically collapsing transactions in first cache 104 in examples of the present disclosure. Referring to Fig. 6, transactions having transaction IDs of 601, 602, 603, 604, and 605 are assigned to respective update counters 204-1, 204-2, 204-3, 204-4, and 204-5, which have respective counts of 3, 5, 0, 1, and 0. Assume the current transaction 605 is to update a cached update 650 already pointing to update counter 204-1 of old transaction 601.
[0032] Referring to Fig. 7, when CPU 102 detects that the current transaction 605 is to modify cached update 650 already pointing to update counter 204-1 of old transaction 601, CPU 102 collapses all transactions from old transaction 601 pointed to by cached update 650 up to current transaction 605. To record the collapse of transactions 601 to 605, CPU 102 transfers the counts from update counters 204-1, 204-2, 204-3, and 204-4 of respective transactions 601 to 604 to update counter 204-5 of current transaction 605. Thus update counter 204-5 now has an update count of 9 (3 + 5 + 0 + 1 + 0). Furthermore, CPU 102 leaves negative redirects in update counters 204-1, 204-2, 204-3, and 204-4 to update counter 204-5 in the form of negative index shifts. Thus update counter 204-1 has an update count of -4, update counter 204-2 has an update count of -3, update counter 204-3 has an update count of 0 as it has no updates, and update counter 204-4 has an update count of -1. When one of update counters 204-1 to 204-4 is to be decremented, CPU 102 follows its negative redirect to update counter 204-5 and decrements update counter 204-5.
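The collapse and the negative index shifts can be sketched with a plain counter array. This is an illustrative model, not the patented implementation; array indices stand in for update counters 204-1 to 204-5.

```python
# Sketch of the collapse in Figs. 6-7: counts of the older transactions
# are folded into the newest transaction's counter, and negative index
# shifts are left behind as redirects.

def collapse(counters, old_idx, new_idx):
    """Fold counts in [old_idx, new_idx) into counters[new_idx], leaving
    a negative index shift in each counter that had updates."""
    for i in range(old_idx, new_idx):
        if counters[i] > 0:
            counters[new_idx] += counters[i]
            counters[i] = -(new_idx - i)   # redirect: shift to the new counter
        else:
            counters[i] = 0                # no updates, nothing to redirect

def decrement(counters, idx):
    """Decrement a counter, following its negative redirect if present."""
    if counters[idx] < 0:
        idx = idx - counters[idx]          # follow the negative index shift
    counters[idx] -= 1

counters = [3, 5, 0, 1, 0]                 # transactions 601..605, as in Fig. 6
collapse(counters, 0, 4)
print(counters)                            # [-4, -3, 0, -1, 9]
decrement(counters, 0)                     # an update of 601 leaves the cache
print(counters)                            # [-4, -3, 0, -1, 8]
```

Decrementing the first counter lands on the collapsing transaction's counter, so all remaining deliveries are accounted for in one place.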
[0033] When the current transaction 605 is closed and its update counter 204-5 becomes zero (0), CPU 102 zeroes out update counters 204-1 to 204-4 of transactions 601 to 604 to remove the negative redirects and indicate all the updates in the collapsed transactions have been transferred out from first cache 104. To efficiently do so, CPU 102 may reserve another update counter 204-6 in the array with an index to the first update counter 204-1 in the range of collapsed transactions.
[0034] Once transactions 601 to 605 are collapsed, transactions 601 to 604 are not reported individually but rather become part of the current transaction 605. Thus CPU 102 notifies the next cache of all the collapsed transactions so the next cache can maintain monotonicity of the transaction IDs. To do so, CPU 102 includes the transaction IDs of the oldest transaction 601 and the newest transaction 605 to describe the range of the collapsed transactions in update requests to the next cache. Additional details are omitted for the sake of clarity.
[0035] Fig. 8 is a flowchart of a method 800 for CPU 102 (Fig. 1) to collapse transactions in examples of the present disclosure. Method 800 may begin in block 802.
[0036] In block 802, CPU 102 determines if the current transaction is to modify a cached update pointing to an update counter of an old transaction. If so, block 802 may be followed by block 804. Otherwise block 802 loops back to itself.
[0037] In block 804, CPU 102 transfers the counts of all transactions, from the old transaction up to the current transaction, into the update counter of the current transaction. CPU 102 also modifies the update counters of the prior transactions with respective negative redirects to the update counter of the current transaction. CPU 102 may also set a flag to indicate a transaction is a collapsed transaction.
[0038] Fig. 9 is a flowchart of a method 900 for CPU 102 (Fig. 1) to process updates of collapsed transactions being transferred out from first cache 104 (Fig. 1) in examples of the present disclosure. Method 900 includes blocks performed in addition to the blocks in method 500 to process updates being transferred out from first cache 104. Method 900 may begin in block 902.
[0039] In block 902, CPU 102 determines if a cached update being transferred out is from a collapsed transaction. CPU 102 may determine that a cached update is from a collapsed transaction when the cached update points to an update counter with a negative redirect or to one otherwise flagged as a collapsed transaction. If so, block 902 may be followed by block 904. Otherwise block 902 loops back to itself.
[0040] In block 904, CPU 102 determines if the newest of the collapsed transactions is closed and if the update is the last update from newest collapsed transaction. If so, block 904 may be followed by block 906. Otherwise block 904 may be followed by block 908. The update is the last update from the newest collapsed transaction when the update counter of the newest collapsed transaction has an update count of zero (0).
[0041] In block 906, CPU 102 zeroes out the update counters of all the collapsed transactions. Block 906 may be followed by block 908.
[0042] In block 908, CPU 102 sends an update request with the transaction IDs of the oldest and the newest of the collapsed transactions. Block 908 may loop back to block 902 to process another cached update from the collapsed transactions.
[0043] Fig. 10 is a block diagram of components in PM cache 106 (Fig. 1) to propagate atomic updates in examples of the present disclosure. PM cache 106 includes an array 1050 of transaction elements. Each transaction element includes an undelivered counter of undelivered updates for a transaction and a pointer to the first cached update of the transaction. The undelivered counter may have three (3) types of values reflecting specific states of a transaction. When the undelivered count is equal to zero (0), the transaction is either free, when it does not point to a cached update, or it may be flushed to PM 108 when all previous transactions have been flushed. When the undelivered count is equal to negative one (-1), the transaction is open. When the undelivered count is greater than zero (0), the transaction has been closed by a thread of execution and the count reflects the number of undelivered updates to PM cache 106 for the transaction. For example, transaction elements for transactions 1001, 1002, 1003, and 1004 have respective counters 1052-1, 1052-2, 1052-3, and 1052-4 with respective undelivered counts of 0, 0, 3, and -1.
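The three counter states can be sketched as a small classifier. The encoding (-1 open, greater than 0 closed and waiting, 0 free or flushable) follows the text, while the function and the pointer representation are illustrative.

```python
# Sketch of the undelivered-counter states of Fig. 10. A transaction
# element is classified from its undelivered count and its pointer to
# the first cached update (None when it points nowhere).

OPEN = -1

def element_state(undelivered, first_update_ptr):
    """Return the state a PM cache controller could infer for an element."""
    if undelivered == OPEN:
        return "open"
    if undelivered > 0:
        return "closed, awaiting %d updates" % undelivered
    return "flushable" if first_update_ptr else "free"

print(element_state(0, None))   # free       (like transaction 1001)
print(element_state(0, 0x56))   # flushable  (like transaction 1002)
print(element_state(3, 0x60))   # closed, awaiting 3 updates (like 1003)
print(element_state(-1, 0x62))  # open       (like transaction 1004)
```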
[0044] The pointer of each transaction element points to the first cached update of a transaction. The updates of the transaction are organized as a linked list with each update pointing to the subsequent update. For example, the pointer 1054-2 of a transaction element for transaction 1002 (also referred to as "transaction element 1002") points to the first cached update 1056 of transaction 1002, which points to the second cached update 1058 of transaction 1002. The pointer 1054-3 of transaction element 1003 points to the first cached update 1060 of transaction 1003. The pointer 1054-4 of transaction element 1004 points to the first cached update 1062 of transaction 1004, which points to the second cached update 1064 of transaction 1004.
[0045] Fig. 11 is a flowchart of a method 1100 for PM controller 105 (Fig. 1) to process transactions being transferred into PM cache 106 (Fig. 1) in examples of the present disclosure. Method 1100 may begin in block 1102.
[0046] In block 1102, PM cache controller 105 determines if it detects a new transaction being transferred into PM cache 106. For example, PM cache controller 105 receives an update request to PM cache 106 and the transaction ID of the update is not associated with any transaction element in PM cache 106. Such an update would be the first update of the transaction. If PM cache controller 105 detects a new transaction being transferred into PM cache 106, block 1102 may be followed by block 1104. Otherwise block 1102 may loop back to itself.
[0047] In block 1104, PM cache controller 105 associates the transaction to a transactional element in array 1050, indicates the transaction is open or closed by setting the undelivered count of counter 1052, and sets a pointer 1054 to the first cached update of the transaction. If the update is marked open in the update request, PM cache controller 105 sets counter 1052 equal to negative one (-1). If the update is marked closed in the update request, PM cache controller 105 sets counter 1052 equal to the count in the update request. Block 1104 may loop back to block 1102 to process another new transaction.
[0048] Fig. 12 is a flowchart of a method 1200 for PM cache controller 105 (Fig. 1) to process updates being transferred into PM cache 106 (Fig. 1) in examples of the present disclosure. Method 1200 may begin in block 1202. [0049] In block 1202, PM cache controller 105 determines if it detects an update of a transaction being transferred into PM cache 106. If so, block 1202 may be followed by block 1204. Otherwise block 1202 may loop back to itself.
[0050] In block 1204, PM cache controller 105 adds a pointer from the prior cached update of the same transaction to the cached update to form part of a linked list of cached updates of the transaction. Block 1204 may be followed by block 1206.
[0051] In block 1206, PM cache controller 105 determines if the update is marked closed in the update request to indicate the transaction is closed. If so, block 1206 may be followed by block 1208. Otherwise block 1206 may loop back to block 1202 to process another update.
[0052] In block 1208, PM cache controller 105 determines if the transaction is marked open in the transactional element. The transaction may be marked open in the transactional element if the undelivered count in the counter of the transactional element is set to negative one (-1). When the transaction is marked open in the transactional element, block 1208 may be followed by block 1210. Otherwise block 1208 may be followed by block 1212.
[0053] In block 1210, PM cache controller 105 marks the transaction as closed in the transactional element. PM cache controller 105 may mark the transaction as closed in the transactional element by setting the undelivered count in the counter of the transactional element equal to the count in the update request. Block 1210 may be followed by block 1218.
[0054] In block 1212, PM cache controller 105 decrements the undelivered count in the counter in the transactional element by one to reflect that one of the updates for the transaction has been received. Block 1212 may be followed by block 1214.
[0055] In block 1214, PM cache controller 105 determines if the undelivered count is equal to the update count in the update request. If they are not the same, then an error has occurred and block 1214 may be followed by block 1216. Otherwise block 1214 may be followed by block 1218.
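The accounting cross-check of blocks 1212 to 1214 can be sketched as follows; the function name and return shape are assumptions for illustration.

```python
# Sketch of the consistency check in method 1200: the update count
# carried in each request should equal the locally decremented
# undelivered count; a mismatch signals a lost or duplicated update.

def receive_update(undelivered, request_count):
    """Decrement the undelivered count (block 1212) and compare it with
    the count delivered in the update request (block 1214)."""
    undelivered -= 1
    return undelivered, undelivered == request_count

count, ok = receive_update(3, 2)
print(count, ok)   # 2 True: the two caches' accounting agrees
count, ok = receive_update(count, 0)
print(count, ok)   # 1 False: mismatch, so block 1216 processes the error
```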
[0056] In block 1216, PM cache controller 105 processes the error. PM cache controller 105 may run a diagnostic test to determine and correct the error. Block 1216 may loop back to block 1202 to process another update. [0057] In block 1218, PM cache controller 105 determines if the update is the last update of the transaction. The update is the last update of the transaction when the update is marked final or the undelivered count is equal to zero (0). The final label of an update and the zero undelivered count may both be used to safeguard the independent accounting of the two caches. If the update is the last update of the transaction, block 1218 may be followed by block 1220. Otherwise block 1218 may loop back to block 1202 to process another update.
[0058] In block 1220, PM cache controller 105 determines if all previous transactions have been flushed. A previous transaction is any transaction with a smaller transaction ID. A previous transaction has been flushed when both their undelivered counts and pointers are equal to zero (0). If all previous transactions have not been flushed, then block 1220 may be followed by block 1222. Otherwise block 1220 may be followed by block 1224.
[0059] In block 1222, PM cache controller 105 does not allow the flushing of the transaction in order to enforce strict ordering of the transactions based on their incoming sequence. Block 1222 may loop back to block 1202 to process another update. This may trigger release and flush of this and consequent transactions.
[0060] In block 1224, PM cache controller 105 allows flushing of the transaction up to any subsequent transactions that also have their undelivered count equal to zero (0). Note that after flushing the transaction, the pointer to the first cached update of the transaction is replaced with a zero (0) where zeroes in the undelivered count and the pointer indicate the transactional element is free. Block 1224 may loop back to block 1202 to process another update. This may trigger release and flush of this and consequent transactions.
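The flush decision of blocks 1218 to 1224 can be sketched as follows, with hypothetical structures: each element is a pair [undelivered_count, pointer], and [0, 0] marks a free element.

```python
# Sketch of the flush-ordering rule: a transaction may be flushed only
# when its undelivered count is 0 and all prior transactions (smaller
# TIDs) have been flushed; the flush also releases directly subsequent
# closed transactions whose counts are already 0.

def flushable_range(elements, tid):
    """Return the TIDs that may be flushed when tid's last update arrives."""
    if elements[tid][0] != 0 or elements[tid][1] == 0:
        return []                          # still awaiting updates, or free
    for prev in range(tid):
        if elements[prev] != [0, 0]:
            return []                      # block 1222: hold for strict order
    flushed = [tid]
    nxt = tid + 1
    while nxt < len(elements) and elements[nxt][0] == 0 and elements[nxt][1] != 0:
        flushed.append(nxt)                # block 1224: release followers too
        nxt += 1
    return flushed

elements = [[0, 0], [0, 0x56], [0, 0x60], [2, 0x62]]
print(flushable_range(elements, 1))        # [1, 2]: 2 is closed with count 0
print(flushable_range(elements, 3))        # []: still awaiting 2 updates
```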
[0061] Figs. 13 and 14 are block diagrams illustrating processing collapsed transactions in PM cache 106 in examples of the present disclosure. Fig. 13 shows a transaction 1301 that may be flushed and three closed transactions 1302, 1303, and 1304. Transactions 1301, 1302, 1303, and 1304 are assigned to transaction elements with respective counters 1350-1, 1350-2, 1350-3, and 1350-4, which have respective counts of 0, 1, 2, and 1. The transaction elements also have respective pointers 1352-1, 1352-2, 1352-3, and 1352-4 to the first cached updates of respective transactions 1301, 1302, 1303, and 1304.
[0062] Referring to Fig. 14, assume at some point a transaction collapse happens in the prior cache (e.g., first cache 104 in Fig. 1) and another two (2) elements are updated. After that, PM controller 105 (Fig. 1) receives an update request indicating the transaction collapse by specifying two (2) transaction IDs 1302 and 1305, as well as the fact that the transaction is closed and the remaining total wait count is five (5). This total count of five (5) includes all counts of the collapsed transactions (1 + 2 + 1), plus two (2) new updates, minus one (1) to reflect that the same element has been updated again. Note that the update counter for a cache element is only incremented once when the cache element is updated multiple times in the same transaction. When PM controller 105 receives the update request with the collapse indicator, PM controller 105 transfers the pointer 1352-2 of the oldest collapsed transaction 1302 to the newest collapsed transaction 1305 and links the lists of collapsed transactions 1302, 1303, 1304, and 1305 in descending order by age. PM controller 105 then zeroes out all counts of the prior collapsed transactions 1302, 1303, and 1304 before the newest collapsed transaction 1305.
[0063] Fig. 15 is a flowchart of a method 1500 for PM controller 105 (Fig. 1) to process updates of collapsed transactions being transferred in PM cache 106 (Fig. 1) in examples of the present disclosure. Method 1500 includes blocks performed in addition to the blocks in method 1200. Method 1500 may begin in block 1502.
[0064] In block 1502, PM controller 105 determines if it has received an update request with two (2) transaction IDs. If so, block 1502 may be followed by block 1504. Otherwise block 1502 loops back to itself.
[0065] In block 1504, PM controller 105 transfers the pointer to first update in the oldest collapsed transaction to the newest collapsed transaction. Block 1504 may be followed by block 1506.
[0066] In block 1506, PM controller 105 removes the pointers from the prior collapsed transactions before the newest collapsed transaction and links the lists of the collapsed transactions based on the incoming sequence of their transactions. Block 1506 may be followed by block 1508.
[0067] In block 1508, PM controller 105 zeroes out undelivered counts in the prior collapsed transactions before the newest collapsed transaction. Block 1508 may loop back to block 1502 to process another update request for collapsed transactions.
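Blocks 1504 through 1508 can be sketched together as one operation on the transaction elements. This is a minimal model under assumed data layout (a dict mapping transaction ID to a pointer and an undelivered count); the actual element structure and interface are as described in the disclosure, not as coded here.

```python
# Sketch of blocks 1504-1508 (illustrative layout): on a collapse request
# naming the oldest and newest collapsed transaction IDs, move the oldest
# transaction's first-update pointer to the newest element, then zero the
# pointers and undelivered counts of all prior collapsed elements.

def apply_collapse(elements, oldest_id, newest_id, new_count):
    # Transfer the first-update pointer before zeroing the old element.
    elements[newest_id]['ptr'] = elements[oldest_id]['ptr']
    elements[newest_id]['undelivered'] = new_count
    for tid in elements:
        if oldest_id <= tid < newest_id:
            elements[tid]['ptr'] = 0          # zeroed pointer: element free
            elements[tid]['undelivered'] = 0  # zeroed count: element free
    return elements
```

Applied to the Fig. 14 example, transactions 1302 through 1304 end up with zero counts and pointers, and transaction 1305 holds the transferred pointer and the total wait count of five.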
[0068] Fig. 16 is a flowchart of a method 1600 for PM controller 105 (Fig. 1) to process transactions left in PM cache 106 (Fig. 1) after a power failure or a system crash in examples of the present disclosure. Method 1600 may begin in block 1602.
[0069] In block 1602, PM controller 105 processes the transactions in PM cache 106 in the order of increasing transaction IDs (M-TIDs). Block 1602 may be followed by block 1604.
[0070] In block 1604, PM controller 105 determines if the undelivered count of the transaction is equal to zero (0), which indicates the transaction is closed and all updates of the transaction are in PM cache 106. If so, block 1604 may be followed by block 1606.
Otherwise block 1604 may be followed by block 1608.
[0071] In block 1606, PM controller 105 flushes the updates of the transaction to PM 108. Block 1606 may loop back to block 1602 to process another transaction.
[0072] In block 1608, PM controller 105 discards the transaction and all transactions that have higher transaction IDs in order to enforce strict ordering of the transactions based on their incoming sequence.
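Method 1600 as a whole can be sketched as a single recovery pass. The function below is an illustrative model, not the controller implementation: transactions are walked in increasing ID order, each fully delivered transaction (undelivered count of zero) is flushed, and the first incomplete transaction is discarded together with every higher-ID transaction.

```python
# Sketch of the post-crash recovery pass of method 1600 (illustrative).
def recover(tx_list):
    """tx_list: (tx_id, undelivered_count) pairs found in the PM cache.
    Returns (flushed_ids, discarded_ids) in increasing ID order."""
    flushed, discarded = [], []
    for tx_id, undelivered in sorted(tx_list):
        if not discarded and undelivered == 0:
            flushed.append(tx_id)    # complete: safe to flush to PM
        else:
            discarded.append(tx_id)  # incomplete, or follows an incomplete one
    return flushed, discarded
```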
[0073] Fig. 17 is a block diagram of a device 1700 for implementing CPU 102 or PM controller 105 (Fig. 1) in examples of the present disclosure. Code 1702 for propagating atomic updates is stored in a non-transitory computer-readable medium 1704, such as a read-only memory. A microprocessor 1706 executes code 1702 to provide the described features and functionalities.
[0074] Referring to Fig. 1, system 100 may include intermediate volatile or persistent caches between first cache 104 and PM cache 106 in some examples of the present disclosure. These intermediate caches may behave similarly to first cache 104 as described above.
[0075] System 100 may include multiple CPUs 102 in some examples of the present disclosure. To ensure consistency, a CPU 102 running a thread of execution that declares a transaction may process all updates in the transaction without evicting or relocating the thread to another CPU 102. For example, the thread may be put in uninterruptible state. If a CPU 102 is to evict or relocate a thread of execution that declares a transaction, CPU 102 may include the necessary context to ensure consistency.
[0076] There may be an error when a thread tries to modify a cache element that is part of updates in an open transaction of another thread. The error may be handled in various ways by system 100. In the strict mode, system 100 may throw an exception to the contending thread that is trying to access or modify a cache element that was already modified in a transaction opened by another thread. In the isolation mode, system 100 allows the contending thread to wait and retry. This mode may include a deadlock detection mechanism that would throw an exception to the contending thread when deadlock is detected. In the transparent mode, system 100 ignores the transaction boundaries and relies on application serialization for isolation.
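The three contention-handling modes described above can be sketched as a single decision function. The names, the exception type, and the retry-budget stand-in for deadlock detection are all illustrative assumptions, not part of the disclosure.

```python
# Illustrative sketch of the strict / isolation / transparent modes for a
# contending thread that hits a cache element held by another thread's open
# transaction. A retry budget stands in for a real deadlock detector.

class TxConflictError(Exception):
    pass

def handle_conflict(mode, retries_left):
    if mode == "strict":
        # Throw immediately to the contending thread.
        raise TxConflictError("element modified by another open transaction")
    if mode == "isolation":
        if retries_left <= 0:                  # deadlock-detection stand-in
            raise TxConflictError("deadlock detected")
        return "retry"                         # wait and retry
    return "proceed"                           # transparent: ignore boundaries
```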
[0077] Various other adaptations and combinations of features of the examples disclosed are within the scope of the present disclosure.

Claims

What is claimed is:
Claim 1: A method for a device to assure atomicity of updates, comprising, in a first cache: assigning a transaction ID to a transaction comprising a set of updates that are to be committed or dropped together as a unit; for each update of the transaction transferred into the first cache, incrementing an update count of the transaction; for each update of the transaction being transferred from the first cache to a second cache based on persistent memory: decrementing the update count; after the transaction is closed, marking the update as closed in an update request; and including the transaction ID and the update count in the update request to transfer the update from the first cache to the second cache.
Claim 2: The method of claim 1, wherein a start and an end of the transaction are declared in a thread of execution.
Claim 3: The method of claim 1, further comprising monotonically incrementing the transaction ID for each transaction to capture an incoming sequence of transactions.
Claim 4: The method of claim 1, further comprising, in the second cache: assigning the transaction to a transaction element to store a pointer to a first update in the transaction; for each update of the transaction transferred into the second cache, adding a pointer from a prior update to the update to form a linked list of updates from the transaction; when all updates of the transaction have been transferred to the second cache and previous transactions have been flushed, allowing flushing of the transaction to a main memory based on persistent memory; and when all updates of the transaction have been transferred to the second cache but not all previous transactions have been flushed, preventing flushing of the transaction.
Claim 5: The method of claim 4, further comprising, in the second cache: when all updates of the transaction have been transferred to the second cache and previous transactions have been flushed, allowing flushing of subsequent transactions that have all updates of the subsequent transactions transferred to the second cache.
Claim 6: The method of claim 1, further comprising, in the second cache: assigning the transaction to a transaction element to store an undelivered count of the transaction; setting the undelivered count to a negative value to indicate the transaction is open; when an update is marked closed and the transaction is indicated as open in the second cache, setting the undelivered count equal to the update count in the update to indicate the transaction is closed; and after setting the undelivered count equal to the update count in the update, decrementing the undelivered count for each update of the transaction transferred into the second cache.
Claim 7: The method of claim 6, further comprising, in the second cache: after a power failure or a system crash: when all updates of the transaction have been transferred to the second cache so the undelivered count is equal to zero, flushing the transaction to a main memory based on persistent memory; and when all updates of the transaction have not been transferred to the second cache so the undelivered count is not equal to zero, discarding the transaction and transactions with higher transaction IDs from the second cache.
Claim 8: The method of claim 4, further comprising, in the first cache: detecting the transaction modifies a cached update previously modified by an old transaction; collapsing transactions from the old transaction to the transaction by: transferring update counts of prior transactions from the old transaction to the transaction into the update count of the transaction; and modifying the update counts of the prior transactions with redirects to the update count of the transaction; detecting an update from a collapsed transaction being transferred from the first cache to the second cache; and including a transaction ID of the old transaction along with the transaction ID of the transaction in the update request to transfer the update from the first cache to the second cache.
Claim 9: The method of claim 8, further comprising, in the second cache: setting the undelivered count of the transaction to the update count in the update request; detecting the update from the collapsed transaction; transferring a pointer from a first update in the old transaction to the transaction; linking each link list of the collapsed transactions to a subsequent link list based on an incoming sequence of the collapsed transactions; and zeroing out undelivered counts in the prior transactions.
Claim 10: A device, comprising: a first cache; a second cache based on persistent memory; a cache controller for the second cache; a main memory based on persistent memory; and a processor to: assign a transaction ID to a transaction comprising a set of updates that are to be committed or dropped together as a unit, the transaction ID being monotonically incremented for each transaction to capture an incoming sequence of transactions; for each update of the transaction transferred into the first cache, increment an update count of the transaction; for each update of the transaction being transferred from the first cache to the second cache: decrement the update count; after the transaction is closed, mark the update as closed in an update request; and include the transaction ID and the update count in the update request to transfer the update from the first cache to the second cache.
Claim 11: The device of claim 10, wherein the cache controller is to: assign the transaction to a transaction element to store a pointer to a first update in the transaction in the second cache; for each update of the transaction transferred into the second cache, add a pointer from a prior update to the update to form a linked list of updates from the transaction; when all updates of the transaction have been transferred to the second cache and previous transactions have been flushed, allow flushing to the main memory of the transaction and of subsequent transactions that also have all of their updates transferred to the second cache; and when all the updates of the transaction have been transferred to the second cache but all the previous transactions have not been flushed, prevent flushing of the transaction.
Claim 12: The device of claim 10, wherein the cache controller is to: assign the transaction to a transaction element to store an undelivered count of the transaction; set the undelivered count to a negative value to indicate the transaction is open; when an update is marked closed and the transaction is indicated as open in the second cache, set the undelivered count equal to the update count in the update to indicate the transaction is closed; and after the undelivered count is set equal to the update count in the update, decrement the undelivered count for each update of the transaction transferred into the second cache.
Claim 13 : The device of claim 12, wherein the cache controller is to: after a power failure or a system crash: when all updates of the transaction have been transferred to the second cache so the undelivered count is equal to zero, flush the transaction to a main memory based on persistent memory; and when all updates of the transaction have not been transferred to the second cache so the undelivered count is not equal to zero, discard the transaction and transactions with higher transaction IDs from the second cache.
Claim 14: The device of claim 10, wherein: the processor is to: detect the transaction modifies a cached update previously modified by an old transaction; collapse transactions from the old transaction to the transaction by: transferring update counts of prior transactions from the old transaction to the transaction into the update count of the transaction; and modifying the update counts of the prior transactions with redirects to the update count of the transaction; detect an update from a collapsed transaction being transferred from the first cache to the second cache; and include a transaction ID of the old transaction along with the transaction ID of the transaction in the update request to transfer the update from the first cache to the second cache; and the cache controller is to: set the undelivered count of the transaction to the update count in the update request; detect the update from the collapsed transaction; transfer a pointer from a first update in the old transaction to the transaction; link each linked list of the collapsed transactions to a subsequent linked list based on an incoming sequence of the collapsed transactions; and zero out undelivered counts in the prior transactions.
Claim 15: A non-transitory computer readable medium encoded with executable instructions for execution by a processor to: assign a transaction ID to a transaction comprising a set of updates that are to be committed or dropped together as a unit; for each update of the transaction transferred into a first cache, increment an update count; for each update of the transaction being transferred from the first cache to a second cache based on persistent memory: decrement the update count; after the transaction is closed, mark the update as closed in an update request; and include the transaction ID and the update count in the update request to transfer the update from the first cache to the second cache.
PCT/US2014/016634 2014-02-14 2014-02-14 Free flow assurance of atomicity of multiple updates in persistent memory WO2015122925A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2014/016634 WO2015122925A1 (en) 2014-02-14 2014-02-14 Free flow assurance of atomicity of multiple updates in persistent memory


Publications (1)

Publication Number Publication Date
WO2015122925A1 true WO2015122925A1 (en) 2015-08-20

Family

ID=53800505

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/016634 WO2015122925A1 (en) 2014-02-14 2014-02-14 Free flow assurance of atomicity of multiple updates in persistent memory

Country Status (1)

Country Link
WO (1) WO2015122925A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020103815A1 (en) * 2000-12-12 2002-08-01 Fresher Information Corporation High speed data updates implemented in an information storage and retrieval system
US20080104332A1 (en) * 2006-10-31 2008-05-01 Gaither Blaine D Cache memory system and method for providing transactional memory
US20090106494A1 (en) * 2007-10-19 2009-04-23 Patrick Knebel Allocating space in dedicated cache ways
US20140040550A1 (en) * 2011-09-30 2014-02-06 Bill Nale Memory channel that supports near memory and far memory access


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IULIAN MORAU ET AL.: "Persistent, Protected and Cached: Building Blocks for Main Memory Data Stores", CMU-PDL-11-114, November 2012 (2012-11-01), Carnegie Mellon University, XP055219254, Retrieved from the Internet <URL:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.252.7301&rep=repl&type=pdf> *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018118040A1 (en) * 2016-12-21 2018-06-28 Hewlett-Packard Development Company, L.P. Persistent memory updating
US10860246B2 (en) 2016-12-21 2020-12-08 Hewlett-Packard Development Company, L.P. Persistent memory updating

Similar Documents

Publication Publication Date Title
US8756374B2 (en) Store queue supporting ordered and unordered stores
JP2022534892A (en) Victim cache that supports draining write-miss entries
US9798590B2 (en) Post-retire scheme for tracking tentative accesses during transactional execution
US10572179B2 (en) Speculatively performing memory move requests with respect to a barrier
JP6470300B2 (en) Method and processor for data processing
US7698504B2 (en) Cache line marking with shared timestamps
US7890700B2 (en) Method, system, and computer program product for cross-invalidation handling in a multi-level private cache
US20130205120A1 (en) Processor performance improvement for instruction sequences that include barrier instructions
TWI383295B (en) Disowning cache entries on aging out of the entry
US10140052B2 (en) Memory access in a data processing system utilizing copy and paste instructions
US10152322B2 (en) Memory move instruction sequence including a stream of copy-type and paste-type instructions
US9430380B2 (en) Managing memory transactions in a distributed shared memory system supporting caching above a point of coherency
JP6568575B2 (en) Call stack maintenance for transaction data processing execution mode
US9569365B2 (en) Store-exclusive instruction conflict resolution
US10691348B2 (en) Issuing write requests to a fabric
US9959213B2 (en) Implementing barriers to efficiently support cumulativity in a weakly-ordered memory system
US8850129B2 (en) Memory ordered store system in a multiprocessor computer system
US10241945B2 (en) Memory move supporting speculative acquisition of source and destination data granules including copy-type and paste-type instructions
JP2017520857A5 (en)
US9606923B2 (en) Information processing device with shared memory, memory order guarantee method using counters fence instructions in relation to cache-oriented requests, and recording medium storing program
WO2015122925A1 (en) Free flow assurance of atomicity of multiple updates in persistent memory
US10126952B2 (en) Memory move instruction sequence targeting a memory-mapped device
US10331373B2 (en) Migration of memory move instruction sequences between hardware threads
US9081689B2 (en) Methods and systems for pushing dirty linefill buffer contents to external bus upon linefill request failures
US8930627B2 (en) Mitigating conflicts for shared cache lines

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14882446

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14882446

Country of ref document: EP

Kind code of ref document: A1