SYSTEM FOR ACHIEVING ATOMIC NON-SEQUENTIAL MULTI-WORD OPERATIONS IN SHARED MEMORY

BACKGROUND AND SUMMARY OF THE INVENTION

This invention relates to operation of computer systems having shared memory, and more particularly to a transactional memory for use in multiprocessor systems.

Synchronizing access to shared data structures is one of the oldest and most difficult problems in designing software for shared-memory multiprocessors. Without careful synchronization, a data structure may be left in an inconsistent state if different processes try to modify it at the same time. Conventional techniques for synchronizing access to shared data structures in shared-memory multiprocessors center around mutual exclusion protocols. Each data structure has an associated lock. Before a process can access a data structure, it must acquire the lock, and as long as it holds that lock, no other process may access the data structure.

Nevertheless, locking is poorly suited for modern shared-memory multiprocessor architectures, for several reasons. First, locking is poorly suited for processes that must modify multiple data objects, particularly if the set of objects to be modified is not known in advance. Care must be taken to avoid deadlocks that arise when processes attempt to acquire the same locks in different orders. Second, if the process holding a lock is descheduled, perhaps by exhausting its scheduling quantum, by a page fault, or by some other kind of interrupt, then other processes capable of running may be unable to progress. Third, locking interacts poorly with priority systems. A lower-priority process may be preempted in favor of a higher-priority process, but if the preempted process is holding a lock, then other, perhaps higher-priority processes will be unable to progress (this phenomenon is sometimes called "priority inversion"). And fourth, locking can produce "hot-spot" contention. In particular, spin locking techniques, in which processes repeatedly poll a lock until it becomes free, perform poorly because of excessive memory contention.

By contrast, a concurrent object implementation is non-blocking if some process is guaranteed to complete an operation after the system as a whole takes a finite number of steps (referred to as atomicity, as will be described). This condition rules out the use of locking, since a process that halts while holding a lock may force other processes trying to acquire that lock to run forever without making progress.

Described herein is a new multiprocessor architecture that permits programmers to construct non-blocking implementations of complex data structures in a simple and efficient way. This architecture is referred to as transactional memory, and consists of two parts: (1) a collection of special machine instructions, and (2) particular techniques for implementing these instructions.

Most of the programming language constructs proposed for concurrent programming in the multiprocessor with shared memory model employ locks, either explicitly or implicitly (Andrews et al, "Concepts and notations for concurrent programming," ACM Computing Surveys, Vol. 15, No. 1, pp. 3-43, March 1983, disclose a survey). Early locking algorithms used only load and store operations, as disclosed in Dijkstra, "Co-operating sequential processes," pp. 43-112, Academic Press, New York, 1965, in Knuth, "Additional comments on a problem in concurrent programming control," Communications of the ACM, Vol. 9, No. 5, pp. 321-322, May 1966, in Peterson, "Myths about the mutual exclusion problem," Information Processing Letters, Vol. 12, pp. 115-116, June 1981, and in Lamport, "A new solution of Dijkstra's concurrent programming problem," Communications of the ACM, Vol. 18, No. 8, pp. 453-455, August 1974.

These algorithms using only load and store operations, however, are cumbersome and inefficient, so current practice is to provide support for read-modify-write (RMW) operations directly in hardware. A read-modify-write operation is parameterized by a function f. It atomically (1) reads the value v from a location, (2) computes f(v) and stores it in that location, and (3) returns v to the caller. Common read-modify-write operations include TEST&SET, atomic register-to-memory SWAP (see Graunke et al, "Synchronization algorithms for shared memory multiprocessors," IEEE Computer, Vol. 23, No. 6, pp. 60-70, June 1990), FETCH&ADD (see Gottlieb et al, "The NYU Ultracomputer—designing an MIMD parallel computer," IEEE Trans. on Computers, Vol. C-32, No. 2, pp. 175-189, February 1984), COMPARE&SWAP (see IBM, "System/370 principles of operation," Order No. GA22-7000), and LOAD_LINKED and STORE_CONDITIONAL (see Jensen et al, "A new approach to exclusive data access in shared memory multiprocessors," Technical Report UCRL-97663, Lawrence Livermore National Laboratory, November 1987).

Although these hardware primitives were originally developed to support locking, they can sometimes be used to avoid locking for certain data structures. A systematic analysis of the relative power of different read-modify-write primitives for this purpose is given in Herlihy, "Wait-free synchronization," ACM Trans. on Programming Languages and Systems, Vol. 13, No. 1, pp. 123-149, January 1991. If an architecture provides only read and write operations, then it is provably impossible to construct non-blocking implementations of many simple and familiar data types, such as stacks, queues, lists, etc. Moreover, many of the "classical" synchronization primitives such as TEST&SET, SWAP, and FETCH&ADD are also computationally weak. Nevertheless, there do exist simple universal primitives from which one can construct a non-blocking implementation of any object. Examples of universal primitives include COMPARE&SWAP, LOAD_LINKED and STORE_CONDITIONAL, and others.

Although the universal primitives are powerful enough in theory to support non-blocking implementations of any concurrent data structure, they may perform poorly in practice because they can update only one word of memory at a time. To modify a complex data structure, it is necessary to copy the object, modify the copy, and then to use a read-modify-write operation to swing a base pointer from the old version to the new version. Detailed protocols of this kind for COMPARE&SWAP and for LOAD_LINKED and STORE_CONDITIONAL have been published (see Herlihy, "A methodology for implementing highly concurrent data structures," Proc. 2nd ACM SIGPLAN Symp. on Princ. and Practice of Parallel Programming, pp. 197-206, March 1990, and Herlihy, "A methodology for implementing highly concurrent data objects," Tech. Rpt. No. 91/10, Digital Equipment Corporation,