WO1991020039A1 - Method and apparatus for a load and flag instruction - Google Patents

Method and apparatus for a load and flag instruction

Info

Publication number
WO1991020039A1
WO1991020039A1 PCT/US1991/004059
Authority
WO
WIPO (PCT)
Prior art keywords
memory
resource
load
address
request
Prior art date
Application number
PCT/US1991/004059
Other languages
French (fr)
Inventor
Roger E. Eckert
Richard E. Hessel
Andrew E. Phelps
George A. Spix
Jimmie R. Wilson
Original Assignee
Supercomputer Systems Limited Partnership
Priority date
Filing date
Publication date
Application filed by Supercomputer Systems Limited Partnership filed Critical Supercomputer Systems Limited Partnership
Publication of WO1991020039A1 publication Critical patent/WO1991020039A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30076 Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30087 Synchronisation or serialisation instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526 Mutual exclusion algorithms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/52 Indexing scheme relating to G06F9/52
    • G06F2209/521 Atomic

Definitions

  • Because the software running in a multiprocessor can detect when a resource conflict occurred, the hardware does not have to stop the pipeline in order to resolve these conflicts.
  • Memory conflicts for a pipelined application can occur when only a single processor is working on an application, as well as when multiple processors are working on the application.
  • The present invention operates as a self-checking resource lockout mechanism to prevent one part of the application from getting ahead of another part of the application.
  • Sk contains a signed number which is added to the address after each element is loaded to form the address of the next load.
  • Element e will load from address (sj) + (e * (sk)) and then write FFF4000000000000 into that address. Elements 0 through (VL)-1 will be loaded.
  • The VS and VM registers have no effect on this operation.
  • Time to completion: load port busy, TBD + (VL) cycles; Vi reserved for writing, TBD + (VL) cycles; Vi reserved for reading, at least TBD cycles; Sj, Sk to be read or written, one cycle.
  • Si may be used as an operand: at least TBD cycles, depending on memory conflicts. An Operand Range Error will set Si to NaN and may optionally cause an exception.
  • Element e will load from address (sj) + q + e. Load port busy: TBD + (VL) cycles; Vi reserved for reading: at least TBD cycles. An Operand Range Error will set that element to NaN and may optionally cause an exception.
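The element addressing (sj) + (e * (sk)) can be illustrated with a small software model in which addresses are treated as indices into a word array. The names `vloadf_model` and `FLAG_VALUE`, and the index interpretation of (sj) and (sk), are assumptions of this sketch, not the patent's definitions:

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG_VALUE UINT64_C(0xFFF4000000000000)

/* Model of the vector load-and-flag addressing: element e reads word
 * (sj) + e*(sk), captures it into vi[e], and leaves the flag behind. */
static void vloadf_model(uint64_t *mem, int64_t sj, int64_t sk,
                         size_t vl, uint64_t *vi)
{
    for (size_t e = 0; e < vl; e++) {
        int64_t a = sj + (int64_t)e * sk;   /* address (sj) + e*(sk) */
        vi[e] = mem[a];                     /* load element e */
        mem[a] = FLAG_VALUE;                /* then flag that address */
    }
}
```

Because sk is signed, the stride may step backward through memory as well as forward, matching the "signed number" description above.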

Abstract

A method and apparatus for providing a resource lockout mechanism in a shared memory, multiprocessor system (10) that is capable of performing both a read and a write operation during the same memory operation. The load and flag instruction (loadf) of the present invention can execute a read operation, followed by a write operation of a preselected flag value to the same memory location during the same memory operation. The load and flag instruction is particularly useful as a resource lockout mechanism for use in Monte Carlo applications.

Description

METHOD AND APPARATUS FOR A
LOAD AND FLAG INSTRUCTION
TECHNICAL FIELD
This invention relates generally to the field of memory systems and memory management for computer and electronic logic systems. More particularly, the present invention relates to a method and apparatus for providing a resource lockout mechanism through execution of a load and flag instruction on a memory location. The resource lockout is accomplished by performing both a read and write operation to the same memory location during the same memory operating cycle.
BACKGROUND ART
The previously filed parent application entitled CLUSTER ARCHITECTURE FOR A HIGHLY PARALLEL SCALAR/VECTOR MULTIPROCESSOR SYSTEM, PCT Serial No. PCT/US90/07655, describes a new cluster architecture for high-speed computer processing systems, referred to as supercomputers. For most supercomputer applications, the objective is to provide a computer processing system with the fastest processing speed and the greatest processing flexibility, i.e., the ability to process a large variety of traditional application programs. In an effort to increase the processing speed and flexibility of supercomputers, the cluster architecture for highly parallel multiprocessors described in the previously identified parent application provides an architecture for supercomputers wherein a multiple number of processors and external interface means can make multiple and simultaneous requests to a common set of shared hardware resources, such as main memory, secondary memory, global registers, interrupt mechanisms, or other shared resources present in the system.
One of the important considerations in designing such shared resource multiprocessor systems is how to indicate that a particular resource is presently being used by one of the processors and to provide a resource lockout mechanism so that another processor does not simultaneously use the same resource. The problem of resource lockout is further compounded by two additional factors associated with the cluster architecture for highly parallel multiprocessors that is described in the parent application identified above. First, the resource lockout mechanism must not only work with the registers and interrupt hardware for the system, it must also work on all of the locations of main memory. Second, the resource lockout mechanism must operate in a distributed environment where there is no central scheduler and where requestors other than processors (e.g., a distributed I/O controller) are also allowed direct access to shared resources without processor intervention.
In essence, the problem of resource lockout has been managed in prior art supercomputers by assigning a single, central scheduling processor to keep track of what resources are currently being used by which processor. In a distributed access architecture, access to all shared resources is equal and democratic and there is no central scheduler. Hence, each requestor to a common shared resource must be provided with the information necessary to determine whether that resource is presently being used by another requestor in the multiprocessor system. Present methods and systems for a resource lockout mechanism in a shared memory, multiprocessor environment do not allow for fully distributed and democratic access to the common shared main memory by all of the possible requestors in the multiprocessor environment. Consequently, a new method and apparatus is needed that provides for an indication to all other requestors in the multiprocessor system when a particular shared resource is being utilized. In addition, a new method and apparatus is needed for requestors to manage their own requests for shared resource access, responsive to an indication that a resource is or is not available.
SUMMARY OF THE INVENTION
The present invention is a method and apparatus for providing a resource lockout mechanism in a shared resource, multiple requestor system that is capable of performing both a read and write function during the same resource operation. The present invention comprises a load and flag instruction that can execute a read operation, immediately followed by a write operation of a preselected flag value to the same resource location during the same resource operating cycle. The preselected flag value is one that is recognized by the requestors as not being valid data, thus indicating to the requestor that the resource they attempted to read from is locked. Unlike prior art resource lockout mechanisms, the present invention can operate on every memory location in main memory in a multiprocessor or a distributed I/O environment that has fully distributed and democratic access to all shared resources by all possible requestors.
In a preferred embodiment, the load and flag instruction is initiated by presenting both an address to a targeted memory location in a memory bank, along with a delayed write pulse. The write pulse is delayed just long enough to allow the data to be read from the targeted memory location, after which the write pulse overwrites a preselected flag value into the targeted memory location. The memory bank of the present invention is provided with both load data and store data ports that allow a given operation access to read and write data paths to memory locations in the memory bank respectively. During the load and flag operation, the preselected flag value is placed on the store data port and the data value in the targeted memory location is accessed in the memory bank and placed onto the load port. By controlling the timing signals that are being presented to the memory by the bank cycle control logic, the load and flag instruction is completed during a single memory operation, thereby making the present invention an atomic operation, i.e., an operation that is an uninterruptable one-step process.
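From the requestor's point of view, the delayed-write sequence described above — read the old value, then overwrite it with the flag within one memory cycle — behaves like an atomic exchange. A minimal C sketch of that behavior, using the patent's FFF4000000000000 flag pattern (the names `loadf_model` and `FLAG_VALUE` are illustrative, and a real implementation is a single hardware memory operation, not a library call):

```c
#include <stdatomic.h>
#include <stdint.h>

/* The patent's preselected flag pattern. */
#define FLAG_VALUE UINT64_C(0xFFF4000000000000)

/* Return the old contents of the cell and leave the flag in its place,
 * as one uninterruptable (atomic) step. */
static inline uint64_t loadf_model(_Atomic uint64_t *cell)
{
    return atomic_exchange(cell, FLAG_VALUE);
}
```

Any requestor that receives `FLAG_VALUE` back knows the location was already checked out by someone else; any other return value means the requestor now holds the only live copy of the data.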
One of the more useful applications of the present invention is as a resource lockout mechanism for use in Monte Carlo applications. The term Monte Carlo refers to the random nature of the requested memory addresses created by certain applications as the application attempts to update various memory locations determined by pseudo-random techniques. In the preferred embodiment, the preselected flag value is chosen to be a floating point not-a-number (NaN) value. As the NaN value is not a valid number for purposes of the Monte Carlo application, the various processors in a multiprocessor or distributed I/O system running a Monte Carlo application can quickly and easily determine whether any given random memory access has encountered a memory location that is subject to resource lockout because another requestor is presently updating that memory location. Because Monte Carlo operations are most efficiently performed as a pipeline or vector operation, the present invention allows the processor to continue processing in a pipelined fashion without interrupting the Monte Carlo process because a resource lockout was encountered. Instead, the processor notes the flagged location and returns to request that memory location again at a later time.
An objective of the present invention is to provide a method and apparatus for a resource lockout mechanism in a shared resource, multiple requestor system that is capable of performing both a read function and a write function as a single operation. Another objective of the present invention is to provide a resource lockout mechanism that can operate on every memory location in main memory in a multiprocessor or distributed I/O environment that has fully distributed and democratic access to all shared resources by all possible requestors. A further objective of the present invention is to provide a load and flag instruction that can execute a read operation, followed by a write operation of a preselected flag value to the same memory location as an atomic memory operation.
Still another objective of the present invention is to provide a resource lockout mechanism that can effectively indicate memory lockout conditions for a multiprocessor Monte Carlo application in such a way as to allow continued pipelined processing of the Monte Carlo application.
These and other objectives of the present invention will become apparent with reference to the drawings, the detailed description of the preferred embodiment and the appended claims.
DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram of a multiprocessor memory system incorporating the load and flag instruction. Fig. 2 is a timing diagram for the load, store, and load and flag (loadf) instructions as executed in the multiprocessor memory system of Fig. 1.
DESCRIPTION OF THE PREFERRED EMBODIMENT
In the preferred embodiment, the load and flag instruction of the present invention is accomplished by issuing a load and flag command operation to a location in the main memory. Logic at the selected memory bank of the main memory interprets this command operation as a read of the data value in the memory location, followed by a write of a predefined flag value to the memory location. Because the address for the memory location is set up prior to the issue of the write of the predefined flag value, this logic can read the data currently at the memory location prior to issuing the write operation as part of a single atomic memory operation. The data value that is read is returned to the requestor using the normal read mechanisms.
Referring now to Fig. 1, one embodiment of the present invention will be described. Fig. 1 shows a multiprocessor memory system, having M processors 10 that may issue memory requests. Each of the processors 10 has a two-bit command that is part of a memory instruction which specifies the type of memory operation occurring: no-op, store to memory, load from memory, and load and flag. Although the embodiment shown in Fig. 1 will be described in terms of requests issued by a processor 10 to main memory, it will be understood that in the preferred embodiment, other requestors, such as an input/output controller, may also make similar memory requests. It will also be understood that the multiple requestors need not be physically distinct, but may be multiple logical requestors from a single physical source, such as in time division multiplexing.
When a processor 10 has a request to memory, it generates a command, an address and store data to an arbitration network 30. The arbitration network 30 is used as a steering mechanism for the memory request. The arbitration network 30 for each memory bank 20 selects which processor 10 will have access to the bank 20. For a more detailed description of the operation of the arbitration network 30 of the preferred embodiment, reference is made to the previously identified parent application. Although the arbitration network 30 is described in terms of a switching mechanism, it will be understood that the present invention would also work with a system of multiplexers having any number of mechanisms for resolving conflicting requests, such as FIFO systems, revolving switches, etc.
Once the memory instruction is presented to the proper memory bank 20, a bank cycle control means 40 accepts the memory command inputs, decodes the memory command and starts an operation on the memory bank 20. The bank cycle control means 40 is used to keep track of the state of the memory bank 20 as the operation progresses. The bank cycle control means 40 also maintains a busy condition for the memory bank 20. In the embodiment shown in Fig. 1, a set of multiplexers 50 provide the processors 10 with a bank busy signal, thereby allowing the processors 10 to examine which memory banks 20 are busy and which are not. This mechanism 50 is strictly a hardware interlock means that is used to prevent the processors 10 from overrunning the bank cycle control 40.
For load and loadf operation, a switching mechanism 60 enables load data to be returned from the memory banks 20 to the processors 10. In the embodiment shown in Fig. 1, the switching mechanism 60 is implemented using a single multiplexer per processor 10 which selects data from the appropriate memory bank 20.
In the preferred embodiment, the predetermined flag value is loaded onto the store data lines for the appropriate memory bank 20 by using the switching mechanism 30. In response to the loadf instruction decode, the switching mechanism 30 selects a predetermined flag value stored in a memory or latch associated with the switching mechanism 30 and loads the predetermined flag value on the store data lines. In an alternate embodiment, the processor 10 could load a predetermined flag value onto the store data lines for the appropriate memory bank 20 in the same manner as performed by the switching mechanism.
In the preferred embodiment of the load and flag command, a floating point not-a-number (NaN) number is defined as the predefined flag value. The floating point NaN value chosen is a value that has the least possibility of interfering with useful data patterns. In the preferred embodiment, the value used for the flag value is hexadecimal FFF4000000000000. When interpreted as a signed integer, the value is a very large negative number, specifically -3,377,699,720,527,872. When interpreted as an unsigned integer, the value is a very large positive number, specifically 18,443,366,373,989,023,744.
Referring now to Fig. 2, the relative timing of the memory operations for the load, store, and load and flag commands of the present invention is shown. As shown, the load and flag command requires a slightly longer interval to complete than the load or store commands. In the preferred embodiment, the WR ENABLE signal is asserted by the bank cycle control means 40 a fixed amount of time after the start of the load and flag operation. This amount of time is the minimum amount necessary to ensure that the load data has been accessed from the memory bank 20 and captured by the switching mechanism 60.
The load and flag mechanism of the present invention is an atomic memory operation that simultaneously returns the current value of a memory location and stores a predefined pattern in its place. The preferred embodiment of the load and flag instruction in accordance with the present invention is set forth as the loadf instruction in Appendix A. The loadf instruction can be used as a method for synchronizing shared memory variables in a multiprocessor application.
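The properties of the flag pattern FFF4000000000000 can be checked directly: it has an all-ones exponent field and a nonzero fraction, so reinterpreted as an IEEE 754 double it is a NaN, and its signed and unsigned integer readings match the values given above. A small C sketch (the helper name is illustrative):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double and test whether it is NaN. */
static int is_nan_pattern(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);   /* type-pun without aliasing issues */
    return isnan(d);
}
```

Because the pattern is not a valid number, any requestor that loads it can recognize the location as locked without any extra status bits in memory.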
In use, the loadf instruction operates somewhat like a self-setting semaphore register. When a first requestor uses the loadf instruction to retrieve a shared data item that is held in a given memory location, that requestor reads the value in the memory location and then locks the memory location by writing the preselected flag value in that location. In essence, this memory location is now checked out to the requestor who issued the loadf instruction. When other requestors attempt to access the data in this memory location, they are locked out from the memory location because they will retrieve the preselected flag value and not the data associated with that memory location. When the original requestor is finished with the data associated with the flagged location, a subsequent store is issued to the flagged location, loading the data back into the memory location, which also serves to "clear", i.e., unlock, the resource. It will be observed that in this embodiment of the present invention, each memory location is treated as an independent resource in that the memory location can be locked without affecting other memory locations.
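The checkout protocol just described can be sketched in software. The fragment below is only an illustration of the protocol, not of the hardware: the helper name `atomic_load_and_flag` is hypothetical, and a lock stands in for what the memory bank performs in a single atomic cycle.

```python
import threading

FLAG = 0xFFF4000000000000     # the predefined flag value
memory = {0: 42}              # one shared "memory location"
_bank = threading.Lock()      # emulates the bank-busy interlock

def atomic_load_and_flag(addr):
    """Return the current value and write FLAG in its place,
    as one indivisible operation (emulated here with a lock)."""
    with _bank:
        value = memory[addr]
        memory[addr] = FLAG
        return value

def store(addr, value):
    """A plain store; writing the data back also unlocks the location."""
    with _bank:
        memory[addr] = value

v = atomic_load_and_flag(0)     # first requestor checks the location out
busy = atomic_load_and_flag(0)  # a later requestor retrieves FLAG: locked out
store(0, v + 1)                 # storing updated data clears the lock
```

Note that each dictionary key acts as an independent resource: flagging one location has no effect on any other, mirroring the per-location lockout described in the text.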
It will also be observed that by using the loadf instruction, the original data value "stored" in the memory location is not actually present in the memory location for the duration of the flagged operation. During this time, the only copy of the data value is the copy in the processor or requestor that first retrieved the data using the loadf instruction. As the data value is generally updated or otherwise modified during the flagged operation, it will be recognized that the consequence of loss of significant data in the event of a system failure is negligible because the data is in the process of being updated.

The load and flag mechanism can also be used as a method of pipelining memory references with conflicts. This applies to both vector and parallel codes. In particular, the load and flag mechanism provides a solution to the problem of how to vectorize random access to memory while sustaining the memory pipeline. By providing the software running in a multiprocessor environment with a predefined flag value that indicates when a memory location is presently being used by another requestor, the load and flag mechanism supports distributed control over resource lockout. In this manner, the load and flag mechanism provides a powerful means for multithreading, vectorizing, and pipelining of traditionally scalar "Monte Carlo" applications. The term "Monte Carlo" refers to the random nature of the requested memory address stream created by these applications as they attempt to update various memory locations determined by pseudo-random techniques. In the prior art, this random address stream prevented the use of pipelines, vectorization, and multithreading because address conflicts might occur. In this invention, the load and flag mechanism does not eliminate these conflicts; rather, it supports pipelining of the detection and processing of these conflicts.
Because the software running in a multiprocessor can detect when a resource conflict has occurred, the hardware does not have to stop the pipeline in order to resolve these conflicts.

It should be noted that memory conflicts for a pipelined application, such as a Monte Carlo application, can occur when only a single processor is working on an application, as well as when multiple processors are working on the application. In the event that a single processor makes a second request to a memory location before it has restored a new data value and, thereby, reset the flag, the present invention operates as a self-checking resource lockout mechanism to prevent one part of the application from getting ahead of another part of the application.
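As a sketch of how software might pipeline conflicts rather than stall on them, the loop below (an illustration in Python with hypothetical helper names, not the patent's instruction set) gathers a batch of pseudo-randomly addressed updates, detects the elements that came back flagged, and simply retries them on the next pass while the rest of the batch proceeds:

```python
FLAG = 0xFFF4000000000000
memory = [1, 2, 3, 4]        # shared table updated at pseudo-random indices

def loadf(addr):
    """Emulated load and flag: return the old value, leave FLAG behind."""
    old, memory[addr] = memory[addr], FLAG
    return old

def monte_carlo_update(indices):
    """Add 10 to each indexed element. Duplicate indices in one batch
    conflict; flagged elements are queued for retry instead of
    stopping the whole batch."""
    pending = list(indices)
    while pending:
        fetched = [(i, loadf(i)) for i in pending]   # pipelined gather
        pending = []
        for i, v in fetched:
            if v == FLAG:
                pending.append(i)      # conflict detected: retry later
            else:
                memory[i] = v + 10     # update and store, clearing the flag

monte_carlo_update([2, 0, 2, 2])       # index 2 is updated three times
```

The key point mirrors the text: conflicts are not eliminated, but their detection and resolution are folded into the normal flow of loads and stores, so non-conflicting updates in the batch are never held up.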
Appendix A
LOADF (v+s) v    b6 ii jj kk
Gather Vi from Main Memory, addresses from Vj and Sk
Assembly syntax: loadf (vj+sk) vi
Hold issue conditions: Sk reserved.
Hold init conditions: Vj reserved for reading.
Vi reserved for writing.
Memory vector load port unavailable.
VL or VS reserved.
Scatter/Gather address port unavailable.
Function: Load each element of vi from main memory from the address in (sk) plus the corresponding element of vj and then write FFF4000000000000 into that memory location. Elements 0 through (VL)-1 will be loaded. VM has no effect on this operation.
Time to completion: Load port busy, and scatter/gather address port busy: TBD + (VL) cycles.
Vi reserved for reading: TBD.
Vi reserved for writing: TBD + (VL) cycles.
Sk to be read or written: One cycle.
Exceptions: If any of the (VL) addresses are not mapped, an Operand Range Error will set that element to NaN and may optionally cause an exception.
Comments: The VS register affects the address vector but not the data vector.

LOADF (v+q) v    b7 ii jj qq
Gather Vi from Main Memory, addresses from Vj plus constant
Assembly syntax: loadf (vj+q) vi
Where q is a signed 8-bit number.
Hold issue conditions: None.
Hold init conditions: Vj reserved for reading.
Vi reserved for writing.
Memory vector load port unavailable.
VL or VS reserved.
Scatter/Gather address port unavailable.
Function: Load each element of vi from main memory from the address q plus the corresponding element of vj and then write FFF4000000000000 into that memory location. Elements 0 through (VL)-1 will be loaded. VM has no effect on this operation.
Time to completion: Load port busy, and scatter/gather address port busy: TBD + (VL) cycles.
Vi reserved for reading: TBD.
Vi reserved for writing: TBD + (VL) cycles.
Exceptions: If any of the (VL) addresses are not mapped, an Operand Range Error will set that element to NaN and may optionally cause an exception.
Comments: The VS register affects the address vector but not the data vector.

LOADF (s),s v    b8 ii jj kk
Load Vi from Main Memory, address (sj), stride (sk)
Assembly syntax: loadf (sj),sk vi
Hold issue conditions: Sj, sk reserved.
Hold init conditions: Vi reserved for writing.
Memory vector load port unavailable.
VL reserved.
Function: Load (vi) from main memory at address (sj) and then write FFF4000000000000 into that memory location. Sk contains a signed number which is added to the address after each element is loaded to form the address of the next load. Thus element e will load from address (sj) + (e * (sk)) and then write FFF4000000000000 into that address. Elements 0 through (VL)-1 will be loaded. The VS and VM registers have no effect on this operation.
Time to completion: Load port busy: TBD + (VL) cycles.
Vi reserved for writing: TBD + (VL) cycles.
Vi reserved for reading: at least TBD cycles.
Sj, sk to be read or written: One cycle.
Exceptions: If any of the (VL) addresses are not mapped, an Operand Range Error will set that element to NaN and may optionally cause an exception.

LOADF (s+q) s    d9 ii jj qq
Load Si from Main Memory, address (sj) + q
Assembly syntax: loadf (sj)+q si
Hold issue conditions: Si, sj reserved.
Hold init conditions: Memory scalar load port unavailable.
Function: Load (si) from main memory at address (sj) + q and then write FFF4000000000000 into that memory location.
Time to completion: Load port busy: One cycle.
Si may be used as an operand: At least TBD cycles, depending on memory conflicts.
Exceptions: If the address is not mapped, an Operand Range Error will set Si to NaN and may optionally cause an exception.
LOADF (s+q) v    b9 ii jj qq
Load Vi from Main Memory, address (sj)+q, stride 1
Assembly syntax: loadf (sj)+q vi
Where q is an 8-bit signed number.
Hold issue conditions: Sj reserved.
Hold init conditions: Vi reserved for writing.
Memory vector load port unavailable.
VL reserved.
Function: Load (vi) from main memory at address (sj) + q, stride 1. Element e will load from address (sj) + q + e. After each read, write FFF4000000000000 to that location. Elements 0 through (VL)-1 will be loaded. The VS and VM registers have no effect on this operation.
Time to completion: Load port busy: TBD + (VL) cycles.
Vi reserved for writing: TBD + (VL) cycles.
Vi reserved for reading: at least TBD cycles.
Sj to be read or written: One cycle.
Exceptions: If any of the (VL) addresses are not mapped, an Operand Range Error will set that element to NaN and may optionally cause an exception.
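For readers without the hardware, the semantics of the gather form defined above, loadf (vj+sk) vi, can be mimicked element by element. This Python sketch is illustrative only; the function name is hypothetical, and the register names follow Appendix A. It loads VL elements from addresses (sk) + vj[e] and leaves the flag pattern behind at each location:

```python
FLAG = 0xFFF4000000000000

def loadf_gather(memory, sk, vj, vl):
    """Emulate `loadf (vj+sk) vi`: element e of the result is read from
    address sk + vj[e], and FLAG is written into that location.
    VM has no effect on the operation, so no mask is applied."""
    vi = []
    for e in range(vl):            # elements 0 through (VL)-1
        addr = sk + vj[e]
        vi.append(memory[addr])    # gather the old value
        memory[addr] = FLAG        # flag the location in its place
    return vi

mem = [10, 20, 30, 40, 50]
vi = loadf_gather(mem, 1, [0, 2, 3], 3)   # gathers mem[1], mem[3], mem[4]
```

In the hardware each read/flag pair is a single atomic bank cycle, whereas the loop above is sequential; the sketch captures only the addressing and the flag side effect.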
Although the description of the preferred embodiment has been presented, it is contemplated that various changes could be made without deviating from the spirit of the present invention. Accordingly, it is intended that the scope of the present invention be dictated by the appended claims rather than by the description of the preferred embodiment.
We claim:

1. A method for providing a resource lockout mechanism in a shared resource for a multiple requestor system comprising the steps of:
(a) in response to a resource request, routing the request to a resource as requested by an address of the resource request;
(b) decoding a command in the resource request to determine if the command is for an atomic read and write resource operation; and
(c) if the command is for the atomic read and write resource operation, issuing a read operation to the resource, followed by a write operation to the resource as part of the same resource operation, wherein a data value stored in the resource is returned to the requestor, and a predefined flag value is written into the resource.
2. A method for providing a resource lockout mechanism in a shared memory for a multiprocessor system comprising the steps of:
(a) in response to a memory request, routing the memory request to a memory bank in the shared memory as requested by an address in the memory request;
(b) decoding a command in the memory request to determine if the command is for an atomic read and write memory operation; and
(c) if the command is for the atomic read and write memory operation, issuing a read operation to the address in the memory bank, followed by a write operation to the same address in the memory bank as part of the same memory operation, wherein a data value stored at the address is returned to the processor and a predefined flag value is written into the memory at the address.
3. The method of claim 1 wherein the predefined flag value is a floating point not-a-number value.
4. The method of claim 1 wherein step (c) comprises the following substeps:
(c0) issuing a bank busy signal for the memory bank;
(c1) placing the address on an address line for the memory bank;
(c2) placing the predefined flag value on a store data bus for the memory bank;
(c3) reading the data value onto a load data bus for the memory bank;
(c4) issuing a write enable signal to the memory bank; and
(c5) clearing the bank busy signal to the value associated with any subsequent reference.
5. A resource lockout mechanism for a shared memory in a multiprocessor system, wherein each of a plurality of requestors in the multiprocessor system includes means for issuing a memory request, the memory request having means for identifying the type of memory operation being issued, and wherein the shared memory comprises a plurality of memory banks, each memory bank having switching means associated therewith for routing the memory request to the requested memory bank and separate store port means and load port means for providing data access to and from the memory bank, the resource lockout mechanism comprising:
address means operably connected to each memory bank for receiving an address for a memory location in the memory bank as part of the memory request and for placing the address on an address port for the memory bank;
control means operably connected to each memory bank for generating a plurality of control signals to the memory bank in response to the memory request, including:
means for issuing a load operation to the memory bank in response to a load request;
means for issuing a store operation to the memory bank in response to a store request and for placing a data value to be stored at the address on the store port means; and
means for issuing both a load and a store operation to the memory bank in response to a load and flag request and for placing a predefined flag value to be stored at the address on the store port means, the load and store operation being issued such that the write enable signal is delayed until after the load operation has caused a data value stored at the address to be placed on the load port means.
6. The resource lockout mechanism of claim 4 wherein the predefined flag value is a floating point not-a-number value.
7. A resource lockout mechanism for a shared memory wherein a plurality of requestors may each request access to a preselected memory location in the shared memory, comprising: a load and flag command for issuing in a single memory cycle a load memory operation for providing the data in the selected memory location to the first requestor requesting access to the memory location, and a store command for storing during the same memory operating cycle a predetermined flag value so that subsequent requestors cannot access the data in the selected memory location while it is being utilized by the first requestor.
PCT/US1991/004059 1990-06-11 1991-06-10 Method and apparatus for a load and flag instruction WO1991020039A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US53621790A 1990-06-11 1990-06-11
US536,217 1990-06-11

Publications (1)

Publication Number Publication Date
WO1991020039A1 true WO1991020039A1 (en) 1991-12-26

Family

ID=24137636

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1991/004059 WO1991020039A1 (en) 1990-06-11 1991-06-10 Method and apparatus for a load and flag instruction

Country Status (2)

Country Link
JP (1) JPH05508496A (en)
WO (1) WO1991020039A1 (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4123795A (en) * 1971-09-07 1978-10-31 Texas Instruments Incorporated Control system for a stored program multiprocessor computer
US4399504A (en) * 1980-10-06 1983-08-16 International Business Machines Corporation Method and means for the sharing of data resources in a multiprocessing, multiprogramming environment
US4604694A (en) * 1983-12-14 1986-08-05 International Business Machines Corporation Shared and exclusive access control
US4794516A (en) * 1985-10-31 1988-12-27 International Business Machines Corporation Method and apparatus for communicating data between a host and a plurality of parallel processors
US4837676A (en) * 1984-11-05 1989-06-06 Hughes Aircraft Company MIMD instruction flow computer architecture
US4847754A (en) * 1985-10-15 1989-07-11 International Business Machines Corporation Extended atomic operations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
K. Hwang et al., "Computer Architecture and Parallel Processing", McGraw-Hill, 1984; see pages 526-528 and 565-572. *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4421229A1 (en) * 1994-06-17 1996-01-11 Siemens Ag Access to data held in common memory by multiprocessor system
EP0913767A2 (en) * 1997-11-03 1999-05-06 Motorola, Inc. A method and apparatus for affecting subsequent instruction processing in a data processor
EP0913767A3 (en) * 1997-11-03 2000-01-26 Motorola, Inc. A method and apparatus for affecting subsequent instruction processing in a data processor
US6237089B1 (en) 1997-11-03 2001-05-22 Motorola Inc. Method and apparatus for affecting subsequent instruction processing in a data processor

Also Published As

Publication number Publication date
JPH05508496A (en) 1993-11-25

Similar Documents

Publication Publication Date Title
US5499356A (en) Method and apparatus for a multiprocessor resource lockout instruction
US6141734A (en) Method and apparatus for optimizing the performance of LDxL and STxC interlock instructions in the context of a write invalidate protocol
Dubois et al. Memory access buffering in multiprocessors
Dubois et al. Memory access dependencies in shared-memory multiprocessors
US6466988B1 (en) Multiprocessor synchronization and coherency control system
CA1322058C (en) Multi-processor computer systems having shared memory and private cache memories
JP5404574B2 (en) Transaction-based shared data operations in a multiprocessor environment
US5421022A (en) Apparatus and method for speculatively executing instructions in a computer system
US5381536A (en) Method and apparatus for separate mark and wait instructions for processors having multiple memory ports
JP3209205B2 (en) Inherit device of register contents in processor
EP0432075B1 (en) Multiprocessor with relatively atomic instructions
US5257354A (en) System for monitoring and undoing execution of instructions beyond a serialization point upon occurrence of in-correct results
JP3701814B2 (en) Multiprocessor system and system serialization method thereof
JP2937485B2 (en) Method and apparatus for detecting and executing traps in a superscalar processor
US5428807A (en) Method and apparatus for propagating exception conditions of a computer system
EP0207506A2 (en) Vector processing apparatus
EP0372751B1 (en) Pipelined data-processing apparatus
JP2003029986A (en) Inter-processor register succeeding method and device therefor
US20040123078A1 (en) Method and apparatus for processing a load-lock instruction using a scoreboard mechanism
KR100738777B1 (en) Coarse grained determination of data dependence between parallel executed jobs in an information processing system
JPH0622035B2 (en) Vector processor
KR100335744B1 (en) Load/load detection and reorder method
US5696939A (en) Apparatus and method using a semaphore buffer for semaphore instructions
JPH03116234A (en) Multi-processor system with a plurality of instruction sources
US6725365B1 (en) Branching in a computer system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE