WO1994003860A1 - Massively parallel computer including auxiliary vector processor - Google Patents

Massively parallel computer including auxiliary vector processor

Info

Publication number
WO1994003860A1
WO1994003860A1 PCT/US1993/007415 US9307415W WO9403860A1 WO 1994003860 A1 WO1994003860 A1 WO 1994003860A1 US 9307415 W US9307415 W US 9307415W WO 9403860 A1 WO9403860 A1 WO 9403860A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
processor
processing
register
response
Prior art date
Application number
PCT/US1993/007415
Other languages
French (fr)
Inventor
Jon P. Wade
Daniel R. Cassiday
Robert D. Lordi
Guy Lewis Steele, Jr.
Margaret A. St. Pierre
Monica C. Wong-Chan
Zahi S. Abuhamden
David C. Douglas
Mahesh N. Ganmukhi
Jeffrey V. Hill
W. Daniel Hillis
Scott J. Smith
Shaw-Wen Yang
Robert C. Zak, Jr.
Original Assignee
Thinking Machines Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thinking Machines Corporation
Priority to AU48044/93A
Publication of WO1994003860A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/80Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8053Vector processors
    • G06F15/8092Array of vector units
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356Indirect interconnection networks
    • G06F15/17368Indirect interconnection networks non hierarchical topologies
    • G06F15/17381Two dimensional, e.g. mesh, torus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/80Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8053Vector processors
    • G06F15/8076Details on data register access
    • G06F15/8084Special arrangements thereof, e.g. mask or switch
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow

Definitions

  • the invention relates generally to the field of digital computer systems, and more particularly to massively parallel computer systems.
  • Background Of The Invention Computer systems have long been classified according to a taxonomy of "SISD" (for single-instruction/single-data), "SIMD" (for single-instruction/multiple-data) and "MIMD" (for multiple-instruction/multiple-data).
  • SISD single-instruction/single-data
  • SIMD single-instruction/multiple-data
  • MIMD multiple-instruction/multiple-data
  • SIMD processors have been developed which incorporate a large number of processing nodes all of which are controlled to operate concurrently on the same instruction stream, but with each processing node processing a separate data stream.
  • MIMD processors have been developed which have a number of processing nodes each controlled separately in response to its own instruction stream.
  • SPMD single-program/multiple-data
  • An SPMD processor includes a number of processing nodes, each controlled separately in response to its own instruction stream, but which may be controlled generally concurrently in response to commands which generally control portions of instruction streams to be processed.
  • An SPMD system thus has the possibility of having a global point of control and synchronization, namely, the source of the commands to be processed, which is present in an SIMD system, with the further possibility of having local control of processing in response to each of the commands by each of the processing nodes, which is present in an MIMD system.
  • the invention provides a new and improved auxiliary processor for use in connection with a massively parallel computer system.
  • a massively-parallel computer system includes a plurality of processing nodes (11) interconnected by a network (15).
  • Each processing node comprises a network interface (22), a memory module (24), a vector processor (21) and a node processor.
  • the vector processor (21) is connected to the memory module for performing vector data processing operations in connection with data in the memory module in response to vector instructions from the node processor.
  • the node processor (20) is responsive to commands to (i) process data in the memory module, (ii) generate vector instructions for controlling the auxiliary processor, and (iii) control the generation of messages by the network interface.
  • the network transfers messages generated by the network interfaces of the processing nodes among the processing nodes thereby to transfer information thereamong.
  • a control arrangement (12, 14) generates commands to control the processing nodes in parallel.
  • FIG. 1 is a general block diagram depicting a massively parallel computer incorporating an auxiliary processor constructed in accordance with the invention
  • Figs. 2A and 2B together comprise a general block diagram of the auxiliary processor depicted in Fig. 1
  • Fig. 3 is a detailed block diagram of the context logic circuit in the auxiliary processor as shown in Fig. 2B.
  • Fig. 1 depicts a general block diagram of a massively parallel digital computer system 10 in which an auxiliary processor according to the invention may be used.
  • the computer system 10 includes a plurality of processing nodes 11(0) through 11(N) (generally identified by reference numeral 11) which operate under control of one or more partition managers 12(0) through 12(M) (generally identified by reference numeral 12). Selected ones of the processing nodes 11(x) through 11(y) ("x" and "y" are integers) are assigned to a particular partition manager 12(z) ("z" is an integer), which transmits data processing commands to processing nodes 11(x) through 11(y) defining a particular partition assigned thereto.
  • the processing nodes 11(x) through 11(y) process the data processing commands, generally in parallel, and in response generate status and synchronization information which they transmit among themselves and to the controlling partition manager 12(z).
  • the partition manager 12(z) may use the status and synchronization information in determining the progress of the processing nodes 11(x) through 11(y) in processing the data processing commands, and in determining the timing of transmission of data processing commands to the processing nodes, as well as the selection of particular data processing commands to transmit.
  • processing nodes 11 and partition managers 12 useful in one embodiment of system 10 are described in detail in the aforementioned Douglas, et al., patent applications.
  • the system further includes one or more input/output processors 13(i) through 13(k) (generally identified by reference numeral 13) which store data and programs which may be transmitted to the processing nodes 11 and partition managers 12 under control of input/output commands from the partition managers 12.
  • the partition managers 12 may enable the processing nodes 11 in particular partitions assigned thereto to transmit processed data to the input/output processors 13 for storage therein.
  • Input/output processors 13 useful in one embodiment of system 10 are described in detail in the aforementioned Wells, et al., patent application.
  • the system 10 further includes a plurality of communications networks, including a control network 14 and a data router 15 which permit the processing nodes 11, partition managers 12 and input/output processors 13 to communicate to transmit data, commands and status and synchronization information thereamong.
  • the control network 14 defines the processing nodes 11 and partition managers 12 assigned to each partition.
  • control network 14 is used by the partition managers 12 to transmit processing and input/output commands to the processing nodes 11 of the partition and by the processing nodes 11 of each partition to transmit status and synchronization information among each other and to the partition manager 12.
  • the control network 14 may also be used to facilitate the down-loading of program instructions by or under control of a partition manager 12(z) to the processing nodes 11(x) through 11(y) of its partition, which the processing nodes execute in the processing of the commands.
  • a control network 14 useful in one embodiment of system 10 is described in detail in the aforementioned Douglas, et al., patent applications.
  • the data router 15 facilitates the transfer of data among the processing nodes 11, partition managers 12 and input/output processors 13.
  • partitioning of the system is defined with respect to the control network 14, but the processing nodes 11, partition managers and input/output processors 13 can use the data router 15 to transmit data to others in any partition.
  • partition managers 12 use the data router 15 to transmit input/output commands to the input/output processors 13, and the input/output processors 13 use the data router 15 to carry input/output status information to the partition managers 12.
  • a data router 15 useful in one embodiment of system 10 is described in detail in the aforementioned Douglas, et al., patent applications.
  • system 10 also includes a diagnostic network 16, which facilitates diagnosis of failures, establishes initial operating conditions within the system 10 and conditions the control network 14 to facilitate the establishment of partitions.
  • the diagnostic network 16 operates under control of a diagnostic processor (not shown) which may comprise, for example, one of the partition managers 12.
  • diagnostic network 16 useful in system 10 is also described in connection with the aforementioned Douglas, et al., patent applications.
  • the system 10 operates under control of a common system clock 17, which provides SYS CLK system clocking signals to the components of the system 10.
  • the various components use the SYS CLK signal to synchronize their operations.
  • the processing nodes 11 are similar, and so only one processing node, in particular processing node 11(j), is shown in detail.
  • as shown in Fig. 1, the processing node 11(j) includes a node processor 20, one or more auxiliary processors 21(0) through 21(I) [generally identified by reference numeral 21(i)], and a network interface 22, all of which are interconnected by a processor bus 23.
  • the node processor 20 may comprise a conventional microprocessor, and one embodiment of network interface 22 is described in detail in the aforementioned Douglas, et al., patent applications.
  • Also connected to each auxiliary processor 21(i) are two memory banks 24(0)(A) through 24(I)(B) [generally identified by reference numeral 24(i)(j), where "i” corresponds to the index "i” of the auxiliary processor reference numeral 21(i) and index "j" corresponds to bank identifier "A" or "B”].
  • the memory banks 24(i)(j) contain data and instructions for use by the node processor 20 in a plurality of addressable storage locations (not shown).
  • the addressable storage locations of the collection of memory banks 24(i)(j) of a processing node 11(j) form an address space defined by a plurality of address bits, the bits comprising a location identifier portion that is headed by an auxiliary processor identifier portion and a memory bank identifier.
  • the node processor 20 may initiate the retrieval of the contents of a particular storage location in a memory bank 24(i)(j) by transmitting an address over the bus 23 whose auxiliary processor identifier identifies the particular auxiliary processor 21(i) connected to the memory bank 24(i)(j) containing the location whose contents are to be retrieved, and whose location identifier identifies the particular memory bank 24(i)(j) and storage location whose contents are to be retrieved.
  • the auxiliary processor 21(i) connected to the memory bank 24(i)(j) which contains the storage location identified by the address signals retrieves the contents of the storage location and transmits them to the node processor 20 over the bus 23.
  • the node processor 20 may enable data or instructions (both generally referred to as "data") to be loaded into a particular storage location by transmitting an address and the data over the bus 23, and the auxiliary processor 21(i) that is connected to the memory bank 24(i)(j) containing the storage location identified by the address signals enables the memory bank 24(i)(j) that is identified by the address signals to store the data in the storage location identified by the address signals.
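  • As an illustrative aside (not part of the patent text), the address decomposition described above can be sketched in C; the field positions and widths below are assumptions chosen only for the example:
```c
#include <stdint.h>

/* Hypothetical layout of a node-level bus address:
   [ aux_proc : 2 ][ bank : 1 ][ offset : 29 ]  (widths are assumptions). */
typedef struct {
    unsigned aux_proc;   /* selects auxiliary processor 21(i)          */
    unsigned bank;       /* memory bank identifier: 0 = "A", 1 = "B"   */
    uint32_t offset;     /* storage location within the selected bank  */
} NodeAddress;

static NodeAddress decode_node_address(uint32_t bus_addr)
{
    NodeAddress a;
    a.aux_proc = (bus_addr >> 30) & 0x3u;
    a.bank     = (bus_addr >> 29) & 0x1u;
    a.offset   =  bus_addr        & 0x1FFFFFFFu;
    return a;
}
```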
  • the auxiliary processors 21(i) can process operands, comprising either data provided by the node processor 20 or the contents of storage locations they retrieve from the memory banks 24(i)(j) connected thereto, in response to auxiliary processing instructions transmitted thereto by the node processor 20.
  • the node processor 20 can transmit an auxiliary processing instruction over processor bus 23, which includes the identification of one or more auxiliary processors 21 (i) to execute the instruction, as well as the identification of operands to be processed in response to the auxiliary processing instruction.
  • the identified auxiliary processors 21(i) retrieve operands from the identified locations, perform processing operation(s) and store the resulting operand(s), representing the result of the processing operation(s), in one or more storage location(s) in memory banks 24(i)(j).
  • the auxiliary processors 21(i) are in the form of a "RISC,” or “reduced instruction set computer,” in which retrievals of operands to be processed thereby from, or storage of operands processed thereby in, a memory bank 24(i)(j), are controlled only by explicit instructions, which are termed “load/store” instructions.
  • Load/store instructions enable operands to be transferred between particular storage locations and registers (described below in connection with Figs. 2A and 2B) in the auxiliary processor 21(i).
  • a "load” instruction enables operands to be transferred from one or more storage locations to the registers
  • a "store” instruction enables operands to be transferred from the registers to one or more storage locations.
  • in the auxiliary processors 21(i), the load/store instructions control the transfer of operands to be processed by the auxiliary processor 21(i), as well as of operands representing the results of processing by the auxiliary processor 21(i).
  • the node processor 20 and auxiliary processors 21(i) do not use the load/store instructions to control transfers directly between memory banks 24(i)(j) and the node processor 20.
  • Other instructions termed here "auxiliary data processing instructions," control processing in connection with the contents of registers and storage of the results of the processing in such registers.
  • Each auxiliary processing instruction may include both a load/store instruction and an auxiliary data processing instruction.
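  • As a hedged illustration (the actual instruction encoding is not given in this text), an auxiliary processing instruction carrying both a load/store part and a data processing part might be modeled as follows; every field name and width here is hypothetical:
```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     is_store;     /* load from memory vs. store to memory         */
    uint16_t ls_register;  /* register in register file 34 to load/store   */
    uint32_t mem_offset;   /* offset from the base of memory bank 24(i)(j) */
} LoadStorePart;

typedef struct {
    uint8_t  opcode;       /* e.g. add, multiply                           */
    uint16_t src1, src2;   /* source register identifiers                  */
    uint16_t dest;         /* destination register identifier              */
} DataProcessingPart;

typedef struct {
    bool               has_load_store;       /* instruction may carry one, */
    bool               has_data_processing;  /* the other, or both         */
    LoadStorePart      ls;
    DataProcessingPart dp;
} AuxProcessingInstruction;
```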
  • the node processor 20 transmits individual auxiliary processing instructions for processing by individual auxiliary processors 21(i), or by selected groups of auxiliary processors 21(i), or by all auxiliary processors 21(i) on the processing node, generally in parallel.
  • each load/store auxiliary processing instruction is further accompanied by a value which represents an offset, from the base of the particular memory bank 24(i)(j), of a storage location in memory which is to be used in connection with the load/store operation.
  • each auxiliary data processing instruction identifies one or more registers in the auxiliary processor 21(i) whose operands are to be used in execution of the auxiliary data processing instruction.
  • the node processor 20 can, with a single auxiliary data processing instruction transmitted for execution by multiple auxiliary processors 21(i), enable the auxiliary processors 21 (i) to process the matrix elements generally in parallel, which may serve to speed up matrix processing.
  • the auxiliary processors 21(i) enable operands comprising large matrices to be processed very rapidly.
  • Each auxiliary processing instruction can enable an auxiliary processor 21 (i) to process a series of operands as a vector, performing the same operation in connection with each operand, or element, of the vector.
  • if an operation requires multiple operands, the auxiliary processor 21(i) processes corresponding elements from the required number of such vectors, performing the same operation in connection with each set of operands. If an auxiliary processing instruction enables an auxiliary processor 21(i) to so process operands as vectors, the processing of particular sets of operands may be conditioned on the settings of particular flags of a vector mask.
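  • A minimal sketch of mask-conditioned vector processing as described above, assuming a simple per-element flag array; the element type and calling convention are illustrative only:
```c
#include <stddef.h>
#include <stdint.h>

/* Element k of dest is updated only if flag k of the vector mask is set;
   masked-off elements are left untouched. */
static void masked_vector_add(const double *src1, const double *src2,
                              double *dest, const uint8_t *vector_mask,
                              size_t vector_length)
{
    for (size_t k = 0; k < vector_length; k++) {
        if (vector_mask[k])
            dest[k] = src1[k] + src2[k];
    }
}
```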
  • each auxiliary processor 21 (i) may process data retrievals and stores for the node processor 20, as well as auxiliary processing instructions, in an overlapped manner. That is, node processor 20 may, for example, initiate a storage or retrieval operation with an auxiliary processor 21 (i) and transmit an auxiliary processing instruction to the auxiliary processor 21(i) before it has finished the storage or retrieval operation. In that example, the auxiliary processor 21 (i) may also begin processing the auxiliary processing instruction before it has finished the retrieval or storage operation.
  • alternatively, the node processor 20 may transmit an auxiliary processing instruction to the auxiliary processor 21(i), and thereafter initiate one or more storage or retrieval operations.
  • the auxiliary processor 21 (i) may, while executing the auxiliary processing instruction, also perform the storage or retrieval operations.
  • Figs. 2A and 2B depict a general block diagram of one embodiment of auxiliary processor 21(i).
  • auxiliary processor 21(i) includes a control interface 30 (Fig. 2A), a memory interface 31 (Fig. 2B), a data processor 32 (Fig. 2B) and a bus system 33.
  • the control interface 30 receives storage and retrieval requests (which will generally be termed "remote operations") over processor bus 23.
  • the control interface 30 enables the memory interface 31 to retrieve the contents of the storage location identified by an accompanying address for transfer to the processor 20.
  • the control interface 30 enables the memory interface 31 to store data accompanying the request in a storage location identified by an accompanying address.
  • the control interface 30 also receives auxiliary processing instructions (which will generally be termed "local operations").
  • if an auxiliary processing instruction received by the auxiliary processor 21(i) contains a load/store instruction, the control interface 30 enables the memory interface 31 and data processor 32 to cooperate to transfer data between one or more storage locations and registers in a register file 34 in the data processor 32.
  • if the auxiliary processing instruction contains an auxiliary data processing instruction, the control interface 30 enables the data processor 32 to perform the data processing operations as required by the instruction in connection with operands in registers in the register file 34.
  • if an auxiliary processing instruction includes both a load/store instruction and an auxiliary data processing instruction, it will enable both a load/store and a data processing operation to occur.
  • the memory interface 31 controls storage in and retrieval from the memory banks 24(i)(j) connected thereto during either a remote or local operation.
  • the memory interface 31 receives from the control interface 30 address information, in particular a base address which identifies a storage location at which the storage or retrieval is to begin.
  • the memory interface 31 receives from the control interface 30 other control information. For example, if the storage or retrieval operation is to be in connection with multiple storage locations, the control interface 30 controls the general timing of each successive storage or retrieval operation, in response to which the memory interface 31 generates control signals for enabling a memory bank 24(i)(j) to actually perform the storage or retrieval operation.
  • the control interface 30 provides a stride value, which the memory interface 31 uses in connection with the base address to generate the series of addresses for transmission to a memory bank 24(i)(j).
  • alternatively, the memory interface 31 receives offset values, which are transmitted from registers in the register file 34 of the data processor 32 under control of the control interface 30, and uses them in connection with the base address to generate addresses for transmission to the memory banks 24(i)(j).
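  • The two address-generation modes just described can be sketched as follows; this is only an illustration, and the address width and stride representation are assumptions:
```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-stride mode: successive addresses are base, base+stride, ... */
static void gen_strided_addresses(uint32_t base, int32_t stride,
                                  size_t n, uint32_t *out)
{
    for (size_t k = 0; k < n; k++)
        out[k] = base + (uint32_t)(stride * (int32_t)k);
}

/* "Indirect" mode: per-element offsets are supplied from registers in the
   register file 34 and added to the base address. */
static void gen_indirect_addresses(uint32_t base, const uint32_t *offsets,
                                   size_t n, uint32_t *out)
{
    for (size_t k = 0; k < n; k++)
        out[k] = base + offsets[k];
}
```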
  • the data processor 32 operates in connection with local operations, also under control of the control interface 30, to perform data processing operations in connection with operands stored in its register file 34.
  • the control interface 30 provides register identification information identifying registers containing operands to be processed, as well as control information identifying the particular operation to be performed and the register into which the result is to be loaded. If the local operation is to be in connection with vectors, the control interface 30 also provides information from which the data processor 32 can identify the registers containing operands comprising the vectors, as well as the register in which each result operand is to be loaded.
  • operands comprising successive vector elements may be provided by registers having fixed strides from particular base registers and the control interface will provide the base identifications and stride values.
  • At least some operands may come from registers selected using "indirect" register addressing, as described above in connection with the memory interface 31, and the control interface 30 identifies a base register and a register in the register file 34 which is the base of a table containing register offset values. From the base register identification and the register offset values in the table, the data processor 32 identifies the registers whose values are to be used as the successive operands.
  • the bus system 33 provides data paths among the control interface 30, memory controller 31 and data processor 32.
  • the bus system 33 includes two buses, identified as an A bus 35 and a B bus 36, as well as two gated drivers 37 and 38 which are controlled by A TO B and B TO A signals from the control interface 30.
  • the control interface 30 includes an address register 40, a data register 41 and a processor bus control circuit 42, all of which are connected to the processor bus 23.
  • the processor bus control circuit 42 receives P CTRL processor bus control signals from the processor bus 23, which control transfers over the processor bus 23; when they indicate that an address is on the processor bus, initiating a transfer over the processor bus, the circuit 42 enables the address register 40 to latch P ADRS processor address signals from the bus.
  • the data register 41 is connected to receive P DATA processor data signals. If the control signals received by the processor bus control circuit 42 indicate that the processor bus transfer is accompanied by data, it enables the data register 41 to latch the P DATA signals, which comprise the data for the transfer.
  • the processor bus control circuit 42 further notifies a scheduler and dispatcher circuit 43 that an address and data have been received and latched in the address and data registers 40 and 41, respectively.
  • the scheduler and dispatcher 43 examines the LAT ADRS latched address signals coupled by the address register 40 to determine whether the transfer is for the particular auxiliary processor 21(i), and if so, enables the processor bus control circuit 42 to transmit P CTRL processor bus control signals to acknowledge the bus transaction. If the scheduler and dispatcher circuit 43 determines that the LAT ADRS address signals indicate that the transfer is for this auxiliary processor 21(i), it further examines them to determine the nature of the transfer.
  • the address signals may indicate a storage location in a memory bank 24(i)(j), and if so the bus transfer serves to indicate the initiation of a remote operation.
  • the address signals may indicate one of a plurality of registers, which will be described below in connection with Fig. 3.
  • the address signals may indicate that the accompanying P DATA signals comprise an auxiliary processing instruction to be processed by the auxiliary processor 21(i). If the LAT ADRS latched address signals indicate a remote operation in connection with a storage location in a memory bank 24(i)(j), it also identifies a transaction length, that is, a number of storage locations to be involved in the operation.
  • the scheduler and dispatcher circuit 43 When the LAT ADRS latched address signals identify a register, the scheduler and dispatcher circuit 43 enables the contents of the data register 41 to be loaded into the indicated register during a write operation, or the contents of the indicated register to be transferred to the data register 41 for transmission over the processor bus 23 during a read operation. However, if the LAT ADRS latched address signals indicate that the accompanying P DATA processor data signals define an auxiliary processing instruction, the data in the data register 41 is an auxiliary processing instruction initiating a local operation. In response, the scheduler and dispatcher circuit 43 uses the contents of the data register 41 to initiate an operation for the data processor 32.
  • the scheduler and dispatcher circuit 43 uses the low-order portion of the address defined by the LAT ADRS latched address signals to identify a storage location in a memory bank 24(i)(j) to be used in connection with the load/store operation.
  • the control interface 30 further includes two token shift registers, identified as a remote strand 44 and a local strand 45, and a local strand control register set 46.
  • the remote strand 44 comprises a shift register including a series of stages, identified by reference numeral 44(i), where "i" is an index from “0” to “I.”
  • the successive stages 44(i) of the remote strand 44 control successive ones of a series of specific operations performed by the auxiliary processor 21 (i) in performing a remote operation.
  • the local strand 45 comprises a shift register including a series of stages, identified by reference numeral 45(k), where "k" is an index from "0" to "K."
  • the successive stages 45(k) of the local strand 45 control successive ones of a series of operations performed by the auxiliary processor 21(i) during a local operation.
  • the local strand control register set 46 includes a plurality of registers 46(0) through 46(K), each associated with a stage 45 (k) of the local strand 45, and each storing operational information used in controlling a particular operation initiated in connection with the associated stage 45 (k) of the local strand 45.
  • the scheduler and dispatcher circuit 43 transmits REM TOKEN signals comprising a remote token to the remote strand 44, generally to the first stage 44(0).
  • the scheduler and dispatcher circuit 43 will provide successive REM TOKEN remote token signals defining a series of remote tokens.
  • as the remote strand 44 shifts each remote token through the successive stages 44(i), it generates MEM CTRL memory control signals that are transmitted to the memory interface 31, in particular to an address/refresh and control signal generator circuit 50. That circuit receives the low-order portion of the LAT ADRS latched address signals and the MEM CTRL memory control signals from the successive stages 44(i) of the remote strand 44 and in response generates address and control signals in an appropriate sequence for transmission to the memory banks 24(i)(j), enabling them to use the address signals and controlling storage if the remote operation is a storage operation.
  • the address/refresh and control signal generator circuit 50 generates "j" ADRS address signals ("j" being an index referencing "A” or "B"), which identify a storage location in the corresponding memory bank 24(i)(j), along with "j" RAS row address strobe, "j” CAS column address strobe and "j” WE write enable signals.
  • Each memory bank 24(i)(j) also is connected to receive from a data interface circuit 51, and transmit to the data interface circuit, "j" DATA data signals representing the data to be stored in the respective memory bank 24(i)(j) during a write or store operation, or the data to be retrieved during a read or load operation.
  • each memory bank is organized as a logical array comprising a plurality of rows and columns, with each row and column being identified by a row identifier and a column identifier, respectively. Accordingly, each storage location will be uniquely identified by its row and column identifiers.
  • the address/refresh and control signal generator 50 can transmit successive "j" ADRS address signals representing, successively, the row identifier and the column identifier for the storage location, along with successive assertions of the "j" RAS and "j" CAS signals.
  • Each memory bank 24(i)(j) includes, in addition to the storage locations, a data in/out interface register 52(j), which receives and transmits the "j" DATA signals.
  • when the "j" RAS signal is asserted, the memory bank 24(i)(j) loads the contents of the storage locations in the row identified by the "j" ADRS signals into the data in/out interface register 52(j), and thereafter uses the "j" ADRS signals present when the "j" CAS signal is asserted to select data from the data in/out interface register 52(j) to transmit as the "j" DATA signals.
  • for subsequent retrievals from the same row, the address/refresh and control signal generator 50 may operate in "fast page mode," enabling a retrieval directly from the data in/out interface register 52(j) by transmitting the column identifier as the "j" ADRS signals and asserting the "j" CAS signal, enabling the memory bank 24(i)(j) to transmit the data from that column as the "j" DATA signals.
  • since the memory bank 24(i)(j) does not have to re-load the data into the data in/out interface register 52(j) while in the fast page mode, the amount of time required by the memory bank 24(i)(j) to provide the data from the requested storage location can be reduced.
  • if a memory bank 24(i)(j) has to load a row, or "page," into its data in/out interface register 52(j) because the row identifier of the retrieval differs from that of the previous retrieval (which is termed here a "miss page" condition), the retrieval will likely take longer than if the retrieval operation did not result in a miss page condition, because of the extra time required to load the data in/out interface register 52(j).
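  • The fast-page-mode decision described above can be sketched as a simple row comparison; the 10-bit row/column split below is an assumption for illustration, not a parameter taken from the patent:
```c
#include <stdbool.h>
#include <stdint.h>

#define COL_BITS 10u   /* hypothetical number of column-address bits */

static uint32_t row_id(uint32_t location) { return location >> COL_BITS; }
static uint32_t col_id(uint32_t location) { return location & ((1u << COL_BITS) - 1u); }

/* True: the open row can be reused (CAS-only, fast page mode).
   False: a "miss page" condition; a new RAS cycle must load the row. */
static bool fast_page_hit(uint32_t location, uint32_t open_row)
{
    return row_id(location) == open_row;
}
```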
  • the address/refresh and control signal generator circuit 50 also controls refreshing of the memory banks 24(i)(j).
  • the memory banks 24(i)(j) will initiate a refresh operation if they receive an asserted "j" CAS signal a selected time period before they receive an asserted "j" RAS signal, in so-called “CAS-before-RAS” refreshing.
  • the address/refresh and control signal generator 50 controls the "j" RAS and "j" CAS signals as necessary to enable the memory banks 24(i)(j) to perform refreshing.
  • the address/refresh and control signal generator 50 further generates MEM STATUS memory status signals which indicate selected status information in connection with a memory operation.
  • the timing of an operation enabled by a remote token at a particular stage 44(s) ("s" is an integer) of the remote strand 44 will be delayed, which will be indicated by the condition of the MEM STATUS signals.
  • in that case, the remote token at that particular stage 44(s), and the remote tokens at the upstream stages 44(0) through 44(s-1), are stalled in their respective stages, and will not be advanced until the stall condition is removed.
  • the scheduler and dispatcher circuit 43 also receives the MEM STATUS memory status signals and will also be stalled in issuing additional remote tokens to the remote strand 44.
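  • A rough behavioral sketch of such a token strand, with a stalled stage holding itself and everything upstream while downstream stages drain; the stage count and the stall interface are assumptions made only for the example:
```c
#include <stdbool.h>

#define STAGES 8                        /* hypothetical strand length */

typedef struct { bool token[STAGES]; } Strand;

/* Advance the strand by one tick. stall_stage < 0 means no stall;
   otherwise stages 0..stall_stage hold their tokens. */
static void strand_tick(Strand *s, int stall_stage, bool new_token)
{
    for (int i = STAGES - 1; i >= 1; i--) {
        bool dst_held = (stall_stage >= 0 && i     <= stall_stage);
        bool src_held = (stall_stage >= 0 && i - 1 <= stall_stage);
        if (dst_held)
            continue;                   /* this stage keeps its token */
        s->token[i] = src_held ? false : s->token[i - 1];
        if (!src_held)
            s->token[i - 1] = false;
    }
    /* The scheduler and dispatcher injects a new token only when the
       strand is not stalled. */
    if (new_token && stall_stage < 0)
        s->token[0] = true;
}
```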
  • the scheduler and dispatcher circuit 43 transmits LOC TOKEN signals comprising a local token to the first stage 45(0) of the local strand 45. If the local operation is for a vector of operands, the scheduler and dispatcher circuit 43 will provide LOC TOKEN local token signals defining a series of local tokens. As the local strand 45 shifts the first local token through the successive stages 45(k), the operational information, which is provided by the auxiliary processing instruction latched in the data register 41, is latched in the corresponding ones of the registers 46(k) of the local strand control register set 46.
  • the local token in each stage 45(k) of the local strand 45, along with operational information stored in the associated register 46(k), provides LOC CTRL local control signals.
  • Some of the LOC CTRL signals are coupled to the address/refresh and control signal generator 50 and if the local operation includes a load/store operation they control the memory interface 31 in a manner similar to that as described above in connection with remote operation to effect a memory access for a load/store operation.
  • the LOC CTRL signals will enable the data processor 32 to select a register in the register file 34 and enable it to participate in the load/store operation.
  • the LOC CTRL local control signals will enable the data processor 32 to select registers in the register file 34 to provide the operands, to perform the operation, and to store the results in a selected register.
  • the MEM STATUS memory status signals from the address/refresh and control signal generator 50 also may stall selected stages 45(k) of the local strand 45, in particular at least those stages which enable load/store operations and any stages upstream thereof, under the same conditions and for the same purposes as the remote strand 44. If the MEM STATUS signals enable such a stall, they also stall the scheduler and dispatcher circuit 43 from issuing additional local tokens.
  • the memory interface 31, in addition to the address/refresh and control signal generator 50, includes a data interface circuit 51, which includes an error correction code check and generator circuit (not shown).
  • the data interface 51 under control of the address/refresh and control signal generator 50, receives DATA signals representing the data to be stored from the B bus 36, generates an error correction code in connection therewith, and couples both the data and error correction code as A DATA or B DATA signals, depending on the particular memory bank 24(i)(j) in which the data is to be stored.
  • the data interface 51 under control of the address/refresh and control signal generator 50, receives the A DATA or B DATA signals from the particular storage location in the memory bank 24(i)(j) in which the data is to be stored, and uses the error correction code to check and, if necessary, correct the data.
  • the data interface receives the DATA signals representing the data to be stored from the B bus 36, merges it into the retrieved data, thereafter generates an error correction code in connection therewith, and couples both the data and error correction code as A DATA or B DATA signals, depending on the particular memory bank 24(i)(j) in which the data is to be stored.
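  • The merge just described amounts to a read-modify-write; a minimal sketch follows, using a simple parity bit as a stand-in for whatever error correction code the hardware actually employs (the word size and byte-mask interface are likewise assumptions):
```c
#include <stdint.h>

/* Placeholder check-bit generator: a single parity bit over the word. */
static uint8_t gen_check_bits(uint64_t data)
{
    uint8_t p = 0;
    for (int i = 0; i < 64; i++)
        p ^= (uint8_t)((data >> i) & 1u);
    return p;
}

/* Merge the new bytes into the (already corrected) stored word and
   regenerate the check bits over the merged result. */
static uint64_t merge_partial_store(uint64_t stored, uint64_t new_data,
                                    uint64_t byte_mask, uint8_t *check_out)
{
    uint64_t merged = (stored & ~byte_mask) | (new_data & byte_mask);
    *check_out = gen_check_bits(merged);
    return merged;
}
```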
  • if the store operation is a remote operation, the data register 41 couples the data onto A bus 35, and the control interface 30 asserts the A TO B signal, enabling driver 37 to couple the data signals on A bus 35 onto B bus 36, from which the data interface 51 receives them.
  • if the store operation is a local operation, the data is provided by the data processor 32, in particular the register file 34, which couples the data directly onto the B bus 36.
  • during a retrieval operation of a remote operation or a load operation of a local operation, the data interface 51 receives the A DATA or B DATA signals, defining the retrieved data and error correction code, from the appropriate memory bank 24(i)(j) and uses the error correction code to verify the correctness of the data. If the data interface 51 determines that the data is correct, it transmits it onto B bus 36. If the operation is a remote operation, the control interface asserts the B TO A signal to enable the gated driver 38 to couple the data on B bus 36 onto A bus 35. The data on A bus 35 is then coupled to the data register 41, which latches it for transmission onto the processor bus 23 as P DATA processor data signals.
  • if the data interface 51 determines, during either a retrieval operation of a remote operation or a load operation of a local operation, that the data is incorrect, it uses the error correction code to correct the data before transmitting it onto B bus 36. In addition, if the data interface determines that the data is incorrect, it will also notify the address/refresh and control signal generator 50, which generates MEM STATUS memory status signals enabling a stall of the local and remote strands 45 and 44 and the scheduler and dispatcher circuit 43 while the data interface 51 is performing the error correction operation.
  • the data processor 32 includes the aforementioned register file 34, and further includes a set of register identifier generator circuits 61 through 65, an arithmetic and logic unit ("ALU") and multiplier circuit 66, a context logic circuit 67 and a multiplexer 70.
  • the register file 34 includes a plurality of registers for storing data which may be used as operands for auxiliary processing instructions. Each register is identified by a register identifier comprising a plurality of bits encoded to define a register identifier space.
  • the registers in register file 34 are divided into two register banks 34(A) and 34(B) [generally identified by reference numeral 34(j)], with the high-order bit of the register identifier comprising a register bank identifier that divides the registers into the two register banks.
  • Each register bank 34(j) is associated with one memory bank 24(i)(j).
  • the association between a memory bank 24(i)(j) and a register bank is such that the value of the memory bank identifier which identifies a memory bank 24(i)(j) in the address transmitted over the processor bus 23 corresponds to the value of the register bank identifier.
  • the auxiliary processor 21(i) effectively emulates two auxiliary processors separately processing operands stored in each memory bank 24(i)(j), separately in each register bank 34(j). If an auxiliary processing instruction enables a load/store operation with respect to both register banks, and processing of operands from the two register banks 34(j), the scheduler and dispatcher circuit 43 issues tokens to local strand 45 for alternating register banks 34(j), and the load/store operation and processing proceed in an interleaved fashion with respect to the alternating register banks 34(j).
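  • One way to picture the interleaving described above: token number t of a two-bank local operation addresses element t/2 of bank "A" when t is even and of bank "B" when t is odd. The mapping below is illustrative only:
```c
#include <stdint.h>

typedef struct { unsigned bank; unsigned element; } TokenTarget;

static TokenTarget interleaved_target(unsigned t)
{
    TokenTarget tt;
    tt.bank    = t & 1u;    /* 0 = register bank "A", 1 = register bank "B" */
    tt.element = t >> 1;    /* vector element handled by this token         */
    return tt;
}
```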
  • the register file 34 has six ports through which data is transferred to or from a register in response to REG FILE R/W CTRL register file read/write control signals from the control interface 30 and the context logic 67.
  • the ports are identified respectively as an L/S DATA load/store data port, an INDIR ADRS DATA indirect address data port, an SRC 1 DATA source (1) data port, a SRC 2 DATA source (2) data port, a SRC 3 DATA source (3) data port and a DEST DATA IN destination data input port.
  • the register identifier circuits 61 through 65 generate register identifier signals for identifying registers whose contents are to be transferred through the respective ports for use as operands, in which processed data is to be stored, or which are to be used in connection with load/store operations or indirect addressing.
  • register identifier circuits 61 through 65 identify registers into which immediate operands, that is, operand values supplied in an auxiliary processing instruction, are to be loaded, and registers in register file 34 to be accessed during a remote operation.
  • a load/store register identification generator circuit 61 generates L/S REG ID load/store register identification signals, which are used to identify registers in the register file 34 into which data received from the B bus 36 through the L/S DATA port is to be loaded during a load operation, or from which data is to be obtained for transfer to the B bus 36 through the L/S DATA port during a store operation.
  • register identifier circuits 62 through 64 provide register identifications for use in connection with processing of operands.
  • a source 1 register identifier generator circuit 62, a source 2 register identifier generator circuit 63, and a destination register identification generator circuit 64 generate, respectively, SRC 1 REG ID and SRC 2 REG ID source 1 and 2 register identification signals and DEST REG ID destination register identification signals. These signals are used to identify registers from which operands are transmitted, respectively, as SRC 1 DATA source 1 data signals through the SRC 1 DATA port, SRC 2 DATA source 2 data signals through the SRC 2 DATA port, and SRC 3 DATA source 3 data signals through the SRC 3 DATA port, all to the ALU and multiplier circuit 66.
  • the ALU and multiplier circuit 66 generates result data in the form of ALU/MULT RESULT result signals, which are directed through the destination data input port DEST DATA IN.
  • the destination data is stored in a destination register, which is identified by the DEST REG ID destination register identification signals from destination register identification generator circuit 64.
  • an indirect address register identifier generator circuit 65 provides a register identification for use in identifying registers in register file 34 into which data from A bus 35 is to be loaded or from which data is to be coupled onto A bus 35.
  • the data may be used in connection with indirect addressing for the memory banks 24(i)(j) as described above.
  • the data may comprise immediate operands to be loaded into a register in register file 34 from an auxiliary processing instruction, or data to be loaded into the register or read from the register during a remote operation.
  • the circuit 65 provides register identifications for a series of registers in the register file 34, with the series of registers containing the diverse offset values for the series of locations in a memory bank 24(i)(j).
  • the indirect address register identifier generator circuit generates INDIR ADRS REG ID indirect address register identification signals which are coupled through the INDIR ADRS DATA indirect address data port.
  • Each register identifier generator circuit 61 through 65 generates the respective register identification signals using register identification values which they receive from the A bus 35, and operates in response to respective XXX REG ID register identification signals ("xxx" refers to the particular register identification generator circuit).
  • the XXX REG ID signals may enable the respective circuit 61 through 65 to iteratively generate one or a series of register identifications, depending on the particular operation to be performed.
  • the ALU and multiplier circuit 66 receives the SRC 1 DATA source 1 data signals, the SRC 2 DATA source 2 data signals, and SRC 3 DATA source 3 data signals and performs an operation in connection therewith as determined by SEL FUNC selected function signals from the multiplexer 70.
  • the multiplexer 70 selectively couples one of the ALU/MULT FUNC function signals, forming part of the LOC CTRL local control signals from the control interface 30, or ALU/MULT NOP no-operation signals as the SEL FUNC selected function signals. If the multiplexer 70 couples the ALU/MULT FUNC signals to the ALU and multiplier circuit 66, the circuit 66 performs an operation in connection with the received signals and generates resulting ALU/MULT RESULT signals, which are coupled to the destination data port on the register file, for storage in the register identified by the DEST REG ID destination register identification signals.
  • the ALU and multiplier circuit 66 generates ALU/MULT STATUS signals which indicate selected status conditions, such as whether the operation resulted in an under- or overflow, a zero result, or a carry.
  • the ALU/MULT STATUS signals are coupled to the context logic 67.
  • if the multiplexer 70 couples ALU/MULT NOP no-operation signals to the ALU and multiplier circuit 66, the circuit 66 performs no operation and generates no ALU/MULT RESULT or ALU/MULT STATUS signals.
  • the multiplexer 70 is controlled by the context logic 67. As noted above, and as will be described further below in connection with Fig. 3, when the auxiliary processor 21(i) is processing operands as elements of vectors, it may be desirable to selectively disable both load/store and data processing operations with respect to selected vector elements.
  • the context logic 67 determines the elements for which the operations are to be disabled, and controls a FUNC/NOP SEL function/no operation select signal in response.
  • the context logic 67 further controls a DEST WRT COND destination write condition signal, which aids in controlling storage of ALU/MULT RESULT signals in the destination register, and, when it determines that operations for an element are to be disabled, it disables storage for that particular result.
  • auxiliary processor 21 may process data retrievals and stores for the node processor 20, as well as auxiliary processing instructions, in an overlapped manner.
  • the scheduler and dispatcher circuit 43 handles token dispatch scheduling both between operations and within a local or remote operation (that is, between elemental operations within a local or remote operation). It will be appreciated that, for inter-operational scheduling, there are four general patterns, namely: (1) a local operation followed by a local operation; (2) a local operation followed by a remote operation; (3) a remote operation followed by a local operation; and (4) a remote operation followed by a remote operation. It will be appreciated that one purpose for scheduling is to facilitate overlapping of processing in connection with multiple operations, while at the same time limiting the complexity of the control circuitry required for the overlapping.
  • the complexity of the control circuitry is limited by limiting the number of operations that can be overlapped in connection with the remote strand 44 or the local strand 45.
  • the scheduling limits the number of operations, that is, the number of local operations for which tokens can be in the local strand 45 or the number of remote operations for which tokens can be in the remote strand 44, to two.
  • the scheduler and dispatcher circuit 43 ensures that there be a predetermined minimum spacing between the first tokens for each of the two successive operations which it dispatches into a strand 44 or 45 corresponding to one-half the number of stages required for a local operation or a remote operation.
  • the scheduler and dispatcher circuit 43 provides that there be a minimum spacing of eight from the first token of one local operation to the first token of the next local operation. Similarly, the scheduler and dispatcher circuit 43 provides that there be a minimum spacing of four from the first token of one remote operation to the first token of the next remote operation.
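  • A sketch of the dispatch-spacing rule stated above, assuming a simple tick counter; the eight- and four-tick minimums are taken from this description, everything else is illustrative:
```c
#include <stdint.h>

enum op_kind { LOCAL_OP, REMOTE_OP };

/* Earliest tick at which the first token of a new operation of the given
   kind may be dispatched, given when the previous operation of that kind
   dispatched its first token. */
static uint64_t earliest_dispatch(enum op_kind kind,
                                  uint64_t prev_first_token_tick,
                                  uint64_t now)
{
    uint64_t spacing  = (kind == LOCAL_OP) ? 8u : 4u;
    uint64_t earliest = prev_first_token_tick + spacing;
    return (now > earliest) ? now : earliest;
}
```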
  • a further purpose for scheduling is to ensure that no conflict will arise, in connection with the use of specific circuits in the auxiliary processor 21(i), from beginning the dispatch of tokens for a subsequent operation after the dispatch of all of the tokens required for a first operation. Inter-token, intra-operation scheduling generally has a similar purpose.
  • the operations performed during the successive stages are such that a new operation can normally be begun for each token in the local strand 45, for tokens successively dispatched at each tick of the aforementioned global clocking signal.
  • the ALU and multiplier circuit 66 will require a spacing of several ticks, and the scheduler and dispatcher circuit 43 will schedule the dispatch of the successive tokens within the series required for local operation accordingly.
  • the scheduler and dispatcher circuit 43 can generate successive tokens at successive ticks of the global clocking signal.
  • the scheduler and dispatcher circuit 43 after it has finished generating all tokens for such a local operation, can begin generating tokens for a subsequent local operation, subject to the minimum spacing constraint between initial tokens for the operations as described above.
  • the required inter-operation spacing will depend (1) on the sequence of load and store operations, and (2) if the first operation is a store operation, on whether the store operation is of the entire storage location: (A) if the first local operation involves a store operation of less than an entire storage location, and the second involves either a load operation or a store operation, the second operation will be delayed to accommodate the generation of addresses both (1) for the read and write portions of the initial store operation of the first local operation and (2) for the early stages of either a load operation or a store operation for the second local operation.
  • (B) if the first local operation involves a store operation of the entire storage location, and the second local operation involves either a load operation or a store operation of less than an entire storage location, the address will be generated only at the beginning of operations for each element of the first local operation, and so a small or zero delay thereafter will be required.
  • (C) if a local operation involving a load operation is followed by a local operation involving a store operation, the required spacing will also depend on whether the store operation involves an entire storage location. If the store operation does involve an entire storage location, it should be noted that, while the memory addresses will be generated at the same stages for both the load operation and the store operation, the load/store register identifier generator 61 will be used late in the load operation, but relatively early in the store operation.
  • in that case, the scheduler and dispatcher circuit 43 will provide a generally large spacing between the first local operation and the second local operation to ensure that the load/store register identifier generator 61 will not be used for the first vector element of the second local operation until the stage after the generator 61 has been used for the last vector element of the first local operation's load operation.
  • if, on the other hand, the second local operation is a store involving data for less than an entire storage location, the load/store register identifier generator 61 will be used in connection with the store operation at a stage closer to the stage in which the generator is used in connection with the load operation, and so the spacing may be reduced; the first token for the second local operation may be dispatched immediately following the last token for the first local operation.
  • if the ALU and multiplier circuit 66 will not accept a new operation at each tick of the global clock signal, the actual spacing will be the greater of the above-identified spacing to accommodate load and store operations and the spacing to accommodate the ALU and multiplier circuit 66.
  • the particular spacing enabled for other combinations of local and remote operations is determined in a generally similar manner and will not be described in detail. It will be appreciated, however, that the auxiliary processor 21(i) may initiate a remote operation, that is, the scheduler and dispatcher circuit 43 may begin generating tokens for the remote strand 44, before it has finished generating tokens for a local operation, so that the auxiliary processor 21(i) will begin processing of the remote operation before it begins processing in connection with some of the vector elements of the local operation.
  • Fig. 3 depicts the details of context logic 67.
  • the context logic 67 includes the vector mask register 104, the vector mask mode register, the vector mask buffer register 106, and the vector mask direction register 107.
  • the context logic 67 includes separate vector mask registers 104(A) and 104(B) [generally identified by reference numeral 104(j), with index "j" corresponding to "A" or "B"], since the register file 34 is divided into two register banks, each of which loads data from a memory bank 24(i)(j).
  • Each vector mask register 104(j) is essentially a bi-directional shift register having a number of stages.
  • Each vector mask register 104(j) stores a vector mask that determines, if the auxiliary processing instruction calls for processing of series of operands as vectors, whether, for each successive vector element or corresponding ones of the vector elements, the operations to be performed will be performed for particular vector elements.
  • the node processor 20 may, prior to providing an auxiliary processing instruction, enable a vector mask to be loaded into the vector mask register by initiating a remote operation identifying one or more of the vector mask registers 104(j) and providing the vector mask as P DATA processor data signals (Fig. 2A), or by enabling the contents of a register in register file 34 or the vector mask buffer register 106(j) to be copied into the vector mask register 104(j).
  • the control interface 30 will latch the P DATA processor data signals in the data register 41, couple them onto A bus 35, and will assert a LD VM PAR -"j" load vector mask parallel bank “j" signal to enable the vector mask register 104(j) to latch the signals on the A bus 35 representing the vector mask.
  • Each vector mask register 104(j) generates at its low-order stage a VM-j(0) signal and at its high-order stage a VM-j(N-1) signal (index "j" corresponding to "A" or "B"), one of which will be used to condition, for the corresponding vector element, the load/store operation if an L/S mode flag 105(B) is set, and processing by the ALU and multiplier circuit 66 of operands from the register file 34 if the ALU mode flag 105(A) is set.
  • Each vector mask register 104(j) can shift its contents in a direction determined by a ROT DIR rotation direction signal corresponding to the condition of the vector mask direction flag controlled by an auxiliary processing instruction.
  • Each vector mask register 104(j) shifts in response to a ROTATE EN rotate enable signal from the control interface 30, which asserts the signal as each successive vector element is processed so that the VM-A(0) or VM-A(N-1) signal is provided corresponding to the bit of the vector mask appropriate to the vector element being processed.
  • the VM-A(0) and VM-A(N-1) signals are coupled to a multiplexer 320 which selectively couples one of them in response to the ROT DIR signal as a SEL VM-A selected vector mask (bank "A") signal.
  • the SEL VM-A signal is coupled to one input terminal of an exclusive-OR gate 324, which under control of a VM COMP vector mask complement signal of an auxiliary processing instruction, generates a MASKED VE masked vector element signal. It will be appreciated that, if the VM COMP signal is negated, the MASKED VE signal will have the same asserted or negated condition as the SEL VM-A signal, but if the VM COMP signal is asserted the exclusive-OR gate 324 will generate the MASKED VE signal as the complement of the SEL VM-A signal.
  • the MASKED VE signal will control the conditioning of the FUNC/NOP SEL function/no-operation select signal and the DEST WRT COND destination write condition signal by the context logic 67 (Fig. 2B), as well as the generation of the 'j' WE write enable signal by the memory control circuit 50 to control storage in memory banks 24(i)(j) in connection with the corresponding vector element.
  • the circuit 66 generates conventional ALU/MULT STATUS status signals indicating selected information concerning the results of processing, such as whether an overflow or underflow occurred, whether the result was zero, whether a carry was generated, and the like.
  • the context logic 67 uses such status information to generate a status bit that is stored in the vector mask register 104(j) so that, when the contents of the register 104(j) have been fully rotated, the bit will be in the stage corresponding to the vector element for which the status information was generated. That is, if the status bit was generated during processing of operands comprising a vector element "k," the context logic 67 will enable the status bit to be stored in a stage of the vector mask register 104(j) so that, after all of the vector elements have been processed, the status bit will be in stage "k" of the vector mask register 104(j).
  • the status bit can be used to control processing of the "k"-th elements of one or more vectors in response to a subsequent auxiliary processing instruction; this may be useful in, for example, processing of exceptions indicated by the generated status information.
  • the context logic 67 includes an AND circuit 321 that receives the ALU/MULT STATUS status signals from the ALU and multiplier circuit 66 and STATUS MASK signals generated in response to an auxiliary processing instruction.
  • the AND circuit 321 generates a plurality of MASKED STATUS signals, whose asserted or negated condition corresponds to the logical AND of one of the ALU/MULT STATUS signal and an associated one of the STATUS MASK signals.
  • the MASKED STATUS signals are directed to an OR gate 322, which asserts a SEL STATUS selected status signal if any of the MASKED STATUS signals is asserted.
  • the SEL STATUS signal is coupled to the vector mask register 104(j) and provides the status bit that is loaded into the appropriate stage of the vector mask register 104(j) as described above.
  • the particular stage of the vector mask register 104(j) into which the bit is loaded is determined by a vector mask store position select circuit 323 (j) (index "j" corresponding to "A” or "B") which, under control of VECTOR LENGTH signals indicating the length of a vector, and the ROTATE EN rotate enable and ROT DIR rotate direction signals from the control interface 30, generates -"j" POS ID position identification signals to selectively direct the SEL STATUS signal for storage in a particular stage of the correspondingly-indexed vector mask register 104(j).
  • the vector mask register 104(j) stores the bit in the stage identified by the -"j" POS ID position identification signals in response to the assertion of a LD VM SER -"j" load vector mask serial bank "j" signal by the control interface 30.
  • the control interface 30 asserts the LD VM SER -"j" signal to enable the vector mask register 104(j) to store the status bit for each vector element when the SEL STATUS signal representing the status bit appropriate for the particular vector element has been generated.
  • the vector mask store position select circuit will, for a particular vector length and rotation direction, enable the vector mask register 104(j) to latch the SEL STATUS selected status signal in the same stage.
  • the particular stage that is selected will be determined only by the vector length and rotation direction, as indicated by the VECTOR LENGTH and ROT DIR signals, respectively.
  • the vector mask buffer registers 106(A) and 106(B) are used to buffer the vector mask in the correspondingly-indexed vector mask register 104(A) and 104(B).
  • the node processor 20 may load a vector mask into a vector mask register 104(j) of an auxiliary processor 21(i), enable the auxiliary processor 21(i) to buffer the vector mask to the vector mask buffer 106(j), and thereafter issue an auxiliary processing instruction to initiate processing of operands in the form of vectors using the vector mask in the vector mask register 104(j).
  • While executing the auxiliary processing instruction, the ALU and multiplier circuit 66 generates status information which is used to create a vector mask in vector mask register 104(j) as described above.
  • the node processor 20 may then enable the auxiliary processor to use the newly-created vector mask in connection with, for example, processing of exception conditions as indicated by the bits of that vector mask. Thereafter, the node processor 20 may enable the auxiliary processor to restore the original vector mask, currently in the vector mask buffer register 106(j), to the vector mask register 104(j) for subsequent processing.
  • each vector mask register 104(j) and the correspondingly-indexed vector mask buffer register 106(j) are interconnected so as to permit the contents of each to be loaded into the other.
  • the control interface 30 When enabled by the node processor 20 to buffer a vector mask in a vector mask register 104(j), the control interface 30 asserts a SAVE VMB-"j" vector mask buffer save signal (index "j" corresponding to "A” or "B") which enables the contents of the correspondingly-indexed vector mask register 104(j) to be saved in the vector mask buffer register 106(j).
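
The vector-mask behaviour described in the preceding paragraphs can be summarized in a small software model: selecting the mask bit for the current element from the low-order or high-order stage depending on rotation direction, optionally complementing it, rotating the mask as each element is processed, writing a status-derived bit back into a stage chosen only by vector length and direction, and saving or restoring the mask through the buffer register. The sketch below is illustrative only; it assumes a fixed vector length equal to the number of stages, and every name in it (vmask_t, vm_select, and so on) is invented here rather than taken from the patent.

    #include <stdint.h>
    #include <string.h>

    #define VLEN 8                        /* assumed vector length for the sketch */

    typedef struct { uint8_t stage[VLEN]; } vmask_t;   /* one bit per element */

    /* SEL VM: the bit conditioning the current element comes from the low-order
       stage for one rotation direction and the high-order stage for the other;
       VM COMP optionally complements it (the exclusive-OR gate 324).           */
    static int vm_select(const vmask_t *vm, int rot_dir, int vm_comp)
    {
        int bit = rot_dir ? vm->stage[VLEN - 1] : vm->stage[0];
        return vm_comp ? !bit : bit;
    }

    /* ROTATE EN: advance the mask one stage so the next element's bit reaches
       the selected stage.                                                      */
    static void vm_rotate(vmask_t *vm, int rot_dir)
    {
        if (rot_dir) {                                /* shift toward high end  */
            uint8_t t = vm->stage[VLEN - 1];
            memmove(&vm->stage[1], &vm->stage[0], VLEN - 1);
            vm->stage[0] = t;
        } else {                                      /* shift toward low end   */
            uint8_t t = vm->stage[0];
            memmove(&vm->stage[0], &vm->stage[1], VLEN - 1);
            vm->stage[VLEN - 1] = t;
        }
    }

    /* LD VM SER: reduce the masked status bits to one SEL STATUS bit (the AND
       circuit followed by the OR gate) and latch it into the one stage that,
       after the remaining rotations, lines up with the element just processed.
       Within this model that stage depends only on direction (and, in the real
       hardware, on the vector length), not on the element index.               */
    static void vm_store_status(vmask_t *vm, unsigned status,
                                unsigned status_mask, int rot_dir)
    {
        int sel_status = (status & status_mask) != 0;
        vm->stage[rot_dir ? 0 : VLEN - 1] = (uint8_t)sel_status;
    }

    /* SAVE VMB and restore: the buffer register simply holds a copy.           */
    static void vm_save(const vmask_t *vm, vmask_t *buf)    { *buf = *vm; }
    static void vm_restore(vmask_t *vm, const vmask_t *buf) { *vm = *buf; }

In such a model, a masked vector loop would call vm_select once per element to gate the load/store operation, the ALU operation and the destination write, call vm_store_status with the element's masked status, and call vm_rotate before moving on to the next element.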

Abstract

A computer system including a plurality of processing nodes (11) interconnected by a network (15). Each processing node comprises a network interface (22), a memory module (24), a vector processor (21) and a node processor. The vector processor (21) is connected to the memory module for performing vector data processing operations in connection with data in the memory module in response to vector instructions from the node processor. The node processor (20) is responsive to commands to (i) process data in the memory module, (ii) generate vector instructions for controlling the auxiliary processor, and (iii) control the generation of messages by the network interface. The network transfers messages generated by the network interfaces of the processing nodes among the processing nodes thereby to transfer information thereamong. A control arrangement (12, 14) generates commands to control the processing nodes in parallel.

Description

Massively Parallel Computer Including Auxiliary Vector Processor
Field Of The Invention

The invention relates generally to the field of digital computer systems, and more particularly to massively parallel computer systems.

Background Of The Invention

Computer systems have long been classified according to a taxonomy of "SISD" (for single-instruction/single-data), "SIMD" (for single-instruction/multiple-data) and "MIMD" (for multiple-instruction/multiple-data). In an SISD system, a single processor operates in response to a single instruction stream on a single data stream. However, if a program requires the same program segment to be used to operate on a number of diverse data items to produce a number of calculations, the program causes the processor to loop through that segment for each data item. In some cases, in which the program segment is short or there are only a few data elements, the time required to perform such a calculation may not be unduly long. However, for many types of such programs, SISD processors would require a very long time to perform all of the calculations required. Accordingly, SIMD processors have been developed which incorporate a large number of processing nodes all of which are controlled to operate concurrently on the same instruction stream, but with each processing node processing a separate data stream.

On the other hand, if a program requires generally independent program segments to be used on diverse data items, the segments may be processed concurrently but using separate instruction streams. For such cases, MIMD processors have been developed which have a number of processing nodes each controlled separately in response to its own instruction stream. The flexibility of separate control in an MIMD system can be advantageous in some circumstances, but problems can arise when it is necessary to synchronize operations by the processing nodes, which may occur when, for example, transfers of data are required thereamong. Since all operations of an SIMD system are controlled by a global point of control, synchronization is provided by that global point of control.

Recently, "SPMD" (for single-program/multiple-data) systems have been developed which have many of the benefits of both SIMD and MIMD systems. An SPMD processor includes a number of processing nodes, each controlled separately in response to its own instruction stream, but which may be controlled generally concurrently in response to commands which generally control portions of instruction streams to be processed. An SPMD system thus has the possibility of having a global point of control and synchronization, namely, the source of the commands to be processed, which is present in an SIMD system, with the further possibility of having local control of processing in response to each of the commands by each of the processing nodes, which is present in an MIMD system.

Summary Of The Invention

The invention provides a new and improved auxiliary processor for use in connection with a massively parallel computer system. In brief summary, a massively-parallel computer system includes a plurality of processing nodes (11) interconnected by a network (15). Each processing node comprises a network interface (22), a memory module (24), a vector processor (21) and a node processor. The vector processor (21) is connected to the memory module for performing vector data processing operations in connection with data in the memory module in response to vector instructions from the node processor.
The node processor (20) is responsive to commands to (i) process data in the memory module (ii) generate vector instructions for controlling the auxiliary processor, and (iii) control the generation of messages by the network interface. The network transfers messages generated by the network interfaces of the processing nodes among the processing nodes thereby to transfer information thereamong. A control arrangement (12, 14) generates commands to control the processing nodes in parallel. Brief Description Of The Drawings This invention is pointed out with particularity in the appended claims. The above and further advantages of this invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings, in which: Fig. 1 is a general block diagram depicting a massively parallel computer incorporating an auxiliary processor constructed in accordance with the invention; Figs. 2A and 2B together comprise a general block diagram of the auxiliary processor depicted in Fig. 1; and Fig. 3 is a detailed block diagram of the context logic circuit in the auxiliary processor as shown in Fig.2B. Detailed Description of an Illustrative Embodiment Fig. 1 depicts a general block diagram of a massively parallel digital computer system 10 in which an auxiliary processor according to the invention may be used. With reference to Fig. 1, the computer system 10 includes a plurality of processing nodes 11(0) through 11(N) (generally identified by reference numeral 11) which operate under control of one or more partition managers 12(0) through 12(M) (generally identified by reference numeral 12). Selected ones of the processing nodes ll(x) through ll(y) ("x" and "y" are integers) are assigned to a particular partition manager 12(z) ("z" is an integer), which transmits data processing commands to processing nodes ll(x) through ll(y) defining a particular partition assigned thereto. The processing nodes ll(x) through ll(y) process the data processing commands, generally in parallel, and in response generate status and synchronization information which they transmit among themselves and to the controlling partition manager 12(z). The partition manager 12(z) may use the status and synchronization information in determining the progress of the processing nodes ll(x) through ll(y) in processing the data processing commands, and in determining the timing of transmission of data processing commands to the processing nodes, as well as the selection of particular data processing commands to transmit. One embodiment of processing nodes 11 and partition managers 12 useful in one embodiment of system 10 is described in detail in the aforementioned Douglas, et al., patent applications. The system further includes one or more input/output processors 13(i) through 13(k) (generally identified by reference numeral 13) which store data and programs which may be transmitted to the processing nodes 11 and partition managers 12 under control of input/output commands from the partition managers 12. In addition, the partition managers 12 may enable the processing nodes 11 in particular partitions assigned thereto to transmit processed data to the input/output processors 13 for storage therein. Input/output processors 13 useful in one embodiment of system 10 are described in detail in the aforementioned Wells, et al., patent application. 
The system 10 further includes a plurality of communications networks, including a control network 14 and a data router 15 which permit the processing nodes 11, partition managers 12 and input/output processors 13 to communicate to transmit data, commands and status and synchronization information thereamong. The control network 14 defines the processing nodes 11 and partition managers 12 assigned to each partition. In addition, the control network 14 is used by the partition managers 12 to transmit processing and input/output commands to the processing nodes 11 of the partition and by the processing nodes 11 of each partition to transmit status and synchronization information among each other and to the partition manager 12. The control network 14 may also be used to facilitate the down-loading of program instructions by or under control of a partition manager 12(z) to the processing nodes ll(x) through ll(y) of its partition, which the processing nodes execute in the processing of the commands. A control network 14 useful in one embodiment of system 10 is described in detail in the aforementioned Douglas, et al., patent applications. The data router 15 facilitates the transfer of data among the processing nodes 11, partition managers 12 and input/output processors 13. In one embodiment, described in the aforementioned Douglas, et al., patent applications, partitioning of the system is defined with respect to the control network 14, but the processing nodes 11, partition managers and input/output processors 13 can use the data router 15 to transmit data to others in any partition. In addition, in that embodiment the partition managers 12 use the data router 15 to transmit input/output commands to the input/output processors 13, and the input/output processors 13 use the data router 15 to carry input/output status information to the partition managers 12. A data router 15 useful in one embodiment of system 10 is described in detail in the aforementioned Douglas, et al., patent applications. One embodiment of system 10 also includes a diagnostic network 16, which facilitates diagnosis of failures, establishes initial operating conditions within the system 10 and conditions the control network 14 to facilitate the establishment of partitions. The diagnostic network 16 operates under control of a diagnostic processor (not shown) which may comprise, for example, one of the partition managers 16. One embodiment of diagnostic network 16 useful in system 10 is also described in connection with the aforementioned Douglas, et al., patent applications. The system 10 operates under control of a common system clock 17, which provides SYS CLK system clocking signals to the components of the system 10. The various components use the SYS CLK signal to synchronize their operations. The processing nodes 11 are similar, and so only one processing node, in particular processing node ll(j) is shown in detail. As shown in Fig. 1, the processing node ll(j) includes a node processor 20, one or more auxiliary processors 21(0) through 21(1) [generally identified by reference numeral 21(i)], and a network interface 22, all of which are interconnected by a processor bus 23. The node processor 20 may comprise a conventional microprocessor, and one embodiment of network interface 22 is described in detail in the aforementioned Douglas, et al., patent applications. 
Also connected to each auxiliary processor 21(i) are two memory banks 24(0)(A) through 24(I)(B) [generally identified by reference numeral 24(i)(j), where "i" corresponds to the index "i" of the auxiliary processor reference numeral 21(i) and index "j" corresponds to bank identifier "A" or "B"]. The memory banks 24(i)(j) contain data and instructions for use by the node processor 20 in a plurality of addressable storage locations (not shown). The addressable storage locations of the collection of memory banks 24(i)(j) of a processing node ll(j) form an address space defined by a plurality of address bits, the bits having a location identifier portion that is headed by an auxiliary processor identifier portion and memory bank identifier. The node processor 20 may initiate the retrieval of the contents of a particular storage location in a memory bank 24(i) j) by transmitting an address over the bus 23 whose auxiliary processor identifier identifies the particular auxiliary processor 21 (i) connected to the memory bank 24(i)(j) containing the location whose contents are to be retrieved, and location identifier identifies the particular memory bank 24(i)(j) and storage location whose contents are to be retrieved. In response, the auxiliary processor 21(i) connected to the memory bank 24(i)(j) which contains the storage location identified by the address signals retrieves the contents of the storage location and transmits them to the node processor 20 over the bus 23. Similarly, the node processor 20 may enable data or instructions (both generally referred to as "data") to be loaded into a particular storage location by transmitting an address and the data over the bus 23, and the auxiliary processor 21(i) that is connected to the memory bank 24(i)(j) containing the storage location identified by the address signals enables the memory bank 24(i)(j) that is identified by the address signals to store the data in the storage location identified by the address signals. In addition, the auxiliary processors 21(1) can process operands, comprising either data provided by the node processor 20 or the contents of storage locations it retrieves from the memory banks 24(i)(j) connected thereto, in response to auxiliary processing instructions transmitted thereto by the node processor 20. To enable processing by an auxiliary processor 21 (i), the node processor 20 can transmit an auxiliary processing instruction over processor bus 23, which includes the identification of one or more auxiliary processors 21 (i) to execute the instruction, as well as the identification of operands to be processed in response to the auxiliary processing instruction. In response to the auxiliary processing instructions, the identified auxiliary processors 21(i) retrieve operands from the identified locations, perform processing operation(s) and store the resulting operand(s), representing the result of the processing operation(s), in one or more storage location(s) in memory banks 24(i)(j). In one particular embodiment, the auxiliary processors 21(i) are in the form of a "RISC," or "reduced instruction set computer," in which retrievals of operands to be processed thereby from, or storage of operands processed thereby in, a memory bank 24(i)(j), are controlled only by explicit instructions, which are termed "load/store" instructions. Load/store instructions enable operands to be transferred between particular storage locations and registers (described below in connection with Figs. 2A and 2B) in the auxiliary processor 21(i). 
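
Before continuing with the load/store instructions, the node-local addressing scheme described earlier in this passage, an auxiliary-processor identifier heading the address, followed by a memory-bank identifier and a location identifier, can be illustrated with a small decoder. The field widths and helper names below are assumptions made for the example, not values given in the patent.

    #include <stdint.h>

    /* Assumed layout of a node-local address: the auxiliary-processor
       identifier heads the address, followed by the memory-bank identifier
       ("A" or "B"), followed by the location offset within the bank.       */
    #define AP_ID_BITS    3   /* up to 8 auxiliary processors per node      */
    #define BANK_BITS     1   /* bank "A" or bank "B"                       */
    #define OFFSET_BITS  24   /* storage-location offset within a bank      */

    typedef struct {
        unsigned ap_id;   /* which auxiliary processor 21(i) owns the bank  */
        unsigned bank;    /* 0 = bank "A", 1 = bank "B"                     */
        uint32_t offset;  /* storage location within memory bank 24(i)(j)   */
    } node_addr_t;

    static node_addr_t decode_node_addr(uint32_t adrs)
    {
        node_addr_t a;
        a.offset = adrs & ((1u << OFFSET_BITS) - 1u);
        a.bank   = (adrs >> OFFSET_BITS) & ((1u << BANK_BITS) - 1u);
        a.ap_id  = (adrs >> (OFFSET_BITS + BANK_BITS)) & ((1u << AP_ID_BITS) - 1u);
        return a;
    }

    /* An auxiliary processor accepts a bus transfer only when the
       auxiliary-processor identifier field matches its own identifier.     */
    static int addr_is_for_me(uint32_t adrs, unsigned my_ap_id)
    {
        return decode_node_addr(adrs).ap_id == my_ap_id;
    }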
A "load" instruction enables operands to be transferred from one or more storage locations to the registers, and a "store" instruction enables operands to be transferred from the registers to one or more storage locations. It should be noted that the load/store instructions processed by the auxiliary processors 21 (i) control transfer of operands to be processed by the auxiliary processor 21(i) as well as operands representing the results of processing by the auxiliary processor 21(i). The node processor 20 and auxiliary processors 21(i) do not use the load/store instructions to control transfers directly between memory banks 24(i)(j) and the node processor 20. Other instructions, termed here "auxiliary data processing instructions," control processing in connection with the contents of registers and storage of the results of the processing in such registers. Each auxiliary processing instruction may include both a load/store instruction and an auxiliary data processing instruction. The node processor 20 transmits individual auxiliary processing instructions for processing by individual auxiliary processors 21(i), or by selected groups of auxiliary processors 21(i), or by all auxiliary processors 21(i) on the processing node, generally in parallel. As will be described below in connection with Fig. 2C in greater detail, each load/store auxiliary processing instruction is further accompanied by a value which represents an offset, from the base of the particular memory bank 24(i)(j), of a storage location in memory which is to be used in connection with the load/store operation. As noted above, each auxiliary data processing instruction identifies one or more registers in the auxiliary processor 21(i) whose operands are to be used in execution of the auxiliary data processing instruction. Accordingly, if, for example, operands represent matrix elements which are distributed among the auxiliary processors, the node processor 20 can, with a single auxiliary data processing instruction transmitted for execution by multiple auxiliary processors 21(i), enable the auxiliary processors 21 (i) to process the matrix elements generally in parallel, which may serve to speed up matrix processing. In addition, since such processing may be performed on all processing nodes 11 of a partition generally concurrently and in parallel, the auxiliary processors 21(i) enable operands comprising large matrices to be processed very rapidly. Each auxiliary processing instruction can enable an auxiliary processor 21 (i) to process a series of operands as a vector, performing the same operation in connection with each operand, or element, of the vector. If a operation initiated by a particular auxiliary processing instruction requires one ("monadic") operand, only one vector is required. However, if an operation requires two ("dyadic") or three ("triadic") operands, the auxiliary processor 21(i) processes corresponding elements from the required number of such vectors, performing the same operation in connection with each set of operands. If an auxiliary processing instruction enables an auxiliary processor 21 (i) to so process operands as vectors, the processing of particular sets of operands may be conditioned on the settings of particular flags of a vector mask. An auxiliary processing instruction which does not enable processing of series of operands as a vector is said to initiate a "scalar" operation, and the operands therefor are in the form of "scalar" operands. 
As will be further described in more detail below, each auxiliary processor 21 (i) may process data retrievals and stores for the node processor 20, as well as auxiliary processing instructions, in an overlapped manner. That is, node processor 20 may, for example, initiate a storage or retrieval operation with an auxiliary processor 21 (i) and transmit an auxiliary processing instruction to the auxiliary processor 21(i) before it has finished the storage or retrieval operation. In that example, the auxiliary processor 21 (i) may also begin processing the auxiliary processing instruction before it has finished the retrieval or storage operation. Similarly, the node processor 20 may transmit an auxiliary processing instruction to the auxiliary processor 21(i), and thereafter initiate one or more storage or retrieval operations. The auxiliary processor 21 (i) may, while executing the auxiliary processing instruction, also perform the storage or retrieval operations. With this background, the structure and operation of an auxiliary processor 21 (i) will be described in connection with Figs. 2A through 3. In one particular embodiment, the structure and operation of the auxiliary processors 21 are all similar. Figs. 2A and 2B depict a general block diagram of one embodiment of auxiliary processor 21(i). With reference to Figs. 2A and 2B, auxiliary processor 21(i) includes a control interface 30 (Fig. 2A), a memory interface 31 (Fig. 2A), and a data processor 32 (Fig. 2B), all interconnected by a bus system 33 (the bus system 33 is depicted on both Figs. 2A and 2B). The control interface 30 receives storage and retrieval requests (which will generally be termed "remote operations") over processor bus 23. For a retrieval operation, the control interface 30 enables the memory interface 31 to retrieve the contents of the storage location identified by an accompanying address for transfer to the processor 20. For a storage operation, the control interface 30 enables the memory interface 31 to store data accompanying the request in a storage location identified by an accompanying address. In addition, the control interface 30 receives auxiliary processing instructions (which will be generally termed "local operations"). If a auxiliary processing instruction received by the auxiliary processor 21(i) contains a load/store instruction, the control interface 30 enables the memory interface 31 and data processor 32 to cooperate to transfer data between one or more storage locations and registers in a register file 34 in the data processor 32. If the auxiliary processing instruction contains an auxiliary data processing instruction, the control interface 30 enables the data processor 32 to perform the data processing operations as required by the instruction in connection with operands in registers in the register file 34. If an auxiliary processing instruction includes both a load/store instruction and an auxiliary data processing instruction, it will enable both a load/stroe and a data processing operation to occur. As noted above, the memory interface 31 controls storage in and retrieval from the memory banks 24(i)(j) connected thereto during either a remote or local operation. In that function, the memory interface 31 receives from the control interface 30 address information, in particular a base address which identifies a storage location at which the storage or retrieval is to begin. In addition, the memory interface 31 receives from the control interface 30 other control information. 
For example, if the storage or retrieval operation is to be in connection with multiple storage locations, the control interface 30 controls the general timing of each successive storage or retrieval operation, in response to which the memory interface 31 generates control signals for enabling a memory bank 24(i)(j) to actually perform the storage or retrieval operation. In addition, if the storage or retrieval operation is to be in connection with a series of storage locations whose addresses are separated by a fixed "stride" value, the control interface 30 provides a stride value, which the memory interface 31 uses in connection with the base address to generate the series of addresses for transmission to a memory banks 24(i)(j). On the other hand, if the storage or retrieval operation is to be in connection with "indirect" addresses, in which the storage locations are at addresses which are diverse offsets from the base address, the memory interface 31 receives offset values, which are transmitted from registers in the register file 34 of the data processor 32 under control of the control interface 30, which it uses in connection with the base address to generate addresses for transmission to the memory banks 24(i)(j). As further noted above, the data processor 32 operates in connection with local operations, also under control of the control interface 30, to perform data processing operations in connection with operands stored in its register file 34. In that connection the control interface 30 provides register identification information identifying registers containing operands to be processed, as well as control information identifying the particular operation to be performed and the register into which the result is to be loaded. If the local operation is to be in connection with vectors, the control interface 30 also provides information from which the data processor 32 can identify the registers containing operands comprising the vectors, as well as the register in which each result operand is to be loaded. As in memory operations, operands comprising successive vector elements may be provided by registers having fixed strides from particular base registers and the control interface will provide the base identifications and stride values. In addition, at least some operands may come from registers selected using "indirect" register addressing, as described above in connection with the memory interface 31, and the control interface 30 identifies a base register and a register in the register file 34 which is the base of a table containing register offset values. From the base register identification and the register offset vlues in the table, data processor identifies the registers whose values are to be used as the successive operands. With reference to Figs. 2A and 2B, the bus system 33 provides data paths among the control interface 30, memory controller 31 and data processor 32. The bus system 33 includes two buses, identified as an A bus 35 and a B bus 36, as well as two gated drivers 37 and 38 which are controlled by A TO B and B TO A signals from the control interface 30. If both gated drivers 37 and 38 are disabled, which occurs if both A TO B and B TO A signals are negated, the A bus 35 and B bus 36 are isolated from each other. If, however, the control interface 30 asserts the A TO B signal, the gated driver 37 couples signals on the A bus 35 onto the B bus 36. 
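
The two patterns of memory address generation described above, a fixed stride from a base address and "indirect" addressing in which per-element offsets are taken from registers, might be sketched as follows. The function names and types are assumptions for the example.

    #include <stdint.h>
    #include <stddef.h>

    /* Strided access: each successive address is the base plus a fixed stride,
       as when successive vector elements occupy regularly spaced locations.   */
    static void gen_strided(uint32_t base, int32_t stride, size_t n, uint32_t *out)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = base + (uint32_t)(stride * (int32_t)i);
    }

    /* Indirect access: each address is the base plus a per-element offset; the
       offsets themselves come from registers in the register file, modelled
       here as a plain array.                                                  */
    static void gen_indirect(uint32_t base, const uint32_t *offsets, size_t n,
                             uint32_t *out)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = base + offsets[i];
    }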
Similarly, if the control interface asserts the B TO A signal, the gated driver 38 couples signals on the B bus 36 onto the A bus 35. With reference to Fig. 2A, the control interface 30 includes an address register 40, a data register 41 and a processor bus control circuit 42, all of which are connected to the processor bus 23. The processor bus control circuit 42 receives P CTRL processor bus control signals from the processor bus 23 controlling transfers over the processor bus 23 and when they indicate that an address is on the processor bus, initiating a transfer over the processor bus, enables the address register 40 to latch P ADRS processor address signals from the bus. The data register 41 is connected to receive P DATA processor data signals. If the control signals received by the processor bus control circuit 42 indicate that the processor bus transfer is accompanied by data, it enables the data register 41 to latch the P DATA signals, which comprise the data for the transfer. The processor bus control circuit 42 further notifies a scheduler and dispatcher circuit 43 that an address and data have been received and latched in the address and data registers 40 and 41, respectively. In response, the scheduler and dispatcher 43 examines the LAT ADRS latched address signals coupled by the address register 40 to determine whether the transfer is for the particular auxiliary processor 21(1), and if so, enables the processor bus control circuit 42 to transmit P CTRL processor bus control signals to acknowledge the bus transaction. If the scheduler and dispatcher circuit 43 determines that the LAT ADRS address signals indicate that the transfer is for this auxiliary processor 21(i), it further examines them to determine the nature of the transfer. In particular, the address signals may indicate a storage location in a memory bank 24(i)(j), and if so the bus transfer serves to indicate the initiation of a remote operation. Similarly, the address signals may indicate one of a plurality of registers, which will be described below in connection with Fig. 2C, which are located on the auxiliary processor 21(i) itself, and if so the address signals also serve to indicate the initiation of a remote operation. In addition, the P ADRS signals may indicate that the accompanying P DATA signals comprise an auxiliary processing instruction to be processed by the auxiliary processor 21(i). If the LAT ADRS latched address signals indicate a remote operation in connection with a storage location in a memory bank 24(i)(j), it also identifies a transaction length, that is, a number of storage locations to be involved in the operation. When the LAT ADRS latched address signals identify a register, the scheduler and dispatcher circuit 43 enables the contents of the data register 41 to be loaded into the indicated register during a write operation, or the contents of the indicated register to be transferred to the data register 41 for transmission over the processor bus 23 during a read operation. However, if the LAT ADRS latched address signals indicate that the accompanying P DATA processor data signals define an auxiliary processing instruction, the data in the data register 41 is an auxiliary processing instruction initiating a local operation. In response, the scheduler and dispatcher circuit 43 uses the contents of the data register 41 to initiate an operation for the data processor 32. 
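
The two-bus arrangement with gated drivers described above can be modelled in a few lines of code: when neither enable is asserted the buses are isolated, and asserting one enable couples the corresponding bus onto the other. The structure and names below are illustrative only.

    #include <stdint.h>

    typedef struct {
        uint64_t a_bus;
        uint64_t b_bus;
        int      a_to_b;   /* gated driver 37 enable */
        int      b_to_a;   /* gated driver 38 enable */
    } bus_system_t;

    /* Propagate bus values for one step.  Asserting both enables at once would
       create a loop and is not modelled here.                                  */
    static void bus_propagate(bus_system_t *bs)
    {
        if (bs->a_to_b)
            bs->b_bus = bs->a_bus;   /* driver 37 couples A onto B */
        else if (bs->b_to_a)
            bs->a_bus = bs->b_bus;   /* driver 38 couples B onto A */
        /* otherwise the two buses remain isolated                  */
    }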
In addition, if the local operation includes a load/store operation, the scheduler and dispatcher circuit 43 uses the low- order portion of the address defined by the LAT ADRS latched address signals to identify a storage location in a memory banks 24(i)(j) to be used in connection with the load/store operation. The control interface 30 further includes two token shift registers, identified as a remote strand 44 and a local strand 45, and a local strand control register set 46. The remote strand 44 comprises a shift register including a series of stages, identified by reference numeral 44(i), where "i" is an index from "0" to "I." The successive stages 44(i) of the remote strand 44 control successive ones of a series of specific operations performed by the auxiliary processor 21 (i) in performing a remote operation. Similarly, the local strand 45 comprises a shift register including a series of stages, - identified by reference numeral 45 (k), where "k" is an index from "0" to "K." The successive stages 45(k) of the local strand 45 control successive ones of a series of operations performed by the auxiliary processor 21(i) during a local operation. The local strand control register set 46 includes a plurality of registers 46(0) through 46(K), each associated with a stage 45 (k) of the local strand 45, and each storing operational information used in controlling a particular operation initiated in connection with the associated stage 45 (k) of the local strand 45. To initiate a remote operation involving a storage location in a memory bank 24(i)(j), the scheduler and dispatcher circuit 43 transmits REM TOKEN signals comprising a remote token to the remote strand 44, generally to the first stage 44(0). If the LAT ADRS latched address signals identify a transaction length greater than one word, referencing a transfer with a like number of storage locations, the scheduler and dispatcher circuit 43 will provide successive REM TOKEN remote token signals defining a series of remote tokens. As the remote strand 44 shifts each remote token through the successive stages 44(i), it generates MEM CTRL memory control signals that are transmitted to the memory interface 31, in particular, to an address/refresh and control signal generator circuit 50, which receives the low-order portion of the LAT ADRS latched address signals and the MEM CTRL memory control signals from the successive stages 44(i) of the remote strand 44 and in response generates address and control signals in an appropriate sequence for transmission to the memory banks 24(i)(j) to enable them to use the address signals and to control storage if the remote operation is a storage operation. In particular, the address/refresh and control signal generator circuit 50 generates "j" ADRS address signals ("j" being an index referencing "A" or "B"), which identify a storage location in the corresponding memory bank 24(i)(j), along with "j" RAS row address strobe, "j" CAS column address strobe and "j" WE write enable signals. Each memory bank 24(i)(j) also is connected to receive from a data interface circuit 51, and transmit to the data interface circuit, "i" DATA data signals representing, during the data to be stored in the respective memory bank 24(i)(j) during a write or store operation or the data to be retrieved during a read or load operation. 
As is conventional, the storage locations in each memory bank are organized as a logical array comprising a plurality of rows and columns, with each row and column being identified by a row identifier and a column identifier, respectively. Accordingly, each storage location will be uniquely identified by its row and column identifiers. In accessing a storage location in a memory bank 24(i)(j), the address/refresh and control signal generator 50 can transmit successive "j" ADRS address signals representing, successively, the row identifier and the column identifier for the storage location, along with successive assertions of the "j" RAS and "j" CAS signals. Each memory bank 24(i)(j) includes, in addition to the storage locations, a data in/out interface register 52(j), which receives and transmits the "j" DATA signals. During a retrieval from a memory bank 24(i)(j), in response to the "j" ADRS signals and the assertion of the "j" RAS signal, the memory bank 24(i)(j) loads the contents of the storage locations in the row identified by the "j" ADRS signals into the data in/out interface register 52(j) and thereafter uses the "j" ADRS signals present when the "j" CAS signal is asserted to select data from the data in/out interface register 52(j) to transmit as the "j" DATA signals. If subsequent retrievals from the memory bank 24(i)(j) are from storage locations in the same row, which is termed a "page," the address/refresh and control signal generator 50 may operate in "fast page mode," enabling a retrieval directly from the data in/out interface register 52(j) by transmitting the column identifier as the "j" ADRS signals and asserting the "j" CAS signal, enabling the memory bank 24(i)(j) to transmit the data from that column as the "j" DATA signals. Since the memory bank 24(i)(j) does not have to re-load the data into the data in/out interface register 52(j) while in the fast page mode, the amount of time required by the memory bank 24(i)(j) to provide the data from the requested storage location can be reduced. Otherwise stated, if, to respond to a retrieval, a memory bank 24(i)(j) has to load a row, or "page," into its data in/out interface register 52(j) because the row identifier of the retrieval differs from that of the previous retrieval (which is termed here a "miss page" condition), the retrieval will likely take longer than if the retrieval operation did not result in a miss page condition, because of the extra time required to load the data in/out interface register 52(j). The address/refresh and control signal generator circuit 50 also controls refreshing of the memory banks 24(i)(j). In one embodiment, the memory banks 24(i)(j) will initiate a refresh operation if they receive an asserted "j" CAS signal a selected time period before they receive an asserted "j" RAS signal, in so-called "CAS-before-RAS" refreshing. In that embodiment, the address/refresh and control signal generator 50 controls the "j" RAS and "j" CAS signals as necessary to enable the memory banks 24(i)(j) to perform refreshing. The address/refresh and control signal generator 50 further generates MEM STATUS memory status signals which indicate selected status information in connection with a memory operation. 
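
The fast-page-mode behaviour and the "miss page" penalty described above might be modelled as a row buffer per bank, as in the sketch below; the cycle counts are invented purely to show the relative cost of the two cases.

    #include <stdint.h>

    /* One memory bank's row buffer (the data in/out interface register): an
       access to the currently open row can be served with a CAS-only,
       fast-page-mode cycle; a different row ("miss page") first requires
       re-loading the row buffer with a full RAS/CAS cycle.                    */
    typedef struct {
        int32_t open_row;              /* -1 means no row currently loaded     */
    } bank_state_t;

    #define CYCLES_FAST_PAGE   2       /* illustrative only                    */
    #define CYCLES_MISS_PAGE   6       /* illustrative only                    */

    static int access_cycles(bank_state_t *b, uint32_t row, uint32_t col)
    {
        (void)col;                     /* the column selects within the row    */
        if (b->open_row == (int32_t)row)
            return CYCLES_FAST_PAGE;   /* CAS only: row data already latched   */
        b->open_row = (int32_t)row;    /* RAS loads the new row ("page")       */
        return CYCLES_MISS_PAGE;
    }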
In connection with certain occurrences, such as a miss page condition as described above and others as will be described below, the timing of an operation enabled by a remote token at a particular stage 44(s) ("s" is an integer) of the remote strand 44 will be delayed, which will be indicated by the condition of the MEM STATUS signals. When that occurs, the remote token at that particular stage 44(s) and the upstream stages 44(0) through 44(s-1) are stalled in their respective stages, and will not be advanced until the stall condition is removed. The scheduler and dispatcher circuit 43 also receives the MEM STATUS memory status signals and will also be stalled in issuing additional remote tokens to the remote strand 44. To initiate a local operation, including a load/store operation, the scheduler and dispatcher circuit 43 transmits LOC TOKEN signals comprising a local token to the first stage 45(0) of the local strand 45. If the local operation is for a vector of operands, the scheduler and dispatcher circuit 43 will provide LOC TOKEN local token signals defining a series of local tokens. As the local strand 45 shifts the first local token through the successive stages 45(k), the operational information, which is provided by the auxiliary processing instruction latched in the data register 41, is latched in the corresponding ones of the registers 46(k) of the local strand control register set 46. The local token in each stage 45(k) of the local strand 45, along with operational information stored in each associated register 46(k), provide LOC CTRL local control signals. Some of the LOC CTRL signals are coupled to the address/refresh and control signal generator 50 and if the local operation includes a load/store operation they control the memory interface 31 in a manner similar to that described above in connection with a remote operation to effect a memory access for a load/store operation. In addition, the LOC CTRL signals will enable the data processor 32 to select a register in the register file 34 and enable it to participate in the load/store operation. If, on the other hand, the local operation includes an auxiliary data processing operation, the LOC CTRL local control signals will enable the data processor 32 to select registers in the register file 34 to provide the operands, to perform the operation, and to store the results in a selected register. The MEM STATUS memory status signals from the address/refresh and control signal generator 50 also may stall selected stages 45(k) of the local strand 45, in particular at least those stages which enable load/store operations and any stages upstream thereof, under the same conditions and for the same purposes as the remote strand 44. If the MEM STATUS signals enable such a stall, they also stall the scheduler and dispatcher circuit 43 from issuing additional local tokens. The memory interface 31, in addition to the address/refresh and control signal generator 50, includes a data interface circuit 51, which includes an error correction code check and generator circuit (not shown). 
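
The strand and stall behaviour described above can be pictured as a shift register of token slots that holds from a given stage upstream while downstream stages keep draining. The following sketch is one plausible model under that reading; the stage count and names are assumptions.

    #define STAGES 8                   /* assumed number of strand stages       */

    /* Advance a strand by one tick.  If stall_stage >= 0, the token at that
       stage and all upstream stages (indices <= stall_stage) hold in place and
       the scheduler may not inject a new token; downstream stages keep moving,
       leaving a bubble behind the stalled stage.                               */
    static void strand_tick(int stage[STAGES], int stall_stage, int new_token)
    {
        int hold_upto = (stall_stage >= 0) ? stall_stage : -1;

        /* stages strictly downstream of the stall advance normally             */
        for (int i = STAGES - 1; i > hold_upto + 1; i--)
            stage[i] = stage[i - 1];

        if (hold_upto >= 0) {
            if (hold_upto + 1 < STAGES)
                stage[hold_upto + 1] = 0;   /* bubble: stalled token stays put  */
            /* stages 0..hold_upto are untouched; no new token is dispatched    */
        } else {
            stage[0] = new_token;           /* no stall: scheduler may dispatch */
        }
    }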
During a store operation of a remote operation or during a load/store operation in which the data to be stored is for an entire storage location in a memory bank 24(i)(j), the data interface 51, under control of the address/refresh and control signal generator 50, receives DATA signals representing the data to be stored from the B bus 36, generates an error correction code in connection therewith, and couples both the data and error correction code as A DATA or B DATA signals, depending on the particular memory bank 24(i)(j) in which the data is to be stored. If the data to be stored is less than an entire storage location in a memory bank 24(i)(j), the data interface 51, under control of the address/refresh and control signal generator 50, receives the A DATA or B DATA signals from the particular storage location in the memory bank 24(i)(j) in which the data is to be stored, and uses the error correction code to check and, if necessary, correct the data. In addition, the data interface receives the DATA signals representing the data to be stored from the B bus 36, merges it into the retrieved data, thereafter generates an error correction code in connection therewith, and couples both the data and error correction code as A DATA or B DATA signals, depending on the particular memory bank 24(i)(j) in which the data is to be stored. In either case, if the store operation is a remote operation, the data is provided by the data register 41. In particular, the data register 41 couples the data onto A bus 35, and the control interface 30 asserted the A TO B signal enabling driver 37 to couple the data signals on A bus 35 onto B bus 36, from which the data interface 51 received them. On the other hand, if the store operation is a local operation, the data is provided by the data processor 32, in particular the register file 34,' which couples the data directly onto the B bus 36. During a retrieval operation of a remote operation or during a load operation of a local operation, the data interface receives the A DATA or B DATA signals, defining the retrieved data and error correction code, from the appropriate memory bank 24(i)(j) and uses the error correction code to verify the correctness of the data. If the data interface 51 determines that the data is correct, it transmits it onto B bus 36. If the operation is a remote operation, the control interface asserts the B TO A signal to enable the gated driver 38 to couple the data on B bus 36 onto A bus 35. The data on A bus 35 is then coupled to the data register 41, which latches it for transmission onto the processor bus 23 as P DATA processor data signals. On the other hand, if the operation is a local operation, the data is transferred from B bus 36 to the register file 34 for storage in an appropriate register. If the data interface 51 determines, during either a retrieval operation of a remote operation or a load operation of a local operation, that the data is incorrect, it uses the error correction code to correct the data before transmitting it onto B bus 36. In addition, if the data interface determines that the data is incorrect, it will also notify the address/refresh and control signal generator 50, which generates MEM STATUS memory status signals enabling a stall of the local and remote strands 45 and 44 and the scheduler and dispatcher circuit 43 while the data interface 51 is performing the error correction operation. 
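
The read-modify-write flow for a partial store, with an error-code check on the read and regeneration on the write, might look like the sketch below. The check-bit function shown is a simple XOR fold used only as a placeholder; the actual circuit uses an error correction code capable of repairing errors, whose details the text does not give.

    #include <stdint.h>

    /* One storage location: a 64-bit word plus its check bits.                 */
    typedef struct {
        uint64_t data;
        uint8_t  check;
    } location_t;

    static uint8_t make_check(uint64_t d)      /* placeholder for the ECC       */
    {
        uint8_t c = 0;
        for (int i = 0; i < 8; i++) c ^= (uint8_t)(d >> (8 * i));
        return c;
    }

    /* Full-word store: generate fresh check bits and write both.               */
    static void store_full(location_t *loc, uint64_t data)
    {
        loc->data  = data;
        loc->check = make_check(data);
    }

    /* Partial store (byte_mask selects the bytes being written): read the word,
       verify it against its check bits, merge in the new bytes, regenerate the
       check bits, and write the whole location back.                           */
    static int store_partial(location_t *loc, uint64_t data, uint8_t byte_mask)
    {
        if (make_check(loc->data) != loc->check)
            return -1;                  /* real hardware would correct here     */
        uint64_t merged = loc->data;
        for (int i = 0; i < 8; i++)
            if (byte_mask & (1u << i)) {
                uint64_t m = 0xffull << (8 * i);
                merged = (merged & ~m) | (data & m);
            }
        store_full(loc, merged);
        return 0;
    }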
With reference to Fig.2B, the data processor 32 includes the aforementioned register file 34, and further includes a set of register identifier generator circuits 61 through 65, an arithmetic and logic unit ("ALU") and multiplier circuit 66, a context logic circuit 67 and a multiplexer 70. The register file 34 includes a plurality of registers for storing data which may be used as operands for auxiliary processing instructions. Each register is identified by a register identifier comprising a plurality of bits encoded to define a register identifier space. The registers in register file 34 are divided into two register banks 34(A) and 34(B) [generally identified by reference numeral 34(j)], with the high-order bit of the register identifier comprising a register bank identifier that divides the registers into the two register banks. Each register bank 34(j) is associated with one memory bank 24(i)(j). The association between a memory bank 24(i)(j) and a register bank is such that the value of the memory bank identifier which identifies a memory bank 24(i)(j) in the address transmitted over the processor bus 23 corresponds to the value of the register bank identifier. In one embodiment, the auxiliary processor 21(i) effectively emulates two auxiliary processors separately processing operands stored in each memory bank 24(i)(j), separately in each register bank 34(j). If an auxiliary processing instruction enables a load/store operation with respect to both register banks, and processing of operands from the two register banks 34(j), the scheduler and dispatcher circuit 43 issues tokens to local strand 45 for alternating register banks 34(j) and the load/store operation and processing proceeds an interleaved fashion with respect to the alternating register banks 34(j). The register file 34 has six ports through which data is transferred to or from a register in response to REG FILE R/W CTRL register file read write control signals from the control interface 30 and the context logic 67. The ports are identified respectively as an L/S DATA load/store data port, an INDIR ADRS DATA indirect address data port, an SRC 1 DATA source (1) data port, a SRC 2 DATA source (2) data port, a SRC 3 DATA source (3) data port and a DEST DATA IN destination data input port. The register identifier circuits 61 through 65 generate register identifier signals for identifying registers whose contents are to be transferred through the respective ports for use as operands, in which processed data is to be stored, or which are to be used in connection with load/store operations or indirect addressing. In addition, the register identifier circuits 61 through 65 identify registers into which immediate operands, that is, operand values supplied in an auxiliary processing instruction, are to be loaded, and registers in register file 34 to be accessed during a remote operation. In particular, a load/store register identification generator circuit 61 generates L/S REG ID load/store register identification signals, which are used to identify registers in the register file 34 into which data received from the B bus 36 through the L/S DATA port is to be loaded during a load operation, or from which data is to be obtained for transfer to the B bus 36 through the L/S DATA port during a store operation. Several register identifier circuits 62 through 64 provide register identifications for use in connection with processing of operands. 
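
Before turning to the individual register identifier generator circuits, the division of the register identifier space described at the start of this passage, with the high-order bit selecting bank "A" or bank "B" and work interleaved between the banks, can be expressed in a few helper functions. The identifier width below is an assumption for the example.

    #include <stdint.h>

    /* The register identifier space is split by its high-order bit into bank
       "A" and bank "B", mirroring the two memory banks 24(i)(A) and 24(i)(B). */
    #define REG_ID_BITS 6                           /* assumed identifier width */
    #define BANK_BIT    (1u << (REG_ID_BITS - 1))

    static unsigned reg_bank (unsigned reg_id) { return (reg_id & BANK_BIT) ? 1 : 0; }
    static unsigned reg_index(unsigned reg_id) { return reg_id & (BANK_BIT - 1); }

    /* When an instruction touches both banks, tokens are issued for the two
       banks alternately, so the per-bank work proceeds in interleaved fashion. */
    static unsigned bank_for_token(unsigned token_number) { return token_number & 1u; }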
A source 1 register identifier generator circuit 62, a source 2 register identifier generator circuit 63, and a destination register identification generator circuit 64 generate, respectively, SRC 1 REG ID and SRC 2 REG ID source 1 and 2 register identification signals and DEST REG ID destination register identification signals. These signals are used to identify registers from which operands are transmitted, respectively, as SRC 1 DATA source 1 data signals through the SRC 1 DATA port, SRC 2 DATA source 2 data signals through the SRC 2 DATA port, and SRC 3 DATA source 3 data signals through the SRC 3 DATA port, all to the ALU and multiplier circuit 66. The ALU and multiplier circuit 66 generates result data in the form of ALU/MULT RESULT result signals, which are directed through the destination data input port DEST DATA IN. The destination data is stored in a destination register, which is identified by the DEST REG ID destination register identification signals from destination register identification generator circuit 64. During a load operation, if the load/store register identification generator circuit 61 identifies the same register in register file 34 as one of the source register identifier generator circuits 62 through 64, the register file 34, in addition to loading the data in the register identified by the load/store register identification generator circuit 61, will at the same time supply the data as SRC (i) DATA signals through the particular SRC (i) DATA port whose register identifier generator circuit 62, 63 or 64 identifies the register. Finally, an indirect address register identifier generator circuit 65 provides a register identification for use in identifying registers in register file 34 into which data from A bus 35 is to be loaded or from which data is to be coupled onto A bus 35. The data may be used in connection with indirect addressing for the memory banks 24(i)(j) as described above. In addition, the data may comprise immediate operands to be loaded into a register in register file 34 from an auxiliary processing instruction, or data to be loaded into the register or read from the register during a remote operation. In indirect addressing, the circuit 65 provides register identifications for a series of registers in the register file 34, with the series of registers containing the diverse offset values for the series of locations in a memory bank 24(i)(j). The indirect address register identifier generator circuit 65 generates INDIR ADRS REG ID indirect address register identification signals which are coupled through the INDIR ADRS DATA indirect address data port. Each register identifier generator circuit 61 through 65 generates the respective register identification signals using register identification values which it receives from the A bus 35, and operates in response to respective XXX REG ID register identification signals ("xxx" refers to the particular register identification generator circuit). The XXX REG ID signals may enable the respective circuit 61 through 65 to iteratively generate one or a series of register identifications, depending on the particular operation to be performed. The ALU and multiplier circuit 66 receives the SRC 1 DATA source 1 data signals, the SRC 2 DATA source 2 data signals, and SRC 3 DATA source 3 data signals and performs an operation in connection therewith as determined by SEL FUNC selected function signals from the multiplexer 70. 
The multiplexer 70, in turn, selectively couples either the ALU/MULT FUNC function signals, forming part of the LOC CTRL local control signals from the control interface 30, or ALU/MULT NOP no-operation signals, as the SEL FUNC selected function signals. If the multiplexer 70 couples the ALU/MULT FUNC signals to the ALU and multiplier circuit 66, the circuit 66 performs an operation in connection with the received signals and generates resulting ALU/MULT RESULT signals, which are coupled to the destination data port on the register file, for storage in the register identified by the DEST REG ID destination register identification signals. In addition, the ALU and multiplier circuit 66 generates ALU/MULT STATUS signals which indicate selected status conditions, such as whether the operation resulted in an under- or overflow, a zero result, or a carry. The ALU/MULT STATUS signals are coupled to the context logic 67. On the other hand, if the multiplexer 70 couples ALU/MULT NOP no-operation signals to the ALU and multiplier circuit 66, it performs no operation and generates no ALU/MULT RESULT or ALU/MULT STATUS signals. The multiplexer 70 is controlled by the context logic 67. As noted above, and as will be described further below in connection with Fig. 3, when the auxiliary processor 21(i) is processing operands as elements of vectors, it may be desirable to selectively disable both load/store and data processing operations with respect to selected vector elements. The context logic 67 determines the elements for which the operations are to be disabled, and controls a FUNC/NOP SEL function/no-operation select signal in response. The context logic 67 further controls a DEST WRT COND destination write condition signal, which aids in controlling storage of ALU/MULT RESULT signals in the destination register, and, when it determines that operations for an element are to be disabled, it disables storage for that particular result. As noted above, the auxiliary processor 21(i) may process data retrievals and stores for the node processor 20, as well as auxiliary processing instructions, in an overlapped manner. This is accomplished by the control interface 30, in particular by the scheduler and dispatcher circuit 43, in connection with dispatching tokens. The scheduler and dispatcher circuit 43 handles token dispatch scheduling both between operations, as well as within a local or remote operation (that is, between elemental operations within a local or remote operation). It will be appreciated that, for inter-operational scheduling, there are four general patterns, namely: (1) a local operation followed by a local operation; (2) a local operation followed by a remote operation; (3) a remote operation followed by a local operation; and (4) a remote operation followed by a remote operation. It will be appreciated that one purpose for scheduling is to facilitate overlapping of processing in connection with multiple operations, while at the same time limiting the complexity of the control circuitry required for the overlapping. The complexity of the control circuitry is limited by limiting the number of operations that can be overlapped in connection with the remote strand 44 or the local strand 45. In one particular embodiment, the scheduling limits the number of operations, that is, the number of local operations for which tokens can be in the local strand 45 or the number of remote operations for which tokens can be in the remote strand 44, to two. 
To accomplish that, the scheduler and dispatcher circuit 43 ensures that there is a predetermined minimum spacing between the first tokens of two successive operations which it dispatches into a strand 44 or 45, the spacing corresponding to one-half the number of stages required for a local operation or a remote operation. Thus, for a local operation, the scheduler and dispatcher circuit 43 provides that there be a minimum spacing of eight from the first token of one local operation to the first token of the next local operation. Similarly, the scheduler and dispatcher circuit 43 provides that there be a minimum spacing of four from the first token of one remote operation to the first token of the next remote operation. A further purpose for scheduling is to ensure that no conflict will arise in connection with the use of specific circuits in the auxiliary processor 21(i) when, after the dispatch of all of the tokens required for a first operation, the dispatch of tokens for a subsequent operation begins. Inter-token, intra-operation scheduling generally has a similar purpose. Conflicts may particularly arise in connection with use of the memory interface 31 in accessing memory banks 24(i)(j) during a load, store, write or read operation, and also in connection with use of the bus system 33 in connection with transfer of information thereover at various points in a memory access. For example, for a store operation in which data for less than an entire storage location is stored, requiring first a read of the location, followed by a merge of the new data with the data from the location, followed by a write operation, it will be appreciated that certain components of the memory interface 31 will be used for both the read and write operations for each vector element, and so the intra-operation inter-token spacing will be such as to accommodate the use of the address generator for the write operation. In addition, for the ALU and multiplier circuit 66 (Fig. 2B) in one particular embodiment, the operations performed during the successive stages are such that it will normally be able to begin a new operation for each token in the local strand 45 for tokens successively dispatched on each tick of the aforementioned global clocking signal. However, for some types of complex operations, the ALU and multiplier circuit 66 will require a spacing of several ticks, and the scheduler and dispatcher circuit 43 will schedule the dispatch of the successive tokens within the series required for the local operation accordingly. It will be appreciated, therefore, that for local operations which do not include a load or a store operation, and for which the ALU and multiplier circuit 66 can initiate a new operation for tokens dispatched at each clock tick, the scheduler and dispatcher circuit 43 can generate successive tokens at successive ticks of the global clocking signal. In addition, the scheduler and dispatcher circuit 43, after it has finished generating all tokens for such a local operation, can begin generating tokens for a subsequent local operation, subject to the minimum spacing constraint between initial tokens for the operations as described above.
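The inter-operation spacing rule for tokens entering a strand can be written out as a small helper. The eight- and four-tick minimums are taken from the embodiment described above; the function and variable names are illustrative only and do not appear in the disclosure.

    # Minimum spacing, in ticks of the global clocking signal, between the first
    # tokens of two successive operations dispatched into the same strand.
    MIN_FIRST_TOKEN_SPACING = {"local": 8, "remote": 4}

    def earliest_first_token(prev_first_token_tick, strand):
        """Earliest tick at which the first token of the next operation may be
        dispatched into the given strand ('local' or 'remote')."""
        return prev_first_token_tick + MIN_FIRST_TOKEN_SPACING[strand]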
On the other hand, if the successive local operations involve load or store operations, ignoring any spacing to accommodate the ALU and multiplier circuit 66, the required inter-operation spacing will depend (1) on the sequence of load and store operations, and (2) if the first operation is a store operation, on whether the store operation is of the entire storage location:
(A) If the first local operation involves a store operation of less than an entire storage location, and the second involves either a load operation or a store operation, the second operation will be delayed to accommodate the generation of addresses (1) for both the read and write portions of the store operation of the first local operation and (2) for the early stages of either a load operation or a store operation for the second local operation.
(B) If the first local operation involves a store operation of the entire storage location, and the second local operation involves either a load operation or a store operation of less than an entire storage location, it will be appreciated that the address will be generated only at the beginning of operations for each element of the first local operation, and so a small or zero delay thereafter will be required.
(C) If a local operation involving a load operation is followed by a local operation involving a store operation, the required spacing will also depend on whether the store operation involves an entire storage location. If the store operation does involve an entire storage location, it should be noted that, while the memory addresses will be generated in the same stages for both the load operation and the store operation, the load/store register identifier generator 61 will be used late in the load operation, but relatively early in the store operation. Accordingly, the scheduler and dispatcher circuit 43 will provide a generally large spacing between the first local operation and the second local operation to ensure that the load/store register identifier generator 61 will not be used for the first vector element of the second local operation until the stage after the generator 61 has been used for the last vector element of the first local operation's load operation. On the other hand, if the second local operation is a store involving data for less than an entire storage location, the load/store register identifier generator 61 will be used in connection with the store operation at a stage which is closer to the stage in which the generator is used in connection with the load operation, and so the spacing provided by the scheduler and dispatcher circuit 43 will be substantially less.
(D) Finally, if two successive local operations both involve load operations, since the progression of operations through the successive stages will be the same for both local operations, and the various circuits of the auxiliary processor 21(i) are not used in two diverse stages, the first token for the second local operation may be dispatched immediately following the last token for the first local operation.
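The four cases can be compressed into a small decision helper. The text gives only qualitative spacings ("generally large," "substantially less," "small or zero"), so the constants below are placeholders for illustration rather than values disclosed in the embodiment; representing an operation as a (kind, full-word) pair is likewise an assumption.

    LARGE, SMALL, ZERO = 16, 4, 0   # placeholder tick counts, illustrative only

    def load_store_spacing(first_op, second_op):
        """first_op, second_op: ('load', None) or ('store', full_word_boolean)."""
        kind1, full1 = first_op
        kind2, full2 = second_op
        if kind1 == "store" and not full1:
            return LARGE                      # case (A): read-merge-write store first
        if kind1 == "store" and full1:
            return SMALL                      # case (B): small or zero delay
        if kind1 == "load" and kind2 == "store":
            return LARGE if full2 else SMALL  # case (C)
        return ZERO                           # case (D): load followed by load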
It will be appreciated that, if the computation operation required for the local operation is such that the ALU and multiplier circuit 66 will not accept a new operation at each tick of the global clock signal, the actual spacing will be the greater of the above-identified spacing to accommodate load and store operations and the spacing to accommodate the ALU and multiplier circuit 66. The particular spacings enabled for other combinations of local and remote operations are determined in a generally similar manner and will not be described in detail. It will be appreciated, however, that the auxiliary processor 21(i) may initiate a remote operation, that is, the scheduler and dispatcher circuit 43 may begin generating tokens for the remote strand 44, before it has finished generating tokens for a local operation, so that the auxiliary processor 21(i) will begin processing of the remote operation before it begins processing in connection with some of the vector elements of the prior local operation. This can occur, for example, if the local operation has no load or store operation, in which case the memory interface 31 will not be used during processing of the local operation.
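A one-line combination of these constraints, together with the condition under which a remote operation may be started early, might look as follows. This is only an illustrative restatement of the two rules just described, with hypothetical names.

    def actual_spacing(load_store_spacing, alu_mult_spacing):
        # The dispatcher honours whichever constraint is stricter.
        return max(load_store_spacing, alu_mult_spacing)

    def may_start_remote_before_local_finishes(local_op_uses_memory_interface):
        # Remote tokens may be dispatched before the local strand drains only
        # if the local operation never uses the memory interface 31.
        return not local_op_uses_memory_interface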
25 "j" referencing "A" or "B"] each of which is associated with a separate vector mask buffer register
26 106(A) and 106(B) [generally identified by reference numeral 106(j)]. As described above, the
27 register file 34 is divided into two register banks, each of which loads data from a memory bank
28 24(i)(j), and from which data is stored to a memory bank 24(i)(j), having the same index "j." Each
29 vector register 104 ( j ) and each vector mask register 106(j) is used in connection with auxiliary
30 processing instructions involving operands from the correspondingly-indexed register bank 34(j).
31 Each vector mask register 104(j) is essentially a bi-directional shift register having a number
32 of stages corresponding to a predetermined maximum number "N" of vector elements, for each
33 register bank 34(j), that the auxiliary processor 21(i) can process in response to an auxiliary
34 processing instruction. Each vector mask register 104(j) stores a vector mask that determines, if the 35. auxiliary processing instruction calls for processing series of operands as vectors, whether, for each 36 successive vector element or corresponding ones of the vector elements, the operations to be performed will be performed for particular vector elements. The node processor 21(i), prior to providing an auxiliary processing instruction, enable a vector mask to be loaded into the vector mask register by initiating a remote operation identifying one or more of the vector mask registers 104(j) and providing the vector mask as P DATA processor data signals (Fig. 2A), or by enabling the contents of a register in register file 34 or the vector mask buffer register 106(j) to be copied into the vector mask register 104(j). The control interface 30 will latch the P DATA processor data signals in the data register 41, couple them onto A bus 35, and will assert a LD VM PAR -"j" load vector mask parallel bank "j" signal to enable the vector mask register 104(j) to latch the signals on the A bus 35 representing the vector mask. Each vector mask register 104(j) generates at its low-order stage a VM-j(O) signal and at its high-order stage a VM-j(N-l) signal (index "j" corresponding to "A" or "B"), one of which will be used to condition, for the corresponding vector element, the load/store operation if an L/S mode flag 105(B) is set, and processing by the ALU and multiplier circuit 66 of operands from the register file 34 if the ALU mode flag 105(A) is set. Each vector mask register 104(j) can shift its contents in a direction determined by a ROT DIR rotation direction signal corresponding to the condition of the vector mask direction flag controlled by an auxiliary processing instruction. Each vector mask register 104(j) shifts in response to a ROTATE EN rotate enable signal from the control interface 30, which asserts the signal as each successive vector element is processed so that the VM-A(0) or VM-A(N-l) signal is provided corresponding to the bit of the vector mask appropriate to the vector element being processed. The VM-A(0) and VM-A(N-l) signals are coupled to a multiplexer 320 which selectively couples one of them in response to the ROT DIR signal as a SEL VM-A selected vector mask (bank "A") signal. The SEL VM-A signal is coupled to one input terminal of an exclusive-OR gate 324, which under control of a VM COMP vector mask complement signal of an auxiliary processing instruction, generates a MASKED VE masked vector element signal. It will be appreciated that, if the VM COMP signal is negated, the MASKED VE signal will have the same asserted or negated condition as the SEL VM-A signal, but if the VM COMP signal is asserted the exclusive-OR gate 324 will generate the MASKED VE signal as the complement of the SEL VM-A signal. In either case, the MASKED VE signal will control the conditioning of the FUNC/NOP SEL function/no-operation select signal and the DEST WRT COND destination write condition signal by the context logic 67 (Fig. 2B), as well as the generation of the 'j' WE write enable signal by the memory control circuit 50 to control storage in memory banks 24(i)(j) in connection with the corresponding vector element. 
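The selection and optional complementing of the per-element mask bit can be sketched as follows. The list-indexing model stands in for the rotating shift register tapped by multiplexer 320, so it is only an assumed functional equivalent, and the names are hypothetical.

    def masked_ve(vector_mask_bits, element_index, rot_dir_low_to_high, vm_comp):
        # Multiplexer 320: tap the low- or high-order end of the (rotating)
        # vector mask register, modeled here as direct indexing of a bit list.
        if rot_dir_low_to_high:
            sel_vm = vector_mask_bits[element_index]
        else:
            sel_vm = vector_mask_bits[-(element_index + 1)]
        # Exclusive-OR gate 324: complement the selected bit when VM COMP is asserted.
        return sel_vm ^ vm_comp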
During processing of vector elements by the ALU and multiplier circuit 66, the circuit 66 generates conventional ALU/MULT STATUS status signals indicating selected information concerning the results of processing, such as whether an overflow or underflow occurred, whether the result was zero, whether a carry was generated, and the like. The context logic 67 uses such status information to generate a status bit that is stored in the vector mask register 104(j) so that, when the contents of the register 104(j) have been fully rotated, the bit will be in the stage corresponding to the vector element for which the status information was generated. That is, if the status bit was generated during processing of operands comprising a vector element "k," the context logic 67 will enable the status bit to be stored in a stage of the vector mask register 104(j) so that, after all of the vector elements have been processed, the status bit will be in stage "k" of the vector mask register 104(j). Accordingly, the status bit can be used to control processing of the "k"-th elements of one or more vectors in response to a subsequent auxiliary processing instruction; this may be useful in, for example, processing of exceptions indicated by the generated status information. To generate the status bit for storage in the vector mask register 104(j), the context logic 67 includes an AND circuit 321 that receives the ALU/MULT STATUS status signals from the ALU and multiplier circuit 66 and STATUS MASK signals generated in response to an auxiliary processing instruction. The AND circuit 321 generates a plurality of MASKED STATUS signals, each of whose asserted or negated condition corresponds to the logical AND of one of the ALU/MULT STATUS signals and the associated one of the STATUS MASK signals. The MASKED STATUS signals are directed to an OR gate 322, which asserts a SEL STATUS selected status signal if any of the MASKED STATUS signals is asserted. The SEL STATUS signal is coupled to the vector mask register 104(j) and provides the status bit that is loaded into the appropriate stage of the vector mask register 104(j) as described above. The particular stage of the vector mask register 104(j) into which the bit is loaded is determined by a vector mask store position select circuit 323(j) (index "j" corresponding to "A" or "B") which, under control of VECTOR LENGTH signals indicating the length of a vector, and the ROTATE EN rotate enable and ROT DIR rotate direction signals from the control interface 30, generates -"j" POS ID position identification signals to selectively direct the SEL STATUS signal for storage in a particular stage of the correspondingly-indexed vector mask register 104(j). The vector mask register 104(j) stores the bit in the stage identified by the -"j" POS ID position identification signals in response to the assertion of a LD VM SER -"j" load vector mask serial bank "j" signal by the control interface 30. The control interface 30 asserts the LD VM SER -"j" signal to enable the vector mask register 104(j) to store the status bit for each vector element when the SEL STATUS signal representing the status bit appropriate for the particular vector element has been generated. It will be appreciated that the vector mask store position select circuit will, for a particular vector length and rotation direction, enable the vector mask register 104(j) to latch the SEL STATUS selected status signal in the same stage for each successive vector element.
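The generation of the SEL STATUS bit can be illustrated with a short sketch of the AND circuit 321 and OR gate 322; the bit-list representation and function name are assumptions for illustration only. The placement of the bit into the vector mask register is then governed by the vector mask store position select circuit 323(j), which is not modeled here.

    def sel_status(alu_mult_status_bits, status_mask_bits):
        # AND circuit 321: mask each status condition with the corresponding
        # STATUS MASK bit supplied by the auxiliary processing instruction.
        masked_status = [s & m for s, m in zip(alu_mult_status_bits, status_mask_bits)]
        # OR gate 322: SEL STATUS is asserted if any masked condition is asserted.
        return int(any(masked_status))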
The particular stage that is selected will be determined only by the vector length and rotation direction, as indicated by the VECTOR LENGTH and ROT DIR signals, respectively. The vector mask buffer registers 106(A) and 106(B) are used to buffer the vector mask in the correspondingly-indexed vector mask registers 104(A) and 104(B). For example, the node processor 20 may load a vector mask into a vector mask register 104(j) of an auxiliary processor 21(i), enable the auxiliary processor 21(i) to buffer the vector mask in the vector mask buffer register 106(j), and thereafter issue an auxiliary processing instruction to initiate processing of operands in the form of vectors using the vector mask in the vector mask register 104(j). While executing the auxiliary processing instruction, the ALU and multiplier circuit 66 generates status information which is used to create a vector mask in the vector mask register 104(j) as described above. The node processor 20 may then enable the auxiliary processor to use the newly-created vector mask in connection with, for example, processing of exception conditions as indicated by the bits of that vector mask. Thereafter, the node processor 20 may enable the auxiliary processor to restore the original vector mask, currently in the vector mask buffer register 106(j), to the vector mask register 104(j) for subsequent processing. To accomplish this, each vector mask register 104(j) and the correspondingly-indexed vector mask buffer register 106(j) are interconnected so as to permit the contents of each to be loaded into the other. When enabled by the node processor 20 to buffer a vector mask in a vector mask register 104(j), the control interface 30 asserts a SAVE VMB-"j" vector mask buffer save signal (index "j" corresponding to "A" or "B") which enables the contents of the correspondingly-indexed vector mask register 104(j) to be saved in the vector mask buffer register 106(j). Similarly, when enabled by the node processor 20 to restore a vector mask from a vector mask buffer register 106(j), the control interface 30 asserts a RESTORE VMB-"j" vector mask restore signal (index "j" corresponding to "A" or "B") which enables the contents of the correspondingly-indexed vector mask buffer register 106(j) to be loaded into the vector mask register 104(j). The foregoing description has been limited to a specific embodiment of this invention. It will be apparent, however, that various variations and modifications may be made to the invention, with the attainment of some or all of the advantages of the invention. It is the object of the appended claims to cover these and such other variations and modifications as come within the true spirit and scope of the invention. What is claimed as new and desired to be secured by Letters Patent of the United States is:

Claims

1. A massively-parallel computer comprising: A. a plurality of processing nodes (11), each processing node comprising: i. a network interface (22) for generating and receiving messages; ii. at least one memory module (24) for storing data; iii. a vector processor (21) connected to said memory module for performing vector data processing operations in connection with data in said memory module in response to vector instructions; and iv. a node processor (20) being responsive to commands to (i) process data in said memory module, (ii) generate vector instructions for controlling said auxiliary processor, and (iii) control the generation of messages by said network interface; B. a network (15) for transferring messages generated by said network interfaces among said processing nodes thereby to transfer information among said processing nodes; and C. a control arrangement (12, 14) for generating commands to control said processing nodes in parallel. 2. A computer as defined in claim 1 in which said control arrangement comprises: A. a control node (12) for generating commands; and B. a network (14) for transferring commands generated by said control node to said processing nodes to control operations thereof in parallel. 3. A massively-parallel computer comprising a plurality of processing nodes (11) and at least one control node (12) interconnected by a network (14, 15) for facilitating the transfer of data among the processing nodes and of commands from the control node to the processing nodes, each processing node comprising: A. a network interface (22) for transmitting data over, and receiving data and commands from, said network; B. at least one memory module (24) for storing data; C. a node processor (20) for receiving commands received by the network interface and for processing data in response thereto, said node processor generating memory access requests for facilitating the retrieval of data from or storage of data in said memory module, said node processor further controlling the transfer of data over said network by said network interface; and D. an auxiliary processor (21) connected to said memory module for: (i) in response to memory access requests from said node processor, performing a memory access operation to store data received from said node processor in said memory module, or to retrieve data from said memory module for transfer to said node processor, and (ii) in response to auxiliary processing instructions from said node processor, performing data processing operations in connection with data in said memory module. 4. A computer as defined in claim 3 in which said auxiliary processor includes: A. a memory interface (31) connected to said memory module for performing memory access operations in connection with said memory module in response to memory access control signals; B. a data processor (32) for performing data processing operations in response to data processing control signals; and C. a control interface (30) for receiving memory access requests from said node processor and for generating memory access control signals in response thereto, and auxiliary processing instructions from said node processor and for generating data processing control signals in response thereto. 5.
A computer as defined in claim 4 in which said control interface further selectively generates memory access control signals in response to receipt of auxiliary processing instructions to thereby enable said memory interface to perform a memory access operation to selectively retrieve data from said memory module for transfer to said data processor or to transfer data from said data processor to said memory module for storage. 6. A computer as defined in claim 5 in which: A. said memory module stores data in a plurality of storage locations each identified by an address; and B. said control interface, in connection with an auxiliary processing instruction, receives an address and a data processing operation identifier identifying one of a plurality of data processing operations, said control interface enabling said memory interface to perform a memory access operation to selectively transfer data between the storage location and the data processor, said control interface further enabling said data processor to perform a data processing operation as identified by said data processing operation identifier. 7. A computer as defined in claim 6 in which said control interface, in connection with an auxiliary processing instruction, further receives a load/store identifier identifying a load operation or a store operation, said control interface in response to a load/store identifier identifying a load operation enabling said memory module to retrieve data from a storage location identified by the received address for transfer to said data processor, and in response to a load/store identifier identifying a store operation enabling said memory module to store data received from said data processor in a storage location identified by the received address. 8. A computer as defined in claim 7 in which: A. said data processor includes a register file (34) including a plurality of registers each identified by a register identification and a data processing circuit (66), said load/store identifier further including a register identifier; and B. said control interface enabling said data processor to i. store data retrieved from said memory module in a register identified by said register identifier if said load/store identifier identifies a load operation, and ii. retrieve data from a register identified by said register identifier for transfer to said memory module if said load/store identifier identifies a store operation. 9. A computer as defined in claim 8 in which, in response to data processing control signals from said control circuit, said register file transfers input data representing contents of selected ones of said registers to said data processing circuit, said data processing circuit generating in response processed data representing a selected function as selected by said data processing control signals of the input data, said data processing circuit transferring the processed data to said register file for storage in a selected register. 10. A computer as defined in claim 9 in which, in response to an auxiliary processing instruction, said control circuit generates data processing control signals to enable, for each of a plurality of successive elemental operations, A. 
said register file to transfer input data items representing the contents of selected registers to said data processing circuit, and receive processed data items from said data processing circuit for storage in selected registers, the input data items provided for each elemental operation and processed data items received for each elemental operation representing vector elements of corresponding vectors; and B. said data processing circuit to, in response to said input data items from said register file, generate processed data items for transfer to the register file for storage. 11. A computer as defined in claim 10 in which said control circuit further includes a conditionalizing circuit (67) for selectively disabling storage of processed data items in said register file for selected elemental operations. 12. A computer as defined in claim 11 in which said conditionalizing circuit includes: A. a vector mask register (104) including a plurality of vector mask bits, each vector mask bit being associated with an elemental operation, and each bit having a selected condition; B. a mask bit selection circuit for selecting a vector mask bit of said vector mask register for an elemental operation; and C. a storage control circuit for controlling storage of processed data items by said register file for an elemental operation in response to the condition of the selected vector mask bit. 13. A computer as defined in claim 12 in which said conditionalizing circuit further includes a processor mode flag (105(A)) having a selected condition, the storage control circuit further operating in response to the condition of said processor mode flag, in response to the processor mode flag having one selected condition the storage control circuit controlling storage of processed data by said register file in response to the condition of the selected vector mask bit, and in response to the processor mode flag having a second selected condition the storage control circuit enabling storage of processed data items by said register file. 14. A computer as defined in claim 13 in which, in response to a load/store instruction, said control circuit generates memory access control signals to enable, for each of a plurality of successive elemental operations, said memory interface to perform a memory access operation and said register file to perform a register access operation to selectively facilitate the transfer of data between a selected storage location of said memory module and a selected register of said register file. 15. A computer as defined in claim 14 in which said conditionalizing circuit further selectively disables transfer of data by said register file and memory interface for selected elemental operations in response to the conditions of the vector mask bits, said conditionalizing circuit including a load/store mode flag (105(B)) having selected conditions for selectively controlling use of said vector mask bits to disable such transfers. 16. A computer as defined in claim 10 in which operations for each elemental operation in response to an auxiliary processing instruction proceed through a sequence of processing stages, said control circuit in each stage generating processing stage control signals for enabling said register file and said data processing circuit to perform predetermined operations in said stage, said control circuit including: A.
a token generator (43) for, in response to receipt of an auxiliary processing instruction, generating a series of data processing enabling tokens corresponding to the number of elemental operations to be performed; B. a data processing control signal generator (45 and 46) comprising a series of data processing token shift register stages corresponding to the number of processing stages, said data processing token shift register stages iteratively receiving data processing enabling tokens from said token generator and shifting them therethrough, at each stage generating processing stage control signals for enabling said register file and said data processing circuit to perform predetermined operations for the associated processing stage. 17. A computer as defined in claim 16 in which said token generator controls the initial generation of data processing enabling tokens in response to receipt by the auxiliary processor of an auxiliary processing instruction so as to have a selected spacing relationship in said data processing token shift register stages with data processing enabling tokens for a preceding auxiliary processing instruction. 18. A computer as defined in claim 17 in which: A. said token generator further generates memory access tokens in response to the receipt of memory access requests; and B. said control circuit further comprises a memory access control signal generator (44) comprising a series of memory access token shift register stages each corresponding to a stage in a memory access operation, said token shift register iteratively receiving memory access tokens from said token generator and shifting them through said memory access token shift register stages, at each stage generating memory access stage control signals for controlling said memory interface to perform a memory access. 19. A computer as defined in claim 18 in which said token generator controls the initial generation of memory access enabling tokens in response to receipt by the auxiliary processor of a memory access request so as to have a selected spacing relationship in said memory access token shift register stages with memory access tokens for a preceding memory access request. 20. A computer as defined in claim 19 in which said token generator: A. further controls the initial generation of memory access enabling tokens in response to receipt by the auxiliary processor of a memory access request so as to have a selected spacing relationship with a corresponding data processing token shift register stage of said data processing control signal generator, and B. further controls the initial generation of data processing enabling tokens in response to receipt by the auxiliary processor of a data processing enabling token so as to have a selected spacing relationship with a corresponding memory access token shift register stage of said memory access control signal generator.
PCT/US1993/007415 1992-08-07 1993-08-06 Massively parallel computer including auxiliary vector processor WO1994003860A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU48044/93A AU4804493A (en) 1992-08-07 1993-08-06 Massively parallel computer including auxiliary vector processor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US92698092A 1992-08-07 1992-08-07
US07/926,980 1992-08-07

Publications (1)

Publication Number Publication Date
WO1994003860A1 true WO1994003860A1 (en) 1994-02-17

Family

ID=25453980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1993/007415 WO1994003860A1 (en) 1992-08-07 1993-08-06 Massively parallel computer including auxiliary vector processor

Country Status (3)

Country Link
US (2) US5872987A (en)
AU (1) AU4804493A (en)
WO (1) WO1994003860A1 (en)

Families Citing this family (79)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994003860A1 (en) * 1992-08-07 1994-02-17 Thinking Machines Corporation Massively parallel computer including auxiliary vector processor
KR100584964B1 (en) * 1996-01-24 2006-05-29 선 마이크로시스템즈 인코퍼레이티드 Apparatuses for stack caching
US5956518A (en) * 1996-04-11 1999-09-21 Massachusetts Institute Of Technology Intermediate-grain reconfigurable processing device
US8225003B2 (en) * 1996-11-29 2012-07-17 Ellis Iii Frampton E Computers and microchips with a portion protected by an internal hardware firewall
US7634529B2 (en) 1996-11-29 2009-12-15 Ellis Iii Frampton E Personal and server computers having microchips with multiple processing units and internal firewalls
US6167428A (en) * 1996-11-29 2000-12-26 Ellis; Frampton E. Personal computer microprocessor firewalls for internet distributed processing
US7926097B2 (en) 1996-11-29 2011-04-12 Ellis Iii Frampton E Computer or microchip protected from the internet by internal hardware
US6725250B1 (en) * 1996-11-29 2004-04-20 Ellis, Iii Frampton E. Global network computers
US7805756B2 (en) * 1996-11-29 2010-09-28 Frampton E Ellis Microchips with inner firewalls, faraday cages, and/or photovoltaic cells
US20050180095A1 (en) 1996-11-29 2005-08-18 Ellis Frampton E. Global network computers
US8312529B2 (en) 1996-11-29 2012-11-13 Ellis Frampton E Global network computers
US7506020B2 (en) 1996-11-29 2009-03-17 Frampton E Ellis Global network computers
US7024449B1 (en) * 1996-11-29 2006-04-04 Ellis Iii Frampton E Global network computers
DE69806812T2 (en) * 1997-12-19 2003-03-13 Unilever Nv FOOD COMPOSITION CONTAINING OLIVE OIL
US6405273B1 (en) * 1998-11-13 2002-06-11 Infineon Technologies North America Corp. Data processing device with memory coupling unit
US7529907B2 (en) * 1998-12-16 2009-05-05 Mips Technologies, Inc. Method and apparatus for improved computer load and store operations
US7779236B1 (en) * 1998-12-31 2010-08-17 Stmicroelectronics, Inc. Symbolic store-load bypass
AUPQ668500A0 (en) * 2000-04-04 2000-05-04 Canon Kabushiki Kaisha Accessing items of information
US6665768B1 (en) * 2000-10-12 2003-12-16 Chipwrights Design, Inc. Table look-up operation for SIMD processors with interleaved memory systems
US6732253B1 (en) * 2000-11-13 2004-05-04 Chipwrights Design, Inc. Loop handling for single instruction multiple datapath processor architectures
US6931518B1 (en) 2000-11-28 2005-08-16 Chipwrights Design, Inc. Branching around conditional processing if states of all single instruction multiple datapaths are disabled and the computer program is non-deterministic
US6922716B2 (en) * 2001-07-13 2005-07-26 Motorola, Inc. Method and apparatus for vector processing
US7921188B2 (en) * 2001-08-16 2011-04-05 Newisys, Inc. Computer system partitioning using data transfer routing mechanism
US20100274988A1 (en) * 2002-02-04 2010-10-28 Mimar Tibet Flexible vector modes of operation for SIMD processor
TWI289789B (en) * 2002-05-24 2007-11-11 Nxp Bv A scalar/vector processor and processing system
US7155525B2 (en) * 2002-05-28 2006-12-26 Newisys, Inc. Transaction management in systems having multiple multi-processor clusters
US7103636B2 (en) * 2002-05-28 2006-09-05 Newisys, Inc. Methods and apparatus for speculative probing of a remote cluster
US7251698B2 (en) * 2002-05-28 2007-07-31 Newisys, Inc. Address space management in systems having multiple multi-processor clusters
US7281055B2 (en) * 2002-05-28 2007-10-09 Newisys, Inc. Routing mechanisms in systems having multiple multi-processor clusters
US6970985B2 (en) 2002-07-09 2005-11-29 Bluerisc Inc. Statically speculative memory accessing
US7793084B1 (en) 2002-07-22 2010-09-07 Mimar Tibet Efficient handling of vector high-level language conditional constructs in a SIMD processor
US7577755B2 (en) * 2002-11-19 2009-08-18 Newisys, Inc. Methods and apparatus for distributing system management signals
US7418517B2 (en) * 2003-01-30 2008-08-26 Newisys, Inc. Methods and apparatus for distributing system management signals
US7673118B2 (en) 2003-02-12 2010-03-02 Swarztrauber Paul N System and method for vector-parallel multiprocessor communication
US7386626B2 (en) * 2003-06-23 2008-06-10 Newisys, Inc. Bandwidth, framing and error detection in communications between multi-processor clusters of multi-cluster computer systems
US7577727B2 (en) * 2003-06-27 2009-08-18 Newisys, Inc. Dynamic multiple cluster system reconfiguration
US7159137B2 (en) * 2003-08-05 2007-01-02 Newisys, Inc. Synchronized communication between multi-processor clusters of multi-cluster computer systems
US7117419B2 (en) * 2003-08-05 2006-10-03 Newisys, Inc. Reliable communication between multi-processor clusters of multi-cluster computer systems
US7395347B2 (en) * 2003-08-05 2008-07-01 Newisys, Inc, Communication between and within multi-processor clusters of multi-cluster computer systems
US7103823B2 (en) 2003-08-05 2006-09-05 Newisys, Inc. Communication between multi-processor clusters of multi-cluster computer systems
US20050114850A1 (en) 2003-10-29 2005-05-26 Saurabh Chheda Energy-focused re-compilation of executables and hardware mechanisms based on compiler-architecture interaction and compiler-inserted control
US7996671B2 (en) * 2003-11-17 2011-08-09 Bluerisc Inc. Security of program executables and microprocessors based on compiler-architecture interaction
US8607209B2 (en) 2004-02-04 2013-12-10 Bluerisc Inc. Energy-focused compiler-assisted branch prediction
US7873812B1 (en) 2004-04-05 2011-01-18 Tibet MIMAR Method and system for efficient matrix multiplication in a SIMD processor architecture
US7370170B2 (en) * 2004-04-27 2008-05-06 Nvidia Corporation Data mask as write-training feedback flag
JP2006215611A (en) * 2005-02-01 2006-08-17 Sony Corp Arithmetic unit
US7933405B2 (en) * 2005-04-08 2011-04-26 Icera Inc. Data access and permute unit
US20070106883A1 (en) * 2005-11-07 2007-05-10 Choquette Jack H Efficient Streaming of Un-Aligned Load/Store Instructions that Save Unused Non-Aligned Data in a Scratch Register for the Next Instruction
US20070294181A1 (en) * 2006-05-22 2007-12-20 Saurabh Chheda Flexible digital rights management with secure snippets
US20080126766A1 (en) 2006-11-03 2008-05-29 Saurabh Chheda Securing microprocessors against information leakage and physical tampering
US20080154379A1 (en) * 2006-12-22 2008-06-26 Musculoskeletal Transplant Foundation Interbody fusion hybrid graft
US8125796B2 (en) 2007-11-21 2012-02-28 Frampton E. Ellis Devices with faraday cages and internal flexibility sipes
US9513905B2 (en) * 2008-03-28 2016-12-06 Intel Corporation Vector instructions to enable efficient synchronization and parallel reduction operations
US8755515B1 (en) 2008-09-29 2014-06-17 Wai Wu Parallel signal processing system and method
US9213665B2 (en) * 2008-10-28 2015-12-15 Freescale Semiconductor, Inc. Data processor for processing a decorated storage notify
US8627471B2 (en) 2008-10-28 2014-01-07 Freescale Semiconductor, Inc. Permissions checking for data processing instructions
US9672019B2 (en) * 2008-11-24 2017-06-06 Intel Corporation Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads
US10621092B2 (en) 2008-11-24 2020-04-14 Intel Corporation Merging level cache and data cache units having indicator bits related to speculative execution
US20100274972A1 (en) * 2008-11-24 2010-10-28 Boris Babayan Systems, methods, and apparatuses for parallel computing
US9189233B2 (en) 2008-11-24 2015-11-17 Intel Corporation Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads
US8793426B2 (en) * 2009-02-11 2014-07-29 Microchip Technology Incorporated Microcontroller with linear memory access in a banked memory
US8321655B2 (en) * 2009-06-13 2012-11-27 Phoenix Technologies Ltd. Execution parallelism in extensible firmware interface compliant systems
US8429735B2 (en) 2010-01-26 2013-04-23 Frampton E. Ellis Method of using one or more secure private networks to actively configure the hardware of a computer or microchip
US8589867B2 (en) 2010-06-18 2013-11-19 Microsoft Corporation Compiler-generated invocation stubs for data parallel programming model
US20110314256A1 (en) * 2010-06-18 2011-12-22 Microsoft Corporation Data Parallel Programming Model
US8688957B2 (en) 2010-12-21 2014-04-01 Intel Corporation Mechanism for conflict detection using SIMD
CN103502935B (en) * 2011-04-01 2016-10-12 英特尔公司 The friendly instruction format of vector and execution thereof
US9417855B2 (en) 2011-09-30 2016-08-16 Intel Corporation Instruction and logic to perform dynamic binary translation
CN103988173B (en) * 2011-11-25 2017-04-05 英特尔公司 For providing instruction and the logic of the conversion between mask register and general register or memorizer
KR101974483B1 (en) * 2012-12-03 2019-05-02 삼성전자주식회사 Display apparatus having pattern and method for detecting pixel position in display apparatus
US9411584B2 (en) * 2012-12-29 2016-08-09 Intel Corporation Methods, apparatus, instructions, and logic to provide vector address conflict detection functionality
US9411592B2 (en) 2012-12-29 2016-08-09 Intel Corporation Vector address conflict resolution with vector population count functionality
US9880842B2 (en) 2013-03-15 2018-01-30 Intel Corporation Using control flow data structures to direct and track instruction execution
US20140289502A1 (en) * 2013-03-19 2014-09-25 Apple Inc. Enhanced vector true/false predicate-generating instructions
US9239801B2 (en) * 2013-06-05 2016-01-19 Intel Corporation Systems and methods for preventing unauthorized stack pivoting
US9891936B2 (en) 2013-09-27 2018-02-13 Intel Corporation Method and apparatus for page-level monitoring
US9424039B2 (en) * 2014-07-09 2016-08-23 Intel Corporation Instruction for implementing vector loops of iterations having an iteration dependent condition
GB2580151B (en) * 2018-12-21 2021-02-24 Graphcore Ltd Identifying processing units in a processor
EP4002106A4 (en) * 2020-03-18 2022-11-16 NEC Corporation Information processing device and information processing method

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4048593A (en) * 1974-05-13 1977-09-13 Zillman Jack H Electrical component for providing integrated inductive-capacitive networks
JPS58220513A (en) * 1982-06-16 1983-12-22 Murata Mfg Co Ltd Electronic parts
US4727474A (en) * 1983-02-18 1988-02-23 Loral Corporation Staging memory for massively parallel processor
US4647130A (en) * 1985-07-30 1987-03-03 Amp Incorporated Mounting means for high durability drawer connector
US5226170A (en) * 1987-02-24 1993-07-06 Digital Equipment Corporation Interface between processor and special instruction processor in digital data processing system
US4786258A (en) * 1987-05-13 1988-11-22 Amp Incorporated Electrical connector with shunt
US5123095A (en) * 1989-01-17 1992-06-16 Ergo Computing, Inc. Integrated scalar and vector processors with vector addressing by the scalar processor
US5326272A (en) * 1990-01-30 1994-07-05 Medtronic, Inc. Low profile electrode connector
US5316486A (en) * 1990-05-29 1994-05-31 Kel Corporation Connector assembly for film circuitry
JPH0739190Y2 (en) * 1990-10-02 1995-09-06 古河電気工業株式会社 Rotating connector
US5218602A (en) * 1991-04-04 1993-06-08 Dsc Communications Corporation Interprocessor switching network
US5239748A (en) * 1992-07-24 1993-08-31 Micro Control Company Method of making high density connector for burn-in boards
WO1994003860A1 (en) * 1992-08-07 1994-02-17 Thinking Machines Corporation Massively parallel computer including auxiliary vector processor
US5334057A (en) * 1993-02-19 1994-08-02 Blackwell Larry R Connectors for electrical meter socket adapters

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4435765A (en) * 1980-11-21 1984-03-06 Fujitsu Limited Bank interleaved vector processor having a fixed relationship between start timing signals
US5006978A (en) * 1981-04-01 1991-04-09 Teradata Corporation Relational database system having a network for transmitting colliding packets and a plurality of processors each storing a disjoint portion of database
US5212773A (en) * 1983-05-31 1993-05-18 Thinking Machines Corporation Wormhole communications arrangement for massively parallel processor
US5230079A (en) * 1986-09-18 1993-07-20 Digital Equipment Corporation Massively parallel array processing system with processors selectively accessing memory module locations using address in microword or in address register
US5010477A (en) * 1986-10-17 1991-04-23 Hitachi, Ltd. Method and apparatus for transferring vector data between parallel processing system with registers & logic for inter-processor data communication independents of processing operations
US4891751A (en) * 1987-03-27 1990-01-02 Floating Point Systems, Inc. Massively parallel vector processing computer
US5008882A (en) * 1987-08-17 1991-04-16 California Institute Of Technology Method and apparatus for eliminating unsuccessful tries in a search tree
US5239629A (en) * 1989-12-29 1993-08-24 Supercomputer Systems Limited Partnership Dedicated centralized signaling mechanism for selectively signaling devices in a multiprocessor system
US5247613A (en) * 1990-05-08 1993-09-21 Thinking Machines Corporation Massively parallel processor including transpose arrangement for serially transmitting bits of data words stored in parallel
US5247694A (en) * 1990-06-14 1993-09-21 Thinking Machines Corporation System and method for generating communications arrangements for routing data in a massively parallel processing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TSEUNG et al., "Guaranteed, Reliable, Secure, Broadcast Networks", IEEE, 05/1990, pages 576-583. *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0734139A2 (en) * 1995-03-22 1996-09-25 Nec Corporation A data transfer device with cluster control
EP0734139A3 (en) * 1995-03-22 2001-03-14 Nec Corporation A data transfer device with cluster control
US20220197993A1 (en) * 2022-03-11 2022-06-23 Intel Corporation Compartment isolation for load store forwarding

Also Published As

Publication number Publication date
AU4804493A (en) 1994-03-03
US5872987A (en) 1999-02-16
US6219775B1 (en) 2001-04-17

Similar Documents

Publication Publication Date Title
WO1994003860A1 (en) Massively parallel computer including auxiliary vector processor
US5056000A (en) Synchronized parallel processing with shared memory
Kuehn et al. The Horizon supercomputing system: architecture and software
CA1176757A (en) Data processing system for parallel processings
US6581152B2 (en) Methods and apparatus for instruction addressing in indirect VLIW processors
US5758176A (en) Method and system for providing a single-instruction, multiple-data execution unit for performing single-instruction, multiple-data operations within a superscalar data processing system
JP2647315B2 (en) Arrays that dynamically process in multiple modes in parallel
US5513366A (en) Method and system for dynamically reconfiguring a register file in a vector processor
EP0623875B1 (en) Multi-processor computer system having process-independent communication register addressing
US6088783A (en) DPS having a plurality of like processors controlled in parallel by an instruction word, and a control processor also controlled by the instruction word
US5293500A (en) Parallel processing method and apparatus
US5822606A (en) DSP having a plurality of like processors controlled in parallel by an instruction word, and a control processor also controlled by the instruction word
JP2519226B2 (en) Processor
US5423009A (en) Dynamic sizing bus controller that allows unrestricted byte enable patterns
US3573851A (en) Memory buffer for vector streaming
US5418970A (en) Parallel processing system with processor array with processing elements addressing associated memories using host supplied address value and base register content
US3943494A (en) Distributed execution processor
US5689677A (en) Circuit for enhancing performance of a computer for personal use
US5165038A (en) Global registers for a multiprocessor system
US5960209A (en) Scaleable digital signal processor with parallel architecture
US4812972A (en) Microcode computer having dispatch and main control stores for storing the first and the remaining microinstructions of machine instructions
EP0295646B1 (en) Arithmetic operation processing apparatus of the parallel processing type and compiler which is used in this apparatus
JPH04336378A (en) Information processor
US6327648B1 (en) Multiprocessor system for digital signal processing
Vick et al. Adptable Architectures for Supersystems

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: CA

122 Ep: pct application non-entry in european phase