WO1997045795A1 - Parallel processor with redundancy of processor pairs - Google Patents

Parallel processor with redundancy of processor pairs

Publication number
WO1997045795A1
WO1997045795A1 PCT/IT1997/000121 IT9700121W WO9745795A1 WO 1997045795 A1 WO1997045795 A1 WO 1997045795A1 IT 9700121 W IT9700121 W IT 9700121W WO 9745795 A1 WO9745795 A1 WO 9745795A1
Authority
WO
WIPO (PCT)
Prior art keywords
bus
processor
control
pair
migration
Prior art date
Application number
PCT/IT1997/000121
Other languages
French (fr)
Inventor
Antonio Esposito
Rosario Esposito
Original Assignee
Antonio Esposito
Rosario Esposito
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Antonio Esposito, Rosario Esposito
Priority to US09/194,459 (patent US6363453B1)
Priority to AU30471/97A (patent AU714681B2)
Priority to EP97925270A (patent EP0901659B1)
Priority to DE69701802T (patent DE69701802T2)
Priority to CA002255634A (patent CA2255634C)
Priority to JP09541974A (patent JP2000511309A)
Publication of WO1997045795A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/80: Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007: Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors; single instruction multiple data [SIMD] multiprocessors
    • G06F15/8023: Two dimensional arrays, e.g. mesh, torus
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16: Error detection or correction of the data by redundancy in hardware
    • G06F11/20: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16: Error detection or correction of the data by redundancy in hardware
    • G06F11/20: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements; where processing functionality is redundant
    • G06F11/2038: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements; where processing functionality is redundant; with a single idle spare processing component

Definitions

  • This invention relates to a general-purpose electronic numeric parallel computer with multiple processors, MIMD (Multiple Instruction stream, Multiple Data stream) in Flynn's classification model, oriented toward latency reduction, and also relates to its component processors. Replication of regularly interconnected and cooperating processing elements can improve the performance, reliability and cost of computers.
  • MIMD: Multiple Instruction stream, Multiple Data stream
  • MIMD with multiple processors is also called MULTI.
  • A MULTI consists of a collection of processors that, through an interconnection structure, either share a global memory or only communicate without memory sharing.
  • The former are also called multiprocessors, the latter multicomputers.
  • The interconnection structures used determine the topology, the node or connection degree, and several performance characteristics of a MULTI. Any direct connection among nodes also requires a node interface. If the node degree increases as the number of nodes grows, the interconnection cost rapidly dominates the machine cost. With current technologies the node degree must necessarily be held low, even though this increases the probability of congestion and conflicts, makes the communication latency inconstant, and makes performance depend on the spatial-temporal distribution of the traffic and on the application.
  • MULTI usually employ normal microprocessors available on the market and also used in SISD machines, or they employ dedicated processors with special mechanisms for fast interrupt handling and fast context switching, with several register banks or windows, or they integrate communication/routing message interfaces and units.
  • Processors are equipped with full and autonomous fetch/execution capability and are configured as memory bus masters that, once activated, continuously fetch and execute instructions; normally they do not allow access to their internal registers from outside, except for debug purposes.
  • Computing nodes in multicomputers are usually multiprocessors with an application processor and one or more communication and/or switching/routing processors, to overlap communication time with processing time, even though that increases the cost of parallelism.
  • The aim of the invention is to find an optimal combination of processor replication and interconnection, as well as modalities of process execution and cooperation, and to devise the appropriate structural and functional processor modifications, so as to realize a parallel processor or MULTI without said drawbacks, having an optimized, high-performance interconnection structure that allows efficient communication and synchronization among parallel processes and readily reduces the execution and completion time (latency) of a single task.
  • The technical problem posed is large and hard because of the high number of possible choices at the physical and logical levels, concerning several aspects of parallelism that have been investigated for a long time but remain difficult to understand and to resolve.
  • The solution found, contained in Claim 1, consists in directly pairing processors of separate memory buses, so that two tightly coupled processors can reciprocally synchronize themselves and share their internal register files, allowing easy communication and synchronization between the two adjacent parallel processes of the pair; and in adopting process migration among redundantly replicated processors on the same memory bus, so that each process can communicate and synchronize with several adjacent parallel processes.
  • The pairing is accomplished through mutual extension of internal buses from one processor to the other's functional units.
  • So the single processor also becomes a pair communication unit, normally connected to memory and peripherals, but mainly connected to another processor. Several processors are connected on the same memory bus, to access the same instructions and data in the shared memory equally rather than concurrently.
  • Each memory bus is managed as a single-master bus, wherein processors cooperate in the execution of a single sequential migrating process.
  • Processors also share a process migration structure that allows process control and the "context" contained within the state registers to be transferred from one processor to another on the bus. Thus run-time process migration among processors is achieved easily, preserving the identity and continuity of each process.
  • Processors are modified to eliminate concurrent access conflicts to the shared memory. They are formed to be, on the memory bus, either master-active like a traditional processor, or slave-inactive like a peripheral that performs no processing activity but allows its internal registers to be accessed and loaded from outside.
  • A slave processor remains inactive for an indefinite time, waiting to receive control and to resume processing activity starting from the received context.
  • Processors of the same bus are individually paired with a processor belonging to a separate memory bus, so as to form pairs between distinct memory buses.
  • The resulting processor architecture offers new instructions in two categories
  • The parallelism arises from the plurality of processes that run simultaneously on as many memory buses, and migrate on their own bus among paired processors to communicate, interact and synchronize.
  • A multicomputer/multiprocessor formed in accordance with the invention has many advantages. Congestion and access conflicts on global shared resources, which become "bottlenecks" as parallelism increases, are eliminated entirely.
  • Parallel processes communicate through high-performance local dedicated buses that require no interface controllers and no input/output operations.
  • The communication among processors is based on local register sharing, achieved normally and efficiently in hardware, and easily controlled by special pair communication and process migration instructions.
  • The dual access to the processor's registers by both units of a pair allows variable-time interaction among adjacent parallel processes, and also allows the synchronization points to be programmed.
  • Processors have direct access to the sequence control of the adjacent processors, and this allows the programmer to control the mutual progress and timing relations among all adjacent processes in parallel.
  • Communication time between adjacent processes is mainly influenced by the process migration operation, which requires a definite, constant time. Therefore, owing also to the lack of conflicts and congestion, communication latency among adjacent processes is constant and can be on average lower than that in a multiprocessor. Additional devices to mask communication latency, or to overlap communication and computation times, are no longer needed. Synchronization is possible without generating global traffic and even without explicit communication. It is possible to program
  • The connection degree is given by the number of processors per bus, which is limited only by the physical parameters that constrain the bus and processor dimensions; bus bandwidth is no longer the main obstacle to growth in the number of processors attachable to it.
  • The degree of parallelism matches the number of memory buses, and it can be freely increased, with a proportional increase in total power, independently of the power of the single processor.
  • The reducible dimensions of the pairing connections and the opportunities offered by microelectronic (VLSI) technology make it possible to design and build a single logical biprocessor unit, whose integration leads to further advantages in terms of modularity, resource sharing and different part numbers.
  • The invention gathers nearly all the advantages, but not the disadvantages, of current MULTI in both categories, with low, medium and high degrees of parallelism.
  • The process migration realized constitutes a "context/process switching" wherein several processors share the single process that controls the switching. There are no formatted data packets, nor is there a maximum time interval within which a processor will surely receive control. No computer has ever adopted a processor relay executing a single sequential process.
  • The expensive functional inefficiency due to the redundancy of inactive processors is justifiable only by the gain obtained through parallelism.
  • The processors are neither standard nor all simultaneously active masters, and they do not compete for resources or engage in conflicts in a casual and asynchronous way. Migrations take place in an orderly manner under software control.
  • Figure 1 shows a partial block diagram of a computer realized according to the invention, in a regular "matrix" topology, showing memory modules M, memory buses C-BUS, processors SPU, pair interconnection structures P-P, migration structures A-S, and biprocessors DPU.
  • Figure 2 shows the smallest configuration of a multicomputer realized according to the invention, capable of executing in parallel only two processes, which cannot migrate.
  • Figure 3 shows a regular "linear chain" topology, connectable at the ends as a ring, in which dual-port memories or communication memories (2-fifo) DPM are also shown, connectable like biprocessors between pairs of buses C-BUS.
  • Figures 4, 5 and 6 show other processing nets R-P with regular topology, of multicomputers realized according to the invention, wherein the distinction between the structures A-S and P-P and the representation of the other processing units (M, IU, DPM) have been left out; these are assumed to be connected, on each bus C-BUS, according to the criterion already described and depicted in Figures 1, 2 and 3.
  • Figure 7 shows the main external connections, shared (A-S, C-BUS) and private (P-P), of a pair of processors SPU forming a biprocessor DPU.
  • Figure 8 shows the simplified block diagram of the main functional units of both processors SPU of a pair, realized and connected in accordance with the invention.
  • Figure 9 shows a block diagram of the migration structure A-S of a single bus C-BUS, comprising processor selection means P-SEL, a migration bus MIG-B, and interconnection structure interfaces S and A distributed in the pair processors SPU.
  • Figure 10 shows the address decoders ADEC within the selection means P-SEL of the biprocessors DPU on one same bus C-BUS.
  • Figure 11 shows in greater detail the internal connections of the interfaces A and S of the migration structure A-S, and the presence of arbitrating means A-R between the access control ports MC, SC to the processor register file REG-F.
  • Figure 12 shows optional circuit switching means M-X interposed on some internal pair buses, in accordance with the invention.
  • Figure 13 shows said switching means M-X, composed of circuit switching elements IXI of the 2x2 type, arranged according to the invention, and details of the data connections between registers REG-F and execution units EX-U comprising the arithmetic and logic unit ALU.
  • Figure 14 shows the connection arrangement among the functional units of the two SPU processors of a pair when said switching means M-X are "cross switched".
  • Figure 15 shows a diagram of the command and synchronization circuit XAND of the switching blocks DQ composing said switching means M-X.
  • Figure 16 shows the biprocessor DPU containing only one timing circuitry TGEN-U for both processors SPU of the pair.
  • The main memory M is distributed among the buses C-BUS, each with its separate address space, giving rise to multiple private address spaces.
  • The memory M, possibly modular and hierarchical (caching) L1/2, is shared by the attached processors SPU and locally accessible in a single shared address space.
  • The whole interconnection structure is divided into several different structures: local shared structures, A-S and C-BUS, allowing process migration; and the private structure P-P, allowing communication and synchronization of parallel processes.
  • Each processor SPU is connected to only one memory bus C-BUS, to one migration structure A-S, and to one pair interconnection P-P. Input/output controllers IU can be connected to each bus C-BUS and/or directly to each memory module M, in the usual ways.
  • The two processors SPU of a pair DPU are almost independent of each other, and when one is inactive-slave, the other can continue processing.
  • Two parallel processes are "adjacent" if they can interact directly within a pair through a pair interconnection P-P.
  • Any two adjacent processes must migrate on their bus C-BUS to the processor SPU that puts them in communication.
  • They can communicate and synchronize by using said pair communication instructions.
  • The network of all processors SPU interconnected by memory buses C-BUS, by migration structures A-S and by pair interconnection structures P-P forms a single processing organ R-P, or parallel processor, able to process simultaneously an independent instruction and data stream on each memory bus C-BUS, and capable of synchronizing the parallel processes by executing dedicated instructions suitably programmed within the processes themselves.
  • DECO-U and connected to said bus unit BU by data buses DATA-MB, and some also by address buses ADDR, for executing as usual at least the arithmetic and logic operations, the memory data loading and storing operations, the address generation and branch operations, or also the floating-point ones, etc., and sending the computed addresses and data to the registers REG-F or to the bus unit BU, and the conditions/exceptions to the control unit DECO-U,
  • a decode and control unit DECO-U requesting/receiving via an input instruction bus IST-B a single sequential instruction stream, decoding and controlling the instruction executions, sending the immediate data to the executive units EX-U, controlling the data flow to/from the register bank REG-F, and also coordinating the other functional units
  • The processor SPU can also comprise registers that do not belong to the state, optional buffers, data and/or instruction caches, and optional prefetch and branch prediction units and memory management units (MMU, TLB), interconnected and operating in the usual ways.
  • The register bank REG-F usually has two output data ports and one input data port, but it can also have a greater or smaller number of data ports.
  • The bank's Program Counter register can also be connected directly to the prefetch unit.
  • The bus unit BU comprises at least the bus transfer control unit BC and the external bus C-BUS drivers. Usually it also includes other units, such as bus C-BUS arbitrating units, an interrupt reception/control unit, external/internal cache control units, etc.
  • The bus unit BU is designed to be either master or slave of the memory bus C-BUS. It includes means for interfacing the migration structure A-S to the processor's internal registers REG-F and control unit. It is connected internally by a selection/control bus LDR-B to the nearby register bank REG-F, for loading the context registers REG-F from outside, and it is connected to the control unit DECO-U by a further control transferring bus R/R-B for signaling the release/resumption of processing activity.
  • The bus unit BU configures itself as master or as slave according to the active or inactive state of its control unit DECO-U. In slave mode, besides allowing external access to and loading of the internal registers REG-F, upon receiving a control migration criterion from the migration structure A-S it wakes up the decode and control unit via the control transferring bus R/R-B.
  • The control unit DECO-U suspends any host processor activity and sets the connected bus unit BU in slave mode upon receiving the special slave reset signal RESET-P from the reset receiving unit RST.
  • The decode unit DECO-U does not process instructions, but allows register access from the outside bus C-BUS and from the pair unit.
  • The inactive control unit DECO-U can restart only if it receives the wake-up criterion from the control transferring bus R/R-B, or the special master reset signal RESET-A from the reset receiving unit RST.
  • It assumes the current content of the registers REG-F as the transferred process context, and resumes processing starting from that state.
  • It initializes the registers REG-F and the processor's other functional units and resumes processing from a defined state.
  • The decode and control unit DECO-U of each pair processor SPU is directly and reciprocally connected to the other decode/control pair unit by several control/synchronism signals SINC-B.
  • The decode/control unit DECO-U is also connected by a further selection/control bus XREG-B to the register bank REG-F of the pair unit, in such a way that it can access them concurrently with that unit, and can control the transfers to/from its own executive units EX-U and its own registers REG-F.
  • The decode and control unit DECO-U is also designed to remain inactive without processing instructions, signalling its state with at least one signal IDLE to any other functional units of the processor, and holding them suspended.
  • The register bank REG-F is a dual-access one, having a further and dual data SD
  • The processing activity inside the active-master processor SPU takes place normally.
  • The instructions fetched from program memory M on the external bus C-BUS flow inside the processor through the interface unit BU and any caches and buffers until they arrive, via an instruction bus IST-B, at the input of the decode unit DECO-U. Thanks to said exchanged synchronism signals SINC-B, their execution can be synchronized in unison with the instruction stream in the pair unit.
  • The external data stream flows across the bus unit BU and via the data multi-bus DATA-MB to/from the executive units EX-U and the register banks REG-F of both pair processors.
  • The executive units EX-U can communicate via said data multi-bus DATA-MB with both register banks REG-F.
  • The control unit DECO-U controls only the executive units EX-U located within the same host processor SPU, and manages as the executing process context the registers REG-F connected with its own register control bus RE
  • The most suitable way to transfer a process still consists in using the shared bus C-BUS and connecting to it processor selection means P-SEL, realized in the usual way that allows a normal CPU to select memory components or peripherals.
  • The selection means P-SEL, shared with the bus C-BUS, reduce the broadcast communication to a point-to-point dialogue between the active unit and the selected one only. By driving the address and control buses of the external shared bus C-BUS, the master processor can select any other processor, including itself.
  • A process context, or the set of state variables that allows a process to be suspended, transferred and resumed with continuity, is entirely contained within the state registers REG-F of the processor that executes it.
  • This context can be transmitted on the data bus DATA-B of the shared C-BUS.
  • The selected inactive slave processor receives such context directly in its state registers REG-F, or in another register file of the interface BU that is transferable to the state registers REG-F at wake-up time.
  • The control migration can be associated with a particular context variable, for example one bit of the Status Register SR.
  • The migration structure A-S comprises communication means dedicated to the explicit transfer of control.
  • Such means can be implemented in several ways, but one suitable for modularity is that they be of decentralized type, composed of a plurality of identical control communication cells A, individually distributed within the interface BU of each bus processor SPU, connected by the input signal SELT to the aforesaid selection means P-SEL and via the control transferring bus R/R-B to the host control unit DECO-U, and all interconnected by a shared dedicated migration bus MIG-B for exchanging among themselves, under the control of the bus's migrant process, criteria or messages concerning control migration, each able to communicate with its control unit DECO-U during the control releasing/acquiring phases.
  • said migration bus MIG-B, with bidirectional level signals, comprising at least one signal to send/receive the control migration criterion,
  • Upon execution of a control migration instruction SIDE, <dest>, the decode/control unit DECO-U sets the addressing to select on the external bus C-BUS the processor SPU identified by the instruction operand <dest>, and issues on the control transferring bus R/R-B a release criterion to its cell A, which in turn sends the migration criterion on the migration bus MIG-B; then the decode unit DECO-U controls the defined release cycle, at the end of which it sets, with a signal IDLE, the bus unit BU in slave mode and the cell A in receiving mode, and releases control of the shared bus C-BUS to the newly selected processor, suspending its own process activity indefinitely while waiting to be woken up. Conversely, the selected unit's cell A connects itself in receiving mode, and on the arrival of the control migration criterion from the migration bus MIG-B
  • The register bank REG-F includes at least one special register for dynamically containing the address of the currently executing instruction.
  • Such a register can be the Program Counter, provided that it is incremented not at fetch time but at the end of the instruction whose address it holds. Its content is transferred to the requesting pair unit, wherein it is compared with the wanted address to identify the corresponding instruction and to recognize the synchronism condition.
  • process explicit synchronization instructions which, within the operands, refer to said special or Program Counter register of the pair unit, and specify exactly one instruction instance of the adjacent process, through the (relocatable) memory M address of that instance
  • The control unit DECO-U accesses, via said pair selection/control bus XREG-B, said modified Program Counter register in the pair unit, and controls the transfer of the register content via the data bus DATA-MB into one of its own executive units, wherein it performs a comparison check between the captured address and the address specified within the instruction operand itself. If they are equal the instruction terminates; otherwise the executing unit's program counter register is updated (not incremented) so that the same instruction is executed again on the next cycle, or it stalls. This instruction stops repeating or stalling only when its execution occurs in parallel with the instruction at the given address in the pair unit.
  • The time wasted (busy waiting) in repeated executions represents the synchronization cost, which can be optimized (with task switching) by the operating system.
  • The invention also allows "process implicit synchronization" instructions that work analogously, but they
  • The selection means P-SEL presume a logical processor addressing scheme, achievable in several ways.
  • The simplest scheme consists in assigning to each processor SPU, uniquely among those attached to the same bus C-BUS, a Processor Identification Number or NIP, usable also as an operand in the migration instructions. Since addressing the process migration corresponds to routing the communication among parallel processes, to avoid indirect address handling at any computing level it is necessary to identify the most suitable logical processor addressing scheme. So in the invention the memory buses C-BUS are also logically and uniquely numbered with a Bus Identification Number or NIB. The assignment is made so that each processor has a NIP corresponding to the NIB of its pair processor's bus, and no processor has a NIP corresponding to its own bus's NIB. With this scheme, by addressing a processor, a migrant process directly addresses the bus of the corresponding adjacent process. On the several buses C-BUS the addressing schemes are
  • The decode and control unit DECO-U is also connected reciprocally to the pair control unit, by at least one signal IDLE, to communicate its operative state, active-master or inactive-slave, with respect to its own migrant process; this state may be reported in a field of the Status Register SR, so that each unit can know the other's activity state.
  • The interrupt structures relevant to each bus C-BUS can be interconnected in pairs with local private structures, following the processors' pairing scheme.
  • An interrupt communication unit, working and accessible like any generic interrupting device, is normally connected to the interrupt structure of one bus C-BUS, but characteristically connected by point-to-point buses to a corresponding identical unit belonging to the interrupt structure of the other bus C-BUS, so as to transfer interrupt messages between the paired buses C-BUS.
  • One such interrupt communication unit can be integrated within the pair processor SPU, by connecting it internally with a complete data-address-control bus to the respective bus unit BU, or to the memory management unit (MMU) if any, and making it accessible from the external C-BUS.
  • MMU: memory management unit
  • The private memories M of the buses C-BUS can also be shared by all adjacent parallel processes in a non-concurrent way.
  • die two processors SPU of a pair exchange reciprocally the fetched instruction stream and the accompanying context.
  • circuit switching means M-X for reconfiguring the pairing asset.
  • such means M-X may switch in a way that instantaneously each unit could receive instructions fetched by the other one and could see as own context the pair unit's register file REG-F.
  • switching elements M-X are interposed only on the bus pairs REGS-B,XREG-B connecting each decode and control unit DECO-U to both register banks REG-F, and on die two buses IST-B allowing the instruction flow input to the decode units DECO-U.
  • the switching elements need only two switching states or functions: "straight”, wherein the switching means M-X are transparent and do not modify bus connections seen so far; and "cross”, wherein the first input part and the second output part of each said bus pairs results inversely connected to one another within the pair.
  • each control unrt DECO-U is directly connected to said switching means M-X with at least one output signal X-S and one input signal C.
  • Control mechanism is the same for both switching states.
  • the decode and control unit DECO-U commands witii the output signal X-S the switching state required by die boolean operand ⁇ b>, then checks the operand wrth the input signal C that carries back the means's M-X switching state. If they coincide (i.e. switching done), the instruction ends, after eventual exchange of some register; otherwise it stalls or the Program Counter register is adjusted (decreased or not increased) in such way that the instruction will be executed again on next cycle.
  • the processor SPU stalls in busy-waiting state by executing again indefinitely die switching instruction XEX until die switching of said means M-X occur when the pair unit also requires the same switching state. At switching time it takes place also the two processes synchronization. Dunng the "cross" switching state, mstructions fetched from one bus C-BUS directly act on vanables located in the p ⁇ vate memory M of the other one adjacent bus, so that data streams of the two running processes result switched on die memory M buses C-BUS of correspondmg mstruction streams Within this asset the two mvolved processes cannot migrate and any execution of migration mstructions causes an exception handling
  • switchmg involves twin bus pairs, it is natural to use switchmg blocks 1X1 made up of the usual 2X2 type elements, havmg two inputs and two outputs controlled by an mput control port CP,and capable to assume,between mputs and outputs, one of said switchmg states (straight or cross) in accordance with the logic value put on the control port CP
  • At least three switchmg blocks DO are needed, one for each said bus pair, havmg all control ports CP interconnected, m way to be able to switch simultaneously as reaction to one smgle control signal C
  • Eventual processor's SPU status register SR besides the usual informations concemmg hos processor state, can contain further fields concemmg die migration activity, the own and pair processor synchronism mode, the switchmg means M-X mode, etc
  • the invention allows industrial scale production of the same economic microelectronic (VLSI) component that integrates both processors SPU of a pair DPU and their interconnections P-P, possibly including also said switching means M-X, a dual port or communication memory DPM and a pair of interrupt communication units, in a single biprocessor unit, usable in its turn for manufacturing parallel and compatible computers in a large range of configurations, powers and costs, addressed toward different market segments, from personal to supercomputers
  • VLSI microelectronic
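The busy-waiting behaviour of the switching instruction XEX described in the items above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the claimed circuit: the class `SwitchMeans`, the function `xex`, the unit identifiers 0 and 1, and the rule that the means M-X switches as soon as both units request the same state are hypothetical modeling choices.

```python
from dataclasses import dataclass, field

STRAIGHT, CROSS = 0, 1  # the two switching states named in the text


@dataclass
class SwitchMeans:
    """Hypothetical model of the switching means M-X shared by a pair."""
    state: int = STRAIGHT
    requests: dict = field(default_factory=dict)  # unit id -> requested state

    def request(self, unit: int, wanted: int) -> int:
        """Record the state driven on X-S by one unit and return signal C."""
        self.requests[unit] = wanted
        # Assumed rule: switch only when both units request the same state.
        if len(self.requests) == 2 and len(set(self.requests.values())) == 1:
            self.state = wanted
        return self.state


def xex(mx: SwitchMeans, unit: int, wanted: int, pc: int) -> int:
    """One execution attempt of XEX; returns the next Program Counter value."""
    c = mx.request(unit, wanted)
    # If C matches the operand the instruction ends; otherwise the
    # Program Counter is not increased, so XEX is executed again.
    return pc + 1 if c == wanted else pc


mx = SwitchMeans()
first_try = xex(mx, unit=0, wanted=CROSS, pc=10)     # pair unit has not asked yet
pc1 = xex(mx, unit=1, wanted=CROSS, pc=20)           # both agree: M-X switches
retry = xex(mx, unit=0, wanted=CROSS, pc=first_try)  # the repeated XEX now ends
```

In this model the first unit stays at the same Program Counter value until its pair unit issues the matching request, which is also the point where the two processes become synchronized.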

Abstract

General purpose parallel computer, latency reduction MIMD, with multiple processors and multiple memory address spaces, wherein processors (SPU) are redundantly replicated on each memory (M) bus (C-BUS) and, formed/connected as either master-active or slave-inactive of the bus and to interface a suitable communication structure (A-S) for transferring among themselves the process context and the bus control, in such a way to execute in turn a unique migrant sequential process per bus (C-BUS), and wherein each processor is also directly and tightly coupled with devoted private buses (P-P) to one corresponding processor of another one bus (C-BUS) in a way to form, between distinct buses (C-BUS), biprocessor pairs (DPU) capable of allowing communication and synchronization of the parallel migrant processes.

Description

DESCRIPTION
Parallel Processor with Redundancy of Processor Pairs
TECHNICAL FIELD
This invention relates to a general purpose electronic numeric parallel computer with multiple processors, MIMD (Multiple Instruction stream Multiple Data stream) in Flynn's classification model, latency reduction oriented, and relates also to its composing processors. Replication of regularly interconnected and cooperating processing elements can improve the performance, reliability and cost of computers.
MIMD with multiple processors, also called MULTI, consists of a collection of processors that, through an interconnection structure, either share a global memory or only communicate without memory sharing. The former are also called multiprocessors, the latter multicomputers.
BACKGROUND ART
Beyond their advantages, current MULTI still have inconveniences and disadvantages. To communicate among parallel processes, multiprocessors adopt the same processor-memory communication mechanism, which is flexible and suitable for any computation, but the shared memory becomes a "bottleneck" as the number of accessing processors increases. The complexity and cost of the multi-ported memory can be bounded only at the expense of increased memory latency, or by reducing memory traffic using local cache memories, which again introduce complexity and costs to manage their coherency protocols. Within multicomputers each processor has a private memory that allows less latency and more scalability, but existing communication mechanisms do not allow efficient communication among parallel processes. The communication of a message requires an input/output operation. Even when messages are associated with high priority interrupts, their average latency remains greater than that of shared memory accesses.
The interconnection structures used determine the topology, the node or connection degree, and several performance characteristics of MULTI. Any direct connection among nodes also requires a node interface. If the node degree increases as the number of nodes grows, the interconnection cost rapidly prevails over the machine cost. With current technologies, the node degree must necessarily be held low, even though this increases the probability of congestion/conflicts, makes the communication latency inconstant, and makes performance depend on the spatial-temporal distribution of the traffic and on the application. To have flexible and accessible networks at acceptable costs, optimal topologies are used, as well as switching and message combining elements, buffers, and routing and flow control techniques, all of which make the current interconnection structures hard to realize, and still too expensive and inefficient from the performance point of view. The degree of parallelism matches the number of processors, but the total computing power also depends upon the power of the single processors. Actual realizations have constraints by which these two power factors are not independent. The parallel processes communicate on globally shared resources with limited capacity, and this creates congestion and/or access conflicts which degrade the expected performance both as the number of processors grows and as the power of the single processor grows. Within MULTI the difficulty of synchronizing the parallel processes strongly reduces the number of applications that can take advantage of parallel execution. The problems do not reside in distributing a common iso-frequential timing signal to all processors, as is ordinarily done within SIMD too, but mainly in the impossibility of predicting the exact execution time of a process. Each processor has its own autonomous sequence control and, as time passes, parallel processes become temporally unrelated to one another, in a way that is not controllable by the programmer. Synchronization is achieved indirectly through communication. Current methods are based on message passing in multicomputers and on access control to shared memory variables within multiprocessors. These operations, realized mostly at software level with many instructions of the ordinary repertoire and a few specialized instructions (test&set, fetch&add, etc.), are still too slow, penalizing the communication time. Moreover they generate messages that increase traffic congestion. Therefore most of the MULTI realized so far are unsuitable for synchronizing a large number of small processes, and for strongly reducing the execution time (latency) of a single task.
Within MULTI there also exists the load balancing problem, which aims to optimize resource utilization by uniformly distributing the load among processors. Migration, or the movement of the allocation to resources after the initial decision, has been taken into account as a solution to the dynamic load balancing problem, though its validity has also been noticed for reducing the network load by bringing the communication partners closer. With multicomputers process migration is more burdensome because it also requires copying memory, therefore the migration of simpler entities is used. The convenience of run-time migration is doubtful because the transfer overhead is hardly balanced by performance increments, therefore process migration from processor to processor is seldom used in highly parallel computers.
MULTI usually employ normal microprocessors available on the market and also used in SISD machines. Or they employ devoted processors with special mechanisms for fast interrupt handling and fast context switching, with several register banks or windows, or they integrate communication/routing message interfaces and units. However, the processors used are equipped with full and autonomous fetch/execution capability, and configured as memory bus masters that, once activated, continuously fetch and execute instructions, but normally they do not allow access to their own internal registers from outside, except for debug purposes. Computing nodes in multicomputers usually are multiprocessors with an application processor and one or more communication and/or switching/routing processors, to overlap communication time with processing time, even if that increases the cost of parallelism.
The aim of the invention is to find an optimal combination of processor replication and interconnection, as well as modalities of process execution/cooperation, and to devise the appropriate structural and functional processor modifications, so as to realize a parallel processor or MULTI without said inconveniences, having an optimized and high-performance interconnection structure that allows efficient communication and synchronization among parallel processes, and that easily reduces single task execution and completion time (latency). The technical problem posed is large and hard because of the high number of possible choices at the physical and logical level, concerning several aspects of parallelism that have been investigated for a long time but are difficult to understand and to resolve.
DISCLOSURE OF INVENTION
The solution found, contained in Claim 1, consists in the direct pairing between processors of separate memory buses, so that two tightly coupled processors can reciprocally synchronize themselves and share their internal register files, allowing easy communication and synchronization between the two adjacent parallel processes of the pair, and in adopting process migration among redundantly replicated processors on the same memory bus, to allow each process to communicate/synchronize itself with several adjacent parallel processes.
The pairing is accomplished through mutual extension of the internal buses from one processor to the other one's functional units. So the single processor also becomes a pair communication unit, normally connected to memory and peripherals, but mainly connected to another processor. Several processors are connected on the same memory bus, for accessing equally rather than concurrently the same instructions and data in the shared memory. Each memory bus is managed as a single master bus, wherein processors cooperate in the execution of a single sequential migrating process. Beyond the memory bus, processors also share a process migration structure that allows the transfer of process control and of the "context" contained within state registers from one processor to another one of the bus. Thus run-time process migration among processors is achieved easily, preserving the identity and continuity of each process. Processors are modified to eliminate concurrent access conflicts to the shared memory. They are formed to be, on the memory bus, either master-active like a traditional processor, or slave-inactive like a peripheral which does not perform processing activity, but which allows its internal registers to be accessed and loaded from the outside. A slave processor remains inactive for an indefinite time, waiting to receive control and to resume processing activity starting from the received context. Processors of the same bus are individually paired with a processor belonging to a separate memory bus, so as to form pairs between distinct memory buses. The resulting processor architecture offers new instructions in two categories:
  • migration or intra-bus communication instructions, for handling the (sequential) interaction among processors on the same memory bus and allowing run-time process migration,
  • pair communication or inter-bus communication instructions, for handling the (parallel) interaction within the pair, and allowing communication and synchronization among parallel processes.
The parallelism comes from the plurality of processes which simultaneously run on as many memory buses, and migrate on their own bus among paired processors to communicate/interact and synchronize themselves.
A multicomputer/multiprocessor formed in accordance with the invention has many advantages. Congestion and access conflicts on global shared resources, which become "bottlenecks" as parallelism increases, are eliminated entirely.
Parallel processes communicate through high performance local devoted buses which do not require interface controllers or input/output operations. The communication among processors is based on local register sharing, normally and efficiently achieved by hardware, and easily controlled by special pair communication and process migration instructions. The dual access to the processor's registers by both units of a pair allows a variable time interaction among parallel adjacent processes, and also allows the synchronization points to be programmed. Processors have direct access to the sequence control of the adjacent processors, and this allows the programmer to control the mutual progress and time relations among all adjacent processes in parallel. Communication time between adjacent processes is mainly influenced by the process migration operation, which requires a definite constant time. Therefore, owing also to the lack of conflicts/congestion, communication latency among adjacent processes is constant and can be on average lower than that in a multiprocessor. Additional devices to mask communication latency or to overlap communication and computation times are no longer needed. Synchronization is possible without global traffic generation and even without explicit communication. It is possible to program synchronizing barriers and to achieve, within the short execution time of some new specialized instructions, the explicit synchronization of many small processes with dimensions of a few instructions, preserving also the asynchronous and efficient process execution and the other implicit synchronization modalities. Thanks to these capabilities they can efficiently execute parallel and even synchronous algorithms. The interconnection structure, composed of inexpensive, low latency, wide bandwidth buses of ordinary realization, is optimized in complexity, cost and performance. It does not force the topology and the machine connection degree; on the contrary it allows different topologies with a high connection degree to be obtained, without requiring switching or buffering components. Within regular machines, the connection degree is given by the number of processors per bus, which is limited only by physical parameters which constrain the bus and processor dimensions, but bus bandwidth no longer constitutes the main obstacle to the numeric growth of the processors attachable to it.
The degree of parallelism matches the number of memory buses, and it can be freely augmented, with a proportional increment of the total power, independently of the single processor power. The reducible dimensions of the pairing connections and the opportunities offered by microelectronic (VLSI) technology allow the design and construction of a single logical biprocessor unit, whose integration leads to further advantages in terms of modularity, resource sharing and different part numbers. In summary, the invention collects nearly all the advantages, but not the disadvantages, of current MULTI in both categories, with low, medium and high degrees of parallelism.
The realized process migration constitutes a "context/process switching" wherein several processors share the single process that controls the switching. There are no formatted data packets, nor does any maximum time interval exist within which a processor will surely receive control. No computer has ever adopted a relay of processors executing a single sequential process. The expensive functional inefficiency given by the redundancy of inactive processors is only justifiable by the gain obtained with parallelism. On each memory bus the situation is only structurally similar to that of shared bus multiprocessors, but functionally very different. Besides the pair connections, in the invention the processors are neither standard nor all simultaneously active-masters, and they do not compete for resources and do not engage in conflicts in a casual and asynchronous way. Migrations take place in an orderly way under software control.
BRIEF DESCRIPTION OF DRAWINGS
Further characteristics of the invention will result from the following description, accompanied by the attached drawings, relating to non-restrictive examples of topologies and realizations, depicted in the following list of figures, wherein any drawn connection line represents one or more buses, and the shown direction is the prevailing one, but it does not exclude bidirectionality and/or the existence of connections with the opposite direction:
  • Figure 1 shows a partial block scheme of a computer realized according to the invention, in a regular "matrix" topology, reporting memory modules M, memory buses C-BUS, processors SPU, pair interconnection structures P-P, migration structures A-S, and biprocessors DPU
  • Figure 2 shows the smallest configuration of a multicomputer realized according to the invention, capable of executing in parallel only two processes, which cannot migrate
  • Figure 3 shows a regular "linear chain" topology, connectable at the ends as a ring, in which dual port memories or communication memories (2-fifo) DPM are also reported, connectable like biprocessors between bus C-BUS pairs
  • Figures 4, 5, 6 show other processing nets R-P with regular topology, of multicomputers realized according to the invention, wherein the distinction between the structures A-S and P-P and the representation of the other processing units (M, IU, DPM) have been left out; these are supposed to be connected, on each bus C-BUS, according to the criterion already described and depicted by figures 1, 2 and 3
  • Figure 7 shows the main external connections, shared A-S, C-BUS, and private P-P, of a processor SPU pair forming a biprocessor DPU
  • Figure 8 shows the simplified block diagram of the main functional units of both processors SPU of a pair, realized and connected in accordance with the invention
  • Figure 9 shows a block scheme of the migration structure A-S of a single bus C-BUS, comprising processor selection means P-SEL, a migration bus MIG-B, and interconnection structure interfaces S and A distributed in the pair processors SPU
  • Figure 10 shows the existence of address decoders ADEC within the selection means P-SEL of the biprocessors DPU on one same bus C-BUS
  • Figure 11 shows in greater detail the internal connections of the interfaces A and S of the migration structure A-S, and the presence of arbitrating means A-R between the access control ports MC, SC to the processor register file REG-F
  • Figure 12 shows possible circuit switching means M-X interposed on some internal pair buses, in accordance with the invention
  • Figure 13 shows said switching means M-X, composed of circuit switching elements IXI of 2x2 type, arranged according to the invention, and details of the data connections between the registers REG-F and the execution units EX-U comprising the arithmetic and logic unit ALU
  • Figure 14 shows the connection arrangement among the functional units of the two SPU processors in a pair, when said switching means M-X are "cross switched"
  • Figure 15 shows a scheme of the command and synchronization circuit XAND of the switching blocks DQ composing said switching means M-X
  • Figure 16 shows the biprocessor DPU containing only one timing circuitry TGEN-U for both processors SPU of the pair
The main memory M is distributed among the buses C-BUS, each one with its separate address space, giving rise to multiple private address spaces. On the single bus C-BUS, the memory M, possibly modular and hierarchical (caching) L1/2, is shared by the attached processors SPU and locally accessible in a single shared address space. The whole interconnection structure is divided into several different structures: the local shared structures, A-S and C-BUS, allowing process migration, and the private structure P-P, allowing communication and synchronization of parallel processes. Each processor SPU is connected only to one memory bus C-BUS, to one migration structure A-S, and to one pair interconnection P-P. Input/output controllers IU can be connected to each bus C-BUS and/or directly to each memory module M, in the usual ways.
In each memory bus C-BUS only one migrant process takes place at once, and only one processor SPU per bus C-BUS is active-master at any instant, while all others are inactive-slave, waiting to partake in processing activity. This one-master-many-slaves situation is initially forced by reset. To recognise the different initial condition, the reset receiving unit RST of the processor SPU has one more input signal M/S, strapped to a different logic value for the master unit compared to the slave ones. An active-master processor SPU behaves on its bus C-BUS like a traditional central unit (CPU), processing, with complete control of the connected resources, instructions and data contained in the local memory M of the bus C-BUS. All the modalities already contemplated in a CPU, among which in particular the hold disconnected state reached by the active unit following and during a Direct Memory Access (DMA), belong to the active-master operating mode. Within an inactive-slave processor SPU the instruction fetch-execution activity is suspended but ready to restart. Its registers are accessible from the external bus C-BUS. An inactive-slave processor can receive the context data from the active-master one and, upon request, resume processing activity starting from the received context. The inactive-slave operating mode includes all those modalities already contemplated in a peripheral or in a coprocessor when they do not access the memory. No processor SPU autonomously leaves the inactive-slave state unless it is explicitly required to do so by the master processor through the migration structure A-S or by the reset. The program running on a bus decides to which bus processor to transfer its process by executing process migration instructions, which control the block transfer of the state registers' content from the active processor into the corresponding registers of the destination inactive-slave one, and also synchronize the transfer of control between sender and receiver. Migration wakes up the receiving processor as master and puts the sender to sleep as inactive-slave. Under program control, the processors sharing the bus transfer among themselves the running process's context and control, and in turn become active-master of the bus C-BUS, in such a way as to continue the migrant process without jumps or loss of instructions.
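The one-master-many-slaves discipline and the hand-over of context and control described above can be illustrated with a minimal software model. It is a sketch under stated assumptions, not the hardware mechanism: the classes `SPU` and `Bus`, the method names, and the use of a dictionary with a single `PC` entry as the context are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SPU:
    name: str
    master: bool = False                      # value of the M/S strap at reset
    regs: dict = field(default_factory=dict)  # state registers REG-F (context)


class Bus:
    """Hypothetical model of one memory bus C-BUS and its processors."""

    def __init__(self, names):
        # Reset forces exactly one active-master; all others are inactive-slave.
        self.spus = {n: SPU(n, master=(i == 0)) for i, n in enumerate(names)}

    @property
    def master(self) -> SPU:
        active = [p for p in self.spus.values() if p.master]
        assert len(active) == 1, "exactly one active-master per bus"
        return active[0]

    def step(self) -> None:
        """The active-master executes one instruction of the migrant process."""
        m = self.master
        m.regs["PC"] = m.regs.get("PC", 0) + 1

    def migrate(self, dest: str) -> None:
        """Block-copy the context into the destination, then hand over control."""
        src, dst = self.master, self.spus[dest]
        dst.regs = dict(src.regs)   # context lands in the slave's registers
        if dst is not src:
            src.master = False      # sender goes to sleep as inactive-slave
            dst.master = True       # receiver wakes up as active-master


bus = Bus(["spu0", "spu1", "spu2"])
bus.step()
bus.step()
bus.migrate("spu2")   # the process continues on spu2 from the same context
bus.step()
```

After the migration the process resumes on `spu2` with the Program Counter it had on `spu0`, which is the continuity property the text requires.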
The two processors SPU of a pair DPU are almost independent of each other, and when one is inactive-slave, the other one can continue processing. Two parallel processes are "adjacent" if they can directly interact within a pair through a pair interconnection P-P. To interact, any two adjacent processes must migrate on their bus C-BUS to the processor SPU which puts them in communication. When both reside in the same pair DPU, they can communicate and synchronize themselves by using said pair communication instructions. The network of all processors SPU interconnected by memory buses C-BUS, by migration A-S and pair interconnection P-P structures, forms a single processing organ R-P, or parallel processor, able to process simultaneously an independent instruction and data stream on each memory bus C-BUS, and capable of synchronizing the parallel processes by executing devoted instructions suitably programmed within the processes themselves.
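The notion of adjacency can be made concrete with a small lookup for the "linear chain" topology of Figure 3. The sketch assumes one pair DPU between each two consecutive buses; the function names and the processor labels such as `spu1.R` are hypothetical and serve only to show which processor each process has to migrate to before the two can interact.

```python
def chain_pairs(n_buses: int) -> dict:
    """Pairs DPU of a linear chain: bus i is paired with bus i + 1."""
    pairs = {}
    for i in range(n_buses - 1):
        # Each DPU holds one SPU on bus i and one SPU on bus i + 1
        # (the labels are assumptions made for this illustration).
        pairs[frozenset((i, i + 1))] = {i: f"spu{i}.R", i + 1: f"spu{i + 1}.L"}
    return pairs


def rendezvous(pairs: dict, bus_a: int, bus_b: int):
    """Return the SPU each process must migrate to, or None if not adjacent."""
    dpu = pairs.get(frozenset((bus_a, bus_b)))
    if dpu is None:
        return None
    return dpu[bus_a], dpu[bus_b]


pairs = chain_pairs(4)
adjacent = rendezvous(pairs, 1, 2)   # -> ('spu1.R', 'spu2.L')
distant = rendezvous(pairs, 0, 3)    # -> None, buses 0 and 3 share no pair
```

Processes on buses that share no pair, such as 0 and 3 here, are not adjacent and cannot interact directly through a pair interconnection P-P.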
BEST MODE FOR CARRYING OUT THE INVENTION
The invention can be carried out in several processor realization modes, general purpose CISC or modern RISC, for executing sequential programs. To illustrate the characteristics of the invented pair processor SPU, in the following we will consider a simple but non-restrictive processor representation scheme, comprising the usual functional units, normally interconnected and operating, among which at least:
  • a bus unit BU, for interfacing the processor to the memory bus C-BUS and to the other external structures, and for assuring physical compatibility and signal driving on the bus, receiving from the internal executive units EX-U, under the control of the control unit DECO-U, data and addresses of the memory locations to access, and able to control the external bus cycles and the data transfer between the outside and the internal registers, and able to transfer to the decode and control unit DECO-U the criteria concerning interrupts, exception conditions, arbitrations etc., and the instructions coming from the memory,
  • a set of executive units EX-U, all connected to and controlled by the decode and control unit DECO-U, and connected to said bus unit BU by data buses DATA-MB and some also by address buses ADDR, for executing as usual at least the arithmetic and logic operations, the memory data loading and storing operations, the address generation and branch operations, or also the floating point ones, etc., and sending the computed addresses and data to the registers REG-F or to the bus unit BU, and the conditions/exceptions to the control unit DECO-U,
  • a bank of general and special purpose registers REG-F, for containing the processor state or context, with a set of data ports MD normally connected by the data multi bus DATA-MB to said executive units EX-U and to the bus unit BU, and controlled by at least one control port MC connected by a control bus REGS-B to the control unit DECO-U,
  • a decode and control unit DECO-U, asking for/receiving via an input instruction bus IST-B a single sequential instruction stream, decoding and controlling the instruction executions, sending the immediate data to the executive units EX-U, controlling the data flow to/from the register bank REG-F, and also coordinating the other functional units.
The processor SPU can also comprise registers which do not belong to the state, possible buffers, data and/or instruction caches, and possible prefetch and branch prediction units and memory management units (MMU, TLB), interconnected and operating in the usual ways. The register bank REG-F usually has two output data ports and one input data port, but it can also have a greater or lower number of data ports. The bank's Program Counter register can even be directly connected to the prefetch unit.
The bus unit BU comprises at least the bus transfer control unit BC and the external bus C-BUS drivers. Usually it also includes other units such as bus C-BUS arbitrating units, an interrupt reception/control unit, external/internal cache control units, etc.
According to the invention, the bus unit BU is designed to be either master or slave of the memory bus C-BUS. It includes means for interfacing the migration structure A-S to the processor's internal registers REG-F and control unit. It is connected internally by a selection/control bus LDR-B to the nearby register bank REG-F, for loading the context registers REG-F from the outside, and it is connected to the control unit DECO-U by a further control transferring bus R/R-B for signaling the release/resumption of processing activity. The bus unit BU arranges itself as master or as slave according to the active or inactive state of its control unit DECO-U. In slave mode, beyond the external access/loading of the internal registers REG-F, upon receiving a control migration criterion from the migration structure A-S, it allows the decode and control unit to be woken up via the control transferring bus R/R-B.
The control unit DECO-U suspends any host processor activity and sets the connected bus unit BU in slave mode upon receiving the special slave reset signal RESET-P from the reset receiving unit RST. In the inactive-slave state the decode unit DECO-U does not process instructions, but allows register access by the outside bus C-BUS and by the pair unit. The inactive control unit DECO-U can restart only if it receives the wake-up criterion from the control transferring bus R/R-B, or the special master reset signal RESET-A from the reset receiving unit RST. In the first case it assumes the current content of the registers REG-F as the transferred process context, and resumes processing starting from such state. In the second case, it initializes the registers REG-F and the other functional units of the processor, and resumes processing from a defined state.
To allow synchronization of instruction pairing, the decode and control unit DECO-U of each pair processor SPU is directly and reciprocally connected to the other decode/control pair unit by several control/synchronism signals SINC-B. To synchronize the two parallel processes, the decode/control unit DECO-U is also connected by a further selection/control bus XREG-B to the register bank REG-F of the pair unit, in such a way that it can access them concurrently with that unit, and it can control the transfer to/from its own executive units EX-U and its own registers REG-F. To cooperate in process migration, the decode and control unit DECO-U is also conceived to remain inactive without processing instructions, signalling its state with at least one signal IDLE to any other functional units of the processor, and holding them suspended. To allow independent access by both pair units, the register bank REG-F is a dual access one, having a further, dual set of data ports SD and control ports SC connected with the pair processor, the former to the executive units EX-U and bus unit BU and the latter to the control unit DECO-U. Arbitrating means A-R are located between the dual control ports, MC and SC, of each register bank, for resolving concurrent access conflicts on the same register.
The processing activity inside the active-master processor SPU takes place normally. The instructions fetched from the program memory M of the external bus C-BUS flow inside the processor through the interface unit BU and possible caches and buffers until they arrive via an instruction bus IST-B at the decode unit DECO-U input. Thanks to said exchanged synchronism signals SINC-B, their execution can be synchronized in unison with the instruction stream in the pair unit. The external data stream flows across the bus unit BU and via the data multi bus DATA-MB to/from the executive units EX-U and the register banks REG-F of both pair processors. The executive units EX-U can dialogue via said data multi bus DATA-MB with both register banks REG-F. The control unit DECO-U controls only the executive units EX-U located within the same host processor SPU, and manages as the executing process context the registers REG-F connected with its own register control bus REGS-B. Upon execution of pair communication instructions it accesses the pair unit registers REG-F by the pair register control bus XREG-B, and controls their transfer via the data bus DATA-MB to/from its own executive units EX-U and/or bus unit BU. Upon execution of migration instructions, the decode/control unit DECO-U sets up the addressing to select on the external bus C-BUS the processor SPU identified by the instruction operand, and either controls the block transfer of its own context register bank REG-F to the destination processor, or issues on the control transferring bus R/R-B a control release criterion to start a control migration cycle, at the end of which it turns itself into an inactive slave.
The most convenient way to transfer a process still consists in utilizing the shared bus C-BUS and in connecting to it processor selection means P-SEL, realized in the usual way that allows a normal CPU to select memory components or peripherals. The selection means P-SEL, shared with the bus C-BUS, reduce the communication by diffusion to a point-to-point dialogue between the active unit and the selected one only. By driving the address and the control buses of its own shared bus C-BUS, the master processor can select any other processor, including itself.
A process context, or the set of state variables that allows process suspension, transfer and resumption with continuity, is entirely contained within the state registers REG-F of the processor that executes it. This context can be transmitted on the data bus DATA-B of the shared bus C-BUS. The selected inactive-slave processor receives such context directly in its state registers REG-F, or in another register file of the interface BU that is transferable to the state registers REG-F at wake-up time.
The bus unit BU includes a register selection/control unit S, connected to the external bus C-BUS drivers/receivers and also, via a selection/control bus LDR-B, to the register bank REG-F. When the bus unit BU is in the inactive-slave mode, said selection unit S senses the input selection signal SELT coming from said external means P-SEL, connects the bus C-BUS receivers to input also the external address ADRS-B and control CNTL-B buses, and controls the selection and loading of the registers REG-F through the data buses.
To allow block transfer of the state register file and to speed up the process context migration, the processor's accessible registers are consecutively allocated in a single address space segment. Moreover, single instructions are conceived for transferring the whole context. On the shared bus C-BUS each processor SPU occupies a different memory address space segment (memory mapped) or, by the right presetting of the SPU processor's migration instructions and control bus CNTL-B signals, a segment within an address space reserved for peripherals (I/O mapped) or for processors only (processor mapped).
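As an informal illustration of the selection and block-transfer mechanism described above, the following Python sketch models a shared bus on which each processor occupies its own address segment. It is a software analogy only: the class names, the register count `NUM_REGS`, the segment size `SEGMENT_SIZE` and the mapping of segment number to processor are assumptions introduced here and are not specified by the patent.

```python
# Hypothetical software model; all names and sizes are illustrative assumptions.
from dataclasses import dataclass, field

NUM_REGS = 8          # assumed size of the state register file REG-F
SEGMENT_SIZE = 0x100  # assumed size of each processor's address segment


@dataclass
class Processor:
    nip: int                  # processor identification number
    idle: bool = True         # True = inactive-slave, False = active-master
    reg_f: list = field(default_factory=lambda: [0] * NUM_REGS)


class SharedBus:
    """Models C-BUS together with the selection means P-SEL."""

    def __init__(self, processors):
        self.processors = {p.nip: p for p in processors}

    def select(self, address):
        # P-SEL decodes the upper address bits into a single SELT signal.
        return self.processors[address // SEGMENT_SIZE]

    def write(self, address, value):
        target = self.select(address)
        if not target.idle:
            raise RuntimeError("only an inactive-slave accepts external loads")
        # Unit S decodes the lower address bits to pick a register in REG-F.
        target.reg_f[address % SEGMENT_SIZE] = value


def migrate_context(bus, source, dest_nip):
    """Block-transfer the whole register file to consecutive addresses."""
    base = dest_nip * SEGMENT_SIZE
    for offset, value in enumerate(source.reg_f):
        bus.write(base + offset, value)


master = Processor(nip=0, idle=False, reg_f=list(range(10, 10 + NUM_REGS)))
slave = Processor(nip=1)
bus = SharedBus([master, slave])
migrate_context(bus, master, dest_nip=1)
```

Because the registers sit at consecutive addresses inside one segment, the whole context moves with a single loop over offsets, which is the software counterpart of the block transfer the text describes.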
The control migration can be associated with a particular context variable, for example one bit of the Status Register SR. To achieve greater flexibility and availability it is preferable to transmit the control migration message separately on a special structure. To this aim, the migration structure A-S comprises communication means devoted to the explicit control transfer. Such means can be implemented in several ways, but one suitable for modularity is that they be of decentralized type, composed of a plurality of identical control communication cells A, singularly distributed within the interface BU of each bus processor SPU, connected by the input signal SELT to the aforesaid selection means P-SEL and via the control transferring bus R/R-B to the host control unit DECO-U, and all interconnected by a shared dedicated migration bus MIG-B for exchanging among themselves, under the control of the bus migrant process, criterions or messages concerning control migration, each cell being able to communicate with its control unit DECO-U during the control releasing/acquiring phases.
It is also preferable to realize said migration bus MIG-B with bidirectional level signals, comprising at least:
• one signal to send/receive the control migration criterion,
• one opposite signal to receive/send the grant/acknowledge criterion as reply to such control migration criterion,
• one optional signal to assert/recognize possession of the shared bus/resources, constantly asserted by the active-master,
by which said communication cells A interact with one another through the usual hardware handshake request-acknowledge protocol.
Upon execution of a control migration instruction SIDE,<dest>, the decode/control unit DECO-U sets the addressing to select on the external bus C-BUS the processor SPU identified by the instruction operand <dest>, and issues on the control transferring bus R/R-B a release criterion to its cell A, which in turn sends the migration criterion on the migration bus MIG-B. The decode unit DECO-U then controls the defined release cycle, at the end of which it arranges, with a signal IDLE, the bus unit BU in slave mode and the cell A in receiving mode, and releases control of the shared bus C-BUS to the newly selected processor, suspending its own process activity indefinitely while waiting to be woken up. On the opposite side, the cell A of the selected unit sets itself in receiving mode and, on arrival of the control migration criterion from the migration bus MIG-B, sends the wake-up criterion via the control transferring bus R/R-B. The woken-up control unit DECO-U restores the internal state according to the current content of the context registers REG-F, adjusts the Program Counter register if necessary and, upon release of the external bus C-BUS, arranges the interface unit BU in master mode and the cell A in transmitting mode, and takes up the process control again.
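The release and wake-up sequence just described can be pictured with a small behavioural model. The sketch below is only an assumption-laden analogy in Python: the classes `SPU` and `MigrationBus` and the method `side` are hypothetical names, the cells A and the bus MIG-B are collapsed into plain method calls, and no timing of the release cycle is modelled.

```python
# Hypothetical behavioural model of SIDE <dest>; not the patented circuit.

class SPU:
    """One processor on a shared bus, reduced to the state relevant here."""

    def __init__(self, nip, active=False):
        self.nip = nip
        self.idle = not active  # IDLE signal: True means inactive-slave
        self.selected = False   # SELT input driven by P-SEL
        self.log = []

    def release(self):
        # DECO-U issues the release criterion on R/R-B, then sets IDLE.
        self.log.append("release")
        self.idle = True

    def wake_up(self):
        # DECO-U restores its state from REG-F and becomes active-master.
        self.log.append("wake-up")
        self.idle = False


class MigrationBus:
    """Models P-SEL selection plus the shared migration bus MIG-B."""

    def __init__(self, spus):
        self.spus = {spu.nip: spu for spu in spus}

    def side(self, source_nip, dest_nip):
        source, dest = self.spus[source_nip], self.spus[dest_nip]
        if source.idle:
            raise RuntimeError("only the active-master can migrate control")
        dest.selected = True   # master drives the address; P-SEL asserts SELT
        source.release()       # migration criterion goes out on MIG-B
        if dest.selected and dest.idle:
            dest.wake_up()     # receiving cell A forwards the wake-up
        dest.selected = False


spus = [SPU(0, active=True), SPU(1), SPU(2)]
mig_b = MigrationBus(spus)
mig_b.side(0, 2)
active = [spu.nip for spu in spus if not spu.idle]
```

In this model exactly one processor per bus is active before and after the migration, which mirrors the single sequential migrant process per bus that the description relies on.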
The register bank REG-F includes at least one special register for dynamically containing the address of the currently executing instruction. Such register can be the Program Counter, provided that it is incremented not at fetch time but at the end of the instruction whose address it holds. Its content is transferred to the other, requesting pair unit, wherein it is compared with the wanted address to identify the corresponding instruction and to recognize the synchronism condition. There are conceived "process explicit synchronization" instructions which, within the operands, refer to said special or Program Counter register of the pair unit, and specify exactly one instruction instance of the adjacent process, through the (relocatable) memory M address taken by the instance.
Upon execution of one such instruction, the control unit DECO-U accesses, via said pair selection/control bus XREG-B, said modified Program Counter register in the pair unit, and controls the transfer of the register content via the data bus DATA-MB into one of its own executive units, wherein it performs a comparison check between the captured address and the address specified within the instruction operand itself. If equality is satisfied the instruction terminates; otherwise the executing unit's program counter register is updated (not incremented) so that the same instruction is executed again on the next cycle, or it stalls. This instruction ceases to repeat itself or to stall only when its execution occurs in parallel with the instruction at the given address in the pair unit. The time wasted (busy waiting) in repeated executions represents the synchronization cost, which can be optimized (with task switching) by the Operating System.
The invention also allows "process implicit synchronization" instructions that work analogously, but specify within the operand a generic awaited condition on the value of one of the pair unit registers. Again, execution of one such instruction is repeated indefinitely until the selected register satisfies the required condition. Process synchronization instructions can be used either unilaterally by only one of the parallel adjacent processes, or bilaterally by both, reciprocally referring to one another with the one's address in the other one's operand, to form a synchronizing barrier. They can also cause stalls and deadlocks.
Inside a processor pair DPU, the two instruction streams are independent, but they can also be paired in unison as well as proceed with different synchronism arrangements. In isochronous mode the two paired processors proceed in step. In asynchronous mode each one is disconnected from the synchronisms SINC-B coming from the other one, as when the partner unit is inactive. It is possible to modify the synchronism relationship in a pair using special instructions. Upon execution of a "pair synchronism setting" instruction SINC <modo>, the control unit DECO-U sets itself and the processor in the "mode" required by the instruction operand, with respect to the exchanged synchronism signals SINC-B, and optionally records the operand value in the assigned field of the status register SR.
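The behaviour of a "process explicit synchronization" instruction, as described above, can be pictured with a small lockstep simulation. This is a sketch under stated assumptions: the function `run_pair`, the tuple encoding of instructions and the use of list indices as instruction addresses are all hypothetical, and both units are assumed to advance one instruction per cycle.

```python
# Hypothetical lockstep model of two paired instruction streams.

def run_pair(program_a, program_b, max_cycles=100):
    """Step two programs together and record (cycle, address) per unit.

    An instruction is ("nop",) or ("wait_pc", address), where the address
    is an index into the other unit's program. A "wait_pc" completes only
    in a cycle where the pair unit is executing that address.
    """
    programs = [program_a, program_b]
    pcs = [0, 0]           # each PC holds the address being executed
    trace = [[], []]
    for cycle in range(max_cycles):
        if all(pc >= len(prog) for pc, prog in zip(pcs, programs)):
            return trace
        snapshot = list(pcs)  # both units see the PC values of this cycle
        for unit in (0, 1):
            pc = snapshot[unit]
            if pc >= len(programs[unit]):
                continue      # this unit has already finished
            instr = programs[unit][pc]
            trace[unit].append((cycle, pc))
            if instr[0] == "wait_pc" and snapshot[1 - unit] != instr[1]:
                continue      # PC not advanced: the instruction repeats
            pcs[unit] = pc + 1
    raise RuntimeError("stall or deadlock: condition never satisfied")


program_a = [("wait_pc", 2), ("nop",)]
program_b = [("nop",), ("nop",), ("nop",)]
trace_a, trace_b = run_pair(program_a, program_b)
busy_wait_cycles = sum(1 for _, pc in trace_a if pc == 0) - 1
```

Here unit A repeats its first instruction during cycles 0 and 1 and completes it in cycle 2, the same cycle in which unit B executes address 2. The two repeated cycles are the busy-waiting cost mentioned in the text, and two programs that wait on each other's unreachable addresses would end in the `RuntimeError`, which stands in for the deadlock case.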
All new instructions are combined with the usual operand addressing modes. Several instruction versions are possible by subdivision or aggregation of instructions in the same category. For instance, it is possible to combine the Program Counter and/or the Status Register SR transfer with a control migration instruction rather than a context migration instruction, or to have a single instruction implementing a control and context migration atomic action.
The use of selection means P-SEL presumes a logic scheme of processor addressing, achievable in several ways. The simplest scheme consists in assigning to each processor SPU, unequivocally among those attached to the same bus C-BUS, a Processor Identification Number or NIP, usable also as operand in the migration instructions. Since addressing the process migration corresponds to routing the communication among parallel processes, to avoid indirect address handling at any computing level it is necessary to identify the most suitable logic scheme of processor addressing. So in the invention the memory buses C-BUS are also logically and unequivocally numbered with a Bus Identification Number or NIB. The assignment is accomplished in such a way that each processor has a NIP corresponding to the NIB of its pair processor's bus, and no processor has a NIP corresponding to its own bus's NIB. With this scheme, by addressing a processor, a migrant process directly addresses itself to the corresponding adjacent process's bus. On the several buses C-BUS the addressing schemes are analogous but never equal. The relationship between NIP and NIB characterizes the scheme on each bus as well as the configuration and/or the content of the address decode units ADEC of said selection means P-SEL.
The processor pairing also allows internal optimizations by sharing functional units within the pair. For example the biprocessor DPU may include only one internal time generation unit TGEN-U for both processors of the pair.
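The NIP/NIB addressing rule described above can be checked with a few lines of Python. The sketch assumes, purely for illustration, a machine in which every pair of buses is bridged by exactly one biprocessor DPU; the patent does not require this topology, and the function name `build_addressing` is hypothetical.

```python
# Hypothetical construction of the NIP/NIB scheme for a fully paired machine.
from itertools import combinations


def build_addressing(num_buses):
    """Return {nib: {nip: (pair_bus_nib, pair_nip)}} for every bus."""
    scheme = {nib: {} for nib in range(num_buses)}
    for a, b in combinations(range(num_buses), 2):
        # One DPU bridges buses a and b (assumed topology).
        scheme[a][b] = (b, a)  # on bus a, this processor answers to NIP = b
        scheme[b][a] = (a, b)  # on bus b, its partner answers to NIP = a
    return scheme


scheme = build_addressing(4)
nips_on_bus_2 = sorted(scheme[2])
```

With four buses, bus 2 hosts processors with NIPs 0, 1 and 3, and never a processor with NIP 2. A migrant process on bus 2 that addresses processor 3 therefore reaches the processor whose partner sits on bus 3, so the operand of the migration instruction doubles as the number of the adjacent bus.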
The decode and control unit DECO-U is also reciprocally connected to the pair control unit, at least with one signal IDLE, to communicate its operative state, active-master or inactive-slave, with respect to its own migrant process, which is optionally reported in a field of the Status Register SR, so that each one can know the other one's activity state.
Considering the biprocessor DPU as a communication unit between a pair of buses C-BUS, one soon sees the advantage of integrating within it a dual-port or communication memory DPM (or 2-fifo), and also a pair of interrupt communication units. The interrupt structures relevant to each bus C-BUS can be interconnected in pairs with local private structures, following the processors' pairing scheme. An interrupt communication unit, working and accessible as any generic interrupting device, is normally connected to the interrupt structure of the one bus C-BUS, but characteristically connected with point-to-point buses to a corresponding identical unit belonging to the interrupt structure of the other bus C-BUS, so as to transfer interrupt messages between the paired buses C-BUS. One such interrupt communication unit can be integrated within the pair processor SPU, by connecting it internally with a complete data-address-control bus to the respective bus unit BU, or to the memory management unit (MMU) if any, and making it accessible from the external C-BUS.
With the invention, the private memories M of the buses C-BUS, even with physically disjoint address spaces, can also be shared by all parallel adjacent processes in a non-concurrent way. For this purpose it is enough that the two processors SPU of a pair reciprocally exchange the fetched instruction stream and the accompanying context. To this aim some buses of both processors of the pair can be made dynamic by introducing circuit switching means M-X for reconfiguring the pairing arrangement. Upon synchronized execution of dedicated switching instructions XEX on both pair units, such means M-X may switch in such a way that each unit instantaneously receives the instructions fetched by the other one and sees the pair unit's register file REG-F as its own context. There are several ways to insert switching elements within the pair structure P-P, if one includes also the data multi bus DATA-MB, but it is better to switch the lowest number of buses. In accordance with the invention, switching elements M-X are interposed only on the bus pairs REGS-B, XREG-B connecting each decode and control unit DECO-U to both register banks REG-F, and on the two buses IST-B allowing the instruction flow input to the decode units DECO-U. The switching elements need only two switching states or functions: "straight", wherein the switching means M-X are transparent and do not modify the bus connections seen so far; and "cross", wherein the first input part and the second output part of each of said bus pairs are inversely connected to one another within the pair.
To control their switching, each control unit DECO-U is directly connected to said switching means M-X with at least one output signal X-S and one input signal C. The control mechanism is the same for both switching states. Upon execution of a switching instruction XEX <b>, the decode and control unit DECO-U commands with the output signal X-S the switching state required by the boolean operand <b>, then checks the operand against the input signal C that carries back the switching state of the means M-X. If they coincide (i.e. switching done), the instruction ends, after the optional exchange of some registers; otherwise it stalls, or the Program Counter register is adjusted (decreased or not increased) in such a way that the instruction will be executed again on the next cycle. The processor SPU stalls in a busy-waiting state by executing the switching instruction XEX again indefinitely until the switching of said means M-X occurs, when the pair unit also requires the same switching state. The synchronization of the two processes also takes place at switching time. During the "cross" switching state, instructions fetched from one bus C-BUS act directly on variables located in the private memory M of the other, adjacent bus, so that the data streams of the two running processes turn out to be switched on the memory M buses C-BUS of the corresponding instruction streams. Within this arrangement the two involved processes cannot migrate, and any execution of migration instructions causes an exception handling.
Since switching involves twin bus pairs, it is natural to use switching blocks IXI made up of the usual 2X2 type elements, having two inputs and two outputs controlled by an input control port CP, and capable of assuming, between inputs and outputs, one of said switching states (straight or cross) in accordance with the logic value put on the control port CP. At least three switching blocks IXI are needed, one for each said bus pair, having all control ports CP interconnected, so as to be able to switch simultaneously in reaction to one single control signal C. A switching enable unit XAND, with at least two inputs connected respectively to the two command signals X-S coming from both control units DECO-U, enables the switching with at least one output control signal C, connected to the control ports CP of all switching blocks IXI and to the control units themselves, upon concurrent reception on both inputs of agreeing switching requests.
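The interplay between the XEX instruction, the enable unit XAND and the 2X2 switching elements can be sketched in software as follows. This is an illustrative model, not the circuit: the names `xand`, `switch_2x2` and `PairSwitch` are hypothetical, and the choice that XAND holds its previous output while the two requests disagree is an assumption made here to match the statement that the control mechanism is the same for both switching states.

```python
# Hypothetical model of the switching means M-X; names are illustrative.

def xand(request_a, request_b, current):
    """Enable unit: follow the requests only when both units agree."""
    return request_a if request_a == request_b else current


def switch_2x2(first, second, cross):
    """One 2X2 element: 'straight' passes through, 'cross' swaps."""
    return (second, first) if cross else (first, second)


class PairSwitch:
    def __init__(self):
        self.c = False             # control signal C (False = straight)
        self.x_s = [False, False]  # command signals X-S from both DECO-U

    def xex(self, unit, b):
        """Execute XEX <b> on one unit; True means the instruction ends."""
        self.x_s[unit] = b
        self.c = xand(self.x_s[0], self.x_s[1], self.c)
        return self.c == b         # otherwise the unit repeats XEX

    def route(self, bus_pairs):
        # All control ports CP share C, so every block switches together.
        return {name: switch_2x2(a, b, self.c)
                for name, (a, b) in bus_pairs.items()}


switch = PairSwitch()
first_try = switch.xex(0, True)    # unit 0 asks for "cross" alone
second_try = switch.xex(1, True)   # unit 1 now agrees
retry = switch.xex(0, True)        # unit 0 repeats and completes
routed = switch.route({
    "REGS-B": ("regs-0", "regs-1"),
    "XREG-B": ("xreg-0", "xreg-1"),
    "IST-B": ("stream-0", "stream-1"),
})
```

The first XEX does not complete because only one unit has requested the cross state; once the pair unit issues the same request, C changes, all three bus pairs swap together, and the repeated XEX on the first unit terminates. That repetition is the point at which the two processes synchronize.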
The optional status register SR of the processor SPU, besides the usual information concerning the host processor state, can contain further fields concerning the migration activity, the synchronism mode of the processor itself and of the pair processor, the mode of the switching means M-X, etc.
INDUSTRIAL APPLICABILITY
The invention allows industrial scale production of the same economic microelectronic (VLSI) component that integrates both processors SPU of a pair DPU and their interconnections P-P, optionally including also said switching means M-X, a dual port or communication memory DPM and a pair of interrupt communication units, in a single biprocessor unit, usable in its turn for manufacturing parallel and compatible computers in a large range of configurations, powers and costs, addressed toward different market segments, from personal computers to supercomputers.

Claims

1. General purpose electronic numeric parallel processor, multicomputer multiprocessor, comprising a collection of processors and memory modules (M) all interconnected and cooperating for executing parallelized programs, and for speeding up the completion time or latency of single large tasks, characterized in that processors (SPU) are redundantly replicated on each memory (M) bus (C-BUS), and formed for being either master-active or slave-inactive of the own shared bus (C-BUS), and for interfacing directly a migration interconnection structure (A-S), and for accepting, from the outside in own internal registers, the process state, and for sending/receiving to/from the other processors of the same bus (C-BUS) the process context and control, and for suspending/resuming processing activity as execution effect of special process migration instructions, in way to cooperate with programmed relay in executing a single sequential migrant process on each bus (C-BUS), and also characterized in that for each memory bus (C-BUS) comprises a migration structure (A-S) connecting all the bus processors (SPU) and the shared bus (C-BUS), for allowing process migration from processor to processor, and for receiving by the current active-master processor the destination processor address and the process context contained within processor state registers, and delivering it to registers of the destination inactive-slave processor, and for synchronizing the control migration between the releasing and receiving processor, and further characterized in that each processor (SPU) is also tightly and reciprocally connected to another processor (SPU) belonging to a separate memory bus (C-BUS), by direct extensions of internal private buses (P-P) to the other one's functional units, in way to access at least the pair processor's register file (REG-F), and to exchange data and control/synchronization signals within the pair, so to form a single biprocessor unit (DPU) for synchronizing the pair of executing processes by executing special pair communication instructions.
2. Parallel computer as claimed in claim 1, wherein the single pair processor (SPU) comprises the usual functional units, executive units (EX-U) including at least arithmetic and logic (ALU), decode and control (DECO-U) and external bus interface (BU) units, interconnected and operating in the usual ways, but characterized in that:
• the bus unit (BU) interfaces also said migration structure (A-S), and it is connected to the control unit (DECO-U) with a further control transferring bus (R/R-B) for conversing about control migration, and connected to the register file (REG-F) with a further selection/control bus (LDR-B) for loading of process external context data, and formed to be either master or slave of the external bus (C-BUS) and structures, and for arranging itself in master or slave mode in accordance with at least one signal (IDLE) issued by the control unit, in way that, as master it can receive by the control unit control/releasing criterions and route them on the migration structure (A-S), and in slave mode it can allow state registers (REG-F) access and loading by the external bus (C-BUS) and receive/recognize control migration criterions or messages arriving from the migration structure (A-S) and transfer wake-up criterions to the control unit (DECO-U),
• the register bank (REG-F) is a dual access one, comprising a dual data port set (SD) connected via data multi bus (DATA-MB) to the pair processor's executive units (EX-U), controlled by a further dual control port (SC) connected to the pair processor's control unit (DECO-U), and arbitrating means (A-R) between control ports (MC,SC) for resolving concurrent access conflicts to the same register, in way to be symmetrically and concurrently accessible by both processing units of the pair,
• the decode/control unit (DECO-U) is also reciprocally connected to the pair processor's control unit (DECO-U) via a synchronism bus (SINC-B), for instruction execution pairing, and connected with further control bus (XREG-B) to the register bank (REG-F) located within pair unit, and connected with further control transferring bus (R/R-B) to the bus unit (BU) for releasing/resuming the process activity, and formed for resetting the own registers (REG-F) and arranging itself as active-master or inactive-slave, in reaction to the reception of a corresponding different reset signal (RESET-A,RESET-P) issued by reset receiving unit (RST), and
• in inactive-slave mode, for holding suspended the own and any host processor activity, and arranging the connected bus unit (BU) in slave mode, and allowing access to own register bank (REG-F) by both the outside bus (C-BUS) and by the pair unit, and for arranging itself as active-master upon receiving of wake-up criterions arriving on said control transferring bus (R/R-B), and
• in active-master mode, formed for setting the bus unit (BU) as master, and resuming the processing activity starting on the own registers (REG-F) content (controlled by bus REGS-B), and for synchronizing itself with the pair unit in a way to parallelize instructions in unison with that, and eventually modifying the execution synchronism upon execution of "pair synchronism setting" instructions (SINC,<modo>), and
• as reaction to execution of "process migration" instructions, commanding on the external bus (C-BUS) the processor addressing specified by the instruction operand, and eventually adjusting the own Program Counter register content, and block transferring, through the interface (BU) on the bus (C-BUS), orderly the own context registers (REG-F) content, including Program Counter register, and/or for emitting on said control transferring bus (R/R-B) release controls/criterions for starting on the external buses (C-BUS,MIG-B) a control migration cycle, and suspending any host processor activity, in a way that, at the end of the migration cycle, to arrange itself as inactive-slave awaiting to be waked up, and
• as reaction to execution of "pair communication" instructions, for accessing to the pair processor's registers (REG-F), and for transferring their content in own units/registers or vice versa, and in particular,
• as reaction to execution of "process synchronization" instructions, for accessing in the pair unit's bank (REG-F) to the register specified within the operand, including the program counter register, and transferring its content into the own executive unit (ALU) for comparing it with the condition and/or the data specified by the operand, and for modifying the own program counter register, in way to execute again the same instruction as long as the condition is not satisfied.
3. Parallel computer as in claim 1, wherein said migration structure (A-S) of each memory-processors bus (C-BUS) is characterized in that comprises:
• selection means (P-SEL) for selecting the bus processors, input connected to the address and control buses of the external bus (C-BUS), and output connected with at least one separate selection signal (SELT) to each processor (SPU) on the bus, suitable for decoding part of said address bus and, with control bus validation times, asserting a single output signal toward the selected processor;
• a plurality of register selection units (S) for selecting/accessing processor registers, each unit (S) being singularly distributed in each bus processor (SPU) interface, and input connected with part of the shared bus (C-BUS) address (ADRS-B) and control (CNTL-B) buses and with the selection signal (SELT) coming from said selection means (P-SEL), and connected with a control/selection bus (LDR-B) to the processor's internal context registers (REG-F) and also with the external bus (C-BUS) drivers (transceivers), and input connected with at least one processor internal state signal (IDLE), for recognizing that the processor has been selected, and for allowing external access control and also block loading of said registers (REG-F) from the outside data bus (DATA-B) when the processor is inactive-slave;
• an eventual plurality of migration communication cells (A) for synchronizing control migration, singularly distributed in each bus processor (SPU) and interconnected by a shared migration bus (MIG-B), each cell (A) being connected to the host control unit (DECO-U) with a control transferring bus (R/R-B) for conversing on control migration, and input connected with the selection signal (SELT) coming from said selection means (P-SEL), and with at least one state signal (IDLE) issued by the control unit for arranging itself as master-transmitting or as slave-receiving on said migration bus (MIG-B) according to the active or inactive state of the control unit, in way that, in transmitting mode, receiving a releasing criterion by the control transferring bus (R/R-B) and sending out on the shared migration bus (MIG-B) a control migration criterion/message, and in receiving mode, once it has been selected, for receiving by the migration bus (MIG-B) a control migration criterion or message, and emitting on the control transferring bus (R/R-B) a wake-up criterion.
4. Parallel computer as in claim 3, wherein said selection means (P-SEL) of a single memory bus (C-BUS) comprise at least one address decoder (ADEC), characterized to be formed/arranged differently/unequivocally on each bus compared to those means (P-SEL) of all the other buses (C-BUS), in a way:
• to assign unequivocally within the machine to each shared bus (C-BUS) a bus address code or Bus Identification Number (NIB), chosen in an integer set, for example: 0, 1, 2, 3, ...;
• to select unequivocally one own bus's processor (SPU) by responding to an address code or Processor Identification Number (NIP), which never equals the own bus's NIB but corresponds to the bus's NIB of the pair processor with which the selected processor is paired.
5. Parallel computer as claimed in claim 1, wherein said pair (DPU) of processors (SPU) is characterized in that it comprises a single time generation unit (TGEN-U) for both processors (SPU) of the pair, input connected with the external clock signals (CLOCK), and connected with output signals (CLK) to both control units (DECO-U) and to the other synchronous circuits of the pair, for providing internal timing signals, with different frequencies also, to all the requiring units of the whole pair.
6. Parallel computer as in claim 2, wherein said pair (DPU) of processors (SPU) is characterized in that comprises within the pair interconnection (P-P), eventual circuit switching means (M-X) symmetrically placed between said registers control buses (REGS-B,XREG-B) of both pair units, and between buses (IST-B) carrying the instruction streams to the decode units (DECO-U) inputs, for switching the pair of instruction buses (IST-B) with the pair of decode units (DECO-U), and each pair of register control buses (REGS-B,XREG-B) with the respective pair of control ports (MC,SC) of both register banks (REG-F), in way that the context registers and the instruction streams turn out exchanged between the two processors (SPU), and each one can continue processing with the instructions fetched by the other one, on data contained in own memory (M); and characterized in that the decode and control unit (DECO-U) is even connected to said switching means (M-X) with at least one output signal (X-S) and at least one input signal (C), and formed for commanding, upon the execution of special switching instructions (XEX,<b>) the switching of said means (M-X), and synchronizing/delaying the executing process with that running in the pair unit, for example acting on the own Program Counter register and executing again the same instruction, until the switching takes place.
7. Parallel computer as in claim 6, wherein said switching means (M-X) are characterized in that comprise:
• a plurality of switching elements (IXI) of the 2X2 type, with two inputs, two outputs and one control port (CP), placed between the signal pairs of each bus pair to switch, and interconnected together on all the control ports (CP), for switching simultaneously in one of two possible connection states: "straight" wherein the two inputs are directly connected to the corresponding outputs; or "criss-cross", wherein the two inputs are inversely connected with the two outputs;
• one switching enable unit (XAND) for commanding/synchronizing the switching, connected to both control units (DECO-U) with at least one input signal (X-S) for each one, and with at least one output signal (C) to all said control ports (CP) and to both the same control units, for enabling the switching by emitting on the output (C) the switching criterion in accordance to the synchronized equal requests of both its inputs (X-S).
8. Parallel computer as in claims 2 or 6, wherein the eventual processor's (SPU) Status Register (SR) is characterized for being extended in way to comprise, singularly or combined, the following register readable fields:
• a field (I), depending on at least one signal (IDLE) coming from the pair processor, for making accessible the operative state, active-master or inactive-slave, of the pair processor;
• a field (modo), writable by the "pair synchronism setting" instructions (SINC,<modo>), for recording the processor synchronism mode and that of the pair unit;
• a field (X) modifiable with said switching instructions (XEX <b>), for showing the current asset of said switching means (M-X).
PCT/IT1997/000121 1996-05-30 1997-05-28 Parallel processor with redundancy of processor pairs WO1997045795A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US09/194,459 US6363453B1 (en) 1996-05-30 1997-05-28 Parallel processor with redundancy of processor pairs
AU30471/97A AU714681B2 (en) 1996-05-30 1997-05-28 Parallel processor with redundancy of processor pairs
EP97925270A EP0901659B1 (en) 1996-05-30 1997-05-28 Parallel processor with redundancy of processor pairs and method
DE69701802T DE69701802T2 (en) 1996-05-30 1997-05-28 PARALLEL PROCESSOR WITH REDUNDANCY OF PROCESSOR PAIRS AND METHOD
CA002255634A CA2255634C (en) 1996-05-30 1997-05-28 Parallel processor with redundancy of processor pairs
JP09541974A JP2000511309A (en) 1996-05-30 1997-05-28 Parallel computer with processor pair redundancy

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IT96NA000032A IT1288076B1 (en) 1996-05-30 1996-05-30 ELECTRONIC NUMERICAL MULTIPROCESSOR PARALLEL MULTIPROCESSOR WITH REDUNDANCY OF COUPLED PROCESSORS
ITNA96A000032 1996-05-30

Publications (1)

Publication Number Publication Date
WO1997045795A1 true WO1997045795A1 (en) 1997-12-04

Family

ID=11387885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IT1997/000121 WO1997045795A1 (en) 1996-05-30 1997-05-28 Parallel processor with redundancy of processor pairs

Country Status (8)

Country Link
US (1) US6363453B1 (en)
EP (1) EP0901659B1 (en)
JP (1) JP2000511309A (en)
AU (1) AU714681B2 (en)
CA (1) CA2255634C (en)
DE (1) DE69701802T2 (en)
IT (1) IT1288076B1 (en)
WO (1) WO1997045795A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4358823A (en) * 1977-03-25 1982-11-09 Trw, Inc. Double redundant processor
EP0298396A2 (en) * 1987-07-08 1989-01-11 Hitachi, Ltd. Function-distributed control apparatus
GB2251964A (en) * 1991-01-15 1992-07-22 Sony Corp Processor arrays

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60229160A (en) * 1984-04-26 1985-11-14 Toshiba Corp Multiprocessor system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002059767A1 (en) * 2001-01-25 2002-08-01 Xelerated Ab Apparatus and method for processing pipelined data
US7010673B2 (en) 2001-01-25 2006-03-07 Xelerated Ab Apparatus and method for processing pipelined data
EP1436724A1 (en) * 2001-09-28 2004-07-14 Tidal Networks, Inc. Multi-threaded packet processing engine for stateful packet processing
EP1436724A4 (en) * 2001-09-28 2007-10-03 Consentry Networks Inc Multi-threaded packet processing engine for stateful packet processing
US7360217B2 (en) 2001-09-28 2008-04-15 Consentry Networks, Inc. Multi-threaded packet processing engine for stateful packet processing
DE10319903B4 (en) * 2003-04-29 2007-05-31 Siemens Ag Intrinsically safe computer arrangement
ES2303742A1 (en) * 2005-08-01 2008-08-16 Universidad De Cordoba Communications system to perform parallel tasks through personal computers. (Machine-translation by Google Translate, not legally binding)

Also Published As

Publication number Publication date
DE69701802T2 (en) 2000-11-16
AU3047197A (en) 1998-01-05
CA2255634A1 (en) 1997-12-04
EP0901659A1 (en) 1999-03-17
US6363453B1 (en) 2002-03-26
JP2000511309A (en) 2000-08-29
CA2255634C (en) 2002-02-12
AU714681B2 (en) 2000-01-06
EP0901659B1 (en) 2000-04-26
ITNA960032A0 (en) 1996-05-30
DE69701802D1 (en) 2000-05-31
IT1288076B1 (en) 1998-09-10
ITNA960032A1 (en) 1997-11-30

Similar Documents

Publication Publication Date Title
EP0901659B1 (en) Parallel processor with redundancy of processor pairs and method
KR920008458B1 (en) Computer system architecture
US7577822B2 (en) Parallel task operation in processor and reconfigurable coprocessor configured based on information in link list including termination information for synchronization
US9052957B2 (en) Method and system for conducting intensive multitask and multiflow calculation in real-time
JP2501419B2 (en) Multiprocessor memory system and memory reference conflict resolution method
JPH04348451A (en) Parallel computer
US9280513B1 (en) Matrix processor proxy systems and methods
US5960209A (en) Scaleable digital signal processor with parallel architecture
Simmendinger et al. A PGAS-based implementation for the unstructured CFD solver TAU
JPH0668053A (en) Parallel computer
US8578130B2 (en) Partitioning of node into more than one partition
Vick et al. Adaptable Architectures for Supersystems
US8131975B1 (en) Matrix processor initialization systems and methods
US20030229721A1 (en) Address virtualization of a multi-partitionable machine
US7870365B1 (en) Matrix of processors with data stream instruction execution pipeline coupled to data switch linking to neighbor units by non-contentious command channel / data channel
JP2620487B2 (en) Computer package
Tuazon et al. Mark IIIfp hypercube concurrent processor architecture
Weems Asynchronous SIMD: an architectural concept for high performance image processing
JPS61136157A (en) Multi-microprocessor module
Somani et al. Minimizing overhead in parallel algorithms through overlapping communication/computation
KR100279744B1 (en) Single-Chip Multiprocessor Microprocessor with Synchronized Dedicated Register File
JP2965133B2 (en) Processor system
Tudruj Multi-layer reconfigurable transputer systems with distributed control of link connections
JPS62286155A (en) Multi cpu control system
Somani et al. Achieving robustness and minimizing overhead in parallel algorithms through overlapped communication/computation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU BR CA CN HU IL JP KR MX NO NZ PL RU US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref document number: 2255634

Country of ref document: CA

Ref country code: CA

Ref document number: 2255634

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 09194459

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 1997925270

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1997925270

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 1997925270

Country of ref document: EP