WO1987001485A1 - A data processing device - Google Patents

A data processing device

Info

Publication number
WO1987001485A1
Authority
WO
WIPO (PCT)
Prior art keywords
computers
network
connections
switch
transputers
Prior art date
Application number
PCT/GB1986/000514
Other languages
French (fr)
Inventor
Christopher Roger Jesshope
Patrick Seymore Pope
Anthony John Grenville Hey
Denis Alan Nicole
Edward Keith Lloyd
Original Assignee
The University Of Southampton
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The University Of Southampton filed Critical The University Of Southampton
Publication of WO1987001485A1 publication Critical patent/WO1987001485A1/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/40 - Transformation of program code
    • G06F8/41 - Compilation
    • G06F8/45 - Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 - Interprocessor communication
    • G06F15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17337 - Direct connection machines, e.g. completely connected computers, point to point communication networks
    • G06F15/17343 - Direct connection machines, e.g. completely connected computers, point to point communication networks wherein the interconnection is dynamically configurable, e.g. having loosely coupled nearest neighbor architecture

Definitions

  • the present invention relates to data processing devices comprising a plurality of computers.
  • Known parallel processing devices can be broadly classified into three types, according to what provision is made for communication between computers. Some devices use a bus to which all computers are connected. Others provide a fixed network of connections between computers, often between each device and its nearest neighbours. Communication between unconnected computers, where necessary, is performed by passing a message along a line of connected computers until the message reaches its destination. Finally, other devices provide a memory common to all the computers, so that messages may be sent by storing them in the memory, for retrieval by another computer. In a common arrangement, the memory is partitioned into blocks and a switch network is provided for connecting any block of memory to any of the computers.
  • the provision for data movement between computers can present problems which limit the processing speed attainable by the device.
  • the bus width determines how long a single message takes to be transmitted, and so determines how long another computer may have to wait before it can send a message.
  • in the second type of device, a large number of connections are used for messages between distant, unconnected computers and transmission time can become excessive unless the bandwidth of the connections is large.
  • the switch network presents a bottleneck to data flow in the device, unless the bandwidth of connections to the memory is exceptionally wide.
  • a data processing device comprising a plurality of computers, a switch network for effecting connections between the computers, and control means which, in use, receives an instruction defining an algorithm to be executed by the device, translates the algorithm into sub-algorithms for execution in parallel by respective computers, instructs the computers to execute the sub-algorithms, and controls the switch network to provide direct connections between any computers which must communicate for the execution of their respective sub-algorithms.
  • control means assigns tasks to the computers and links them together so that the connections accurately reflect the data flow necessary for the solution of the problem.
  • a network in which this is the case is called an "algorithmic network" in this specification.
  • in an algorithmic network, connections only exist where they are needed and so data movements are efficient. Accordingly, connections can be narrow, for instance bit-serial links, without seriously reducing the processing speed of the device.
  • a device according to the invention could be used as part of a larger system.
  • each of the computers of the known parallel processing devices described above could be replaced by a group of the same type of computers forming a device according to the invention.
  • the processing power of the known device would then be significantly increased.
  • devices according to the invention could be used as the computers in a larger device which in itself is a device according to the invention.
  • devices according to the invention can be thought of not only as independent data processing devices, but as building blocks for larger systems, and these larger systems can themselves be used as building blocks for still larger systems.
  • each computer is a transputer, so that the device can be compact.
  • Transputer is a term recognised in the art, and used here, to mean a self-contained device having a processor, memory and input and output interface facilities.
  • Modern transputers are single chip devices such as the device sold by the INMOS Corporation under the device number IMS T424.
  • the instruction received by the device defines an OCCAM process and the control means translates the OCCAM process into component OCCAM processes for execution by respective computers.
  • the IMS T424 transputer is particularly intended to operate under the OCCAM programming language.
  • OCCAM is a trademark of the INMOS Group of companies. Briefly, OCCAM treats an operation as being made up of "processes" which involve a sequence of actions on data and which use data from and provide data for other processes. The means by which two processes communicate is referred to as a "channel". In the formalism of OCCAM, a number of processes may together form a process, in that the group of processes also involves a sequence of actions and also requires input and provides output. Equally, a process may be thought of as being formed of sub-processes, each being a process within the formal definition.
  • the invention provides a data processing device comprising a plurality of computers operable in parallel, means for providing communication between the computers, and control means controlling the computers and the communication means, each computer being a device according to the first aspect of the invention.
  • devices according to the first aspect have use both as independent processing devices and as components for the construction of larger devices.
  • Fig. 1 is a schematic diagram of a first embodiment of a device according to the first aspect of the invention
  • Fig. 2 is a more detailed schematic diagram of the device of Fig. 1;
  • Fig. 3 is a diagram of the switch network 12a of Fig. 2;
  • Fig. 4 is a circuit diagram of one switch of the network of Fig. 3;
  • Figs. 5 and 6 are block diagrams showing how the control transputer controls the switch network of the device and peripherals;
  • Fig 7 is a schematic diagram of a second embodiment
  • Figs 8 to 11 show circuits helpful for understanding the operation of the embodiment of Fig.7;
  • Fig 12 shows the embodiment of Fig 7, extended to allow external communication to the device
  • Fig 13 shows an alternative switching arrangement for the embodiment of Fig 7.
  • Fig 14 shows a simplified version of the arrangement of Fig 13.
  • Fig 15 shows a preferred network for connecting devices to form a larger device.
  • Fig. 1 shows a data processing device 10 comprising a plurality of transputers T1, T2 ... T16, only two of which are indicated, labelled T1 and T16.
  • the device 10 also comprises a switch network 12 for effecting connections between transputers, and control means 14.
  • the control means 14 comprises a transputer Tmem, which, in use, receives an instruction defining an algorithm to be executed by the device 10 and translates the algorithm into sub-algorithms.
  • sub-algorithms are algorithms which, when executed in combination, produce results equivalent to the results of execution of the main algorithm.
  • the sub-algorithms are for execution in parallel by respective transputers T1 etc.
  • the control means 14 programs the transputers T1 etc. to execute the sub-algorithms, and controls the switch network 12 to provide direct connections between any transputers which must communicate for the execution of their respective sub-algorithms.
  • four transputers Tx, Ty, Tz and Tt are also connected to the switch network, and provide interfacing between the device and external circuits.
  • the device 10 is therefore divided into three distinct sets of transputers. Firstly, the transputer Tmem is responsible for all control functions within the device 10, including controlling the switch network 12 to provide connections within the device.
  • the transputer Tmem also has associated bulk memory whose use it controls.
  • the bulk memory may comprise disc or RAM or any other type of storage.
  • the device shown uses a disc store 16 with a capacity of 100 Mbyte and a solid state store 18, preferably a RAM, with a capacity of 16 Mbyte.
  • although only a single control transputer Tmem is shown, several co-operating transputers may be required in a device which is required to perform particularly complex tasks, or which consists of a large number of transputers.
  • the second set of transputers, the transputers T1 to T16, perform the data processing within the device. These transputers operate in parallel and are connected by the switch network, under the control of the transputer Tmem, to form an algorithmic network.
  • the third set is the interface transputers Tx, Ty, Tz and Tt, which each have an associated 64 kbyte memory Mx, My, Mz, Mt, so that interfacing, including buffering, is possible.
  • each transputer Tx, Ty, Tz, Tt provides two outputs, labelled 20x, 20y, 20z, 20t.
  • All of the transputers used in the preferred embodiment are INMOS T424 transputer devices.
  • Each transputer has four bit-serial, duplex input/output ports known as "links".
  • the four links of each device are designated North, South, East and West, respectively.
  • two connections are necessary, one for data and one for acknowledgements.
  • the simplicity of the necessary connections makes practicable a switch network which can provide the wide variety of connections necessary to implement an algorithmic network for a useful range of algorithms.
  • a numeral adjacent a connection indicates the number of duplex channels provided by the connection.
  • similar numerals indicate the number of single-bit connections (single wires) provided.
  • the switch network 12 is shown as four distinct switch circuits 12a, 12b, 12c and 12d.
  • the outputs from the North links of the sixteen transputers T1, etc. (shown as a group 22 in Fig. 2) are applied as inputs to the switch circuit 12a, which provides outputs to the inputs of the North links.
  • the switch circuits 12b, 12c and 12d are connected between the East, South and West link inputs and outputs respectively.
  • Fig. 3 shows the switch circuit 12a in more detail.
  • Sixteen inputs 24 are applied in pairs to a first column of 2-way switching circuits 26.
  • the switching circuits 26 have two outputs to which the inputs may be passed in either permutation.
  • the outputs of each switch 26 are connected to respective inputs of a second column of identical switches 28 whose outputs are passed through further columns of identical switches until the final output of the switch circuit is provided from the righthandmost column of switches 30.
  • the state of each of the switches is controlled by a control circuit 32.
  • the design of the switch circuit is based on that of a Benes network.
  • the circuits 12b, 12c, 12d provide the same possibilities for connection for the East, South and West links, respectively.
  • a Benes network and an algorithm for determining the necessary switch states are described in the article "Parallel Algorithms to set up the Benes permutation network", IEEE Transactions on Computers, February 1982.
  • the control circuit 32 is instructed by the control transputer Tmem as to the required states of the switches, and the circuit converts this instruction into instructions for each switch.
  • the size and cost of a switch network of this type increases rapidly with an increase in the number of inputs, the cost varying approximately as the square of the number of inputs.
  • the number of inputs and outputs can only be increased by factors of two.
  • This embodiment seeks to maximise the processing power of the device 10 by using all of the inputs and outputs of the switch networks 12a, 12b, 12c, 12d for transputers T1 etc, and to accommodate the remaining transputers as shown in Fig 3.
  • An additional column of switches 34 is incorporated in the Benes network.
  • the upper and lower inputs of the switches 34 would be directly connected to the upper and lower outputs respectively.
  • One link of each of the interface transputers Tx, Ty, Tz, Tt is connected between one input of a respective switch 34 and one output of the corresponding switch 35 in the neighbouring column of switches.
  • a full link can be provided between a transputer T1 to T16 and a transputer Tx, Ty, Tz or Tt by setting the circuit 12a to connect the link of the transputer T1 etc. to itself, by way of a path which incorporates the connection between the appropriate pair of switches 34, 35.
  • a data path for output is provided between an input of the circuit 12a and the interface transputer, and an acknowledge path is provided from the interface transputer to an output of the circuit 12a.
  • during data input to the device 10, the path from the input 24 to the interface transputer is the acknowledge path and the path from the interface transputer to the output 30 is the data path.
  • the other output of the switches 35 is directly connected to the other input of the corresponding switch 34. This and other direct connections between the columns of switches 34, 35 enable connections to be made which do not involve the transputers Tx, Ty, Tz, Tt.
  • connections to the circuit 12a use one link of each of the interface transputers.
  • a second link is used for connections to the circuit 12b, in the same way.
  • the remaining two links of each of the interface transputers are available for connection to devices external to the device 10.
  • Connections between the transputers T1 to T16 and the control transputer can be provided by the Benes networks 12c and 12d.
  • each of these networks includes an extra column of switches, as described above in relation to the circuit 12a, but only two internal connections are broken to provide connections to transputer links.
  • two links of the control transputer Tmem are connected into the circuit 12c, and can be connected to a South link of any of the transputers T1 to T16.
  • another two links of the control transputer Tmem are connected into the circuit 12d, for connection to the transputers T1 to T16 through their West links.
  • switch network 12 so far described places some restrictions on the connections which can be made.
  • a link of a transputer from the group 22 can only be connected to the link with the same designation (North, South, East or West) of another member of the group 22.
  • this restriction will be acceptable for many applications.
  • the replication of the switch circuits 12a, 12b, 12c, 12d which this restriction makes possible provides practical advantages of ease of manufacture, which can be offset against the restriction.
  • the four circuits 12a, 12b, 12c, 12d can be manufactured as identical, single-chip devices each having forty connections, namely, 16 inputs, 16 outputs and 8 connections between switches 34, 35 for connection to control or interface transputers.
  • the control transputer Tmem cannot communicate directly with the interface transputers Tx, Ty, Tz, Tt, although data can be passed through a transputer of the group 22.
  • a simple logic circuit is shown for use as a switch in the Benes networks 12a, 12b, 12c, 12d when they are manufactured in semiconductor technology.
  • the switch circuit has two data inputs IN0 and IN1, a control input PASS and two outputs OUT0 and OUT1.
  • gates 36, 37 which are AND NOR gates, that is, composite gates each consisting of a 2-input NOR gate fed by the outputs of two 2-input AND gates.
  • the gate 36 receives IN0 and PASS as the inputs to one of its component AND gates, and IN1 and the inverse of PASS (provided by an inverter 38) as the inputs to its other AND gate.
  • the gate 36 provides OUT0.
  • the gate 37 receives IN0 and the inverse of PASS at one AND gate and IN1 and PASS at the other AND gate, and provides OUT1.
  • the outputs are inverted so that distortions in the shape and timing of signals being transmitted are compensated for.
  • the slew time for a real circuit is usually different for rising and falling signals.
  • a rising input and a falling input applied simultaneously to one of the switch networks 12a, 12b, 12c, 12d would not arrive together at the network outputs.
  • the inversion provided in each switch by the circuit of Fig 4 ensures that delays caused by slew rates are substantially independent of the input signal and of the route taken through the circuit 12a, 12b, 12c, 12d.
  • the OCCAM instruction, as has already been described, defines an OCCAM process which is itself formed by a number of less complex, inter-communicating OCCAM processes. Each of these less complex processes may be formed by even simpler OCCAM processes, and the depth of this hierarchy of complexity is arbitrary, depending on the complexity of the instruction received by the control transputer Tmem.
  • upon receipt of an instruction, the control transputer breaks the instruction down into component OCCAM processes, which are then allocated to respective transputers T1 to T16. The control transputer Tmem then configures the switch network 12 so that instructions defining the component processes can be sent to the transputers T1 to T16.
  • the control transputer Tmem sets the state of the switch network 12 so that the network 12 and the transputers T1 to T16 form an algorithmic network, and the necessary connections are made to the interface and control transputers Tx, Ty, Tz, Tt, Tmem.
  • the necessary connections can be determined from the originally received OCCAM instruction, which defines the necessary data movements between the composite processes.
  • the device 10 can begin processing data, with the transputers T, to T,g operating in parallel.
  • the hierarchical nature of OCCAM instructions means that a process to be performed by one of the transputers T1 to T16 may itself be a composite of simpler processes, and the transputer will have internal means for determining how to effect performance of these processes, by alternating between them.
  • Each switch circuit 12a, 12b, 12c, 12d contains 64 switches.
  • the state of each switch can be set by one bit, and so the state of one of the circuits 12a, 12b, 12c, 12d can be written as eight bytes.
  • the control circuit 32 of each switch circuit then sends appropriate PASS signals to the switches in that circuit.
  • the control transputer Tmem may also be in control of peripherals, such as a screen, a keyboard and a floppy disc controller.
  • Figs 5 and 6 indicate how control of these and the switch circuits 12a, 12b, 12c, 12d is effected. 64 words of the memory 16, 18 associated with the control transputer are reserved for the peripherals.
  • the control transputer Tmem applies 4 byte address/data words to a bus 39.
  • the top 3 address bytes are used by a peripheral address decoder 40 to determine when the memory reserved for peripherals is active.
  • the remaining, lowest order address byte is supplied to the switch circuits 12a, 12b, 12c, 12d.
  • the lowest order address byte is compared with a hard-wired address 41 by a decoder 42 to determine whether data to follow is intended for that circuit. If so, the output of the decoder 42 is applied to gates 44, 46 to allow control signals STROBE and ALE to operate an eight stage, eight bit shift register forming the control circuit 32. This receives and stores 8 bytes, each of which determines the state of one column of switches in the corresponding Benes network. 64 outputs from the circuit 32 go to respective switches, to provide the PASS inputs.
  • An alternative embodiment of the data processing device is shown in Fig 7.
  • the device 100 has only two switch circuits 102, 104.
  • Switch circuit 102 makes connections to provide full duplex links between the East and West links of the sixteen transputers T1 ... T16, corresponding to the transputers T1 to T16 of Figs 1 etc.
  • the second switch circuit 104 provides full duplex links between the North and South links of the transputers T1 to T16.
  • connection arrangement shown in Fig 7 has the property of "universality"; that is, that the sixteen computers can be connected to form any theoretically possible network of sixteen nodes and four connections to each node.
  • each transputer T has a North link N connected to a switch circuit 106 and a South link S connected to a switch circuit 108.
  • Each switch circuit 106, 108 can connect pairs of its inputs together in any combination.
  • There is only one topologically distinct connected network for a given number of two-link transputers. That network has the topology of a simple ring. Thus the general (possibly disconnected) network of a given number of two-link transputers has the topology of a set of disconnected rings. Rings of various sizes can be formed by the circuits 106 and 108 of Fig 8, which is equivalent to part of the circuit of Fig 2. Here, each switch circuit 106 or 108 is capable of pairing the links connected to it in any combination.
  • Fig 9 shows the connections made by the circuits 106, 108 but not the circuits themselves. In Fig 9, two rings of two transputers and one ring of four transputers are shown.
  • Fig 10 shows an arrangement of transputers T using a single switch circuit 109.
  • the North links N of the transputers arrive at eight terminals 110 of the circuit 109.
  • the eight links N can be connected by the circuit 109 in any permutation to eight further terminals 112 of the circuit 109.
  • the terminals 112 are connected to the respective South links S. Consequently, the North link N of any transputer T can be connected to the South links S of any other transputer T.
  • North links cannot be connected to North Links, and South links cannot be connected to South Links.
  • a possible set of connections is shown in Fig 11.
  • Fig 11 shows that rings of odd and even numbers of transputers can be formed by the arrangement of Fig 10.
  • in this sense, the arrangement of Fig 10 is universal for transputers with two links, whereas the arrangement of Fig 8 is not, because some networks cannot be formed.
  • the arrangement of Fig 10 permits each individual transputer to be placed in any position in any ring; this arrangement permits the transputers to be labelled, and each labelled transputer to be placed at any specified location in the network.
  • the links cannot be labelled in this sense; the arrangement of Fig 10 does not permit an arbitrary choice of the individual links used to connect a pair of transputers in the network. One cannot, for example, insist that a pair of transputers be connected by their two North links.
  • the "universality" of the arrangement of Fig 10 can be utilised to provide a universal arrangement for connecting transputers with four (or indeed any higher power of two) links by considering various theorems of topology concerning Eulerian cycles.
  • a cycle is a closed path along links which visits transputers in turn, arriving, along one link and departing along another.
  • An Eulerian cycle traverses each link exactly once; thus in the case of four-link transputers it visits each transputer exactly twice.
  • any network of connections between the transputers T1 etc. which makes connections to all four links of each transputer can be reduced to two derived (possibly disconnected) networks having the same number of transputers and having two connections to each transputer.
  • Each derived network can be created by an arrangement like that of Fig 10, which is universal. Consequently, the links of a group of transputers each having four links can be joined in any theoretically possible way which uses all of the links, if the corresponding pairs of links from each transputer are connected to respective switch circuits having the properties of the circuit 109.
  • the network can be scaled up to create universal switching networks for transputers which each have a number of links equal to eight, sixteen, or any power of two.
  • Fig 12 shows schematically a device 120 comprising sixteen transputers T connected as shown in Fig 7 by two switch circuits 122, 124, and in which provision for external communication is made.
  • Numerals next to connections indicate numbers of duplex links carried by that connection.
  • the numbers of transputers and of external connections are examples only; it is, however, in general preferable that the number of external connections from each side of the circuits 122, 124 be approximately equal to half the total number of transputers.
  • the switch circuits 122, 124 are extended to provide for connections 126 to other transputers t and connections 128 direct to corresponding switch circuits in other, similar devices.
  • the transputers t correspond to the transputers Tx, Ty, Tz, Tt of Fig 1. Connections through the switch circuits 122, 124 are also made to a control transputer C having associated bulk memory and corresponding to the transputer Tmem of Fig 1.
  • the additional connections can be provided in the circuits 122, 124 by providing additional terminals to the circuits, the circuits being able to connect the links in any permutation to the links on the other side.
  • These switch circuits could be pairs of Benes or cross-bar networks. The switching networks would preferably be controlled by a further transputer not shown in Fig 12.
  • the transputers t have two links connected to the circuit 122 or 124, for connections within the device 120, and two links available for external connection. Alternatively, all four links could be connected to the circuit 122 or 124, with external connections being available only over the connections 128, but controlled by the transputers t.
  • the control transputer C receives an instruction for the device.
  • the instruction is broken down by the transputer C into instructions for the transputers T, and the necessary network of connections between the transputers T is determined.
  • the network itself is then broken down as described above to find derived networks determining the necessary settings of the circuits 122 and 124.
  • Instructions to the transputers T and, if appropriate, the transputers t are then sent through the circuits 122, 124. Finally, the circuits 122, 124 are instructed to construct the derived networks. Execution of the instruction can then begin.
  • Means by which the control transputer C controls the switch circuits 122, 124 are not shown, but may be similar to those described above in relation to the first embodiment.
  • the circuit of Fig 10 may be replaced by the simpler circuit of Fig 13, in which the switch 109 is replaced by a set of simple bidirectional two-way interchange switches 130.
  • the resultant circuit is still universal, but does not permit the transputers to be "labelled" in the manner described above; the order of two-link transputers in the ring is fixed.
  • One circuit 130 is provided corresponding to each transputer T.
  • the transputers T form a notional ring.
  • Each circuit 130 has four terminals labelled a, b, c and d.
  • Terminal a is connected to the South link of the associated transputer and terminal c to the North link of the next Transputer in the notional ring.
  • Each terminal b is connected to the terminal d of the next switch in the notional ring.
  • Each circuit 130 has two settings: it may either connect terminal a to c and b to d, or connect a to d and b to c, preferably under the control of a controlling transputer.
  • the arrangement of Fig 13 may be further simplified to that of Fig 14, which omits several of the circuits 130 but is otherwise identical. Similar simplifications can be achieved in networks using different numbers of transputers, by considering the possible partitioning of the set of transputers into rings.
  • the circuits of Figs 13 and 14 may be used to replace the switch 102 or 104 (but not both) of Fig 7 without loss of universality, but with loss of labelling. If, however, the switch circuit of Fig 13 is used, it remains possible to place any single transputer at an arbitrarily chosen point in the network. A behavioural sketch of this interchange-switch wiring is given at the end of this list.
  • a number of devices according to any embodiment described above can be combined to form a larger device, by using a network of connections between the devices.
  • a preferred network 152 is shown in Fig 15. It comprises 16 devices 150 connected to form two cubes, one inside the other. Each device 150 is at the vertex of one of the cubes and is connected by connections 154 to three other devices along cube edges, and to the device 150 at the corresponding position on the other cube.
  • the use here of geometrical terms such as "cube", "edge" etc. is figurative.
  • the geometry of the larger device can be varied, without changing the topology of the connections.
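As a supplement to the Fig 13 and Fig 14 items above, the sketch below follows one reading of the wiring just described (terminal a to the associated transputer's South link, terminal c to the next transputer's North link, terminal b to the next switch's terminal d) and shows how the two settings of the interchange switches 130 partition the notional ring into smaller rings whose internal order is fixed. It is an illustrative Python model only; the function names and the choice of eight transputers are assumptions, not taken from the specification.

    def north_reached_from_south(i, crossed, n):
        """Follow the Fig 13 wiring from the South link of transputer i to the
        North link it reaches.  crossed[i] is True when interchange switch 130
        number i connects a-d and b-c rather than a-c and b-d (an assumed model)."""
        if not crossed[i]:
            return (i + 1) % n        # a-c: straight through to the next transputer's North
        j = (i - 1) % n               # a-d: enter the b/d chain and walk backwards
        while not crossed[j]:
            j = (j - 1) % n           # a straight switch passes b through to d
        return (j + 1) % n            # a crossed switch connects b to c, i.e. a North link

    def rings(crossed):
        n = len(crossed)
        succ = {i: north_reached_from_south(i, crossed, n) for i in range(n)}
        seen, out = set(), []
        for start in range(n):
            if start in seen:
                continue
            ring, i = [], start
            while i not in seen:
                seen.add(i)
                ring.append(i)
                i = succ[i]
            out.append(ring)
        return out

    print(rings([False] * 8))         # all switches straight: one ring of eight
    print(rings([True, False, False, False, True, False, False, False]))
    # crossing switches 0 and 4 splits the notional ring into two rings of four;
    # the order of the transputers within each ring stays fixed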

Abstract

The device (10) comprises a large number of transputers T1 to T16 (only T1 and T16 are shown), Tmem, Tx, Ty, Tz, Tt. These are divided into a set of working transputers T1 to T16, and a set of interface transputers Tx, Ty, Tz, Tt providing input/output facilities for the device, both sets being under the control of a transputer Tmem. The transputer Tmem receives instructions for the device and breaks them down into programs for parallel processing by the transputers T1 to T16. These transputers will normally need to communicate, and the necessary connections are provided by a switch network (12), under the control of the transputer Tmem. The programs are so allocated to the transputers T1 to T16 and the switch network (12) is so arranged that direct connections are provided between any transputers which must communicate for the execution of their respective programs. Other connection arrangements are described, including a universal circuit capable of connecting the transputers T1 to T16 to form any theoretically possible network.

Description

A DATA PROCESSING DEVICE
The present invention relates to data processing devices comprising a plurality of computers.
In the field of data processing devices, great attention has recently been concentrated on multiple computer networks, which have the capability of performing parallel processing. During parallel processing, each of the computers in the network is acting to produce a solution to part of a problem to be solved by the network, and the partial solutions produced by the computers are combined to produce the solution to the whole problem. Parallel processing devices can act more quickly than a single computer executing a single sequence of steps, because of the overlapping in time of the necessary operations. At some stage during processing, one computer may require a result produced by another computer in order to complete its operations, and accordingly, provision is made for the computers to communicate with one another.
Known parallel processing devices can be broadly classified into three types, according to what provision is made for communication between computers. Some devices use a bus to which all computers are connected. Others provide a fixed network of connections between computers, often between each device and its nearest neighbours. Communication between unconnected computers, where necessary, is performed by passing a message along a line of connected computers until the message reaches its destination. Finally, other devices provide a memory common to all the computers, so that messages may be sent by storing them in the memory, for retrieval by another computer. In a common arrangement, the memory is partitioned into blocks and a switch network is provided for connecting any block of memory to any of the computers.
In each of these types of device, the provision for data movement between computers can present problems which limit the processing speed attainable by the device. In the first type, the bus width determines how long a single message takes to be transmitted, and so determines how long another computer may have to wait before it can send a message. In the second type of device, a large number of connections are used for messages between distant, unconnected computers and transmission time can become excessive unless the bandwidth of the connections is large. In the third type, the switch network presents a bottleneck to data flow in the device, unless the bandwidth of connections to the memory is exceptionally wide.
It is an object of the present invention to provide an improved data processing device in which data flow between its components is minimised and made more efficient, and does not seriously retard processing.
According to the present invention there is provided a data processing device comprising a plurality of computers, a switch network for effecting connections between the computers, and control means which, in use, receives an instruction defining an algorithm to be executed by the device, translates the algorithm into sub-algorithms for execution in parallel by respective computers, instructs the computers to execute the sub-algorithms, and controls the switch network to provide direct connections between any computers which must communicate for the execution of their respective sub-algorithms.
Thus, the control means assigns tasks to the computers and links them together so that the connections accurately reflect the data flow necessary for the solution of the problem. A network in which this is the case is called an "algorithmic network" in this specification. In an algorithmic network, connections only exist where they are needed and so data movements are efficient. Accordingly, connections can be narrow, for instance bit-serial links, without seriously reducing the processing speed of the device.
A device according to the invention could be used as part of a larger system. For instance, each of the computers of the known parallel processing devices described above could be replaced by a group of the same type of computers forming a device according to the invention. The processing power of the known device would then be significantly increased.
Furthermore, a number of devices according to the invention could be used as the computers in a larger device which in itself is a device according to the invention. Thus, devices according to the invention can be thought of not only as independent data processing devices, but as building blocks for larger systems, and these larger systems can themselves be used as building blocks for still larger systems.
Preferably, each computer is a transputer, so that the device can be compact. "Transputer" is a term recognised in the art, and used here, to mean a self-contained device having a processor, memory and input and output interface facilities. Modern transputers are single chip devices such as the device sold by the INMOS Corporation under the device number IMS T424. Preferably the instruction received by the device defines an OCCAM process and the control means translates the OCCAM process into component OCCAM processes for execution by respective computers. The IMS T424 transputer is particularly intended to operate under the OCCAM programming language. Full details of the device and of the language are available from the INMOS Corporation, Colorado Springs, USA, or from INMOS Ltd, Whitefriars, Lewins Mead, Bristol BS1 2NP, England. OCCAM is a trademark of the INMOS Group of companies. Briefly, OCCAM treats an operation as being made up of "processes" which involve a sequence of actions on data and which use data from and provide data for other processes. The means by which two processes communicate is referred to as a "channel". In the formalism of OCCAM, a number of processes may together form a process, in that the group of processes also involves a sequence of actions and also requires input and provides output. Equally, a process may be thought of as being formed of sub-processes, each being a process within the formal definition.
This formalism permits the processes to proceed concurrently, although a process which wishes to communicate with another process may have to wait until the other process has reached an appropriate stage.
The formalism of OCCAM is described in "OCCAM - an overview", "Microprocessors and microsystems", Vol 8, No 2, March 1984 (published by Butterworth & Co (Publishers) Ltd).
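For readers unfamiliar with OCCAM, the process/channel idea can be mimicked very roughly with Python threads and a queue standing in for a channel. This is an illustrative analogy only, not OCCAM: real OCCAM channels are synchronous, unbuffered rendezvous points, which a small queue only approximates.

    import threading
    import queue

    channel = queue.Queue(maxsize=1)        # stands in, loosely, for an OCCAM channel

    def producer():
        # a "process": a sequence of actions that provides data for another process
        for value in range(5):
            channel.put(value * value)      # roughly: channel ! value*value
        channel.put(None)                   # end marker (a convenience, not part of OCCAM)

    def consumer():
        # a second "process" that uses data from the first
        while True:
            item = channel.get()            # roughly: channel ? item
            if item is None:
                break
            print("received", item)

    # the two processes run concurrently, as in an OCCAM PAR construct
    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()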
Preferred features of the invention in its first aspect are set out below in claims dependent on claim 1.
In a second aspect, the invention provides a data processing device comprising a plurality of computers operable in parallel, means for providing communication between the computers, and control means controlling the computers and the communication means, each computer being a device according to the first aspect of the invention. Thus, devices according to the first aspect have use both as independent processing devices and as components for the construction of larger devices.
Embodiments of the invention will now be described by way of example and with reference to the accompanying drawings in which:
Fig. 1 is a schematic diagram of a first embodiment of a device according to the first aspect of the invention;
Fig. 2 is a more detailed schematic diagram of the device of Fig. 1;
Fig. 3 is a diagram of the switch network 12a of Fig. 2;
Fig. 4 is a circuit diagram of one switch of the network of Fig. 3;
Figs. 5 and 6 are block diagrams showing how the control transputer controls the switch network of the device and peripherals;
Fig 7 is a schematic diagram of a second embodiment;
Figs 8 to 11 show circuits helpful for understanding the operation of the embodiment of Fig.7;
Fig 12 shows the embodiment of Fig 7, extended to allow external communication to the device;
Fig 13 shows an alternative switching arrangement for the embodiment of Fig 7;
Fig 14 shows a simplified version of the arrangement of Fig 13; and
Fig 15 shows a preferred network for connecting devices to form a larger device.
Fig. 1 shows a data processing device 10 comprising a plurality of transputers T1, T2 ... T16, only two of which are indicated, labelled T1 and T16. The device 10 also comprises a switch network 12 for effecting connections between transputers, and control means 14. The control means 14 comprises a transputer Tmem, which, in use, receives an instruction defining an algorithm to be executed by the device 10 and translates the algorithm into sub-algorithms. Sub-algorithms are algorithms which, when executed in combination, produce results equivalent to the results of execution of the main algorithm. The sub-algorithms are for execution in parallel by respective transputers T1 etc. The control means 14 programs the transputers T1 etc. to execute the sub-algorithms, and controls the switch network 12 to provide direct connections between any transputers which must communicate for the execution of their respective sub-algorithms.
Four transputers Tx, Ty, Tz and Tt are also connected to the switch network, and provide interfacing between the device and external circuits.
The device 10 is therefore divided into three distinct sets of transputers. Firstly, the transputer Tmem is responsible for all control functions within the device 10, including controlling the switch network 12 to provide connections within the device. The transputer Tmem also has associated bulk memory whose use it controls. The bulk memory may comprise disc or RAM or any other type of storage. The device shown uses a disc store 16 with a capacity of 100 Mbyte and a solid state store 18, preferably a RAM, with a capacity of 16 Mbyte.
Although only a single control transputer Tmem is shown, several co-operating transputers may be required in a device which is required to perform particularly complex tasks, or which consists of a large number of transputers.
The second set of transputers, the transputers T1 to T16, perform the data processing within the device. These transputers operate in parallel and are connected by the switch network, under the control of the transputer Tmem, to form an algorithmic network.
The third set is the interface transputers Tx, Ty, Tz and Tt, which each have an associated 64 kbyte memory Mx, My, Mz, Mt, so that interfacing, including buffering, is possible. Each transputer Tx, Ty, Tz, Tt provides two outputs, labelled 20x, 20y, 20z, 20t.
All of the transputers used in the preferred embodiment are INMOS T424 transputer devices. Each transputer has four bit-serial, duplex input/output ports known as "links". For simplicity, the four links of each device are designated North, South, East and West, respectively. In order that two devices can communicate, two connections are necessary, one for data and one for acknowledgements. The simplicity of the necessary connections makes practicable a switch network which can provide the wide variety of connections necessary to implement an algorithmic network for a useful range of algorithms.
In Fig. 1, a numeral adjacent a connection indicates the number of duplex channels provided by the connection. In Fig. 2, similar numerals indicate the number of single-bit connections (single wires) provided.
Turning to Fig. 2, the switch network 12 is shown as four distinct switch circuits 12a, 12b, 12c and 12d. The outputs from the North links of the sixteen transputers T1, etc. (shown as a group 22 in Fig. 2) are applied as inputs to the switch circuit 12a, which provides outputs to the inputs of the North links. Similarly, the switch circuits 12b, 12c and 12d are connected between the East, South and West link inputs and outputs respectively.
Fig. 3 shows the switch circuit 12a in more detail. Sixteen inputs 24 are applied in pairs to a first column of 2-way switching circuits 26. The switching circuits 26 have two outputs to which the inputs may be passed in either permutation. The outputs of each switch 26 are connected to respective inputs of a second column of identical switches 28 whose outputs are passed through further columns of identical switches until the final output of the switch circuit is provided from the righthandmost column of switches 30.
The state of each of the switches is controlled by a control circuit 32. The design of the switch circuit is based on that of a Benes network. A Benes network has 2^n inputs and outputs (here n = 4) and has the property that it can connect the inputs in any permutation to respective outputs. Therefore, the circuit 12a can connect the North link of any transputer in the group 22 to the North link of any other transputer. The circuits 12b, 12c, 12d provide the same possibilities for connection for the East, South and West links, respectively.
A Benes network and an algorithm for determining the necessary switch states are described in the article "Parallel Algorithms to set up the Benes permutation network", IEEE Transactions on Computers, February 1982. The control circuit 32 is instructed by the control transputer Tmem as to the required states of the switches, and the circuit converts this instruction into instructions for each switch.
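The cited routing algorithm is not reproduced in this specification. As a minimal illustration only, the Python sketch below shows the outermost step of one standard route-setting procedure for a 2^n-input Benes network: the "looping" colouring that decides whether each input's path goes through the upper or lower half-size subnetwork; the same step is then applied recursively inside each half. The function name and the example permutation are assumptions for illustration.

    def split_into_subnetworks(perm):
        """perm[i] is the output that input i must reach.  Returns colour[i] in
        {0, 1}: 0 = route input i via the upper half-size subnetwork, 1 = via
        the lower one.  Illustrative helper, not taken from the cited article."""
        n = len(perm)
        inv = [0] * n
        for i, out in enumerate(perm):
            inv[out] = i
        colour = [None] * n
        for start in range(n):
            i, c = start, 0
            while colour[i] is None:
                colour[i] = c                 # input i uses subnetwork c
                j = inv[perm[i] ^ 1]          # the input sharing i's output switch
                colour[j] = 1 - c             # ...must use the other subnetwork
                i = j ^ 1                     # its partner on the input switch
        return colour

    perm = [5, 0, 7, 2, 6, 1, 4, 3]           # an arbitrary permutation of 8 inputs
    colour = split_into_subnetworks(perm)
    # every input pair (2k, 2k+1) and every output pair is split across the halves
    for k in range(len(perm) // 2):
        assert colour[2 * k] != colour[2 * k + 1]
        assert colour[perm.index(2 * k)] != colour[perm.index(2 * k + 1)]
    print(colour)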
There is a symmetry in the connection requirements, for the following reason. In order to implement a full, duplex link between two transputers T1 to T16, two single bit, bit-serial connections must be made, one for data, and one, in the opposite direction, for acknowledgements. Thus, the transputers are paired, each transputer in a pair having the output line of one of its links connected to the input line of the same link of the other transputer in the pair.
Some provision must be made for connecting the transputers of the group 22 to the interface transputers Tx, Ty, Tz, Tt and to the control transputer Tmem, or to other auxiliary apparatus. This could be done by making connections to inputs and outputs of the switch circuits 12a, 12b, 12c, 12d. However, the size and cost of a switch network of this type increases rapidly with an increase in the number of inputs, the cost varying approximately as the square of the number of inputs. Moreover, the number of inputs and outputs can only be increased by factors of two. This embodiment seeks to maximise the processing power of the device 10 by using all of the inputs and outputs of the switch networks 12a, 12b, 12c, 12d for transputers T1 etc, and to accommodate the remaining transputers as shown in Fig 3.
An additional column of switches 34 is incorporated in the Benes network. In a normal Benes network, the upper and lower inputs of the switches 34 would be directly connected to the upper and lower outputs respectively. One link of each of the interface transputers Tx, Ty, Tz, Tt is connected between one input of a respective switch 34 and one output of the corresponding switch 35 in the neighbouring column of switches. Thus, a full link can be provided between a transputer T1 to T16 and a transputer Tx, Ty, Tz or Tt by setting the circuit 12a to connect the link of the transputer T1 etc. to itself, by way of a path which incorporates the connection between the appropriate pair of switches 34, 35. A data path for output is provided between an input of the circuit 12a and the interface transputer, and an acknowledge path is provided from the interface transputer to an output of the circuit 12a.
During data input to the device 10, the path from the input 24 to the interface transputer is the acknowledge path and the path from the interface transputer to the output 30 is the data path.
The other output of the switches 35 is directly connected to the other input of the corresponding switch 34. This and other direct connections between the columns of switches 34, 35 enable connections to be made which do not involve the transputers Tx, Ty, Tz, Tt.
The connections to the circuit 12a use one link of each of the interface transputers. A second link is used for connections to the circuit 12b, in the same way. The remaining two links of each of the interface transputers are available for connection to devices external to the device 10.
Connections between the transputers T1 to T16 and the control transputer can be provided by the Benes networks 12c and 12d. Each of these networks includes an extra column of switches, as described above in relation to the circuit 12a, but only two internal connections are broken to provide connections to transputer links. Thus, two links of the control transputer Tmem are connected into the circuit 12c, and can be connected to a South link of any of the transputers T1 to T16. Another two links of the control transputer Tmem are connected into the circuit 12d, for connection to the transputers T1 to T16 through their West links. The design of switch network 12 so far described places some restrictions on the connections which can be made. A link of a transputer from the group 22 can only be connected to the link with the same designation (North, South, East or West) of another member of the group 22. However, since four full links are always available between any pair of transputers in the group 22, this restriction will be acceptable for many applications. Furthermore, the replication of the switch circuits 12a, 12b, 12c, 12d which this restriction makes possible provides practical advantages of ease of manufacture, which can be offset against the restriction. The four circuits 12a, 12b, 12c, 12d can be manufactured as identical, single-chip devices each having forty connections, namely, 16 inputs, 16 outputs and 8 connections between switches 34, 35 for connection to control or interface transputers.
A further restriction is that the control transputer Tmem cannot communicate directly with the interface transputers Tx, Ty, Tz, Tt, although data can be passed through a transputer of the group 22. This is not a serious handicap because if wide bandwidth communication is required for speed of data transfer, two links (South and West) of a transputer T1 to T16 could be connected to the control transputer Tmem at the same time. Similarly, both the North and East links of the transputers T1 to T16 can be connected simultaneously to the same interface transputer Tx, Ty, Tz, Tt.
Turning to Fig 4, a simple logic circuit is shown for use as a switch in the Benes networks 12a, 12b, 12c, 12d when they are manufactured in semiconductor technology.
The switch circuit has two data inputs IN0 and IN1, a control input PASS and two outputs OUT0 and OUT1.
The outputs are taken from gates 36, 37 which are AND-NOR gates, that is, composite gates each consisting of a 2-input NOR gate fed by the outputs of two 2-input AND gates.
The gate 36 receives IN0 and PASS as the inputs to one of its component AND gates, and IN1 and the inverse of PASS (provided by an inverter 38) as the inputs to its other AND gate. The gate 36 provides OUT0.
The gate 37 receives IN0 and the inverse of PASS at one AND gate and IN1 and PASS at the other AND gate, and provides OUT1.
When PASS = 1, IN0 is passed, inverted, to OUT0, and IN1 is passed, inverted, to OUT1. When PASS = 0, IN0 is passed, inverted, to OUT1 and IN1 is passed, inverted, to OUT0.
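A behavioural model of this switch cell, with an exhaustive check of the stated routing behaviour, might look as follows (Python, purely for illustration; the function name is assumed):

    def switch_cell(in0, in1, pass_):
        """Gate-level model of the Fig 4 switch: two AND-NOR composite gates
        (36, 37) and one inverter (38)."""
        npass = 1 - pass_                              # inverter 38
        out0 = 1 - ((in0 & pass_) | (in1 & npass))     # gate 36: NOR of two ANDs
        out1 = 1 - ((in0 & npass) | (in1 & pass_))     # gate 37: NOR of two ANDs
        return out0, out1

    # exhaustive check of the behaviour stated above
    for in0 in (0, 1):
        for in1 in (0, 1):
            assert switch_cell(in0, in1, 1) == (1 - in0, 1 - in1)   # straight, inverted
            assert switch_cell(in0, in1, 0) == (1 - in1, 1 - in0)   # crossed, inverted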
The outputs are inverted so that distortions in the shape and timing of signals being transmitted are compensated for. The slew time for a real circuit is usually different for rising and falling signals. Thus, without the use of inversion, a rising input and a falling input applied simultaneously to one of the switch networks 12a, 12b, 12c, 12d would not arrive together at the network outputs. The inversion provided in each switch by the circuit of Fig 4 ensures that delays caused by slew rates are substantially independent of the input signal and of the route taken through the circuit 12a, 12b, 12c, 12d.
During operation of the device 10, control of the components is effected in the following way. The control transputer Tmem receives instructions for the device. In the embodiment described above, these are expressed in the OCCAM language described above, a language to which the INMOS T424 device is particularly suited. The transputer Tmem must determine from the instructions how the device should be configured to implement the instructions. The OCCAM instruction, as has already been described, defines an OCCAM process which is itself formed by a number of less complex, inter-communicating OCCAM processes. Each of these less complex processes may be formed by even simpler OCCAM processes, and the depth of this hierarchy of complexity is arbitrary, depending on the complexity of the instruction received by the control transputer Tmem.
Upon receipt of an instruction, the control transputer breaks the instruction down into component OCCAM processes, which are then allocated to respective transputers T1 to T16. The control transputer Tmem then configures the switch network 12 so that instructions defining the component processes can be sent to the transputers T1 to T16.
Once this has been done, the control transputer Tmem sets the state of the switch network 12 so that the network 12 and the transputers T1 to T16 form an algorithmic network, and the necessary connections are made to the interface and control transputers Tx, Ty, Tz, Tt, Tmem. The necessary connections can be determined from the originally received OCCAM instruction, which defines the necessary data movements between the composite processes.
When the processes have been allocated, and the switch network 12 set, the device 10 can begin processing data, with the transputers T1 to T16 operating in parallel.
The hierarchical nature of OCCAM instructions means that a process to be performed by one of the transputers T1 to T16 may itself be a composite of simpler processes, and the transputer will have internal means for determining how to effect performance of these processes, by alternating between them.
In some circumstances, it may be necessary for the connections made by the switch network 12 to be changed during execution of an operation. Each switch circuit 12a, 12b, 12c, 12d contains 64 switches.
The state of each switch can be set by one bit, and so the state of one of the circuits 12a, 12b, 12c, 12d can be written as eight bytes.
These bytes are sent by the control transputer to the control circuit 32 of each switch circuit 12a, 12b, 12c, 12d. The control circuits 32 send appropriate PASS signals to the switches in the circuit.
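As an illustration of this encoding, and assuming (since the text does not say) that bit i of each byte carries the PASS state of switch i within a column, the eight bytes for one switch circuit could be assembled as follows (Python, illustrative only):

    def pack_switch_state(columns):
        """columns: eight lists of eight PASS bits, one list per column of
        switches (the seven Benes columns plus the extra column 34).  Returns
        the eight bytes sent to the circuit's control circuit 32."""
        assert len(columns) == 8 and all(len(col) == 8 for col in columns)
        data = bytearray()
        for col in columns:
            byte = 0
            for row, pass_bit in enumerate(col):
                byte |= (pass_bit & 1) << row          # assumed bit ordering
            data.append(byte)
        return bytes(data)

    # example: every switch set to PASS = 1
    print(pack_switch_state([[1] * 8 for _ in range(8)]).hex())    # ffffffffffffffff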
The control transputer Tmem may also be in control of peripherals, such as a screen, a keyboard and a floppy disc controller. Figs 5 and 6 indicate how control of these and the switch circuits 12a, 12b, 12c, 12d is effected. 64 words of the memory 16, 18 associated with the control transputer are reserved for the peripherals.
The control transputer Tmem applies 4 byte address/data words to a bus 39. The top 3 address bytes are used by a peripheral address decoder 40 to determine when the memory reserved for peripherals is active.
The remaining, lowest order address byte is supplied to the switch circuits 12a, 12b, 12c, 12d. In each switch circuit 12a, 12b, 12c, 12d, within the circuit 32, the lowest order address byte is compared with a hard-wired address 41 by a decoder 42 to determine whether data to follow is intended for that circuit. If so, the output of the decoder 42 is applied to gates 44, 46 to allow control signals STROBE and ALE to operate an eight stage, eight bit shift register forming the control circuit 32. This receives and stores 8 bytes, each of which determines the state of one column of switches in the corresponding Benes network. 64 outputs from the circuit 32 go to respective switches, to provide the PASS inputs.
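A rough behavioural sketch of this addressing scheme is given below. The peripheral base address and the exact bit layout are not specified in the text, so the values and names used here are placeholders:

    PERIPHERAL_BASE = 0xFFFFFF       # placeholder: value of the top 3 address bytes
                                     # that decoder 40 treats as peripheral memory

    class SwitchCircuit:
        def __init__(self, hard_wired_address):
            self.address = hard_wired_address          # hard-wired address 41
            self.shift_register = [0] * 8              # 8-stage, 8-bit register (circuit 32)

        def write(self, address_word, data_bytes):
            top3 = (address_word >> 8) & 0xFFFFFF      # decoder 40: peripheral region?
            low = address_word & 0xFF                  # decoder 42: is this circuit addressed?
            if top3 == PERIPHERAL_BASE and low == self.address:
                # STROBE/ALE gating (gates 44, 46) modelled simply as a register load
                self.shift_register = list(data_bytes[:8])

    circuit_12a = SwitchCircuit(hard_wired_address=0x01)
    circuit_12a.write((PERIPHERAL_BASE << 8) | 0x01, bytes([0xFF] * 8))
    print(circuit_12a.shift_register)                  # one byte per column of switches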
An alternative embodiment of the data processing device is shown in Fig 7. The device 100 has only two switch circuits 102, 104. Switch circuit 102 makes connections to provide full duplex links between the East and West links of the sixteen transputers T1 ... T16, corresponding to the transputers T1 to T16 of Figs 1 etc.
The second switch circuit 104 provides full duplex links between the North and South links of the transputers T1 to T16.
The restrictions on connections described above in relation to the first embodiment are overcome by the connection arrangement shown in Fig 7. Indeed, it can be shown, as will be outlined below, that the arrangement of Fig 7 has the property of "universality"; that is, that the sixteen computers can be connected to form any theoretically possible network of sixteen nodes and four connections to each node.
Universality can be explained by first considering the simple case of eight transputers, each having two links, called North and South. Turning to Fig 8, each transputer T has a North link N connected to a switch circuit 106 and a South link S connected to a switch circuit 108. Each switch circuit 106, 108 can connect pairs of its inputs together in any combination.
There is only one topologically distinct connected network for a given number of two-link transputers. That network has the topology of a simple ring. Thus the general (possibly disconnected) network of a given number of two-link transputers has the topology of a set of disconnected rings. Rings of various sizes can be formed by the circuits 106 and 108 of Fig 8, which is equivalent to part of the circuit of Fig 2. Here, each switch circuit 106 or 108 is capable of pairing the links connected to it in any combination. One possibility for connecting the eight transputers T is shown in Fig 9, which shows the connections made by the circuits 106, 108 but not the circuits themselves. In Fig 9, two rings of two transputers and one ring of four transputers are shown.
Consideration of the connections available, taking into account that North links cannot be connected to South links, shows that rings of any even number of transputers can be formed, but not rings of odd numbers. Thus, the arrangement of Fig 2 cannot generate all networks of four-link transputers.
Fig 10 shows an arrangement of transputers T using a single switch circuit 109. The North links N of the transputers arrive at eight terminals 110 of the circuit 109. The eight links N can be connected by the circuit 109 in any permutation to eight further terminals 112 of the circuit 109. The terminals 112 are connected to the respective South links S. Consequently, the North link N of any transputer T can be connected to the South link S of any other transputer T. North links cannot be connected to North links, and South links cannot be connected to South links. A possible set of connections is shown in Fig 11, which shows that rings of odd and even numbers of transputers can be formed by the arrangement of Fig 10. In this sense, the arrangement of Fig 10 is universal for transputers with two links, whereas the arrangement of Fig 8 is not, because some networks cannot be formed. The arrangement of Fig 10 permits each individual transputer to be placed in any position in any ring; this arrangement permits the transputers to be labelled, and each labelled transputer to be placed at any specified location in the network. The links cannot be labelled in this sense; the arrangement of Fig 10 does not permit an arbitrary choice of the individual links used to connect a pair of transputers in the network. One cannot, for example, insist that a pair of transputers be connected by their two North links.
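A short sketch, under the assumption that the switch 109 simply realises a permutation from North links to South links, of why the Fig 10 arrangement yields rings of any size: the rings are exactly the cycles of the chosen permutation. The function name and labelling are illustrative, not taken from the patent.

def rings_from_permutation(p):
    # p: list where p[i] is the transputer whose South link receives the
    # North link of transputer i. Returns the rings as lists of indices.
    seen, rings = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        ring, i = [], start
        while i not in seen:
            seen.add(i)
            ring.append(i)
            i = p[i]
        rings.append(ring)
    return rings

# Odd and even ring sizes are both possible, unlike the Fig 8 arrangement:
print(rings_from_permutation([1, 2, 0, 4, 5, 6, 7, 3]))   # a 3-ring and a 5-ring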
The "universality" of the arrangement if Fig 10 can be utilised to provide a universal arrangement for connecting transputers with four (or indeed any higher power of two) links by considering various theorems of Topology concerning Eulerian cycles. A cycle is a closed path along links which visits transputers in turn, arriving, along one link and departing along another. An Eulerian cycle traverses each link exactly once; thus in the case of four-link transputers it visits each transputer exactly twice.
It is known that every connected network in which each node has an even number of links possesses an Eulerian cycle. Consequently, any network of connections between the transputers T1 etc as described above which makes connections to all four links of all the transputers will have Eulerian cycles.
It is also a known property of Eulerian cycles that simpler cycles can be derived from them, each having fewer connections to each transputer. One may proceed around the Eulerian cycle, assigning alternate links to one or the other of two sets of cycles. For transputers with four links, each of the two resultant sets of cycles will contain every transputer exactly once, and each set of cycles will consist of a set of rings of the type discussed above for two-link transputers.
It may thus be seen that any network of connections between the transputers T1 etc, which makes connections to all four links of each transputer, can be reduced to two derived (possibly disconnected) networks having the same number of transputers and having two connections to each transputer. Each derived network can be created by an arrangement like that of Fig 10, which is universal. Consequently, the links of a group of transputers each having four links can be joined in any theoretically possible way which uses all of the links, if the corresponding pairs of links from each transputer are connected to respective switch circuits having the properties of the circuit 109. This includes networks with multiple links joining a single pair of transputers. On this basis it will be apparent that the arrangement can be scaled up to create universal switching networks for transputers which each have a number of links equal to eight, sixteen, or any power of two.
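The reduction just described can be made concrete with a small sketch: find an Eulerian cycle of the required four-link network (Hierholzer's algorithm is used here purely as one convenient method) and assign its links alternately to two derived networks. The representation of the network as an edge list, and all names, are assumptions made for illustration.

from collections import defaultdict

def eulerian_cycle(edges):
    # Hierholzer's algorithm. edges: list of (u, v) pairs; every vertex is
    # assumed to have even degree and the network to be connected.
    adj = defaultdict(list)               # vertex -> incident edge indices
    for i, (u, v) in enumerate(edges):
        adj[u].append(i)
        adj[v].append(i)
    used = [False] * len(edges)
    stack, cycle = [edges[0][0]], []      # start at any vertex with links
    while stack:
        v = stack[-1]
        while adj[v] and used[adj[v][-1]]:
            adj[v].pop()
        if adj[v]:
            i = adj[v].pop()
            used[i] = True
            a, b = edges[i]
            stack.append(b if a == v else a)
        else:
            cycle.append(stack.pop())
    return cycle                          # vertex sequence, first == last

def derived_networks(edges):
    # Assign the links of the Eulerian cycle alternately to two sets; for a
    # four-link network each set gives every transputer exactly two links,
    # i.e. a union of rings touching every transputer once.
    cyc = eulerian_cycle(edges)
    links = list(zip(cyc, cyc[1:]))       # consecutive links of the cycle
    return links[0::2], links[1::2]

# e.g. a four-link network on four transputers (a ring with doubled links):
set_a, set_b = derived_networks([(0, 1), (1, 2), (2, 3), (3, 0)] * 2)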
Turning again to Fig 7, it can be seen that this principle is embodied. The North links of the transputers T1 etc can be connected in any permutation to the South links of the transputers T1 etc by the circuit 102 to form the first set of derived cycles. The East links can be joined in any permutation to the West links by the circuit 104 to form the second set of derived cycles. Consequently, by virtue of the topology theorems referred to above, the arrangement of Fig 7 is universal: any theoretically possible connected network of connections between all four links of all the transputers can be provided by appropriately setting the circuits 102, 104.
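Continuing the sketch, each derived set of rings can be turned into the permutation that one of the circuits must realise (the circuit 102, connecting North links to South links, in the description above); the ring representation and the function name are again purely illustrative.

def permutation_from_rings(rings):
    # rings: list of rings, each a list of transputer indices in ring order.
    # Returns perm where perm[i] = j means transputer i's North link is to be
    # connected to transputer j's South link.
    perm = {}
    for ring in rings:
        for k, i in enumerate(ring):
            perm[i] = ring[(k + 1) % len(ring)]
    return perm

# e.g. two rings over eight transputers, one of length three and one of five:
print(permutation_from_rings([[0, 1, 2], [3, 4, 5, 6, 7]]))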
Fig 12 shows schematically a device 120 comprising sixteen transputers T connected as shown in Fig 7 by two switch circuits 122, 124, and in which provision for external communication is made. Numerals next to connections indicate the number of duplex links carried by that connection. The numbers of transputers and of external connections are examples only; it is, however, generally preferable that the number of external connections from each side of the circuits 122, 124 be approximately equal to half the total number of transputers. The switch circuits 122, 124 are extended to provide for connections 126 to other transputers t and connections 128 direct to corresponding switch circuits in other, similar devices. The transputers t correspond to the transputers Tx, Ty, Tz and Tk of Fig 1. Connections through the switch circuits 122, 124 are also made to a control transputer C having associated bulk memory and corresponding to the transputer Tmem of Fig 1.
The additional connections can be provided in the circuits 122, 124 by providing additional terminals to the circuits, the circuits being able to connect the links in any permutation to the links on the other side. These switch circuits could be pairs of Benes or cross-bar networks. The switching networks would preferably be controlled by a further transputer not shown in Fig 12.
In Fig 12, the transputers t have two links connected to the circuit 122 or 124, for connections within the device 120, and two links available for external connection. Alternatively, all four links could be connected to the circuit 122 or 124, with external connections being available only over the connections 128, but controlled by the transputers t.
In use of the device, the control transputer C receives an instruction for the device. The instruction is broken down by the transputer C into instructions for the transputers T, and the necessary network of connections between the transputers T is determined. The network itself is then broken down as described above to find derived networks determining the necessary settings of the circuits 122 and 124. Instructions to the transputers T and, if appropriate, the transputers t are then sent through the circuits 122, 124. Finally, the circuits 122, 124 are instructed to construct the derived networks. Execution of the instruction can then begin. Means by which the control transputer C controls the switch circuits 122, 124 are not shown, but may be similar to those described above in relation to the first embodiment.
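The sequence just described might be outlined as follows. This is a purely illustrative sketch with toy stand-in classes, not the patent's control program, and it reuses the derived_networks() helper sketched earlier.

class Transputer:
    def __init__(self, ident):
        self.ident, self.program = ident, None
    def load(self, program):
        self.program = program
    def run(self):
        return (self.ident, self.program)

class SwitchCircuit:
    def __init__(self):
        self.links = None
    def set_links(self, links):
        self.links = links            # simply recorded here, not simulated

def execute(sub_algorithms, required_links, transputers, circuit_122, circuit_124):
    # Split the required network into the two derived networks.
    first_set, second_set = derived_networks(required_links)
    # Program the workers over the existing connections before reconfiguring.
    for t, prog in zip(transputers, sub_algorithms):
        t.load(prog)
    # Only then construct the derived networks and start execution.
    circuit_122.set_links(first_set)
    circuit_124.set_links(second_set)
    return [t.run() for t in transputers]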
It has been found that the circuit of Fig 10 may be replaced by the simpler circuit of Fig 13, in which the switch 109 is replaced by a set of simple bidirectional two-way interchange switches 130. The resultant circuit is still universal, but does not permit the transputers to be "labelled" in the manner described above; the order of the two-link transputers in the ring is fixed.
One circuit 130 is provided corresponding to each transputer T, and the transputers T form a notional ring. Each circuit 130 has four terminals labelled a, b, c and d. Terminal a is connected to the South link of the associated transputer and terminal c to the North link of the next transputer in the notional ring. Each terminal b is connected to the terminal d of the next switch in the notional ring. Each circuit 130 has two settings: it may either connect terminal a to c and b to d, or connect a to d and b to c, preferably under the control of a controlling transputer.
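As a rough model (the terminal labels follow the description above; everything else is an assumption), the ring of interchange switches 130 can be simulated by pairing terminals according to each switch's setting and tracing which North and South links end up joined.

def trace_links(settings):
    # settings: one entry per switch, each "straight" (a-c, b-d) or
    # "crossed" (a-d, b-c). Returns the pairs of links that are connected.
    n = len(settings)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for i, s in enumerate(settings):
        nxt = (i + 1) % n
        # Fixed wiring: a_i to South of T_i, c_i to North of T_(i+1),
        # b_i to d_(i+1).
        union(("a", i), ("S", i))
        union(("c", i), ("N", nxt))
        union(("b", i), ("d", nxt))
        # Switch setting: straight pairs a-c and b-d, crossed pairs a-d and b-c.
        if s == "straight":
            union(("a", i), ("c", i)); union(("b", i), ("d", i))
        else:
            union(("a", i), ("d", i)); union(("b", i), ("c", i))

    groups = {}
    for i in range(n):
        for link in (("N", i), ("S", i)):
            groups.setdefault(find(link), []).append(link)
    return [g for g in groups.values() if len(g) > 1]

# All switches straight chains the eight transputers into one ring; mixing in
# crossed switches partitions them into smaller rings.
print(trace_links(["straight"] * 8))
print(trace_links(["straight", "crossed"] * 4))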
The circuit of Fig 13 may be further simplified to that of Fig 14, which has several circuits 130 absent as compared with Fig 13, but is otherwise identical. Similar simplifications can be achieved in networks using different numbers of transputers, by considering the possible partitioning of the set of transputers into rings.
The circuits of Figs 13 and 14 may be used to replace the switch circuit 102 or 104 (but not both) of Fig 7 without loss of universality, but with loss of labelling. If, however, the switch circuit of Fig 13 is used, it remains possible to place any single transputer at an arbitrarily chosen point in the network.
A number of devices according to any embodiment described above can be combined to form a larger device, by using a network of connections between the devices. A preferred network 152 is shown in Fig 15. It comprises 16 devices 150 connected to form two cubes, one inside the other. Each device 150 is at the vertex of one of the cubes and is connected by connections 154 to three other devices along cube edges, and to the device 150 at the corresponding position on the other cube. The use here of geometrical terms such as "cube", "edge" etc is figurative. The geometry of the larger device can be varied, without changing the topology of the connections.
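One convenient way to describe the Fig 15 topology, purely as an illustration with a labelling of my own choosing, is to number the sixteen devices 0 to 15 and treat one bit of the number as selecting the inner or outer cube; each device then has exactly four neighbours.

def two_cube_neighbours(v):
    # v in 0..15: bits 0-2 give the position on a cube, bit 3 selects the
    # inner or outer cube. Flipping one of bits 0-2 moves along a cube edge;
    # flipping bit 3 jumps to the corresponding vertex of the other cube.
    return [v ^ bit for bit in (1, 2, 4, 8)]

network = {v: two_cube_neighbours(v) for v in range(16)}
assert all(len(neighbours) == 4 for neighbours in network.values())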
Although several of the embodiments described above use modified Benes networks, many other types of switch network could be used, chosen according to the versatility which the network is required to have, and taking account of practical considerations such as manufacturing costs and the suitability of a particular circuit for implementation in a particular technology. A cross-bar switch network is a possible alternative to the Benes networks.

Claims

1. A data processing device comprising a plurality of computers, a switch network for effecting connections between the computers, and control means which, in use, receives an instruction defining an algorithm to be executed by the device, translates the algorithm into sub-algorithms for execution in parallel by respective computers, programs the computers to execute the sub-algorithms, and controls the switch network to provide direct connections between any computers which must communicate for the execution of their respective sub-algorithms.
2. A device according to claim 1, wherein each computer is a transputer.
3. A device according to claim 1 or 2, wherein the instruction defines an OCCAM process and wherein the control means translates the OCCAM process into component OCCAM processes for execution by respective computers.
4. A device according to any preceding claim, wherein the switch network comprises a switch circuit having a first and a second plurality of connections for respective computer ports, and being operable under the control of the control means to connect the first plurality of connections in any permutation to the second plurality of connections.
5. A device according to claim 4, wherein at least some of the computers each have first and second data ports, all of the first ports being connected to respective connections of the said first plurality, and all of the second ports being connected to respective connections of the said second plurality.
6. A device according to claim 5, wherein at least some of the computers form a group each having a plurality of pairs of data ports, and wherein corresponding pairs of data ports from each computer of the group are connected to respective common switch circuits as aforesaid, the ports of each pair being connected, respectively, to one of the first plurality of connections and to one of the second plurality of connections.
7. A device according to claim 6, wherein each computer of the group has a number of ports for connection to other computers of the group, the said number being a power of two, and wherein the control means determines the network of connections required between the members of the group for the execution of the algorithm, derives from the said network a set of derived networks equivalent to the said network, each derived network requiring only two connections to any computer, and controls each switch circuit to connect the ports with the same topology as a respective derived network.
8. A device according to claim 4, wherein each computer has a plurality of ports for input and/or output, and the switch network comprises a like plurality of switch circuits for connecting corresponding ports of a computer to the corresponding port of any other computer.
9. A device according to any of claims 4 to 8, wherein the or each switch circuit comprises a Benes network.
10. A device according to claim 9, wherein the or at least one of the Benes networks is a modified Benes network which comprises an additional column of switches and wherein the auxiliary apparatus is connected into a connection between the additional column of switches and an adjacent column of switches in the Benes network, whereby computers may be connected to the interface means through the Benes network.
11. A device according to any preceding claim, further comprising interface means for providing input and output facilities for the device, and which may be connected to the transputers by means of the switch network.
12. A device according to claims 10 and 11, wherein the auxiliary apparatus comprises the interface means.
13. A device according to claim 11 or 12, wherein the interface means comprises a transputer.
14. A device according to any preceding claim, wherein the control means may be connected to the computers by means of a switch network.
15. A device according to claims 14 and 10, wherein the auxiliary apparatus comprises the interface means.
16. A device according to any of claims 1 to 3, wherein at least some of the computers each have first and second data ports and form a notional ring of connectable computers, the device further comprising a notional ring of switch means each having four terminals and being operable to connect the first and second terminals in either permutation to the third and fourth terminals, the first terminal being connected to a first port of an associated computer of the ring of computers, the second terminal being connected to the fourth terminal of the next switch means in the ring of switches, and the third terminal being connected to the second port of the
computer next in the ring of computers to the computer associated with that switch means.
17. A device according to claim 16, wherein a switch means as aforesaid is associated with every computer in the notional ring of computers.
18. A device according to any preceding claim, wherein each connection provided by the switch network is one bit wide.
19. A device according to any of claims 1 to 17, wherein each connection provided by the switch network is a duplex connection one bit wide in each direction.
20. A device according to any preceding claim, wherein the control means comprises bulk memory accessible to the computers by communication with the control means.
21. A device according to claim 20, wherein the memory of each computer used for the execution of sub-algorithms is used for program storage only.
22. A device according to any preceding claim, wherein the control means comprises a transputer.
23. A data processing device substantially as described above with reference to the accompanying drawings.
24. A data processing device comprising a plurality of computers operable in parallel, means for providing communication between the computers, and control means controlling the computers and the communication means, wherein each computer is a device according to any preceding claim.
25. A device according to claim 24, comprising sixteen devices according to any of claims 1 to 23, and wherein the communication means connects the sixteen devices by means of connections having the topology of two cubes, each of the sixteen devices being at a respective cube vertex and being connected to three neighbouring devices along cube edges and to the device at the corresponding vertex of the other cube.
PCT/GB1986/000514 1985-08-30 1986-08-29 A data processing device WO1987001485A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB8521672 1985-08-30
GB858521672A GB8521672D0 (en) 1985-08-30 1985-08-30 Data processing device

Publications (1)

Publication Number Publication Date
WO1987001485A1 true WO1987001485A1 (en) 1987-03-12

Family

ID=10584522

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1986/000514 WO1987001485A1 (en) 1985-08-30 1986-08-29 A data processing device

Country Status (5)

Country Link
US (1) US5016163A (en)
EP (1) EP0271492A1 (en)
JP (1) JPS63501986A (en)
GB (1) GB8521672D0 (en)
WO (1) WO1987001485A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0304285A2 (en) * 1987-08-19 1989-02-22 Fujitsu Limited Network control system
GB2211638A (en) * 1987-10-27 1989-07-05 Ibm Simd array processor
GB2213620A (en) * 1985-11-13 1989-08-16 Sony Corp Data processing systems
GB2194085B (en) * 1986-07-24 1990-07-04 Gec Avionics Bus
EP0588021A2 (en) * 1992-09-17 1994-03-23 International Business Machines Corporation Switch-based personal computer interconnection apparatus
US5434972A (en) * 1991-01-11 1995-07-18 Gec-Marconi Limited Network for determining route through nodes by directing searching path signal arriving at one port of node to another port receiving free path signal

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5093920A (en) * 1987-06-25 1992-03-03 At&T Bell Laboratories Programmable processing elements interconnected by a communication network including field operation unit for performing field operations
FR2655169B1 (en) * 1989-11-30 1994-07-08 Bull Sa PROCESSOR WITH MULTIPLE MICROPROGRAMMED PROCESSING UNITS.
US5218709A (en) * 1989-12-28 1993-06-08 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Special purpose parallel computer architecture for real-time control and simulation in robotic applications
US5185860A (en) * 1990-05-03 1993-02-09 Hewlett-Packard Company Automatic discovery of network elements
US5734921A (en) * 1990-11-13 1998-03-31 International Business Machines Corporation Advanced parallel array processor computer package
US5708836A (en) * 1990-11-13 1998-01-13 International Business Machines Corporation SIMD/MIMD inter-processor communication
US5625836A (en) * 1990-11-13 1997-04-29 International Business Machines Corporation SIMD/MIMD processing memory element (PME)
US5794059A (en) * 1990-11-13 1998-08-11 International Business Machines Corporation N-dimensional modified hypercube
US5765015A (en) * 1990-11-13 1998-06-09 International Business Machines Corporation Slide network for an array processor
US5765011A (en) * 1990-11-13 1998-06-09 International Business Machines Corporation Parallel processing system having a synchronous SIMD processing with processing elements emulating SIMD operation using individual instruction streams
US5815723A (en) * 1990-11-13 1998-09-29 International Business Machines Corporation Picket autonomy on a SIMD machine
US5590345A (en) * 1990-11-13 1996-12-31 International Business Machines Corporation Advanced parallel array processor(APAP)
US5828894A (en) * 1990-11-13 1998-10-27 International Business Machines Corporation Array processor having grouping of SIMD pickets
US5963746A (en) * 1990-11-13 1999-10-05 International Business Machines Corporation Fully distributed processing memory element
US5963745A (en) * 1990-11-13 1999-10-05 International Business Machines Corporation APAP I/O programmable router
US5630162A (en) * 1990-11-13 1997-05-13 International Business Machines Corporation Array processor dotted communication network based on H-DOTs
US5809292A (en) * 1990-11-13 1998-09-15 International Business Machines Corporation Floating point for simid array machine
US5588152A (en) * 1990-11-13 1996-12-24 International Business Machines Corporation Advanced parallel processor including advanced support hardware
US5617577A (en) * 1990-11-13 1997-04-01 International Business Machines Corporation Advanced parallel array processor I/O connection
EP0485690B1 (en) * 1990-11-13 1999-05-26 International Business Machines Corporation Parallel associative processor system
US5966528A (en) * 1990-11-13 1999-10-12 International Business Machines Corporation SIMD/MIMD array processor with vector processing
US5765012A (en) * 1990-11-13 1998-06-09 International Business Machines Corporation Controller for a SIMD/MIMD array having an instruction sequencer utilizing a canned routine library
US5774698A (en) * 1991-02-22 1998-06-30 International Business Machines Corporation Multi-media serial line switching adapter for parallel networks and heterogeneous and homologous computer system
US5594918A (en) * 1991-05-13 1997-01-14 International Business Machines Corporation Parallel computer system providing multi-ported intelligent memory
US5347639A (en) * 1991-07-15 1994-09-13 International Business Machines Corporation Self-parallelizing computer system and method
JP2642039B2 (en) * 1992-05-22 1997-08-20 インターナショナル・ビジネス・マシーンズ・コーポレイション Array processor
US5404537A (en) * 1992-09-17 1995-04-04 International Business Machines Corp. Priority interrupt switching apparatus for real time systems
US5355364A (en) * 1992-10-30 1994-10-11 International Business Machines Corporation Method of routing electronic messages
US5640504A (en) * 1994-01-24 1997-06-17 Advanced Computer Applications, Inc. Distributed computing network
US5606666A (en) * 1994-07-19 1997-02-25 International Business Machines Corporation Method and apparatus for distributing control messages between interconnected processing elements by mapping control messages of a shared memory addressable by the receiving processing element
US5699536A (en) * 1995-04-13 1997-12-16 International Business Machines Corporation Computer processing system employing dynamic instruction formatting
US6041400A (en) * 1998-10-26 2000-03-21 Sony Corporation Distributed extensible processing architecture for digital signal processing applications
US20030220960A1 (en) * 2002-05-21 2003-11-27 Demoff Jeff S. System and method for processing data over a distributed network
EP1820117A2 (en) 2004-11-11 2007-08-22 International Business Machines Corporation Concurrent flashing of processing units by means of network restructuring
US8131909B1 (en) 2007-09-19 2012-03-06 Agate Logic, Inc. System and method of signal processing engines with programmable logic fabric
US7970979B1 (en) * 2007-09-19 2011-06-28 Agate Logic, Inc. System and method of configurable bus-based dedicated connection circuits
ES2357923B1 (en) 2009-10-16 2012-03-12 Starlab Barcelona Sl DATA PROCESSING SYSTEM AND COMPUTER DEVICE.
EP3634018A1 (en) * 2018-10-02 2020-04-08 Siemens Aktiengesellschaft System for data communication in a network of local devices

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0111399A2 (en) * 1982-11-26 1984-06-20 Inmos Limited Microcomputer

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3979728A (en) * 1973-04-13 1976-09-07 International Computers Limited Array processors
US4365292A (en) * 1979-11-26 1982-12-21 Burroughs Corporation Array processor architecture connection network
US4412303A (en) * 1979-11-26 1983-10-25 Burroughs Corporation Array processor architecture
US4314349A (en) * 1979-12-31 1982-02-02 Goodyear Aerospace Corporation Processing element for parallel array processors
US4344134A (en) * 1980-06-30 1982-08-10 Burroughs Corporation Partitionable parallel processor
US4466061A (en) * 1982-06-08 1984-08-14 Burroughs Corporation Concurrent processing elements for using dependency free code
ZA838877B (en) * 1982-12-02 1984-09-26 Gen Electric Telecommunication exchanges
US4523273A (en) * 1982-12-23 1985-06-11 Purdue Research Foundation Extra stage cube
US4546428A (en) * 1983-03-08 1985-10-08 International Telephone & Telegraph Corporation Associative array with transversal horizontal multiplexers
GB8329509D0 (en) * 1983-11-04 1983-12-07 Inmos Ltd Computer
US4636948A (en) * 1985-01-30 1987-01-13 International Business Machines Corporation Method for controlling execution of application programs written in high level program language

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0111399A2 (en) * 1982-11-26 1984-06-20 Inmos Limited Microcomputer

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Computer Design, Vol. 24, No. 11, 1 September 1985 (Littleton, US) R. ASBURY et al.: "Concurrent Computers Ideal for Inherently Parallel Problems", pages 99-102,104,106,107 see figure on page 101 *
I.E.E.E. Electro, Vol. 8, 1983 (New York, US) M.J. KNUDSEN: "Musec, a Powerful Data-Flow Network of Signal Microprocessors", paper 6/5, pages 1-5, see the entire document *
IEE Proceedings, Vol. 131, Pt F, No. 6, October 1984 (Old Woking, GB) R. TAYLOR: "Signal Processing with Occam and the Transputer", pages 610-614, see the entire document *
IEEE Transactions on Computers, Vol. C-31, No. 2, February 1982 (New York, US) D. NASSIMI et al.: "Parallel Algorithms to Set up the Benes Permutation Network", pages 148-154, see page 148, left-hand column, line 1 - right-hand column, line 5; figures 1,2 (cited in the application) *
IEEE Transactions on Computers, Vol. C-33, No. 4, April 1984 (New York, US) L.N. BHUYAN et al.: "Generalized Hypercube and Hyperbus Structures for a Computer Network", pages 323-333, see the entire document *
Proceedings of the 1983 International Conference on Parallel Processing, August 23-26, 1983 (New York, US) TSUTOMU HOSHINO et al.: "Highly Parallel Processor Array "PAX" for Wide Scientific Applications", pages 95-105, see page 96, left-hand column, lines 1-24; page 96, right-hand column lines 8-17 *
The 5th International Conference on Distributed Computing Systems, May 13-17, 1985 (New York, US) M.LEE et al.: "Network Facility for a Reconfigurable Computer Architecture", pages 264-271, see page 265, left-hand column, lines 1-5; figure 2; page 270, left-hand column, line 15- page 271, left-hand column, line 13; figures 12-14 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2213620A (en) * 1985-11-13 1989-08-16 Sony Corp Data processing systems
GB2213620B (en) * 1985-11-13 1990-04-25 Sony Corp Data processing systems
GB2183067B (en) * 1985-11-13 1990-04-25 Sony Corp Data processing
GB2194085B (en) * 1986-07-24 1990-07-04 Gec Avionics Bus
US4992933A (en) * 1986-10-27 1991-02-12 International Business Machines Corporation SIMD array processor with global instruction control and reprogrammable instruction decoders
EP0304285A2 (en) * 1987-08-19 1989-02-22 Fujitsu Limited Network control system
EP0304285A3 (en) * 1987-08-19 1989-08-09 Fujitsu Limited Network control system
US5420982A (en) * 1987-08-19 1995-05-30 Fujitsu Limited Hyper-cube network control system having different connection patterns corresponding to phase signals for interconnecting inter-node links and between input/output links
GB2211638A (en) * 1987-10-27 1989-07-05 Ibm Simd array processor
US5434972A (en) * 1991-01-11 1995-07-18 Gec-Marconi Limited Network for determining route through nodes by directing searching path signal arriving at one port of node to another port receiving free path signal
EP0588021A2 (en) * 1992-09-17 1994-03-23 International Business Machines Corporation Switch-based personal computer interconnection apparatus
EP0588021A3 (en) * 1992-09-17 1997-02-26 Ibm Switch-based personal computer interconnection apparatus

Also Published As

Publication number Publication date
EP0271492A1 (en) 1988-06-22
GB8521672D0 (en) 1985-10-02
US5016163A (en) 1991-05-14
JPS63501986A (en) 1988-08-04

Similar Documents

Publication Publication Date Title
US5016163A (en) Parallel processing system including control computer for dividing an algorithm into subalgorithms and for determining network interconnections
US5689661A (en) Reconfigurable torus network having switches between all adjacent processor elements for statically or dynamically splitting the network into a plurality of subsystems
US4635250A (en) Full-duplex one-sided cross-point switch
US6314487B1 (en) Adaptive routing controller of a crossbar core module used in a crossbar routing switch
US5797035A (en) Networked multiprocessor system with global distributed memory and block transfer engine
US6745317B1 (en) Three level direct communication connections between neighboring multiple context processing elements
EP1384158B1 (en) An apparatus for controlling access in a data processor
JP2000232354A (en) Programmable device
JP2004133781A (en) Array processor
Wittie Efficient message routing in mega-micro-computer networks
JP4154124B2 (en) Parallel processor system
US20220019552A1 (en) Routing in a Network of Processors
JP3987782B2 (en) Array type processor
US6823443B2 (en) Data driven type apparatus and method with router operating at a different transfer rate than system to attain higher throughput
EP0304285B1 (en) Network control system
JPH04113444A (en) Bidirectional ring bus device
JP2009059346A (en) Method and device for connecting with a plurality of multimode processors
US20040250047A1 (en) Method and apparatus for a shift register based interconnection for a massively parallel processor array
US11520726B2 (en) Host connected computer network
JP3375658B2 (en) Parallel computer and network for it
JPS63257052A (en) Multiprocessor system
JP3661932B2 (en) Parallel computer system and crossbar switch
US9081901B2 (en) Means of control for reconfigurable computers
JPH0282342A (en) Data communication equipment
JPH01207864A (en) Inter-module connection control circuit

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LU NL SE

WWE Wipo information: entry into national phase

Ref document number: 1986905338

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1986905338

Country of ref document: EP

WWR Wipo information: refused in national office

Ref document number: 1986905338

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1986905338

Country of ref document: EP