WO2002099685A1

WO2002099685A1 - Method and circuit arrangement advantageously provided for conducting parallel cyclically repeating data processing

Info

Publication number: WO2002099685A1
Application number: PCT/HU2002/000049
Authority: WO
Inventors: István FLEDRICH
Original assignee: Afca-System Kft
Priority date: 2001-06-06
Filing date: 2002-05-29
Publication date: 2002-12-12
Also published as: HUP0102356A2; HU0102356D0

Abstract

The inventive method associates a global address to an instruction, which is jointly used by processing units, stored in a global instruction memory and routed to a global instruction bus. Local memory data indicate, in local data memories, the addresses modified in the local data memories. The data selected in local data memories take part in the procedures, which are determined by local arithmetic/logic units, by the global arithmetic/logic unit and by the global instruction bus. The circuit arrangement is comprised of a processing block, which is comprised of at least two local processing units. In the processing units, a local address converter is connected via the local address bus to a local data memory, a local data memory is connected via the local data bus to a local arithmetic/logic unit, on one side, and to the local address converter on the other side. The inventive circuit arrangement also includes at least one so-called global address generator, at least one global instruction memory, at least one global control unit, at least one global arithmetic/logic unit and at least one external coupling unit.

Description

Method and circuit arrangement, advantageous for parallel, cyclically repeating data processing

Various methods are known for increasing the computing capacity of the electronic data processing systems.

In single-processor systems, the increase in speed is associated with an increase in the system clock rate (depending on the respective technological circumstances), with an increase in the data processing bit width or, for example, with the use of so-called pipelined instruction execution.

A speed increase beyond the maximum limit specified by the current technology can only be achieved with parallel data processing.

The different data processing systems are also characterized by different data processing methods.

The SISD method (single instruction, single data: the "von Neumann machine") has become widespread in single-processor systems, and the SIMD (single instruction, multiple data) and MISD (multiple instruction, single data) methods in pipelined instruction execution.

In multiprocessor systems, the SIMD (single instruction, multiple data) and MIMD (multiple instruction, multiple data) methods are primarily used, as described in detail in the specialist literature (e.g. Bóna-Erenyi-Vajda: Többmikroprocesszoros rendszerek).

Solutions suitable for increasing the speed are also described in various patent applications.

Patent application EP0917056 (US19970064250P) describes a method and processor arrangement that is suitable for parallel cooperation and consists of microprocessors, memory units and input / output units. In the described system, the current activities in the processing units are determined by independent elementary operating programs. The cooperation of the processing units is ensured by a suitable hardware expansion and by software mechanisms. The advantage of this arrangement is that, after the tasks and data have been distributed, the elementary programs execute the tasks simultaneously, which increases the processing speed.

The processing speed is limited by the synchronization procedures; in addition, the local address preparation and the local data traffic require additional process cycles.

With a larger number of units working in parallel, the time required for the distribution of tasks and data can be considerable.

An increase in processing speed is achieved by forming jointly handled data groups and processing the data groups in parallel, as in the method and processor arrangement proposed in patent application DE 19835216 A1.

The procedure is very effective when performing some tasks (FFT, FIR, ...), mainly due to the possibility of moving data in parallel. However, parallel data access is only possible for the segments of the previously grouped data.

For tasks that require regular data grouping, the time consumption can be significant, which leads to a reduction in overall performance.

Patent application DE19854810 aims to optimize the processes in systems with processors operating in parallel by hiding the empty instructions (NOP) when instructions and data are provided from the main memory. Avoiding empty cycles increases system capacity and processing speed.

In general it can be said that with the known multiprocessor systems the increase in the number of processors working in parallel is associated with a non-linear increase in computing capacity. When using a very high number of processors, the required data movement and process synchronization tasks lead to considerable loss of time and as a result a further increase in the number of processors becomes pointless.

With the method proposed here, we intend to offer a method for processors working in parallel that achieves higher operating speeds than the known methods.

We have recognized that with the use of double, sequential addressing, an approximately linear increase in performance (of the control and computer operations) can be achieved depending on the number of processing units.

The main idea in our solution is that several data are assigned to one command by means of several addresses (SEVLAD for short: "Single Instruction, Multiple Address and Multiple Data" method).

The method proposed in the patent application makes it possible to process several data in parallel with the aid of adapted programs and with their control by means of a command assigned to a so-called global address.

The procedure is limited only by the number of processing units arranged in parallel and is carried out with any number of data selected by means of so-called local addresses. The double addressing and the common command, in cooperation with suitable peripherals, enable very fast and effective data exchange between the individual elements and also with the external units.

The invention relates, on the one hand, to a method for the parallel processing of cyclically repeating data processing tasks, on the other hand to a circuit arrangement for realizing the method, and also to a program system for generating and simulating the code required for its operation.

In the method according to the invention, a so-called global address defined by a global address generator designates the command that is used jointly by the processing units, stored in a global command memory and forwarded on a global command bus.

The global address also selects the local memory data held in the address memories of the address converters of the processing units, which are combined in a processing block and work in parallel with one another.

The local memory data, modified in the local address converters, indicate addresses in the local data memories.

The data selected in local data memories participate in the procedures defined by local arithmetic / logic units and the global arithmetic logic unit, as well as by the global command bus.

Likewise, the bit data associated with the data selected in the local data memories appear on the bit bus, and the prescribed current procedures are carried out with them in the global arithmetic / logic unit.

The internal data exchange between the processing units, the global address generator and the global data memory is carried out over a global data bus under the control of the global control bus; the global control bus is driven by the output of a global control unit as a function of the global address bus and the global command bus. Supplementing the units intended for internal data exchange with a coupling interface for external control and with the associated control lines also enables external data exchange.

At this point it should be pointed out that the local processing units are able to carry out procedures independently in the sense that they have the local addresses and data and can use them to carry out procedures defined by the global command memory.

The global address generator creates the global addresses while the program is running. The global address formed at the output defines the current procedure by addressing the global command memory and, modified in the local address converter, selects local data. In order to avoid multiple data storage, the global data storage is used. The data selected here can be used by all local processing units at the same time.
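The double addressing described above can be illustrated with a minimal sketch. It is not the circuit of the invention: the three-command instruction set, the `LocalUnit` class and the simple counting address generator are assumptions made only for illustration, and the address converter is reduced to a plain table lookup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical instruction set: each global command names one ALU operation.
OPS: Dict[str, Callable[[int, int], int]] = {
    "load": lambda acc, x: x,
    "add": lambda acc, x: acc + x,
    "mul": lambda acc, x: acc * x,
}


@dataclass
class LocalUnit:
    """One local processing unit: address memory, data memory, accumulator."""
    address_memory: List[int]  # indexed by the global address
    data_memory: List[int]     # indexed by the resulting local address
    acc: int = 0

    def step(self, global_address: int, command: str) -> None:
        # The global address selects a local address (table lookup only).
        local_address = self.address_memory[global_address]
        operand = self.data_memory[local_address]
        self.acc = OPS[command](self.acc, operand)


def run(command_memory: List[str], units: List[LocalUnit]) -> List[int]:
    """Assumed address generator: counts 0, 1, 2, ... through the program."""
    for global_address, command in enumerate(command_memory):
        for unit in units:  # simultaneous in hardware, sequential here
            unit.step(global_address, command)
    return [u.acc for u in units]


program = ["load", "mul", "add"]
units = [
    LocalUnit(address_memory=[0, 1, 2], data_memory=[2, 3, 4]),  # 2 * 3 + 4
    LocalUnit(address_memory=[2, 0, 1], data_memory=[5, 6, 7]),  # 7 * 5 + 6
]
results = run(program, units)
print(results)  # [10, 41]
```

Both units execute the same three commands, yet each one touches different positions of its own data memory because its address memory maps the shared global address differently.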

Program generation is advantageously carried out on a computer with an assembler, compiler or optimizer suitable for this purpose.

The local addresses and output data of the program are stored in the local processing units (or in the global data memory) under the supervision of the global control unit. Access to the results of the program run is guaranteed at any point in the system.

Processing the data traffic of the bit bus through the global data bus can be advantageous because this leads to a reduction in the total number of data lines.

It is helpful to ensure local analog and digital data exchange between the local processing units and the local output / input buses. (This can be achieved in cooperation with the local output / input coupling units, under the control of the global command bus and the global control bus.)

The method presented in the invention proves to be suitable for ensuring parallel data exchange between the local processing units which form the processing block.

This task can be solved advantageously by channel processing units that have channel driver outputs, numbering at least one and at most the number of processing units involved in the parallel processing, and channel reception inputs, likewise numbering at least one and at most the number of processing units involved in the parallel processing. The data traffic between the local data buses, the local auxiliary data buses, the channel driver outputs and the channel reception inputs is determined by a global command bus, a global control bus and local control lines.

The selection of the active channel reception input is ensured with the input multiplexer, the active channel driver output with the output demultiplexer. A system with only one channel receive input does not need a multiplexer and with only one channel driver output does not need a demultiplexer.

For some tasks, it is advantageous to equip the directly parallel data channels with directly parallel data channel switches for the parallel data traffic that is processed between the local processing units.

The data channel switches are primarily controlled by the global data bus, the global command bus and / or the global auxiliary data bus. The use of data channel switches ensures the controlled splitting of the directly parallel data channels. It can also be helpful if the data traffic is influenced by the network formed from the data bus sections generated from the data channel switches. The simultaneous use of the data channels with local output / input coupling units leads to a further improvement.

It is very advantageous to supplement the data channel control units with output / input channel control units. (Their control enables the reception and output of local analog and digital signals through the local input / output buses.)

It is similarly advantageous to supplement the data channel control units with external data bus control units. (Their control enables parallel data traffic through external data buses).

If necessary, the local address data can also be stored in the local data memories. (This saves storage units, but it reduces the processing speed. In this embodiment, the local address data reach the local address converter through the local data buses.)

To avoid multiple data storage, the shared data must be stored in a global data memory that has an address converter and data memory. (Data traffic with the global data memory is guaranteed by the global data bus and controlled by the global address bus and global control bus).

It is similarly advantageous to supplement the global command bus with a global auxiliary data bus. The data stored in a global auxiliary data memory and selected directly by a global address bus can then be passed, under the control of the global control bus, through the global auxiliary data bus into the global arithmetic / logic unit, where they participate in global arithmetic / logic procedures or in the execution of control tasks.

For carrying out input / output procedures, it is equally advantageous to use a global output / input control unit controlled by the global control bus. This enables the input / output data traffic determined by the global control unit via the control bus to be carried out between the data bus of the global output / input control unit and a global data bus.

It is also advantageous to equip the data bus of the global output / input control unit with registers in the input and output directions, and the control bus of the global output / input control unit with registers in the output direction.

In addition to controlling the data traffic, the control signals defined by the program also allow the output of external addresses, whereby the processing speed of the periphery can be freely adjusted up to the limit specified by the cycle time.

Another advantage is provided by using an interruption control unit controlled by a global control bus.

The interrupt control unit forwards the interrupt requests arriving at the single-cycle and / or multi-cycle interrupt inputs through the interrupt control bus to the global control unit. After the interrupt has been accepted, the specified procedure is carried out and confirmed by setting the interrupt confirmation output.

The invention also relates to the circuit arrangement with which the methods described above can be implemented.

The circuit arrangement corresponding to the invention consists of a so-called processing block, which is formed from at least two local processing units. In the processing units, a local address converter is connected through the local address bus to a local data memory, and the local data memory is connected through the local data bus on the one hand to a local arithmetic / logic unit and on the other hand to the local address converter. In addition, the circuit arrangement corresponding to the invention includes at least one so-called global address generator, at least one global command memory, at least one global control unit, at least one global arithmetic / logic unit and at least one external coupling unit.

The global address generator is connected to the processing block, to the global data memory, to the global arithmetic / logic unit and to the external coupling unit through the global address bus. The global command memory is connected to the processing block, to the global arithmetic / logic unit and to the global control unit through the global command bus.

The global control unit is connected to the processing block, to the global address generator, to the global data memory, and to the global arithmetic / logic unit through the global control bus.

The processing block is bidirectionally connected to the global arithmetic logic unit through the global bit bus.

The external coupling unit is connected to an external unit (e.g. computer) and to the global arithmetic logic unit.

The circuit arrangement described above and corresponding to the invention thus forms a control system. It does not lead to any substantial change in the sense of the invention if the system capacity is increased during the implementation with an increase in the control system or its units.

It is advantageous to expand the circuit arrangement with local output / input coupling units that are connected to the local processing units by bidirectional local data buses and to the global command bus and the global control bus. At the same time, the local output / input buses are connected to the local output / input coupling units.

It is just as advantageous to expand the circuit arrangement with local data channel managers connected to the local processing units by bidirectional local data buses, which are connected to the local processing units at least with a local auxiliary data bus and additionally with local control lines.

The global control bus, global command bus and the local control lines are connected to the data channel manager as input. At the same time, the data channel manager is also connected to the directly parallel data channel outputs and the directly parallel data channel inputs.

Further advantages are to be expected when expanding the circuit arrangement with directly parallel data channel switches connected to the data channel manager, which enable the direct parallel data channels to be divided into sections or switched into the network. The global control bus, global command bus and the global auxiliary data bus are attached to the data channel manager.

It is advantageous to expand the local processing units simultaneously with local output / input coupling units and with directly parallel data channel managers.

A considerable advantage can be expected if the direct parallel data channels (Direct Parallel Access, hereinafter DPA) are expanded, as inputs and outputs, with DPA output / input channel control units and DPA external data bus control units. The DPA output / input channel control units are connected to the global command bus, to the local output / input buses and to the DPA output / input channels. The DPA external data bus control units are connected to the global control bus, the external data bus and the DPA output / input channels.

It is expedient to expand the control system (circuit arrangement) with at least one global auxiliary data memory which is attached to the global control bus, to the global address bus, through the global auxiliary data bus to the global arithmetic logic unit and to the global data bus.

It may be helpful to expand the control system with at least one of the global data memories connected to the global control bus, the global address bus and the global data bus.

Equally advantageous is the expansion of the control system with at least one interruption control unit to which the global control bus, the single and multi-cycle interruption inputs, the interruption control bus which connects in the direction of the global control unit, the interruption confirmation output and the global data bus are connected.

Also advantageous is the expansion of the control system with at least one global output / input control unit, which is connected to the global control bus, the control bus of the global output / input control unit, the global data bus and the data bus of the global output / input control unit.

The invention also relates to a suitable program system which is responsible for the generation and simulation of program codes and generates program and data codes for the circuit arrangement corresponding to the method.

The program system according to the invention divides the tasks into elementary procedures and then brings the elementary procedures into a form that is optimized for the current system structure and for parallel processing and that can be loaded into the system, taking into account the dependencies of the elementary procedures on each other, on time and on the specified sequence of operations.

The extended form of the program system is able to simulate the generated program code, or allows the code to be executed step by step or in a controlled section on a suitable graphic / text output surface, so that the results of the parallel procedures can be followed during the process.
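How elementary procedures might be arranged for parallel execution while respecting their mutual dependencies can be sketched with a simple greedy list scheduler. This is not the optimizer of the program system, whose algorithm the text does not disclose; the function name `schedule`, the procedure names and the dependency table are hypothetical.

```python
from typing import Dict, List, Set


def schedule(deps: Dict[str, Set[str]], n_units: int) -> List[List[str]]:
    """Greedy list scheduling.

    Each cycle runs at most n_units procedures whose prerequisites
    all completed in earlier cycles.
    """
    remaining = {op: set(d) for op, d in deps.items()}
    done: Set[str] = set()
    cycles: List[List[str]] = []
    while remaining:
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic dependency between elementary procedures")
        batch = ready[:n_units]
        cycles.append(batch)
        done.update(batch)
        for op in batch:
            del remaining[op]
    return cycles


# Hypothetical task: three independent multiplications feeding two sums.
deps = {
    "m1": set(),
    "m2": set(),
    "m3": set(),
    "s1": {"m1", "m2"},
    "s2": {"s1", "m3"},
}
plan = schedule(deps, n_units=2)
print(plan)  # [['m1', 'm2'], ['m3', 's1'], ['s2']]
```

With two units, the five procedures fit into three cycles; a real code generator would also have to decide where each operand is stored, which this sketch ignores.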

In the following, we present the invention through figures and characteristic tasks, in which:

- FIG. 1 shows the structure of the processing block and of the processing units corresponding to the invention,

- FIG. 2 shows the advantageous structure of the processing block, supplemented with local input / output channel units,

- FIG. 3 shows the advantageous structure of the processing block, supplemented with local data channel processing units,

- FIG. 4 shows a possible structure of the data channel processing units of the processing block,

- FIG. 5 shows a preferred embodiment of the processing units in which the local input / output channels are equipped with local data channel management units,

- FIG. 6 shows the local data channel management units, supplemented with DPA input / output data channel management units and external data bus management units,

- FIG. 7 shows the complete control system corresponding to the block structure (circuit structure) according to the invention,

- FIG. 8 shows the extended control system equipped with useful additions,

- Table 1, Example 1: execution of global bit operations,

- Table 2, Example 2: word formation from any data bits,

- Table 3, Example 3: distribution of data for the LU decomposition of a 7x7 matrix,

- Table 4, Example 4: distribution of data in the LU decomposition of two 3x3 matrices,

- Table 5a, Example 5: distribution of data in the multiplication of two 3x3 matrices with 3 vectors,

- Table 5b, Example 5: program steps,

- Table 6a, Example 6: other distribution of data when multiplying two 3x3 matrices by 3 vectors,

- Table 6b, Example 6: program steps,

- Table 7, Example 7: distribution of data when multiplying an 8x8 matrix by 8 vectors.

FIG. 1 shows the structure of the processing block E according to the invention, with the aid of which the procedure is carried out (in conjunction with suitable global units and using task-adapted program codes, of course). Processing block E consists of any number, but advantageously at least two, local processing units D1, D2, ..., Dn. Each local processing unit D1, D2, ..., Dn contains a local address converter A, a local data memory B and a local arithmetic / logic unit C.

The local address converters A are connected to the local data memories B by local address buses LA1, LA2, ..., LAn. The local data memories B are connected to the local arithmetic / logic units C by local data buses LD1, LD2, ..., LDn. The local data memories B are connected to the local address converters A by the bidirectional local data buses LD1, LD2, ..., LDn.

Processing block E according to the invention is connected by the global address bus GA, global control bus GC, global command bus GI and global data bus GD, and further by the global bit bus GB, to the additional units of the circuit arrangement (control system), as is explained in connection with FIG. 7. Depending on its design, the local address converter includes an address memory addressed from the global address bus GA, data registers which are provided for storing the address values and auxiliary values selected in the address memory, and a local address computer logic unit. The local address computers, as logic units, determine the output signals of the address converters on the basis of the address data, the content of the data registers and the state of the global address bus GA:

- directly, that is, the data value selected by the global address bus GA in the local address memory appears at the output of the address converter as the result address,

- with offset formation, the result address being formed by the local address computer of the address converter on the basis of the data values previously stored in the registers and selected in the local address memory,

- corrected by the state of the global address bus GA, the result address being formed by the local address computer of the address converter on the basis of the data previously stored in the registers and of the address bus lines selected according to the state.

During the execution of inner-cycle procedures, the registers in the address converters A, in cooperation with the local address computer logic, enable different data to be assigned to the same program segment in each cycle by specifying base addresses of the data handled in parallel. Through the assigned address lines, the global address bus GA serves as an argument of the address converter A. This allows the size of the address memory to be reduced.
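The three ways of forming the result address can be sketched as follows. The class `AddressConverter`, the single `base_register` and the way the low address lines are combined in the third mode are assumptions chosen for illustration; the text does not fix the exact arithmetic of the local address computer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AddressConverter:
    """Simplified local address converter (illustrative assumption)."""
    address_memory: List[int]  # addressed from the global address bus GA
    base_register: int = 0     # hypothetical register holding a base address

    def direct(self, ga: int) -> int:
        # Mode 1: the selected address-memory value is the result address.
        return self.address_memory[ga]

    def with_offset(self, ga: int) -> int:
        # Mode 2: register content plus the selected address-memory value.
        return self.base_register + self.address_memory[ga]

    def ga_corrected(self, ga: int, low_bits: int) -> int:
        # Mode 3 (assumed form): the low GA lines pass through as an
        # argument, so one address-memory entry serves 2**low_bits addresses.
        mask = (1 << low_bits) - 1
        return (self.base_register
                + self.address_memory[ga >> low_bits]
                + (ga & mask))


conv = AddressConverter(address_memory=[10, 20], base_register=100)
print(conv.direct(1))           # 20
print(conv.with_offset(1))      # 120
print(conv.ga_corrected(5, 2))  # 100 + 20 + 1 = 121

# Same program segment, next cycle: only the base address changes.
conv.base_register = 200
print(conv.with_offset(1))      # 220
```

Changing only `base_register` between cycles lets the same global addresses reach a different data block, which corresponds to the inner-cycle use described above; in the third mode a two-entry address memory covers eight global addresses.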

The local addresses generated at the output of the address converter define the data in the local data memories B that are involved in the current processes. The data defined in the local data memories B take part, by way of the local data buses LD1, LD2, ..., LDn, in arithmetic and logical operations in the local arithmetic / logic unit C, and the results can be stored at the selected local positions. Each local arithmetic / logic unit C contains at least one computing unit ALU for performing arithmetic (add, subtract, multiply, ...) and logical (shift, AND, OR, ...) operations. The arithmetic / logic unit C likewise includes a subunit which, depending on the register content and the command, is provided for bit selection. Its task is to pass the selected bits for further processing through the global bit bus GB into the global arithmetic / logic unit P, as can be seen from FIG.

The arithmetic / logic unit C also contains registers for performing various data storage tasks. Some of the registers, which are provided for intermediate storage in arithmetic and logical operations, have the bit width of the local data bus LD. Some local registers can also perform control functions. Bit registers are provided for holding the local state of the selected bits, the carry bits, the result bits (>=, <=, ==, = 0, ...) and the control bits for address and data modification.

The modification of, or access to, the current content of the bit registers, controlled by the global control bus GC, is ensured by the global data bus GD or the global bit bus GB. With a suitable design, the data traffic of the bit bus GB can also be carried over the global data bus GD. This design eliminates the need for a separate global bit bus GB. The processes that take place in the local arithmetic / logic units C are controlled as a function of the global command bus GI and the global control bus GC.

FIG. 2 shows the processing block E_Df corresponding to the structure of the invention, which is expanded with local output / input coupling units F. The local output / input coupling units F are controlled from the global command bus GI and the global control bus GC. Their task is to adapt the local digital and analog signals and to ensure data traffic between the local output / input buses LIO1, LIO2, ..., LIOn and the local data buses LD. For this purpose, they are equipped with suitable devices, such as:
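The role of the bit-selection subunit and of the global bit bus GB can be sketched in a few lines. The helper names and the choice of AND, OR and XOR as global operations are assumptions; the sketch only shows the idea that each local unit contributes one selected bit, which the global arithmetic / logic unit then combines or assembles into a word.

```python
from functools import reduce
from typing import List


def select_bit(word: int, bit_index: int) -> int:
    """Local bit selection: pick one bit of a local data word."""
    return (word >> bit_index) & 1


def bit_bus(words: List[int], bit_indices: List[int]) -> List[int]:
    """One bit per local unit, as it would appear on the bit bus."""
    return [select_bit(w, i) for w, i in zip(words, bit_indices)]


def global_bit_op(bits: List[int], op: str) -> int:
    """Hypothetical global operations on the collected bits."""
    ops = {
        "and": lambda a, b: a & b,
        "or": lambda a, b: a | b,
        "xor": lambda a, b: a ^ b,
    }
    return reduce(ops[op], bits)


def form_word(bits: List[int]) -> int:
    """Word formation: unit 0 supplies the least significant bit."""
    return sum(bit << position for position, bit in enumerate(bits))


words = [0b1010, 0b0110, 0b0001]       # local data of three units
bits = bit_bus(words, bit_indices=[1, 0, 0])
print(bits)                            # [1, 0, 1]
print(global_bit_op(bits, "and"))      # 0
print(global_bit_op(bits, "or"))       # 1
print(form_word(bits))                 # 5
```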

- ports for input / output of digital signals,

- intelligent digital coupling units equipped with independent programs or controlled from the control system,

- D / A converters equipped with independent programs or controlled from the control system for generating analog output signals,

- A / D converters for the acquisition of analog signals, as well as multiplexers, etc.

The number of assigned local output / input coupling units F is arbitrary, but is expediently not higher than the number of local processing units D used.

FIG. 3 shows the structure of the processing block corresponding to the invention, expanded with local data channel managers H. The task of the local data channel managers H is to ensure the direct parallel data traffic (Direct Parallel Access, DPA) between the local processing units D. The DPA is controlled in accordance with the command and is defined by the global command bus GI and the global control bus GC. The number of assigned local data channel managers H is arbitrary, but is expediently not higher than the number of local processing units D used. Each local data channel manager H operates an output channel which is responsible for transferring the states of the local data bus LD and which is in the tri-state condition when inactive. This makes it possible, when the number of local data channel managers H is smaller than the number of local processing units D, for one control unit to operate several data channels.

The local data channel managers H have a number of input channels corresponding to the number of local data channel managers H; these are selected in the local data channel manager H and can be assigned to the local auxiliary data buses SB1, SB2, ..., SBn or to the local address buses LA1, LA2, ..., LAn. In this way the system ensures under program control, for example, that the data state present on the local data bus LD of one processing unit D can participate in the data processing in another, or even simultaneously in all, processing units D.

FIG. 4 shows a possible construction of the local data channel processing units H1, H2, ..., Hn corresponding to the invention and a possible configuration of the directly parallel data channel switches K_DPA_1, ..., K_DPA_x. In the example shown, the local data channel processing units H1, H2, ..., Hn are connected to the local data buses LD1, LD2, ..., LDn, the local control lines LC1, LC2, ..., LCn and the local auxiliary data buses SB1, SB2, ..., SBn. The task of the local auxiliary data buses SB1, SB2, ..., SBn can of course be taken over by the local data buses LD1, LD2, ..., LDn.

The selection of the direct parallel data channel output is ensured by the channel demultiplexer Odemux, whose task is to transfer the content of the local data bus LD to the selected parallel data channel output. This enables the content of the local data bus LD of a local processing unit D to be output on any data channel. This version requires that the connections to the data channels be of the open-collector or tri-state type. Bus contention (or a short circuit) must be prevented by the control program or by additional hardware elements. A channel multiplexer Imux is expediently used for selecting the direct parallel input. A channel demultiplexer Odemux can be used at the same time, but data access is also possible without it. In that case, only one output channel is assigned to each local processing unit D, each local processing unit D driving the output channel corresponding to its position.

When performing some tasks [multiplication of 2 x (3x3) matrices with 2 x vectors (see Example 5)], the use of directly parallel data channel switches K_DPA_1, ..., K_DPA_x is advantageous. Data channel switches are required if several unrelated data items are passed along one parallel data channel. If a data channel switch K_DPA is installed between local processing units D3 and D4, then, once the channel has been split, two independent data items can be passed along one channel.

The use of K_DPA data channel switches is particularly advantageous with a large number of processing units D and a small number of parallel data channels. In the case of more complicated data movements, a data channel network may be required. Simple and fast control from the program is ensured even with a network of arbitrarily complicated composition.
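The interplay of output demultiplexer and input multiplexer can be sketched as a small bus model. The function names, the use of `None` for a tri-stated output and the exception raised on contention are assumptions; the sketch shows one unit driving a channel while several units read it in the same step.

```python
from typing import List, Optional


def drive_channels(ld: List[int],
                   out_select: List[Optional[int]],
                   n_channels: int) -> List[Optional[int]]:
    """Output side: unit i places ld[i] on channel out_select[i].

    None means the unit's output stays tri-stated.
    """
    channels: List[Optional[int]] = [None] * n_channels
    for unit, ch in enumerate(out_select):
        if ch is None:
            continue
        if channels[ch] is not None:
            # Two drivers on one channel must be excluded by the program.
            raise RuntimeError(f"contention on channel {ch}")
        channels[ch] = ld[unit]
    return channels


def read_channels(channels: List[Optional[int]],
                  in_select: List[int]) -> List[Optional[int]]:
    """Input side: unit i reads channel in_select[i] onto its auxiliary bus."""
    return [channels[ch] for ch in in_select]


ld = [11, 22, 33]                       # local data bus states of three units
channels = drive_channels(ld, out_select=[None, 0, None], n_channels=2)
sb = read_channels(channels, in_select=[0, 0, 0])
print(channels)  # [22, None]
print(sb)        # [22, 22, 22]
```

Here the second unit broadcasts its value to all three units in one step; selecting the same channel for two drivers raises an error instead of silently modelling a short circuit.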
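The effect of a data channel switch can be sketched by modelling one channel that runs past all processing units and is cut into independent segments by open switches. The functions `segments` and `transfer`, the zero-based unit indices and the one-driver-per-segment rule are assumptions for illustration.

```python
from typing import List, Optional, Set


def segments(n_units: int, open_switches: Set[int]) -> List[List[int]]:
    """Split units 0..n_units-1 into channel segments.

    open_switches contains k if the switch between unit k and k+1 is open.
    """
    segs: List[List[int]] = []
    current = [0]
    for k in range(n_units - 1):
        if k in open_switches:
            segs.append(current)
            current = []
        current.append(k + 1)
    segs.append(current)
    return segs


def transfer(ld: List[int], drivers: Set[int],
             open_switches: Set[int]) -> List[Optional[int]]:
    """Each segment carries the value of its single driver, if it has one."""
    seen: List[Optional[int]] = [None] * len(ld)
    for seg in segments(len(ld), open_switches):
        active = [u for u in seg if u in drivers]
        if len(active) > 1:
            raise RuntimeError(f"two drivers in segment {seg}")
        if active:
            for u in seg:
                seen[u] = ld[active[0]]
    return seen


ld = [1, 2, 3, 4, 5, 6]
# Switch opened between the third and fourth unit (indices 2 and 3).
print(segments(6, {2}))                                # [[0, 1, 2], [3, 4, 5]]
print(transfer(ld, drivers={0, 5}, open_switches={2}))  # [1, 1, 1, 6, 6, 6]
```

With the switch closed, the same two drivers would collide on the single channel; with it open, the one physical channel carries two independent values in the same step.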

FIG. 5 shows the processing block E_Dfh corresponding to the invention, with local output / input coupler units F and data channel managers H, which contains the units described in FIG. 2 and FIG.

6 shows the control system corresponding to the invention, in that the processing block E_Dh with data channel manager H has an expanded form, through the parallel data channels of the data channel managers H directly parallel data channels DPAj, DPA ₂ ,..., DP On, on the one hand towards the off / Input channel control units H_io _I; ..., H_io _n DPA assigned local output / input buses LIOj, ..., LIO _n , on the other hand by DPA external data bus control units H_exD _1; ..., H_exD assigned external data buses exD _1? ..., exD _{n are} applied. In this system, the local data channel administrators H are still able to carry out the tasks described above unchanged.

When using the data channel manager H, it is advantageous to supplement the DPA output / input channel control units with the local output / input coupling units F described in FIG. This enables the local output / input buses LIO_1, ..., LIO_n, controlled from the system program, to establish a connection with any one processing unit D or, where expedient, with several at the same time. This ensures the dynamic execution of local output / input procedures at high speed.

A further significant advantage is the assignment of the DPA external data bus control units H_exD_1, ..., H_exD_n to the data channel manager H. This enables parallel data traffic with an external storage unit, in which, in cooperation with the internal system programs, a connection can be established between any external data positions and any local processing units D. For example, any external data block can be written within one cycle to the intended position of the local processing units D, or read out from it.

FIG. 7, which is regarded as the main figure, shows a structure of the control system R corresponding to the invention, by means of which the method can be implemented; it contains only those elements that are necessary and sufficient for its function. The main elements of the control system are:

- the global address generator J,

- the global command memory K,

- the global control unit M,

- the external coupling unit T,

- the global arithmetic / logic unit P,

- the processing block E.

The processing block E has at least two local processing units D_1, ..., D_n, or Df, Dh, Dhf, which were described in connection with Figs. 1-3. The global address generator J is connected through the global address bus GA to the processing block E, the global command memory K, the global control unit M and the external coupling unit T.

The global command memory K is connected to the processing block E, the global arithmetic / logic unit P and the global control unit M through the global command bus GI. The global control unit M is connected by a global control bus GC to the processing block E, the global address generator J, the global command memory K and the global arithmetic / logic unit P. The processing block E is bidirectionally connected to the global arithmetic / logic unit P through a global bit bus GB. The external coupling unit T is connected, through the control lines ExV of the external coupling unit, to an external unit EX (e.g. a computer) and to the global control unit M.

We note here that the global address bus GA can, if necessary, also be controlled from the side of the external coupling unit T. When the system is assembled, system performance can be increased by enlarging the control system R or its components.

The size of the global address generator J corresponds to the width of the global address bus GA. It contains a counter that is advanced by a clock signal and whose content can be overwritten depending on the state of the global control bus GC, preferably from the global data bus GD, or possibly depending on the state of the global command bus GI.

The address generated at the output of the global address generator J corresponds to the content of the address counter and serves as a common address for all units. If the content of the counter is overwritten (e.g. from the global data bus GD), global jump commands, conditional or unconditional, can be carried out.
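The behaviour of the address counter described above can be illustrated with a minimal software sketch. It is only a model under stated assumptions: the class name, the method names and the 8-bit width are hypothetical and do not come from the description, and a conditional jump that is not taken is assumed simply to leave the counter unchanged.

```python
class GlobalAddressGenerator:
    """Hypothetical software model of the address counter in J."""

    def __init__(self, width_bits, start=0):
        # The counter width is assumed equal to the width of the bus GA.
        self.mask = (1 << width_bits) - 1
        self.address = start & self.mask

    def clock(self):
        """Advance the counter by one on a clock signal."""
        self.address = (self.address + 1) & self.mask
        return self.address

    def jump(self, target, condition=True):
        """Overwrite the counter (e.g. from GD) if the condition holds."""
        if condition:
            self.address = target & self.mask
        return self.address


gen = GlobalAddressGenerator(width_bits=8)
gen.clock()                       # counter advances to 1
gen.clock()                       # counter advances to 2
gen.jump(0x40)                    # unconditional jump
gen.jump(0x10, condition=False)   # conditional jump, not taken
gen.clock()
print(hex(gen.address))           # 0x41
```

Because every unit receives the same address, a single overwrite of the counter redirects the whole system at once.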

The global command memory K stores the operation commands. It is filled with command codes, advantageously under control of the global control bus GC, through the global data bus GD, at the address specified by the global address bus GA. During the program run, the command selected by the global address bus GA appears at the output of the global command memory K and reaches its destinations through the global command bus GI.

The global control unit M generates the control signals that are required to carry out the system procedures and the data traffic. Its input variables are:

- the address variable generated in the global address generator J and passed on through the global address bus GA,

- the operation command corresponding to the current content of the global command memory K and forwarded by the global command bus GI, and

- the signals of the external coupling unit T on the control lines ExV.

The output signals are produced as a function of the input signals, preferably synchronized with the clock signal. The output variables, possibly via suitable auxiliary circuits, appear on the global control bus GC and thereby control and synchronize the procedures in the system.

The task of the external coupling unit T is to handle the data traffic between the control system and an external unit (e.g. a computer). Depending on the control lines ExV of the external coupling unit T and on the states of the global control bus GC, the data coupling between the interface EX to the external unit and the global data bus GD is provided. This covers the tasks that are indispensable for the system, in particular:

- The content of the global address generator J can be read and overwritten, which enables the global address to be set and queried externally.

- In the processing block E, a local processing unit D can be selected using the control lines ExV of the external coupling unit. The position specified by the global address and selected in the address memory of the local address converter A of a local processing unit D can be read and written. Likewise, the data position in the local data memory B can be read and written that corresponds to the local address bus LA, i.e. to the address generated in a local processing unit D from the global address after modification in the local address converter A.

- In the processing block E, a global write operation can be performed at the local address LA, which is determined by the global address and modified in the local address converter A. During the global write operation, the datum appearing on the global data bus GD is stored within one cycle in the local data memories B, in each local processing unit D at the position given by the respective local address bus LA.

- The operation code selected by the global address bus GA in the global command memory K can be read and overwritten. This allows the operation codes to be brought into the system.

The global arithmetic / logic unit P is of fundamental importance for the system functions. Its main elements are:

- the unit performing global arithmetic and logical operations (ALU),

- registers assigned to global bit operations,

- general-purpose registers involved in global operations,

- circuits that handle the data traffic.

The operations to be performed in the global arithmetic / logic unit P are:

- Bit operations (AND2, AND3, AND4, ..., OR2, OR3, OR4, ..., EXOR2, EXOR3, ..., INV, ...).

- The processing of bit variables that are assigned to the local processing units D but are to be processed in the global arithmetic / logic unit P as a common value, such as carry bits, result bits (<=, >=, ==, ...), operation enable bits, bits for local address correction and bits for local data correction.

- Data transfer between the global bit bus GB and the global data bus GD.

- Data traffic between the global data bus GD and the local processing units D.

FIG. 8 shows a structure of the control system R corresponding to the invention, by means of which the method can be implemented, and which contains additional elements that are optional but advantageous, namely:

- the global data memory N,

- the global auxiliary data memory L,

- the global output / input control unit Q and the interrupt control unit S.

The global data memory N and the global auxiliary data memory L are connected to the global address bus GA. The global data memory N, the global auxiliary data memory L, the interrupt control unit S and the global output / input control unit Q are connected to the global control bus GC and to the global data bus GD.

The global auxiliary data memory L is connected to the global arithmetic / logic unit P of the control system R by the global auxiliary data bus GS. The interrupt control unit S is equipped with the single-cycle and multi-cycle interrupt inputs 1cikl.INT and ncikl.INT and with the interrupt confirmation output ackINT; it is connected to the global control unit M of the control system R by the interrupt control bus GM. The global output / input control unit Q, with its data bus Q_ioD and its control bus Q_ioC, is connected to the control system R through the global address bus GA and the global control bus GC.

The global data memory N contains an address converter equipped with an address memory, and a data memory; its task is the storage of common operational data. The structure and task of the address converter in the global data memory N are the same as described for the local processing units D. The global address on the global address bus GA, modified in the address converter, selects a data position in the global data memory N. Under control of the global control bus GC, a value can be written to or read from the selected position through the global data bus GD.
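The two-stage addressing described here (global address, then address memory, then data position) can be sketched as follows. This is a simplified model, not the circuit itself: the class name, the table contents and the memory size are hypothetical, and the address memory is assumed to be a plain lookup table indexed by the global address.

```python
class ConvertedMemory:
    """Hypothetical model of a data memory behind an address converter."""

    def __init__(self, address_memory, size):
        # address_memory[ga] holds the modified address for global address ga.
        self.address_memory = list(address_memory)
        self.data = [0] * size

    def write(self, ga, value):
        """Store a value at the data position selected for global address ga."""
        self.data[self.address_memory[ga]] = value

    def read(self, ga):
        """Read the data position selected for global address ga."""
        return self.data[self.address_memory[ga]]


# Global addresses 0 and 2 are mapped to the same data position 3.
mem = ConvertedMemory(address_memory=[3, 0, 3, 1], size=4)
mem.write(0, 99)
print(mem.read(2))   # 99
```

In this model, a value written under one global address is available under another without being copied, since both entries of the address memory point at the same position.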

In the global data memory N, data are stored that are shared by the local processing units D. This avoids storing the same data several times, which makes better use of the storage capacity.

The global auxiliary data memory L serves mainly to extend the global command memory K. It holds data that are selected directly by the global address bus GA and activated by the global control bus GC at the same time as the operation code; these data are processed in the global arithmetic / logic unit P via the global auxiliary data bus GS, or can also perform some control tasks.

Depending on the operation command, these data:

- define inversion tasks for the global logical operations, or

- contain jump addresses,

- bit data for global use,

- data involved in control tasks,

- etc.

The global output / input control unit Q, controlled from the global control bus GC in cooperation with the global data bus GD, is provided for global output / input operations. The output / input control unit Q contains:

- control registers whose outputs drive the control bus G_ioC of the global output / input control unit,

- control registers whose outputs drive the data bus G_ioD of the global output / input control unit, and

- registers that record, in a synchronized manner, the data arriving from the data bus G_ioD.

During the program run, an address is fed into the registers belonging to the control bus G_ioC of the global output / input control unit, whereby a data position is selected on an external input / output unit. In the case of data output, the intended data value is expediently also fed into the register belonging to the data bus G_ioD of the global output / input control unit. The content of these registers remains unchanged until the next access. Depending on the peripheral speed, the program run continues for a defined time, after which the required control signals are output on the control bus G_ioC of the global output / input control unit. When data is read in, the content of the data bus G_ioD is stored in registers, from which it can be forwarded to the global data bus GD at any time.

When inactive, the data bus G_ioD should preferably be in the tri-state condition.

On external requests, the interrupt control unit S executes an intended command or starts an assigned program segment (subroutine).

An interrupt request forwarded to the global control unit M is acknowledged on the interrupt confirmation output ackINT.

In the case of a single-cycle interrupt (input 1cikl.INT is activated), the operation prepared in the register of the global control unit M is carried out on the data at the data position given, expediently, by the global address stored in the same registers (here the global address is output by the control unit M). This type of interrupt is preferably used, for example, for parallel data traffic carried out by the local data channel manager H. The interrupt may of course be repeated, in which case, for example, only the global address held in the registers of the control unit M is changed between two interrupts.

In the case of a multi-cycle interrupt (input ncikl.INT is activated), the content of the global address generator J is preferably saved in registers, after which the address counter is loaded from the register corresponding to the interrupt code or from a data memory. At the end of the interrupt program sequence, a return instruction signals the end of the interrupt task. The return address saved in the register is loaded again and the interrupted program continues.
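The save-and-restore sequence of the multi-cycle interrupt can be sketched in a few lines. The sketch rests on assumptions that are not in the description: the class and method names are hypothetical, a single save register is modelled (so nested interrupts are not covered), and the start addresses of the interrupt routines are held in a simple table indexed by the interrupt code.

```python
class InterruptibleCounter:
    """Hypothetical model of the address counter with a multi-cycle interrupt."""

    def __init__(self, vectors):
        self.address = 0
        self.saved = None                # register holding the return address
        self.vectors = dict(vectors)     # interrupt code -> start address

    def step(self):
        """Normal program run: advance to the next global address."""
        self.address += 1

    def interrupt(self, code):
        """Save the current address, then load the routine's start address."""
        self.saved = self.address
        self.address = self.vectors[code]

    def ret(self):
        """Return instruction: reload the saved address."""
        self.address = self.saved
        self.saved = None


cpu = InterruptibleCounter(vectors={1: 100})
for _ in range(3):
    cpu.step()            # main program reaches address 3
cpu.interrupt(1)          # address 3 is saved, counter is loaded with 100
cpu.step()
cpu.step()                # interrupt routine runs to address 102
cpu.ret()
print(cpu.address)        # 3
```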

In the following, the control system corresponding to the invention is presented on the basis of a few examples, in which

- Example No. 1 shows a possible solution for performing global bit operations,

- Example No. 2 shows a word formation from any data bits,

- Example No. 3 shows the distribution of the initial and result data for the LU decomposition of a 7x7 matrix,

- Example No. 4 shows the distribution of the initial and result data for the simultaneous LU decomposition of two 3x3 matrices,

- Example No. 5a shows a possible division of the data when multiplying two 3x3 matrices by two 3-element vectors, and

- Example No. 5b shows the program steps required for this,

- Example No. 6a shows a different division of the data for the same task, and

- Example No. 6b shows the program steps required for this,

- Example No. 7 shows the division of the initial and result data when multiplying an 8x8 matrix by 8 vectors.

Example 1 (Table 1) shows a possible solution for performing global bit operations. Let us assume that the processing block E of the system consists of the local processing units D_0, D_1, ..., D_7. Let us choose a logical function assigned to a result word QQ, where the result word QQ consists of 8 bits. Each of the input words AA, BB, CC, DD is also 8 bits wide, and the logical operations performed have the form

QQ_i = (!(!AA_j & BB_k) | !(!CC_l & DD_n))

where j, k, l, n are index values that can assume any value within the bit width, including repeated values, and where & denotes the logical AND function, | the logical OR function and ! inversion.
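As an illustration of the logical function of Example 1, the following sketch evaluates it in software for one result word. It is a serial stand-in for what the description assigns to eight local processing units working in parallel; the function names and the `selectors` table (one tuple of bit positions j, k, l, n per result bit) are hypothetical.

```python
def bit(word, pos):
    """Return bit number pos of an integer word."""
    return (word >> pos) & 1


def result_word(aa, bb, cc, dd, selectors):
    """Build QQ, where QQ_i = !(!AA_j & BB_k) | !(!CC_l & DD_n).

    selectors[i] = (j, k, l, n) gives the bit positions used for bit i.
    """
    qq = 0
    for i, (j, k, l, n) in enumerate(selectors):
        left = 1 - ((1 - bit(aa, j)) & bit(bb, k))
        right = 1 - ((1 - bit(cc, l)) & bit(dd, n))
        qq |= (left | right) << i
    return qq


# Simplest index choice: every result bit uses its own bit position.
identity = [(i, i, i, i) for i in range(8)]
print(hex(result_word(0x0F, 0xFF, 0x00, 0xFF, identity)))   # 0xf
```

The index tuples are fixed before the program runs, which matches the statement that the bit selection data are defined during program development.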

According to the previous description, the task is to create logic functions between the 32 input variables AA_j, BB_k, CC_l, DD_n and the 8 output variables QQ_i. During data input, an input variable is stored within one cycle, in all local processing units D_0, D_1, ..., D_7, in the local data memory B at the address position appearing at the output of the local address converter A. Saving the 4 variables would take 4 cycles, but on the one hand they can then take part in several procedures, and on the other hand only the values that have changed in the meantime have to be read again. Depending on the task, the number of read operations can also be considerably smaller, because the values are available to other operations through different local addresses. If all the input bits in the given functions belong to different words, the number of read cycles can of course be greater. That, however, means that only one bit of each word is used, which usually happens only as a result of poor program design. If the 32 input variables belong to different words, but the variables are also used in other operations, the read-in time does not increase. The operation codes, the assigned local addresses and the data indicating the bit positions do not change during the course of the program. These values are to be defined during program development and loaded into the system before the program runs.

The program sequence starts at the global address GA(i). The bit position values for the variables AA and BB are found at the local positions LA corresponding to the global address GA(i). The contents aABsh_0, ..., aABsh_7 of the local address memories point to the local bit position data ABsh_0, ..., ABsh_7. For example, with an 8-bit data width, the lower 4 bits indicate the bit position for the variable AA and the upper 4 bits that for the variable BB. With the LatchSh command, the bit selection data are stored in the registers provided for this purpose in the local processing units D. The bits selected according to these registers, or (depending on the state of the global auxiliary data bus GS) their inverted values, are passed over the global bit bus GB as input data to the global arithmetic / logic unit P.
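The packing of two bit positions into one 8-bit selection value can be sketched as follows. The function names and the two inversion flags are hypothetical; the flags stand in for the inversion information that the description takes from the global auxiliary data bus GS.

```python
def unpack_selectors(absh):
    """Split an 8-bit selection value: low nibble for AA, high nibble for BB."""
    return absh & 0x0F, (absh >> 4) & 0x0F


def select_bits(absh, aa, bb, invert_a=False, invert_b=False):
    """Pick one bit from AA and one from BB, optionally inverted."""
    pos_a, pos_b = unpack_selectors(absh)
    a = (aa >> pos_a) & 1
    b = (bb >> pos_b) & 1
    return a ^ int(invert_a), b ^ int(invert_b)


# 0x52: bit 2 of AA and bit 5 of BB are selected.
print(unpack_selectors(0x52))                              # (2, 5)
print(select_bits(0x52, aa=0b00000100, bb=0b00100000))     # (1, 1)
```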

The contents aAA_0, ..., aAA_7 of the local addresses LA belonging to GA(i+1) determine the data AA_0, ..., AA_7 involved in the first logical operations. Simultaneously with the data storage in the register, it is also expedient to store the data from the global auxiliary data bus GS that are responsible for the inversion procedure.

The contents aBB_0, ..., aBB_7 of the local addresses LA belonging to GA(i+2) determine the further data BB_0, ..., BB_7 involved in the first logical operations. The data for the inversion tasks are again taken from the global auxiliary data bus GS. In the operation cycle GA(i+3), the data value HH formed in the global arithmetic / logic unit P (here 8 bits wide), which is common to the local processing units D, is passed over the global data bus GD and saved at the positions aHH_0, ..., aHH_7, which are expediently defined during program development. The procedures described for GA(i), ..., GA(i+3) are then repeated up to GA(i+11), where the result word QQ is stored.

In another exemplary embodiment of the control system, the bit on the global auxiliary data bus GS that corresponds to a given processing unit can of course also be used as an input. The bit result, after the bit selection in the local processing units D and possibly after inversion, can also be forwarded to the global arithmetic / logic unit P through, for example, the global data bus GD. In a uniprocessor system, assuming that the composition of the operations and the indexing allow no further simplification and that the commands contain no bit addressing or inverting parameters, the task described above requires at least 192 cycles. In a known multiprocessor system of similar size that corresponds to the example and works in parallel, the task can be carried out only with considerable additional effort because of task synchronization and the transfer of data and commands.

In the control system according to the invention with the specified size, the described procedures are carried out in 12 cycles. Another advantage is that with an increase in the system (increase in the number of local processing units D, increase in the data width), the number of bit operations which can be carried out in the same cycle increases proportionally.

Example 2 (Table 2) shows the generation of an 8-bit-wide word from arbitrary bit variables. With the control system described in FIG. 1, the result word QQ can be 8 bits wide. The task can be carried out in two cycles because the bit selection data, as described above, were determined during program generation and loaded into the local processing units D when the program was loaded, and are available there. These values are fixed at the global addresses GA(i), which specifies the bit configuration of the desired word; with the command associated with the corresponding global address, the word can be stored at the data position specified by the local address buses LA.

A similar procedure is possible if the task is only to rearrange the bits of a global data word. In this case, it is advantageous to take the initial word from the global data memory N and, if necessary, to write it back there. The number of cycles required is again 2.

Example 3 (Table 3) shows the distribution of the initial and result data; on the basis of this example we explain the program steps required for the LU decomposition of a matrix. LU decomposition is useful, e.g. in linear algebra, for the simulation of larger, even non-linear electrical networks, and in general for inverting a matrix, etc.

To solve this task we again start from the control system described in FIG. 1. The columns of the matrix are located in the correspondingly numbered local processing units D_0, D_1, ..., D_7. The 7-dimensional vector is placed in D_7. The row data selected by the corresponding local addresses LA(i) ... LA(i+6) are fed into the associated local data memories B after program generation, for example during program loading, during data loading or as results of other computing operations. In the same way, during program generation we define the position of the permutation bit matrix used for the arithmetic operations, which in the given case is held in the global data memory N, at the LA local addresses, as 7 words of 8 bits each.

The program segment that performs the LU decomposition starts at the global address GA(i) and is carried out in the following steps:

a. The permutation matrix corresponding to the shape of the matrix is generated.

b. A cycle starts, with the cycle counter set to i = 0.

c. Partial pivot selection is carried out in column i, i.e. the largest element located in column i is selected. This serves as the current pivot element. The pivot element and its row index are recorded in registers, whereby the contents of the registers can also serve to modify addresses in the local address formation.

d. If element i (column i) of row i does not have the greatest value, a row swap must also be carried out; this swap is also performed, through the global data bus GD, in the permutation bit matrix.

e. The multiplier vectors are determined and then stored at the positions already defined in the global data memory N.

f. The parallel multiplications with the corresponding vectors are carried out, followed by the subtraction from the base rows.

g. The value of i is incremented, and the procedures from step c. onward are repeated, with the corresponding procedures enabled at the same time, until i reaches the set value.

h. Depending on the shape of the permutation matrix, the L-matrix vectors are stored in the result matrix.

The expected cycle counts of the individual operations are: a = 7; b = 1; c = 7; d = 1; e = (6 ... 1); f = (36 ... 2); g = 1; h = min. 6.
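For orientation, steps a. to h. correspond to the textbook LU decomposition with partial pivoting. The sketch below is a plain serial version of that textbook algorithm, not the parallel program of Table 3. The function name is hypothetical, the pivot is chosen by absolute value (an assumption, since the description only says "largest"), and the permutation is kept as an index list rather than as a bit matrix.

```python
def lu_decompose(matrix):
    """Return (lu, perm): L (below the diagonal) and U stored in one matrix."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    perm = list(range(n))                       # step a: permutation record
    for i in range(n - 1):                      # steps b and g: cycle over i
        # step c: pivot selection in column i
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[p][i] == 0.0:
            raise ValueError("matrix is singular")
        if p != i:                              # step d: row swap
            a[i], a[p] = a[p], a[i]
            perm[i], perm[p] = perm[p], perm[i]
        for r in range(i + 1, n):
            m = a[r][i] / a[i][i]               # step e: multiplier
            a[r][i] = m                         # step h: store L in place
            for c in range(i + 1, n):           # step f: multiply and subtract
                a[r][c] -= m * a[i][c]
    return a, perm


lu, perm = lu_decompose([[4, 3], [6, 3]])
print(perm)   # [1, 0]
print(lu)
```

Storing each multiplier in place as soon as it is computed corresponds to the variant with h = 0 mentioned in the description.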

The number of operating steps can be reduced further by storing the multiplier vectors in the result matrix immediately after they are completed (h = 0); and if the result requires only the number of permutations, the permutation matrix can be omitted (a = 0).

A significant reduction in time is achieved by using the direct parallel data channels DPA. If their number, depending on the design of the control system, reaches the maximum value, the partial pivot selection can be carried out in parallel. This allows the maximum value for each row to be determined in parallel before the start of the cycle (c), in a total of 7 cycles. The results, stored in a suitable form, can then be applied directly later on. In the case c = 0, this means that the LU decomposition of a 7x7 matrix is expected to take 1 + 7 + 6 * ((6 + 1) / 2 + (36 + 2) / 2) ≈ 150 operation cycles. With suitable algorithms and structure-dependent optimization, a further reduction in cycles can be achieved.

Example 4 (Table 4) shows the preferred division of the initial and result data for the simultaneous LU decomposition of two 3x3 matrices in the system described above. The LU decomposition proceeds as described in Example No. 3. The number of operating cycles required is of course drastically reduced. The use of the direct parallel data channels DPA brings an additional reduction in duration. If two DPA channels are available in the system, they are preferably driven from the local processing units (here D_3 and D_7) that contain the vectors aV_0, ..., aV_2 and bV_0, ..., bV_2. Reading and writing of the data common to the individual matrix operations is done through the DPA channels. This enables the same operations to be performed on data found at different positions. With a larger number of local processing units D, this is a considerable advantage. As can be seen, the computing power increases proportionally with any expansion of the system. When assembling larger systems, it is very useful, in order to limit the number of DPA channels required, to provide interconnection and separation options at specified points (Fig. 4). In a system containing, for example, 128 local processing units D, the separation (a bus switch controlled by the program) is desirable after every 4 units when performing the task described.

This makes it possible to decompose 32 3x3 matrices simultaneously. For the LU decomposition of 7x7 matrices, the separation must be carried out after every 8 local processing units D. The number of matrices processed at the same time is then 16, which means that the number of operating cycles required per 7x7 matrix is only 10.

Example 5 (Tables 5a, 5b) shows a possible division of the data when multiplying two 3x3 matrices by two 3-element vectors. The parameters A_00 ... A_22 of the matrix AA are stored in the local processing units D_0, ..., D_2, the vector parameters AV_0 ... AV_2 in the local processing unit D_3, the parameters B_00 ... B_22 of the matrix BB in the local processing units D_4, ..., D_6, and the vector parameters BV_0 ... BV_2 in the local processing unit D_7. This is done in such a way that, during program generation, the corresponding global addresses are assigned, in the local processing units D_0, D_1, ..., D_7, local addresses pointing to the parameters. Before the procedure runs, but at the point in time determined by the task, the initial data are stored, after which the procedure described in Example 5b is carried out. In the first operation step, parallel multiplications are carried out between

- the first vector element AV_0 arriving through DPA channel No. 3 and the elements A_00, A_10, A_20 of the first row of the matrix AA, and between the first vector element BV_0 arriving through DPA channel No. 7 and the elements B_00, B_10, B_20 of the first row of the matrix BB. The 6 result data EA_0, EA_1, EA_2 and EB_0, EB_1, EB_2 are saved to the corresponding local positions. Then parallel multiplications are carried out between

- the vector element AV_1 arriving through DPA channel No. 3 and the elements A_01, A_11, A_21 of the second row of the matrix AA, and between the vector element BV_1 arriving through DPA channel No. 7 and the elements B_01, B_11, B_21 of the second row of the matrix BB.

Then parallel multiplications are carried out between the vector element AV_2 arriving through DPA channel No. 3 and the elements A_02, A_12, A_22 of the third row of the matrix AA, and between the vector element BV_2 arriving through DPA channel No. 7 and the elements B_02, B_12, B_22 of the third row of the matrix BB, and the result is summed with the previous results EA_0, EA_1, EA_2 and EB_0, EB_1, EB_2. After operation cycle No. 6, the result vector elements EA (0, 1, 2) are in the local processing units D_0, ..., D_2 and the result vector elements EB (0, 1, 2) in the local processing units D_4, ..., D_6. The task required a total of 6 operating cycles. If required, some data movements can still be carried out afterwards.
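The data flow of Example 5 can be mimicked in software as a broadcast-style matrix-vector product: in each step one vector element is made available to all units of a group, and every unit multiplies it by its own matrix element and accumulates. The sketch below is serial and only illustrative; the function name and the test values are hypothetical, and the two groups (AA with AV, BB with BV), which the description runs at the same time on separate DPA channels, are simply evaluated one after the other.

```python
def broadcast_matvec(rows, vector):
    """rows[r] is the matrix data held by one local unit.

    In each step, one vector element is broadcast and every unit adds
    (its element number `step`) * (broadcast value) to its own result.
    """
    results = [0] * len(rows)
    for step, v in enumerate(vector):
        results = [acc + row[step] * v for acc, row in zip(results, rows)]
    return results


AA = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # held in units D_0 .. D_2
AV = [1, 0, 2]                           # broadcast from D_3
BB = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]   # held in units D_4 .. D_6
BV = [1, 2, 3]                           # broadcast from D_7

print(broadcast_matvec(AA, AV))   # [7, 16, 25]
print(broadcast_matvec(BB, BV))   # [2, 4, 6]
```

Each unit ends up holding one element of its result vector, which matches the final distribution stated for EA and EB.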

Example 6 (Tables 6a, 6b) shows another possible implementation of the same task as in Example 5, with a division of the local addresses and data in which the initial and result vectors have the same data distribution. This division can be useful if further transformations are applied to the result vectors. It can be seen that the vector elements AV_0, AV_1, AV_2 of the vector AV are located in the local processing units D_0, D_1, D_2, and the vector elements BV_0, BV_1, BV_2 of the vector BV in the local processing units D_4, D_5, D_6. The processing units D_3 and D_7 are left unused to make the comparison clearer. Of course, with a number n of processing units, the operations described can be carried out for n / 3 matrices and vectors. In the processing units that remain free, operations (multiplication, addition) that correspond to the current procedures and are defined during program generation can be carried out. These operations are independent of the matrix operations.

Table 6b shows the program segment required for the matrix-vector multiplication corresponding to the data division of Table 6a. Here, in the first operation step, a parallel multiplication is carried out between the matrix elements with equal indices A_00, A_11, A_22, B_00, B_11, B_22 and the assigned vector elements AV_0, AV_1, AV_2, BV_0, BV_1, BV_2. The result is stored at the intended positions EA_0, EA_1, EA_2, EB_0, EB_1, EB_2. Subsequently, the DPA channels of the local processing units D_0 and D_4 are activated first, then those of D_1 and D_5 and finally those of D_2 and D_6, and the required multiplications and additions are carried out. After 9 cycles, the result vectors have the same data distribution format as the initial vectors.

With an increase in the number of processing units, the performance of data processing also increases proportionally according to this example. For example, when using 128 local processing units D, 46 matrices can be multiplied by 46 vectors within 9 cycles.

Example 7 (Table 7) shows, for a task of the kind described above, the division of the initial and result data when multiplying an 8x8 matrix by 8 vectors. After program generation, the local address data LA(i), ..., LA(i+7), which indicate the rows and columns of the matrix, are fed into the local processing units D_0, ..., D_7. The local addresses of the initial vector and of the result vector LA(k) are also given. During the program sequence, analogous to that previously described, the LA local addresses select the data for the current operation. The result is generated at the position LA(k).

A different data distribution and control program is needed, e.g., in a transformation task in which several different vectors are multiplied by one matrix. For a task of this type, the matrix elements are preferably placed in the global data memory N. The vector elements should be placed in the corresponding local processing units D. After the corresponding program segment has run, e.g. in a system with 8 local processing units D, the 3D transformations have been carried out for 8 vectors after 9 multiplications, 6 additions and 3 storage operations, a total of 18 elementary operations. In a system with 128 local processing units, this means the simultaneous transformation of 128 vectors.
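The operation count of 18 can be reproduced with a small sketch in which every elementary operation is applied to all vectors at once. This is an illustration under assumptions: the function name is hypothetical, the list comprehensions stand in for the local processing units working in parallel, and the ordering of multiplications and additions is one possible schedule, not necessarily that of the original program.

```python
def transform_all(matrix, vectors):
    """Apply one 3x3 matrix to many 3-vectors, counting elementary operations.

    Each counted operation acts on all vectors simultaneously (one per unit).
    """
    ops = 0
    results = [[0, 0, 0] for _ in vectors]
    for r in range(3):
        acc = [matrix[r][0] * v[0] for v in vectors]          # multiplication
        ops += 1
        for c in (1, 2):
            prod = [matrix[r][c] * v[c] for v in vectors]     # multiplication
            ops += 1
            acc = [a + p for a, p in zip(acc, prod)]          # addition
            ops += 1
        for res, a in zip(results, acc):                      # storage
            res[r] = a
        ops += 1
    return results, ops


M = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]            # shared matrix (as in memory N)
vectors = [[k, k + 1, k + 2] for k in range(8)]  # one vector per local unit
results, ops = transform_all(M, vectors)
print(ops)          # 18
print(results[0])   # [0, 2, 4]
```

The count does not depend on the number of vectors, which is the point made for the system with 128 local processing units.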

Similar to the previous examples, all tasks that can be broken down into elementary procedures can be carried out in parallel without additional data movements and synchronization steps.

If the control of the direct parallel data channel switch KJDPA, in addition to the selection of the corresponding output / input channels, e.g. follows from the global auxiliary data bus GS, then it means the expansion of the operation command. This structure enables a freely selected data access configuration to be assigned to each operation. As a result, data access, even with any number of processor units, is just as easy as with a one-processor system.

In accordance with the examples described, any operations can be carried out using a suitable program. It is associated with considerable advantage that during the parallel, concurrent operations, where the results come as bit variables (o, <=, ==, = 0 ...) that can be used together as system data words, in other control or computer procedures.

The control system according to the invention is of course applicable in any field where a large number of calculations has to be carried out, e.g., image processing, fast multi-dimensional control systems, etc. The system elements and units can be manufactured on any technological basis that enables the method to be carried out.

In summary, with the method according to the invention the processing power increases approximately in proportion to the number of local processing units D₀, D₁, ..., D₇, while the programs intended for the same tasks are generated in the same form for different system sizes. Another significant advantage of the system is that the structure of the local processing units is the same even with a large number of elements, which leads to a considerable reduction in specific costs.

Claims

1. A method, advantageously for the parallel execution of cyclically repeating data processing tasks, characterized in that a global address specified by a global address generator designates a global command, which is stored in a global command memory and forwarded on a global command bus, and designates, in the processing units working in parallel connection, which form a so-called processing block, local address data stored in the memories of the local address converters and modified in the local address converters; the local address data designate local data positions in local data memories, and at these data positions local operations corresponding to the global command are carried out in the local arithmetic/logic units operating in parallel, and global operations corresponding to the global command and, with the selected bits, global bit operations are carried out in the global arithmetic/logic unit; global internal data traffic, defined by the task and controlled by the global control unit, is carried out between the local processing units (D₁, D₂, ..., Dₙ), the global address generator, the global command memory and the global arithmetic/logic unit, and, depending on the application, external data traffic is carried out through external interfaces.

2. The method according to claim 1, characterized in that the internal data traffic is controlled by a global data bus and by a global control bus which is defined by the global command bus and connected to the output side of the global control unit.

3. The method according to claim 1, characterized in that the data traffic of the data bits selected in the local data memory is ensured by the global bit bus or the global data bus.

4. The method according to claim 1, characterized in that the data traffic is carried out by a data channel manager which is equipped with at least one data channel output and data channel input and at most with a number of data channel outputs and data channel inputs corresponding to the number of local processing units involved in the parallel operations.

5. The method according to claim 1, characterized in that data channel switches are used for the controlled separation of the data channels.

6. The method according to claim 1, characterized in that a network is formed from the channel sections produced with the data channel switches.

7. The method according to claim 1, characterized in that parallel data traffic is carried out under the control of the data channel manager.

8. The method according to claim 4, characterized in that the analog and digital data traffic is ensured by the data channel manager by means of data channel control units having a data channel output and a data channel input.

9. The method according to claim 1, characterized in that common data involved in the operations are stored in a global data memory.

10. The method according to claim 1, characterized in that the operations defined by the global logical unit are carried out using single and multi-cycle interruptions.

11. Circuit arrangement, advantageous for the parallel execution of cyclically repeating data processing tasks, characterized in that it consists of at least two local processing units (D₁, D₂, ..., Dₙ) which form a processing block and in which a local address converter (A) is connected to a local data memory (B) by the local address bus (LA), the local data memory (B) is connected by a local data bus (LD) on the one hand to a local arithmetic/logic unit (C) and on the other hand to the local address converter (A), the global address bus (GA) is connected to at least one local address converter (A), and the global control bus (GC) is connected to at least one local address converter (A), one local data memory (B) and one local arithmetic/logic unit (C), wherein the global command bus (GI), the global data bus (GD) and the global bit bus are connected to at least one local arithmetic/logic unit (C); it furthermore consists of a global address generator (J), at least one global command memory (K), at least one global control unit (M), at least one global arithmetic/logic unit (P) and at least one external coupling unit (T), wherein the global address generator (J) is connected through the global address bus (GA) to the processing block (E), the global command memory (K) and the global control unit (M), the global control unit (M) is connected through a global control bus (GC) to the processing block (E), the global address generator (J) and the global command memory (K), the processing block (E) is connected through a global bit bus (GB) to the global arithmetic/logic unit (P), the external coupling unit (T) is connected to at least one external unit (e.g., a computer) and also to the global control unit (M), and the global data bus (GD) forms a connection between the global address generator (J), the global control unit (M), the global arithmetic/logic unit (P) and the external coupling unit (T).

12. Circuit arrangement according to claim 11, characterized in that local output/input coupling units (F₁, F₂, ..., Fₙ) are connected to the processing units (D₁, D₂, ..., Dₙ) by a local data bus (LD), said coupling units being connected to the global control bus (GC), the global command bus (GI) and the local output/input buses (LIO₁, LIO₂, ..., LIOₙ).

13. Circuit arrangement according to claim 11, characterized in that local data channel managers (H₁, H₂, ..., Hₙ) are connected to the local processing units (D₁, D₂, ..., Dₙ) by a local data bus (LD), said managers having local auxiliary data buses (SB₁, SB₂, ..., SBₙ) and local control lines (LC₁, LC₂, ..., LCₙ), and direct parallel data channel inputs (IDPA₁, IDPA₂, ..., IDPAₙ) and direct parallel data channel outputs (ODPA₁, ODPA₂, ..., ODPAₙ) being assigned to them.

14. Circuit arrangement according to claim 13, characterized in that DPA output/input channel control units (H_io₁, H_io₂, ..., H_ioₙ) and DPA external data bus control units (H_exD₁, H_exD₂, ..., H_exDₙ) are connected to the direct parallel data channels (DPA₁, DPA₂, ..., DPAₙ).

15. Circuit arrangement according to claim 11, characterized in that the local data channel managers (H₁, H₂, ..., Hₙ) are expanded with direct parallel data channel switches (K_DPA₁, K_DPA₂, ..., K_DPAₓ).

16. Circuit arrangement according to claim 11, characterized in that the local processing units (D₁, D₂, ..., Dₙ) are expanded with local output/input buses (LIO₁, LIO₂, ..., LIOₙ).

17. Circuit arrangement according to claim 11, characterized in that the system has at least one global data memory (N) which is connected to the global control bus (GC), the global address bus (GA) and the global data bus (GD).

18. Circuit arrangement according to claim 11, characterized in that the system has at least one global auxiliary data memory (L) which is connected to the global control bus (GC), the global address bus (GA) and the global data bus (GD), and which is also connected through the global auxiliary data bus (GS) to the global arithmetic/logic unit (P).

19. Circuit arrangement according to claim 11, characterized in that the system has an interruption control unit (S) which is controlled by the global control bus (GC), to which the global data bus (GD) is connected, which is connected to the global arithmetic/logic unit through the interruption control bus (GM), and which has interruption inputs (1cikl.INT, ncikl.INT) and an interruption acknowledgement output (ackINT).

20. Circuit arrangement according to claim 11, characterized in that the system has a global output/input control unit (Q) which is controlled from the global control bus (GC), to which the global data bus is connected, and which also has a data bus of the global output/input control unit (G_ioD) and a control bus of the global output/input control unit (G_ioC).

21. Circuit arrangement according to any of claims 11 to 19, characterized in that the processing blocks (E) are implemented as a module system whose interface, consisting of data buses and data lines, is pin-compatible.

22. Circuit arrangement according to claim 21, characterized in that the global units of the system are pin-compatible with the processing blocks (E).