WO2011120812A1

WO2011120812A1 - Cyclic priority change during data processing

Info

Publication number: WO2011120812A1
Application number: PCT/EP2011/054014
Authority: WO
Inventors: Eberhard Boehl; Ruben Bartholomae
Original assignee: Robert Bosch Gmbh
Priority date: 2010-03-31
Filing date: 2011-03-17
Publication date: 2011-10-06
Also published as: DE102010003565A1

Abstract

The invention relates to a circuit assembly for a data processing system for running several tasks by means of a central processing unit having a processing capacity attributed to the processing unit. Said circuit assembly is configured to assign the processing unit to the respective tasks for processing in a time-staggered manner, and to control the processing so that the tasks are handled in a predetermined sequence and tasks without a current processing request are skipped in that sequence. The invention also relates to a corresponding method and a corresponding computer programme.

Description

Description title

Cyclic priority change in data processing

The invention relates to a circuit arrangement for a data processing system for processing a plurality of tasks by means of a central processing unit and a corresponding method for processing a plurality of tasks in a data processing system.

State of the art

In data processing systems, such as computer and microprocessor systems, control units, peripheral units and other information processing systems, so-called CPUs (central processing units) are often used as the central processing units of a computer, or else simple arithmetic logic units (ALUs). Furthermore, memories such as RAM, ROM, EPROM, EEPROM, etc. are used to store programs and data. The processor or CPU executes a program. The program is usually composed of different subprograms, which in turn may be dedicated to different tasks. This is referred to as multitasking.

Depending on the current scenario, i.e. which tasks are requesting processing at a given time, it is decided which task is to be processed by the CPU. It is conceivable that different tasks are assigned different priorities, so that the priority assigned to each task is taken into account during processing and the task with the highest priority is processed first. This is done, for example, by means of so-called interrupts. An interrupt is a short-term interruption of a program in order to carry out another, higher-priority or time-critical processing of another task. In this case, an interrupt request is first made, whereupon an interrupt routine is executed; the previously processed task is interrupted and, after the interrupt has ended, is continued at the point of interruption. This means that, as a rule, a current request from a task with a higher priority than the task currently being processed is given precedence, and the task already in progress is interrupted for it. The interrupt in question causes the CPU to jump to a corresponding program section.

The aforementioned multitasking can also bring about so-called "time sharing", which appears to serve several users simultaneously. Generally, multitasking refers to the ability of a CPU to perform several tasks concurrently. The different processes are activated alternately at such short intervals that an impression of simultaneity arises.

However, the choice of priorities assigned to the individual tasks, together with the processing times of high-priority tasks, may result in low-priority tasks being executed only rarely and, in the extreme case, not at all. This can be the case if the interrupt requests mentioned above arrive almost permanently, so that the execution of a task with a low priority relative to other tasks can never be fully completed. In this case, measures must be taken to guarantee a so-called worst-case execution time (WCET) and thus a minimum of CPU processing capacity for each task to be processed.

However, it has been shown in the past that such guarantees can be very costly and can also limit the performance of the underlying system. Furthermore, an interrupt controller is necessary here, which must also be equipped with a priority control.

Accordingly, it would be desirable to provide a way of using the available processing capacity of a central processing unit or CPU such that it is ensured in a simple manner that every task to be processed that is currently requesting processing, regardless of its priority relative to all other tasks, is processed within a certain time.

Disclosure of the invention

Against this background, a circuit arrangement according to claim 1 and a corresponding method with the features of claim 9 are provided.

The circuit arrangement provided according to the invention can, for example, be implemented in a data-processing architecture and correspondingly allocate tasks currently to be processed to a processing unit available to the data-processing system, for example a CPU or ALU. Suitable embodiments of the circuit arrangement presented according to the invention as well as the method presented according to the invention emerge from the respective dependent claims and the description.

Core and advantages of the invention

According to claim 1, a circuit arrangement is provided for a data processing system for processing a plurality of tasks by means of a central processing unit having a processing capacity associated with the processing unit. The circuit arrangement provided according to the invention is configured to assign the processing unit to the respective tasks for processing in a time-staggered manner, and to control that the tasks are processed in a predeterminable order and that tasks without a current processing request are skipped in that order.

According to a possible embodiment of the circuit arrangement provided according to the invention, the circuit arrangement is further configured to assign to each of the tasks a channel with its own registers, to select the respective registers according to the assignment of the respective task to the processing unit, and to connect them to the processing unit. Furthermore, it can be provided that the circuit arrangement is configured to assign the processing unit to each of the tasks with a current processing request, i.e. each of the so-called active tasks, for a constant time duration that is the same for all active tasks. Passive tasks are tasks without a current processing request, i.e. tasks that are not currently signaling a need for processing. By contrast, in the context of the present description, active tasks are tasks that make a current processing request and signal it accordingly, so that they are taken into account during processing by the processing unit.

The circuit arrangement provided according to the invention therefore provides for the processing or computing capacity of the processing unit provided in the data processing system, such as an ALU or CPU, to be divided equally among all active tasks. According to a further embodiment of the circuit arrangement provided according to the invention, the time duration corresponds exactly to one clock of a clock cycle of the processing unit.

Furthermore, it may be provided that the circuit arrangement is configured to process instructions from the respective registers of the respective channels of the active tasks in a pipeline having a plurality of pipeline stages, wherein the respective registers are switched into correspondingly clocked pipeline registers of the pipeline at the time of processing, and the pipeline stages are processed in parallel within a time segment, but each for a different channel.

In this case, it can be provided that instruction decoding and memory access for a first active task occur in each case in time segments in which at least one second active task is assigned to the processing unit.

According to a further embodiment of the circuit arrangement provided according to the invention, the assignment of the task that follows the task currently being processed is calculated as a function of which task is currently being processed and, furthermore, which tasks are making a processing request or signaling a request for processing at the time of the calculation.

The active tasks are thus prioritized with a variable priority. This means that all active tasks are processed in an order, and the order depends on which task is currently being processed. This will be explained in more detail below with reference to the figures. The circuit arrangement provided according to the invention yields a precisely defined worst-case execution time (WCET), as mentioned above, which, however, is only fully used up if all tasks in the predetermined order are active, i.e. all tasks have each made a corresponding processing request. Each active task is allocated the processing capacity of the central processing unit in the predetermined order. If further, initially passive tasks in the order become active, they are immediately integrated into the predetermined order. Inactive tasks in the order are skipped as described above.
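The variable-priority scheme described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the claimed circuit: the function names `next_task` and `worst_case_gap` and the list-of-booleans representation of the processing requests are hypothetical, and each active task is assumed to receive exactly one slot per turn.

```python
from typing import List


def next_task(current: int, ready: List[bool]) -> int:
    """Return the next task in cyclic order that has a pending request.

    The scan starts just after `current`, so the order of preference
    rotates with the task being processed. If no other task is ready,
    `current` is returned unchanged.
    """
    n = len(ready)
    for offset in range(1, n):
        candidate = (current + offset) % n
        if ready[candidate]:
            return candidate
    return current


def worst_case_gap(ready: List[bool], watched: int, slots: int) -> int:
    """Largest number of slots between two services of task `watched`.

    Assumes `watched` is ready and is the task served at slot 0.
    """
    current, last_served, worst = watched, 0, 0
    for slot in range(1, slots + 1):
        current = next_task(current, ready)
        if current == watched:
            worst = max(worst, slot - last_served)
            last_served = slot
    return worst


if __name__ == "__main__":
    ready = [True, False, True, True, False]
    print(next_task(0, ready))  # task 1 is passive, so task 2 follows task 0
    print(worst_case_gap([True] * 5, watched=0, slots=50))
    print(worst_case_gap(ready, watched=0, slots=50))
```

In this model the gap between two services of a task never exceeds the total number of tasks, and it reaches that bound only when every task is active, which mirrors the statement that the worst-case time is fully used up only in that case.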

It is conceivable that the calculation of the assignment of the respective subsequent task is carried out in a forward-looking manner by means of so-called pipelining.

Furthermore, the present invention relates to a method for processing a plurality of tasks in a data processing system by means of a central processing unit with a processing capacity allocated to the processing unit. It is provided that the processing unit is assigned to the respective tasks for processing in a time-staggered manner, that the tasks are processed in a predetermined order, and that tasks without a current processing request are skipped in that order.

Further advantages and embodiments of the invention will become apparent from the description and the accompanying drawings.

It is understood that the features mentioned above and those yet to be explained below can be used not only in the combination specified in each case, but also in other combinations or on their own, without departing from the scope of the present invention.

FIG. 1 shows an architectural model in which the method proposed according to the invention can be carried out.

FIG. 2 shows a possible embodiment of a pipelining, according to which command processing can be carried out according to the method proposed according to the invention.

FIG. 3 shows a possible embodiment of the circuit arrangement provided according to the invention, in particular when determining the respective next task to be performed.

FIG. 4 shows the embodiment illustrated in FIG. 3 in detail.

FIG. 5 shows an embodiment for realizing a barrel shifter with 16 × 16 multiplexers used in the embodiment of FIGS. 3 and 4.

FIG. 6 shows a possible embodiment of a "first one detector" with 32 multiplexers used in the embodiment of FIGS. 3 and 4.

FIG. 7 shows a possible implementation of an addition modulo 17 with an overflow look-ahead (OLA), as may be provided in one embodiment of the circuit arrangement provided according to the invention.

FIG. 8 shows a possible embodiment of a method provided according to the invention, in which an additional pipeline stage for task selection is provided.

FIG. 9 shows a rotating register as an alternative to the barrel shifter provided in the embodiment of FIGS. 3 and 4.

Embodiments of the invention

The invention is illustrated schematically in the drawings by means of embodiments and is described in detail below with reference to the drawings. A detailed description of structure and function is given.

The method proposed according to the invention or the circuit arrangement provided according to the invention can be implemented, for example, in a so-called multi-channel sequencer (MCS) of a generic timer module (GTM), a possible architecture of such a multi-channel sequencer being illustrated in FIG.

The multi-channel sequencer (MCS) 100 shown here serves several channels, for example 8 or 16, which correspond to the tasks to be processed. This means that each task to be processed is assigned a channel. The MCS 100 has a central processing unit 10, such as an ALU, and a memory 20, such as a RAM. In the case shown here, the MCS serves T channels, each channel having its own microprogram located at a different place in the memory. To execute this program, each channel has its own instruction register (IR) 35, its own instruction counter (PC) 25, its own status register (STA) 45 and its own so-called General Purpose Registers (GPR) 55, which are designated 0 ... N-1 in the present representation.

That is, in the MCS architecture shown here, T instruction registers (T*IR) 35, T instruction counters (T*PC) 25 and T status registers (T*STA) 45 are shown. Each channel also has its own interface 30, such as an ARU interface, via which data in the respective GPRs 55 can be updated asynchronously, i.e. without waiting for processing of the corresponding channel, or data can be output.

The plurality of interfaces 30, corresponding to the number of channels, is indicated by "overlaying" the symbols representing the interfaces. The same applies to the instruction registers 35, instruction counters 25, general purpose registers 55, status registers 45 and ACB registers 65.

For synchronization purposes, processing of a channel can be blocked. This means that an operation to be performed, and thus the further program sequence of the corresponding channel, only takes place once requested data has arrived or been picked up via the ARU interface 30 assigned to that channel. Furthermore, the architecture shown here provides that the ARU interface 30 of a channel is supplemented by its own control bit register (ACB) 65. These control bits are forwarded to the ARU interface 30 together with the data of the GPRs 55, or are updated with each ARU read command. The blocking state of a channel is signaled by setting a corresponding bit (e.g. in the status register (STA) 45). All other channels continue to execute their programs. The cancellation of a blocking command is initiated asynchronously (i.e. regardless of whether the channel is in the pipeline) via the ARU interface 30 as soon as data has been received by or picked up from the General Purpose Registers 55.

All channels served by the MCS 100 use the same central arithmetic logic unit (ALU) 10, the same instruction decoder 40, the same instruction predecoder 50, the same memory 20 and the same address decoder 15 for the memory 20 in the architecture shown here, as shown in FIG. 1. In the MCS architecture 100 shown here, a host CPU accesses the memory 20 (RAM) via a host CPU interface 60 in a sequence-controlled manner, i.e. data is transferred to or from the handshake interface 60 of the host CPU in a cycle reserved for this purpose.

In accordance with the method and the circuit arrangement provided according to the invention, it is first determined which channel, i.e. which task, is to be processed next within a processing cycle. The next channel to be processed is determined as a function of a counter value (CNT) relating to the task currently being processed and a request signal (RDYi) of each task i. This is explained in more detail below with reference to FIG. 3. It is provided that each task can be processed independently of a priority assigned to it relative to other tasks, but that a task is only processed if it is to be regarded as an active task, as described above. This means that tasks that do not make a processing request at the current time, i.e. do not send out a request signal, are skipped when the tasks are executed in order. In other words, the predetermined order includes all tasks to be processed, but while working through that order it is checked whether the task that is next according to the order is also to be regarded as an active task, i.e. whether it has currently sent out a request signal.

Each active task i, and with it each active channel i, whose request signal is set to 1, i.e. RDYi = 1, is always allocated exactly one clock of the processing cycle or another, equal processing time. Furthermore, it can be provided that the CPU is also served as a so-called reserved task, i.e. at least one additional channel is reserved for it. Writing to or reading from the RAM or memory 20 is regarded as this additional channel. Since this memory 20 is advantageously implemented as a single-port RAM (to save hardware compared with a dual-port RAM), writing to or reading from the RAM 20 by a host CPU via the host CPU interface 60 could lead to conflicts if another process requires access to the RAM 20 at the same time. This other process can be the execution of the program for a channel currently being processed. If an additional channel is reserved for the host CPU, then only the host CPU gets access to the RAM 20 via the host CPU interface 60 in the corresponding time slot 250 (see FIG. 2). In this case it may also be provided, for example, that possible interrupt requests are served via this additional channel if the request is connected with reading or writing the RAM 20.

In a possible embodiment of the circuit arrangement provided according to the invention, instructions are processed in a pipeline with a plurality of pipeline stages. Preferably, four pipeline stages are distinguished. The pipeline is a kind of assembly line with which the execution of instructions is broken down into partial instructions according to the number of pipeline stages; the pipeline stages can be executed in parallel for several channels (i.e. tasks), but each for a different channel (see FIG. 2). This means that, instead of one task being processed completely during a processing cycle of the processing unit, only one partial task is processed at a time, while different partial tasks of several tasks are processed simultaneously. The pipeline stages are preferably:

Stage 0: RAM access decoding

Stage 1: RAM access

Stage 2: Command predecoding

Stage 3: Command processing

In Stage 0, addresses and control signals are first formed for the RAM access pending in the next pipeline stage. A RAM access may be the reading of a datum or of a command, or the writing of a datum. In the case of reading a command, the address is formed from the relevant instruction counter (PC). Stage 0 is denoted by reference numeral 0 in FIGS. 1 and 2.

Stage 1 then accesses the main memory 20 (RAM), the corresponding instruction being loaded from the main memory 20. Stage 1 is denoted by reference numeral 1 in FIGS. 1 and 2.

In Stage 2, the instruction predecoder 50 then performs an instruction predecoding. Stage 2 is denoted by reference numeral 2 in FIGS. 1 and 2.

Finally, in Stage 3, the command processing takes place, which is carried out for all tasks by one and the same processing unit 10, such as an ALU. Stage 3 is denoted by reference numeral 3 in FIGS. 1 and 2.

All pipeline stages are processed in parallel, but each for a different channel or the task assigned to that channel. However, according to the circuit arrangement proposed here, processing takes place only for active tasks and their associated active channels. The timing of the processing is shown in FIG. 2 by means of so-called pipeline passes. When channel C_a 204 first starts decoding the RAM address (Stage 0), channel C_a will execute the RAM access (Stage 1) in the next clock, while channel C_b 206 is busy decoding the RAM address (Stage 0). The corresponding "own" registers T*PC and T*IR, represented by 25 and 35 respectively in FIG. 1, are switched in automatically. Commands that require only one processing cycle of execution time process the operands 200, and if necessary 300, from the channel's own registers or the direct operands from the instruction word in Stage 3 and write the result back to the corresponding registers. For writing data back to the memory 20 (RAM), a further processing cycle is required, which is requested by a bit in the status register STA 45 of the corresponding channel. This means that when the corresponding channel is processed again in Stage 0, the corresponding RAM address is decoded, and in Stage 1 the data is then written from the corresponding General Purpose Register (GPR) 55 into the memory 20 (RAM). Operands from the memory 20 (RAM) are handled similarly: a whole pipeline pass is required to load data from the memory 20 (RAM) into the corresponding GPR 55, and only in the next processing cycle can this operand be processed. For commands that require several processing cycles, it should be noted that the corresponding task or channel remains active, i.e. the corresponding request signal RDYi remains set, until the command has been executed completely.

A special register TRG 75 makes it possible for channels to trigger one another. The channels themselves can use time- or position-related signals of a time base unit (TBU) 85 to place an event in a current time reference or to control it as a function of positions. This comparison is carried out in the processing unit 10, for example an ALU, for which the TBU 85 can supply an operand. An access to data of the respective interface 30, for example an ARU interface, is awaited by the corresponding channel, and the blocking read ensures that no data inconsistency can occur. The instruction predecoder 50 makes it possible to provide a datum from the memory 20 (RAM) in the next processing cycle. For this purpose, a RAM datum is converted into an instruction that writes the datum into the required destination register. In addition, the instruction predecoder 50 ensures that, while a blocking instruction is active (the corresponding bit in the control register is set), the subsequent instructions from the preceding pipeline stages are discarded. While a blocking command is still active, the subsequent command is already being processed in pipeline stages 0 and 1 and fetched from the RAM, and in pipeline stage 2 a decision is made as to whether the command is now processed in the subsequent pipeline stage 3 (if the blocking command has been ended asynchronously via the ARU interface) or discarded (if the blocking command is still active).

FIG. 2 again illustrates the parallel processing of the pipeline stages for T channels. In the case illustrated here, the pipeline comprises four pipeline stages, represented as Stage 0, Stage 1, Stage 2 and Stage 3. A processing cycle 202 corresponds to, for example, T + 1 clocks. After a start-up phase, exactly one channel is assigned to each pipeline stage in each clock and processed accordingly. The pipeline stages are therefore processed in parallel, but each for a different channel. As already mentioned, the CPU is allocated at least one additional time slot 250, so that the processing cycle 202 comprises T + 1 clocks.

FIG. 2 shows in detail that in the first pipeline stage (Stage 0), during a first processing cycle 202, a channel C_a 204, a channel C_b 206, a channel C_c 208, a channel C_d 210, etc. are processed one after the other. Finally, during the first processing cycle 202, a channel C_k 218, which represents the T-th active channel, is executed. The last time slot 250 of the processing cycle 202 is reserved for the CPU or for the CPU channel. In a subsequent processing cycle, a channel C_x 220, a channel C_y 222, a channel C_z 224, etc. are processed, which are always currently active channels.

For the further pipeline stages Stage 1, Stage 2, Stage 3, a processing of the respective active channels starts offset by one clock.

Accordingly, for pipeline stage Stage 1, FIG. 2 provides that channel C_a 204, channel C_b 206, channel C_c 208, etc. are processed or executed during the processing cycle 202. At the end of the processing cycle 202, channel C_j 216 and channel C_k 218 are processed. In the subsequent processing cycle, a time slot 250 is first provided to the CPU, and subsequently channel C_x 220, channel C_y 222, etc. are executed. Again offset by one clock, for pipeline stage Stage 2, channel C_a 204 and channel C_b 206 are executed first during the first processing cycle 202. At the end of the processing cycle, channel C_i 214 and channel C_j 216 are processed. In the next processing cycle, channel C_k 218 is processed first, then the time slot 250 is provided for the CPU, and then channel C_z and so on are processed.

For pipeline stage Stage 3, channel C_a 204 etc. are processed during the first processing cycle 202. At the end of the first processing cycle 202, channel C_h 212 and channel C_i 214 are processed. In the next processing cycle, channel C_j 216 and channel C_k 218 are processed first.

In summary, according to the exemplary representation in FIG. 2, after a start-up phase, for example T active tasks are processed in each processing cycle and, in addition, a time slot is provided for a host CPU or for a channel reserved for the host CPU, so that each processing cycle comprises T + 1 clocks. The individual active tasks are each processed in all four pipeline stages, offset in time from stage to stage for a given active task, here in FIG. 2 by one clock. The entries shown in FIG. 2 without explicit designation stand for all active tasks processed in the processing cycle 202.
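The staggered occupation of the four pipeline stages can be tabulated with a short sketch. The helper `pipeline_table`, the slot labels and the placeholder `"HOST"` for the reserved time slot are hypothetical names chosen for illustration; the model only assumes what is stated above, namely that each stage sees the same sequence of slots, delayed by one clock per stage.

```python
from typing import List, Optional

STAGES = 4  # Stage 0 to Stage 3, as listed in the description


def pipeline_table(slots: List[str], clocks: int) -> List[List[Optional[str]]]:
    """Row s, column t holds the occupant of stage s at clock t.

    `slots` is the sequence of occupants entering Stage 0, one per clock
    (active channels plus the reserved host slot). Stage s sees the same
    sequence delayed by s clocks; None marks the start-up phase or the
    end of the supplied sequence.
    """
    table: List[List[Optional[str]]] = []
    for stage in range(STAGES):
        row: List[Optional[str]] = []
        for t in range(clocks):
            idx = t - stage
            row.append(slots[idx] if 0 <= idx < len(slots) else None)
        table.append(row)
    return table


if __name__ == "__main__":
    cycle = ["C_a", "C_b", "C_c", "HOST"]  # T = 3 channels + 1 host slot
    for stage, row in enumerate(pipeline_table(cycle * 2, clocks=8)):
        print(f"Stage {stage}:", " ".join(f"{x or '-':5}" for x in row))
```

Reading the printed table column by column shows that, after the start-up phase, every clock has each stage working on a different slot, while reading it row by row shows the one-clock offset between stages.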

FIG. 3 shows a circuit arrangement for determining the task to be carried out next during a processing.

FIG. 4 shows the embodiment of FIG. 3 in a more detailed form.

First, a so-called RDYi bit is set for each task. This bit signals whether the task in question is active or inactive, i.e. whether this task is to be processed or is to be skipped when the order is worked through. Thus, for each RDYi = 1, there is a request from task i for processing, as indicated by arrow 1_0. With the help of a combinatorial selection network 2_0, the next task to be processed is then determined in each case and indicated via a so-called new_CNT counter value, as indicated by arrow 3_0. This information is then supplied to a corresponding flip-flop circuit 4_0, so that the task to be processed next is stored in the flip-flops by means of a corresponding storage signal. The counter value CNT, represented by arrow 5_0, indicates the task currently being processed; it is needed in the selection as a basis for determining from the predetermined order which task is to be processed next, taking into account that some of the tasks following in the order may not be active.

According to FIG. 4, which shows further details of FIG. 3 and in which the same operations or units are therefore provided with the same reference numerals, in order to determine the next active task out of, for example, 17 possible tasks, the RDY vector present is first shifted CNT positions to the right in a barrel shifter 2_1, which is part of the selection network 2_0, and the bits shifted out are fed in again on the other side, yielding the vector "RDY_rot" 2_3. This shift can also be pictured as a rotation to the right by CNT bits: if the RDY bit values i of the input vector 1_0 are stored in memory units (e.g. flip-flops) and these flip-flops are interconnected as a shift register, the content of this shift register is shifted to the right by CNT positions, and the bit values shifted out on the right of the shift register are fed in again as input on the left (see also FIG. 9, bottom). To save hardware, however, it is usually cheaper to use a combinatorial circuit instead of such a shift register. Such a circuit is known in the art as a barrel shifter. Details of a corresponding circuit of said barrel shifter 2_1 are shown in the corresponding figure and described below.

The bit RDY_rot(0) is not generated, because it corresponds to the task being processed and therefore always has to be 1. The barrel shifter 2_1 shown and used here requires about 640 gate equivalents. The result of the above rotation is an RDY_rot vector 2_3 normalized to the corresponding CNT value. The RDY_rot vector indicates with its bit values whether the tasks next in line are active (bit = 1) or passive (bit = 0); what has to be determined is the number of least significant "zeros" in it (not counting position 0). The described rotation with the barrel shifter is necessary because, depending on the current processing status of the tasks (and thus the value of CNT 5_0), a different task would be next in line (regardless of the activity of this task). The bit values in RDY_rot 2_3 then show at position 0 (LSB, least significant bit) the value of the RDY bit of the task currently being processed, at position 1 the RDY bit of the task that would be next in line, at position 2 the RDY bit of the task after next, and so on.
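The effect of the rotation can be reproduced in a few lines. The following is a behavioural sketch only, not a model of the multiplexer structure of the barrel shifter: the function name `rotate_right` and the packing of the RDY bits into a Python integer (bit i holding RDYi) are assumptions made for illustration.

```python
WIDTH = 17  # number of tasks in the example of the description


def rotate_right(rdy: int, cnt: int, width: int = WIDTH) -> int:
    """Rotate the `width`-bit request vector right by `cnt` positions.

    Bit i of `rdy` is taken to be RDYi. After the rotation, bit 0 holds
    the request bit of task `cnt`, bit 1 that of the task next in line,
    and so on, with bits shifted out on the right re-entering on the left.
    """
    cnt %= width
    mask = (1 << width) - 1
    return ((rdy >> cnt) | (rdy << (width - cnt))) & mask


if __name__ == "__main__":
    # Tasks 2, 5 and 16 request processing; task 5 is being processed.
    rdy = (1 << 2) | (1 << 5) | (1 << 16)
    rdy_rot = rotate_right(rdy, cnt=5)
    # Bits set at positions 0 (task 5), 11 (task 16) and 14 (task 2).
    print(format(rdy_rot, "017b"))
```

In the example, task 16 ends up at position 11 and task 2 at position 14, so the distance of each requesting task from the current one can be read directly from the rotated vector.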

In the barrel shifter 2_1, for the reasons given above, the determination of the value RDY_rot(0) can be dispensed with. With the aid of a so-called first-one detector circuit (FOD) 2_2, it is determined at which lowest position in the RDY_rot vector there is a "1". This lowest position, counted from position 0, which indicates the task currently being processed, identifies the next task in the predetermined order that is active and is therefore to be processed next. It should be emphasized again that a 1 at position 0 of the RDY_rot vector is irrelevant, because it corresponds to the task currently being processed and therefore always has to be 1 at this position. Only a 1 above position 0 is therefore looked for. The position is represented in binary code in the value FOP (First One Position) 2_4. The circuit for determining the FOP bits is shown in FIG. 6 and, in this implementation, requires approximately 80 gate equivalents. This circuit can easily be described by means of a hardware description language. The structure can be described, for example, in VHDL for the bit FOP(0) (declared as a signal) as follows:

    IF (RDY_rot(1) = '1') THEN FOP(0) <= '1';
    ELSIF (RDY_rot(2) = '1') THEN FOP(0) <= '0';
    ELSIF (RDY_rot(3) = '1') THEN FOP(0) <= '1';
    ELSIF (RDY_rot(4) = '1') THEN FOP(0) <= '0';
    ...
    ELSIF (RDY_rot(15) = '1') THEN FOP(0) <= '1';
    ELSE FOP(0) <= '0';
    END IF;

For the bit FOP(1) one obtains:

    IF (RDY_rot(1) = '1') THEN FOP(1) <= '0';
    ELSIF (RDY_rot(2) = '1' OR RDY_rot(3) = '1') THEN FOP(1) <= '1';
    ELSIF (RDY_rot(4) = '1' OR RDY_rot(5) = '1') THEN FOP(1) <= '0';
    ELSIF (RDY_rot(6) = '1' OR RDY_rot(7) = '1') THEN FOP(1) <= '1';
    ...
    ELSIF (RDY_rot(14) = '1' OR RDY_rot(15) = '1') THEN FOP(1) <= '1';
    ELSE FOP(1) <= '0';
    END IF;

Further, for FOP(2):

    IF (RDY_rot(1) = '1' OR
        RDY_rot(2) = '1' OR RDY_rot(3) = '1') THEN FOP(2) <= '0';
    ELSIF (RDY_rot(4) = '1' OR RDY_rot(5) = '1' OR
           RDY_rot(6) = '1' OR RDY_rot(7) = '1') THEN FOP(2) <= '1';
    ELSIF (RDY_rot(8) = '1' OR RDY_rot(9) = '1' OR
           RDY_rot(10) = '1' OR RDY_rot(11) = '1') THEN FOP(2) <= '0';
    ELSIF (RDY_rot(12) = '1' OR RDY_rot(13) = '1' OR
           RDY_rot(14) = '1' OR RDY_rot(15) = '1') THEN FOP(2) <= '1';
    ELSE FOP(2) <= '0';
    END IF;

For FOP (3):

    IF (RDY_rot(1) = '1' OR
        RDY_rot(2) = '1' OR RDY_rot(3) = '1' OR
        RDY_rot(4) = '1' OR RDY_rot(5) = '1' OR
        RDY_rot(6) = '1' OR RDY_rot(7) = '1') THEN FOP(3) <= '0';
    ELSIF (RDY_rot(8) = '1' OR RDY_rot(9) = '1' OR
           RDY_rot(10) = '1' OR RDY_rot(11) = '1' OR
           RDY_rot(12) = '1' OR RDY_rot(13) = '1' OR
           RDY_rot(14) = '1' OR RDY_rot(15) = '1') THEN FOP(3) <= '1';
    ELSE FOP(3) <= '0';
    END IF;

and finally for FOP(4):

IF (RDY_rot(1) = '1' OR
    RDY_rot(2) = '1' OR RDY_rot(3) = '1' OR
    RDY_rot(4) = '1' OR RDY_rot(5) = '1' OR
    RDY_rot(6) = '1' OR RDY_rot(7) = '1' OR
    RDY_rot(8) = '1' OR RDY_rot(9) = '1' OR
    RDY_rot(10) = '1' OR RDY_rot(11) = '1' OR
    RDY_rot(12) = '1' OR RDY_rot(13) = '1' OR
    RDY_rot(14) = '1' OR RDY_rot(15) = '1') THEN FOP(4) <= '0';
ELSE FOP(4) <= RDY_rot(16);
END IF;

The FOP vector 2_4 thus determined is interpreted as a binary value and added to the current CNT value 5_0, as shown in FIG. The adder 2_5 performs, for example, an addition modulo 17 and typically requires about 50 gate equivalents. For this purpose, a correction value of 15 must be added to the result whenever a value greater than 16 has been determined. Obtaining the corrected new_CNT vector may require an additional clock phase when an overflow occurs. To this end, after the addition of CNT and FOP into a preliminary result vector new_CNT_v, it is checked whether the most significant bit (MSB) is equal to 1 and at least one other bit of the result is equal to 1; in this case the correction value is added. This can be described in the hardware description language VHDL as follows:

IF (new_CNT_v(16) = '1' AND (new_CNT_v(15) = '1' OR new_CNT_v(14) = '1' OR new_CNT_v(13) = '1' OR new_CNT_v(12) = '1' OR .. OR new_CNT_v(0) = '1'))
THEN new_CNT <= new_CNT_v + 15;
ELSE new_CNT <= new_CNT_v;
END IF;
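The selection step described above (first-one detection in RDY_rot followed by an addition modulo 17) can be sketched as a small behavioral model in Python. This is only an illustration, not the circuit itself: the function names, the list representation of the RDY bits, the rotation direction (position i of RDY_rot corresponds to task (CNT + i) mod 17), the 5-bit width of the corrected result and the return value 0 when no other task is ready are all assumptions of this sketch.

```python
NUM_TASKS = 17  # CNT takes the values 0..16 (addition modulo 17)


def rotate_ready(rdy, cnt):
    """Model of the barrel shifter: index 0 of the result is the current task.

    Assumption: position i of RDY_rot corresponds to task (cnt + i) mod 17.
    """
    return [rdy[(cnt + i) % NUM_TASKS] for i in range(NUM_TASKS)]


def first_one_position(rdy_rot):
    """Model of the FOD: lowest position above 0 that holds a 1.

    Position 0 (the current task) is ignored. Returning 0 when no other
    position holds a 1 is an assumption of this sketch.
    """
    for pos in range(1, NUM_TASKS):
        if rdy_rot[pos]:
            return pos
    return 0


def add_mod17(cnt, fop):
    """Model of the adder: add, then correct by +15 if the sum exceeds 16.

    For sums 17..32, adding 15 and keeping the lower 5 bits is the same as
    subtracting 17 (the 5-bit result width is an assumption).
    """
    new_cnt_v = cnt + fop  # preliminary result
    if new_cnt_v > 16:
        return (new_cnt_v + 15) & 0x1F
    return new_cnt_v


def next_task(rdy, cnt):
    """Combine the three steps to obtain the next CNT value."""
    return add_mod17(cnt, first_one_position(rotate_ready(rdy, cnt)))


if __name__ == "__main__":
    rdy = [0] * NUM_TASKS
    for task in (3, 15):
        rdy[task] = 1
    print(next_task(rdy, 3))   # next active task after task 3
    print(next_task(rdy, 15))  # search wraps around past task 16
```

The model makes it easy to check that the "+15" correction reproduces an ordinary addition modulo 17 for every combination of CNT and FOP between 0 and 16.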

However, it is also possible to connect an additional adder 7_2 with the corresponding correction value downstream of the first adder 2_5, as shown as an alternative embodiment in FIG. 7, where only a section of the selection network 2_0 is shown. For this purpose, the adder 7_2 is supplied either with 15 (0xF) in the event of an overflow 7_4 (OV = overflow) or with 0 (0x0) in the case of a value less than or equal to 16. It is expedient here to use a circuit 7_5 that anticipates an overflow 7_4 (overflow look-ahead, OLA). Such a circuit is described below by means of Table 1 and the Boolean equation given after it:

CNT value   FOP value   OV   FOP value   OV
16          ≥ 1         1    0           0
15          ≥ 2         1    ≤ 1         0
14          ≥ 3         1    ≤ 2         0
13          ≥ 4         1    ≤ 3         0
12          ≥ 5         1    ≤ 4         0
11          ≥ 6         1    ≤ 5         0
10          ≥ 7         1    ≤ 6         0
9           ≥ 8         1    ≤ 7         0
8           ≥ 9         1    ≤ 8         0
7           ≥ 10        1    ≤ 9         0
6           ≥ 11        1    ≤ 10        0
5           ≥ 12        1    ≤ 11        0
4           ≥ 13        1    ≤ 12        0
3           ≥ 14        1    ≤ 13        0
2           ≥ 15        1    ≤ 14        0
1           16          1    ≤ 15        0
0           -           0    -           0

Table 1: Value table for OV

In FIG. 7, the reference numeral 7_6 is assigned to the CNT value, the reference numeral 7_8 to the FOP value, the reference numeral 7_10 to the new new_CNT vector, and the reference numeral 7_1 to the provisional result vector new_CNT_v.
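The variant of FIG. 7 can be illustrated with a short Python sketch. The overflow is predicted from CNT and FOP alone, following the pattern of Table 1 (an overflow occurs exactly when FOP is at least 17 - CNT), and the second adder then receives either 15 or 0. The function names and the 5-bit truncation of the final sum are assumptions of this sketch, not details taken from the figure.

```python
def overflow_lookahead(cnt, fop):
    """Predict OV from the adder inputs, following the pattern of Table 1."""
    return fop >= 17 - cnt


def two_adder_new_cnt(cnt, fop):
    """First adder forms new_CNT_v; second adder adds 15 (OV) or 0 (no OV)."""
    new_cnt_v = cnt + fop                      # provisional result
    correction = 15 if overflow_lookahead(cnt, fop) else 0
    return (new_cnt_v + correction) & 0x1F     # assumed 5-bit result


if __name__ == "__main__":
    for cnt, fop in [(16, 1), (15, 1), (1, 16), (0, 16)]:
        print(cnt, fop, overflow_lookahead(cnt, fop), two_adder_new_cnt(cnt, fop))
```

Because the overflow signal depends only on the inputs, it can be formed in parallel with the first addition instead of waiting for its result.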

Boolean equation for OV as a direct implementation of Table 1, with ∧ = conjunction and ∨ = disjunction:

OV =
CNT(4) ∧ (FOP(4) ∨ FOP(3) ∨ FOP(2) ∨ FOP(1) ∨ FOP(0)) ∨
CNT(3) ∧ CNT(2) ∧ CNT(1) ∧ CNT(0) ∧ (FOP(4) ∨ FOP(3) ∨ FOP(2) ∨ FOP(1)) ∨
CNT(3) ∧ CNT(2) ∧ CNT(1) ∧ (FOP(4) ∨ FOP(3) ∨ FOP(2) ∨ (FOP(1) ∧ FOP(0))) ∨
CNT(3) ∧ CNT(2) ∧ CNT(0) ∧ (FOP(4) ∨ FOP(3) ∨ FOP(2)) ∨
CNT(3) ∧ CNT(2) ∧ (FOP(4) ∨ FOP(3) ∨ (FOP(2) ∧ (FOP(1) ∨ FOP(0)))) ∨
CNT(3) ∧ CNT(1) ∧ CNT(0) ∧ (FOP(4) ∨ FOP(3) ∨ (FOP(2) ∧ FOP(1))) ∨
CNT(3) ∧ CNT(1) ∧ (FOP(4) ∨ FOP(3) ∨ (FOP(2) ∧ FOP(1) ∧ FOP(0))) ∨
CNT(3) ∧ CNT(0) ∧ (FOP(4) ∨ FOP(3)) ∨
CNT(3) ∧ (FOP(4) ∨ (FOP(3) ∧ (FOP(2) ∨ FOP(1) ∨ FOP(0)))) ∨
CNT(2) ∧ CNT(1) ∧ CNT(0) ∧ (FOP(4) ∨ (FOP(3) ∧ (FOP(2) ∨ FOP(1)))) ∨
CNT(2) ∧ CNT(1) ∧ (FOP(4) ∨ (FOP(3) ∧ (FOP(2) ∨ (FOP(1) ∧ FOP(0))))) ∨
CNT(2) ∧ CNT(0) ∧ (FOP(4) ∨ (FOP(3) ∧ FOP(2))) ∨
CNT(2) ∧ (FOP(4) ∨ (FOP(3) ∧ FOP(2) ∧ (FOP(1) ∨ FOP(0)))) ∨
CNT(1) ∧ CNT(0) ∧ (FOP(4) ∨ (FOP(3) ∧ FOP(2) ∧ FOP(1))) ∨
CNT(1) ∧ (FOP(4) ∨ (FOP(3) ∧ FOP(2) ∧ FOP(1) ∧ FOP(0))) ∨
CNT(0) ∧ FOP(4)

This Boolean equation still offers optimization potential during compilation; its implementation cost is estimated at approximately 50 gate equivalents.
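As a plausibility check, the sum-of-products form for OV can be transcribed into Python and compared with the arithmetic condition CNT + FOP > 16 over the whole value range 0..16. The transcription below follows the sixteen product terms of the equation, one per CNT value from 16 down to 1; the helper names and the explicit parenthesization are assumptions of this sketch.

```python
def bits(value, width=5):
    """Return the bits of value as booleans, index 0 = least significant."""
    return [bool((value >> i) & 1) for i in range(width)]


def ov_equation(cnt, fop):
    """Evaluate the OV equation term by term (and binds tighter than or)."""
    c = bits(cnt)
    f = bits(fop)
    return (
        c[4] and (f[4] or f[3] or f[2] or f[1] or f[0])
        or c[3] and c[2] and c[1] and c[0] and (f[4] or f[3] or f[2] or f[1])
        or c[3] and c[2] and c[1] and (f[4] or f[3] or f[2] or (f[1] and f[0]))
        or c[3] and c[2] and c[0] and (f[4] or f[3] or f[2])
        or c[3] and c[2] and (f[4] or f[3] or (f[2] and (f[1] or f[0])))
        or c[3] and c[1] and c[0] and (f[4] or f[3] or (f[2] and f[1]))
        or c[3] and c[1] and (f[4] or f[3] or (f[2] and f[1] and f[0]))
        or c[3] and c[0] and (f[4] or f[3])
        or c[3] and (f[4] or (f[3] and (f[2] or f[1] or f[0])))
        or c[2] and c[1] and c[0] and (f[4] or (f[3] and (f[2] or f[1])))
        or c[2] and c[1] and (f[4] or (f[3] and (f[2] or (f[1] and f[0]))))
        or c[2] and c[0] and (f[4] or (f[3] and f[2]))
        or c[2] and (f[4] or (f[3] and f[2] and (f[1] or f[0])))
        or c[1] and c[0] and (f[4] or (f[3] and f[2] and f[1]))
        or c[1] and (f[4] or (f[3] and f[2] and f[1] and f[0]))
        or c[0] and f[4]
    )


def mismatches():
    """List all (CNT, FOP) pairs where the equation differs from CNT + FOP > 16."""
    return [
        (cnt, fop)
        for cnt in range(17)
        for fop in range(17)
        if ov_equation(cnt, fop) != (cnt + fop > 16)
    ]


if __name__ == "__main__":
    print(mismatches())
```

Each product term pairs a minimum CNT value with the matching FOP threshold from Table 1, which is why the terms can overlap without producing a false overflow.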

It should be noted that the total cost of the circuit arrangement proposed according to the invention, shown as a possible embodiment in FIG. 4, is approximately 900 gate equivalents and thus lies in a range that does not exceed the hardware savings achieved by the multitasking provided. The circuit arrangement provided according to the invention ensures that the latency for processing the tasks is shortened whenever not all tasks request the central processing unit at the same time. Furthermore, the CNT and new_CNT values 7_6, 7_10 can be calculated in a forward-looking manner. For this purpose, an additional pipeline stage can be provided, so that the CNT value 7_6 takes effect for the task or channel selection one clock later. To this end, CNT is stored in old_CNT delayed by one clock, and this value determines the selection of the current task, as shown by way of example in FIG. 8.

In FIG. 8, the same reference numerals are again used for identical processes or units as in the previously described embodiments of FIGS. 3, 4 and 7. Here, however, the signal 3_1 indicates the next-but-one task to be processed as new_CNT 7_10, which is fed to a corresponding flip-flop circuit 4_1. The signal 5_1 indicates the next task to be processed as the CNT value 7_6, which in turn is fed to the selection network 2_0 and is then output via a flip-flop circuit 4_2 in old_CNT 6_0 as the task currently to be processed.
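The effect of the additional pipeline stage can be illustrated with a clock-by-clock Python sketch in which old_CNT selects the current task, CNT holds the next task and new_CNT the next-but-one task. The function names, the reset value of both registers and the behavior when no other task is ready (CNT is kept) are assumptions of this sketch.

```python
NUM_TASKS = 17


def next_cnt(rdy, cnt):
    """Next ready task after cnt in cyclic order (cnt itself if there is none)."""
    for step in range(1, NUM_TASKS):
        candidate = (cnt + step) % NUM_TASKS
        if rdy[candidate]:
            return candidate
    return cnt


def simulate(rdy, cycles, start=0):
    """Return one (old_CNT, CNT, new_CNT) triple per clock."""
    cnt = start
    old_cnt = start  # assumed reset value
    trace = []
    for _ in range(cycles):
        new_cnt = next_cnt(rdy, cnt)   # computed one step ahead
        trace.append((old_cnt, cnt, new_cnt))
        old_cnt, cnt = cnt, new_cnt    # clock edge: both registers update
    return trace


if __name__ == "__main__":
    rdy = [0] * NUM_TASKS
    for task in (2, 5, 11):
        rdy[task] = 1
    for row in simulate(rdy, 4, start=2):
        print(row)
```

In each row, old_CNT equals the CNT value of the previous clock, which is the one-clock delay described for FIG. 8.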

In a further possible embodiment of the circuit arrangement proposed according to the invention, a 17-bit register that rotates in accordance with the CNT values can be used as an alternative to the barrel shifter shown in FIG. 5, as shown in FIG. The RDYi values are fed into this register at a fixed position, which in FIG. 9 is position 0, and depending on the rotation state the corresponding RDYi bit is stored at that position. At approximately 200 gate equivalents, this embodiment of the circuit arrangement proposed according to the invention requires significantly less hardware than the barrel shifter illustrated in FIG. However, it should be noted that, owing to the rotation of the register, the RDY signals can only be taken into account after an additional latency. Furthermore, when several tasks are skipped, the associated increase of the CNT value by more than 1 means that the rotation must take place over several clock cycles. A correspondingly forward-looking calculation in the form of multi-stage pipelining increases the latency further.
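The trade-off of the rotating register can be made concrete with a simplified Python model. It assumes that the register rotates by one position per clock, that CNT advances by one with each rotation, and that only position 0 is refreshed with the RDY bit of the task currently located there; the class and method names are hypothetical, and FIG. 9 may differ in detail.

```python
class RotatingReadyRegister:
    """Simplified model of a 17-bit register that rotates along with CNT."""

    def __init__(self, size=17):
        self.size = size
        self.bits = [0] * size  # position 0 belongs to the task selected by cnt
        self.cnt = 0

    def clock(self, rdy):
        """One rotation step: advance CNT, rotate, refresh position 0 only."""
        self.cnt = (self.cnt + 1) % self.size
        self.bits = self.bits[1:] + self.bits[:1]
        self.bits[0] = rdy[self.cnt]

    def advance(self, rdy, steps):
        """Skipping several tasks costs one clock per position in this model."""
        for _ in range(steps):
            self.clock(rdy)
        return steps


if __name__ == "__main__":
    reg = RotatingReadyRegister()
    rdy = [0] * 17
    reg.advance(rdy, 3)      # CNT is now 3
    rdy[1] = 1               # task 1 becomes ready
    print(reg.bits[15])      # stored copy for task 1 is still stale
    reg.advance(rdy, 15)     # task 1 reaches position 0
    print(reg.cnt, reg.bits[0])
```

In this model a newly set RDY bit becomes visible only once its task has rotated to position 0, and an increase of CNT by k takes k clocks, which corresponds to the two drawbacks noted above.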

It should also be noted that the prioritization described above can also be switched off by assigning each task a processing time irrespective of a processing request. For this purpose, the FOD is set to the value 0x002, 0x003, ... or 0xFFF, regardless of the RDY_rot values. This causes at least the next task to be treated as active (bit 1 = 1).
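A short Python sketch shows why forcing bit 1 to 1 switches the prioritization off: the first-one position is then always 1, so CNT simply advances by one task per step. The function names, the bit-mask representation of the vector and the `prioritization` flag are assumptions made for illustration.

```python
NUM_TASKS = 17


def first_one_position(vector):
    """Lowest bit position above 0 that is set in the integer bit mask."""
    for pos in range(1, NUM_TASKS):
        if (vector >> pos) & 1:
            return pos
    return 0


def next_cnt(cnt, rdy_rot, prioritization=True, forced_value=0x002):
    """With prioritization off, the FOD sees a fixed value instead of RDY_rot."""
    fod_input = rdy_rot if prioritization else forced_value
    return (cnt + first_one_position(fod_input)) % NUM_TASKS


if __name__ == "__main__":
    rdy_rot = 0b1 | (1 << 5)  # only the task five positions ahead is ready
    print(next_cnt(0, rdy_rot))                        # skips to the ready task
    print(next_cnt(0, rdy_rot, prioritization=False))  # plain step to the next task
```

With the forced value, every task receives its slot in turn regardless of its request state, so the schedule is fixed and fully predictable.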

Claims


1. Circuit arrangement for a data processing system for processing a plurality of tasks by means of a central processing unit (10) having a processing capacity associated with the processing unit, wherein the circuit arrangement (100) is configured to assign the processing unit (10) to the respective tasks for processing in a time-offset manner and to control it such that the tasks are processed in a predefined order and tasks without a current processing request are skipped in that order during processing.

2. Circuit arrangement according to claim 1, wherein the circuit arrangement (100) is further configured to assign each of the tasks a channel with its own registers, to select the respective registers according to the assignment of the respective task to the processing unit (10), and to connect them to the processing unit (10).

3. Circuit arrangement according to claim 1, wherein the circuit arrangement is further configured to assign the processing unit to each of the tasks with a current processing request for a constant time period that is the same for all active tasks.

4. Circuit arrangement according to claim 3, wherein the time period corresponds to one clock of a clock cycle of the processing unit (10).

5. Circuit arrangement according to claim 2, wherein the circuit arrangement is further configured to execute instructions from the respective registers of the respective channels of the tasks with a current processing request, as active tasks, in a pipeline with multiple pipeline stages, wherein at the time of processing the respective registers are switched into correspondingly clocked pipeline registers of the pipeline and the pipeline stages are processed in parallel within one time period, but in each case for different channels.


6. Circuit arrangement according to claim 5, wherein instruction decoding and memory accesses for a first active task each take place in time periods in which the processing unit is assigned to at least one second active task.

7. Circuit arrangement according to one of the preceding claims, wherein the assignment of the task following a task currently being processed is calculated as a function of which task is currently being processed and which tasks have a processing request at the time of the calculation.

8. Circuit arrangement according to claim 7, wherein the calculation of the assignment of the respective subsequent task is carried out by means of predictive pipelining.

9. A method for processing a plurality of tasks in a data processing system by means of a central processing unit having a processing capacity associated with the processing unit, wherein the processing unit is assigned to the respective tasks for processing in a time-offset manner, the tasks are processed in a predetermined order, and tasks without a current processing request are skipped in that order during processing.

10. The method of claim 9, wherein each of the tasks is assigned a channel with its own registers and the respective registers are selected according to the assignment of the respective task to the processing unit and connected to the processing unit.