WO2012069831A1 - Method and arrangement for a multi-core system - Google Patents


Info

Publication number
WO2012069831A1
WO2012069831A1
Authority
WO
WIPO (PCT)
Prior art keywords
shared memory
buffer
memory area
data
processor
Application number
PCT/GB2011/052303
Other languages
French (fr)
Inventor
Keith Athaide
Original Assignee
Tte Systems Ltd
Priority date
Priority claimed from GBGB1019895.0A external-priority patent/GB201019895D0/en
Priority claimed from GBGB1019890.1A external-priority patent/GB201019890D0/en
Application filed by Tte Systems Ltd filed Critical Tte Systems Ltd
Publication of WO2012069831A1 publication Critical patent/WO2012069831A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/3009Thread control instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/542Event management; Broadcasting; Multicasting; Notifications
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/543Local

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Debugging And Monitoring (AREA)
  • Multi Processors (AREA)

Abstract

A method of communicating data between a plurality of processors in a multi-core system via a plurality of buffers, each buffer comprising memory divided into a plurality of shared memory areas such that each buffer comprises a corresponding version of each of the plurality of shared memory areas. For all of the buffers in the system, for the corresponding version of each same shared memory area of each buffer, a maximum of one shared memory area can be written to at any one time, whilst for all of the buffers associated with each individual processor, for the corresponding version of each same shared memory area a maximum of one shared memory area can be read from at any one time. The method comprises identifying a first write operation scheduled to be performed by a first processor of the plurality of processors to a first shared memory area in at least a first buffer, said first buffer being coupled to said first processor, and in response identifying a second shared memory area corresponding to the first shared memory area in at least a second buffer, said second buffer being coupled to a second processor, wherein the second shared memory area is in a first state where it is available to receive data. The method further comprises writing data associated with the first write operation to the first shared memory area in at least the first buffer, and in response thereto changing the state associated with the second shared memory area of said second buffer to indicate the data in the second shared memory area is most recent data written to the second shared memory area, and writing said data to the second shared memory area of the second buffer, to thereby ensure each corresponding shared memory area contains the most recent version of data.

Description

METHOD AND ARRANGEMENT FOR A MULTI-CORE SYSTEM
TECHNICAL FIELD
The invention relates to a method and system for enabling data to be exchanged between processors in a multi-core system.
BACKGROUND
Microprocessors are typically associated with computing-based applications such as running software on devices like personal computers, smart-phones, games consoles and so on. In such devices, a microprocessor is usually arranged to run complex software enabling the device to perform many different functions.
However, microprocessors are also employed to control systems that are designed and built to perform more specific functions. These systems are sometimes referred to as "embedded" systems. As is known in the art, embedded systems are used to provide control for a very large range of applications. For example, an embedded system employing a simple microprocessor might be used for controlling a domestic appliance such as a washing machine or an oven. On the other hand, in more complex examples a more sophisticated embedded system might be used in an avionics system in an aircraft, or in a control system for a robotic arm used in a factory.
Some embedded systems employ a so-called "time-triggered" interrupt system to provide improved predictability and stability in the behaviour of the microprocessor.
A conventional microprocessor includes a number of interrupt lines. As is known in the art, these interrupts allow a task currently being executed on the processor to be interrupted by an external event. When designing such a system, it is difficult to say with complete certainty that the processor will always behave as intended because it is very difficult to anticipate the effect of every conceivable interrupt occurring at every conceivable point during every possible task that might be run on the microprocessor.
On the other hand, a system using a "time-triggered" technique is arranged such that the number of interrupt lines is reduced, typically to a single interrupt line. Moreover time-triggered systems are typically designed such that the point in time during task execution when an interrupt will occur is determined prior to the interrupt occurring. In some examples, time-triggered systems are designed such that interrupt timing is known only to the extent of when the next interrupt will occur. In other examples, more precise interrupt timing is known, for example the point in time at which every interrupt will occur is known.
In some examples of the time-triggered technique, such as a "time-triggered co-operative" technique, a system is designed such that every task performed by the processor always runs to completion. In other words, the system and software for the system are designed such that a task being performed on a processor is never interrupted by an external event. The time-triggered techniques described above can reduce to some extent the speed at which tasks are executed and also typically require specific time-triggered software to be written, specifically for use in the time-triggered system. However, employing such techniques can significantly ease the process of predicting a microprocessor's behaviour, and that of the system of which it is a part. This is because all that is necessary to model the behaviour of a time-triggered system is to separately model the execution of each task and the effect of the interrupts occurring at the known times. This alleviates the requirement to undertake the complex and time-consuming task of modelling the effect of interrupts occurring at non-predetermined times. Accordingly, time-triggered systems are particularly useful in safety-critical applications such as avionics, where a trade-off in speed is acceptable for an increase in predictability of the behaviour of the system. Another characteristic of time-triggered systems is that they have an inherent "single writer" restriction. In other words, as each task is run to completion, it can be assumed that for a given memory location, at any single point in time there will only ever be one task (i.e. "writer") writing to memory.
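The run-to-completion behaviour described above can be illustrated with a short simulation. The following Python sketch is purely illustrative and is not part of the disclosed system; the task names, periods and offsets are assumptions chosen for the example. The point it demonstrates is that each task runs to completion and the tick at which each task executes is fixed in advance by its period and offset, so the system's behaviour can be modelled one task at a time.

```python
# Minimal sketch of a time-triggered co-operative scheduler.
# Each task runs to completion; the only "interrupt" is the timer tick,
# and the tick at which each task runs is fixed by its period/offset.

def make_task(name, period, offset, log):
    def task():
        log.append(name)   # stands in for the real task body
    return {"period": period, "offset": offset, "run": task}

def run_schedule(tasks, num_ticks):
    for tick in range(num_ticks):          # one iteration per timer tick
        for t in tasks:
            if tick >= t["offset"] and (tick - t["offset"]) % t["period"] == 0:
                t["run"]()                 # runs to completion, never pre-empted

log = []
tasks = [make_task("sample", 2, 0, log),   # every 2nd tick, from tick 0
         make_task("control", 4, 1, log)]  # every 4th tick, from tick 1
run_schedule(tasks, 5)
print(log)  # -> ['sample', 'control', 'sample', 'sample']
```

Because the dispatch table alone determines when each task runs, predicting the system's behaviour reduces to modelling each task body in isolation.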
Returning to microprocessor systems in general, as the demand for more complex systems has grown, it is now common, particularly in computing applications, to employ microprocessor based systems which include more than one microprocessor core. In other words, rather than having a single processor executing all of the tasks required by the system, multiple cores are provided and the various tasks to be performed by the system are divided amongst the cores. This multi-core approach has also begun to be employed in embedded systems.
In a single-core processor, tasks frequently communicate by reading and writing to common memory locations. However, when executed on different cores of a distributed memory multi-core, the task design must change significantly to allow messages to be passed or to use some other mechanism to exchange data between tasks. In other words when designing a multi-core system, techniques have to be employed to enable the tasks operating on the different microprocessors to communicate data with each other in such a way that data integrity is maintained. This must be done even if the tasks are operating asynchronously (e.g. writing and reading data at different times) and running at different rates (e.g. writing and reading data at different rates). Thus, with an increase in the number of cores in a system, a mechanism is necessary for tasks on different cores to communicate. In particular it is desirable to take advantage of the characteristics of time-triggered systems to provide an improved technique for managing multi-core systems.
In the following, the core that is executing a task that produces data is termed a writer and the core executing a task that receives data is termed a reader. The reader and writer may be executing asynchronously, where the read and write tasks run at different rates, as shown schematically in Figure 1.
Since the periodicity of the applications is a part of the application design, it can be assumed that the writer will buffer data appropriately at the application level if it runs faster than the reader. For example, in Figure 1(a), the writer will only use one buffer while in Figure 1(b), the writer will create and use two buffers at the application level.
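The buffering relationship illustrated in Figure 1 can be expressed as a simple calculation. The following Python sketch is illustrative only; the ceiling rule is an assumption consistent with the two cases shown (one buffer when the rates match, two when the writer runs twice as fast as the reader).

```python
import math

def app_buffers_needed(writer_period, reader_period):
    """Application-level buffers the writer must provide so that no data
    is overwritten before the slower reader has consumed it.
    Illustrative rule: one buffer per write that can occur within a
    single read interval (at least one buffer in all cases)."""
    return max(1, math.ceil(reader_period / writer_period))

# Figure 1(a): writer and reader at the same rate -> one buffer
print(app_buffers_needed(10, 10))  # -> 1
# Figure 1(b): writer runs twice as fast as the reader -> two buffers
print(app_buffers_needed(5, 10))   # -> 2
```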
Figure 2 shows a schematic illustration of possible overlaps between a writer and reader (Kopetz et al. 1993). A single-core processor can control concurrent task execution through the option of utilising pre-emption, whereas a multi-core device will always exhibit concurrent task execution. The overlaps can be seen in Figure 2(a) and (b), where tasks occur within each other; Figure 2(c), where the writer is started while the reader is executing; and Figure 2(d), where the reader is started while the writer is executing. Such overlap is permissible but requires special measures to maintain data integrity.
SUMMARY OF THE INVENTION
In accordance with a first aspect of the present invention there is provided a method of communicating data between a plurality of processors in a multi-core system via a plurality of buffers, each buffer comprising memory divided into a plurality of shared memory areas such that each buffer comprises a corresponding version of each of the plurality of shared memory areas. For all of the buffers in the system, for the corresponding version of each same shared memory area of each buffer, a maximum of one shared memory area can be written to at any one time, whilst for all of the buffers associated with each individual processor, for the corresponding version of each same shared memory area a maximum of one shared memory area can be read from at any one time. The method comprises identifying a first write operation scheduled to be performed by a first processor of the plurality of processors to a first shared memory area in at least a first buffer, said first buffer being coupled to said first processor, and in response identifying a second shared memory area corresponding to the first shared memory area in at least a second buffer, said second buffer being coupled to a second processor, wherein the second shared memory area is in a first state where it is available to receive data. The method further comprises writing data associated with the first write operation to the first shared memory area in at least the first buffer, and in response thereto changing the state associated with the second shared memory area of said second buffer to indicate the data in the second shared memory area is most recent data written to the second shared memory area, and writing said data to the second shared memory area of the second buffer, to thereby ensure each corresponding shared memory area contains the most recent version of data.
In accordance with this aspect of the invention, a technique is provided which takes advantage of the characteristic of microprocessor based systems, such as "time-triggered" based systems, in which a "single-writer" condition is imposed such that at any one time, only a single task scheduled to be performed by the system will perform a write operation on any of the versions of a specific memory location held in various buffers within the system. It is recognised that by further imposing a "single-reader" condition such that at any one time, only a single task scheduled to be performed by the system will read from any of the versions of a specific memory location held in the various buffers connected to a specific processor, an advantageous technique for communicating data between tasks operating in a multi-core system can be provided. It should be noted that the "single-writer" condition is applied "globally" i.e. to all the buffers in the system. In other words, across the entire system only one version of a shared memory area is written to at any one time. On the other hand, the "single-reader" condition is applied locally. In other words, for the buffers in one buffer group attached to one processor only one version of a shared memory area is read from at any one time. However corresponding shared memory areas from buffers attached to other cores (i.e. from different buffer groups) may be read from during this time.
As explained above, multi-core systems must be arranged such that the communication of data between tasks run on individual processor cores does not cause conflict. In accordance with the first aspect of the present invention, the single-reader/single-writer constraint is imposed and each processor core is provided with a plurality of corresponding buffers in which a number of versions of the shared memory areas are provided which are updated when shared memory areas are written to by other cores. Further, data that is most recently written to a shared memory area in a particular buffer is flagged as such by virtue of a state associated with that memory area in that buffer. As a result, different tasks are able to run independently on different cores with a reduced risk of the different tasks writing data to shared memory areas that will cause a conflict. In some examples, this risk may be effectively entirely mitigated.
Thus, although a write and a read operation may be performed on the same shared memory area (i.e. different versions of that shared memory area) at the same time by different processors, this can be tolerated because for a given processor core, the shared memory area is replicated in a number of buffers (allowing simultaneous reading and writing).
Further as write operations by one core are replicated at other cores and the most recent (and thus correct) data for a particular memory area is identified, the chances of inter-task conflicts arising are reduced. In some examples, this risk may be effectively entirely mitigated.
Thus, with the use of hardware extensions, the ability for tasks to communicate via common memory locations is extended to tasks running concurrently on a distributed-memory multi-core. As a result it is possible to continue developing applications (i.e. software) as if targeting a single-core system.
In some embodiments the method further includes reading data from the second shared memory area of the second buffer by identifying a first read operation scheduled to be performed by the second processor from the second shared memory area; identifying that the second shared memory area of the second buffer is in the state indicating the data stored therein is most recent data; and performing the first read operation by reading the data from the second memory area of the second buffer.
In some embodiments, after identifying that the second shared memory area of the second buffer is in the state indicating the data stored therein is most recent data, the method includes changing the state of the second shared memory area of the second buffer to indicate the second shared memory area of the second buffer is in use by the second processor.
In some embodiments, identifying the first write operation comprises identifying the first write operation by a scheduler coupled to the first processor.
In some embodiments the plurality of buffers are clocked at a same rate as the plurality of processors.
In some embodiments the plurality of buffers are arranged into a plurality of buffer groups, each processor being coupled to buffers of one buffer group, and each buffer group comprises three buffers.
In some embodiments the read operations performed by the plurality of processors and write operations performed by the plurality of processors are performed at different rates.
In accordance with a second aspect of the invention, there is provided a multi-core system comprising a plurality of processors and a plurality of buffers, each buffer comprising memory divided into a plurality of shared memory areas such that each buffer comprises a corresponding version of each of the plurality of shared memory areas. The system is arranged such that for all of the buffers in the system, for the corresponding version of each same shared memory area of each buffer, a maximum of one shared memory area can be written to at any one time, whilst for all of the buffers associated with each individual processor, for the corresponding version of each same shared memory area a maximum of one shared memory area can be read from at any one time. The system further comprises a first scheduler coupled to a first processor of the plurality of processors arranged to identify a first write operation scheduled to be performed by the first processor of the plurality of processors to a first shared memory area in at least a first buffer, said first buffer being coupled to said first processor. The system also includes a communication controller coupled to a second processor arranged in response to the identification of the first write operation to identify a second shared memory area corresponding to the first shared memory area in at least a second buffer, said second buffer being coupled to the second processor, wherein the second shared memory area is in a first state where it is available to receive data.
The communication controller is also arranged, in response to a writing of data associated with the first write operation to the first shared memory area in at least the first buffer, to change the state associated with the second shared memory area of said second buffer to indicate the data in the second shared memory area is most recent data written to the second shared memory area, and to write said data to the second shared memory area of the second buffer, thereby ensuring each corresponding shared memory area contains the most recent version of data.
In one embodiment of the second aspect of the invention, the system is arranged so that the second processor is coupled to a second scheduler. The second scheduler is arranged to identify a first read operation scheduled to be performed by the second processor from the second shared memory area. In response the communication controller is arranged to identify that the second shared memory area of the second buffer is in the state indicating the data stored therein is most recent data. In response the second processor is arranged to perform the first read operation by reading the data from the second shared memory area of the second buffer.
In another embodiment of the second aspect of the invention, after identifying that the second shared memory area of the second buffer is in the state indicating the data stored therein is most recent data, the communication controller is arranged to change the state associated with the second shared memory area of the second buffer to indicate that the second shared memory area of the second buffer is in use by the second processor.
In another embodiment of the second aspect of the invention the plurality of buffers are clocked at a same rate as the plurality of processors.
In another embodiment of the second aspect of the invention the plurality of buffers are arranged into a plurality of buffer groups, each processor being coupled to buffers of one buffer group, and each buffer group comprises three buffers.
In another embodiment of the second aspect of the invention read operations performed by the plurality of processors and write operations performed by the plurality of processors are performed at different rates.
In accordance with a third aspect of the invention there is provided a scheduler for use in a system arranged in accordance with the second aspect of the invention.
In accordance with a fourth aspect of the invention there is provided a communication controller for use in a system arranged in accordance with the second aspect of the invention.
In accordance with a fifth aspect of the invention there is provided a product comprising a system arranged in accordance with the second aspect of the invention.
Various aspects and embodiments of the invention are defined in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Exemplary embodiments of the invention will now be described with reference to the accompanying drawings, in which:
Figure 1 provides a schematic diagram showing a prior art reader and writer operating synchronously in a microprocessor system;
Figure 2 provides a schematic diagram showing a prior art reader and writer operating asynchronously, i.e. such that read operations and write operations overlap;
Figure 3 provides a simplified schematic diagram of an embedded system in which examples of the present invention can be implemented;
Figure 4 provides a schematic diagram of a processing unit arranged in accordance with an example of the present invention;
Figure 5 provides a schematic diagram of a memory unit comprising three buffer memories arranged in accordance with an example of the present invention;
Figure 6 provides a state diagram illustrating a change in states of the shared memory areas in the buffer memories;
Figure 7 provides a schematic diagram of a first and second core module arranged in accordance with an example of the present invention;
Figure 8 provides a schematic diagram of a connection of a communication controller and a number of core modules arranged in accordance with an example of the present invention;
Figure 9 and Figure 10 provide a schematic diagram showing a change in state associated with shared memory areas of three buffers during the execution of a first and second task in accordance with an example of the present invention; and
Figure 11 provides a flow diagram of a process performed in accordance with an example of the present invention.
DETAILED DESCRIPTION
Figure 3 provides a simplified schematic diagram illustrating the basic arrangement of an example of an embedded system 11. The system includes an input/output (I/O) unit 12 for receiving input data into the system and for sending output data out of the system. The I/O unit 12 is connected via a system bus 13 to a processing unit 14. Schematically, the I/O unit 12 receives input from external sources, converts this to a suitable format and sends it to the processing unit 14 via the system bus 13. The processing unit 14 receives the input data and performs processing on it in accordance with a software program loaded on the processing unit 14. The processing unit 14 sends data back to the I/O unit 12 via the system bus as the processing is performed, which is then converted into system output.
In the example shown in Figure 3, the processing unit 14 is a multi-core device. Rather than comprising a single processor core, the processing unit 14 instead comprises a plurality of core modules 10. Each core module typically comprises a discrete microprocessor and associated components as is known in the art.
Processing performed by the processing unit 14 is distributed across the plurality of core modules 10. As is known in the art, by using a number of core modules rather than a single core module, data processing of a greater complexity can be undertaken and/or data processing can be performed more quickly. However, when employing a processing unit with multiple cores, additional considerations must be taken into account.
As is known in the art, in a single core system, when software being run on the system is converted into instructions for the processing core, a scheduling entity groups the instructions into a set of tasks. A single processing core can be constrained to execute one task at a time and therefore inter-task conflicts due to concurrently running tasks can be substantially reduced or mitigated.
However, in a multi-core system, tasks execute concurrently, by choice, on the various processing cores. In order to take full advantage of the benefits provided by distributed processing cores, tasks running concurrently on different cores are arranged to communicate data with each other. Specific pieces of data communicated between tasks are generally referred to as shared variables.
Special measures need to be taken to ensure that conflicts do not arise between concurrently running tasks when they communicate shared variables. Such conflicts can occur, for example, when a section of memory is concurrently being written to by a first task and read from by a second task. In a simple case the likelihood of conflicts can be reduced by buffering data to be written by the first task until the second task has read the data. However, accommodating concurrent write and read operations to the same memory area becomes considerably more complex when the read and write operations are performed at different rates. For example, if more than one write operation is performed during a single read operation, not only does the data to be written need to be buffered, but also what is deemed to be the most "recent" data must be carefully tracked.
It has been found that implementing the following design principles provides a particular advantage in multi-core based embedded systems:
Multi-Core Design Principles
• Data is communicated between cores via a plurality of shared memory areas (SMAs). However, it should be noted that in the examples described below, these SMAs do not exist as one single group of memory areas. Instead each core is attached to a plurality of buffer memories. Each buffer memory contains a version of the plurality of SMAs. Thus if a system includes n buffers, there would be a total of n versions of each SMA.

It should be noted that in the following, unless otherwise stated, reference to an "SMA" refers to a particular version of an SMA in a particular buffer.

• Shared variables are allocated to SMAs that possess globally unique identifiers.
• In some examples each core is attached to three buffer memories. Therefore, each core is connected to three versions of each SMA.
• Each SMA in each buffer memory may be in one of a number of states, including "in use by a local core", "in use by an external core" or marked as containing the "latest data". The SMAs cycle through these states depending on local or external state-switching instructions.
• Each core module maintains a list of SMAs which are relevant to the tasks that are to be run on that core.
• When a core writes to an SMA, a message tagged with an SMA identifier is broadcast over an arbitrary on-chip network to all other cores.
• When a core receives such a message, it processes the message only if the SMA identified therein is in the list of relevant SMAs, whereupon it writes the contents into the corresponding SMA of one of the three buffers which has been placed in an "in use by an external core" state by a separate switch message.
• When a core reads from an SMA, the data is read from the relevant SMA in one of the buffers that is in the "in use by the local core" state.
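The principles above can be exercised with a small software model. The following Python sketch is illustrative and not part of the disclosed hardware: the class layout, the message format and the particular rule used to cycle buffer states are assumptions introduced for the example; only the three states and the filtering on relevant SMA identifiers follow the description above.

```python
# Sketch of the three-buffer SMA mechanism described in the principles
# above. Each core holds three versions of every relevant SMA, each in
# one of the states: in use by the local core ("LOCAL"), in use by an
# external core ("EXTERNAL"), or holding the "LATEST" data.

LOCAL, EXTERNAL, LATEST = "LOCAL", "EXTERNAL", "LATEST"

class CoreBuffers:
    def __init__(self, relevant_smas):
        self.relevant = set(relevant_smas)   # SMAs this core's tasks use
        # one state triple and one value triple of buffers per SMA
        self.state = {s: [LOCAL, EXTERNAL, LATEST] for s in self.relevant}
        self.value = {s: [None, None, None] for s in self.relevant}

    def on_message(self, sma_id, data):
        """Handle a broadcast write: process only relevant SMA ids, and
        write into whichever buffer is in the EXTERNAL state."""
        if sma_id not in self.relevant:
            return
        i = self.state[sma_id].index(EXTERNAL)
        self.value[sma_id][i] = data
        # the freshly written buffer now holds the latest data; the old
        # LATEST buffer becomes available for the next external write
        j = self.state[sma_id].index(LATEST)
        self.state[sma_id][i], self.state[sma_id][j] = LATEST, EXTERNAL

    def read(self, sma_id):
        """A local read claims the buffer most recently marked LATEST
        for the local core, so incoming writes never collide with it."""
        i = self.state[sma_id].index(LATEST)
        j = self.state[sma_id].index(LOCAL)
        self.state[sma_id][i], self.state[sma_id][j] = LOCAL, LATEST
        return self.value[sma_id][i]

core_b = CoreBuffers({"speed"})
core_b.on_message("speed", 42)   # broadcast from the writing core
core_b.on_message("temp", 7)     # not in the relevant list: ignored
print(core_b.read("speed"))      # -> 42
```

Because writes always target the EXTERNAL buffer and reads always claim the LATEST buffer, a read and an incoming write on the same SMA never touch the same version, which is the property the three-buffer arrangement is intended to provide.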
Example implementation
Figure 4 provides an example implementation of a multi-core processing unit arranged in accordance with the design principles set out above.
As will be explained in further detail below, the multi-core processing unit shown in Figure 4 implements a hardware communication controller and scheduler that synchronises task execution such that the fact that the processing unit comprises multiple cores is transparent to the application software running on the processing unit. Moreover, to safeguard access overlaps, a hardware implementation of a three buffer single-writer, single-reader mechanism is used, with the entire data memory being buffered.
In this architecture, at any one time, corresponding SMAs in all the buffers have a single-"writer" condition imposed on them, and corresponding SMAs in each buffer group have a single-"reader" condition imposed on them.
Thus, as described above, the system is arranged such that for corresponding SMAs in any given buffer group (i.e. all the versions of an SMA coupled to a particular processor), a maximum of only one task will ever be reading from one version of an SMA. On the other hand, for all buffers across the whole system (i.e. all versions of an SMA in all of the buffers), a maximum of only one task will ever be writing to one version of an SMA at any given point in time.
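These two constraints can be stated as explicit checks over a set of simultaneous accesses. The following Python sketch is illustrative only; the access-log representation (tuples of core, SMA identifier and operation, all taken to occur at the same instant) is an assumption introduced for the example.

```python
# Sketch: checking the single-writer (global) and single-reader
# (per buffer group) constraints over a set of simultaneous accesses.
# Each access is (core, sma_id, op) where op is "read" or "write".

def constraints_hold(accesses):
    writers = {}   # sma_id -> set of cores writing (global constraint)
    readers = {}   # (core, sma_id) -> read count (local constraint)
    for core, sma, op in accesses:
        if op == "write":
            writers.setdefault(sma, set()).add(core)
        else:
            readers[(core, sma)] = readers.get((core, sma), 0) + 1
    global_ok = all(len(cores) <= 1 for cores in writers.values())
    local_ok = all(count <= 1 for count in readers.values())
    return global_ok and local_ok

# One core writing SMA 0 while two other cores each read their own
# local version of SMA 0 is permitted:
print(constraints_hold([(0, 0, "write"), (1, 0, "read"), (2, 0, "read")]))  # True
# Two cores writing the same SMA at once violates the global constraint:
print(constraints_hold([(0, 0, "write"), (1, 0, "write")]))                 # False
```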
The multi-core processing unit shown in Figure 4 includes a first core module 10a and a second core module 10b. The first and second core modules can be implemented as core modules in a processing unit of an embedded system, such as the processing unit 14 shown in the embedded system 11 of Figure 3.
As can be seen from Figure 4, the structure of the first core module 10a and the second core module 10b correspond. For the sake of brevity only the structure of the first core module 10a will be explained in detail. It will be understood that like parts from the first core module 10a correspond in function and in the nature of their interconnection with like parts of the second core module 10b.
Further, it will be understood that although the multi-core processing unit shown in Figure 4 comprises only first and second core modules, in some implementations, the multi-core processing unit may include more than two core modules.
As can be seen from Figure 4, the first core module 10a includes a core 1a. The core 1a is connected to a communication controller 2a. The communication controller 2a is connected to a scheduler 3a, which is also connected to the core 1a. The core 1a and communication controller 2a are connected to a buffer switch 4a. The buffer switch 4a is connected to three buffer memories 5a, 6a, 7a which together comprise a memory unit (or buffer group). The communication controllers 2a, 2b of each core module 10a, 10b are connected via a common data bus 8. The buffer memories 5a, 6a, 7a are connected to the communication controller 2a.
In summary, the software (i.e. the application) running on the system is divided into tasks which are organised by the schedulers 3a, 3b. Each scheduler 3a, 3b then sequentially sends these tasks for execution to the core 1a or core 1b, depending on which it is attached to. The core then performs the task, which typically involves writing to and/or reading from the buffer memories. Data is shared between cores by writing to and reading from shared memory areas (SMAs) which are replicated in each buffer 5a, 6a, 7a, 5b, 6b and 7b. When a core writes to or reads from an SMA, this is communicated to the other cores of the system on the common data bus 8 by the communication controller attached to that particular core. The buffer from which a core reads is determined by the position of the buffer switch 4a under the control of the communication controller.
The various elements of the core module are explained in more detail below:
Core
The core 1a is typically a central processing unit (CPU) that executes tasks sent to it from the scheduler 3a. The core 1a can read and write data to the buffer memories 5a, 6a, 7a via the buffer switch 4a under the control of the communication controller 2a.
Buffer Memories
As explained above, each buffer memory contains a version of each SMA. The buffer memories are referred to as "buffers" from this point. In the implementation shown in Figure 4, each buffer 5a, 6a, 7a comprises an area of memory which corresponds with the total shared memory in the system.
This is shown in more detail in Figure 5. As can be seen from Figure 5, each of the buffers 5a, 6a, 7a contains a section of memory which is substantially identically divided into a plurality of n SMAs. The size and location of each SMA is typically defined at compile time.
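By way of illustration only, the three-buffer group with an identical SMA layout in each buffer might be modelled in software as follows (the layout values and names here are hypothetical and are not part of the specification):

```python
# Illustrative model of one core's buffer group: three buffers, each
# holding an identical compile-time layout of n shared memory areas.

SMA_LAYOUT = [  # (identifier, offset, size) -- hypothetical values
    (0, 0x00, 16),
    (1, 0x10, 32),
    (2, 0x30, 8),
]

def make_buffer_group():
    """Return three buffers, each a bytearray covering all SMAs."""
    total = max(off + size for _, off, size in SMA_LAYOUT)
    return [bytearray(total) for _ in range(3)]

buffers = make_buffer_group()
assert len(buffers) == 3                      # one group = three buffers
assert all(len(b) == 0x38 for b in buffers)   # identical layouts
```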
As will be described below, the scheduler "switches" the buffers for the SMAs. The "switching" performed by the scheduler is a "switching" of a state associated with the SMA in the buffers used by a task before it executes (switching the state of the SMAs in the buffers is not to be confused with the function of the buffer switch which is explained in more detail below).
The switching performed by the scheduler includes the local buffers ("local switch") and the buffers in other cores ("external switch") sharing these memory areas.
As will be understood, a "local switch" refers to switching the state of SMAs of buffers attached to the same core as the scheduler. An "external switch" is switching the state of SMAs of buffers attached to a different core than the scheduler.
A switch also locks the buffers and so the buffers must be released by the scheduler when the task is finished. A "local switch" sets the local buffer to the latest written buffer; an "external switch" reserves a buffer that is not the latest and which is not being read. An "external switch" uses the last written buffer if a "local switch" has not occurred since the last external switch. This allows tasks working at different rates to function properly. A switch may also be performed locally only (read switch) if an SMA has multiple readers since multiple readers attempting external switches can disrupt each other. It will be understood that the term "buffer" in this context refers generally to the SMA in a particular buffer.
As shown in Figure 6, a buffer (or more specifically an SMA in a buffer) may be in one of several states: available (a), being used locally (l), being used externally (e) and being the last used externally (u), together with several guard conditions. Transitions between states occur when a switch operation occurs: specifically, an external switch (ES) and external release (ER); a local switch (LS) and local release (LR); and a guard indicating that a local switch has happened since the last external switch (LSSLE).
In some examples the transitions from the available state are the least preferred and transitions by a buffer from another state are then performed instead, if possible.
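The switch and release operations described above, together with the LSSLE guard, can be sketched as a simple software model (the state codes follow Figure 6; the class and method names are illustrative assumptions, not part of the specification):

```python
# Hedged sketch of the per-SMA buffer-state machine.
# States: 'a' available, 'l' in local use, 'e' in external use,
# 'u' most recent data (last used externally).

class SmaBufferStates:
    def __init__(self):
        self.state = ['a', 'a', 'a']  # one entry per buffer in the group
        self.ls_since_es = False      # the LSSLE guard

    def external_switch(self):
        # Reuse the last-written buffer if no local switch occurred
        # since the previous external switch; otherwise reserve an
        # available buffer (one that is not the latest and not read).
        if not self.ls_since_es and 'u' in self.state:
            i = self.state.index('u')
        else:
            i = self.state.index('a')
        self.state[i] = 'e'
        self.ls_since_es = False
        return i

    def external_release(self):
        # The externally written buffer now holds the most recent data.
        self.state[self.state.index('e')] = 'u'

    def local_switch(self):
        # Set the local read buffer to the latest written buffer.
        i = self.state.index('u') if 'u' in self.state else self.state.index('a')
        self.state[i] = 'l'
        self.ls_since_es = True
        return i

    def local_release(self):
        self.state[self.state.index('l')] = 'a'
```

In this sketch a write from another core (ES then ER) followed by a local read (LS) lands the reader on the buffer the writer just filled.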
As long as the application (i.e. the software running on the system) buffers data appropriately and the reading task executes (on another core) after the last execution of a write operation in a batch but before the first execution of a write operation in the next batch, then the reader can execute concurrently until the start of the next plus one batch without any data losses or any incoherence.
Multiple buffers can be toggled together by a single register write to the communication controller and there is no variability introduced by tasks using variable numbers of shared memory areas. This prevents the communication controller from increasing a task's release jitter.
Accordingly, during operation of the processing unit, each SMA in the buffers 5a, 6a, 7a, 5b, 6b, 7b may be in one of a number of states. These states can be labelled as follows:
• "IN LOCAL USE" (corresponding to "being used locally (l)" described above). When in this state, an SMA of a buffer is being used by the local core (i.e. the core to which the buffer is attached). No data originating from an external core can be written to an SMA of a buffer in this state.
• "IN EXTERNAL USE" (corresponding to "being used externally (e)" described above). When in this state, an SMA of a buffer is being used by an external core (i.e. a core from a different core module).
• "MOST RECENT DATA" (corresponding to "being the last used externally (u)" described above). In this state, the data in an SMA of a buffer is deemed the "most recent" data, and this will be the SMA which will be read next when a read instruction is issued for that set of corresponding SMAs.
• "AVAILABLE" (corresponding to "available (a)" described above). This is the initial state of a buffer and in this state, data originating from either a local or external core can be written to this buffer by the core without any further consideration.
As is explained in more detail below, the communication controller is arranged to monitor the states in which the SMAs of each of the three buffers are in at any given time. Particularly, the communication controller includes a buffer state register in which the state of the SMAs in each buffer is stored. The buffer state register is updated whenever the state of one of the SMAs in the buffers changes.
Scheduler
The scheduler component of the RTOS (real time operating system) associates SMAs with tasks, requests the controller to switch to the latest buffer for those areas when the task is about to execute and releases the area when the task is finished, i.e. the whole task is considered a critical section. (As is known in the art, a "critical section" is a section of code that accesses some shared data. It is critical since the shared data ideally must not be altered by any other process during execution).
An overview of a write on one core being propagated to another core can be seen schematically in Figure 7 and is explained in further detail below. Figure 7 provides a schematic diagram indicating a flow of data during a write process. A first core module 71 including a first communication controller 72, writes data via a physical link 73 (i.e. the bus) to a second core module 74 including a second communication controller 75. It will be understood that although not shown in Figure 7, the first and second core modules are structurally the same.
As is known in the art, schedulers such as the scheduler 3a are operable to arrange the processor instructions into tasks. The scheduler schedules which tasks a core is to perform and in what order. As explained above, the scheduler 3a is also responsible for associating tasks with particular SMAs. For example, the scheduler 3a identifies when a first task needs to share data with a second task. The scheduler then identifies which SMAs are allocated for shared variables of the first and second tasks and then associates the tasks with the SMAs accordingly.
So-called SMA descriptions, relating to the SMAs used by tasks being performed on the processing unit, are created upon request by the RTOS and are associated with identifiers decided at compile time; this is typically how SMA identifiers are created. When a description is created, the communication controller spends one or more cycles updating the lookup table that converts addresses to SMA identifiers.
In the example implementation shown in Figure 4, for clarity the first and second schedulers 3a, 3b are shown as two discrete units. However, it will be appreciated that multiple discrete schedulers may be part of the same system-wide entity.
Communication Controller
The communication controller is attached to the same bus as the data memory and the rest of the peripherals (i.e. peripherals of the system), allowing it to be directly controlled by software. It receives messages from other cores and writes them directly into data memory as shown schematically in Figure 8. Figure 8 provides a schematic diagram illustrating that in accordance with examples of the present invention a communication controller 81 of a first core module 82 is typically connected via a bus 83 to a plurality of other core modules 84 in the system.
The communication controller also monitors when a core writes data to an SMA in the buffers. As explained in more detail below, the communication controller is then arranged to communicate a message including an identifier associated with the SMA to which the data has been written, along with the data itself to other core modules in the processing unit.
Correspondingly, the communication controller is also arranged to receive messages from communication controllers in other core modules which indicate that external cores (i.e. cores from other core modules) have written data to an SMA. If this SMA is relevant to the local core, the communication controller is arranged to write this data to the corresponding SMAs in the local buffers.
As can be seen from Figure 4, the communication controllers 2a, 2b are connected via data bus 8.
The communication controllers are also connected to their respective cores 1a, 1b via other data buses. These data buses connecting each core to its communication controller are typically attached to other peripherals of the system, thereby allowing each communication controller 2a, 2b to be directly controlled by software during, for example, an initialisation stage. During this initialisation stage the relevant data in the registers described below can be set, indicating, for example, which SMAs are relevant to a particular core.
The communication controller maintains separate registers (the "description") for each SMA: a globally unique identifier, the address and size of the area, an indication of whether the SMA has been read since the last write, and the state of each buffer (latest data, being written, being read). In other words, the communication controller maintains a buffer state register storing, for each SMA, its globally unique identifier, its address and size, an indication of whether it has been read since the last write, and the state of the SMA in each buffer (corresponding to latest data, being written, being read).
The controller also maintains a lookup table that allows for half-cycle conversions from a memory address to an SMA identifier. "Half cycle conversion" means to fetch an SMA identifier from an address which can be done in half the time taken to execute an instruction. Thus, an instruction that initiates a lookup (by accessing memory) can use the lookup result during its execution.
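A software analogue of this address-to-identifier lookup might look like the following (the SMA addresses and sizes are hypothetical); in hardware the equivalent work completes in half a cycle:

```python
# Illustrative lookup table: maps an address to (SMA identifier, offset).
# In hardware this translation is done in half an instruction cycle.

SMAS = {
    # identifier: (base_address, size) -- hypothetical values
    7: (0x100, 64),
    9: (0x140, 32),
}

def lookup(addr):
    """Return (sma_id, offset_from_origin) if addr falls inside an SMA."""
    for sma_id, (base, size) in SMAS.items():
        if base <= addr < base + size:
            return sma_id, addr - base
    return None  # the address is not shared memory; no message is sent

assert lookup(0x142) == (9, 2)   # inside SMA 9, two bytes from its origin
assert lookup(0x090) is None     # not in any SMA
```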
Example Write and Read Operations
A write operation and a read operation will now be described with reference to Figure 4. For clarity, the core 1a of the first core module 10a is referred to as the "local" core, and the core 1b of the second core module 10b is referred to as the "external" core.
As explained in more detail below, write operations from a core are applied to all buffers attached to that core. This is because the half-cycle required to fetch the correct buffer number combined with the additional half-cycle to actually write the data might cause data hazards in the processor pipeline.
Again, as explained in more detail below, if the address being written to is part of an SMA, then after half a cycle when the address has yielded valid SMA information, a notification message is sent to all cores. The message contains the identifier of the SMA, the offset of the write address from the area's origin and the data that was written. This data is sufficient for the other cores to write the data into their own buffers at the proper location.
Since shared memory areas may be of different sizes even if associated with the same identifier, the communication controller typically ignores write requests from other cores that cross defined memory boundaries.
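A minimal sketch of the notification message and the receiving controller's boundary check might look as follows (the field widths, the 4-byte write size and all names are assumptions made for illustration):

```python
import struct

def encode_write_msg(sma_id, offset, data):
    # <H id> <H offset> <I data> -- hypothetical field widths
    return struct.pack('<HHI', sma_id, offset, data)

def apply_write_msg(msg, local_smas, buffer):
    """Apply a received write message to the local buffer, if relevant."""
    sma_id, offset, data = struct.unpack('<HHI', msg)
    if sma_id not in local_smas:
        return False                 # SMA not relevant to this core
    base, size = local_smas[sma_id]
    if offset + 4 > size:
        return False                 # crosses the local SMA boundary: ignore
    buffer[base + offset:base + offset + 4] = struct.pack('<I', data)
    return True

local = {7: (0x00, 16)}              # this core's view of SMA 7: 16 bytes
buf = bytearray(16)
assert apply_write_msg(encode_write_msg(7, 4, 0xDEADBEEF), local, buf)
assert not apply_write_msg(encode_write_msg(7, 14, 1), local, buf)  # 14+4 > 16
```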
Example Write Operation
Firstly, the scheduler 3a identifies that a task (Task 1) is to be executed on the local core 1a. The scheduler 3a then determines if Task 1 will result in the local core 1a writing to memory and if this write to memory is a write to an SMA.
If Task 1 requires a write operation to an SMA (referred to from this point forward as SMA 1), the scheduler sends a first SMA state switch message to the communication controller 2a. The SMA state switch message includes an identifier of SMA 1.
The communication controller 2a, upon receipt of the SMA state switch message, sends an "external switch" instruction to all other core modules on the data bus 8. The "external switch" instruction includes an identifier identifying SMA 1. As explained below, the "external switch" instruction causes a corresponding SMA in at least one buffer of the other core modules to be changed to the IN EXTERNAL USE state.
Once sent on the common data bus 8, the "external switch" instruction is received by the communication controller 2b of the second core module 10b. The communication controller 2b of the second core module then extracts the identifier of SMA 1 from the "external switch" instruction and identifies the current state of SMA 1 in the buffers 5b, 6b, 7b from the buffer state register. The communication controller 2b then identifies the first buffer in which SMA 1 is in the AVAILABLE state by referring to the buffer state register, and changes the state of SMA 1 in this buffer to the IN EXTERNAL USE state. This change of state is recorded in the buffer state register. This completes the process associated with the "external switch" instruction.
Returning to the local core 1a: once the external switch instruction has been sent by the communication controller 2a of the first core module 10a, the scheduler 3a of the first core module 10a instructs the local core 1a to execute Task 1.
Task 1 executes on the local core 1a, causing the local core 1a to write data to SMA 1. The local core 1a performs the write operation by writing the data identically to SMA 1 in all three of the buffers 5a, 6a, 7a.
As mentioned above, the communication controller 2a is arranged to monitor all local write operations performed by the local core 1a.
Accordingly, the communication controller 2a detects that a write operation has been performed by the local core 1a and specifically the memory address to which data has been written (i.e. the address of SMA 1).
The communication controller 2a then compares the memory address associated with the detected write operation (i.e. the address of SMA 1) with the SMAs listed in the buffer state register.
The communication controller 2a then determines that the address of the detected write operation corresponds to an SMA (i.e. SMA 1), and then transmits a write message on the data bus 8. The write message includes the data that has been written, the SMA identifier identifying the SMA to which data has been written (i.e. the identifier of SMA 1), and an address offset of the data from the start of SMA 1. The communication controller 2b of the second core module 10b receives the write message transmitted on the data bus 8 and extracts the SMA identifier of SMA 1.
The communication controller 2b, using the identifier of SMA 1, identifies the state of SMA 1 in each of the buffers 5b, 6b, 7b from the buffer state register. The communication controller 2b then identifies which of the buffers 5b, 6b, 7b contains the SMA in question (i.e. SMA 1) in the IN EXTERNAL USE state. The communication controller 2b then writes the data that was contained in the write message to SMA 1 in this buffer.
When the execution of Task 1 on the local core 1a has completed, the scheduler of the first core module 10a sends a second SMA state switch message to the communication controller 2a. The second SMA state switch message includes the identifier of the SMA modified by Task 1 (i.e. the identifier of SMA 1).
The second SMA state switch message causes the communication controller 2a to transmit an "external release" instruction on the bus 8 which includes the identifier of the SMA modified by Task 1 (i.e. the identifier of SMA 1).
The "external release" instruction is received by the communication controller 2b of the second core module 10b. On receipt of this message, the communication controller 2b extracts the SMA identifier (i.e. the identifier of SMA 1) and retrieves the state of the corresponding SMAs from the buffer state register (i.e. the state of SMA 1 in each of the buffers 5b, 6b, 7b). The communication controller then identifies which one of the buffers 5b, 6b, 7b contains the SMA 1 that was previously changed to the IN EXTERNAL USE state, and changes it to the MOST RECENT DATA state.
Note that the write process described above assumes only one SMA is modified by Task 1. However, as will be understood, in some examples, the task being executed may result in writes to multiple SMAs.
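The write-side sequence above — external switch before the task runs, broadcast writes while it executes, external release when it completes — can be sketched as follows (the "bus" is modelled as a simple list; all names are illustrative):

```python
# Hedged sketch of the bus traffic generated by a writing task.

def run_writing_task(bus, sma_id, writes):
    bus.append(('EXTERNAL_SWITCH', sma_id))          # before Task 1 runs
    for offset, data in writes:                      # each shared write is
        bus.append(('WRITE', sma_id, offset, data))  # broadcast as it happens
    bus.append(('EXTERNAL_RELEASE', sma_id))         # after Task 1 completes

bus = []
run_writing_task(bus, 1, [(0, 0x11), (4, 0x22)])
assert bus[0] == ('EXTERNAL_SWITCH', 1)
assert bus[-1] == ('EXTERNAL_RELEASE', 1)
```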
Example Read Operation
The read operation described below is a read operation performed by the external core, i.e. core 1b shown in Figure 4.
Firstly, the scheduler 3b of the second core module 10b identifies that a task (Task 2) is to be executed on the external core 1b. The scheduler 3b then determines if Task 2 will result in the external core 1b reading from memory and if this read from memory is a read from an SMA.
If the scheduler 3b of the second core module 10b identifies that Task 2 requires a read operation from an SMA (referred to from this point forward as SMA 2), the scheduler 3b sends a first SMA state switch message to the communication controller 2b. The first SMA state switch message includes an SMA identifier identifying SMA 2.
The communication controller 2b then uses the SMA identifier of SMA 2 to identify the state of SMA 2 in the three buffers 5b, 6b, 7b from the buffer state register.
When a buffer is identified in which SMA 2 is in the MOST RECENT DATA state, the communication controller changes SMA 2 in this buffer to the IN LOCAL USE state.
If there is no buffer with SMA 2 in the MOST RECENT DATA state, then the first buffer with SMA 2 in the AVAILABLE state is changed to the IN LOCAL USE state.
Task 2 is then run on the external core 1b. As a result, Task 2 requests the external core 1b to perform a read operation by reading from SMA 2. As mentioned above, the communication controller 2b is arranged to monitor all local read operations performed by the core to which it is attached. Thus the communication controller 2b detects that a read operation is to be performed by the external core 1b and specifically the memory address (i.e. the address corresponding to SMA 2) from which data is to be read.
On detection of the read operation to be performed by the external core 1b, the communication controller 2b refers to the buffer state register to determine the state of SMA 2 in the three buffers.
The communication controller 2b then controls the buffer switch 4b to ensure that the core 1b reads the data in SMA 2 from the buffer in which SMA 2 is in the IN LOCAL USE state.
When the execution of Task 2 on the external core 1b has completed, the scheduler 3b sends a second SMA state switch message to the communication controller 2b. The second SMA state switch message includes an SMA identifier identifying SMA 2 (i.e. the SMA which Task 2 requested the external core 1b to read).
On receipt of the second SMA state switch message, the communication controller 2b extracts the SMA identifier of SMA 2 and retrieves the state of SMA 2 in each buffer from the buffer state register.
The communication controller 2b then performs a "local switch" instruction by identifying in which of the buffers SMA 2 is in the IN LOCAL USE state and changing this to the AVAILABLE state.
Note that the read process described above assumes only one SMA is read by Task 2. However, as will be understood, in some examples, the task being executed may result in reads from multiple SMAs.
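The read-side state handling above — selecting the MOST RECENT DATA buffer, marking it IN LOCAL USE, then releasing it back to AVAILABLE — can be sketched as follows (state codes follow Figure 6; the function names are illustrative):

```python
# Hedged sketch of the local switch/release used by a reading task.
MOST_RECENT, AVAILABLE, LOCAL = 'u', 'a', 'l'

def local_switch(states):
    """Pick the buffer the reader will use and mark it IN LOCAL USE."""
    i = states.index(MOST_RECENT) if MOST_RECENT in states else states.index(AVAILABLE)
    states[i] = LOCAL
    return i                      # the buffer switch is set to buffer i

def local_release(states):
    """When the task finishes, return the read buffer to AVAILABLE."""
    states[states.index(LOCAL)] = AVAILABLE

states = ['a', 'u', 'a']          # buffer 1 holds the most recent data
assert local_switch(states) == 1  # the reader is routed to buffer 1
local_release(states)
assert states == ['a', 'a', 'a']
```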
Clocking Rate
In the example read and write operations discussed above, the buffer memories are clocked at the same rate as the core, with no caches. As a result, after the core places an address on the bus, valid data is expected in the next clock cycle. Translating from a memory address to a shared memory identifier (to fetch the number of the buffer with the latest data) takes half a cycle; so all buffers fetch data concurrently from the same address and the data is multiplexed once the right buffer is known.
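The concurrent-fetch-then-multiplex behaviour might be modelled as follows (illustrative only; in hardware the three fetches occur in parallel during the same cycle):

```python
# Hedged sketch of the concurrent fetch and late multiplex: all three
# buffers are read at the same address, and the correct value is
# selected once the buffer number is known (after the half-cycle lookup).

def read_with_mux(buffers, addr, selected_buffer):
    candidates = [b[addr] for b in buffers]  # concurrent fetch in hardware
    return candidates[selected_buffer]       # multiplex on the known buffer

bufs = [bytearray([1, 2]), bytearray([3, 4]), bytearray([5, 6])]
assert read_with_mux(bufs, 1, 2) == 6        # address 1 of buffer 2
```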
Communication Buffers and Application Buffers
It should be noted that the buffer memories described above are considered to be "communication" buffers as opposed to "application" buffers. At the software level (e.g. the application level) there may be several application buffers in one communication buffer, each pertaining to different tasks. In the context of the present invention, the application buffers from one task typically form one or more shared memory areas (SMAs) in the three communication buffers.
Switching Behaviour
The switching behaviour is examined in more detail in Figure 9 and Figure 10 where the state of the buffers is shown from the point of view of a Task B. The condition of a local switch having happened since the last external switch is also shown. The start times of a Task A are shown as the time when the external switch request reaches the hardware of the core on which Task B executes, and likewise Task A ends when the external release request is recognised.
In other words, Figures 9 and 10 show how the states of a particular SMA in each of the buffers changes during operation due to the execution of Task A on a first core and Task B on a second core. Figures 9 and 10 show this from the "point of view" of Task B in the sense that the state of the SMA of the buffers connected to the core performing Task B are shown.
Figure 9 schematically illustrates a buffer that switches from the view of Task B when it overlaps with a Task A running at the same rate with a combined utilisation less than one.
In Figure 9, both tasks run at the same rate and overlaps from Figure 2 are chosen. The tasks in (a), (b) and (c) can all be scheduled on single processors; (b) is the sort of timeline that can occur on a single processor. It is interesting to note that (b) only ever uses one buffer, and only (d) uses all three buffers. If the precedence constraint of Task B needing to run after Task A were added to (d) and (e), then they would resemble (c) and would also use only two buffers. A great disadvantage in this system is that initial data is lost if the tasks are given non-zero offsets, due to the extra condition imposed by LSSLE (local switch since last external switch). This behaviour is clearer in Figure 10.
In Figure 10, Task A runs at twice the rate of Task B, with the first execution of Task B taking place after two executions of Task A, so that data is valid. As before, various combinations are taken: either Task B runs before the next execution of Task A or not, either Task B starts before the next plus one execution of Task A or not, either Task B finishes before the next execution of Task A or not and either Task B finishes before the next plus one execution of Task A or not. In all cases, executions of Task A which have not seen an execution of Task B after a prior execution of Task A cause no switches.
Figures 10 (a) to (d) illustrate in schematic form a buffer that switches from the view of task B when it overlaps with a Task A running at twice the rate.
In particular, Figure 10 (a) is another example of a single-processor type system and accordingly one buffer is sufficient. Figures 10 (b), (c) and (d) exhibit the long-task problem. However, depending on the data structure used by the application buffers, data losses may occur in (c), (d) and (e) and may be sustained; (c), (d) and (e) could avoid data losses with proper scheduling, but will recover in the next tick (not shown). The example in Figure 10 can be expanded to higher frequency rate mismatches as well. As long as the application buffers data appropriately and the reading task executes (on another core) after the last execution of a writer in a batch but before the first execution of a writer in the next batch, then the reader can execute concurrently until the start of the next plus one batch without any data losses or any incoherence.
Figure 11 provides a schematic diagram of a flow chart illustrating a process according to an example of the invention. The process can be implemented on a multi-core system comprising a plurality of processors and a plurality of buffers, each buffer comprising memory divided into a plurality of shared memory areas such that each buffer comprises a corresponding version of each of the plurality of shared memory areas. For all of the buffers in the system, for the corresponding version of each same shared memory area of each buffer, a maximum of one shared memory area can be written to at any one time, whilst for all of the buffers associated with each individual processor, for the corresponding version of each same shared memory area, a maximum of one shared memory area can be read from at any one time.
Step S101 comprises identifying a first write operation scheduled to be performed by a first processor of the plurality of processors to a first shared memory area in at least a first buffer, said first buffer being coupled to said first processor.
In response, Step S102 comprises identifying a second shared memory area corresponding to the first shared memory area in at least a second buffer, said second buffer being coupled to a second processor, wherein the second shared memory area is in a first state where it is available to receive data.
Step S103 comprises writing data associated with the first write operation to the first shared memory area in at least the first buffer.
In response thereto, Step S104 comprises changing the state associated with the second shared memory area of said second buffer to indicate that the data in the second shared memory area is the most recent data written to the second shared memory area.
Step S105 comprises writing said data to the second shared memory area of the second buffer, to thereby ensure each corresponding shared memory area contains the most recent version of data.
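The steps of the flow chart can be sketched end-to-end for two cores sharing one SMA (a hedged software model; the three-buffer choice, names and byte-wise write are illustrative, not a definitive implementation):

```python
# Hedged sketch of the flow-chart process for one shared memory area:
# the writer writes to all of its own buffers, and the data is copied
# into an available reader-side buffer, which is then marked as holding
# the most recent data.

def propagate_write(writer_buffers, reader_buffers, reader_states, offset, value):
    # Write identified and performed: the writer writes identically
    # to all of its own buffers.
    for b in writer_buffers:
        b[offset] = value
    # A reader-side buffer in the AVAILABLE ('a') state receives the data.
    i = reader_states.index('a')
    reader_buffers[i][offset] = value     # data written across
    reader_states[i] = 'u'                # marked as most recent data
    return i

wb = [bytearray(8) for _ in range(3)]     # writer's buffer group
rb = [bytearray(8) for _ in range(3)]     # reader's buffer group
rs = ['a', 'a', 'a']                      # reader-side SMA states
i = propagate_write(wb, rb, rs, 3, 0x5A)
assert rb[i][3] == 0x5A and rs[i] == 'u'
```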
The skilled person will envisage further embodiments within the scope of the appended claims.

Claims

1. A method of communicating data between a plurality of processors in a multi-core system via a plurality of buffers, each buffer comprising memory divided into a plurality of shared memory areas such that each buffer comprises a corresponding version of each of the plurality of shared memory areas, and
for all of the buffers in the system, for the corresponding version of each same shared memory area of each buffer, a maximum of one shared memory area can be written to at any one time, whilst
for all of the buffers associated with each individual processor, for the corresponding version of each same shared memory area a maximum of one shared memory area can be read from at any one time; said method comprising:
identifying a first write operation scheduled to be performed by a first processor of the plurality of processors to a first shared memory area in at least a first buffer, said first buffer being coupled to said first processor, and in response
identifying a second shared memory area corresponding to the first shared memory area in at least a second buffer, said second buffer being coupled to a second processor, wherein the second shared memory area is in a first state where it is available to receive data;
writing data associated with the first write operation to the first shared memory area in at least the first buffer, and in response thereto changing the state associated with second shared memory area of said second buffer to indicate the data in the second shared memory area is most recent data written to the second shared memory area, and
writing said data to the second shared memory area of the second buffer, to thereby ensure each corresponding shared memory area contains the most recent version of data.
2. A method according to claim 1, further comprising reading data from the second shared memory area of the second buffer by
identifying a first read operation scheduled to be performed by the second processor from the second shared memory area;
identifying that the second shared memory area of the second buffer is in the state indicating the data stored therein is most recent data; and
performing the first read operation by reading the data from the second memory area of the second buffer.
3. A method according to claim 2, wherein after identifying that the second shared memory area of the second buffer is in the state indicating the data stored therein is most recent data, changing the state of the second shared memory area of the second buffer to indicate the second shared memory area of the second buffer is in use by the second processor.
4. A method according to any previous claim, wherein identifying the first write operation comprises identifying the first write operation by a scheduler coupled to the first processor.
5. A method according to any previous claim wherein the plurality of buffers are clocked at a same rate as the plurality of processors.
6. A method according to any previous claim, wherein the plurality of buffers are arranged into a plurality of buffer groups, each processor being coupled to buffers of one buffer group, and each buffer group comprises three buffers.
7. A method according to any previous claim, wherein read operations performed by the plurality of processors and write operations performed by the plurality of processors are performed at different rates.
8. A multi-core system comprising a plurality of processors and a plurality of buffers, each buffer comprising memory divided into a plurality of shared memory areas such that each buffer comprises a corresponding version of each of the plurality of shared memory areas, the system being arranged such that for all of the buffers in the system, for the corresponding version of each same shared memory area of each buffer, a maximum of one shared memory area can be written to at any one time, whilst for all of the buffers associated with each individual processor, for the corresponding version of each same shared memory area a maximum of one shared memory area can be read from at any one time; said system further comprising:
a first scheduler coupled to a first processor of the plurality of processors arranged to identify a first write operation scheduled to be performed by the first processor of the plurality of processors to a first shared memory area in at least a first buffer, said first buffer being coupled to said first processor,
a communication controller coupled to a second processor arranged in response to the identification of the first write operation to identify a second shared memory area corresponding to the first shared memory area in at least a second buffer, said second buffer being coupled to the second processor, wherein the second shared memory area is in a first state where it is available to receive data; wherein
said communication controller is arranged, in response to a writing of data associated with the first write operation to the first shared memory area in at least the first buffer, to change the state associated with the second shared memory area of said second buffer to indicate the data in the second shared memory area is most recent data written to the second shared memory area, and to write said data to the second shared memory area of the second buffer, thereby ensuring each corresponding shared memory area contains the most recent version of data.
9. A system according to claim 8, wherein the second processor is coupled to a second scheduler, said second scheduler arranged to identify a first read operation scheduled to be performed by the second processor from the second shared memory area; and in response the communication controller is arranged to identify that the second shared memory area of the second buffer is in the state indicating the data stored therein is most recent data; and in response the second processor is arranged to perform the first read operation by reading the data from the second shared memory area of the second buffer.
10. A system according to claim 9, wherein after identifying that the second shared memory area of the second buffer is in the state indicating the data stored therein is most recent data, the communication controller is arranged to change the state associated with the second shared memory area of the second buffer to indicate that the second shared memory area of the second buffer is in use by the second processor.
11. A system according to any of claims 8 to 10, wherein the plurality of buffers are clocked at a same rate as the plurality of processors.
12. A system according to any of claims 8 to 11, wherein the plurality of buffers are arranged into a plurality of buffer groups, each processor being coupled to buffers of one buffer group, and each buffer group comprises three buffers.
13. A system according to any of claims 8 to 12, wherein read operations performed by the plurality of processors and write operations performed by the plurality of processors are performed at different rates.
14. A scheduler for use in a system of the type defined in any of claims 8 to 13.
15. A communication controller for use in a system of the type defined in any of claims 8 to 13.
16. A product including a system of the type defined in any of claims 8 to 13.
17. A method, system, scheduler, communication controller or product as generally hereinbefore described with reference to and/or illustrated in Figures 3 to 11 of the accompanying drawings.
PCT/GB2011/052303 2010-11-24 2011-11-23 Method and arrangement for a multi-core system WO2012069831A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB1019895.0 2010-11-24
GB1019890.1 2010-11-24
GBGB1019895.0A GB201019895D0 (en) 2010-11-24 2010-11-24 Identifying the end of a task efficiently
GBGB1019890.1A GB201019890D0 (en) 2010-11-24 2010-11-24 Asynchronous and transparent three-buffer communication framework for distributed memory multi-cores

Publications (1)

Publication Number Publication Date
WO2012069831A1 true WO2012069831A1 (en) 2012-05-31

Family

ID=45478354

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/GB2011/052302 WO2012069830A1 (en) 2010-11-24 2011-11-23 A method and system for identifying the end of a task and for notifying a hardware scheduler thereof
PCT/GB2011/052303 WO2012069831A1 (en) 2010-11-24 2011-11-23 Method and arrangement for a multi-core system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/GB2011/052302 WO2012069830A1 (en) 2010-11-24 2011-11-23 A method and system for identifying the end of a task and for notifying a hardware scheduler thereof

Country Status (1)

Country Link
WO (2) WO2012069830A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984384B (en) * 2020-08-24 2024-01-05 北京思特奇信息技术股份有限公司 Daemon and timing type job coexistence scheduling mechanism method and related device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6212542B1 (en) * 1996-12-16 2001-04-03 International Business Machines Corporation Method and system for executing a program within a multiscalar processor by processing linked thread descriptors
FR2920557A1 (en) * 2007-12-21 2009-03-06 Thomson Licensing Sas Processor for CPU, has hardware sequencer managing running of tasks and providing instruction for giving control to sequencer at end of tasks, where instruction sets program with base address relative to next task to program counter

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030084269A1 (en) * 2001-06-12 2003-05-01 Drysdale Tracy Garrett Method and apparatus for communicating between processing entities in a multi-processor
EP1956484A1 (en) * 2007-02-07 2008-08-13 Robert Bosch Gmbh Administration module, producer and consumer processor, arrangement thereof and method for inter-processor communication via a shared memory

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN J ET AL: "A Three-Slot Asynchronous Reader/Writer Mechanism for Multiprocessor Real-Time Systems", INTERNET CITATION, 16 January 1997 (1997-01-16), XP002448052, Retrieved from the Internet <URL:http://citeseer.ist.psu.edu/cache/papers/cs/1425/ftp:zSzzSzftp.cs.york.ac.ukzSzreportszSzYCS-97-286.pdf/a-three-slot-asynchronous.pdf> [retrieved on 20070101] *
HYEONJOONG CHO ET AL: "A Space-Optimal Wait-Free Real-Time Synchronization Protocol", REAL-TIME SYSTEMS, 2005. (ECRTS 2005). PROCEEDINGS. 17TH EUROMICRO CONFERENCE ON PALMA DE MALLORCA, BALEARIC ISLANDS, SPAIN 06-08 JULY 2005, PISCATAWAY, NJ, USA, IEEE, 6 July 2005 (2005-07-06), pages 79 - 88, XP010835768, ISBN: 978-0-7695-2400-9 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111432899A (en) * 2017-09-19 2020-07-17 BAE Systems Controls Inc. System and method for managing multi-core access to shared ports
US11397560B2 (en) 2017-09-19 2022-07-26 Bae Systems Controls Inc. System and method for managing multi-core accesses to shared ports
CN108829631A (en) * 2018-04-27 2018-11-16 江苏华存电子科技有限公司 An information management method for improving a multi-core processor
CN111796948A (en) * 2020-07-02 2020-10-20 长视科技股份有限公司 Shared memory access method and device, computer equipment and storage medium
CN111796948B (en) * 2020-07-02 2021-11-26 长视科技股份有限公司 Shared memory access method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2012069830A1 (en) 2012-05-31

Similar Documents

Publication Publication Date Title
US9864627B2 (en) Power saving operating system for virtual environment
CN100524223C (en) Method and system for concurrent handler execution in a SMI and PMI-based dispatch-execution framework
US6944850B2 (en) Hop method for stepping parallel hardware threads
EP1856612B1 (en) Method for counting instructions for logging and replay of a deterministic sequence of events
EP1842132B1 (en) Method for optimising the logging and replay of multi-task applications in a mono-processor or multi-processor computer system
JP5295228B2 (en) System including a plurality of processors and method of operating the same
CN109997113B (en) Method and device for data processing
EP1927049A1 (en) Real-time threading service for partitioned multiprocessor systems
CN102375761A (en) Business management method, device and equipment
JP2000029737A (en) Processor having real-time outer instruction insertion for debugging functions
US7565659B2 (en) Light weight context switching
WO2012069831A1 (en) Method and arrangement for a multi-core system
JPH1021094A (en) Real-time control system
US9652299B2 (en) Controlling the state of a process between a running and a stopped state by comparing identification information sent prior to execution
US8732441B2 (en) Multiprocessing system
US20030014558A1 (en) Batch interrupts handling device, virtual shared memory and multiple concurrent processing device
US20080077925A1 (en) Fault Tolerant System for Execution of Parallel Jobs
Vaas et al. Taming Non-Deterministic Low-Level I/O: Predictable Multi-Core Real-Time Systems by SoC Co-Design
Walls Embedded RTOS Design: Insights and Implementation
CN112214277A (en) Operating system partitioning method, device and medium based on virtual machine
US20160034291A1 (en) System on a chip and method for a controller supported virtual machine monitor
Bulusu Asymmetric multiprocessing real time operating system on multicore platforms
KR20190118521A (en) Method and device for error handling in a communication between distributed software components
CN102073551B (en) Self-reset microprocessor and method thereof
JP2018049406A (en) Task coordination device among plural processors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11808271

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11808271

Country of ref document: EP

Kind code of ref document: A1