WO2004044733A2 - State engine for data processor - Google Patents

State engine for data processor

Info

Publication number
WO2004044733A2
WO2004044733A2 (PCT/GB2003/004867)
Authority
WO
WIPO (PCT)
Prior art keywords
state
parallel processor
processor
parallel
data
Prior art date
Application number
PCT/GB2003/004867
Other languages
French (fr)
Other versions
WO2004044733A3 (en)
Inventor
Anthony Spencer
Original Assignee
Clearspeed Technology Plc
Priority date
Filing date
Publication date
Application filed by Clearspeed Technology Plc filed Critical Clearspeed Technology Plc
Priority to AU2003283545A priority Critical patent/AU2003283545A1/en
Priority to GB0509997A priority patent/GB2411271B/en
Priority to US10/534,430 priority patent/US7882312B2/en
Publication of WO2004044733A2 publication Critical patent/WO2004044733A2/en
Publication of WO2004044733A3 publication Critical patent/WO2004044733A3/en

Links

Classifications

    • H04L 47/32: Flow control; congestion control by discarding or delaying data units, e.g. packets or frames
    • H04L 49/90: Packet switching elements; buffering arrangements
    • G06F 15/80: Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data (SIMD) processors
    • G06F 9/466: Multiprogramming arrangements; transaction processing
    • H04L 47/2441: Traffic characterised by specific attributes, e.g. priority or QoS, relying on flow classification, e.g. using integrated services [IntServ]
    • H04L 47/50: Queue scheduling
    • H04L 47/562: Delay-aware queue scheduling; attaching a time tag to queues
    • H04L 47/60: Queue scheduling implementing hierarchical scheduling
    • H04L 47/6215: Queue scheduling criteria; individual queue per QoS, rate or priority
    • H04L 47/624: Queue scheduling criteria; altering the ordering of packets in an individual queue
    • H04L 49/9042: Buffering arrangements; separate storage for different parts of the packet, e.g. header and payload

Abstract

Coherent accesses and updates to state shared by parallel processors, such as SIMD array processors, are made possible by the use of state elements having local memory storing the state and permitting serialisation of accesses. Operations on single or multiple items of state are performed by a fixed/hardwired set of operations, but they can be made programmable by sending a command and data to control the operations. Individual state elements comprise the local memory, an arithmetic unit, and command and control logic. Multiple state elements are pipelined in state cells which can, in turn, be organised into state arrays and state engines effecting complete control over shared state access. A read/modify/write operation can be performed in only two cycles and a complete command in only three to five cycles.

Description

STATE ENGINE FOR DATA PROCESSOR

Field of the Invention
The present invention relates to State Engines for use in data processors, especially parallel processors.

Background to the Invention
Situations often arise whereby functions must be performed on a continuous stream of data. If the functions are implemented in software on a processor, then each datagram (packet of data) which arrives in sequence from the stream must be stored, processed and then forwarded. This process will take some finite quantity of time to execute. As the rate of packet arrival increases there will come a point at which a single processor can no longer keep up. The function must then either be distributed across multiple processors arranged in a pipeline, or across multiple processors arranged in parallel - each receiving a packet from the stream in turn in some round robin sequence. Packets output from parallel processors are typically reordered before forwarding.
This is a well proven approach to high performance packet processing, but is limited in its scalability as the number of processors increases. Access to shared memories, be it for code or data, eventually becomes a bottleneck. Simultaneous R/W access to shared state will further add to the complexity of system control signalling in order to resolve contention.
This leaves the issue of high speed access to multiple items of shared state information by multiple parallel processors. As the number of processors and the complexity of their algorithms increases, address and data bandwidth requirement over the system bus to the shared data will also increase. This can then become a bottleneck. The State Element technology described later in this specification supports parallel processing systems by localising and managing serialisation to shared state.
A good case in point is the challenge of Traffic Management in network routers. A significant, recognised issue in per-flow Traffic Handling is that a number of items of state need to be maintained for each of a large number of queues. The implications of this are that: (a) a considerable volume of shared memory needs to be implemented; (b) a lot of memory address bandwidth is required if each queue requires separate accesses to be made to different (shared) state variables; and (c) the memory access latency is likely to be long, thus causing state blocking during modification to impact on performance.
Contention for shared state variables can be resolved by implementing state elements as described later. However, the state element concept in high performance systems is not a solution in itself. For maximum throughput and flexibility, a number of state elements are combined in a state engine. This allows multiple concurrent accesses to the shared state. The present invention aims to overcome the following problems:
1. Processors in parallel can create a high rate of access to the same item of state.
2. What happens if a given function needs to access multiple variables from the same address, ie needs to access and process a state record?
3. What if multiple functions executing in a processor on a single datagram each requires access to different, independently addressable tables of state variables or records?
In short, the fundamental problem being addressed is that of a high rate of state access. This problem must be solved in a flexible way which enables the easy scaling of both the quantity of state being stored and the rate of state access.

Summary of the Invention
The present invention provides, in one aspect, a parallel processor comprising state element means providing coherent parallel accesses to shared state.
The parallel processor is preferably an array processor, such as a SIMD processor. The parallel processor may further comprise means to serialise and/or synchronise multiple accesses/updates to said shared state.
The said state may comprise a single item of state or multiple items of state and may comprise a single storage location or a data structure in storage.
Operations on said state may be carried out as a fixed or hardwired set of operations. Further means may supply data to update said state. Means may also send a command and data to said state, whereby said operations are programmable.
A plurality of said state element means may be organised into state cell means, whereby operations on said state can be pipelined. There may be a plurality of said state cell means, whereby to allow multiple requests in relation to said state to be handled concurrently.
The state cell means may also include input and output interconnect means to provide access to and from said state cell means, a bus interface for said input and output interconnect means, said bus interface interfacing with a system bus, and a control unit interconnected with said system bus for controlling accesses to said state.
Each said state element means preferably comprises local memory, and each field of a data record is stored in a respective memory of a respective state element means.
Each said state element means preferably comprises a local memory for said state, an arithmetic unit adapted to perform an operation on said state in said local memory, and command and control logic to control said operation.
The invention also contemplates a computer system and a network processor incorporating a parallel processor as specified in any of the above statements.
The processor may be provided on a single silicon chip.

Brief Description of the Drawings
The invention will now be described with reference to the drawings, in which:
Figure 1 is a schematic diagram of a State Engine using State Elements, all in accordance with the invention;
Figure 2 is a functional representation of the State Engine;
Figure 3 is an implementation of a State Cell forming part of the State Engine;
Figure 4 is a specific implementation of the generic State Engine as a complex state engine designed for Traffic Handling and queue management;
Figure 5 illustrates the prior art method of accessing shared state contrasted with the benefits of using State Elements as embodied in the present invention;
Figure 6 is a functional overview of a State Element in accordance with the invention;
Figure 7 is an implementation overview of a State Element; and
Figures 8 and 9 show respective examples of preferred implementations of the state and command units of a State Element.

Detailed Description of the Illustrated Embodiments
A particular design of State Element will now be described with reference to Figures 5 to 9. However, other designs are possible for inclusion in the State Engine.
A problem arises when pipelined or parallel processors share state variables for which both read and write access is required. Processors can not be permitted to simultaneously read/modify/writeback a shared variable since the result from the first writeback will be overwritten by the second. It is necessary to serialise the accesses. This raises two significant issues:
1. A system for interlocking processors together must be implemented so that they may arbitrate for a resource and then lock it when there is contention. This control signalling can be complex and add significant functional and performance overhead.
2. When a processor has successfully negotiated for a resource, it should use that resource and then release it as soon as possible to limit the delay imposed on other processors. If access latencies are long to external memories, this can impact heavily on system performance.
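To make the two issues above concrete, the following minimal C sketch (the thread functions and the queue_len variable are invented for illustration) shows why two processors cannot be permitted to read/modify/writeback the same state variable concurrently: one writeback silently overwrites the other, and the conventional remedy is to interlock the whole sequence with a lock, accepting exactly the lock-out overhead and latency described here.

    #include <pthread.h>
    #include <stdio.h>

    static long queue_len = 0;                 /* shared state variable       */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Unsafe: read/modify/writeback with no serialisation - updates are lost. */
    void *enqueue_unsafe(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            long tmp = queue_len;              /* read                        */
            queue_len = tmp + 1;               /* modify + writeback          */
        }
        return NULL;
    }

    /* Conventional fix: interlock the whole sequence, paying lock-out latency. */
    void *enqueue_locked(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            queue_len++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, enqueue_unsafe, NULL);
        pthread_create(&b, NULL, enqueue_unsafe, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* Usually prints less than 200000 because writebacks were overwritten. */
        printf("queue_len = %ld\n", queue_len);
        return 0;
    }
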
Semaphores can be used to interlock processors, or control logic and caches can be used to intercept concurrent accesses and serialise them. However, these can be complex, slow and/or require significant support tied into hardware. Embedded memory can reduce lock-out time, but delays can still be significant. The State Elements described in this specification adopt a different approach by which it is accepted that there is a serialisation point and a method is established to manage it rather than create an interlock. In broad terms, as will become clearer later, a function is co-located with local memory, the function performing the read, modification and write back in the memory rather than in the software. It is a physical solution rather than a software-based solution.
A disadvantage of this solution is that it "handcuffs" the software. Something that was previously programmable is now replaced by something else that blocks the software. These two aspects are interrelated. The first problem is the serialisation of the software, but this has the consequence that, if the solution is by way of hardware, instead of having many tens of cycles of read/modify/writeback turn-round time and complex control signals in the system, the turn-round is reduced to within a couple of cycles, but the entity that does the actual modification is now hardware. As well as removing the latency problem, this solution also addresses the other problem it introduces, that of potentially handcuffing the software.
The ingenuity in this aspect of the present solution lies in the combination of relocation of the logic close to the memory and making it semi-programmable.
State Elements are the key components in the present context which perform the serialisation of accesses into a shared memory. In the context of parallel processors, where simultaneous access to shared state is increasingly likely, there are potentially many state elements all operating in parallel, executing function calls from the parallel processors. However, instead of it all being SIMD, where the parallel processors operate from a single instruction stream, the state elements operate in parallel but from individual instruction streams. They effectively operate in response to requests from processors.
The State Elements could be utilised in a MIMD architecture or, indeed, anywhere that there is a conflict to resolve. They are particularly applicable to SIMD architecture, however, because MIMD is more tolerant of indeterminism in memory access whereas SIMD prefers everything to be deterministic.
An advantage of this aspect of the invention is a reduction in the burden on the system bus. Normally, a function call is issued over the bus to instruct the state element to perform a function. A command is issued and the command results in a read request, data to be returned and modified data to be written back again. In the State Elements, instead of having three accesses across the system bus, there is now only one. Quite complex operations can thus be performed remotely, without having to keep sending information back to the processor.
In preferred implementations, the command line allows commands to be issued to access and modify a piece of memory or deposit micro-code in the state element. The state element therefore consists of a basic memory plus an ALU, a controller unit to which the micro-code is written, and special function units, such as addition units. Part of the design philosophy is to enable the element to become part of Applicant's toolkit, where required functions can be "bolted" on as necessary.
There is therefore flexibility on two levels. On the one hand the microcode in the memory can be changed so that, instead of performing operations like read/modify by adding a fixed operation that is passed to the command line to write it back, it is possible, with another piece of software, to read, add a value, hold, add another value and write back, for example. If a conditional read/modify/write back is desired, a condition block can be added on. If a history function is wanted, a history block, where sets of flags are maintained, can be provided. Thus, a control flag can be maintained, enabling future access, where one or other of two operations may be performed, based on that flag.
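A rough C sketch of this two-level flexibility is given below; the operation codes, field names and the small dispatch table are all invented for illustration. The fixed part is the local read/modify/writeback skeleton next to the memory, while the programmable part is the set of operations it can be asked to perform, here including a conditional update and a flag-based choice of the kind a history block would support.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical operation codes a processor could select or deposit. */
    enum op { OP_READ, OP_ADD, OP_ADD_IF_BELOW, OP_ADD_OR_SUB_BY_FLAG };

    struct state_entry {
        uint32_t value;
        bool     flag;        /* "history" flag maintained across commands */
    };

    /* The fixed read/modify/writeback skeleton; only the modification step
     * varies with the selected operation.                                 */
    static uint32_t execute(struct state_entry *mem, uint32_t addr,
                            enum op code, uint32_t operand, uint32_t limit)
    {
        struct state_entry e = mem[addr];              /* read              */
        switch (code) {                                /* modify            */
        case OP_READ:           break;
        case OP_ADD:            e.value += operand; break;
        case OP_ADD_IF_BELOW:   if (e.value < limit) e.value += operand; break;
        case OP_ADD_OR_SUB_BY_FLAG:
            e.value = e.flag ? e.value - operand : e.value + operand;
            e.flag  = !e.flag;                         /* update history    */
            break;
        }
        mem[addr] = e;                                 /* writeback         */
        return e.value;                                /* result returned   */
    }

    int main(void)
    {
        struct state_entry table[4] = {{0, false}};
        execute(table, 2, OP_ADD, 5, 0);
        return (int)execute(table, 2, OP_ADD_IF_BELOW, 1, 10);   /* 6 */
    }
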
State elements are the key components which perform the serialisation of accesses into a shared memory. The state elements are combined in state engines and connected to a bus. The state element can be likened to a miniature, micro-coded ALU, but the emphasis is on memory access rather than on the processing side. Primarily, the state element comprises memory with an attached flexible function, with the emphasis on rapid transfer of data in and out of the memory; the function it performs on the memory is flexible.
In systems such as data packet queue controlling systems using a single processor, there is no contention when the processor seeks data from memory. The state of the packet queues is available to a single processor. A single processor can hold up to 10,000 such queues. However, in contrast, consider the situation where a plurality of processors share access to certain states in memory. At any instant, more than one processor may need to update the state of the same one queue. There is therefore greater potential for contention. If contention were to be avoided by replicating the processors, there would be greater complexity, especially as regards taking measures to preserve coherence of state across the processors and the amount of storage needed. There is therefore a demand for the states to be held in memory that is available to all processors but in such a way that the whole process is not slowed down to an unacceptable extent.
Figure 5a illustrates schematically the type of problem that the present invention can overcome. The Figure represents a time line of a process involving conventional memory accessed via an on-chip bus in response to a request from one or other of two processors. There is assumed to be inter-processor serialisation. If processor 1, for example, issues a read request, the addressed data is read from memory and the data carried by the bus to the processor. The data is then modified in the processor and the result written back into memory via the bus. During this time, any other requests, for example from the second processor, are locked out. This is essential since the same data cannot be undergoing modification under the control of different processors at the same time. At the end of the lock-out period, the memory once again becomes available to the next request. It is obvious that the lock-out period imposes considerable constraints on the speed of the overall process executed by the individual processors.
The State Elements provide an alternative to this known approach, which requires an arrangement in which parallel processors read data from memory, modify it and write it back; with State Elements, a processor instead requests the memory to perform the modification on its behalf. The State Element in the preferred implementation of the invention thus positions the serialisation point, not within/between each processor, but in a simple shared processor which has local and rapid access to the memory in which the shared state variables are stored.
The state element is analogous to an object in Object Oriented Design. It has privately stored data which is accessible only via the object's methods. By issuing commands, parallel processors could be considered to be making method calls to the object.
The preferred embodiment of the state element comprises a small block of embedded memory 60 with single cycle read/write access time combined with a simple arithmetic and logic unit, as shown generically in Figure 6. The Arithmetic Unit 61 receives commands (from processors) which comprise an address, data and a command code. The address identifies the state variable which is to be accessed, the data provides operands which a simple computer uses to modify the variable, and the command 62 selects a locally stored thread of programmed microcode 63 which is able to read, modify and writeback the state variable within a very small number of system clock cycles. The result can be returned to the processor that issued the command.

Details of the embodiment
A state element comprises an embedded memory and an attached function. The function could either be hardwired (a finite state machine) or a programmable, microcoded circuit. The latter approach is the more versatile and complex. A more complete picture of the system of component modules and their interconnection is shown in Figure 7. Note the presence of special function and condition blocks. These greatly extend the functional capability of the element.
The emphasis in state element design is on the rapid memory access speed, not the processing capability. Embedded memory blocks are small enough that single cycle access time is achievable. Configurable read/modify/write back (R/M/W) is possible within a two cycle period as it is possible to perform a simple arithmetic operation on the result of a read and have it turned around for writeback within the second cycle. Typically, a command could be fully processed within 3 to 5 clock cycles.
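As an illustration of the command-driven read/modify/writeback described above, the following C sketch models a state element functionally; the structure and field names are invented, and the three command codes merely stand in for locally stored microcode threads. Only the command crosses the system bus, and the result is returned in the command line's data field.

    #include <stdint.h>

    #define ENTRIES 256            /* small embedded memory, single-cycle access */

    /* Command as described above: address, data and a command code.
     * Field names are invented for the sketch.                                  */
    struct command_line {
        uint16_t addr;             /* which state variable                       */
        uint32_t data;             /* operand supplied by the processor          */
        uint8_t  code;             /* selects a locally stored microcode thread  */
    };

    struct state_element {
        uint32_t mem[ENTRIES];     /* embedded memory holding the state          */
    };

    /* The whole read/modify/writeback happens next to the memory; the result
     * travels back to the requester in the command line's data field.           */
    static void state_element_execute(struct state_element *se,
                                      struct command_line *cmd)
    {
        uint32_t v = se->mem[cmd->addr];       /* read                           */
        switch (cmd->code) {
        case 0:  /* plain read, no modification */          break;
        case 1:  v += cmd->data;   /* e.g. add to a counter */ break;
        case 2:  v  = cmd->data;   /* overwrite             */ break;
        }
        se->mem[cmd->addr] = v;                /* writeback                      */
        cmd->data = v;                         /* return result to requester     */
    }

    int main(void)
    {
        struct state_element se = {{0}};
        struct command_line cmd = { .addr = 7, .data = 3, .code = 1 };
        state_element_execute(&se, &cmd);      /* cmd.data now holds 3           */
        return 0;
    }
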
Figure 8 illustrates the simplicity of the arithmetic unit, and how the path between the command line and the memory has minimal delay. Figure 9 shows a more complex variant in which multiple items of state are held in memory. The impact on the command line turnaround (and microcode store size) is significant. However, this is not to say that the Figure 9 circuit could not be used in appropriate circumstances. For example, in a lower performance system with a more complex set of state it could be the preferred approach.

General overview of the State Engine
A state engine can be built up in a structured and well defined manner using a state element as an atomic part. Just as atoms are the components of molecules, which may be the building blocks of simple cells, which then combine into simple organisms, state elements can be combined into state cells, which are multiplied into state arrays, which in turn may be grouped together to form state engines. Although specific state element designs are described later in this context, the present invention encompasses state engines using other state elements.
This hierarchical design framework is illustrated in Figure 1. The component parts shown are:
State Record - This is a conceptual entity and consists of a group of one or more state variables which have a given base address.
Command line - A message sent by a processor to the state engine. Fields in the command line include command code, address and data. The processor is effectively requesting that the function indexed by the command code be performed on the state record at the given address. Parameters can be both supplied and returned in the general purpose data field.
State Element - A state element is a small, private memory which contains state variables accessible only via functions executed by the state element's control logic. Functions typically read a state variable, perform some modification and write a new value back. The result may also be recorded in a data field in the command line. The primary role of the state element is to manage the state access serialisation point by executing a simple function on memory at maximum speed. A specific implementation of a state element has already been described in this specification.

State Cell - If there is more than one state variable in a record, it is permissible for the entire record to be stored as an entry within a single state element. However, as each field in the record would need to be processed in turn this would throttle the available bandwidth to the state. In the State Cell each field of the record is stored as a single state variable in its own State Element. These State Elements are then chained together in a pipeline. The command line passes from one Element to the next, the same address and control word being used at each stage to pick a different field from a common record and perform some function on it. State cell logic provides synchronisation between its constituent Elements which effectively make up a memory oriented pipelined processing system.
The primary role of the State Cell is thus to provide a means of constructing simple, pipelined processors which enable more complex state records to be handled at high speeds.
State Array - The embedded memory used in the State Elements of State Cells must be relatively small in volume for rapid (ideally single cycle) access. This places a limit on the number of instances of a state record which may be stored in a single State Cell. To increase the quantity of state, State Cells of a given type can be tiled to form a large State Array. Scaling during device layout is simplified by the State Array interconnect. The segmentation of an interconnection framework and the coupling of adjacent Cells in a tiled array using well defined interfaces is shown in Figure 2. The interconnect preserves order between accesses to the same State Cells. Since order preservation amongst command lines accessing different State Cells is not required, there is no need for the latency of command line accesses to different Cells across the array to be balanced. The Array is scalable in a simple way and is layout-friendly.
Increasing the total state storage volume by multiplying State Cells can also increase overall state access bandwidth as the throughput of an individual State Cell is likely to be a little lower than that of the interconnect. If the number of State Cells is increased to the point that the interconnect becomes the limiting factor then aggregate throughput can be further increased by providing multiple interconnect channels, each channel accessing a different portion of the array (ie. table). This is analogous to designing a memory system with multiple, independently addressable channels to increase random access bandwidth.
The primary role of the state array is to provide scalable capacity. It also provides a means for scaling address and data bandwidth.
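The following C sketch, with an entirely invented geometry of channels, cells and entries, illustrates how a flat state-record address might be decomposed so that each interconnect channel serves its own portion of the array and accesses to a given State Cell always take the same path, preserving per-cell order as described above.

    #include <stdint.h>

    /* Invented geometry: 2 interconnect channels, 16 cells per channel,
     * 256 record entries per cell.                                       */
    #define CHANNEL_BITS 1
    #define CELL_BITS    4
    #define ENTRY_BITS   8

    struct array_location {
        unsigned channel;   /* independently addressable interconnect lane */
        unsigned cell;      /* which tiled State Cell                      */
        unsigned entry;     /* which state record within that cell         */
    };

    /* Decompose a flat state-record address into channel/cell/entry.      */
    static struct array_location route(uint32_t addr)
    {
        struct array_location loc;
        loc.entry   =  addr                              & ((1u << ENTRY_BITS)   - 1);
        loc.cell    = (addr >> ENTRY_BITS)               & ((1u << CELL_BITS)    - 1);
        loc.channel = (addr >> (ENTRY_BITS + CELL_BITS)) & ((1u << CHANNEL_BITS) - 1);
        return loc;
    }

    int main(void)
    {
        struct array_location l = route(0x1A3u);  /* entry 0xA3, cell 1, channel 0 */
        return (int)l.cell;
    }
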
State Engine - The State Engine combines State Arrays with all the additional glue logic and facilities that are required to construct a block which can be configured and accessed via a system bus. Components include:
• Bus interface logic
• System control logic - The state engine controller may issue (private) system commands to the state arrays. These commands are invoked by external blocks through accesses to the controller via the utility bus interface. Only (public) state commands may arrive via the main data flow interfaces. System commands configure the arrays or extract diagnostic information.
• Bypass logic - Bypass modes enable commands to skip arrays which they are not required to access. This will conserve power and bandwidth. The required extraction and insertion points can also be used by the system controller.
• Inter-array switch connectors - This involves a new application of (Banyan) switching technology for routing accesses between tables. It may only be required when there is more than one independent route through each State Array.
State Engine behaviours include:
• Message broadcasting - System commands can be broadcast throughout the memory arrays for retrieving status or passing configuration and control messages. This method is also used for loading microcode into state arrays.
• Multiple accesses - If multiple arrays are connected in a pipe then it is evident that each command line must contain different address and command information for each array. A single command issued from the processor thus results in multiple state accesses.
• Command line "morphing" - As command lines propagate from array to array they are used and sometimes updated as a result of each state element access. The data inserted into the command by state elements in one array could be used by the state elements in the next. Data and perhaps even addresses could be modified.
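The sketch below illustrates command line "morphing" in C; the per-array command segments, the stand-in array_access() computation and the use of one array's result as the next array's address are assumptions made for the example rather than details taken from the embodiment.

    #include <stdint.h>

    /* One per-array segment of a command line: each array in the pipe gets
     * its own address and command code (invented field names).             */
    struct array_cmd {
        uint16_t addr;
        uint8_t  code;
    };

    struct command_line {
        struct array_cmd per_array[3];   /* one segment per array in the pipe   */
        uint32_t data;                   /* general purpose, rewritten en route */
    };

    /* Placeholder for the state access performed inside one array.          */
    static uint32_t array_access(unsigned which, uint16_t addr,
                                 uint8_t code, uint32_t data_in)
    {
        return data_in + which + addr + code;      /* stand-in computation     */
    }

    /* "Morphing": the data a state element in array n inserts into the command
     * is used (here even as the next address) by array n+1.                  */
    static uint32_t traverse(struct command_line *cl)
    {
        for (unsigned n = 0; n < 3; n++) {
            uint32_t result = array_access(n, cl->per_array[n].addr,
                                           cl->per_array[n].code, cl->data);
            cl->data = result;                             /* update data field */
            if (n + 1 < 3)
                cl->per_array[n + 1].addr = (uint16_t)result;  /* morph address */
        }
        return cl->data;                               /* returned to processor */
    }

    int main(void)
    {
        struct command_line cl = { .per_array = {{10, 1}, {0, 2}, {0, 3}}, .data = 0 };
        return (int)(traverse(&cl) & 0xff);
    }
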
Details of the embodiment
The State Cell, which "stitches" State Elements together in a pipeline, is shown in Figure 3. A pipeline of State Elements making up the State Cell store the component variables for a set of state records. Commands to access these state records arrive at the first State Element. The control field is used to determine the update made to the first component of the state record, in this example via a microcoded controller. The command line is then passed on to the next State Element to update the next component of the state record. This is repeated for the length of the pipeline in the State Cell. The result from the final stage of the pipeline is returned to the requesting processor.
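A functional C sketch of such a State Cell pipeline follows; the choice of a three-field queue record (head, tail, count) and the per-stage behaviour are invented for illustration. Each stage owns one field of the record, interprets the same address and control word, and passes the command line on, with the final stage's result returned in the data field.

    #include <stdint.h>

    #define RECORDS 128

    /* Each State Element in the cell privately stores ONE field of the state
     * record for every record instance (field names invented: a queue example). */
    static uint32_t field_head[RECORDS];     /* element 0: head pointer   */
    static uint32_t field_tail[RECORDS];     /* element 1: tail pointer   */
    static uint32_t field_count[RECORDS];    /* element 2: packet count   */

    struct command_line {
        uint16_t addr;      /* same record address used at every stage    */
        uint8_t  code;      /* same control word interpreted per element  */
        uint32_t data;      /* carries operands in, result back out       */
    };

    /* One pipeline stage: read its own field, update it according to the
     * command, then the command line is passed on.                       */
    typedef void stage_fn(struct command_line *);

    static void stage_head(struct command_line *c)  { if (c->code == 1) field_head[c->addr] = c->data; }
    static void stage_tail(struct command_line *c)  { if (c->code == 1) field_tail[c->addr] += 1; }
    static void stage_count(struct command_line *c) { field_count[c->addr] += 1; c->data = field_count[c->addr]; }

    int main(void)
    {
        stage_fn *pipeline[] = { stage_head, stage_tail, stage_count };
        struct command_line cmd = { .addr = 5, .code = 1, .data = 42 };

        /* The command line visits each element in turn; the final stage's
         * result returns to the requesting processor in cmd.data.          */
        for (unsigned i = 0; i < sizeof pipeline / sizeof pipeline[0]; i++)
            pipeline[i](&cmd);
        return (int)cmd.data;    /* 1: count after one enqueue             */
    }
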
The architecture of the State Array, and the interconnection of State Engine components is illustrated in Figure 4. This uses standard/known routing and load balancing techniques to allocate incoming command lines to the appropriate state cells.
Additional features
Load balancing - It is possible that state records may be allocated dynamically on demand (and also deassigned). If multiple paths exist through a given array then it is desirable for the stored state to be spread evenly across the available State Elements/Cells. The availability of state entries in such a system could be advertised by the Controller in such a way as to ensure that records are assigned from each Element in turn, thus balancing the load.
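One possible reading of this load balancing scheme is sketched below in C; the cell count, entry count and the round-robin pointer kept by the Controller are assumptions for the example. Free entries are advertised from each Cell in turn so that dynamically assigned records stay spread across the array.

    #include <stdint.h>

    #define CELLS            8
    #define ENTRIES_PER_CELL 64

    /* Free-entry bookkeeping the Controller might keep (all names invented). */
    static uint8_t  free_count[CELLS];
    static unsigned next_cell;        /* rotates so allocation is spread evenly */

    /* Advertise a free state entry, taking it from each cell in turn so that
     * dynamically assigned records stay balanced across the array.            */
    static int allocate_record(unsigned *cell_out)
    {
        for (unsigned tried = 0; tried < CELLS; tried++) {
            unsigned c = (next_cell + tried) % CELLS;
            if (free_count[c] > 0) {
                free_count[c]--;
                *cell_out = c;
                next_cell = (c + 1) % CELLS;   /* start after this cell next time */
                return 0;
            }
        }
        return -1;                             /* no free entries anywhere        */
    }

    static void deassign_record(unsigned cell)
    {
        free_count[cell]++;
    }

    int main(void)
    {
        for (unsigned c = 0; c < CELLS; c++) free_count[c] = ENTRIES_PER_CELL;
        unsigned cell;
        allocate_record(&cell);      /* cell 0 */
        allocate_record(&cell);      /* cell 1: round-robin spreads the load */
        deassign_record(cell);
        return (int)cell;
    }
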
In essence, the preferred implementation of the invention therefore provides the following features, in which all of the specified issues associated with high speed data lookup by parallel processors are addressed:
• A formal framework for creating a parallel coprocessor using smart memory (typically state elements).
• Single access, multiple lookups - A single access acts upon multiple, independent state tables within the state engine, ie. multiple lookups into different tables held in different memories as a result of a single request from the bus.
• Pipelined architecture - Lookups into different tables are not fired off from a point source into different memories. Instead, the access itself (in the form of a command line) is routed from table to table in a serial fashion. It is an object which travels through the State Engine.
• Command line "morphing" - As command lines propagate along the pipe from table to table they are used and sometimes updated as a result of each table access. The data inserted into the command by one table could be used by the state elements in the next.
• State cell concept - high throughput pipelined processing (scalable processing power)
• State array concept - "layout friendly" scheme for scaling quantity of state, bandwidth and load balancing
• State engine concept - multiple orthogonal lookups from a single command; uses switching technology for multi-lane state engine architectures. The Controller provides system commands for data and instruction broadcasts.
• State Elements can interact.

The problems addressed by the present invention are particularly suited to application to array processors accessing shared state in a Traffic Handling application. State engines were conceived as a way to arrange the state elements (required for managing state contention) in a way that addressed the additional issue of a high rate of state access.
However, State Engines can also be architected from the same or similar state elements to meet the needs of other applications - for instance meter management in the related area of Traffic Conditioning. State engines could therefore be used to deliver state element technology to any other application in which parallel (or even pipelined) processors access shared state at high rates.

Additional optional features
System threads: - Background, system threads could be programmed to operate on the data in the state memory when commands from processors are not being serviced. For instance, this could be useful for identifying state entries which are idle.
Find free queue algorithm: - Find_free_queue system function. This is a background thread which implements a "Two strikes and out" algorithm for de-assigning state entries used to represent/manage queues which go idle (ie. empty); a sketch of one possible form of this sweep follows these optional features.
Special function units: - The "flag unit" and "address unit" are special function units designed to support the find free queue algorithm. The features they provide are considered to be of generic value and could be used by other algorithms (such as that required for maintaining meters in state elements).
Scheduling algorithm: - The information required by the Self-Clocked Fair Queueing algorithm cannot be mapped directly into the state element. It is represented in a form which makes access and manipulation more robust and efficient.
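The microcode of the Find_free_queue function mentioned above is not given, but the following C sketch shows one plausible form of a "Two strikes and out" background sweep; the queue-state fields and the sweep structure are invented for illustration. A queue seen empty on one sweep receives a strike, and if it is still empty on the next sweep its entry is de-assigned.

    #include <stdbool.h>
    #include <stdint.h>

    #define QUEUES 1024

    struct queue_state {
        uint32_t length;      /* packets currently queued                    */
        bool     assigned;    /* entry currently represents a live queue     */
        bool     strike;      /* "flag unit": empty when last inspected      */
    };

    static struct queue_state q[QUEUES];

    /* Background sweep run when no processor commands are being serviced.
     * First time a queue is seen empty it gets a strike; if it is still empty
     * on the next sweep ("two strikes and out") the entry is de-assigned.     */
    static void find_free_queue_sweep(void)
    {
        for (unsigned i = 0; i < QUEUES; i++) {          /* "address unit" walk */
            if (!q[i].assigned)
                continue;
            if (q[i].length == 0) {
                if (q[i].strike)
                    q[i].assigned = false;               /* strike two: out     */
                else
                    q[i].strike = true;                  /* strike one          */
            } else {
                q[i].strike = false;                     /* activity clears it  */
            }
        }
    }

    int main(void)
    {
        q[3].assigned = true;          /* idle queue                        */
        find_free_queue_sweep();       /* strike one                        */
        find_free_queue_sweep();       /* strike two: entry 3 de-assigned   */
        return q[3].assigned;          /* 0                                 */
    }
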
It can therefore be appreciated that the State Elements forming part of the present invention can provide the following features:
• Intelligent memory - The state element localises the serialisation of parallel data accesses at the memory end, not the processor end. This greatly reduces the latency commonly associated with the blocking of state.
• Functional versatility - The state element provides a number of configurable/programmed remote functions which may be performed on the stored data - functions would comprise a small number of data read, write and arithmetic operations and conditional accesses.
• Flexibility - The functions can (but need not necessarily) be expressed in microcode so that the state element remains programmable and does not "tie" software executing on the processor to functions hardwired into the state element.
• System efficiency - The read/writeback occurs between the ALU and the memory inside the state element. Only the command travels across the system bus. This reduces the burden on the system bus as compared with conventional approaches.
• System simplicity - The read/modify/write is encapsulated within the state element and serialisation is inherently enforced by the state element logic. Processors can simultaneously issue commands which will cause a function to act on the same item of state without having to first negotiate with one-another.
• There is no need to sort results on return - there is an automatic return to the requesting processor.
It is recognised that contention is not an issue exclusive to Traffic Handling, therefore state elements could also be used as a general purpose tool in support of parallel processors in any application. Contention can arise when any two processors in a realtime (data flow processing) system require R/M/W access to a shared state variable. State Elements could therefore be used in conjunction with any parallel or pipelined arrangement of processors.

Claims
1. A parallel processor comprising state element means providing coherent parallel accesses to shared state.
2. A parallel processor as claimed in claim 1, wherein said parallel processor is an array processor.
3. A parallel processor as claimed in claim 2, wherein said array processor is a SIMD processor.
4. A parallel processor as claimed in any of the preceding claims, further comprising means to serialise and/or synchronise multiple accesses/updates to said shared state.
5. A parallel processor as claimed in any of the preceding claims, wherein said state comprises a single item of state.
6. A parallel processor as claimed in any of claims 1 to 4, wherein said state comprises multiple items of state.
7. A parallel processor as claimed in any of claims 1 to 4, wherein said state comprises a single storage location or a data structure in storage.
8. A parallel processor as claimed in claim 1, wherein operations on said state are carried out as a fixed or hardwired set of operations.
9. A parallel processor as claimed in claim 8, further comprising means to supply data to update said state.
10. A parallel processor as claimed in claim 8, further comprising means for sending a command and data to said state, whereby said operations are programmable.
11. A parallel processor as claimed in claim 1, further comprising a plurality of said state element means organised into state cell means, whereby operations on said state can be pipelined.
12. A parallel processor as claimed in claim 11, further comprising a plurality of said state cell means, whereby to allow multiple requests in relation to said state to be handled concurrently.
13. A parallel processor as claimed in claim 12, further comprising input and output interconnect means providing access to and from said state cell means, a bus interface for said input and output interconnect means, said bus interface interfacing with a system bus, and a control unit interconnected with said system bus for controlling accesses to said state.
14. A parallel processor as claimed in any of claims 11 to 13, wherein each said state element means comprises local memory, and each field of a data record is stored in a respective memory of a respective state element means.
15. A parallel processor as claimed in any of the preceding claims, wherein each said state element means comprises a local memory for said state, an arithmetic unit adapted to perform an operation on said state in said local memory, and command and control logic to control said operation.
16. A computer system comprising a parallel processor as claimed in any of the preceding claims.
17. A network processor comprising a parallel processor as claimed in any of the preceding claims.
18. A parallel processor as claimed in any of the preceding claims, implemented on a single silicon chip.
PCT/GB2003/004867 2002-11-11 2003-11-11 State engine for data processor WO2004044733A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU2003283545A AU2003283545A1 (en) 2002-11-11 2003-11-11 State engine for data processor
GB0509997A GB2411271B (en) 2002-11-11 2003-11-11 State engine for data processor
US10/534,430 US7882312B2 (en) 2002-11-11 2003-11-11 State engine for data processor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0226249.1 2002-11-11
GBGB0226249.1A GB0226249D0 (en) 2002-11-11 2002-11-11 Traffic handling system

Publications (2)

Publication Number Publication Date
WO2004044733A2 true WO2004044733A2 (en) 2004-05-27
WO2004044733A3 WO2004044733A3 (en) 2005-03-31

Family

ID=9947583

Family Applications (4)

Application Number Title Priority Date Filing Date
PCT/GB2003/004893 WO2004045162A2 (en) 2002-11-11 2003-11-11 Traffic management architecture
PCT/GB2003/004867 WO2004044733A2 (en) 2002-11-11 2003-11-11 State engine for data processor
PCT/GB2003/004866 WO2004045161A1 (en) 2002-11-11 2003-11-11 Packet storage system for traffic handling
PCT/GB2003/004854 WO2004045160A2 (en) 2002-11-11 2003-11-11 Data packet handling in computer or communication systems

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/GB2003/004893 WO2004045162A2 (en) 2002-11-11 2003-11-11 Traffic management architecture

Family Applications After (2)

Application Number Title Priority Date Filing Date
PCT/GB2003/004866 WO2004045161A1 (en) 2002-11-11 2003-11-11 Packet storage system for traffic handling
PCT/GB2003/004854 WO2004045160A2 (en) 2002-11-11 2003-11-11 Data packet handling in computer or communication systems

Country Status (5)

Country Link
US (5) US7522605B2 (en)
CN (4) CN1736069B (en)
AU (4) AU2003283539A1 (en)
GB (5) GB0226249D0 (en)
WO (4) WO2004045162A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017164804A1 (en) * 2016-03-23 2017-09-28 Clavister Ab Method for traffic shaping using a serial packet processing algorithm and a parallel packet processing algorithm
US10924416B2 (en) 2016-03-23 2021-02-16 Clavister Ab Method for traffic shaping using a serial packet processing algorithm and a parallel packet processing algorithm

Families Citing this family (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004524617A (en) 2001-02-14 2004-08-12 クリアスピード・テクノロジー・リミテッド Clock distribution system
GB0226249D0 (en) * 2002-11-11 2002-12-18 Clearspeed Technology Ltd Traffic handling system
US7210059B2 (en) 2003-08-19 2007-04-24 Micron Technology, Inc. System and method for on-board diagnostics of memory modules
US7310752B2 (en) * 2003-09-12 2007-12-18 Micron Technology, Inc. System and method for on-board timing margin testing of memory modules
US7120743B2 (en) 2003-10-20 2006-10-10 Micron Technology, Inc. Arbitration system and method for memory responses in a hub-based memory system
US6944636B1 (en) * 2004-04-30 2005-09-13 Microsoft Corporation Maintaining time-date information for syncing low fidelity devices
US7310748B2 (en) * 2004-06-04 2007-12-18 Micron Technology, Inc. Memory hub tester interface and method for use thereof
US8316431B2 (en) * 2004-10-12 2012-11-20 Canon Kabushiki Kaisha Concurrent IPsec processing system and method
US20060101210A1 (en) * 2004-10-15 2006-05-11 Lance Dover Register-based memory command architecture
US20060156316A1 (en) * 2004-12-18 2006-07-13 Gray Area Technologies System and method for application specific array processing
EP1832054B1 (en) * 2004-12-23 2018-03-21 Symantec Corporation Method and apparatus for network packet capture distributed storage system
US20100195538A1 (en) * 2009-02-04 2010-08-05 Merkey Jeffrey V Method and apparatus for network packet capture distributed storage system
US7392229B2 (en) * 2005-02-12 2008-06-24 Curtis L. Harris General purpose set theoretic processor
US7746784B2 (en) * 2006-03-23 2010-06-29 Alcatel-Lucent Usa Inc. Method and apparatus for improving traffic distribution in load-balancing networks
US8065249B1 (en) 2006-10-13 2011-11-22 Harris Curtis L GPSTP with enhanced aggregation functionality
US7774286B1 (en) 2006-10-24 2010-08-10 Harris Curtis L GPSTP with multiple thread functionality
US8166212B2 (en) * 2007-06-26 2012-04-24 Xerox Corporation Predictive DMA data transfer
US7830918B2 (en) * 2007-08-10 2010-11-09 Eaton Corporation Method of network communication, and node and system employing the same
JP5068125B2 (en) * 2007-09-25 2012-11-07 株式会社日立国際電気 Communication device
US8521732B2 (en) 2008-05-23 2013-08-27 Solera Networks, Inc. Presentation of an extracted artifact based on an indexing technique
US8625642B2 (en) 2008-05-23 2014-01-07 Solera Networks, Inc. Method and apparatus of network artifact indentification and extraction
US8004998B2 (en) * 2008-05-23 2011-08-23 Solera Networks, Inc. Capture and regeneration of a network data using a virtual software switch
US20090292736A1 (en) * 2008-05-23 2009-11-26 Matthew Scott Wood On demand network activity reporting through a dynamic file system and method
JP5300355B2 (en) * 2008-07-14 2013-09-25 キヤノン株式会社 Network protocol processing apparatus and processing method thereof
US9213665B2 (en) * 2008-10-28 2015-12-15 Freescale Semiconductor, Inc. Data processor for processing a decorated storage notify
US8627471B2 (en) * 2008-10-28 2014-01-07 Freescale Semiconductor, Inc. Permissions checking for data processing instructions
CA2754181C (en) 2009-03-18 2016-08-02 Texas Research International, Inc. Environmental damage sensor
US8266498B2 (en) 2009-03-31 2012-09-11 Freescale Semiconductor, Inc. Implementation of multiple error detection schemes for a cache
US20110125748A1 (en) * 2009-11-15 2011-05-26 Solera Networks, Inc. Method and Apparatus for Real Time Identification and Recording of Artifacts
US20110125749A1 (en) * 2009-11-15 2011-05-26 Solera Networks, Inc. Method and Apparatus for Storing and Indexing High-Speed Network Traffic Data
US8472455B2 (en) * 2010-01-08 2013-06-25 Nvidia Corporation System and method for traversing a treelet-composed hierarchical structure
US8295287B2 (en) * 2010-01-27 2012-10-23 National Instruments Corporation Network traffic shaping for reducing bus jitter on a real time controller
US8990660B2 (en) 2010-09-13 2015-03-24 Freescale Semiconductor, Inc. Data processing system having end-to-end error correction and method therefor
US8504777B2 (en) 2010-09-21 2013-08-06 Freescale Semiconductor, Inc. Data processor for processing decorated instructions with cache bypass
US8667230B1 (en) 2010-10-19 2014-03-04 Curtis L. Harris Recognition and recall memory
KR20120055779A (en) * 2010-11-23 2012-06-01 한국전자통신연구원 System and method for communicating audio data based zigbee and method thereof
KR20120064576A (en) * 2010-12-09 2012-06-19 한국전자통신연구원 Apparatus for surpporting continuous read/write in asymmetric storage system and method thereof
US8849991B2 (en) 2010-12-15 2014-09-30 Blue Coat Systems, Inc. System and method for hypertext transfer protocol layered reconstruction
US8666985B2 (en) 2011-03-16 2014-03-04 Solera Networks, Inc. Hardware accelerated application-based pattern matching for real time classification and recording of network traffic
US8566672B2 (en) 2011-03-22 2013-10-22 Freescale Semiconductor, Inc. Selective checkbit modification for error correction
US8607121B2 (en) 2011-04-29 2013-12-10 Freescale Semiconductor, Inc. Selective error detection and error correction for a memory interface
US8990657B2 (en) 2011-06-14 2015-03-24 Freescale Semiconductor, Inc. Selective masking for error correction
US9525642B2 (en) 2012-01-31 2016-12-20 Db Networks, Inc. Ordering traffic captured on a data connection
US9100291B2 (en) 2012-01-31 2015-08-04 Db Networks, Inc. Systems and methods for extracting structured application data from a communications link
US9092318B2 (en) * 2012-02-06 2015-07-28 Vmware, Inc. Method of allocating referenced memory pages from a free list
US9665233B2 (en) * 2012-02-16 2017-05-30 The University Utah Research Foundation Visualization of software memory usage
WO2014110281A1 (en) 2013-01-11 2014-07-17 Db Networks, Inc. Systems and methods for detecting and mitigating threats to a structured data storage system
CN103338159B (en) * 2013-06-19 2016-08-10 华为技术有限公司 Polling dispatching implementation method and device
WO2015085087A1 (en) * 2013-12-04 2015-06-11 Db Networks, Inc. Ordering traffic captured on a data connection
JP6249403B2 (en) * 2014-02-27 2017-12-20 国立研究開発法人情報通信研究機構 Optical delay line and electronic buffer fusion type optical packet buffer control device
US10210592B2 (en) 2014-03-30 2019-02-19 Teoco Ltd. System, method, and computer program product for efficient aggregation of data records of big data
WO2016145405A1 (en) * 2015-03-11 2016-09-15 Protocol Insight, Llc Intelligent packet analyzer circuits, systems, and methods
KR102449333B1 (en) 2015-10-30 2022-10-04 삼성전자주식회사 Memory system and read request management method thereof
CN107786465B (en) * 2016-08-27 2021-06-04 华为技术有限公司 Method and device for processing low-delay service flow
WO2018081582A1 (en) * 2016-10-28 2018-05-03 Atavium, Inc. Systems and methods for random to sequential storage mapping
CN107656895B (en) * 2017-10-27 2023-07-28 上海力诺通信科技有限公司 Orthogonal platform high-density computing architecture with standard height of 1U
RU2718215C2 (en) * 2018-09-14 2020-03-31 Общество С Ограниченной Ответственностью "Яндекс" Data processing system and method for detecting jam in data processing system
US11138044B2 (en) * 2018-09-26 2021-10-05 Micron Technology, Inc. Memory pooling between selected memory resources
US11093403B2 (en) 2018-12-04 2021-08-17 Vmware, Inc. System and methods of a self-tuning cache sizing system in a cache partitioning system
EP3866417A1 (en) * 2020-02-14 2021-08-18 Deutsche Telekom AG Method for an improved traffic shaping and/or management of ip traffic in a packet processing system, telecommunications network, network node or network element, program and computer program product


Family Cites Families (66)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4914650A (en) * 1988-12-06 1990-04-03 American Telephone And Telegraph Company Bandwidth allocation and congestion control scheme for an integrated voice and data network
US5187780A (en) * 1989-04-07 1993-02-16 Digital Equipment Corporation Dual-path computer interconnect system with zone manager for packet memory
US5280483A (en) * 1990-08-09 1994-01-18 Fujitsu Limited Traffic control system for asynchronous transfer mode exchange
US5765011A (en) * 1990-11-13 1998-06-09 International Business Machines Corporation Parallel processing system having a synchronous SIMD processing with processing elements emulating SIMD operation using individual instruction streams
ATE180586T1 (en) * 1990-11-13 1999-06-15 Ibm PARALLEL ASSOCIATIVE PROCESSOR SYSTEM
JP2596718B2 (en) * 1993-12-21 1997-04-02 インターナショナル・ビジネス・マシーンズ・コーポレイション How to manage network communication buffers
US5949781A (en) * 1994-08-31 1999-09-07 Brooktree Corporation Controller for ATM segmentation and reassembly
US5513134A (en) 1995-02-21 1996-04-30 Gte Laboratories Incorporated ATM shared memory switch with content addressing
US5633865A (en) * 1995-03-31 1997-05-27 Netvantage Apparatus for selectively transferring data packets between local area networks
DE69841486D1 (en) * 1997-05-31 2010-03-25 Texas Instruments Inc Improved packet switching
US6757798B2 (en) * 1997-06-30 2004-06-29 Intel Corporation Method and apparatus for arbitrating deferred read requests
US5956340A (en) * 1997-08-05 1999-09-21 Ramot University Authority For Applied Research And Industrial Development Ltd. Space efficient fair queuing by stochastic Memory multiplexing
US6088771A (en) * 1997-10-24 2000-07-11 Digital Equipment Corporation Mechanism for reducing latency of memory barrier operations on a multiprocessor system
US6052375A (en) * 1997-11-26 2000-04-18 International Business Machines Corporation High speed internetworking traffic scaler and shaper
US6359879B1 (en) * 1998-04-24 2002-03-19 Avici Systems Composite trunking
CA2331820A1 (en) * 1998-05-07 1999-11-11 Cabletron Systems, Inc. Multiple priority buffering in a computer network
US6314489B1 (en) * 1998-07-10 2001-11-06 Nortel Networks Limited Methods and systems for storing cell data using a bank of cell buffers
US6356546B1 (en) * 1998-08-11 2002-03-12 Nortel Networks Limited Universal transfer method and network with distributed switch
US6829218B1 (en) * 1998-09-15 2004-12-07 Lucent Technologies Inc. High speed weighted fair queuing system for ATM switches
US6396843B1 (en) * 1998-10-30 2002-05-28 Agere Systems Guardian Corp. Method and apparatus for guaranteeing data transfer rates and delays in data packet networks using logarithmic calendar queues
SE9803901D0 (en) * 1998-11-16 1998-11-16 Ericsson Telefon Ab L M a device for a service network
US6246682B1 (en) * 1999-03-05 2001-06-12 Transwitch Corp. Method and apparatus for managing multiple ATM cell queues
US6952401B1 (en) * 1999-03-17 2005-10-04 Broadcom Corporation Method for load balancing in a network switch
US6574231B1 (en) * 1999-05-21 2003-06-03 Advanced Micro Devices, Inc. Method and apparatus for queuing data frames in a network switch port
US6671292B1 (en) * 1999-06-25 2003-12-30 Telefonaktiebolaget Lm Ericsson (Publ) Method and system for adaptive voice buffering
US6643298B1 (en) * 1999-11-23 2003-11-04 International Business Machines Corporation Method and apparatus for MPEG-2 program ID re-mapping for multiplexing several programs into a single transport stream
US7102999B1 (en) * 1999-11-24 2006-09-05 Juniper Networks, Inc. Switching device
ATE392074T1 (en) * 2000-02-28 2008-04-15 Alcatel Lucent ARRANGEMENT FACILITY AND ARRANGEMENT PROCESS
US6662263B1 (en) * 2000-03-03 2003-12-09 Multi Level Memory Technology Sectorless flash memory architecture
ATE331369T1 (en) * 2000-03-06 2006-07-15 Ibm SWITCHING DEVICE AND METHOD
US6907041B1 (en) * 2000-03-07 2005-06-14 Cisco Technology, Inc. Communications interconnection network with distributed resequencing
CA2301973A1 (en) * 2000-03-21 2001-09-21 Spacebridge Networks Corporation System and method for adaptive slot-mapping input/output queuing for tdm/tdma systems
US6975629B2 (en) * 2000-03-22 2005-12-13 Texas Instruments Incorporated Processing packets based on deadline intervals
US7139282B1 (en) * 2000-03-24 2006-11-21 Juniper Networks, Inc. Bandwidth division for packet processing
CA2337674A1 (en) * 2000-04-20 2001-10-20 International Business Machines Corporation Switching arrangement and method
JP4484317B2 (en) * 2000-05-17 2010-06-16 株式会社日立製作所 Shaping device
US6937561B2 (en) 2000-06-02 2005-08-30 Agere Systems Inc. Method and apparatus for guaranteeing data transfer rates and enforcing conformance with traffic profiles in a packet network
JP3640160B2 (en) * 2000-07-26 2005-04-20 日本電気株式会社 Router device and priority control method used therefor
DE60119866T2 (en) * 2000-09-27 2007-05-10 International Business Machines Corp. Switching device and method with separate output buffers
US20020062415A1 (en) * 2000-09-29 2002-05-23 Zarlink Semiconductor N.V. Inc. Slotted memory access method
US6647477B2 (en) * 2000-10-06 2003-11-11 Pmc-Sierra Ltd. Transporting data transmission units of different sizes using segments of fixed sizes
US6871780B2 (en) * 2000-11-27 2005-03-29 Airclic, Inc. Scalable distributed database system and method for linking codes to internet information
US6888848B2 (en) * 2000-12-14 2005-05-03 Nortel Networks Limited Compact segmentation of variable-size packet streams
US7035212B1 (en) * 2001-01-25 2006-04-25 Optim Networks Method and apparatus for end to end forwarding architecture
US20020126659A1 (en) * 2001-03-07 2002-09-12 Ling-Zhong Liu Unified software architecture for switch connection management
US6728857B1 (en) * 2001-06-20 2004-04-27 Cisco Technology, Inc. Method and system for storing and retrieving data using linked lists
US7382787B1 (en) * 2001-07-30 2008-06-03 Cisco Technology, Inc. Packet routing and switching device
US7349403B2 (en) * 2001-09-19 2008-03-25 Bay Microsystems, Inc. Differentiated services for a network processor
US6900920B2 (en) * 2001-09-21 2005-05-31 The Regents Of The University Of California Variable semiconductor all-optical buffer using slow light based on electromagnetically induced transparency
US20030081623A1 (en) * 2001-10-27 2003-05-01 Amplify.Net, Inc. Virtual queues in a single queue in the bandwidth management traffic-shaping cell
US7215666B1 (en) * 2001-11-13 2007-05-08 Nortel Networks Limited Data burst scheduling
US20030145086A1 (en) * 2002-01-29 2003-07-31 O'reilly James Scalable network-attached storage system
US20040022094A1 (en) * 2002-02-25 2004-02-05 Sivakumar Radhakrishnan Cache usage for concurrent multiple streams
US6862639B2 (en) * 2002-03-11 2005-03-01 Harris Corporation Computer system including a receiver interface circuit with a scatter pointer queue and related methods
US7126959B2 (en) * 2002-03-12 2006-10-24 Tropic Networks Inc. High-speed packet memory
US6928026B2 (en) * 2002-03-19 2005-08-09 Broadcom Corporation Synchronous global controller for enhanced pipelining
US20030188056A1 (en) * 2002-03-27 2003-10-02 Suresh Chemudupati Method and apparatus for packet reformatting
US7239608B2 (en) * 2002-04-26 2007-07-03 Samsung Electronics Co., Ltd. Router using measurement-based adaptable load traffic balancing system and method of operation
JP3789395B2 (en) * 2002-06-07 2006-06-21 富士通株式会社 Packet processing device
US20040039884A1 (en) * 2002-08-21 2004-02-26 Qing Li System and method for managing the memory in a computer system
US6950894B2 (en) * 2002-08-28 2005-09-27 Intel Corporation Techniques using integrated circuit chip capable of being coupled to storage system
US7180899B2 (en) * 2002-10-29 2007-02-20 Cisco Technology, Inc. Multi-tiered Virtual Local area Network (VLAN) domain mapping mechanism
GB0226249D0 (en) * 2002-11-11 2002-12-18 Clearspeed Technology Ltd Traffic handling system
KR100532325B1 (en) * 2002-11-23 2005-11-29 삼성전자주식회사 Input control method and apparatus for turbo decoder
GB2421158B (en) * 2003-10-03 2007-07-11 Avici Systems Inc Rapid alternate paths for network destinations
US7668100B2 (en) * 2005-06-28 2010-02-23 Avaya Inc. Efficient load balancing and heartbeat mechanism for telecommunication endpoints

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5751987A (en) * 1990-03-16 1998-05-12 Texas Instruments Incorporated Distributed processing memory chip with embedded logic having both data memory and broadcast memory
US6097403A (en) * 1998-03-02 2000-08-01 Advanced Micro Devices, Inc. Memory including logic for operating upon graphics primitives

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ELLIOTT D G ET AL: "COMPUTATIONAL RAM: IMPLEMENTING PROCESSORS IN MEMORY" IEEE DESIGN & TEST OF COMPUTERS, IEEE COMPUTERS SOCIETY. LOS ALAMITOS, US, vol. 16, no. 1, January 1999 (1999-01), pages 32-41, XP000823403 ISSN: 0740-7475 *
INGERSOLL S ET AL: "Dataflow computation with intelligent memories emulated on field-programmable gate arrays (FPGAs)" MICROPROCESSORS AND MICROSYSTEMS, IPC BUSINESS PRESS LTD. LONDON, GB, vol. 26, no. 6, 10 August 2002 (2002-08-10), pages 263-280, XP004372301 ISSN: 0141-9331 *


Also Published As

Publication number Publication date
GB2413031A (en) 2005-10-12
WO2004045160A2 (en) 2004-05-27
GB2412035B (en) 2006-12-20
WO2004045162A2 (en) 2004-05-27
CN1736066A (en) 2006-02-15
US20110069716A1 (en) 2011-03-24
GB2411271B (en) 2006-07-26
GB0226249D0 (en) 2002-12-18
CN1735878A (en) 2006-02-15
GB2412035A (en) 2005-09-14
WO2004044733A3 (en) 2005-03-31
GB2412537B (en) 2006-02-01
CN100557594C (en) 2009-11-04
GB0509997D0 (en) 2005-06-22
AU2003283539A1 (en) 2004-06-03
GB2413031B (en) 2006-03-15
US20050246452A1 (en) 2005-11-03
US7522605B2 (en) 2009-04-21
WO2004045160A8 (en) 2005-04-14
CN1736066B (en) 2011-10-05
AU2003283544A1 (en) 2004-06-03
GB0511588D0 (en) 2005-07-13
GB2411271A (en) 2005-08-24
WO2004045162A3 (en) 2004-09-16
US7882312B2 (en) 2011-02-01
AU2003283559A1 (en) 2004-06-03
AU2003283545A1 (en) 2004-06-03
CN1736068B (en) 2012-02-29
WO2004045160A3 (en) 2004-12-02
GB2412537A (en) 2005-09-28
WO2004045161A1 (en) 2004-05-27
US8472457B2 (en) 2013-06-25
GB0511587D0 (en) 2005-07-13
CN1736069A (en) 2006-02-15
GB0511589D0 (en) 2005-07-13
CN1736069B (en) 2012-07-04
US7843951B2 (en) 2010-11-30
US20050243829A1 (en) 2005-11-03
CN1736068A (en) 2006-02-15
US20050257025A1 (en) 2005-11-17
AU2003283545A8 (en) 2004-06-03
US20050265368A1 (en) 2005-12-01

Similar Documents

Publication Publication Date Title
US7882312B2 (en) State engine for data processor
US9787612B2 (en) Packet processing in a parallel processing environment
US11151033B1 (en) Cache coherency in multiprocessor system
US11157428B1 (en) Architecture and programming in a parallel processing environment with a tiled processor having a direct memory access controller
US10210117B2 (en) Computing architecture with peripherals
US6209020B1 (en) Distributed pipeline memory architecture for a computer system with even and odd pids
KR20210030282A (en) Host proxy on gateway
CN100440151C (en) Context pipelines
US20170351555A1 (en) Network on chip with task queues
US6272516B1 (en) Method and apparatus for handling cache misses in a computer system
CN110417670A (en) Traffic management for high bandwidth exchange
US5935235A (en) Method for branching to an instruction in a computer program at a memory address pointed to by a key in a data structure
KR20210029725A (en) Data through gateway
Govindarajan et al. Design and performance evaluation of a multithreaded architecture
US9727499B2 (en) Hardware first come first serve arbiter using multiple request buckets
Gottlieb et al. The NYU ultracomputer—designing a MIMD, shared-memory parallel machine
US9164794B2 (en) Hardware prefix reduction circuit
US20150058551A1 (en) Pico engine pool transactional memory architecture
Panda et al. Software Managed Distributed Memories in MPPAs
Wang et al. Damq sharing scheme for two physical channels in high performance router
Balou et al. The design and implementation of VOOM: a parallel virtual Object Oriented machine
Fang Design of an ATM switch and implementation of output scheduler
NZ716954B2 (en) Computing architecture with peripherals

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: GB0509997.3

Country of ref document: GB

WWE Wipo information: entry into national phase

Ref document number: 20038A8223X

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 10534430

Country of ref document: US

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Ref document number: JP