EP1163596A1 - Methods and apparatus for facilitating direct memory access - Google Patents

Methods and apparatus for facilitating direct memory access

Info

Publication number
EP1163596A1
Authority
EP
European Patent Office
Prior art keywords
data
recited
dma engine
memory
dma
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00911873A
Other languages
German (de)
French (fr)
Other versions
EP1163596A4 (en)
Inventor
Henry Stracovsky
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Infineon Technologies AG
Original Assignee
Infineon Technologies AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Infineon Technologies AG filed Critical Infineon Technologies AG
Publication of EP1163596A1
Publication of EP1163596A4

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • G06F13/1673Details of memory controller using buffers

Abstract

Methods and apparatus for facilitating direct memory access in a computing system are disclosed. In one embodiment (Figure), in a computing system (100) having a central processing unit (CPU) (114) and a main system memory (106), a DMA engine (110) coupled to the CPU (114) performs a first DMA process that identifies a desired data set. Once identified, the DMA engine (110) moves the identified desired data set from the main system memory (106) to a memory segment (115) that is temporally closer to the processor (114) than is the main system memory (106).

Description

METHODS AND APPARATUS FOR FACILITATING DIRECT MEMORY ACCESS
CROSS-REFERENCE TO A RELATED APPLICATION
This application takes priority under 35 U.S.C. §119(e) of U.S. Provisional Patent Application No. 60/121,022, filed February 22, 1999 (Attorney Docket No.: SCHIP003P), naming Henry Stracovsky as inventor and assigned to the assignee of the present application, which is also incorporated herein by reference for all purposes.
FIELD OF THE INVENTION
The present invention relates in general to computing systems and, more particularly, to direct memory access of a memory in a data processing system. More particularly still, the present invention is directed towards a method and apparatus that facilitates the direct accessing of a memory device by a high speed processor.
BACKGROUND OF THE INVENTION
In many applications, high performance processors are used to perform processing on packet-oriented objects. An example of this is an Ethernet to ATM protocol bridge. In these situations, the processor is typically called on to examine and modify some wrapper that encompasses a data payload, for example. The mechanics of data transfer are such that typically the data is deposited in and subsequently moved out of main memory. Unfortunately, this is very unfavorable to microprocessor-oriented processing, as a high-performance CPU is typically an order of magnitude faster than main memory. It is not unusual in this type of application for a processor to spend as much as 50 percent of its cycles waiting for memory access to be completed. Since most high performance microprocessors are isolated from main memory by a set of cache memories, it is possible to speed up performance of the application by manually (i.e., under program control) invalidating portions of the cache into which pertinent data from a new packet will be read in. While this technique can significantly improve performance, it is unfortunately processor and cache architecture specific and thus requires a significant understanding of the underlying hardware by the programmer. Furthermore, optimized code (as would be required in this case) is almost never portable, therefore requiring new optimization for every new product. Therefore, what is desired are improved platform-independent techniques for accessing memory in a high speed processing environment.
Summary of the Invention
To achieve the foregoing, and in accordance with the purpose of the present invention, methods and apparatus for facilitating direct memory access in a computing system are described. In one embodiment, in a computing system having a central processing unit (CPU) and a main system memory, a DMA engine coupled to the CPU performs a first DMA process that identifies a desired data set. Once identified, the DMA engine moves the identified desired data set from the main system memory to a memory segment that is temporally closer to the processor than is the main system memory.
In a preferred embodiment, the segment of memory is a scratch pad type memory that is incorporated into the processor.
In another embodiment, in a computing system having a central processing unit (CPU) and a main system memory arranged to store data, a method for moving a desired data set from the main system memory to a memory segment that is temporally closer to the CPU than is the main system memory is described. A plurality of data sets that includes the desired data set is distributed across a buffer pool included in the main system memory. The desired data set is then identified as being stored in an associated local buffer that is part of the buffer pool. The identified desired data set is fetched from the local buffer and moved from the local buffer to the memory segment that is temporally closer to the CPU than is the main system memory.
In a preferred embodiment, the segment of memory is a scratch pad type memory that is incorporated into the processor.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
Fig. 1 shows a computing system in accordance with an embodiment of the invention; Fig. 2 illustrates the format of a data packet transmitted in a representative Ethernet network compliant with the IEEE 802.3 (1985) standard;
Fig. 3 shows a flowchart detailing a process that describes an operation of the computing system in accordance with an embodiment of the invention;
Fig. 4 shows a flowchart detailing a particular implementation of the identification shown in Fig. 3 in accordance with an embodiment of the invention; and Fig. 5 illustrates a typical, general-purpose computer system suitable for implementing the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Reference will now be made in detail to a preferred embodiment of the invention. An example of the preferred embodiment is illustrated in the accompanying drawings. While the invention will be described in conjunction with a preferred embodiment, it will be understood that it is not intended to limit the invention to one preferred embodiment. To the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims.
Substantial improvements in memory access can be achieved by using a hardware-based DMA (direct memory access) that moves pertinent data either to a cache memory or some high speed scratch pad memory that is temporally closer to a processor than is main memory. When a cache memory is used, the processor supports an update operation on a snoop hit of a data write to memory by an alternate master. However, when the high speed scratch memory is used, the DMA performs a memory to memory transfer from main memory to the scratch pad memory. In either case, a special DMA channel is typically required as data packets accumulate in memory before they are processed by the CPU. This means that if data is placed in cache or high speed scratch pad as it is initially brought in from the I/O device, large amounts of cache or scratch pad are used up, thus displacing other potentially valuable data or code. If a sufficient number of packets are queued up, it is possible to overflow the cache or scratch pad, thus potentially creating a net loss in performance. Therefore, in a preferred embodiment, the inventive DMA process performs a DMA fetch operation by fetching data into a ping-pong type buffer arrangement. In this arrangement, at least two scratch pads or cache segments are used in such a way that data is fetched into one buffer or segment of cache while the processor is operating on the other buffer or segment of cache.
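As a rough illustration of the ping-pong arrangement just described, the following C sketch alternates between two scratch-pad segments so that the DMA engine fills one while the processor operates on the other. The buffer size and the hooks dma_start_fetch, dma_wait_done and process_packet_data are hypothetical placeholders, not interfaces defined by the patent.

    #include <stddef.h>
    #include <stdint.h>

    #define SCRATCH_BUF_SIZE 2048   /* assumed size of one scratch-pad segment */

    /* Hypothetical platform hooks: start a DMA transfer of the pertinent data of
     * one packet into dst, block until the last started transfer has completed,
     * and let the CPU operate on a block of fetched data.                       */
    extern void dma_start_fetch(void *dst, size_t len, unsigned packet_idx);
    extern void dma_wait_done(void);
    extern void process_packet_data(const uint8_t *data, size_t len);

    /* Two scratch-pad segments used in ping-pong fashion. */
    static uint8_t scratch[2][SCRATCH_BUF_SIZE];

    void process_packet_stream(unsigned num_packets)
    {
        unsigned cur = 0;

        if (num_packets == 0)
            return;

        dma_start_fetch(scratch[cur], SCRATCH_BUF_SIZE, 0);   /* prime the pipeline */

        for (unsigned i = 0; i < num_packets; i++) {
            dma_wait_done();                      /* data for packet i is now resident */

            unsigned next = cur ^ 1u;
            if (i + 1 < num_packets)              /* fetch packet i+1 ...               */
                dma_start_fetch(scratch[next], SCRATCH_BUF_SIZE, i + 1);

            process_packet_data(scratch[cur], SCRATCH_BUF_SIZE);  /* ... while processing packet i */
            cur = next;
        }
    }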
In a preferred embodiment, pertinent data instances are defined by an offset from a packet start and a packet size. When the offset portion is divided by the buffer size, the integer portion indicates the number of buffers that must be traversed and the remainder indicates the offset into the buffer itself.
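A minimal sketch of this offset arithmetic, assuming every buffer in the pool holds the same number of bytes (the function and parameter names are illustrative only):

    #include <stddef.h>

    /* Split an offset measured from the packet start into the number of buffers
     * to traverse (the integer portion) and the offset into the final buffer
     * (the remainder).                                                          */
    static inline void split_offset(size_t packet_offset, size_t buf_size,
                                    size_t *buffers_to_skip, size_t *offset_in_buf)
    {
        *buffers_to_skip = packet_offset / buf_size;
        *offset_in_buf   = packet_offset % buf_size;
    }

For example, with 512-byte buffers an offset of 1300 bytes means two full buffers are traversed and the desired data begins 276 bytes into the third buffer.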
Turning now to Fig. 1, a computing system 100 is shown in accordance with an embodiment of the invention. The computing system 100 includes an I/O device 102 coupled by way of a DMA channel 104 to a main memory 106. The main memory 106 is typically formed of dynamic random access memory (DRAM) arranged to store data and code. The main memory 106 is, in turn, coupled to a DMA controller unit 108 that includes a DMA engine 110 configured as either hardware or software. The DMA controller unit 108 also includes several data buffers 112-1 through 112-n, of which only 112-1 and 112-2 are shown. In the described embodiment, each of the data buffers 112 is arranged to store a portion, or portions, of data or a data packet that has been pre-fetched as determined by the DMA engine 110. In a preferred embodiment, the buffers 112 are coupled in a ping-pong type arrangement in order to prevent the accumulation of a large number of data packets in the buffers 112. In this way, the pipelining of the DMA channel 104 substantially improves overall processor throughput.
In the described embodiment, the pre-fetched data is then made available to a processor unit 114 having, in some embodiments, an L2-type cache memory 116 coupled thereto. In a preferred embodiment, the fetched data is stored in a memory segment 115 that is incorporated as part of the processor 114. It should be noted that when a cache memory is used, it must be able to update a copy of shared data from the bus when shared dirty data is broadcast to the bus.
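The update-on-snoop behaviour mentioned above can be pictured with a deliberately simplified model: when an alternate master (here, the DMA engine 110) writes a line that the cache currently holds, the cached copy is updated in place rather than invalidated. The direct-mapped organisation, line size and function names below are assumptions made purely for illustration; they do not describe the actual cache of the processor 114.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define LINE_SIZE 32
    #define NUM_LINES 256

    /* Toy direct-mapped cache used only to illustrate updating a shared copy
     * when a write by an alternate master is observed on the bus.            */
    typedef struct {
        bool     valid[NUM_LINES];
        uint32_t tag[NUM_LINES];
        uint8_t  data[NUM_LINES][LINE_SIZE];
    } toy_cache_t;

    void snoop_write(toy_cache_t *c, uint32_t addr, const uint8_t line[LINE_SIZE])
    {
        uint32_t index = (addr / LINE_SIZE) % NUM_LINES;
        uint32_t tag   = addr / (LINE_SIZE * NUM_LINES);

        if (c->valid[index] && c->tag[index] == tag)
            memcpy(c->data[index], line, LINE_SIZE);   /* snoop hit: update the copy */
        /* snoop miss: the line is not cached, nothing to update */
    }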
Fig. 2 illustrates the format of a data packet transmitted in a representative Ethernet network compliant with the IEEE 802.3 (1985) standard. A packet generally includes a preamble 202, which is 8 bytes long. The last byte (or octet) in the preamble is a start frame delimiter (not shown). After the start frame delimiter, a destination address (DA) 204, which is 6 bytes long, is used to identify the node that is to receive the Ethernet packet. Following DA 204 is a source address (SA) 206, which is also 6 bytes long; SA 206 is used to identify the transmitting node directly on the transmitted packet. After the SA 206, a length/type field (L/T) 208 (typically 2 bytes) is generally used to indicate the length and type of the data field that follows. As is well known in the art, if a length is provided, the packet is classified as an 802.3 packet, and if the type field is provided, the packet is classified as an Ethernet packet.
The following data field is identified as LLC data 210 since the data field also includes information that may have been encoded by an LLC layer described below. A pad 212 is also shown following LLC data 210. As is well known in the art, if a given Ethernet packet is less than 64 bytes, most media access controllers add a padding of 1's and 0's following LLC data 210 in order to increase the Ethernet packet size to at least 64 bytes. Once pad 212 is added, if necessary, a 4-byte cyclic redundancy check (CRC) field 214 is appended to the end of a packet in order to check for corrupted packets at a receiving end. As used herein, a "frame" should be understood to be a sub-portion of data contained within a packet.
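For reference, the fixed-length header fields of Fig. 2 can be captured in a packed C structure such as the one below. This layout is illustrative only: the preamble and start frame delimiter are normally stripped before the frame reaches memory, the variable-length LLC data, pad and trailing CRC are not represented, and the 0x0600 length/type threshold is the conventional interpretation rather than something stated in the patent.

    #include <stdint.h>

    #pragma pack(push, 1)
    typedef struct {
        uint8_t  da[6];        /* destination address (DA 204), 6 bytes     */
        uint8_t  sa[6];        /* source address (SA 206), 6 bytes          */
        uint16_t length_type;  /* length/type field (L/T 208), 2 bytes,
                                  carried in network byte order on the wire */
    } eth_header_t;
    #pragma pack(pop)

    /* Conventional interpretation of the length/type field: values below 0x0600
     * are an 802.3 length, values at or above it are an Ethernet type.          */
    #define ETH_LT_THRESHOLD 0x0600u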
Typically, in an Ethernet type network, the processor 114 would only be interested in a portion, or portions, of the Ethernet frame 200 (such as the source address field SA 206 and/or the destination address field DA 204). In these cases, the DMA engine 110 would parse a particular Ethernet frame and store the parsed result in any number of arbitrary buffers included in the buffer pool 118 that are logically linked by, for example, an associated set of descriptors. Although not shown, another well-known packetized data format referred to as TCP (Transmission Control Protocol) is a method (protocol) used along with the Internet Protocol (IP) to send data in the form of message units between computers over the Internet. While IP takes care of handling the actual delivery of the data, TCP takes care of keeping track of the individual units of data (called packets) that a message is divided into for efficient routing through the Internet. For example, when an HTML file is sent to a client from a Web server, the Transmission Control Protocol (TCP) program layer in that server divides the file into one or more packets, numbers the packets, and then forwards them individually to the IP program layer. Although each packet has the same destination IP address, it may get routed differently through the network. At the other end (the client program), TCP reassembles the individual packets and waits until they have all arrived before forwarding them to the client. TCP is known as a connection-oriented protocol, which means that a connection is established and maintained until such time as the message or messages to be exchanged by the application programs at each end have been exchanged. TCP is responsible for ensuring that a message is divided into the packets that IP manages and for reassembling the packets back into the complete message at the other end.
As with an Ethernet frame, a data packet associated with the TCP program layer would have associated with it a TCP header that includes all information related to, for example, source and destination addresses.
In a similar manner as with the Ethernet frame 200, in a TCP based communication system, since the processor 114 may only be interested in a small portion of the TCP header (the destination address, for example), the DMA engine 110 breaks the data up into several portions, only some of which may be pertinent to the current processor task. Since data packets can be relatively large, the DMA engine 110 distributes incoming data (in some embodiments based upon selecting pertinent portions of the data to be processed by the processor 114) into a series of buffers that in the described embodiment take the form of a buffer pool 118 that is logically defined by a particular memory management scheme. Packets may thus span several buffers, and so pertinent data from a wrapper may be located in any of a number of arbitrary buffers. In order to fetch a data packet, the buffers must be logically linked. The linking of buffers can be performed in any number of ways, such as in a descriptor ring, a linked list of descriptors, or simply as a linked list of buffers. During the prefetch operation, the DMA engine must therefore be capable of "walking down" the list to the location of the pertinent data and then transfer only this data. If multiple instances of data are required, the DMA engine must continue to the next instance, transfer it to the scratch pad buffer or cache segment and, if so required, proceed to the next instance.
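One way to picture the logical linking described above is a singly linked list of buffer descriptors; the structure below is an illustrative sketch (the field names are assumptions, and a descriptor ring would serve equally well). The pointer-walk sketch given after the discussion of Fig. 4 shows how such a chain is traversed to reach a pertinent data instance.

    #include <stddef.h>
    #include <stdint.h>

    /* One descriptor per buffer of the pool; a packet that spans several
     * buffers is represented by a chain of these descriptors.            */
    typedef struct buf_desc {
        uint8_t         *data;   /* start of this buffer in main memory       */
        size_t           size;   /* number of valid bytes held by this buffer */
        struct buf_desc *next;   /* next buffer of the same packet, or NULL   */
    } buf_desc_t;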
As directed by the DMA engine 110, data is pre-fetched from a particular one of the buffers of the buffer pool 118, as required by the processor 114, based upon a descriptor that links the particular buffer to the data portion stored therein, and is stored in the buffer 112-2 where it is made available to the processor 114. By linking each of the buffers of the buffer pool 118, the DMA engine 110 can use the pointer to determine the next buffer for which data must be pre-fetched and stored in the buffer 112-1, which is made available to the processor 114 in a ping-pong style management scheme. In some embodiments, it is possible to provide the desired data to the processor cache instead of a scratch pad memory. Such a scheme is possible if the processor cache supports the replacement of cache lines from a front side bus. As an example, a snooping protocol, such as the protocol known in the art as the "Illinois" snooping protocol, can be used. This particular protocol utilizes a local cache to update an entry from another cache or, as in the described embodiment, the DMA engine 110. In this implementation, a programmer would define a
"shared" memory region that would be initially induced by program fetches into the cache and then subsequently updated by the DMA engine 110. If the data were to be updated, it would have to be explicitly written into main memory or provided to another DMA process if that DMA process accepted coherency updates from the cache. In another embodiment, an I/O event causes the DMA channel 104 to bring data into the system memory 106 by distributing it over the buffer pool 120. Then the processor 114 starts a processing task on selected data, referred to as desired data, from a particular data set which is typically in the form of a data packet. The processor 114 then initiates another DMA channel (not shown) to fetch the desired data set (i.e., selected) from a next packet to be processed from main memory to a location in the memory segment 115 (in the form of either a scratch pad or a cache memory) such the fetch occurs in parallel with the processing of the current packet by the processor 114. In this way, the second DMA channel fetches the desired data into a designated location from the main memory 106 and then notifies the processor 114 when the fetching task is completed. Referring now to Fig. 3, a process 300 describing an operation of the computing system
100 is illustrated in accordance with an embodiment of the invention. The process 300 begins at 302 by the processor initiating a DMA process which, in some embodiments, can include accessing data from a DMA register. At 304, a descriptor is retrieved for the current data packet, after which, at 305, appropriate data offsets associated with the desired data are calculated. Next, at 306, a local buffer is identified that contains the desired data. In one embodiment, the local buffer containing the desired data is identified by way of a "pointer walk" process described in further detail in Fig. 4. Once the desired local buffer has been identified, the data offset in the identified local buffer is calculated at 308, after which the desired data packet (or portion) is moved to a selected cache element at 310. In a preferred embodiment, the cache element is temporally closer to the processor than is, for example, a main memory. At 312, a determination is made whether or not the calculated offset is the last offset. If it is determined that the calculated offset is not the last offset, then control is passed back to 306 where the DMA process is instantiated. However, if it is determined that the calculated offset is the last offset, then, at 314, the processor is notified that appropriate data is available to be retrieved from the cache element.
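A sketch of process 300 under some simplifying assumptions: the desired data is described by a list of (offset, length) pairs, the helper names (retrieve_descriptor, pointer_walk, notify_processor) are invented for illustration, the buf_desc_t descriptor is the illustrative one sketched earlier, and an instance that spans a buffer boundary is not handled.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct buf_desc {                 /* same illustrative descriptor as above */
        uint8_t         *data;
        size_t           size;
        struct buf_desc *next;
    } buf_desc_t;

    typedef struct {
        size_t offset;   /* offset of a desired data instance from the packet start */
        size_t length;   /* size of that instance in bytes                           */
    } data_instance_t;

    extern buf_desc_t *retrieve_descriptor(unsigned packet_idx);           /* step 304 */
    extern buf_desc_t *pointer_walk(buf_desc_t *head, size_t offset,
                                    size_t *offset_in_buf);                /* steps 306/308, Fig. 4 */
    extern void notify_processor(void);                                    /* step 314 */

    int dma_prefetch_packet(unsigned packet_idx,
                            const data_instance_t *want, size_t n_want,
                            uint8_t *cache_element)
    {
        buf_desc_t *head = retrieve_descriptor(packet_idx);                /* step 304 */
        size_t out = 0;

        for (size_t i = 0; i < n_want; i++) {                              /* loop until last offset (312) */
            size_t in_buf;
            buf_desc_t *d = pointer_walk(head, want[i].offset, &in_buf);   /* steps 306, 308 */
            if (d == NULL)
                return -1;                    /* offset lies beyond the packet */

            size_t avail = d->size - in_buf;
            size_t len = want[i].length < avail ? want[i].length : avail;
            memcpy(cache_element + out, d->data + in_buf, len);            /* step 310 */
            out += len;                       /* continuation into d->next omitted for brevity */
        }

        notify_processor();                                                /* step 314 */
        return 0;
    }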
Turning now to Fig. 4, a flowchart of a process 400 is shown describing a particular implementation of the identification operation 306 and, more particularly, the pointer walk process mentioned above, in accordance with a particular embodiment of the invention. It should be noted that although the identification operation described is directed at the pointer walk process that is typically carried out by the DMA engine 110, any appropriate identification process can in fact be used with the invention.
In the described embodiment, the pointer walk process 400 begins at 402 by adding a current buffer size to a passed data accumulator (DPA). At 404, a determination is made whether or not the value of the DPA is greater than the desired data offset. If it is determined that the DPA value is greater than the desired data offset, then the pointer walk process is determined to be completed at 406 and the process 400 stops. Otherwise, control is passed to 408 where a determination is made whether or not the current descriptor is the last descriptor in the descriptor chain. If it is determined that the current descriptor is in fact the last descriptor in the chain, then control is passed to 410 where an error flag is thrown. Otherwise, control is passed to 412 where the next descriptor is fetched, after which control is passed back to 402.
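A minimal C rendering of the pointer walk of Fig. 4, reusing the illustrative buf_desc_t descriptor from the earlier sketches; returning NULL stands in for the error flag of step 410.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct buf_desc {                 /* same illustrative descriptor as above */
        uint8_t         *data;
        size_t           size;
        struct buf_desc *next;                /* NULL marks the last descriptor in the chain */
    } buf_desc_t;

    /* Walk the descriptor chain, accumulating buffer sizes into the passed data
     * accumulator (DPA) until it exceeds the desired offset.  Returns the
     * descriptor that holds the desired byte and writes the offset into that
     * buffer, or NULL if the offset lies beyond the end of the chain.          */
    buf_desc_t *pointer_walk(buf_desc_t *d, size_t desired_offset,
                             size_t *offset_in_buf)
    {
        size_t dpa = 0;

        while (d != NULL) {
            dpa += d->size;                   /* 402: add current buffer size to the DPA   */
            if (dpa > desired_offset) {       /* 404: DPA exceeds the offset -> done (406) */
                *offset_in_buf = desired_offset - (dpa - d->size);
                return d;
            }
            if (d->next == NULL)              /* 408: current descriptor is the last one?  */
                return NULL;                  /* 410: error                                */
            d = d->next;                      /* 412: fetch the next descriptor            */
        }
        return NULL;
    }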
Fig. 5 illustrates a typical, general-purpose computer system 500 suitable for implementing the present invention. The computer system 500 includes any number of processors 502 (also referred to as central processing units, or CPUs) that are coupled to memory devices including primary storage devices 504 (typically a read only memory, or ROM) and primary storage devices 506 (typically a random access memory, or RAM).
Computer system 500 or, more specifically, CPUs 502, may be arranged to support a virtual machine, as will be appreciated by those skilled in the art. One example of a virtual machine that is supported on computer system 500 will be described below with reference to Fig. 5. As is well known in the art, ROM acts to transfer data and instructions uni-directionally to the CPUs 502, while RAM is used typically to transfer data and instructions in a bi-directional manner. CPUs 502 may generally include any number of processors. Both primary storage devices 504, 506 may include any suitable computer-readable media. A secondary storage medium 508, which is typically a mass memory device, is also coupled bi-directionally to CPUs 502 and provides additional data storage capacity. The mass memory device 508 is a computer-readable medium that may be used to store programs including computer code, data, and the like. Typically, mass memory device 508 is a storage medium such as a hard disk or a tape which is generally slower than primary storage devices 504, 506. Mass memory storage device 508 may take the form of a magnetic or paper tape reader or some other well-known device. It will be appreciated that the information retained within the mass memory device 508 may, in appropriate cases, be incorporated in standard fashion as part of RAM 506 as virtual memory. A specific primary storage device 504 such as a CD-ROM may also pass data uni-directionally to the CPUs 502. CPUs 502 are also coupled to one or more input/output devices 510 that may include, but are not limited to, devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers. Finally, CPUs 502 optionally may be coupled to a computer or telecommunications network, e.g., an Internet network or an intranet network, using a network connection as shown generally at 512. With such a network connection, it is contemplated that the CPUs 502 might receive information from the network, or might output information to the network in the course of performing the above-described method steps. Such information, which is often represented as a sequence of instructions to be executed using CPUs 502, may be received from and outputted to the network, for example, in the form of a computer data signal embodied in a carrier wave. The above-described devices and materials will be familiar to those of skill in the computer hardware and software arts.
Although only a few embodiments of the present invention have been described, it should be understood that the present invention may be embodied in many other specific forms without departing from the spirit or the scope of the present invention. By way of example, operations involved with performing the inventive DMA process may be reordered. Operations may also be removed or added without departing from the spirit or the scope of the present invention.

Claims

What is claimed is:
1. In a computing system having a central processing unit (CPU) arranged to perform executable instructions, a direct memory access (DMA) engine coupled to the CPU that performs a first DMA process arranged to fetch a first desired data set from a main system memory to a memory segment that is temporally closer to the processor than is the main system memory substantially simultaneously with the processing of a second desired data set by the CPU.
2. A DMA engine as recited in claim 1, wherein the DMA engine is coupled to a buffer pool that contains data that is distributed across the buffer pool, wherein the distributed data includes the desired data set.
3. A DMA engine as recited in claim 2, wherein the buffer pool is included in the main system memory.
4. A DMA engine as recited in claim 3, wherein the distributed data is transferred from an I/O device to the buffer pool by a second DMA process.
5. A DMA engine as recited in claim 4, wherein the second DMA process is performed by the DMA engine.
6. A DMA engine as recited in claim 5, wherein the data takes the form of a data packet wherein the data packet includes embedded information.
7. A DMA engine as recited in claim 6, wherein the embedded information is used to identify the desired data.
8. A DMA engine as recited in claim 7, wherein the data packet is an ATM data packet and wherein the embedded information is an ATM header.
9. A DMA engine as recited in claim 8, wherein the first process
10. A DMA engine as recited in claim 6, wherein the first DMA process fetches the desired data substantially simultaneously with the CPU operating on a previously fetched data set.
11. A DMA engine as recited in claim 1, wherein the DMA engine utilizes a shared memory snooping cache coherency protocol wherein shared data from a snooped bus is updated.
12. A DMA engine as recited in claim 11, wherein the shared memory snooping cache coherency protocol is an Illinois protocol.
13. In a computing system having a central processing unit (CPU) and a main system memory arranged to store data, a method for moving a desired data set from the main system memory to a memory segment that is temporally closer to the CPU than is the main system memory, comprising: distributing a plurality of data sets to a buffer pool included in the main system memory, wherein the plurality of data sets includes the desired data set; identifying the desired data set that is stored in an associated local buffer, wherein the local buffer is included in the buffer pool; fetching the identified desired data set from the local buffer; and moving the identified desired data set from the local buffer to the memory segment.
14. A method as recited in claim 13, further comprising: operating on the moved desired data set by the CPU; and fetching a second desired data set from a second local buffer substantially simultaneously with the operating.
15. A method as recited in claim 14, wherein the distributed plurality of data sets are transferred from an I/O device as directed by a DMA engine.
16. A method as recited in claim 15, wherein the data set is a data packet having embedded information that is used by the DMA engine to identify the desired data.
17. A method as recited in claim 16, wherein the embedded information is a header.
18. A method as recited in claim 17, wherein the data packet is an ATM data packet.
EP00911873A 1999-02-22 2000-02-18 Methods and apparatus for facilitating direct memory access Withdrawn EP1163596A4 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12102299P 1999-02-22 1999-02-22
PCT/US2000/004247 WO2000051004A1 (en) 1999-02-22 2000-02-18 Methods and apparatus for facilitating direct memory access
US121022P 2008-12-09

Publications (2)

Publication Number Publication Date
EP1163596A1 true EP1163596A1 (en) 2001-12-19
EP1163596A4 EP1163596A4 (en) 2004-12-01

Family

ID=22393995

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00911873A Withdrawn EP1163596A4 (en) 1999-02-22 2000-02-18 Methods and apparatus for facilitating direct memory access

Country Status (6)

Country Link
EP (1) EP1163596A4 (en)
JP (1) JP2002538522A (en)
KR (1) KR20010102285A (en)
CN (1) CN1153153C (en)
AU (1) AU3369700A (en)
WO (1) WO2000051004A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8634415B2 (en) 2011-02-16 2014-01-21 Oracle International Corporation Method and system for routing network traffic for a blade server
US9489327B2 (en) 2013-11-05 2016-11-08 Oracle International Corporation System and method for supporting an efficient packet processing model in a network environment
EP3066568B1 (en) * 2013-11-05 2019-09-11 Oracle International Corporation System and method for supporting efficient packet processing model and optimized buffer utilization for packet processing in a network environment
KR20190123984A (en) 2018-04-25 2019-11-04 에스케이하이닉스 주식회사 Memory system and operating method thereof
CN112506437A (en) * 2020-12-10 2021-03-16 上海阵量智能科技有限公司 Chip, data moving method and electronic equipment
CN114691562A (en) * 2020-12-29 2022-07-01 中科寒武纪科技股份有限公司 Method, device and equipment for DMA operation, integrated circuit chip and board card

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4481578A (en) * 1982-05-21 1984-11-06 Pitney Bowes Inc. Direct memory access data transfer system for use with plural processors
US5175841A (en) * 1987-03-13 1992-12-29 Texas Instruments Incorporated Data processing device with multiple on-chip memory buses
US5524265A (en) * 1994-03-08 1996-06-04 Texas Instruments Incorporated Architecture of transfer processor

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4608631A (en) * 1982-09-03 1986-08-26 Sequoia Systems, Inc. Modular computer system
US5003465A (en) * 1988-06-27 1991-03-26 International Business Machines Corp. Method and apparatus for increasing system throughput via an input/output bus and enhancing address capability of a computer system during DMA read/write operations between a common memory and an input/output device
US5359723A (en) * 1991-12-16 1994-10-25 Intel Corporation Cache memory hierarchy having a large write through first level that allocates for CPU read misses only and a small write back second level that allocates for CPU write misses only
US5603050A (en) * 1995-03-03 1997-02-11 Compaq Computer Corporation Direct memory access controller having programmable timing
US5987590A (en) * 1996-04-02 1999-11-16 Texas Instruments Incorporated PC circuits, systems and methods
US5893153A (en) * 1996-08-02 1999-04-06 Sun Microsystems, Inc. Method and apparatus for preventing a race condition and maintaining cache coherency in a processor with integrated cache memory and input/output control

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4481578A (en) * 1982-05-21 1984-11-06 Pitney Bowes Inc. Direct memory access data transfer system for use with plural processors
US5175841A (en) * 1987-03-13 1992-12-29 Texas Instruments Incorporated Data processing device with multiple on-chip memory buses
US5524265A (en) * 1994-03-08 1996-06-04 Texas Instruments Incorporated Architecture of transfer processor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO0051004A1 *

Also Published As

Publication number Publication date
EP1163596A4 (en) 2004-12-01
CN1153153C (en) 2004-06-09
WO2000051004A1 (en) 2000-08-31
JP2002538522A (en) 2002-11-12
KR20010102285A (en) 2001-11-15
CN1352773A (en) 2002-06-05
AU3369700A (en) 2000-09-14

Similar Documents

Publication Publication Date Title
US20180375782A1 (en) Data buffering
US10015117B2 (en) Header replication in accelerated TCP (transport control protocol) stack processing
US8843655B2 (en) Data transfer, synchronising applications, and low latency networks
US7668165B2 (en) Hardware-based multi-threading for packet processing
JP5066707B2 (en) TCP / IP offload device with reduced sequential processing
US6813653B2 (en) Method and apparatus for implementing PCI DMA speculative prefetching in a message passing queue oriented bus system
US7447230B2 (en) System for protocol processing engine
US5541920A (en) Method and apparatus for a delayed replace mechanism for a streaming packet modification engine
US7194517B2 (en) System and method for low overhead message passing between domains in a partitioned server
JPH0934818A (en) Method and apparatus for shortened waiting time data reception at inside of data-processing system using packet- based architecture
KR20040010789A (en) A software controlled content addressable memory in a general purpose execution datapath
US8161197B2 (en) Method and system for efficient buffer management for layer 2 (L2) through layer 5 (L5) network interface controller applications
US20040047361A1 (en) Method and system for TCP/IP using generic buffers for non-posting TCP applications
US7441179B2 (en) Determining a checksum from packet data
US7552232B2 (en) Speculative method and system for rapid data communications
EP1338965A2 (en) Data transfer
WO2000051004A1 (en) Methods and apparatus for facilitating direct memory access
JP2009301101A (en) Inter-processor communication system, processor, inter-processor communication method and communication method
US7284075B2 (en) Inbound packet placement in host memory
US7089367B1 (en) Reducing memory access latencies from a bus using pre-fetching and caching
WO1992015054A1 (en) Data transfer between a data storage subsystem and host system
Won et al. Reducing Data Transfer Time in User-Level Network Protocols.

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010803

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

RBV Designated contracting states (corrected)

Designated state(s): DE FR GB IE IT

A4 Supplementary search report drawn up and despatched

Effective date: 20041015

17Q First examination report despatched

Effective date: 20050214

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20050628