CN103345429A - High-concurrency memory access acceleration method and accelerator based on on-chip RAM, and CPU

High-concurrency memory access acceleration method and accelerator based on on-chip RAM, and CPU

Info

Publication number: CN103345429A
Application number: CN201310242398.5A
Authority: CN (China)
Prior art keywords: memory access, CPU, read request, data, request
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN103345429B (granted publication)
Inventors: 刘垚, 陈明扬, 陈明宇, 阮元
Current and original assignee: Institute of Computing Technology of CAS
Filing / priority date: 2013-06-19 (application filed by the Institute of Computing Technology of CAS; priority to CN201310242398.5A)
Publication of CN103345429A: 2013-10-09; publication of granted CN103345429B: 2018-03-30

Abstract

The invention discloses a high-concurrency memory access accelerator and acceleration method based on on-chip RAM, and a processor adopting the method. The memory access accelerator is independent of the on-chip Cache and MSHR and is connected to the on-chip RAM and the memory controller; outstanding memory access requests are sent to the memory controller and the memory system through the accelerator. This solves the problem that general-purpose processors support only a limited number of concurrent memory accesses in Internet and cloud-computing applications, and accelerates highly concurrent memory access.

Description

High-concurrency memory access acceleration method and accelerator based on on-chip RAM, and CPU
Technical field
The invention belongs to the field of computer architecture and relates to the internal structural design of a CPU, in particular to a high-concurrency memory access acceleration method, accelerator and CPU based on on-chip RAM.
Background art
With the development of the Internet and cloud computing, highly concurrent data-processing programs are becoming more and more common. Such programs usually have to handle a large number of concurrent workloads submitted in the form of requests or jobs, and the core business of these workloads usually involves the processing and analysis of massive data. Such programs typically use multiple threads or processes, with little or no memory access dependence between the threads or processes.
Applications of this kind can therefore issue a large number of concurrent memory access requests to the memory system, which challenges the concurrency of the memory access system. If the concurrency of the memory access system is not high enough, it becomes the bottleneck that limits the performance of this class of applications.
Figure 1 shows a typical CPU storage structure. When the CPU needs to read data, it first looks in the Cache; if the required data is already in the Cache (a Cache hit), the data is returned to the CPU directly. If the CPU does not find the required data in the Cache (a Cache miss), the CPU fetches the required data from main memory into the Cache.
The Cache contains a group of registers, the MSHRs (Miss Status Handling Registers), dedicated to recording the information of read requests that missed the Cache and were sent to memory (i.e., Cache-miss requests). The information recorded in an MSHR includes the Cache line address, the destination register of the read request, and so on. After main memory has completed the read request and returned the data of the Cache line, the recorded information is used to fill the corresponding Cache line and to return the data to the destination register. Every Cache-miss read request occupies one MSHR entry; once all MSHRs are occupied, new Cache-miss read requests are stalled and cannot be sent to main memory. The number of outstanding read requests the MSHRs can support (read requests that have been issued but whose data has not yet returned, and which therefore still have to be tracked by an MSHR) is thus one of the key factors determining the concurrency of the memory access system.
At present, the number of outstanding read requests supported by the MSHRs of typical processors is generally small. In the Cortex-A9 processor, for example, the MSHRs of the L2 Cache support only 10 outstanding read requests. When an application issues a large number of concurrent memory access requests with low locality (so that a large number of Cache misses occur), the MSHRs are occupied rapidly and become the bottleneck of the whole system.
Figure 2 shows the storage architecture of a certain processor that proposes a brand-new memory access mode which can, in theory, support the issue of a large number of concurrent memory access requests.
This processor consists of 1 PPE (Power Processor Element), 8 SPEs (Synergistic Processing Elements), 1 MIC (Memory Interface Controller) and 1 EIB (Element Interconnect Bus).
The memory access mechanism of the SPEs is considered below.
Each SPE is a microprocessor whose programs run in a local 256 KB storage unit (RAM). When an SPE needs to obtain data from main memory, it must first initialize the DMAC (Direct Memory Access Controller) by writing parameters such as the request source address, destination address and request length into the DMAC control queue. The DMAC then moves the data from main memory to the local store according to the parameters in the queue.
The number of concurrent requests this mechanism supports is, in theory, limited only by the number of commands that fit in the DMA command queue, or by the capacity of the on-chip RAM. The mechanism nevertheless has two defects:
1. Each DMA operation needs several parameters to be written before it can start, such as the source address, destination address, data size, TAG mark and direction, and this takes several instruction cycles. If an SPE needs to read a large number of fine-grained data items concurrently, the efficiency of the DMA transfers is low.
2. DMA state management is inefficient. First, the program must prepare enough space for the data returned by read requests, and this scheme lacks a free-space management mechanism, so the utilization of the local storage space drops greatly after running for a long time. Second, the processor obtains the DMA completion status by software polling of status bits, which is inefficient when the number of memory access requests grows.
Summary of the invention
To solve the above technical problems, the object of the present invention is to propose a high-concurrency memory access accelerator based on on-chip RAM and a method of using the on-chip RAM to manage a large number of concurrent memory access requests, so as to solve the problem that general-purpose processors support only a limited number of concurrent memory accesses in Internet and cloud-computing applications and to accelerate highly concurrent memory access.
Specifically, the invention discloses a high-concurrency memory access accelerator based on on-chip RAM. The accelerator is independent of the on-chip Cache and MSHR and is connected to the on-chip RAM and the memory controller; outstanding memory access requests are sent to the memory controller and the memory system through the accelerator.
In the above accelerator, the number of outstanding memory access requests it supports depends only on the capacity of the on-chip RAM and is not limited by the number of MSHR entries.
In the above accelerator, the addressable space contains a read request table used to store the information of read requests, and each entry of the read request table corresponds to a unique id number.
In the above accelerator, each entry of the read request table has three fields used to store the type, address and data of the read request; the type field and the address field are filled in by the CPU, and the data field is filled in by the accelerator.
In the above accelerator, when the data field of a read request table entry would be too large, a data pointer can be stored instead; the data pointer points to the storage address of the return data, which is allocated by the CPU.
In the above accelerator, each entry of the read request table has three states: idle, new read request, and finished read request. The initial state is idle; when the CPU has a memory access request it fills in the entry and the state becomes new read request; the accelerator sends the request to the memory controller and, after the data returns, fills it into the data field, and the state becomes finished read request; the CPU takes the data from the data field and processes it, after which the state returns to idle.
In the above accelerator, each circular queue contains a head pointer and a tail pointer. The head and tail pointers of the free queue and the head pointer of the finished queue are software variables maintained by the CPU. The head and tail pointers of the new read request queue and the tail pointer of the finished queue are hardware registers; the head pointer of the new read request queue is maintained by the accelerator; the tail pointer of the new read request queue is maintained jointly by the CPU and the accelerator, the CPU only writing it and the accelerator only reading it; and the tail pointer of the finished queue is maintained jointly by the CPU and the accelerator, the CPU only reading it and the accelerator only writing it.
The invention also discloses a high-concurrency memory access method based on on-chip RAM, which comprises providing a memory access accelerator that is independent of the on-chip Cache and MSHR and is connected to the on-chip RAM and the memory controller, with outstanding memory access requests sent to the memory controller and the memory system through the accelerator.
In the above method, the CPU writes memory access requests into the addressable space of the on-chip RAM, and the accelerator reads the requests and executes them. For a read request, after the requested data returns from the memory system, the accelerator puts the data into this space and notifies the CPU, and the CPU then processes the data.
In the above method, the addressable space contains a read request table that stores the information of read requests, and each entry of the read request table corresponds to a unique id number.
In the above method, each entry of the read request table has three fields used to store the type, address and data of the read request; the type field and the address field are filled in by the CPU, and the data field is filled in by the accelerator.
In the above method, when the data field of a read request table entry would be too large, a data pointer can be stored instead; the data pointer points to the storage address of the return data, which is allocated by the CPU.
In the above method, each entry of the read request table has three states: idle, new read request, and finished read request. The initial state is idle; when the CPU has a memory access request it fills in the entry and the state becomes new read request; the accelerator sends the request to the memory controller and, after the data returns, fills it into the data field, and the state becomes finished read request; the CPU takes the data from the data field and processes it, after which the state returns to idle.
The invention also discloses a high-concurrency memory access method based on on-chip RAM, comprising the following steps by which the CPU initiates a read request (an illustrative C sketch follows the steps):
Step S701: the CPU queries the state of the free queue in the addressable space of the on-chip RAM and judges whether the free queue is empty (the CPU judges the free queue to be empty when its head pointer coincides with its tail pointer); if it is empty, return; if it is not empty, go to step S702;
Step S702: the CPU takes an id from the head of the free queue;
Step S703: the CPU fills in the type field and the address field of the read request table entry corresponding to this id;
Step S704: the CPU writes this id to the tail of the new read request queue;
Step S705: the CPU passes the updated tail pointer of the new read request queue to the memory access accelerator;
Step S706: the CPU judges whether to continue initiating read requests; if so, go to step S701; if not, return.
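Purely as an illustration (not part of the claimed design), steps S701 to S706 might be written in C as follows; the table size, all identifiers and the hook that passes the updated tail pointer to the accelerator in step S705 are assumptions:

#include <stdbool.h>
#include <stdint.h>

#define N_ENTRIES 1024u                 /* assumed number of Read table entries        */
#define Q_SLOTS   (N_ENTRIES + 1u)      /* one spare slot so that a queue holding all  */
                                        /* ids is still distinguishable from an empty  */
                                        /* queue (empty when head == tail)             */

/* One Read table entry: type and addr are filled by the CPU, data by the accelerator. */
typedef struct {
    uint32_t type;
    uint64_t addr;
    uint64_t data;
} read_entry_t;

/* A circular queue of entry ids located in the addressable on-chip RAM. */
typedef struct {
    uint32_t ids[Q_SLOTS];
    volatile uint32_t head;
    volatile uint32_t tail;
} id_queue_t;

/* Objects assumed to live in the CPU-addressable on-chip RAM. */
extern read_entry_t read_table[N_ENTRIES];
extern id_queue_t   free_q;             /* free (idle entry) queue   */
extern id_queue_t   new_q;              /* new read request queue    */

/* Hypothetical hook that writes the updated tail pointer of the new read
   request queue into the accelerator's hardware register (step S705). */
extern void accel_set_new_tail(uint32_t tail);

/* Returns false when the free queue is empty (step S701), true otherwise. */
bool issue_read_request(uint32_t type, uint64_t addr)
{
    if (free_q.head == free_q.tail)            /* S701: empty when head == tail        */
        return false;

    uint32_t id = free_q.ids[free_q.head];     /* S702: take an id from the queue head */
    free_q.head = (free_q.head + 1u) % Q_SLOTS;

    read_table[id].type = type;                /* S703: fill in the type and address   */
    read_table[id].addr = addr;

    new_q.ids[new_q.tail] = id;                /* S704: write the id to the tail of    */
    new_q.tail = (new_q.tail + 1u) % Q_SLOTS;  /*       the new read request queue     */

    accel_set_new_tail(new_q.tail);            /* S705: pass the new tail pointer on   */
    return true;                               /* S706: the caller decides whether to  */
}                                              /*       issue another request          */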
The invention also discloses a high-concurrency memory access method based on on-chip RAM, comprising the following steps by which the CPU processes the data returned for read requests (an illustrative C sketch follows the steps):
Step S801: the CPU queries the state of the finished queue and judges whether the finished queue is empty (the CPU judges the finished queue to be empty when its head pointer coincides with its tail pointer); if it is empty, return; if it is not empty, go to step S802;
Step S802: the CPU takes an id from the head of the finished queue;
Step S803: the CPU operates on the data field of the read request table entry corresponding to this id;
Step S804: the CPU writes this id to the tail of the free queue;
Step S805: the CPU judges whether to continue; if so, go to step S801; if not, return.
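Continuing the same illustrative sketch (the types, Q_SLOTS and the read_table / free_q objects above are reused; finished_q and consume() are likewise assumptions), steps S801 to S805 could look like this:

#include <stddef.h>

extern id_queue_t finished_q;                  /* finished read request queue          */
extern void consume(uint64_t data);            /* whatever the application does with   */
                                               /* the returned data                    */

/* Drains the finished queue and returns how many read requests were processed. */
size_t process_finished_reads(void)
{
    size_t n = 0;
    while (finished_q.head != finished_q.tail) {            /* S801: queue not empty   */
        uint32_t id = finished_q.ids[finished_q.head];      /* S802: take the id       */
        finished_q.head = (finished_q.head + 1u) % Q_SLOTS;

        consume(read_table[id].data);                       /* S803: use the data      */

        free_q.ids[free_q.tail] = id;                       /* S804: recycle the entry */
        free_q.tail = (free_q.tail + 1u) % Q_SLOTS;         /*       id into free queue*/
        n++;
    }                                                       /* S805: repeat or return  */
    return n;
}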
The invention also discloses a high-concurrency memory access method based on on-chip RAM, comprising the following steps by which the memory access accelerator processes read requests (an illustrative C sketch follows the steps):
Step S901: the accelerator queries in real time whether the new read request queue is empty; if it is not empty, go to step S902; if it is empty, keep querying in this step;
Step S902: the accelerator takes an id from the head of the new read request queue;
Step S903: the accelerator takes out the type field and the address field of the read request table entry corresponding to this id;
Step S904: the accelerator fetches the data from memory and writes it into the data field of the read request table entry corresponding to this id;
Step S905: the accelerator writes this id to the tail of the finished queue.
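Steps S901 to S905 run in the accelerator hardware; the following C loop is only a behavioural sketch under the same assumed data structures, with memory_read() standing in for the path through the memory controller (a real accelerator can keep many requests in flight and let them complete out of order, which this serial loop does not show):

/* Placeholder for the request sent to the memory controller and the data it returns. */
extern uint64_t memory_read(uint64_t addr, uint32_t type);

void accelerator_read_loop(void)
{
    for (;;) {
        if (new_q.head == new_q.tail)                        /* S901: poll until the     */
            continue;                                        /*       queue is non-empty */

        uint32_t id = new_q.ids[new_q.head];                 /* S902: take an id         */
        new_q.head = (new_q.head + 1u) % Q_SLOTS;

        uint32_t type = read_table[id].type;                 /* S903: read out the type  */
        uint64_t addr = read_table[id].addr;                 /*       and address fields */

        read_table[id].data = memory_read(addr, type);       /* S904: fetch the data into*/
                                                             /*       the data field     */
        finished_q.ids[finished_q.tail] = id;                /* S905: append the id to   */
        finished_q.tail = (finished_q.tail + 1u) % Q_SLOTS;  /*       the finished queue */
    }
}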
The invention also discloses a high-concurrency memory access method based on on-chip RAM, comprising:
Step 1: when the CPU initiates a write request, it first checks whether the write circular queue is full; if it is not full, it fills in the type, address and write data of the write request;
Step 2: when the memory access accelerator detects that the write circular queue is not empty, it automatically reads the type, address and data of the write request from the head pointer of the write circular queue;
Step 3: the memory access accelerator sends the write request to the memory controller.
The invention also discloses a processor adopting any of the above high-concurrency memory access methods or the above high-concurrency memory access accelerator.
Technical effects of the present invention:
1. The on-chip RAM is used to store one or more read request tables (Read tables). Each entry of a read request table contains all the necessary information of a read request, such as the request type field, the destination address field and the data field. Because the present invention uses on-chip RAM to record all the information of the concurrent requests, the number of concurrent requests is limited only by the size of the on-chip RAM.
2. The entries of the read request table are divided into three classes by request state: idle, new request and finished. The entry ids of each class are stored in their own circular queue, which makes it convenient to manage the state of the read requests. The present invention uses circular queues to manage a large amount of read and write request information and avoids polling the status bits of many individual requests; the number of queries is greatly reduced, so a large number of concurrent, unrelated, fine-grained memory access requests are accelerated noticeably.
3. Whether a memory access operation should be initiated is judged from the non-empty state of the "new request" circular queue, and the on-chip RAM address at which the read request is stored is obtained by reading the contents of that queue. In this way the memory access accelerator initiates memory access operations out of order by itself, without software control, thereby supporting out-of-order execution and out-of-order return of memory access requests and making targeted scheduling of a large number of memory access requests convenient.
4. CPU software judges whether any read request has finished from the non-empty state of the "finished" circular queue, and obtains the on-chip RAM address of the returned data by reading the contents of that queue, avoiding CPU polling of the status bits of many requests and improving software query efficiency.
Description of drawings
Figure 1 shows a typical existing CPU storage structure;
Figure 2 shows the storage architecture of a certain processor;
Figure 3 shows the position of the memory access accelerator of the present invention in the processor;
Figure 4 shows the Read table in the addressable space in the present invention;
Figure 5 shows the state transitions of each entry of the Read table in the present invention;
Figure 6 shows the use of three circular queues to manage the states of read requests in the present invention;
Figure 7 shows the steps by which the CPU initiates a read request in the present invention;
Figure 8 shows the steps by which the CPU processes the data returned for read requests in the present invention;
Figure 9 shows the steps by which the memory access accelerator processes read requests in the present invention;
Figure 10 shows the use of one circular queue to manage write requests in the present invention.
Detailed description of the embodiments
Aiming at the problem that the number of concurrent memory access requests a general-purpose processor supports is limited, the present invention proposes the concept of a "memory access accelerator". The memory access accelerator is another path between the CPU and memory.
Figure 3 shows the position of the memory access accelerator of the present invention in the processor. It bypasses the Cache and the MSHR, and the number of outstanding read requests it supports is at least an order of magnitude larger than that of the MSHR. Through the memory access accelerator an application can therefore send more memory access requests to the memory system, improving the concurrency of memory access. The processor comprises a CPU 1, a Cache 2, an MSHR 3, a memory access accelerator 4, an on-chip RAM, a memory controller 6 and a memory 7.
The memory access accelerator requires an on-chip RAM space addressable by the CPU. The CPU writes memory access requests into this RAM space, and the accelerator reads the requests and executes them. For a read request, after the requested data returns from memory, the accelerator puts the data into the space and notifies the CPU, and the CPU then processes the data.
Figure 4 shows the read request table (Read table) in the addressable space in the present invention. The addressable space contains a table that stores read requests, called the Read table.
Each entry of the Read table corresponds to a unique id number, and the information of one read request can be stored in one Read table entry. Each entry has three fields, type, addr and data, used respectively to store the type, address and data of the read request. The type field encodes whatever additional information is needed, such as the data length of the request, the priority of the request, and whether it is a scatter/gather read request. Using the type field, together with auxiliary hardware, some advanced memory access functions not supported by current architectures can be implemented; one possible encoding is sketched below. The type and addr fields are filled in by the CPU, and the data field is filled in by the memory access accelerator.
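As an example of what the type field might encode, one possible bit layout is given here; the invention does not fix a concrete layout, so the field widths and names below are purely hypothetical:

#include <stdint.h>

/* Hypothetical layout of a 32-bit type field. */
#define TYPE_LEN_SHIFT       0             /* bits 0-15: request length in bytes       */
#define TYPE_LEN_MASK        0xFFFFu
#define TYPE_PRIO_SHIFT      16            /* bits 16-19: request priority             */
#define TYPE_PRIO_MASK       0xFu
#define TYPE_SCATTER_GATHER  (1u << 20)    /* bit 20: scatter/gather request           */

static inline uint32_t make_type(uint32_t len, uint32_t prio, int scatter_gather)
{
    return ((len  & TYPE_LEN_MASK)  << TYPE_LEN_SHIFT)
         | ((prio & TYPE_PRIO_MASK) << TYPE_PRIO_SHIFT)
         | (scatter_gather ? TYPE_SCATTER_GATHER : 0u);
}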
Each entry of the Read table can be in one of three states: idle, new read request (not yet sent to the memory controller), and finished read request (the memory controller has returned the data of the request and the data has been filled into the data field).
Figure 5 shows the state transitions of each Read table entry in the present invention. For an entry of the Read table, the initial state is idle (free); when the CPU has a memory access request, it fills in the entry and the state becomes new read request (new read); the memory access accelerator sends the request to the memory controller and, after the data returns, fills it into the data field of the entry, and the state becomes finished read request (finished read); the CPU takes the data from the data field and processes it, after which the state of the entry returns to idle (free).
In the above process, three key problems must be solved:
1. How does the CPU find an idle entry in the Read table when issuing a request?
2. How does the memory access accelerator find the positions of the new request entries?
3. How does the CPU find the entries whose read requests have returned?
To this end, the present invention proposes a request management method based on circular queues.
Figure 6 shows the use of three circular queues to manage the states of read requests in the present invention. The three circular queues are the free entry queue, the new read queue and the finished read queue, used respectively to store the ids of the idle entries, the new read request entries and the finished read request entries of the Read table. All three circular queues are located in the addressable space. Each queue has two pointers, head and tail, which indicate the positions of the head and the tail of the queue. In the figure, once operation A returns, the CPU can decide whether to continue initiating read requests or to begin processing data; before it returns no other operation may be inserted, and a new operation can be initiated only after it returns.
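For illustration only, the division of the queue pointers between CPU software variables and accelerator hardware registers (described in the summary above) might be modelled as follows; the memory-mapped base address and the register layout are assumptions:

#include <stdint.h>

/* Pointers implemented as accelerator hardware registers, assumed to be
   exposed through a memory-mapped register block. */
typedef struct {
    volatile uint32_t new_q_head;        /* maintained by the accelerator              */
    volatile uint32_t new_q_tail;        /* written only by the CPU (step S705),       */
                                         /* read only by the accelerator               */
    volatile uint32_t finished_q_tail;   /* written only by the accelerator,           */
                                         /* read only by the CPU                       */
} accel_regs_t;

#define ACCEL_REG_BASE 0x40000000u                          /* hypothetical MMIO base  */
#define ACCEL_REGS     ((volatile accel_regs_t *)ACCEL_REG_BASE)

/* Pointers maintained purely in software by the CPU. */
struct sw_queue_pointers {
    uint32_t free_q_head, free_q_tail;   /* free entry queue                           */
    uint32_t finished_q_head;            /* finished read queue                        */
};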
The process by which the CPU uses the memory access accelerator to initiate read request operations is illustrated below:
1. When the CPU needs to read data from memory, it first queries whether the free entry queue is empty. If it is empty, the Read table of Figure 4 is fully occupied and no idle Read table entry is available for the moment; if it is not empty, idle entries remain in the Read table. As shown in Figure 6, the condition for judging that the free entry queue is empty is that pointer head1 coincides with pointer tail1.
2. The CPU takes an id number from the head of the free entry queue, finds the address of the Read table entry corresponding to this id, and fills the type and addr fields of the new request into that Read table entry. At the same time, the CPU stores this id number at the tail of the new read queue.
The CPU's operation on the circular queues is shown by dotted line 1 in Figure 6; after this operation completes, id3 has been moved from the head1 position to the tail2 position.
3. The CPU moves the tail2 pointer back by one position and sends the new tail2 pointer to the memory access accelerator.
4. The memory access accelerator judges whether the new read queue is empty by comparing the head2 and tail2 pointers. When the accelerator detects that the new read queue is not empty, it automatically takes a new id from the head of the new read queue, uses the id to find the corresponding outstanding read request entry in the Read table and processes it, and returns the requested data into the data field of the Read table. After processing, it writes the id to the tail of the finished read queue.
The accelerator's operation on the circular queues is shown by dotted line 2 in Figure 6; after this operation completes, id9 has been moved from the head2 position to the tail3 position.
5. When the CPU needs to process data, it first checks whether the finished read queue is empty; the check again compares the head and tail pointers. If the finished read queue is not empty, the CPU takes out an id number, finds the Read table entry corresponding to this id, and processes its data field. After processing, it writes the id to the tail of the free entry queue.
The CPU's operation on the circular queues is shown by dotted line 3 in Figure 6; after this operation completes, id2 has been moved from the head3 position to the tail1 position.
6. The above process can be repeated; a possible combined loop on the CPU side is sketched below.
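A possible loop on the CPU side combining the steps above, reusing the illustrative issue_read_request() and process_finished_reads() helpers sketched earlier (addrs[] stands for the application's list of addresses to read):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

extern bool   issue_read_request(uint32_t type, uint64_t addr);
extern size_t process_finished_reads(void);

void read_many(const uint64_t *addrs, size_t n)
{
    size_t issued = 0, outstanding = 0;

    while (issued < n || outstanding > 0) {
        /* Steps 1-3: keep issuing while work remains and the free entry
           queue still holds an idle id. */
        while (issued < n && issue_read_request(0u /* type */, addrs[issued])) {
            issued++;
            outstanding++;
        }
        /* Step 4 happens in the accelerator; step 5: consume whatever has
           already come back, recycling the ids into the free entry queue. */
        outstanding -= process_finished_reads();
    }
}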
The processing of write requests is comparatively simple. Since a write request returns no data, the memory access accelerator only needs to pass the write request handed over by the CPU to the memory controller, so its management structure can be simplified greatly.
Figure 10 shows the use of one circular queue to manage write requests in the present invention: a single write circular queue (write queue) suffices to manage the write requests. Here the type, addr and data fields of the write request are placed directly in the queue. Like the three queues above, the write queue must also be located in the addressable space.
The write circular queue (write queue) is used as follows:
When the CPU needs to send a new write request, it first checks whether the write queue is full (before sending a write request it must make sure that there is still space in the write queue to hold the data; a full queue means there is no space left in the on-chip RAM and no further write request can be sent for the moment). If the queue is not full, the CPU fills in the type, address and write data of the write request at the position indicated by tail4.
When the memory access accelerator detects that the write queue is not empty (meaning that it holds data, i.e., there are still unfinished write requests, which the accelerator will take out and execute automatically), it automatically reads the type, addr and data of the write request from the head4 pointer and sends the write request to the memory controller.
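A corresponding sketch of the write path, again with illustrative names and sizes; unlike a read, the whole request (type, address and data) is placed directly in the queue and nothing has to come back:

#include <stdbool.h>
#include <stdint.h>

#define WRITE_SLOTS 256u                     /* assumed size of the write queue        */

typedef struct {
    uint32_t type;
    uint64_t addr;
    uint64_t data;
} write_req_t;

/* The write circular queue lives in the addressable on-chip RAM; it is full
   when advancing the tail would make it collide with the head. */
typedef struct {
    write_req_t req[WRITE_SLOTS];
    volatile uint32_t head;                  /* advanced by the accelerator            */
    volatile uint32_t tail;                  /* advanced by the CPU                    */
} write_queue_t;

extern write_queue_t write_q;
extern void memory_write(uint64_t addr, uint64_t data, uint32_t type);  /* placeholder */

/* CPU side (step 1): refuse when the queue is full, otherwise enqueue the request. */
bool issue_write_request(uint32_t type, uint64_t addr, uint64_t data)
{
    uint32_t next = (write_q.tail + 1u) % WRITE_SLOTS;
    if (next == write_q.head)                /* full: no space left in the on-chip RAM */
        return false;
    write_q.req[write_q.tail] = (write_req_t){ .type = type, .addr = addr, .data = data };
    write_q.tail = next;
    return true;
}

/* Accelerator side (steps 2-3, behavioural sketch): drain the queue to the
   memory controller. */
void accelerator_drain_writes(void)
{
    while (write_q.head != write_q.tail) {
        write_req_t r = write_q.req[write_q.head];
        memory_write(r.addr, r.data, r.type);
        write_q.head = (write_q.head + 1u) % WRITE_SLOTS;
    }
}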
To sum up, the invention discloses a high-concurrency memory access accelerator based on on-chip RAM. The accelerator is independent of the on-chip Cache and MSHR and is connected to the on-chip RAM and the memory controller; outstanding memory access requests are sent to the memory controller and the memory system through the accelerator.
The number of outstanding memory access requests the accelerator supports depends only on the capacity of the on-chip RAM and is not limited by the number of MSHR entries. The on-chip RAM is a RAM with an addressable space: the CPU writes memory access requests into this addressable space, and the accelerator reads the requests and executes them. For a read request, after the requested data returns from the memory system, the accelerator puts the data into the addressable space and notifies the CPU, and the CPU then processes the data.
The on-chip RAM may be the CPU's own on-chip RAM, or an on-chip RAM independent of the CPU.
The addressable space contains a read request table used to store the information of read requests; each entry of the read request table corresponds to a unique id number.
Each entry of the read request table has three fields used to store the type, address and data of the read request; the type field and the address field are filled in by the CPU, and the data field is filled in by the accelerator.
Each entry of the read request table has three states: idle, new read request and finished read request. The initial state is idle; when the CPU has a memory access request it fills in the entry and the state becomes new read request; the accelerator sends the request to the memory controller and, after the data returns, fills it into the data field, and the state becomes finished read request; the CPU takes the data from the data field and processes it, after which the state returns to idle.
These three states are managed by three circular queues, each of which contains pointers indicating the positions of the head and the tail of the queue.
The invention also discloses a high-concurrency memory access method based on on-chip RAM, which comprises providing a memory access accelerator that is independent of the on-chip Cache and MSHR and is connected to the on-chip RAM and the memory controller, with outstanding memory access requests sent to the memory controller and the memory system through the accelerator.
In this method, the CPU writes memory access requests into the addressable space of the on-chip RAM, and the accelerator reads the requests and executes them. For a read request, after the requested data returns from the memory system, the accelerator puts the data into this space and notifies the CPU, and the CPU then processes the data.
The invention also discloses a high-concurrency memory access method based on on-chip RAM, characterized by comprising the following steps by which the CPU initiates a read request:
Step S701: the CPU queries the state of the free queue in the addressable space of the on-chip RAM and judges whether the free queue is empty; if it is empty, return; if it is not empty, go to step S702. The CPU judges the free queue to be empty when its head pointer coincides with its tail pointer.
Step S702: the CPU takes an id from the head of the free queue;
Step S703: the CPU fills in the type field and the address field of the read request table entry corresponding to this id;
Step S704: the CPU writes this id to the tail of the new read request queue;
Step S705: the CPU passes the updated tail pointer of the new read request queue to the memory access accelerator;
Step S706: the CPU judges whether to continue initiating read requests; if so, go to step S701; if not, return.
The invention also discloses a high-concurrency memory access method based on on-chip RAM, characterized by comprising the following steps by which the CPU processes the data returned for read requests:
Step S801: the CPU queries the state of the finished queue and judges whether the finished queue is empty; if it is empty, return; if it is not empty, go to step S802. The finished queue is judged to be empty when its head pointer coincides with its tail pointer.
Step S802: the CPU takes an id from the head of the finished queue;
Step S803: the CPU operates on the data field of the read request table entry corresponding to this id;
Step S804: the CPU writes this id to the tail of the free queue;
Step S805: the CPU judges whether to continue; if so, go to step S801; if not, return.
The invention also discloses a high-concurrency memory access method based on on-chip RAM, characterized by comprising the following steps by which the memory access accelerator processes read requests:
Step S901: the accelerator queries in real time whether the new read request queue is empty; if it is not empty, go to step S902; if it is empty, keep querying in this step;
Step S902: the accelerator takes an id from the head of the new read request queue;
Step S903: the accelerator takes out the type field and the address field of the read request table entry corresponding to this id;
Step S904: the accelerator fetches the data from memory and writes it into the data field of the read request table entry corresponding to this id;
Step S905: the accelerator writes this id to the tail of the finished queue.
In the above method, the condition for judging that the free queue is empty is that the queue head pointer coincides with the queue tail pointer.
The memory access accelerator can issue a large number of concurrent read requests out of order and return them out of order.
The invention also discloses a high-concurrency memory access method based on on-chip RAM, comprising:
Step 1: when the CPU initiates a write request, it first checks whether the write circular queue is full; if it is not full, it fills in the type, address and write data of the write request;
Step 2: when the memory access accelerator detects that the write circular queue is not empty, it automatically reads the type, address and data of the write request from the head pointer of the write circular queue;
Step 3: the memory access accelerator sends the write request to the memory controller.
The invention also discloses a processor adopting the above memory access method and memory access device.
The present invention has the following features:
1. Flexible memory access granularity: the memory access granularity information is encoded in the type field, so the granularity is not limited by the instruction set or by the Cache line. Every datum accessed is one the software actually needs, which improves the effective utilization of the main memory bandwidth.
2. Some advanced memory access functions can be implemented: by specifying the memory access type in the type field, which the accelerator then parses and executes, advanced memory access operations such as scatter/gather and linked-list reads and writes can be implemented.
3. The type field can also carry some upper-layer information, such as the thread number and the priority, enabling the memory access accelerator to perform some advanced QoS scheduling.
4. The addressable space should use SRAM so that the accelerator can play its role better. In this design, the CPU and the memory access accelerator need several reads and writes of the Read table, the queues and the queue pointers to complete one request, so the read/write speed of the addressable space must be fast enough for the acceleration to take effect. SRAM is much faster to access than DRAM and is suitable here.

Claims (18)

1. A high-concurrency memory access accelerator based on on-chip RAM, characterized in that the memory access accelerator is independent of the on-chip Cache and MSHR and is connected to the on-chip RAM and the memory controller, and outstanding memory access requests are sent to the memory controller and the memory system through the memory access accelerator.
2. The high-concurrency memory access accelerator based on on-chip RAM according to claim 1, characterized in that the number of outstanding memory access requests the memory access accelerator supports depends only on the capacity of the on-chip RAM and is not limited by the number of MSHR entries.
3. The high-concurrency memory access accelerator based on on-chip RAM according to claim 1, characterized in that the addressable space contains a read request table used to store the information of read requests, each entry of the read request table corresponding to a unique id number.
4. The high-concurrency memory access accelerator based on on-chip RAM according to claim 3, characterized in that each entry of the read request table has three fields used to store the type, address and data of the read request, wherein the type field and the address field are filled in by the CPU and the data field is filled in by the memory access accelerator.
5. The high-concurrency memory access accelerator based on on-chip RAM according to claim 4, characterized in that when the data field of a read request table entry would be too large, a data pointer can be stored instead, the data pointer pointing to the storage address of the return data, and the storage address of the return data being allocated by the CPU.
6. The high-concurrency memory access accelerator based on on-chip RAM according to claim 3, characterized in that each entry of the read request table has three states: idle, new read request and finished read request; the initial state is idle; when the CPU has a memory access request it fills in the entry and the state becomes new read request; the memory access accelerator sends the request to the memory controller and, after the data returns, fills it into the data field, and the state becomes finished read request; the CPU takes the data from the data field and processes it, after which the state returns to idle.
7. The high-concurrency memory access accelerator based on on-chip RAM according to claim 6, characterized in that each circular queue contains a head pointer and a tail pointer; the head and tail pointers of the free queue and the head pointer of the finished queue are software variables maintained by the CPU; the head and tail pointers of the new read request queue and the tail pointer of the finished queue are hardware registers, the head pointer of the new read request queue being maintained by the memory access accelerator; the tail pointer of the new read request queue is maintained jointly by the CPU and the memory access accelerator, the CPU only writing it and the memory access accelerator only reading it; and the tail pointer of the finished queue is maintained jointly by the CPU and the memory access accelerator, the CPU only reading it and the memory access accelerator only writing it.
8. A high-concurrency memory access method based on on-chip RAM, characterized by comprising providing a memory access accelerator that is independent of the on-chip Cache and MSHR, the memory access accelerator being connected to the on-chip RAM and the memory controller, and outstanding memory access requests being sent to the memory controller and the memory system through the memory access accelerator.
9. The high-concurrency memory access method based on on-chip RAM according to claim 8, characterized in that the CPU writes memory access requests into the addressable space of the on-chip RAM, and the memory access accelerator reads the requests and executes them; for a read request, after the requested data returns from the memory system, the memory access accelerator puts the data into this space and notifies the CPU, and the CPU then processes the data.
10. The high-concurrency memory access method based on on-chip RAM according to claim 9, characterized in that the addressable space contains a read request table that stores the information of read requests, each entry of the read request table corresponding to a unique id number.
11. The high-concurrency memory access method based on on-chip RAM according to claim 10, characterized in that each entry of the read request table has three fields used to store the type, address and data of the read request, wherein the type field and the address field are filled in by the CPU and the data field is filled in by the memory access accelerator.
12. The high-concurrency memory access method based on on-chip RAM according to claim 11, characterized in that when the data field of a read request table entry would be too large, a data pointer can be stored instead, the data pointer pointing to the storage address of the return data, and the storage address of the return data being allocated by the CPU.
13. The high-concurrency memory access method based on on-chip RAM according to claim 10, characterized in that each entry of the read request table has three states: idle, new read request and finished read request; the initial state is idle; when the CPU has a memory access request it fills in the entry and the state becomes new read request; the memory access accelerator sends the request to the memory controller and, after the data returns, fills it into the data field, and the state becomes finished read request; the CPU takes the data from the data field and processes it, after which the state returns to idle.
14. A high-concurrency memory access method based on on-chip RAM, characterized by comprising the following steps by which the CPU initiates a read request:
Step S701: the CPU queries the state of the free queue in the addressable space of the on-chip RAM and judges whether the free queue is empty (the CPU judges the free queue to be empty when its head pointer coincides with its tail pointer); if it is empty, return; if it is not empty, go to step S702;
Step S702: the CPU takes an id from the head of the free queue;
Step S703: the CPU fills in the type field and the address field of the read request table entry corresponding to this id;
Step S704: the CPU writes this id to the tail of the new read request queue;
Step S705: the CPU passes the updated tail pointer of the new read request queue to the memory access accelerator;
Step S706: the CPU judges whether to continue initiating read requests; if so, go to step S701; if not, return.
15. A high-concurrency memory access method based on on-chip RAM, characterized by comprising the following steps by which the CPU processes the data returned for read requests:
Step S801: the CPU queries the state of the finished queue and judges whether the finished queue is empty (the CPU judges the finished queue to be empty when its head pointer coincides with its tail pointer); if it is empty, return; if it is not empty, go to step S802;
Step S802: the CPU takes an id from the head of the finished queue;
Step S803: the CPU operates on the data field of the read request table entry corresponding to this id;
Step S804: the CPU writes this id to the tail of the free queue;
Step S805: the CPU judges whether to continue; if so, go to step S801; if not, return.
16. A high-concurrency memory access method based on on-chip RAM, characterized by comprising the following steps by which the memory access accelerator processes read requests:
Step S901: the memory access accelerator queries in real time whether the new read request queue is empty; if it is not empty, go to step S902; if it is empty, keep querying in this step;
Step S902: the memory access accelerator takes an id from the head of the new read request queue;
Step S903: the memory access accelerator takes out the type field and the address field of the read request table entry corresponding to this id;
Step S904: the memory access accelerator fetches the data from memory and writes it into the data field of the read request table entry corresponding to this id;
Step S905: the memory access accelerator writes this id to the tail of the finished queue.
17. A high-concurrency memory access method based on on-chip RAM, characterized by comprising:
Step 1: when the CPU initiates a write request, it first checks whether the write circular queue is full; if it is not full, it fills in the type, address and write data of the write request;
Step 2: when the memory access accelerator detects that the write circular queue is not empty, it automatically reads the type, address and data of the write request from the head pointer of the write circular queue;
Step 3: the memory access accelerator sends the write request to the memory controller.
18. A processor adopting the memory access accelerator or the memory access method of any one of claims 1 to 17.
CN201310242398.5A (filed 2013-06-19): High-concurrency memory access acceleration method and accelerator based on on-chip RAM, and CPU — legal status Active, granted as CN103345429B.

Priority Applications (1)

Application Number: CN201310242398.5A — Priority Date: 2013-06-19 — Filing Date: 2013-06-19 — Title: High-concurrency memory access acceleration method and accelerator based on on-chip RAM, and CPU


Publications (2)

CN103345429A (this application publication) — 2013-10-09
CN103345429B (granted publication) — 2018-03-30

Family ID: 49280227

Family Applications (1): CN201310242398.5A (Active), granted as CN103345429B

Country Status (1): CN — CN103345429B (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5813031A (en) * 1994-09-21 1998-09-22 Industrial Technology Research Institute Caching tag for a large scale cache computer memory system
US20040107240A1 (en) * 2002-12-02 2004-06-03 Globespan Virata Incorporated Method and system for intertask messaging between multiple processors
WO2005066796A1 (en) * 2003-12-22 2005-07-21 Matsushita Electric Industrial Co., Ltd. Cache memory and its controlling method
US7761682B2 (en) * 2006-02-07 2010-07-20 International Business Machines Corporation Memory controller operating in a system with a variable system clock
US20080222396A1 (en) * 2007-03-09 2008-09-11 Spracklen Lawrence A Low Overhead Access to Shared On-Chip Hardware Accelerator With Memory-Based Interfaces
CN101221538A (en) * 2008-01-24 2008-07-16 杭州华三通信技术有限公司 System and method for implementing fast data search in caching
CN102073596A (en) * 2011-01-14 2011-05-25 东南大学 Method for managing reconfigurable on-chip unified memory aiming at instructions
US20130111175A1 (en) * 2011-10-31 2013-05-02 Jeffrey Clifford Mogul Methods and apparatus to control generation of memory access requests

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014206229A1 (en) * 2013-06-28 2014-12-31 华为技术有限公司 Accelerator and data processing method
CN105988952B (en) * 2015-02-28 2019-03-08 华为技术有限公司 The method and apparatus for distributing hardware-accelerated instruction for Memory Controller Hub
WO2016134656A1 (en) * 2015-02-28 2016-09-01 华为技术有限公司 Method and device for allocating hardware acceleration instructions to memory controller
CN105988952A (en) * 2015-02-28 2016-10-05 华为技术有限公司 Method and apparatus for assigning hardware acceleration instructions to memory controllers
CN105354153B (en) * 2015-11-23 2018-04-06 浙江大学城市学院 A kind of implementation method of close coupling heterogeneous multi-processor data exchange caching
CN105354153A (en) * 2015-11-23 2016-02-24 浙江大学城市学院 Implement method for data exchange and cache of tightly-coupled heterogeneous multi-processor
CN109582600A (en) * 2017-09-25 2019-04-05 华为技术有限公司 A kind of data processing method and device
CN109582600B (en) * 2017-09-25 2020-12-01 华为技术有限公司 Data processing method and device
CN109086228A (en) * 2018-06-26 2018-12-25 深圳市安信智控科技有限公司 High-speed memory chip with multiple independent access channels
CN109086228B (en) * 2018-06-26 2022-03-29 深圳市安信智控科技有限公司 High speed memory chip with multiple independent access channels
CN110688238A (en) * 2019-09-09 2020-01-14 无锡江南计算技术研究所 Method and device for realizing queue of separated storage
CN110688238B (en) * 2019-09-09 2021-05-07 无锡江南计算技术研究所 Method and device for realizing queue of separated storage
CN115292236A (en) * 2022-09-30 2022-11-04 山东华翼微电子技术股份有限公司 Multi-core acceleration method and device based on high-speed interface
CN115292236B (en) * 2022-09-30 2022-12-23 山东华翼微电子技术股份有限公司 Multi-core acceleration method and device based on high-speed interface

Also Published As

Publication number Publication date
CN103345429B (en) 2018-03-30


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant