CN105005537A - Management method for cache system of computer - Google Patents

Management method for cache system of computer

Info

Publication number
CN105005537A
Authority
CN
China
Prior art keywords
flag
cache lines
hit
cache
address
Prior art date
Legal status
Pending
Application number
CN201510497721.2A
Other languages
Chinese (zh)
Inventor
邹阳
王去非
Current Assignee
GUANGZHOU YOUBEIDA INFORMATION TECHNOLOGY Co Ltd
Original Assignee
GUANGZHOU YOUBEIDA INFORMATION TECHNOLOGY Co Ltd
Priority date
Filing date
Publication date
Application filed by GUANGZHOU YOUBEIDA INFORMATION TECHNOLOGY Co Ltd
Priority to CN201510497721.2A
Publication of CN105005537A
Legal status: Pending


Abstract

The invention discloses a management method for a computer cache system. The cache is composed of multiple cache lines, each comprising multiple data words; at the same time, each cache line is divided by address into multiple subsets, each subset corresponding to one or more data words and provided with one or more local Sub-block flag bits. When operations such as cache lookup and cache fill work at the address granularity corresponding to the cache-line subsets, the state and history of the corresponding cache-line subset are recorded at that granularity and stored in the subset's local Sub-block flag bits. The management method provided by the invention preserves the CPU system's ability to prefetch instructions and data: before the instructions and data are actually used, requests issued in advance fetch them back to the CPU from memory or other storage mechanisms, and the execution speed is markedly improved.

Description

Management method for a computer cache system
This application is a divisional of the invention patent application No. 201210464057.8, filed on 2012-11-16 and entitled "Management method for a computer cache system".
Technical field
The present invention relates to a management algorithm for computer cache systems, and specifically to a management method for CPU cache systems.
Background art
At present, computer systems incur very large delays when accessing main memory and other lower-level storage devices (such as hard disks and network devices). Taking memory access as an example, after the CPU issues an access command for data or instructions, roughly 100 nanoseconds pass before the data can be obtained, which is equivalent to the time in which a CPU core executes hundreds of instructions. Because the CPU system's use of instructions and data follows certain patterns, various means can be designed, based on these patterns, to guess the instructions and data the CPU is about to use and to prefetch those contents to the CPU in advance, so that when the CPU actually uses them it obtains them immediately, without waiting. Prefetching (Prefetch) is therefore an effective means of reducing the average latency of CPU accesses to memory and other lower-level storage devices.
In practice, however, the effect of prefetching depends on two conditions: first, the accuracy of prefetching, that is, whether the prefetched data and instructions can actually be used, and used in time; second, the degree to which prefetched instructions and data evict useful instructions and data already present in the CPU cache. Although prefetching can effectively reduce the average latency of memory accesses, prefetched content may replace useful instructions and data already in the CPU cache, and the useful content so displaced will later cost additional time to read back into the CPU. Improper handling of prefetched content therefore increases CPU cache misses, increases the number of CPU accesses to memory, and harms performance.
Summary of the invention
The object of the present invention is to overcome the defect that, when current CPUs prefetch, prefetched content can displace useful data already present in the CPU cache, thereby increasing cache misses and reducing performance, and to provide a management method for a computer cache system that can effectively remedy this defect.
The present invention is achieved through the following technical solution: in a management method for a computer cache system, the cache is composed of multiple cache lines, each cache line comprises multiple data words, and each cache line is at the same time divided by address into multiple subsets, each subset corresponding to one or more data words. Each subset is provided with one or more local Sub-block flag bits. When operations such as cache lookup and cache fill adopt the address granularity corresponding to the cache-line subsets, the state and history of the corresponding cache-line subset are recorded at that address granularity and kept in the local Sub-block flag bits of that subset.
Further, each cache-line subset is provided with one local Sub-block Used flag bit, and the whole cache line is provided with one or more global flag bits, managed as follows:
When a cache line is loaded into the cache for the first time, the Sub-block Used flag of the subset corresponding to the accessed address is set to 1, and the Sub-block Used flags of the other subsets are set to 0;
When the cache line hits in the cache, if the Sub-block Used flag of the subset corresponding to the hit address is 0, it is set to 1; if that flag is already 1, the global flag bits are changed.
To guarantee effectiveness, each cache line carries one Global Hit flag bit, managed as follows:
When a cache line is loaded into the cache for the first time, the Global Hit flag is set to 0; the Sub-block Used flag of the subset corresponding to the accessed address is set to 1, and the Sub-block Used flags of the other subsets are set to 0;
When the cache line hits in the cache, if the Sub-block Used flag of the subset corresponding to the hit address is 0, it is set to 1; if that flag is already 1, the Global Hit flag is set to 1;
On replacement, cache lines whose Global Hit flag is 0 are replaced first, and cache lines whose Global Hit flag is 1 are replaced afterwards.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) The present invention ensures that the CPU system can prefetch instructions and data effectively: before the instructions and data are actually used, requests issued in advance fetch them back to the CPU from memory or other storage mechanisms, which markedly reduces the average access latency and thus improves execution speed.
(2) The prefetched instructions and data in the CPU system can either be kept in a separate cache area or share the same cache with non-prefetched instructions and data, giving the method a wide range of application.
(3) The cache replacement algorithm adopted by the present invention ensures the stable operation of the CPU system, minimizes cache misses, and reduces the number of memory accesses to a minimum.
(4) The present invention also prevents the over-killing behavior of the WLRU cache replacement algorithm, thereby guaranteeing the effectiveness of the present invention.
Description of the drawings
Fig. 1 is a schematic diagram of the internal structure of the CPU involved in the present invention.
Fig. 2 is a schematic diagram of the storage structure of a cache line in Embodiment 1 of the present invention.
Fig. 3 is a flowchart of selecting the cache line to be replaced when a replacement occurs in Embodiment 1 of the present invention.
Fig. 4A and Fig. 4B are schematic diagrams of the storage structure of a cache line in Embodiment 2 of the present invention.
Fig. 5A and Fig. 5B are flowcharts of selecting the cache line to be replaced when a replacement occurs in Embodiment 2 of the present invention.
Fig. 6 is a schematic diagram of the storage structure of a cache line in Embodiment 3 of the present invention.
Fig. 7 is a flowchart of selecting the cache line to be replaced when a replacement occurs in Embodiment 3 of the present invention.
Fig. 8 is a flowchart of one process for inserting a cache line holding prefetched instructions and data into the cache according to the present invention.
Fig. 9A shows a first method of scheduling prefetch memory-access commands in the memory controller according to the present invention.
Fig. 9B shows a second method of scheduling prefetch memory-access commands in the memory controller according to the present invention.
Fig. 10 is a flowchart of one process by which the cache management method of the present invention handles a "prefetch hit".
Fig. 11A and Fig. 11B show the cache-line storage design of a cache management method of the present invention that prevents the "false hit" phenomenon.
Fig. 12 shows one design of the present invention for preventing "over-killing" by the WLRU replacement algorithm.
Fig. 13 shows another design of the present invention for preventing "over-killing" by the WLRU replacement algorithm.
Detailed description of embodiments
The present invention is described below with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.
Embodiment 1
As shown in Figures 1 to 3, the CPU chip 100 in the CPU system of the present invention integrates a CPU core 110, a second-level cache 130, a memory access controller MMU 140 and four memory channels. The CPU core 110 contains a CPU execution unit 116, a first-level instruction cache 112 (L1-I Cache) and a first-level data cache 114 (L1-D Cache). The second-level cache 130 exchanges data directly with the CPU core 110, while the four memory channels (memory channel 1, memory channel 2 154, memory channel 3 156 and memory channel 4 158) are connected to the memory access controller MMU 140 and accept its management commands.
The memory access controller MMU 140 exchanges data with the instruction and data fill mechanisms of the CPU core 110. The first-level cache of the CPU chip 100 in Fig. 1 adopts a structure in which instructions and data are stored separately: instructions are kept in the first-level instruction cache 112, and data are kept in the first-level data cache 114. The CPU cache is a storage area located on the same chip as the CPU core 110, and its read/write latency is far lower than that of the memory outside the CPU chip 100, namely the four memory modules 120, 122, 124 and 126 in Fig. 1 that are independently connected to the four memory channels. At present, CPU caches are usually manufactured with high-speed read/write circuits such as SRAM, while memory is manufactured with DRAM circuits.
Fig. 2 shows one storage structure of a cache line. This cache line has a TAG storage area 260, a Data storage area 270 and five flag bits: a V flag 210, an H flag 220, an A flag 230, a D flag 240 and a P flag 250. The V flag 210 indicates that the cache line is legally valid (Valid). The H flag 220 records whether the cache line has ever been hit (Hit): when the line is first loaded, the H flag 220 is set to zero, and if the line is hit it is set to 1. The A flag 230 indicates that the line has been assigned by the replacement algorithm (Allocated). The D flag 240 indicates that the content of the line has been modified (Dirty), so after the line is evicted from the cache the changed content must be written back to memory. The P flag 250 stands for Prefetch: if this flag is set to 1, the line stores prefetched content.
For non-prefetched instructions and data, the P flag 250 of the corresponding cache line is set to zero when they are inserted into the CPU cache, to distinguish them from cache lines storing prefetched content. After a cache line whose P flag 250 is 1 is hit, its P flag 250 is cleared; on that first hit, the setting of its hit flag H 220 is deferred, so the line is marked as hit only by a subsequent access.
As shown in Fig. 3, when a replacement occurs, the cache lines with a P flag of this embodiment are processed as follows to select the line to be replaced (a code sketch of this selection follows the list):
(310) At run time, judge whether there is an invalid cache line containing no valid information. If so, replace a cache line whose V flag is zero to hold the newly inserted line, and end the process; otherwise, perform step (320).
(320) Judge whether there is a cache line whose H flag and P flag are both zero. If so, replace a cache line whose H flag and P flag are both zero, and end the process; otherwise, perform step (330).
(330) Judge whether there is a cache line whose H flag is zero and whose P flag is non-zero. If so, replace a cache line whose H flag is zero and whose P flag is 1, and end the process; otherwise, perform step (340).
(340) Judge whether there is a cache line whose H flag is zero. If so, replace a cache line whose H flag is zero and end the process; otherwise, replace a cache line whose H flag is 1 and end the process.
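The selection flow of Fig. 3 can be summarized in the following sketch (a minimal C model; the CacheLine structure and function name are illustrative assumptions, not taken from the patent, and ties among equally eligible lines are broken by taking the lowest way, which the flowchart leaves unspecified):

```c
#include <stddef.h>

typedef struct {
    unsigned v : 1;  /* Valid */
    unsigned h : 1;  /* Hit */
    unsigned a : 1;  /* Allocated */
    unsigned d : 1;  /* Dirty */
    unsigned p : 1;  /* Prefetch */
} CacheLine;

/* Steps (310)-(340): invalid lines first, then H=0/P=0, then H=0/P=1,
 * and only as a last resort a line whose H flag is 1. */
size_t select_victim(const CacheLine *set, size_t ways) {
    for (size_t i = 0; i < ways; i++)          /* (310) */
        if (!set[i].v) return i;
    for (size_t i = 0; i < ways; i++)          /* (320) */
        if (!set[i].h && !set[i].p) return i;
    for (size_t i = 0; i < ways; i++)          /* (330) */
        if (!set[i].h && set[i].p) return i;
    return 0;                                  /* (340): every line has H=1 */
}
```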
Embodiment 2
As shown in Figs. 4A and 5A, each cache line in this embodiment has a TAG storage area 450, a Data storage area 460 and four flag bits: a V flag 410, an H flag 420, an A flag 430 and a D flag 440. The V flag 410 indicates that the cache line is legally valid (Valid). The H flag 420 records whether the line has ever been hit (Hit): when the line is first loaded, the H flag 420 is set to zero, and if the line is hit it is set to 1. The A flag 430 indicates that the line has been assigned by the replacement algorithm (Allocated); this flag reminds the replacement algorithm not to assign the same cache line twice. The D flag 440 indicates that the content of the line has been modified (Dirty), so after the line is evicted from the cache the changed content must be written back to memory.
Compared with Embodiment 1, the difference is that the cache-line structure has no P flag 250 to distinguish prefetched (Prefetch) content from demand-fetched (Demand Fetch) content. When the number of cache lines whose hit H flag is 1 reaches a certain threshold or meets a certain condition, the hit H flags of all or some of the cache lines whose hit H flag is 1 are reset.
Compared with Fig. 4A, in Fig. 4B every cache line has one additional Used flag, abbreviated U flag 451. When a cache line is loaded into the cache for the first time, its U flag 451 is set to 1. At replacement time, cache lines whose U flag 451 is 1 have higher priority than cache lines whose U flag 451 is 0; that is, under equal conditions, a cache line whose U flag 451 is 0 is replaced before a cache line whose U flag 451 is 1. When the number of cache lines whose U flag 451 is set to 1 exceeds a certain threshold, or a certain condition is reached, the U flags 451 of all or some of the cache lines whose U flag 451 is 1 are reset.
As shown in Fig. 5A, the replacement process when adopting the cache-line storage structure of Fig. 4A in this embodiment is as follows:
(510) At run time, judge whether any cache line has a V flag of zero. If so, replace that invalid cache line and end the process; otherwise, perform step (520).
(520) Judge whether any cache line has an H flag of zero. If so, replace that cache line and end the process; otherwise, replace a cache line whose H flag is 1 and end the process.
As shown in Fig. 5B, the replacement process when adopting the cache-line storage structure of Fig. 4B is as follows (a code sketch of this selection follows the list):
(530) At run time, judge whether any cache line has a V flag of zero. If so, replace that cache line and end the process; otherwise, perform step (540).
(540) Judge whether any cache line has both its H flag and its U flag at zero. If so, replace that cache line and end the process; otherwise, perform step (550).
(550) Judge whether any cache line has an H flag of zero and a U flag of 1. If so, replace that cache line and end the process; otherwise, perform step (560).
(560) Replace a cache line whose H flag is 1, and end the process.
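A corresponding sketch of the Fig. 5B flow (again illustrative C, with an assumed line structure carrying only the V, H and U bits that the flow inspects):

```c
#include <stddef.h>

typedef struct { unsigned v : 1, h : 1, u : 1; } Line4B;

/* Steps (530)-(560): invalid first, then H=0/U=0, then H=0/U=1,
 * finally any line with H=1 (way 0 chosen arbitrarily). */
size_t select_victim_4b(const Line4B *set, size_t ways) {
    for (size_t i = 0; i < ways; i++) if (!set[i].v) return i;               /* (530) */
    for (size_t i = 0; i < ways; i++) if (!set[i].h && !set[i].u) return i;  /* (540) */
    for (size_t i = 0; i < ways; i++) if (!set[i].h && set[i].u) return i;   /* (550) */
    return 0;                                                                /* (560) */
}
```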
Embodiment 3
As shown in Figs. 6 and 7, a cache line in this embodiment has a TAG storage area 670, a Data storage area 680 and six flag bits: a V flag 610, an H flag 620, an A flag 630, a D flag 640, a P flag 650 and a U flag 660.
The V flag 610 indicates that the cache line is legally valid (Valid). The H flag 620 records whether the line has ever been hit (Hit): when the line is first loaded, the H flag 620 is set to zero, and if the line is hit it is set to 1. The A flag 630 indicates that the line has been assigned by the replacement algorithm (Allocated). The D flag 640 indicates that the content of the line has been modified (Dirty), so after eviction the changed content must be written back to memory. If the P flag 650 is 1, the line holds prefetched (Prefetch) content; if it is zero, the line holds demand-fetched (Demand Fetch) content. The U flag 660 is set to 1 when the line is loaded into the cache for the first time, indicating that the line holds fresh content.
Compared with Embodiment 1, this embodiment adds a U flag 660 to each cache line. According to the different characteristics of the practical operating environment of the CPU system, and in order to control how long prefetched data stay in the cache, the U flag 660 can be set to 1 or to zero for prefetched (Prefetch) content.
The replacement process for a cache with this embodiment's cache-line storage structure is as follows (a code sketch of the resulting priority ordering follows the list):
(710) At run time, judge whether any cache line has its H flag, P flag and U flag all at zero. If so, perform step (720): replace a cache line whose H flag, P flag and U flag are all zero, and end the process; otherwise, perform step (730).
(730) Judge whether any cache line has its H flag and U flag at zero and its P flag at 1. If so, perform step (740): replace such a cache line, and end the process; otherwise, perform step (750).
(750) Judge whether any cache line has its H flag and P flag at zero and its U flag at 1. If so, perform step (760): replace such a cache line, and end the process; otherwise, perform step (770).
(770) Judge whether any cache line has its H flag at zero and its P flag and U flag both at 1. If so, perform step (780): replace such a cache line, and end the process; otherwise, perform step (715).
(715) Judge whether any cache line has its H flag at 1 and its P flag and U flag both at 0. If so, perform step (725): replace such a cache line, and end the process; otherwise, perform step (735).
(735) Judge whether any cache line has its H flag and P flag both at 1 and its U flag at 0. If so, perform step (745): replace such a cache line, and end the process; otherwise, perform step (755).
(755) Judge whether any cache line has its H flag and U flag both at 1 and its P flag at zero. If so, perform step (765): replace such a cache line, and end the process; otherwise, perform step (775).
(775) Replace a cache line whose H flag, P flag and U flag are all 1, and end the process.
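These eight steps amount to a fixed eviction ranking over the (H, P, U) triple. The following sketch (illustrative C, a software model rather than the patent's hardware) encodes the ranking as a lookup table, with lower ranks evicted first:

```c
#include <stddef.h>

typedef struct { unsigned v : 1, h : 1, p : 1, u : 1; } Line6;

/* Ranks follow steps (710)-(775): 0=(H0,P0,U0), 1=(H0,P1,U0),
 * 2=(H0,P0,U1), 3=(H0,P1,U1), 4=(H1,P0,U0), 5=(H1,P1,U0),
 * 6=(H1,P0,U1), 7=(H1,P1,U1). Lower rank is replaced first. */
static int evict_rank(Line6 l) {
    static const int rank[2][2][2] = {
        { {0, 2}, {1, 3} },   /* H=0: indexed [P][U] */
        { {4, 6}, {5, 7} },   /* H=1: indexed [P][U] */
    };
    return rank[l.h][l.p][l.u];
}

size_t select_victim_e3(const Line6 *set, size_t ways) {
    size_t victim = 0;
    for (size_t i = 1; i < ways; i++)
        if (evict_rank(set[i]) < evict_rank(set[victim]))
            victim = i;
    return victim;
}
```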
The process of inserting a cache line holding prefetched instructions and data into the cache is shown in Fig. 8: the cache line obtained by prefetching is first mapped by its address to the corresponding set (Set) of the CPU cache. If the Hit flags of all cache lines in that set are set to 1, or the number of lines in the set whose Hit flag is set to 1 exceeds a preset threshold (for example, half or three quarters of the lines have their Hit flag set), then the prefetched cache line is not inserted into the cache but discarded; otherwise, insertion of the prefetched line proceeds. A sketch of this admission check follows.
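A minimal sketch of the admission check (illustrative C; the threshold is left as a parameter, e.g. 8 or 12 for half or three quarters of a 16-way set):

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when a prefetched line may be inserted into its set:
 * insertion is refused once `threshold` lines already have Hit = 1. */
bool may_insert_prefetch(const unsigned char *hit, size_t ways, size_t threshold) {
    size_t set_count = 0;
    for (size_t i = 0; i < ways; i++)
        if (hit[i]) set_count++;
    return set_count < threshold;   /* at or above threshold: discard */
}
```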
Fig. 9 A is a kind of dispatching method of internal storage access order in Memory Controller Hub MMU of looking ahead, as shown in the figure, namely do you, when system brings into operation, first judge that access to content is looked ahead? then judge that whether main memory access that the address of looking ahead internal storage access order maps is as idle; No, then perform this internal storage access order terminate process of looking ahead.The main memory access that the address of internal storage access order maps if look ahead is for idle, then this time of looking ahead is looked ahead internal storage access order cancellation, otherwise just abandons internal storage access of this time looking ahead.
Fig. 9 B is the another kind of dispatching method of internal storage access order in Memory Controller Hub MMU of looking ahead, and as shown in the figure, namely arranging a waiting list at each main memory access, being mapped to the internal storage access order of this main memory access for depositing all interior way addresses.When this queue full, or reach the upper limit of setting, then internal storage access order cancellation of looking ahead, does not enter the waiting list of main memory access; Only have the waiting list when main memory access not reach the upper limit, prefetched command just enters the waiting list of main memory access.
Figure 10 describes one design of the process by which the cache management method of the present invention handles a "prefetch hit".
The address requested by a prefetch command may already be present in the CPU cache; this situation is called a "prefetch hit". In the design of Figure 10, the storage structure of the cache line adopts the design of Fig. 4. The WLRU cache replacement algorithm handles a "prefetch hit" exactly like a hit from an ordinary memory access: as shown in Figure 10, in decision 1010, if the address of the prefetch command hits in the cache, the H flag 420 of the hit cache line is set to 1 (the same operation as when a non-prefetch memory-access command hits in the cache), and this prefetch operation is stopped.
Embodiment 4
To reduce the space overhead of memory address tags (Address Tag), a CPU cache often lets one address tag correspond to several memory data words (Memory Words); that is, one cache line holds several memory data words. This gives rise to the "false hit" (False Hit) phenomenon. A "false hit" means that the CPU cache judged a hit to have occurred although the CPU did not actually re-access the address of the same memory data word. False hits arise because a CPU cache line is larger than the CPU's physical memory-access granularity, so the CPU cache cannot correctly judge whether the CPU truly re-accessed the same address.
The "false hit" phenomenon also occurs in multi-level CPU caches. In a multi-level CPU cache, if the cache line size (Cache Line Size) of an upper-level cache (L1 Cache) is smaller than that of the next-level cache (L2 Cache), "false hits" can likewise arise. In the example shown in Fig. 11A, the cache line of the first-level cache is 32 bytes and that of the second-level cache is 128 bytes, so a second-level line is four times the size of a first-level line. Because the first-level lines are smaller, when the first-level cache fetches from the second-level cache the addresses of the four distinct 32-byte blocks belonging to one second-level line, the second-level cache cannot tell this apart from true reuse and wrongly concludes that the line was hit three times in the cache. We call this phenomenon the "false hit" of multi-level caches.
In contrast to a "false hit", a real hit occurs when the CPU, or an upper-level cache in a multi-level hierarchy, genuinely re-accesses the same address or the same address range. For example, in a 32-bit memory address space, memory addresses 0x123abc80 through 0x123abcff belong to the same 128-byte second-level cache line, while in a first-level cache with 32-byte lines, memory address 0x123abcc0 belongs to a different first-level cache line than 0x123abc80. If the CPU accesses memory address 0x123abc80 and then memory address 0x123abcc0, neither access hits in the first-level cache, yet the second-level cache sees the same cache line accessed twice and mistakes this for a hit.
"False hits" mislead the replacement algorithm of the CPU cache into judging certain cache lines to hold high-value content, keeping them too long by mistake, wasting scarce cache space, causing more cache misses and harming CPU performance. The larger the cache line, the more frequent the "false hit" phenomenon.
Figure 11 B describes the storage mode arrangement that one prevents the cache lines of the method for " false hit ".Adding four " local uses (Sub-block Used) " flags compared to Fig. 4, Figure 11 B, is SU0 1150, SU1 1151, SU2 1152, SU3 1153 respectively.The cache lines storage mode that Figure 11 B describes is corresponding with the example in Figure 11 A.Second level buffer memory adopts the cache lines of 128 bytes, and first order buffer memory adopts the cache lines of 32 bytes, and the cache line sizes of second level buffer memory is four times of the cache line size of first order buffer memory.Generally speaking, the cache lines of second level buffer memory is of a size of the N of the cache lines of first order buffer memory doubly, then should set N number of " local uses flag "." local uses flag " is divided into N number of little interval, local address (Sub-block) the cache lines of second level buffer memory by the size of the cache lines of first order buffer memory, and record the use history of cache lines in the interval, local address that this is less of second level buffer memory by a flag, therefore gain the name " local uses flag ".In the example of Figure 11 A, the cache lines of second level buffer memory is 128 bytes, is four times of cache line sizes 32 byte of first order buffer memory, therefore Figure 11 B is arranged four " local uses flag ", SU0 1150, SU1 1151, SU2 1152 and SU3 1153.In the example of Figure 11 B, suppose that memory address overall length is 32 (highest address bit is numbered 31, and lowest address bit number is 0), so the 6th and the 5th of memory address will be used for mapping corresponding " local uses flag ".If these two of memory address is 00, then corresponding SU0 1150; If be 01, then corresponding SU1 1151; If be 10, then corresponding SU2 1152; If be 11, then corresponding SU3 1153.
Corresponding to the "Sub-block Used flags" SU0 1150, SU1 1151, SU2 1152 and SU3 1153, the hit Hit flag H 1120 in Fig. 11B is referred to here as the "Global Hit" flag.
When the CPU accesses a memory address and a cache miss occurs, the second-level cache loads the cache line corresponding to that memory address. When the line is loaded, the "Sub-block Used flag" of the local address interval containing the accessed address is set to one, while the other "Sub-block Used flags" and the "Global Hit flag" H 1120 are set to zero. In the example of Fig. 11A, if the memory address that caused the line load lies in the second 32-byte interval of the 128-byte second-level cache line, i.e. bits 6 and 5 of the address are 01, then SU1 1151 is set to one and SU0 1150, SU2 1152 and SU3 1153 are zeroed; if the address lies in the first 32-byte interval, i.e. bits 6 and 5 are 00, then SU0 1150 is set to one and SU1 1151, SU2 1152 and SU3 1153 are zeroed. When a cache line is loaded, the "Global Hit flag" H 1120 is always set to zero.
After a cache line has been loaded, if the TAG field of some memory-access address compares equal to the TAG 1160 content of a second-level cache line, it still cannot be concluded that this is a real hit. Only by further examining the usage history of the local interval corresponding to the memory-access address can a real hit be confirmed. If the "Sub-block Used flag" of the corresponding local interval of the matching second-level cache line is zero, that flag is set to one.
Such an access is not a real hit but a "false hit". In the example of Fig. 11B, if the accessed address lies in the first 32-byte interval of the second-level cache line, SU0 1150 is set to one; if it lies in the second 32-byte interval, SU1 1151 is set to one; if in the third, SU2 1152 is set to one; and if in the last, SU3 1153 is set to one. Throughout any such transition of a "Sub-block Used flag" (SU0 1150, SU1 1151, SU2 1152 or SU3 1153) from zero to one, the "Global Hit flag" H 1120 remains zero; it changes only when a "Sub-block Used flag" that is already one is accessed again, which is the case of a real hit.
If one or more of the "Sub-block Used flags" SU0 1150, SU1 1151, SU2 1152 and SU3 1153 have been set to one, and the address of a subsequent memory access falls within the address interval corresponding to such a flag, then this is a real hit, and the "Global Hit flag" H 1120 is set to one. At the moment the "Global Hit flag" H 1120 is set to one, some "Sub-block Used flags" may still remain zero.
In the replacement decision, a cache line whose "Global Hit flag" H 1120 is set to one has higher priority to remain in the cache than a cache line whose "Global Hit flag" is not set. Other conditions being equal, the replacement algorithm first evicts the cache lines whose "Global Hit flag" H 1120 is zero. A code sketch of the fill and lookup behavior follows.
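The fill and lookup behavior of this scheme can be modeled as follows (illustrative C; the structure and function names are assumptions, and eviction would prefer lines whose global_hit is still 0, as described above):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t tag;
    bool     global_hit;   /* "Global Hit" flag H 1120 */
    bool     su[4];        /* Sub-block Used flags SU0..SU3 */
} L2Line;

/* Line fill: mark only the accessed sub-block; clear the rest and H. */
void on_fill(L2Line *l, uint32_t tag, unsigned sub) {
    l->tag = tag;
    l->global_hit = false;
    for (unsigned i = 0; i < 4; i++)
        l->su[i] = (i == sub);
}

/* TAG match: a set Sub-block Used flag means true reuse (real hit),
 * a clear one means a "false hit" that only records local usage. */
void on_tag_match(L2Line *l, unsigned sub) {
    if (l->su[sub])
        l->global_hit = true;  /* real hit: protect the line */
    else
        l->su[sub] = true;     /* false hit: Global Hit stays 0 */
}
```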
The design of "global" and "sub-block used" flags can also be applied to the LRU algorithm to mitigate the harmful effect of "false hits" on LRU. The specific practice is to provide one "Global Used" flag and several "Sub-block Used" flags to record the global and per-local-address-interval usage of a cache line. When the cache line is loaded, the Global Used flag is set to 1, and it may be cleared later during cache operation. The "Sub-block Used" flags are set at load time so that only the flag of the local address interval containing the accessed address is 1 and all the others are 0. If some local address interval is accessed, the "Sub-block Used" flag of that interval is set to 1; if an interval whose "Sub-block Used" flag is already 1 is accessed again, the Global Used flag is set to 1. On replacement, cache lines whose Global Used flag is 1 have priority to be retained, and cache lines whose Global Used flag is zero are replaced first.
The "Sub-block Used flags" can also be used for prefetching. The "Sub-block Used flags" record the usage of the local address intervals within a larger cache line, and this information can be used to trigger prefetches (Pre-fetch). In the example of Fig. 11A the cache line is 128 bytes with four "Sub-block Used flags"; if all four are set to one, the memory address space corresponding to this cache line is very likely undergoing a sequential traversal.
In that case, to reduce the average memory-access latency, sequential prefetch commands for addresses near the memory address of this cache line can be issued. Suppose the memory address of this cache line is A: when all four "Sub-block Used flags" of the line are one, prefetch commands for memory addresses A+k, A+2k and so on can be issued, where k is the cache line size (32 for a 32-byte line, 128 for a 128-byte line; note that k can be negative, for example for a stack whose addresses grow downward).
In some application environments, prefetching can be more "aggressive", without waiting until all "Sub-block Used flags" are one before issuing prefetch commands. According to the characteristics of the specific application environment, a threshold can be set, and prefetch commands are issued as soon as the number of "Sub-block Used flags" set to one exceeds the threshold. In the example of Fig. 11B, the prefetch threshold can be set to 2: once the cache line corresponding to memory address A has two "Sub-block Used flags" set to one, prefetch commands for memory addresses A+128, A+256 and so on are issued.
The prefetch parameters can be tied to the recorded state of the "Sub-block Used flags". For example, the prefetch length, i.e. how many memory bytes to prefetch, can be a function of the state of the "Sub-block Used" flag bits: if many of a cache line's "Sub-block Used flags" are one, prefetching can be more "aggressive" and fetch more bytes; if few are one, somewhat fewer bytes are prefetched. A sketch combining the threshold and this length rule follows.
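Putting the threshold rule and the length rule together gives a sketch like the following (illustrative C; issue_prefetch is a hypothetical hook into the memory controller, and letting the prefetch depth equal the number of set flags is one plausible instance of the function described above):

```c
#include <stdbool.h>
#include <stdint.h>

extern void issue_prefetch(uint32_t addr);  /* hypothetical controller hook */

/* Issue sequential prefetches once `threshold` of the four Sub-block Used
 * flags are set; the prefetch depth grows with the number of set flags.
 * k is the cache line size in bytes (e.g. 128), negative for a stack
 * whose addresses grow downward. */
void maybe_prefetch(const bool su[4], uint32_t line_addr, int32_t k,
                    unsigned threshold) {
    unsigned used = 0;
    for (unsigned i = 0; i < 4; i++)
        if (su[i]) used++;
    if (used < threshold)
        return;
    for (unsigned d = 1; d <= used; d++)   /* more set flags: deeper prefetch */
        issue_prefetch(line_addr + (uint32_t)((int32_t)d * k));
}
```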
Besides sequential prefetching driven by the "Sub-block Used flags", the "Sub-block Used flags" can also be used to issue other types of prefetch commands, such as stride prefetch (Stride Prefetch) and history-buffer-based prefetch (Prefetch Based On History-Buffer). The "Sub-block Used flags" offer a good view of the usage of an address interval and can serve all kinds of prefetch methods.
Design for preventing "over-killing" by the WLRU replacement algorithm: compared with the LRU replacement algorithm, the WLRU replacement algorithm evicts addresses that will not be used again more quickly from the cache. This is why WLRU outperforms LRU for applications with large data volumes. For some applications, however, especially when the cache capacity is small, WLRU may evict new cache content too quickly, before it is accessed again, thereby causing more cache misses. This is the "over-killing" (Over Killing) phenomenon of the WLRU replacement algorithm.
In the WLRU replacement algorithm, a cache line that has been hit, i.e. one whose hit Hit flag has been set to 1, has higher priority at replacement time than a cache line that has just been loaded into the cache (whose hit Hit flag is 0). Limiting the number of cache lines whose hit Hit flag is 1 reduces the probability that newly loaded cache lines are replaced and lets them stay longer, thereby alleviating the "over-killing" phenomenon of the WLRU replacement algorithm.
Figure 12 describes one design that prevents over-killing by the WLRU replacement algorithm. The example of Figure 12 is a 16-way set-associative cache, 16 cache lines per set. The design uses counter 1210 to monitor the number of cache lines in the set whose hit Hit flag has been set to 1. When the value of counter 1210 exceeds a certain threshold, for example 13, the hit Hit flags of the cache lines are reset: either the hit Hit flags of all cache lines, or only those of some of the lines. Counter 1210 can instead monitor the number of cache lines in the set whose hit Hit flag is not set to 1; if the number of lines not set to 1 falls below a threshold, for example 3, the hit Hit flags of all or some cache lines in the set are reset. The threshold can be preset, or set dynamically according to the characteristics of the application program.
To reduce the cost of the adder circuitry, in the example of Figure 12 the 16 cache lines can also be divided into two or more subsets, for example cache lines 0 to 7 as one subset and cache lines 8 to 15 as another, counting the lines whose hit Hit flag is 1 separately within each subset. If a subset's counter exceeds a threshold, the hit Hit flags of all or some cache lines in that subset are reset. A sketch of the counter scheme follows.
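A software model of the Fig. 12 counter design (illustrative C; for simplicity all flags are cleared when the threshold is crossed, where the text also allows clearing only a part):

```c
#include <stddef.h>

/* Count the Hit flags set in one 16-way set and clear them all
 * when the count exceeds the threshold (e.g. 13). */
void limit_hit_flags(unsigned char hit[16], unsigned threshold) {
    unsigned count = 0;
    for (size_t i = 0; i < 16; i++)
        if (hit[i]) count++;
    if (count > threshold)
        for (size_t i = 0; i < 16; i++)
            hit[i] = 0;
}
```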
Using a combinational-logic approximation to compute the number of cache lines whose hit Hit flag is 1 can further reduce circuit complexity and increase circuit speed. When the output of the combinational logic circuit is 1, clearing of the cache lines' hit Hit flags is started. The number of lines with hit Hit flag 1 computed by the combinational logic is approximate and inexact, but the result is tolerable.
Figure 13 shows another circuit design for preventing "over-killing" by the WLRU replacement algorithm, simpler and quicker than the design of Figure 12. There is no need to count how many cache lines have a hit Hit flag of 1 or 0; instead, "AND (And) gate" logic simply replaces the adder circuit. An AND gate is connected to the hit Hit flags of a group of cache lines; if the hit Hit flags of that group are all 1, the AND gate outputs 1, and the hit Hit flags of that group of lines are then reset in whole or in part. In the example in Figure 13, for simplicity, the hit Hit flags of every four cache lines are connected to one AND gate: the hit Hit flags of cache lines 0 to 3 are connected to AND gate 1320, and those of cache lines 12 to 15 to AND gate 1330. Taking AND gate 1320 as an example, if the hit Hit flags of cache lines 0 to 3 are all 1, AND gate 1320 outputs 1, and all or some of the hit Hit flags of cache lines 0 to 3 are reset.
The number of AND gates in the design of Figure 13 can also be increased, for example by adding four more AND gates: cache lines 0, 4, 8 and 12 connected to one AND gate; cache lines 1, 5, 9 and 13 to a second; cache lines 2, 6, 10 and 14 to a third; and cache lines 3, 7, 11 and 15 to a fourth. When the output of any of these AND gates is 1, all or some of the hit Hit flags of its cache lines are cleared. In general, according to the different characteristics of application programs, the combinational-logic condition for clearing hit Hit flags can strike a balance between accuracy and circuit complexity. A sketch follows.
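Holding a set's sixteen hit Hit flags in one 16-bit mask, each "AND gate" reduces to a mask comparison (an illustrative C model, not the patent's circuit):

```c
#include <stdint.h>

/* One "AND gate" over the lines selected by `group`: if all their Hit
 * flags are 1, the gate fires and those flags are cleared. */
void and_gate_reset(uint16_t *hit_mask, uint16_t group) {
    if ((*hit_mask & group) == group)
        *hit_mask &= (uint16_t)~group;
}

/* Example groupings from the text:
 *   and_gate_reset(&mask, 0x000F);   lines 0..3   (AND gate 1320)
 *   and_gate_reset(&mask, 0xF000);   lines 12..15 (AND gate 1330)
 *   and_gate_reset(&mask, 0x1111);   lines 0,4,8,12 (interleaved grouping) */
```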
In the processes above, resetting the hit Hit flags of the cache lines can mean resetting all of them or only a part, for example half of the lines. When resetting a part, a pseudorandom pointer can be used to determine which cache lines to reset. If half of the cache lines are reset, the pointer needs a width of only 1 bit: when the pointer's value is zero, the hit Hit flags of the lower-numbered half of the cache lines are reset; when its value is 1, the hit Hit flags of the higher-numbered half are reset. In the example of Figure 12, with half the cache lines reset each time, cache lines 0 to 7 are reset when the pointer is 0, and cache lines 8 to 15 when the pointer is 1. After each reset action, the value of the pointer is flipped, from 0 to 1 or from 1 to 0.
The anti-"over-killing" designs for the WLRU replacement algorithm ensure that new cache content enjoys a suitable residence time in the cache. During this residence time, the WLRU replacement algorithm can effectively judge the value of the cache content, retaining high-value content and evicting worthless content as soon as possible. The parameters of the anti-"over-killing" designs can be preset, or configured dynamically according to the characteristics of the application program.
As described above, the present invention can be realized well.

Claims (2)

1. A management method for a computer cache system, characterized in that: the cache is composed of multiple cache lines, each cache line comprises multiple data words, and each cache line is at the same time divided by address into multiple subsets, each subset corresponding to one or more data words; each subset is provided with one or more local Sub-block flag bits; when operations such as cache lookup and cache fill adopt the address granularity corresponding to the cache-line subsets, the state and history of the corresponding cache-line subset are recorded at the address granularity corresponding to the cache-line subset, and this information is kept in the local Sub-block flag bits corresponding to that subset;
Meanwhile, each cache-line subset is provided with one local Sub-block Used flag bit, and the whole cache line is provided with one or more global flag bits, managed as follows:
when a cache line is loaded into the cache for the first time, the Sub-block Used flag of the subset corresponding to the accessed address is set to 1, and the Sub-block Used flags of the other subsets are set to 0;
when the cache line hits in the cache, if the Sub-block Used flag of the subset corresponding to the hit address is 0, it is set to 1; if that flag is already 1, the global flag bits are changed.
2. The management method for a computer cache system according to claim 1, characterized in that each cache line carries one Global Hit flag bit, managed as follows:
when a cache line is loaded into the cache for the first time, the Global Hit flag is set to 0, the Sub-block Used flag of the subset corresponding to the accessed address is set to 1, and the Sub-block Used flags of the other subsets are set to 0;
when the cache line hits in the cache, if the Sub-block Used flag of the subset corresponding to the hit address is 0, it is set to 1; if that flag is already 1, the Global Hit flag is set to 1;
on replacement, cache lines whose Global Hit flag is 0 are replaced first, and cache lines whose Global Hit flag is 1 are replaced afterwards.
CN201510497721.2A 2015-08-13 2015-08-13 Management method for cache system of computer Pending CN105005537A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510497721.2A CN105005537A (en) 2015-08-13 2015-08-13 Management method for cache system of computer


Publications (1)

Publication Number Publication Date
CN105005537A 2015-10-28

Family

ID=54378216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510497721.2A Pending CN105005537A (en) 2015-08-13 2015-08-13 Management method for cache system of computer

Country Status (1)

Country Link
CN (1) CN105005537A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020116584A1 (en) * 2000-12-20 2002-08-22 Intel Corporation Runahead allocation protection (rap)
CN1829979A (en) * 2003-08-05 2006-09-06 Sap股份公司 A method of data caching
US20080320228A1 (en) * 2007-06-25 2008-12-25 International Business Machines Corporation Method and apparatus for efficient replacement algorithm for pre-fetcher oriented data cache
CN102662868A (en) * 2012-05-02 2012-09-12 中国科学院计算技术研究所 Dynamic group association cache device for processor and access method thereof
CN102999443A (en) * 2012-11-16 2013-03-27 广州优倍达信息科技有限公司 Management method of computer cache system


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2015-10-28)