CA1283218C - Variable address mode cache - Google Patents

Variable address mode cache

Info

Publication number
CA1283218C
CA1283218C (application CA000534687A)
Authority
CA
Canada
Prior art keywords
cache
directory
entry
cpu
address
Prior art date
Legal status
Expired - Fee Related
Application number
CA000534687A
Other languages
French (fr)
Inventor
James Gerald Brenza
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date
Filing date
Publication date
Application filed by International Business Machines Corp
Application granted
Publication of CA1283218C
Anticipated expiration
Status: Expired - Fee Related

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1027Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F12/1045Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
    • G06F12/1063Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache the data cache being concurrently virtually addressed
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893Caches characterised by their organisation or structure
    • G06F12/0897Caches characterised by their organisation or structure with two or more cache hierarchy levels

Abstract


The disclosure provides a data processing system which contains a multi-level storage hierarchy, in which the two highest hierarchy levels (e.g. L1 and L2) are private (not shared) to a single CPU, in order to be in close proximity to each other and to the CPU. Each cache has a data line length convenient to the respective cache. A common directory and an L1 control array (L1CA) are provided for the CPU to access both the L1 and L2 caches. The common directory contains, and is addressed by, the CPU-requested logical addresses, each of which is either a real/absolute address or a virtual address, according to whichever address mode the CPU is in. Each entry in the directory contains a logical address representation derived from a logical address that previously missed in the directory. A CPU request "hits" in the directory if its requested address is in any private cache (e.g. in L1 or L2). A line presence field (LPF) is included in each directory entry to aid in determining a hit in the L1 cache. The L1CA contains L1 cache information to supplement the corresponding common directory entry; the L1CA is used during an L1 LRU castout, but is not in the critical path of an L1 or L2 hit. A translation lookaside buffer (TLB) is not used to determine cache hits. The TLB output is used only during the infrequent times that a CPU request misses in the cache directory, and the translated address (i.e. absolute address) is then used to access the data in a synonym location in the same cache, or in main storage, or in the L1 or L2 cache in another CPU in a multiprocessor system using synonym/cross-interrogate directories.

Description

This invention relates to data processing systems which contain a multilevel storage hierarchy, in which one or more levels contain a cache (i.e. high speed buffer) to speed up the access of data and/or instructions between a CPU and storage.

BACKGROUND

The prior art teaches data processing systems which contain a multilevel storage hierarchy having one or more caches, in which the cache in the lowest hierarchy level L1 is directly accessible (i.e. private) to a single CPU, in order to be in close proximity to the CPU for fast access.
Each cache contains lines of data having a line length convenient to the respective cache, wherein the different caches may have different line lengths. The prior art also teaches having a second level (L2) cache which may have a line length that is a multiple of the line length in each entry in the lowest level cache (L1).

In the prior art, mainframe CPUs often include an instruction unit as a source of requested addresses, a translation lookaside buffer (TLB), an L1 cache and directory at its lowest hierarchy level, and an L2 cache and directory at its next hierarchy level.

Cache efficiency is important to system performance.
An important parameter for measuring cache efficiency is the average time duration from when a storage request address is available from the CPU instruction unit until the requested data is available to the instruction unit.
This duration is usually measured in numbers of machine cycles. Cache efficiency increases as this parameter decreases.

Conventional systems may operate in the following manner: A requested storage address from the instruction unit may be real, absolute, or virtual. If virtual, the page address (containing the requested address) may have been previously translated by dynamic address translation (DAT) means in the system, which put the page's real or absolute address in a TLB entry, which is now accessed in the TLB by the requested address to obtain the translated address. If no TLB entry contains the required translation, the requested virtual address is translated by DAT, which puts the translation into the TLB, from which it may be later accessed. Thereafter the requested virtual address only requires a TLB lookup and compare to obtain the corresponding translated real/absolute address from the TLB, until it is later replaced in the entry after a period of nonuse.

The DAT translates a virtual address to a real address, which is put into the TLB in a uniprocessor.

But if the CPU is in a multiprocessor, a prefix address is added to the translated real address to make it into an absolute address, and the virtual request's absolute address is then put into the TLB.

If the CPU requests a real address, no translation is done, but if the CPU is in a multiprocessor a prefix
address is added to the requested real address to make it into an absolute address.

CPU requested real addresses have been handled in different ways by prior CPUs; some have put the real/absolute address in the TLB in the same manner as is done with virtual addresses, while others have used a bypass path around the TLB to the L1 cache for an access attempt in the cache, in order to avoid using TLB space for an address not requiring translation.

The DAT operation in the IBM System/370 architecture uses a segment table descriptor (STD), comprised of a segment table origin (STO) and a segment table length (STL).

In systems using multiple address spaces, a STO is part of each requested virtual address for identifying the virtual address space containing the requested virtual address. STOs (or STO identifiers) have previously been put in each TLB entry as part of the virtual address. The STO in the accessed TLB entry must be compared with the STO
provided with each requested virtual address in finding any TLB address translation. Thereafter only the translated address is used in accessing the requested data in the cache, and in main storage when needed. Some prior systems use a STO identifier table to contain all recently used STOs and corresponding assigned STO identifiers that have fewer bits than the STO; and the STO identifier is put in the TLB instead of the STO to allow a smaller size TLB
circuit array, since smaller arrays allow faster access.

In the conventional cache directory, a set associative arrangement was provided, in which a row in the cache directory (called a "congruence class") was selected by each address provided by the instruction unit (whether real/absolute, or virtual). And each row comprised a set of entries (called bins or bin identifiers) which were handled associatively, i.e. each congruence class was set associative. In this manner the directory row selection was being made before TLB address translation was completed, in order to obtain selection of a cache congruence class before the TLB translated address was available, which speeded up operation on the critical cache path in the CPU.

In the conventional cache, only translated addresses are put into the cache directory. That is, a real/absolute address representation is provided in each used cache directory entry. This real address was read out of each directory entry in the congruence class selected by each instruction unit requested address. The set of directory readout real addresses arrived at respective comparator circuits at about the same time that the TLB translated address arrived at these circuits, and a simultaneous comparison was made to find which, if any, of the plural addresses from the selected congruence class matched the translated requested address, i.e. this is the set associative comparison for the cache.

This prior operation resulted in requiring a TLB hit before an L1 cache hit could be obtained. If a TLB miss occurred, the L1 cache determination had to wait until the TLB miss operation was completed by a DAT operation, with the L1 cache operation being restarted after the DAT
operation for the current CPU request had put the new translation into the TLB. A TLB miss required a dynamic address translation (DAT), which may require two accesses of translation tables in main storage, which is relatively slow.

It is noted that known commercially used L1 cache directories do not contain virtual addresses. Their cache addresses are real/absolute addresses so they can be compared with TLB outputted real/absolute addresses.
Virtual address values cannot be compared with real/absolute address values, since a virtual address may be translated into any real page address available in main storage.

Accordingly, the conventional L1 cache directory requires two serially occurring compare operations before a corresponding L1 directory address can be found to exist or not exist, i.e. L1 cache hit or miss. If an L1 hit occurs, the data (usually a double-word) is accessed in the L1 cache and it is sent to the CPU.
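
By way of illustration only (this sketch is not part of the original disclosure), the serial dependency just described can be modelled in software as follows: the congruence class is selected from the untranslated address, but the final set-associative compare cannot finish until the TLB has supplied the translated page address, so a TLB miss stalls the L1 hit decision. All array sizes, field names and bit boundaries below are assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64
#define DIR_SETS    128
#define DIR_WAYS    4

struct tlb_entry { bool valid; uint32_t sto, vpage, abs_page; };
struct dir_entry { bool valid; uint32_t abs_page; };   /* prior art holds only translated addresses */

static struct tlb_entry tlb[TLB_ENTRIES];
static struct dir_entry dir[DIR_SETS][DIR_WAYS];

bool conventional_l1_hit(uint32_t sto, uint32_t vaddr)
{
    uint32_t vpage = vaddr >> 12;                       /* 4KB page assumed */
    struct tlb_entry *t = &tlb[vpage % TLB_ENTRIES];

    /* First serial compare: the TLB.  A miss here forces DAT before the
     * cache lookup can be resolved. */
    if (!t->valid || t->sto != sto || t->vpage != vpage)
        return false;

    /* Second serial compare: the directory, which can only finish once the
     * translated (absolute) page address is available from the TLB. */
    uint32_t set = (vaddr >> 7) & (DIR_SETS - 1);       /* selected pre-translation */
    for (int w = 0; w < DIR_WAYS; w++)
        if (dir[set][w].valid && dir[set][w].abs_page == t->abs_page)
            return true;
    return false;
}
```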

USA patent 4,495,575 has a single buffer corresponding to an L1 cache, which is not a private CPU cache because it is accessed by I/O channels as well as a CPU. Its cache directory entries each have "sum data" comprised of a space ID and a block address which are compared to a space ID and a block address in the virtual address in a register received from the CPU or channel. Upon a buffer miss, an address conversion table supplies a real address to main memory
to obtain the data.

In all prior cache systems, an L1 cache miss requires an access of the requested data from the next higher level in the storage hierarchy, which commonly has been main storage in large systems.

If an L2 level cache exists in the system, L2 is accessed instead of main storage to provide the requested data to both L1 and the CPU if L2 contains the data. If the L2 cache does not contain the requested data, main storage is accessed for it, with the access time for determining the L2 cache miss being added to the overall access time for the requested data. A real/absolute address is conventionally used to access the L2 cache directory, which requires the output of the TLB when a virtual address is being requested by the CPU.

In all prior caches, a TLB miss may occur independently of an L1 cache directory miss.
Fortunately most CPU requests (over 90%) hit in both the TLB and cache, which is the reason for the existence of the TLB and caches.


A basic requirement of L2 caches is that they must have a large size to be effective, such as several times larger than the L1 cache. Hence L2 has the likelihood of containing data from many more pages in main storage than does L1. However a fundamental problem may exist in that the TLB is not usually large enough to contain all the page translations representing the data existing in L2. The result is that even though a requested line of data may exist in the L2 cache, its TLB entry may have been replaced before the current request is made, so that a TLB miss results, and its related DAT operation must be completed for the TLB in such prior systems before the L2 cache can be accessed to obtain data already there.

In prior USA patent 4,464,712, the page-translating TLB entries correspond to page-size lines in the L2 cache, which has an L2 cache directory separate from the TLB (i.e.
DLAT). Absolute addresses outputted by the TLB upon each TLB entry replacement operation locate and control the settings of replacement-candidate flag bits R in the L2 entries to control the LRU replacement selection of line entries in the L2 cache directory. This requires a TLB/L2 relationship in which the L2 cache has an L2 line size equal to the TLB controlled page size (e.g. 4096 bytes).

BRIEF SUMMARY OF THE INVENTION

It is the primary object of this invention to increase cache size privately available to a CPU while at the same time decreasing the critical path for cache accesses. The invention supports the CPU use of switchable mode addressing wherein the CPU can arbitrarily switch between virtual addressing and real storage addressing (such as occurs when using IBM S/370 logical addressing, in which the "program status word" (PSW) switches the "dynamic address translation" (DAT) state on and off). The invention allows switchable mode addressing by the CPU to be easily handled by the cache.

It is a feature of this invention to provide an address mode indicator for each cache directory entry to indicate whether the entry represents a real/absolute address or a virtual address. This indicator enables the elimination of TLB operation from the critical CPU-to-cache access path even though the CPU uses switchable mode logical addressing.

It is another feature of this invention to provide an address mode flag field with each cache directory entry to indicate whether the address represented in the entry is a real/absolute address not translated from any address space, or if it is a virtual address. Use of the address mode flag field eliminates the TLB operation from the critical CPU to cache access path when the CPU uses switchable mode logical addressing.

It is still another feature of this invention to provide an address mode indicator with each cache directory entry (as an alternative to having any address mode flag field) by using a predetermined value, or range of values, within an address space name in each cache directory entry to indicate when the address represented in the entry is:
(1) a real/absolute address not translated from any address space, and the address space name field does not represent any address space name, or (2) a virtual address, and the address space name field represents the name of the address space containing the virtual address. This control over the address space name field content can eliminate the TLB
operation from the critical CPU-to-cache access path when the CPU uses switchable mode logical addressing.


It is a further object of this invention to allow the use of plural levels of private caches to expand the cache size accessible to a CPU while eliminating the TLB
operation from the critical path to all levels of private cache accesses for a processor request.

It is another object of this invention to provide a cache design arrangement that provides simplification of synonym resolution. This invention avoids the conventional exponential (i.e. 2 to the n power) increase in synonym resolution complexity as cache size is increased. Cache size is increased by increasing the number (n) of logical address bits in the cache address.

It is a still further object of this invention to eliminate the need for plural cache directories where plural levels of caches are used to improve the average CPU
access time for storage requests.

It is a further object of this invention to avoid cache directory synchronization problems among plural levels of caches.

It is another object of this invention to provide a single cache directory common to plural levels of caches private to one processor to reduce the cache directory costs and hardware requirements.

It is a further object of this invention to provide a single cache directory for plural levels of private caches for a processor in which each directory entry has an address mode field or indicator for indicating whether or not dynamic address translation (DAT) is used with the address represented in the cache entry.


It is another object of this invention to provide a single cache directory for plural levels of private caches for a processor in which each directory entry represents a line of data or instructions in the highest level private cache and also indicates the location of any part of the respective highest level line in each lower-level cache.

It is another object of this invention to enable the DAT and TLB to operate in parallel while the CPU is accessing data or instructions in any level of the CPU's private caches.

It is still another object of this invention to eliminate the use of TLB operations in a processor's critical path that accesses data or instructions available in any of the processor's private caches.

It is a still further object of this invention to eliminate the use of TLB operations in a processor's critical path for accessing data or instructions available in a first-level store-in cache of a processor.

It is another object of this invention to eliminate the use of TLB operations for a processor to use logical address requests, as long as the storage accesses requested by the processor are found in any of plural private caches of the processor.

It is a still further object of this invention to use TLB operations only when a requested storage access is not found in one of the plural caches private to the requesting processor.

It is a further object of this invention to provide a single directory to handle all plural levels of private caches, any of which may be a store-through cache or a store-in cache.


It is another object of this invention to provide a single cache directory to handle plural levels of private caches for a processor in which the lowest level is a store-in cache and the highest level is a store-through cache.

It is a further object of this invention to provide a single cache directory to handle two levels of private caches for a processor in which the first (lowest) level has a store-in cache and the second (and highest) level has a store-through cache.

It is another object of this invention to provide a control array for handling supplementary information required by a lowest level store-in cache (such as for handling its line castouts), while using a common directory for handling the CPU accessing information for the plural levels of private caches.

It is another object of this invention to provide a control array (associated with a common directory for plural levels of private caches) for identifying the set associative location in a respective private store-in cache level that contains a data line to be cast out of that level.

It is another object of this invention to provide a control array (associated with a common directory for plural levels of private caches) for handling flagging requirements of a lower private cache level(s).

It is another object of this invention to provide a control array (associated with a particular cache level among plural levels of private caches managed by a common directory), in which the control array entries correspond to the entries in the common directory, and each control array entry contains a bin number field for indicating the set-associative location in the next higher level cache for locating the higher level line containing the respective lower level line.

It is a further object of this invention to provide a single directory that signals a cache hit for a processor request available in any of plural private caches of a CPU
and accesses the requested data or instructions in the lowest-level cache containing the data.

It is a still further object of this invention to provide a unique system for detecting and using synonym cache entries.

It is another object of this invention to provide a system for resolving and using synonym cache entries which may be in any of plural levels of a multilevel private cache system by using a cross-interrogation directory system of a multiprocessor.

It is a further object of this invention to provide a unique system for casting out and invalidating cache entries requested by another processor in a multiprocessor system.

It is another object of this invention to provide a system for uniquely finding cross-interrogation hits in any CPU in a multiprocessor for cast out or invalidation, in which cross-interrogation hits in the system are communicated by a cross-interrogation directory system that detects the hits.

This invention relates to a data processing system which contains a multi-level storage hierarchy, and may have plural hierarchy levels private to a single CPU (not shared with any other processor), in which the caches are in close proximity to each other and to the CPU. The lowest cache level L1 is the fastest level for accessing CPU requests, and L1 has the smallest storage capacity of all levels. The next higher cache level L2 is the next fastest level for accessing CPU requests, and L2 has a much larger storage capacity than L1. Other still higher cache levels, L3 etc., may also be provided to obtain a larger cache storage capacity, but they are slower levels for accessing CPU requests. Thus the CPU access time gets longer as an access request needs to go higher in the hierarchy levels to get requested data or instructions.

The respective cache size at each private hierarchy level is flexibly designed to contain any number of lines of data with a line length convenient to the respective cache. Each lower level private cache uses a line length which is a sub-multiple of the line length (e.g. in bytes) of its higher level private cache(s). Any sub-multiple may theoretically be used, and the sub-multiples may be different among different cache levels. Thus the length of each line in an L1 cache is a sub-multiple part of the length of each line in its L2 cache. That is, each L2 line is comprised of plural L1 lines, and each L1 line is a sub-multiple part of the L2 line. Hence the L2 line length may be several times the L1 line length.

A CPU request "hits" in the common directory if its requested address is for data available in any private cache of the directory, and the requested data is accessed in the lowest level (fastest) cache in which the requested data or instructions are available.

Each CPU request address (a switchable logical address) simultaneously addresses a congruence class in the common cache directory and a respective congruence class in each of the CPU's private caches. Any potential "hit" for the requested access is determined by the directory for the cache locations in these congruence classes.

Each entry in the common directory can represent:

(a) a respective line in the highest level private cache, and (b) the location of each part of the line available in any one or more lower level private caches.

That is, corresponding parts in all private cache levels are located through a single directory entry, which is then a "common directory entry" that manages the corresponding parts in all of the private caches. Thus each directory entry: (a) represents a line in the highest level (largest) private cache (containing all line parts), and (b) keeps track of every part of that same line that may be copied into any other cache(s) in that hierarchy.

To keep track of all copied parts of each highest-level line, each directory entry contains a line presence field (LPF) for enabling the entry to manage all of its line parts in all private caches. To do this, the LPF with each directory entry indicates:

(1) each other cache level containing a copied part, (2) which part of the line is copied therein, and (3) the set-associative position in each other cache containing the part (if it is a set-associative cache).

Items (1) and (2) may be combined into one LPF indicator if there are only two private cache levels, and item (3) need not be used for any cache-level not using set-associativity.


The LPF is not needed in a directory that handles only a single private cache for a processor. Thus LPFs are used by this invention in a common directory servicing plural private caches.

Many large systems presently use the IBM System/370 architecture in which the CPU can switch address modes at any time between real/absolute address mode and virtual address mode, in which the interpretation of the effective CPU storage addresses in a program being executed is controlled by the address mode currently existing. The CPU
addressing mode can switch at any time between the virtual mode (e.g. virtual addresses with a STO, or STO identifier) and the real mode (e.g. real or absolute addresses without any STO, or STO identifier); this switchable type of addressing is called "S/370 logical addressing", and it is controlled by PSW bit 5, called the DAT mode bit.

In the invention, each valid cache directory entry must indicate the address mode used by the request which generated the entry. Therefore, the indicated address mode can arbitrarily vary from one directory entry to another for any current state of the cache directory.

CPU accessing of the hierarchy in this invention is to the lowest-level (fastest) cache containing the CPU
requested data in its plural private caches. For example, where there are two private caches L1 and L2, a hit in either cache avoids any use of the TLB in the critical path; that is, a miss in L1 but a hit in L2 avoids involving the TLB in the data accessing operation.

In a preferred form of the invention, each common directory entry contains a number of fields in addition to the LPF and the logical address representation field, including: an invalid (I) field, a STO (or STO identifier) field, a change (CH) field, and a DAT ON/OFF field unless the STO field is uniquely controlled to additionally perform the DAT ON/OFF function. (For example, a STO value of zero may be used to indicate the special case of the associated logical address being a real or absolute address, which prevents the zero STO value from being an address space identifier.) Other flag fields can be added to the directory entries to identify special conditions for the associated line of cache data, such as an exclusive/readonly (EX) field in the cache directories in an MP, and a common bit (C) to handle common virtual storage areas in an MVS
environment.

The I field indicates if the directory entry represents any data in any of the plural caches; if on it indicates the entry does not represent any valid data, but if off it indicates the entry represents a valid line in at least the highest-level cache. The CH field indicates if the directory entry-represented data has been written into (i.e. changed) in any of the plural caches; if on it indicates the represented data is changed, but if off it indicates the valid line has not been changed. The exclusive/readonly (EX) flag field (usually a single bit) is used in the cache directories of an MP to indicate whether the line represented by the entry can exist only in a single cache at a time for exclusive CPU access, or whether that line is allowed to be simultaneously represented in plural CPU directories to allow shared access by plural CPUs.
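
The field list above can be pictured with a small data-structure sketch (not part of the original disclosure). The widths, names and the two-sub-line LPF below are illustrative assumptions keyed to the two-level (L1/L2) example used later in this description; they are not the patent's exact entry layout.

```c
#include <stdint.h>

#define L1_PARTS_PER_L2_LINE 2        /* assumed: one 256-byte L2 line = two 128-byte L1 lines */

struct lpf_part {                     /* one sub-field of the line presence field */
    uint8_t present;                  /* 1 = this L2 sub-line is copied in the L1 cache */
    uint8_t l1_bin;                   /* its set-associative position (A..D) in L1 */
};

struct common_dir_entry {
    uint8_t  invalid;                 /* I  : entry represents no valid line            */
    uint8_t  changed;                 /* CH : line has been written into (store-in L1)  */
    uint8_t  exclusive;               /* EX : exclusive vs. readonly ownership in an MP */
    uint8_t  common_seg;              /* C  : common virtual storage area (MVS)         */
    uint32_t acf;                     /* ACF/STO: zero = real/absolute, non-zero = STO  */
    uint32_t la_rep;                  /* high-order logical-address representation      */
    struct lpf_part lpf[L1_PARTS_PER_L2_LINE];   /* the LPF itself                      */
};
```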

An intermediate-order group of bits taken from the middle portion of the requested logical address is used as a select address for selecting a candidate congruence class in the common directory and in each cache. The locations of all set-associative entries in each congruence class are predetermined, and they are read out of the directory as candidate entries when the congruence class is accessed.

In each readout candidate entry, the locations of the I field, the address representation, the LPF and other entry fields are also predetermined.

The I field in each candidate entry is tested to determine if the requested line exists (i.e. is valid) in the highest-level cache. If the I field is on in all entries in the selected congruence class, the requested line is not in any private cache, and a cache miss is signaled. A cache directory entry is then assigned for the missed request. The congruence class for the new directory entry is determined by the intermediate-order group in the requesting address. One of the set-associative locations for the entry within that congruence class is assigned by a cache directory LRU replacement circuit. The I field in that entry is set on, and the entry content is generated, including an LPF with assigned field and subfields determining the location assignments for the required line in the highest-order cache and for each of its part(s) in each lower order cache. Simultaneously the required line fetch signals are sent to main storage. The fetched line and its required parts are copied into the assigned locations in all of the caches. This main memory line fetch uses the translated page address outputted from the TLB, and a low-order group of bits from the requested logical address defining the required page in the conventional manner.
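
The way a requested logical address is carved up for this lookup can be sketched as follows. The bit boundaries are assumptions derived from the FIG. 3 example geometry (1024 L2 congruence classes, 256-byte L2 lines, 128-byte L1 lines) and S/370 bit numbering (bit 31 least significant); the patent does not fix them at these exact positions.

```c
#include <stdint.h>
#include <stdio.h>

/* Extract bit positions first..last of a 32-bit logical address, using the
 * S/370 convention that bit 0 is most significant and bit 31 least significant. */
static uint32_t bits(uint32_t la, int first, int last)
{
    return (la >> (31 - last)) & ((1u << (last - first + 1)) - 1u);
}

int main(void)
{
    uint32_t la = 0x0012ABCCu;                   /* some requested logical address */

    uint32_t compare_grp = bits(la, 1, 13);      /* high-order group: directory compare field      */
    uint32_t cong_class  = bits(la, 14, 23);     /* intermediate-order group: 1 of 1024 classes    */
    uint32_t lpf_select  = bits(la, 24, 24);     /* selects the L1 part (sub-line) of the L2 line  */
    uint32_t data_unit   = bits(la, 25, 28);     /* lowest-order group: double-word within the line */

    printf("compare=%x class=%u lpf_sel=%u unit=%u\n",
           (unsigned)compare_grp, (unsigned)cong_class,
           (unsigned)lpf_select, (unsigned)data_unit);
    return 0;
}
```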

If a CPU request finds the I field is off in at least one candidate entry, the requested data may be in the highest-order cache and may be in the one (or more, if they exist) lower-order cache(s). Then the address mode indicator or field is examined for each valid entry to determine if the entry's represented address is real/absolute or virtual. If the entry's represented logical address is virtual, its STO (or STO identifier) field and its logical address representation field are compared to the CPU request's STO (or STO identifier) and a high-order group of bits in its logical address, respectively. If the entry's logical address is real/absolute, the STO does not define any address space name in the comparison but only acts as a real/absolute indicator. If the compared fields are equal for any valid candidate entry, a cache hit exists for it.

The LPF must be examined in this cache hit entry to determine in which of the lower-level cache(s), if any, the requested data may be contained, and its set-associative location in its congruence class. To do this, a low-order bit (or predetermined group of bits) next to the intermediate-order group is also taken from the requesting logical address to locate the correct subfield (and sub-sub-field, if it exists) within the LPF in each readout entry. A presence bit located at the beginning of this subfield is tested to determine if the requested line exists (i.e. is valid) in the lowest-level cache.

A lowest-order group of bits is also taken from the requested logical address and is used to select the requested data unit in the selected line.

If the line presence bit in the selected LPF sub-field is off, the requested line is not in the associated cache, and the requested line is accessed in a higher-level cache, which is the lowest-level cache containing the requested data; the data is then copied from that higher-order cache into the lower-order cache.
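
Putting the preceding steps together, a minimal software sketch of the hit decision might look like the following (it is illustrative only; the structure sizes, bit positions and return codes are assumptions chosen to match the two-level FIG. 3 example). The point being illustrated is that no TLB output is consulted anywhere on the hit path.

```c
#include <stdint.h>
#include <stdbool.h>

enum result { L1_HIT, L2_HIT, DIR_MISS };

struct entry {
    bool     invalid;
    uint32_t acf;                          /* 0 = real/absolute, non-zero = STO     */
    uint32_t la_high;                      /* untranslated high-order compare field */
    struct { bool present; int l1_way; } lpf[2];
};

#define CLASSES 1024
#define WAYS    4
static struct entry directory[CLASSES][WAYS];

enum result common_dir_lookup(uint32_t req_acf, uint32_t la, int *l2_way, int *l1_way)
{
    uint32_t cls     = (la >> 8) & (CLASSES - 1);  /* intermediate-order group             */
    uint32_t la_high = la >> 18;                   /* high-order compare group             */
    int      part    = (la >> 7) & 1;              /* bit 24: which L1 part of the L2 line */

    for (int w = 0; w < WAYS; w++) {
        struct entry *e = &directory[cls][w];
        /* Address hit: valid entry, same address mode/space (ACF), same logical
         * address bits.  No translated address is needed for this decision. */
        if (!e->invalid && e->acf == req_acf && e->la_high == la_high) {
            *l2_way = w;
            if (e->lpf[part].present) {            /* LPF hit: the part is in L1    */
                *l1_way = e->lpf[part].l1_way;
                return L1_HIT;
            }
            return L2_HIT;                         /* copy the part from L2 into L1 */
        }
    }
    return DIR_MISS;   /* only now do the DAT/TLB results, running in parallel, matter */
}
```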


Whenever the processor writes data into a low-level cache which is a store-in cache, the same data is not written into any higher-order cache. But if each higher-level cache is a "store-through" cache, whenever a line is cast out of the low-level cache, it will be stored in each higher-level cache and in main storage.

Even when a TLB miss and a DAT operation occur while there is a cache hit in the directory, the cache access is obtained in the above described manner without involving (or waiting for) the DAT or TLB operations. The DAT
operation for the TLB miss will occur in parallel with the cache accessing operations. Since the higher-level cache may hold lines from many more pages than the TLB can hold translations for, it is very possible that the translation will not be in the TLB for a page containing a line presently available in at least the highest-level cache.
While the DAT and TLB are operating for a CPU request, the requested data for the same request may then be transferred to the CPU from the lowest-level cache having the requested data.

Thus, the requested data is accessed in the lowest-order (fastest) cache level in which the data is available.

In an example of two private cache levels, a line presence field (LPF) is included in each entry in the directory to indicate which, if any, of the L2 parts (i.e.
L2 sub-lines represented in the L2 directory entry) is currently available to the processor in the L1 cache, in order to aid in determining when an L1 hit occurs. Each LPF
has a plurality of subfields for each L2 line in the L2 cache. Each L2 subfield represents an L2 subline in the addressed L2 line which may have been copied into an L1 line location in the L1 cache, whereupon the L2 subline becomes an L1 line. And if a set-associative L1 cache is used, the LPF also contains an L1 bin number to select the set-associative location in the addressed L1 congruence class which may contain the requested data. This L1 location will contain the L1 line having the requested data if the requested address "hit" in the common directory. A
common directory "hit" requires both an address "hit" and a unique LPF "hit".

In the two level private cache example, the LPF in each directory entry may be comprised of a plurality of sets respectively corresponding to the associative sets found in the corresponding congruence class in the L1 cache. Each LPF set may be comprised of a plural bit field, in which one bit represents whether the respective L2 subline is present in the L1 congruence class and the remaining bit(s) in the LPF set combinatorially represent the particular set-associative L1 line containing the respective L2 sub-line.
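
One plausible packing of such an LPF set, assuming a 4-way set-associative L1 cache and two L2 sub-lines per L2 line, is three bits per sub-line (one presence bit plus two bin-number bits). The packing itself is an assumption for illustration; the patent only requires that both pieces of information be recorded.

```c
#include <stdint.h>

#define SUBLINES 2                    /* assumed: two L1-sized sub-lines per L2 line */

/* Pack per-sub-line (present, l1_way) pairs into a small LPF bit field:
 * bit [3i+2] = presence, bits [3i+1 .. 3i] = L1 set-associative position. */
static uint8_t lpf_pack(const int present[SUBLINES], const int l1_way[SUBLINES])
{
    uint8_t lpf = 0;
    for (int i = 0; i < SUBLINES; i++)
        lpf |= (uint8_t)((((present[i] & 1) << 2) | (l1_way[i] & 3)) << (3 * i));
    return lpf;
}

static int lpf_present(uint8_t lpf, int subline) { return (lpf >> (3 * subline + 2)) & 1; }
static int lpf_l1_way (uint8_t lpf, int subline) { return (lpf >> (3 * subline)) & 3; }
```

A directory hit would then test lpf_present() for the sub-line named by the requesting address (bit 24 in the two-level example) and, if it is set, use lpf_l1_way() as the L1 bin number.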

The LPF for a respective L2 sub-line may represent any L1 set-associative line of the addressed L1 congruence class into which the respective L2 sub-line was copied (then the L2 sub-line also became an L1 line). The L2 sub-line copying is done by a fetch of the sub-line from the L2 cache into an LRU-selected associative set location in the L1 cache.

In a further example, if there are three private cache levels L1, L2 and L3 for a CPU, the LPFs in the common directory entries basically represent the location(s) in the L2 and L1 caches of sub-lines and sub-sub-lines of data in the L3 cache, which here is the highest level. The LPF
in each common directory entry here comprises a sequence of LPF sub-fields representing the presence of the respective L3 sub-lines that presently exist in L2 line locations.
They are set upon a sub-line fetch from the L3 cache to the L2 cache, involving the copying of a selected L3 sub-line into a selected L2 line location. Each LPF sub-field has an L2 presence flag bit to indicate if the respective L3 sub-line was copied into the L2 cache, and also has L2 set-associative bits indicating its set-associative location in the addressed L2 congruence class. Each LPF L2-associating sub-field further has a sequence of L1-associating sub-sub-fields corresponding to respective L3 sub-sub-lines
existing in the L1 cache. Each LPF sub-sub-field has an L1 presence flag bit to indicate if the respective L3 sub-sub-line was copied into the L1 cache, and also has L1 set-associative flag bits indicating its set-associative location in the addressed L1 congruence class. Hence each L3 sub-sub-field represents a multiplicity of L3 sub-sub-lines, any one of which is copiable into a line location in the L1 cache. If the L3 sub-sub-field settings indicate a requested sub-sub-line is not in L1 but is in L2, the sub-sub-line is fetched from the L2 cache, and not from the L3 cache, because this is the fastest way to access the requested data for the CPU. That is, the copying into L2 from L3 may have been done at an earlier time for a different cache miss. However if the L3 sub-sub-field settings indicate a requested L3 sub-sub-line is not in L1 or L2, the sub-sub-line is fetched from the L3 cache into both the L2 and L1 caches.
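
Sketched as nested structures (with assumed counts of two sub-lines per line at each step), the three-level LPF described above would look roughly like this; the field names and counts are illustrative only and are not taken from the patent.

```c
#include <stdint.h>

#define L3_SUBLINES    2              /* L2-line-sized pieces of one L3 line (assumed)      */
#define L3_SUBSUBLINES 2              /* L1-line-sized pieces of one L3 sub-line (assumed)  */

struct l1_part {                      /* one L3 sub-sub-line as it may appear in L1 */
    uint8_t present;                  /* copied into the L1 cache?                  */
    uint8_t l1_way;                   /* its set-associative position in the L1 congruence class */
};

struct l2_part {                      /* one L3 sub-line as it may appear in L2 */
    uint8_t present;                  /* copied into the L2 cache?              */
    uint8_t l2_way;                   /* its set-associative position in the L2 congruence class */
    struct l1_part sub[L3_SUBSUBLINES];
};

struct lpf_three_level {              /* the LPF carried by one common-directory entry */
    struct l2_part part[L3_SUBLINES];
};
```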

The number of private cache levels may be theoretically increased by any amount in the above described manner. The complexity of the LPF accordingly increases exponentially as the number of private cache levels increases.

Thus this invention allows the CPU to access its cache using switchable logical addresses while TLB translation accessing is being done in parallel for the same CPU
requests.

A synonym and/or cross-interrogation directory (S/XI) arrangement is provided for proper functioning of the common directory cache system. A synonym (S) directory is provided for each CPU, and it also can function as a cross-interrogation (XI) directory when the CPU is provided in a multiprocessor system. Upon a TLB miss the real/absolute address outputted from the TLB is used to address the S/XI
directory to locate a congruence class, which may contain set-associative entries. Each entry in the S/XI directory corresponds to an entry in the common directory, but the (S/XI) congruence classes do not correspond to the cache directory congruence classes because the cache congruence classes are mapped by logical addresses while the S/XI
directory congruence classes are mapped by real/absolute addresses.

When the LA and real/absolute address are related (as is the case with cache misses), the different congruence classes in the common directory (mapped with LAs) and in the (S/XI) directory (mapped with real/absolute addresses) can be found in the different directories. Then any directly related set associative entry in each of these directories is located by a set-associative comparison in the respectively addressed congruence class using the high-order bits of the respective address.

However there are situations where the absolute address is known, but the corresponding LA is not known, or vice versa. This is the case with synonyms and XI
requests. The bin number concept in this invention is used to solve this problem. (The bin number concept is also used to solve the set associative entry location problem occurring with cast outs from the L1 to L2 cache, in which no real/absolute address was involved with the LA being used and the bin numbers in the found L2 entry are used to locate the L2 sublines corresponding to the castout L1 entry.) A bin number is provided in each (S/XI) entry in order to locate the required set-associative entry having an S or XI hit in the (S/XI) directory for locating the required L2 line, after a (S/XI) hit entry has been found using a real/absolute address in any S/XI directory. An intermediate field of the logical address and the bin number field in the S/XI entry then defines the congruence class and the set associative location where the required entry exists in the common directory of the CPU associated with the respective S/XI directory having the S or XI hit. The bin number and the CPU identification for the hit S/XI
directory can then find the correct L2 set-associative entry in the common directory, from which its LPF can be used to locate the required entry(s) in the L1 cache for an XI-induced castout. Then its LPF bits are examined to locate the data line in the next lower level cache that contains the requested data.
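
A rough sketch of that mapping follows (it is not taken from the patent). The choice of what an S/XI entry stores here (an absolute-page tag, an LA-derived congruence-class field, and the bin number) and all sizes are assumptions used only to illustrate how a real/absolute address is turned back into a common-directory location.

```c
#include <stdint.h>
#include <stdbool.h>

#define SXI_SETS 1024                 /* assumed geometry */
#define SXI_WAYS 4

struct sxi_entry {
    bool     valid;
    uint32_t abs_page;                /* tag compared with the translated address              */
    uint32_t la_class;                /* congruence class of the entry in the common directory */
    uint8_t  bin;                     /* set-associative position (A..D) in that class         */
};

static struct sxi_entry sxi[SXI_SETS][SXI_WAYS];

/* On a synonym or cross-interrogate search, the S/XI directory is addressed by
 * the real/absolute address; a hit returns the information needed to locate
 * the one L2 entry in the LA-addressed common directory. */
bool sxi_lookup(uint32_t abs_addr, uint32_t *la_class, uint8_t *bin)
{
    uint32_t set  = (abs_addr >> 8) & (SXI_SETS - 1);
    uint32_t page = abs_addr >> 12;

    for (int w = 0; w < SXI_WAYS; w++) {
        struct sxi_entry *e = &sxi[set][w];
        if (e->valid && e->abs_page == page) {
            *la_class = e->la_class;  /* names the congruence class in the common directory       */
            *bin      = e->bin;       /* names the set-associative entry; its LPF finds L1 copies */
            return true;
        }
    }
    return false;
}
```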

The distinction between a synonym hit and a cross-interrogate hit is determined by whether a hit occurs in the S/XI directory associated with the CPU making the particular request or with the S/XI directory associated with another CPU. It is a synonym hit if it occurs in the S/XI directory associated with the CPU making the particular request, and it is a cross-interrogate hit if the hit occurs in an S/XI directory associated with another CPU. In a uniprocessor there is only one S/XI directory, which acts as a synonym directory for CPU requests, but acts as an XI directory for channel requests.

In the case of a synonym hit in the S/XI directory associated with the CPU originating the request, an L1 discrimination bit in the requesting LA (e.g. bit 24) locates the particular L1 entry in the found L2 line. In the case of a cross-interrogate hit in an S/XI directory, an invalidation or castout is required of all hit L1 lines in the found L2 line in the cache of the CPU receiving the hit; invalidation occurs when an unchanged line held in its cache readonly or exclusively is hit by an exclusive request, and a cast out occurs when a changed line is hit by an exclusive request. No invalidation or castout is required of an L1 line hit by a readonly request, regardless of whether the L1 line was held in its cache readonly or exclusively; but if it was held exclusively, it is changed to readonly state.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGURES 1, 2, 3 and 4 illustrate different parts of the preferred embodiment.

FIGURE 5 shows detailed structure for any cell in the common directory shown in FIGURE 3.

FIGURE 6 shows the structure for the L1 control array shown in FIGURE 3.

FIGURE 7 shows detailed structure for any cell in the L1 control array shown in FIGURE 3 or 6.

FIGURE 8 shows a general structure for the L1 cache shown in FIGURE 3.

FIGURE 9 shows a general structure for the L2 cache shown in FIGURE 3.

FIGURE 10 shows the general structure for the translation lookaside buffer (TLB) shown in FIGURE 3.

FIGURES 11 and 12 provide a diagram of another embodiment of the invention.

FIGURE 13 shows a general arrangement of the unique common cache directory found in FIGURE 12.

FIGURE 14 illustrates an example of any cell in the common cache directory found in FIGURES 12 and 13.

FIGURE 15 shows the general structure for the translation lookaside buffer (TLB) shown in FIGURES 3 and 12.

FIGURES 16A and 16B show a synonym/cross-interrogation (S/XI) directory arrangement used by the embodiments in a multiprocessor environment.

FIGURE 17 illustrates synonym/cross-interrogation (S/XI) response controls found with each CPU used by the embodiments in a multiprocessor environment.

DETAILED DESCRIPTION

In this specification abbreviations are used to save space and reading time. The following is an index of abbreviations and their definitions:

Abbreviation and Definition Index

AA: Absolute Address. The AA is formed from an RA by the CPU with prefixing hardware. AAs are used in multiprocessors.

ACF: Address Control Field. The ACF is a CPU
provided field switched between a zero value when the CPU
is requesting a real address, and a non-zero STO value when the CPU is requesting a virtual address.


Address Concatenation. The expression "STO(5:19)+LA(1:11)" means, for example, that the 15 STO
bits are concatenated with the 11 LA bits to form a 26 bit binary number.

AG: Address Generation. Address arithmetic, usually summing a base, index and displacement of an operand address to generate an effective address.

BCE: Buffer Control Element. The BCE is that portion of a CPU which contains the cache arrays, their directories, the TLB (or DLAT) and their control logic.

Bin#: A field in each L1 Control Array entry corresponding to an L1 entry in the common cache directory, for finding the set-associative location (A, B, C or D) in the common directory containing that part of an L2 line containing the L1 line. The bin# is used for controlling castouts of changed L1 lines to the L2 cache, e.g. after an L1 cache miss or a cross-interrogate request from another CPU.

Cache: A high speed buffer physically located in close proximity to a CPU for storing "lines of data"
containing instructions and/or operands most recently fetched from main memory. A line (or block) fetched into the cache will include a number of instructions or operands in the immediate address proximity of the instruction or operand requested from main memory by the CPU. A "private cache" is dedicated to use by one CPU, except for cross-interrogate requests in a multiprocessor system.

C/O: Cast-Out. A cast-out line from a cache.

CMP: Compare. It is used to designate hardware compare circuits.



DAT: Dynamic Address Translation. DAT is turned on/off by the setting of bit number 5 in the Program Status Word.

LA: Logical Address. Any address provided by a CPU, whether an RA (i.e. untranslatable) or a VA (i.e.
translatable), which may be controlled by the state of the DAT bit in the PSW.

LPF: Line Presence Field. A field in each common directory entry indicating the L1 cache location represented by the entry. The directory entry basically indicates the L2 cache location containing the L2 data line having that L1 line as one of its parts.

LRU: Least recently used, or partitioned least recently used (PLRU), circuits. The LRU and PLRU
algorithms determine which line of data in a cache is to be "cast out" to a higher level in the hierarchy, in order to make space for a new line not currently in the cache directory. The cache directory entry of the castout line is invalidated, and that entry may be reassigned for a new line to be put in the cache.

RA: Real Address. With DAT off the CPU provides RAs, which do not use translation; with DAT on the CPU provides VAs, which are translated to generate RAs.

SA: Storage Address. It is the address issued by a CPU for an operand or instruction in the main memory of a system.
STO: Segment Table Origin. The STO bits are obtained from CR1 or CR7 (bits 5-19) for primary and secondary storage mode.


TRAD: Translated Address. It is obtained from DAT or a TLB as a result of a current or previous DAT operation.

TLB: Translation Lookaside Buffer (sometimes called DLAT: Directory Look-Aside Table).

UTRAD: Untranslated address. It is the effective address requested by a CPU, and it may be a VA or an RA.

VA: Virtual Address. The CPU generates VAs with DAT
on.

A major cause of loss of performance in general purpose data processing systems is the so called "storage penalty". This occurs in the normal execution of a program when the access time to main memory (also called main storage) is substantially longer than a few machine cycles in order to fetch instructions and/or operand data from main memory. The storage penalty becomes increasingly costly in general purpose computer systems employing virtual storage, because several main storage references (and therefore several invocations of the storage penalty) are required to access tables which are used to perform the virtual to real dynamic address translation (DAT). After DAT, the resulting real address (RA) becomes known to the CPU and the store/fetch operation in main memory may then proceed.

Computer architects have evolved a number of mechanisms to substantially reduce the storage penalty.
Three commonly used mechanisms for reducing the storage penalty are caches, their cache directories, and TLBs.
These mechanisms rely on a well established principle called the "locality of reference" principle. Stated simply, once an element of data (instruction or operand) is requested from memory, a line of data is accessed containing that element (and other elements of data in the immediate address proximity to the referenced element). The other elements have a high probability of being referenced in the immediate future by the CPU.

Once a data line is entered into a cache from main memory, subsequent references to data elements contained within the line are accessed from the cache, thereby avoiding the storage penalty.

Prior cache systems have been designed with more than one "level" of a storage hierarchy. An L1 (first level) and L2 (second level) cache hierarchical arrangement has been used. In such a hierarchy, the CPU first attempts to locate CPU requested data in the L1 directory (for the smaller capacity, higher speed cache). If the data is not present in the L1 cache (i.e. L1 miss), the CPU will attempt to locate the data in the L2 cache (a larger capacity, lower speed cache, compared to L1). A miss in L2 necessitates a main memory (L3) fetch. But a hit in L2 results in a storage penalty significantly smaller than the storage penalty of a fetch from L3.

Two embodiments are shown in this specification, which differ in the way the entries in the directory differentiate between translatable and untranslatable types of addresses represented therein. A first (and preferred) embodiment shown in FIGs. 1, 2 and 3 uses a zero or non-zero address control field (ACF) in each entry, in which a non-zero ACF is a STO value, to differentiate between translatable and untranslatable types of addresses.
A second embodiment shown in FIGs. 11 and 12 uses the DAT ON/OFF field in the current S/370 PSW in the CPU as a zero or one valued address control field in each entry, in addition to any STO, to differentiate between translatable and untranslatable types of addresses.

Both embodiments have a "common cache directory" (CCD) for accessing data in a two-level cache organization private to a high-performance central processor (CPU). The CCD "remembers" which "lines of data" are currently resident in both the L1 and L2 caches. The CCD contains only untranslated address bits of previously requested CPU
addresses. A "local search" within the CCD is executed in hardware to determine if a line containing requested data is in the L1 or L2 cache, and if so, the CCD generates a signal to "gate" the addressed operand to the CPU from the L1 cache (if available in L1), and if not, then from the L2 cache (if available in L2).

(The common cache directory avoids a common problem in prior cache organizations from which the CPU can request both untranslatable and translatable types of logical addresses. For example, the TLB and L1 cache previously have had their arrays addressed with bits of an untranslated logical address (virtual or real), while any L2 cache was addressed with bits of a translated logical address (real or absolute). But each level previously had its respective cache directory typically containing only translated (real/absolute) address fields. A cache hit could only be determined after a translatable logical address had its translation completed, unless the requested logical address was a real address, in which case the TLB
was bypassed. Thus the compare and select logic within the TLB was based on virtual addresses, whereas the compare and select logic within the L1 cache directory (and within any L2 cache directory) had to wait for the completion of address translation.

This prior way of requiring both translated and untranslated forms of the same address to determine a cache hit added considerable complexity to the prior hardware for "sorting out" what is virtual and what is real, and modifying the addressing paths within the cache system accordingly.) The common cache directory (CCD) of this invention uses only a single form of the switchable logical addresses, which is the untranslated form of all CPU
requested addresses (whether or not it is translatable) for accessing both the L1 and L2 caches. The common directory combines the L1 directory, the L2 directory, and some of the functions of the TLB. The requested form of each logical address (regardless of its actual type) is used uniformly in cache operations without translation, both:
(1) within the entries in the common cache directory, and (2) to address the common cache directory, the L1 cache, the L2 cache, and an L1 control array. This single form for the variable addresses leads to a simpler cache addressing structure, greater economy of hardware due to array consolidation, and higher performance due to reduced hardware in the critical cache path, thereby reducing the cache cycle time.

In FIG. 1, circuits are shown for switching the address type between untranslatable and translatable types. The address type is indicated in an address control field (ACF) 28, which is associated with a logical address register 30: a non-zero ACF value indicates a translatable LA is in register 30, and a zero ACF value indicates an untranslatable LA is in register 30.

A unique structure (in which only untranslated addresses are represented) is provided in the common cache directory of each embodiment. The elimination of address translation from cache accessing provides a fast critical path for both real and virtual address requests from the CPU to obtain L1 and L2 cache accesses. An L2 access is started simultaneously with an L1 access, and is completed if the requested data is not obtainable from the L1 cache.

FIGURE 1 shows unique hardware provided for the common directory to differentiate untranslated and translated (real and virtual) addresses. The address control field (ACF) is set to indicate whether a requested logical address in register 30 is a translated or untranslated address. ACF field 28 is a multiple bit field which is set by the output of AND gate 21, 22, 23 or 24. A zero output indicates the logical address in register 30 does not require translation (RA or AA), and a non-zero output indicates a logical address requiring translation.

In more detail, AND gates 23 and 24 receive an all zeros signal from a source 19, which may be a microcode source in the CPU. All of the gates 21-24 are controlled by one or more of the control bits in a program status word (PSW) currently in control of the CPU. They include a DAT
mode control bit 5, an extended control (EC) mode bit 12 and an address space control bit 16. The PSW and its content, including these bits, is described in the IBM
System/370 Principles of Operation (Form No. GA22-7000-8), e.g. beginning on page 3-14.

The EC mode bit 12 controls whether AND gate 24 outputs an all zero signal, which it does if bit 12 is off, indicating the basic mode (equivalent to the S/360 mode of operation). AND gate 23 provides the all zeroes output when the EC mode bit is on, which means that the system is operating with the S/370 architecture. Also, the DAT mode bit 5 conditions the operation of gate 23, due to the inverter inputting that signal to gate 23 when bit 5 is off, indicating that DAT is off, so that gate 23 outputs an all zero signal.

Gates 21 and 22 provide non-zero value signals for indicating logical addresses in register 30 which use translation; these are addresses which require the use of a segment table and a page table in main memory. Gate 22 outputs a segment table origin (STO) for locating the segment table in main memory. A STO is provided from a control register (CR1) to AND gate 22. It is enabled by: DAT mode bit 5 being on (indicating DAT is on), EC mode bit 12 being on, and address space control bit 16 being off. The AND gate 21 outputs the STO from CR7 when the DAT mode bit 5 is on and the address space control bit 16 also is on.

The output of AND gates 21-24 is dot-ORed to provide a signal to the ACF register 28, which is capable of representing a STO. This signal does not represent a STO when it is all zeroes, which indicates that the associated logical address does not use translation and hence is either a real or absolute address. The associated logical address in register 30 is the effective logical address, which is the computed form for an operand address.
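As an editorial aid, the gate logic described above can be summarized in a short C sketch. It is a minimal illustration, not part of the patent; the function and variable names are hypothetical, and the PSW bits and the STOs from CR1 and CR7 are modeled as ordinary parameters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the ACF selection performed by gates 21-24.
 * psw_dat, psw_ec and psw_as model PSW bits 5, 12 and 16; cr1_sto and
 * cr7_sto model the segment table origins in control registers 1 and 7. */
static uint32_t select_acf(bool psw_dat, bool psw_ec, bool psw_as,
                           uint32_t cr1_sto, uint32_t cr7_sto)
{
    if (!psw_ec)        /* basic (S/360-like) mode: gate 24            */
        return 0;       /* all zeroes: untranslatable LA               */
    if (!psw_dat)       /* EC mode with DAT off: gate 23               */
        return 0;       /* all zeroes: untranslatable LA               */
    if (psw_as)         /* DAT on, address space control on: gate 21   */
        return cr7_sto; /* STO from CR7                                */
    return cr1_sto;     /* DAT on, address space control off: gate 22  */
}
```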

FIG. 2 illustrates alternate inputs to LA register 30, of which the logical address input from the CPU (by in-gating bits 1-31) is used in the operation described thus far. The other inputs to register 30 are provided by synonym and cross-interrogate directory circuits that control both the L1 cache accessing of synonym entries to obtain the data requested by the local CPU, and the invalidation and cast-out of L1 cache entries at the request of another CPU.

All of fields 41-49 are gated out simultaneously to FIG. 3 within a single machine cycle to directory 60, L1 cache 63, L1 control array 61, TLB 62, L2 cache 64, LRU circuit 67 and LRU circuit 68.


FIG. 3 receives the signal outputs from FIG. 1. FIG. 3 has a single cache directory 60 that operates in common for controlling the accessing of both L1 and L2 private caches 63 and 64. Private caches have the advantage of avoiding contention between processors making storage requests.
Each entry in the common directory can represent an L2 line in a corresponding location in the L2 cache. The same entry also represents any or all parts of the L2 line that are available in the L1 cache.

Thus in FIG. 1, the addressing structure enables each untranslated CPU requested address to address in parallel and in a uniform manner the common directory 60, the L1 cache 63, the L2 cache 64, and the TLB 62. Common cache directory 60 serves as the directory for both the L1 cache 63 and the L2 cache 64 and eliminates the need for separate cache directories.

The cache examples in FIG. 3 (or FIG. 12) are:

L1 cache 63:   64K byte capacity
               4W set associative
               128 byte line
               64 byte data bus to L2 and L3 and CPU
               "store-in" cache

L2 cache 64:   1M byte capacity
               4W set associative
               256 byte line
               2 L1 lines per L2 line
               64 byte data buses to L1 and L3 and CPU
               "store-thru" cache

The embodiment shown in FIGs. 1, 2 and 3 has a common cache directory 60 in which each entry has the exemplary format shown in FIG. 5. The embodiment shown in FIGs. 11 and 12 has a directory 160 in which each entry has the exemplary format shown in FIG. 14.

In FIG. 1 the output fields from ACF register 28 and LA register 30 are provided on a plurality of buses 41-49.
The various buses 41-49 each represent a selected field having boundary bit positions indicated by two values separated by a colon symbol, except where there is only a single bit such as in the case of the LPF field 33 providing only bit 24.

The particular output fields of FIGURE 1 are provided for the first embodiment and can easily be changed to accommodate different size addresses, different size caches, or different size common directories. Thus, in FIGURE 1, the defined fields in register 30 are to accommodate the particular size arrays in directory 60, and caches 63 and 64, shown for FIGURE 3. Buses 41, 43 and 46 are used for determining L2 hits in directory 60 with its comparator circuits. Select field 32 is outputted on line 46 for addressing directory 60 in order to select a particular congruence class therein. The ACF output on lines 41 and compare field 43 are also provided to the compare section of common directory 60 to select an entry (A, B, C, or D), if any, in the congruence class for determining an L2 hit in the directory. The LPF select field 33 is outputted on lines 49 to directory 60 for determining if there is an L1 hit when an L2 hit is determined.
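The field routing just described can be pictured with a small C sketch. Only the bit ranges named in the text are used (LA 1:13 for the directory compare, LA 12:19 for the TLB congruence class, LA 18:25 for the L1 congruence class, and LA bit 24 for LPF part selection); the structure and helper names are hypothetical, and the directory select field 32 is omitted because its exact bit range is not reproduced here.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of the 32-bit
 * logical address word; this helper extracts bits i:j inclusive. */
static uint32_t la_bits(uint32_t la, unsigned i, unsigned j)
{
    return (la >> (31u - j)) & ((1u << (j - i + 1u)) - 1u);
}

/* Hypothetical grouping of the bus fields named in the text. */
struct request_fields {
    uint32_t acf;        /* bus 41: ACF value (zero, or a STO)             */
    uint32_t cmp_1_13;   /* bus 43: LA 1:13 compare field                  */
    uint32_t tlb_12_19;  /* bus 44: TLB congruence class select            */
    uint32_t l1_18_25;   /* bus 47: L1 cache congruence class select       */
    uint32_t lpf_sel_24; /* bus 49: LA bit 24, selects LPF part 1 or 2     */
};

static struct request_fields split_request(uint32_t acf, uint32_t la)
{
    struct request_fields f;
    f.acf        = acf;
    f.cmp_1_13   = la_bits(la, 1, 13);
    f.tlb_12_19  = la_bits(la, 12, 19);
    f.l1_18_25   = la_bits(la, 18, 25);
    f.lpf_sel_24 = la_bits(la, 24, 24);
    return f;
}
```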

FIG. 5 illustrates each combined directory entry, which has the following fields (the values in parentheses are the number of bit positions that a field may occupy in the embodiment):


LRU field: represents the LRU state of the respective L2 line in its congruence class in the L2 cache represented by the respective entry.

I field: represents the invalid/valid state of the respective L2 line represented by the respective entry.

EX field: represents the exclusive/readonly state of the respective L2 line represented by the entry in the L2 cache.

CH field: represents the changed/not changed state of the respective L2 line represented by the entry in the L2 cache.

CM field: represents whether the respective L2 line represented by the entry in the L2 cache is in a page in a common or private virtual address space.

ACF field: has a zero value when the entry represents a real address, and is a non-zero STO value when the entry represents a virtual address. The ACF field was derived from the CPU request from which the entry was generated and is used by the directory compare circuits.

LA field: contains high-order bits of the logical address from which the entry was generated for use by the directory compare circuits.

LPF field: line presence field in each directory entry has 6 bits that indicate which, if any, parts of the L2 line are available in the L1 cache. The bit positions in each LPF in this embodiment are defined as follows:

BIT         BIT
POSITION    NAME    LPF FUNCTION
1           P       Presence bit for part 1 of L2 line.
2           b1      b1 & b2 encode the position of part 1
3           b2      of the L2 line in the L1 cache.
4           P       Presence bit for part 2 of L2 line.
5           b1      b1 & b2 encode the position of part 2
6           b2      of the L2 line in the L1 cache.

The unit of data selected for "exclusive" allocation and control is an L2 line in both embodiments.
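For orientation, the FIG. 5 entry layout listed above can be modeled as a C structure. This is only an illustrative model: the field names are hypothetical and the widths are placeholders, since the exact bit counts are not reproduced in this text.

```c
#include <stdint.h>

/* Hypothetical C view of one common-directory entry (FIG. 5). */
struct common_dir_entry {
    uint8_t  lru;       /* L2 LRU state within the congruence class       */
    uint8_t  invalid;   /* I: invalid/valid state of the L2 line          */
    uint8_t  exclusive; /* EX: exclusive/readonly state of the L2 line    */
    uint8_t  changed;   /* CH: changed/not-changed state of the L2 line   */
    uint8_t  common;    /* CM: common vs. private virtual address space   */
    uint32_t acf;       /* zero = real address, non-zero STO = virtual    */
    uint32_t la_hi;     /* high-order logical address bits (LA 1:13)      */
    uint8_t  lpf;       /* 6-bit line presence field: P, b1, b2 twice     */
};
```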

The least recently used (LRU) circuit 67 generates the L2 LRU field content in each directory entry accessed in each of the described embodiments. The L2 LRU fields are updated in all entries in an L2 congruence class when any entry in the class is accessed. When L2 misses occur in the directory, new entries are made in it to represent a new data line put into the L2 cache. During a cache miss, all the fields in the selected congruence class are examined to find which entry to assign to the new entry to be generated for the missed CPU request. In this manner, the LRU algorithm determines which directory entry (and its corresponding line of data in L2) may need to be "cast out" to main memory (L3) in the hierarchy, in order to make space for a new entry before the content of the new entry can be written into the directory. (A cast-out from the L2 cache is required only when L2 is operated as a store-in cache. If L2 is a store-thru cache, then no cast-out is required after the LRU designation for the new entry.)
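A minimal sketch of the replacement choice described above follows, reusing the common_dir_entry model from the previous sketch. The per-entry age encoding (0 = most recently used, 3 = least recently used) is an assumption made only for illustration, not the patent's actual LRU implementation.

```c
/* Pick the entry to replace in a 4-way congruence class on an L2 miss. */
static int pick_l2_victim(const struct common_dir_entry cls[4])
{
    int victim = 0;
    for (int i = 0; i < 4; i++) {
        if (cls[i].invalid)              /* prefer an invalid slot        */
            return i;
        if (cls[i].lru > cls[victim].lru)
            victim = i;                  /* else take least recently used */
    }
    return victim;                       /* its L2 line may need cast-out */
}
```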

FIG. 4 represents the common cache directory 60 and shows an example of a selected one of its 4-way set associative congruence classes. The compares in directory 60 are illustrated as four comparator circuits 71A-D for respectively comparing each of the four entries in the selected congruence class with the ACF field and LA
field 1:13. Each compare circuit 71 comprises sub-compare circuits 72 and 73 which are combined in AND circuit 74.
When AND circuit 74 provides an output signal, it indicates an equal comparison that determines the respective entry has a cache hit. When AND circuit 74 does not provide an output signal in response to a congruence class selection, it indicates an unequal comparison that determines the respective entry does not have a cache hit. If the ACF value is zero (meaning that the CPU request is a non-translatable address), a directory entry must have a zero ACF value to be able to provide an equal comparison. If the ACF value is non-zero (meaning that the ACF is a STO and the CPU request is a translatable address), a directory entry must have the same non-zero ACF value to be able to provide an equal comparison. The outputs of AND circuits 74A-D signal if there is an L2 cache hit in the selected congruence class. An L2 miss signal is generated in FIG. 4 by an AND circuit 80 which receives the inverted outputs of AND circuits 74A-D.

In this manner, each of the cache directory compare circuits automatically operates under control of the current ACF signal to determine within the comparison whether it occurred between: (1) a translatable LA and an untranslatable entry-represented address (UERAD), (2) an untranslatable LA and a translatable entry-represented address (TERAD), (3) an untranslatable LA and an UERAD, or (4) a translatable LA and a TERAD. This invention requires that comparison (1) or (2) always be declared an unequal comparison, even though the compared LA values are equal.
Only comparison (3) or (4) may be declared an equal comparison when the compared LA values are equal.
Accordingly an L2 cache miss is determined whenever the requested LA and each of the TERADs in the selected congruence class have different translation characteristics, regardless of whether any of their TERADs equal the requested LA.
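The hit rule just stated (equal LA bits alone are not enough; the translation characteristic carried in the ACF must also match) can be illustrated with a hedged C sketch, reusing the common_dir_entry and request_fields models from the earlier sketches. Names are hypothetical.

```c
/* Sketch of compare circuits 71A-D and the L2 hit/miss decision. */
static int l2_hit_way(const struct common_dir_entry cls[4],
                      const struct request_fields *req)
{
    for (int way = 0; way < 4; way++) {
        if (cls[way].invalid)
            continue;
        if (cls[way].acf == req->acf &&          /* sub-compare 72        */
            cls[way].la_hi == req->cmp_1_13)     /* sub-compare 73        */
            return way;                          /* AND circuit 74: hit   */
    }
    return -1;                                   /* AND circuit 80: miss  */
}
```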

Whenever an L2 cache hit is determined, the directory 60 must determine whether an L1 cache hit exists. In FIG. 3, an L1 cache hit is determined by operation of the L1 hit determination circuitry 75 when an L2 hit is determined by the output of its respective AND circuit 74. Circuitry 75 uses the LPF select field 33 from FIG. 1, which is signalled on line 49, and uses it in the directory in FIG. 4 to locate the part of the LPF needed for determining if any L1 cache hit exists. If an L1 hit exists, a signal is outputted on one of four L1 cache hit signal lines A, B, C or D to select the correct data line of the four data lines in the congruence class currently being addressed in L1 cache 63 by the LA bits 18:25 being provided on signal lines 47. The activated one of the four data lines A, B, C, & D in the required L1 congruence class is thereby selected, and the requested bus unit (e.g. four quadwords) is outgated as the requested L1 data on the data bus to the CPU.

Thus in FIG. 4, each circuit 75 comprises a pair of gates 76, 77 and a decode circuit 78. Gate 76 is enabled by the first (leftmost) bit P when it is in a one state to select and pass the first set of LPF bits b1 and b2 to decode circuit 78. Gate 77 is enabled by the second (rightmost) bit P when it is in a one state to select and pass the second set of LPF bits b1 and b2 to decode circuit 78. Decode circuit 78 combinatorially decodes its received bits b1 and b2 to determine the particular L1 data line of the four (A, B, C, or D) in the selected congruence class in the L1 cache.
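A small C sketch of this LPF selection and decode follows. The packing of the six LPF bits into one byte is an assumption made only for illustration; the patent defines the bits positionally, not as a particular binary layout.

```c
#include <stdint.h>

/* Sketch of L1 hit circuitry 75 (gates 76/77 and decoder 78).
 * Assumed LPF layout: bit0 P1, bit1 b1, bit2 b2, bit3 P2, bit4 b1, bit5 b2.
 * Returns the L1 set (0..3 = A..D) or -1 for an L1 miss. */
static int l1_hit_set(uint8_t lpf, unsigned la_bit_24)
{
    unsigned shift = la_bit_24 ? 3u : 0u;   /* pick LPF part 1 or part 2 */
    unsigned p  = (lpf >> shift) & 1u;
    unsigned b1 = (lpf >> (shift + 1u)) & 1u;
    unsigned b2 = (lpf >> (shift + 2u)) & 1u;
    if (!p)
        return -1;                          /* part not present in L1     */
    return (int)((b1 << 1) | b2);           /* decode 78: set A..D        */
}
```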


If the compare operations of directory 60 find only unequal comparisons for the directory entries in the selected congruence class, there is no L2 cache hit; and an L2 cache miss is declared. It is signaled to the output gate of TLB 62 in FIG. 3 to control the outputting from TLB
62 to main memory of an RA (in a UP) or an AA (in an MP) for fetching the missed L2 line. An L1 miss signal is generated in FIG. 3 by an AND circuit 81 which receives the inverted outputs of decoders 78.

FIG. 8 represents the L1 cache and its data line select circuitry. Each L1 data cell can contain a data line having 64 bytes in this example. Its cell select gates 82 select one cell in the selected congruence class by enablement of one of its input lines A, B, C or D, which are activated by the output of one of OR circuits 86 A, B, C or D, that are enabled by the output of either an L1 hit select signal A, B, C or D, or a castout selected bin number from the L1 control array 61 in FIGs. 3 and 6. The cell select gates 82 provide two types of outputs from the selected data line; they are the entire line and the data unit in the line being requested by the CPU. The entire data line is needed when it is being castout to the L2 cache and the L3 main memory. This transfer occurs as two contiguous 64 byte blocks, each requiring a separate L1 cycle using LA bits 18:25. The data unit in the selected data line is needed when it is being requested by the CPU, and it is addressed within the selected data line by LA bit 25 for being outputted on the data bus to the CPU.

FIG. 9 represents the L2 cache and its L2 data line select circuitry. Each L2 data cell can contain a bus unit of 64 bytes in this example. Its cell select gates 83 select one cell in the selected congruence class by enablement of one of its input lines A, B, C or D, which are activated by the output of one of OR circuits 87 A, B, C or D, that are enabled by the output of either an L2 hit select signal A, B, C or D from FIG. 4, or a castout LRU select signal from the L2 LRU circuit 67 in FIG. 3. The cell select gates 83 provide two types of outputs from the selected data line; they are the entire line and the data unit in the line being requested by the CPU. The entire data line is needed when it is being castout to the L3 main memory. (A cast-out from the L2 cache is required only when L2 is operated as a store-in cache. If L2 is a store-thru cache, then no cast-out is required after the LRU designation for the new entry.) The L1 part of the selected data line needs to be addressed within the selected L2 data line by LA bit 24 for receiving any L1 line being castout of the L1 cache on the data bus to the L2 cache. The L2 transfers occur in bus units of 64 bytes.

In FIG. 3, no output from Translation Lookaside Buffer (TLB) 62 is needed for cache accessing until an L2 cache miss occurs. The TLB simultaneously stores the page frame addresses of both "real addresses" (or "absolute addresses") and "virtual addresses" that have been recently requested by the CPU. The TLB congruence class is addressed by bit positions 12 to 19 (12:19) of each CPU
requested logical address, and all entries in that class are compared with the ACF and with LA 1:11. If they compare equal, the requested address is contained in the TLB; then its page frame real address is immediately known to the CPU from the TLB for accessing main memory, and a "long path" DAT wait cycle is avoided.

The TLB generates an entry for each requested LA address, regardless of whether it is a translatable LA or an untranslatable LA (i.e. VA or AA). In this regard, TLB 62 is not a true TLB, because a true TLB only contains translatable addresses (i.e. VAs). That is, the LA field in any entry in TLB 62 may contain either a VA or RA. But the AA fields in all entries in the TLB always contain only RAs or AAs, according to whether the TLB is in a UP or MP, respectively.


The TLB array stores, for each entry, a portion of the effective logical address, the STO, etc., as is conventionally required for TLB operation. Each valid TLB
entry also contains the page Absolute Address (AA).

The LA bits 12:19 on bus 44 from FIG. 1 select a TLB
congruence class. Also, bus 41 provides the ACF field to the TLB 62 to do a comparison between the signaled ACF and the ACF-representation in each TLB entry. Like the cache directory compare circuits, the TLB compare circuits automatically operate under control of the ACF signals to control the comparison according to whether it is between: (1) an untranslatable LA and a translatable entry-represented address (TERAD), (2) a translatable LA and an untranslatable entry-represented address (UERAD), (3) an untranslatable LA and an UERAD, or (4) a translatable LA and a TERAD. This invention requires that an unequal comparison always be declared for cases (1) and (2), even though the compared LA values are equal. Only comparison (3) or (4) may be declared an equal comparison when the compared LA values are equal. Accordingly a TLB hit is determined only for an equal condition found under the process of this invention, which occurs only when the requested LA and a TERAD in the selected congruence class have the same translation characteristics and the TERAD equals the requested LA in register 30. And a TLB miss is determined under the process of this invention when the requested LA and each of the TERADs in the selected congruence class have different translation characteristics, even though their TERADs equal the requested LA in register 30.
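The TLB compare rule parallels the directory rule and can be sketched in C as below. The entry layout and the names are hypothetical; only the compared quantities (the ACF and the high-order LA bits) and the two-way organization come from the text.

```c
#include <stdint.h>

/* Hypothetical model of one TLB 62 entry. */
struct tlb_entry {
    uint8_t  valid;
    uint32_t acf;      /* zero = untranslatable, non-zero = STO          */
    uint32_t la_1_11;  /* high-order logical address bits                */
    uint32_t aa;       /* page frame real/absolute address               */
};

/* Two-way lookup in the congruence class selected by LA 12:19.
 * Returns 1 on a TLB hit and writes the AA, 0 on a TLB miss. */
static int tlb_lookup(const struct tlb_entry set[2],
                      uint32_t req_acf, uint32_t req_la_1_11,
                      uint32_t *aa_out)
{
    for (int i = 0; i < 2; i++) {
        if (set[i].valid &&
            set[i].acf == req_acf &&          /* same translation type   */
            set[i].la_1_11 == req_la_1_11) {  /* same LA representation  */
            *aa_out = set[i].aa;              /* gate 84A or 84B         */
            return 1;
        }
    }
    return 0;                                 /* miss (AND circuit 87)   */
}
```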

Upon an L2 cache miss, the TLB is caused to output an address (an RA or AA), which is sent to main memory L3 to fetch the missing L2 data line. Before the TLB can do this, it must contain an entry for the requested LA. It is searched using the LA to find a congruence class, and using the ACF and high-order LA bits for examining its entries. If all entries in the selected TLB congruence class cause an unequal comparison, the requested LA does not have any entry in the TLB, and a TLB miss is declared. Then a TLB
entry is generated for the requested LA. But the requested LA requires translation only if it is a VA, and then the LA
is sent to DAT circuitry in the CPU to perform the translation, in which case the CPU must wait for this translation to complete the entry and have the translated address with which the required L2 data line can be fetched from main memory. However, the CPU can operate in parallel with TLB operation if the CPU has another address to request in another L2 cache data line, while the L3 fetch is being made.

FIG. 10 shows more detail for the TLB circuit used in the embodiment of FIG. 3. It is two-way set associative, and one of its congruence classes is selected by the current LA bits 12:19. It has cell select compare circuits 83A and B respectively receiving the two cells outputted from the selected congruence class. Each circuit 83 is internally identical to each circuit 71 in FIG. 4, and each circuit 83 detects compare-equal inputs from its cell and from the current CPU request by providing an output signal to its respective gate 84A or B to enable it to pass the absolute address (AA) from its cell to an address bus 86 to main memory L3. If the respective circuit 83 receives unequal inputs, it does not enable its gate 84. At most, only one of gates 84A and B can output an AA to bus 86.

A TLB miss is detected by an AND circuit 87, which receives the inverted inputs from the outputs of cell select compare circuits 83A and B. Each TLB miss is provided by AND circuit 87 to a gate 88, which is enabled by any DAT ON signal from PSW bit 5 in FIG. 1 to thereby pass a current virtual address to DAT circuits 81 in FIG. 10 for translating it to an AA. A TLB LRU circuit 90 enables a respective write circuit 82A or 82B to write the newly generated AA in a TLB cell assigned by the LRU circuit to receive a new entry for the VA that missed in the TLB.

No TLB output is required when the directory 60 has found an L2 cache hit, even though directory 60 also may have found an L1 cache miss. The L1 miss condition is determined by the L1 hit circuitry 75 whenever the set-associative comparison operation for a selected congruence class in directory 60 finds an L2 entry having a compare-equal status (i.e. L2 hit). Circuitry 75 uses: (1) the LPF field in the found L2 entry, and (2) LA bit 24 from the current logical address in order to determine if any part of the corresponding L2 line (now known to be available in the L2 cache 64) is present in the L1 cache. If LA bit 24 is zero, the first part of the LPF field is selected. If LA bit 24 is one, the second part of the LPF field is selected. In the selected part of the LPF field, the state of the P bit is tested: if the P bit is one, the requested data is in L1, and an L1 hit is declared. The LPF bits b1 and b2 are combinatorially examined to determine which L1 entry (A, B, C or D) is the L1 hit. The L1 cache hit is then signalled on one of four lines A, B, C, or D to L1 cache 63 to enable accessing the correct L1 data line, which is located by address bits 18:25, and the requested data is obtained from it and outgated to the CPU.

When the P bit is zero and an L1 miss is declared (i.e. an L2 hit and L1 miss situation), the L1 line is fetched from the current L2 hit line and is copied into the L1 cache, while the requested data in the line is being sent to the CPU. The part of the L2 line comprising the required L1 line is located by the LA bits 14:24, and the requested data in that part (i.e. 64 bytes) is located in L1 cache 63 by the LA bits 18:25.


Essential supplementary information about the L1 entry being generated is stored in an L1 control array 61, which is done when the CPU misses in L1. Whenever an L1 entry is selected by the L1 LRU to make space for a new L1 entry, any L1 entry existing in that space (which has the change bit set on) must be castout to L2, and the logical address (LA) field in the L1 control array 61 is needed to locate the corresponding L2 entry. The L2 address is formed from LA[14:17] from the control array and LA[18:25] from the requesting address.

Since the L1 cache is a store-in cache, and its CPU
may be in an MP, there will be times when the activity of another processor (e.g. channel or another CPU) will provide a foreign request that may require the invalidation or castout of a changed L1 data line in the L1 cache. Such a request from an external source does not know the location of any corresponding entry in the L1 or L2 cache.
The L1 control array 61 (L1CA) in FIGS. 3 and 6 supports several functions in relation to L1 cache 63, including normal L1 cache flag indications. Each request requiring an L1 line invalidation or castout, from L2 or from the L1 and L2 caches, must send the required LA bits of the request.

The location of each entry in the L1 control array 61 corresponds to a like-located L1 line in the L1 cache. The content of the L1CA entry represents the state of the corresponding data line in the L1 cache. The content of each L1CA entry is shown in FIG. 7 to contain the following fields (the values in parentheses are the number of bit positions that the field may occupy in the embodiment):

I field: represents the invalid/valid state of the respective L1 line.


EX field: represents the exclusive/readonly state of the respective L1 line represented by the entry.

C/O LA field: contains bits 12:17 of the logical address in the address being generated. These bits are concatenated with bits 18:23 in the requesting address for locating the congruence class in the common cache directory and the TLB.

Bin# field: finds the associative set (A, B, C or D) in the selected common directory congruence class. The corresponding L1 and L2 cache entries can be in any associative set in either cache. That is, the bin#
indicates the associative set in L2 for a line in L1.

CH field: represents the changed/not changed state of the respective L1 line represented by the entry. Only changed lines are castout of any cache.

LRU field: represents the least recently used (LRU) entry of the associative sets A, B, C, and D for the four entries in each congruence class in the L1 cache.
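As with the directory entry, the FIG. 7 fields listed above can be modeled by a hypothetical C structure for reference; the field widths are illustrative only.

```c
#include <stdint.h>

/* Hypothetical C view of one L1 control array (L1CA) entry (FIG. 7). */
struct l1ca_entry {
    uint8_t invalid;     /* I: invalid/valid state of the L1 line          */
    uint8_t exclusive;   /* EX: exclusive/readonly state of the L1 line    */
    uint8_t changed;     /* CH: only changed lines are cast out            */
    uint8_t co_la_12_17; /* C/O LA: LA 12:17 for locating the L2/TLB class */
    uint8_t bin;         /* bin#: associative set (A..D) of the line in L2 */
    uint8_t lru;         /* L1 LRU state within the L1 congruence class    */
};
```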

In a uniprocessor (UP), the local CPU is the only CPU
in the system, although there are usually other processors in the form of channel processors. In a multiprocessor (MP), there are other CPU(s) in addition to the local CPU, and each respective CPU may have one or more channel processors or none.

When the local CPU makes a request which misses in the common directory 60, the L1 LRU locates a corresponding L1CA entry in the selected congruence class in L1CA and the L2 LRU locates an entry in the selected congruence class in the directory 60. The fields in the accessed L1CA entry are generated when the corresponding common directory entry is generated in the directory 60. In both arrays, the corresponding invalid (I) fields are then set off, the corresponding change (CH) fields are set on whenever the L1 line receives a write access, the corresponding exclusive/readonly bits (EX) are set according to the L1 line request type (if any L1 line is exclusive in an L2 line, the entire L2 line is set to the exclusive state), the high-order LA bits 12:17 are set in L1CA for later use in finding the corresponding entries in the L2 cache and the TLB, the bin number of the corresponding L1 cache location is inserted into the common directory entry, and the L1 and L2 LRU fields controlling the congruence classes containing these entries are updated by the respective LRU circuitry 67 and 68 to control the selection of a candidate for replacement of the next entry in the respective congruence class.

Thus, the L1 control array (L1CA) 61 enables any changed entry in the L1 cache to be castout to the L2 cache at the correct L2 location, such as when there is no invalid entry available and the L1 LRU circuitry must choose one of the valid entries, in which case the LRU selects the least recently used entry in the same congruence class and causes it to be castout. This is done by first storing the content of the reassigned L1 cache entry in its corresponding location in the L2 cache at a set-associative location determined by the bin number (bin#), and the L2 congruence-class locating field in the L1CA entry originally obtained from the bits 12:17 in the corresponding LA address.

An L1 LRU entry assignment is required when there is an L1 cache miss and all set-associative entries are valid (with previously-written entries) in the addressed congruence class. This will cause an L1 castout from an LRU selected entry, if it represents a line with changed data.
No cast out is needed if the represented data is unchanged.

An LRU L1 castout will be to a corresponding entry in a valid L2 cache line, and to main memory L3 when L2 is a store-thru cache. In that case, the TLB is addressed by LA [12:17] from the control array and LA [18:19] from the requested address. The directory entry at the castout location gives the ACF and LA [1:11] fields needed to complete the TLB compare to determine the L3 address (AA or RA) for the castout line. The new request (causing the current L1 miss) will have a new entry written into the LRU freed entry. That is, a new entry is to be written in L1 at the LRU assigned location. However, before the new entry can be written, any required cast out from the old entry must be made before the information in the old entry is destroyed by being written over. The invention solves this problem by storing the LA [12:17] and the bin# from the L1CA array in the entry corresponding to the LRU selected entry, which must be accessed for the cast out before the old L1CA entry is written over by a new L1CA entry for the new L1 line. The castout L1 entry can be in any set associative location in the cast-out L2 congruence class, whereas the L1 entry can be found in any set associative entries of the addressed L2 congruence class.
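The address formation for a castout, combining saved L1CA bits with low-order bits of the requesting address, can be sketched as follows, reusing la_bits and l1ca_entry from the earlier sketches. The concatenations follow the bit ranges stated in the text (LA 12:17 from the control array, LA 18:23 for the directory class and LA 18:19 for the TLB class); the packing into integers is purely illustrative.

```c
/* Hypothetical sketch of forming the castout lookup addresses. */
struct castout_lookup {
    uint32_t tlb_class; /* LA 12:17 from L1CA ++ LA 18:19 from the request */
    uint32_t dir_class; /* LA 12:17 from L1CA ++ LA 18:23 from the request */
};

static struct castout_lookup castout_address(const struct l1ca_entry *e,
                                             uint32_t req_la)
{
    struct castout_lookup out;
    uint32_t la_18_19 = la_bits(req_la, 18, 19);
    uint32_t la_18_23 = la_bits(req_la, 18, 23);
    out.tlb_class = ((uint32_t)e->co_la_12_17 << 2) | la_18_19;
    out.dir_class = ((uint32_t)e->co_la_12_17 << 6) | la_18_23;
    return out;
}
```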

Another cause of L1 cache line castout or invalidation is a cross-interrogate (XI) request (1) from another CPU for an exclusively (EX) held L2 line if the requested line is indicated to be changed by the corresponding entry in the L1CA, or (2) from a channel processor for data from a requested line.

Cross-interrogate directories are used to determine the potential need for a cache line cast-out or invalidation by another CPU. (A pertinent cross-interrogate directory is described in the IBM Technical Disclosure Bulletin, Volume 26, No. 11, April 1984, pp. 6069-6070.)

The synonym and/or cross-interrogation directory (S/XI) arrangement in FIG. 16 provides a S/XI directory associated with each CPU in the MP. Each S/XI directory is a synonym (S) directory for the common cache directory in its associated CPU. Each S/XI directory is also a cross-interrogation (XI) directory for all other CPU(s) provided in the multiprocessor system, and for all channels.

In FIG. 16, the plurality of synonym/cross-interrogate (S/XI) directories 230-1 through 230-N correspond to the respective CPUs 1-N in the MP system. In a UP system, only directory 230-1 is provided.

Upon an L2 cache miss, the real/absolute address outputted from the associated TLB is used to address all S/XI directories to locate a congruence class in each S/XI
directory 230. Each congruence class contains plural set-associative entries.

Each entry in any S/XI directory 230 has a corresponding entry in its associated common cache directory, but the S/XI congruence classes do not correspond to the cache directory congruence classes, because the cache directory congruence classes are selected by untranslated logical address bits while the S/XI congruence classes are selected by real/absolute address bits. A synonym (S) search occurs in the S/XI directory associated with a requesting CPU; and a XI search occurs in all of the other S/XI directories.

The content of each entry in each S/XI directory 230 is illustrated in FIG. 16 by the illustrated entry 231, and it contains: an absolute address, the L2 congruence-class locating field (LA [14:19]), a bin number (bin#), an exclusive indicator, an invalid indicator, and a directory LRU field. The contents of each entry (except its LRU
field) reflect the information about the common cache miss that generated the respective S/XI entry.


The output of S/XI priority circuit 211 (provided to register 212) contains the selected request's CPU ID, its absolute address, its common directory bin number, and its Exclusive bit state. Absolute address register 212 receives the priority selected output request, and CPU ID decoder 220 receives the identifier of the CPU which has the request in register 212. (In a UP, no CPU identifier is necessary since there is no other CPU in the system.) All S/XI directories are searched for the current request's absolute address bits 1:19 (and optionally for logical address bits 14:19) in register 212. Compare circuit 232 receives the (n) entries (e.g. four) in each directory's congruence class addressed by the absolute address bits 14:19 in register 212, and compares the absolute address field in each of the four entries with the absolute address bits 1:13 in register 212 to determine if any entry in any S/XI directory contains an entry having that absolute address. Comparator 232 outputs an unequal signal, or an equal signal to AND circuits 251, 252 and 253 and to gate 261.

If no entry in the requesting CPU's associated S/XI
directory is found to compare-equal with the request's absolute address, an unequal signal is provided to a write circuit 233 associated with the same S/XI directory.
Then there is no synonym entry in the requesting CPU's cache, and a new S/XI entry is written into the addressed congruence class at a set-associative location determined by the S/XI LRU field in the respective S/XI entry to represent the requested L2 line in the S/XI directory. The new entry will be used by subsequent S/XI searches to determine synonym or XI hit conditions for the new L2 line fetched into the L2 cache as a result of the current request.


(There must be no entry in the requesting CPU's associated S/XI directory found to compare-equal with both the request's absolute address and its LA, because then the entry is indicated to be in the requesting CPU's cache, and no L2 miss should have occurred; an error condition should be indicated.) A "synonym hit" is indicated by an equal signal output from AND circuit 251 when a compare-equal condition is found with the AA field (and not with the LA field) of any entry in the S/XI directory associated with the CPU having the current request.

A "XI hit" is indicated if a compare-equal condition is found in any other S/XI directory (i.e. associated with a CPU other than the requesting CPU). Thus each S/XI
directory is a synonym directory for its associated CPU;
and the same directory is a cross-interrogate directory for all other CPUs and all channels in the system.

The XI search is simultaneously done in the other S/XI
directories by AND gates 252 and 253, which receive the equal signal from comparator 232 and also receive an inverted CPU ID signal from the CPU ID decoder 220 indicating that they are operating for the other CPUs and not for the associated CPU. Circuit 252 also receives the EX signal from the found S/XI entry to output a cast out signal to its associated CPU for the current request.

Circuit 253 also receives an RO signal (the invert of the EX signal) from the found S/XI entry and a signal from the EX field in register 212 to output a XI invalidate signal to its associated CPU for the current request, to indicate when the current request is an exclusive request which hit a readonly entry that could not have been changed and therefore only needs invalidation.

Whenever a synonym or XI hit is obtained in any S/XI
directory, some of the content of the hit entry is sent to the requesting CPU (i.e. the CPU currently having its CPU
ID in register 212). The transferred content is the L2 congruence-class locating field, the bin number (bin#), and the exclusive/readonly (EX) field. This is done by gate 261. The transferred L2 congruence-class locating field will address the correct congruence class in the common directory and L1CA; and the transferred bin# locates the required entry therein. The LPF in the selected common directory entry locates the L1 line to be accessed for a synonym hit, or the L1 line(s) to be castout to the requesting CPU for a XI hit.

A S/XI out bus 290 receives the output signals from the S/XI directory having a S or XI hit and provides them to the requesting CPU having the CPU ID currently in register 212. FIG. 17 shows a circuit for each CPU that receives the signals outputted from FIG. 16.

In FIG. 17, when received by the requesting CPU, a synonym signal from the S/XI circuitry to store/fetch control logic circuits 314 causes the requesting CPU to execute the cache access at the synonym address. But a XI
cast out signal to c/o control logic circuit 312 causes a cast-out of the addressed line only if it is changed, i.e.
its CH bit is on. The bin number is received by decoder 301 which activates an AND circuit 303A, B, C or D when it is conditioned by a synonym, castout or invalidate signal from OR circuit 302 to provide a signal to OR circuit 70A, B, C or D in FIG. 4.

In this manner the bin numbers are sent to the L1CA, where the line change bits are tested. For each such changed L1 line for the XI request, a cast-out is initiated to update the corresponding L1 line part(s) in the L2 cache, and then this L2 line is castout to update the corresponding line in main memory (L3) from which the requesting CPU can get its requested XI data. (For a store-thru L2 cache, only the L1 cast-out operation is required, since it will store-thru to main storage.) Thus the S/XI directory may be considered to be partitioned into N discrete parts, in which N is the number of processors, each with a private cache(s) that share(s) main memory (L3). One such partition is allocated to each CPU. If the S/XI interrogation finds a XI hit on a partition different than that allocated to the requesting CPU, then a cast-out request is initiated. However, if the S/XI interrogation finds a hit to the partition allocated to the requesting processor, this is a synonym discovery.
In the second case the LA [14:19] and bin number are returned to the requesting CPU, and the cache access may now execute at the synonym address.
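The decision made by the S/XI search described above can be condensed into a hedged C sketch. The enum, structure and function names are hypothetical; only the behavior (synonym for a hit in the requester's own partition, cast-out for an exclusive hit in another partition, invalidate for an exclusive request hitting a readonly entry) follows the text.

```c
#include <stddef.h>
#include <stdint.h>

enum sxi_result { SXI_NONE, SXI_SYNONYM, SXI_CASTOUT, SXI_INVALIDATE };

/* Hypothetical model of one S/XI directory entry (entry 231 in FIG. 16). */
struct sxi_entry {
    uint8_t  valid;
    uint32_t abs_addr;  /* absolute address representation                */
    uint32_t la_14_19;  /* L2 congruence-class locating field             */
    uint8_t  bin;       /* set-associative position in that class         */
    uint8_t  exclusive;
};

/* Decide what the hit (if any) means for the requesting CPU. */
static enum sxi_result sxi_check(const struct sxi_entry *hit,
                                 int hit_cpu, int requesting_cpu,
                                 int request_exclusive)
{
    if (hit == NULL || !hit->valid)
        return SXI_NONE;               /* no hit: write a new S/XI entry  */
    if (hit_cpu == requesting_cpu)
        return SXI_SYNONYM;            /* retry the access at the synonym */
    if (hit->exclusive)
        return SXI_CASTOUT;            /* changed data may need cast-out  */
    return request_exclusive ? SXI_INVALIDATE : SXI_NONE;
}
```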

The second embodiment shown in FIGs. 11 and 12 primarily differs from the embodiment in FIGs. 1 and 3 in the entry structure of the common cache directory 160 and TLB 162. In FIG. 12 each entry in directory 160 and TLB
162 contains a one bit DAT OFF field received from the current S/370 PSW in the CPU as a zero or one value, respectively indicating DAT OFF and DAT ON states, and also contains any STO or STO ID being provided by the CPU.

In FIG. 12, the common cache directory (CCD) 160 is also used (like in FIG. 3) for accessing data in a two-level cache organization private to a high-performance central processor (CPU). Likewise the CCD contains only untranslated address bits of previously requested CPU
addresses, whether or not those addresses were translatable.
Also a "local search" within the CCD is executed in hardware to determine if a line containing requested data is in the L1 or L2 cache, and if so, the CCD generates a signal to "gate" the addressed data to the CPU from the L1 cache if available in L1; and if not, then from the L2 cache if available in L2.

Thus the common cache directory 160 and TLB 162 in the second embodiment use a different form of the CPU requested switchable logical addresses in their untranslated form (whether or not they are translatable) for accessing both the L1 and L2 caches. The requested form of each logical address (regardless of its actual type) is likewise used uniformly in cache operations without translation, both:
(1) within the entries in the common cache directory, and (2) to address the common cache directory, the L1 cache, the L2 cache, and an L1 control array.

In FIG. 11 circuits are also shown for switching the address type between untranslatable and translatable types, but the address type is indicated in a DAT OFF register 26 which receives the inverted value of the DAT mode bit 5 in the PSW. The content of register 26 is associated with the content in logical address register 30, in which a one value in DAT OFF register 26 indicates an untranslatable LA
is in register 30, and a zero value in DAT OFF register 26 indicates a translatable LA is in register 30.

FIGURE 11 shows unique hardware provided for the common directory to differentiate untranslatable (real) and translated (virtual) addresses. The DAT OFF register 26 is an address control register set by the inverted state of DAT mode bit 5 in the PSW to indicate whether a requested logical address in register 30 is a translatable or untranslatable address. Register 26 has a single bit field. A zero DAT OFF value in register 26 indicates the logical address in register 30 requires translation (it is a VA), and a one value indicates a logical address not requiring translation (it is an RA or AA). A STO ID value in register 27 is set by the output of AND gate 21 or 22. The STO ID value in register 27 may be zero or non-zero, but it is valid only if the value in DAT OFF register 26 is zero; that is, if register 26 contains a one, the content of STO ID register 27 is invalid.

In more detail, AND gates 21 and 22 are conditioned by DAT mode control bit 5, extended control (EC) mode bit 12 and address space control bit 16 for selecting between the STOs in CR1 or CR7. The output of AND gates 21 and 22 is dot-ORed to provide a STO signal to STO ID assignment circuits 25. Any selected STO ID is provided by the CPU.
The outputs of registers 26 and 27 are provided on buses 40 and 45. In FIGURE 11 all of fields 40, 45, and 42-49 are gated out simultaneously to FIG. 12 within a single machine cycle to directory 160, L1 cache 63, L1 control array 161, TLB 162, and L2 cache 64. Thus only the structure of directory 160 and TLB
162 in FIG. 12 is different from directory 60 and TLB 62 in FIG. 3. Likewise the single cache directory 160 operates in common for controlling the accessing of both L1 and L2 private caches 63 and 64.

Thus in FIG. 12, the addressing structure enables each untranslated CPU requested address to address in parallel and in a uniform manner the common directory 160, the L1 cache 63, the L2 cache 64, and the TLB 162.

Each entry in directory 160 has the exemplary format shown in FIG. 14, which differs from the format shown in FIG. 5 in that FIG. 14 has a STO ID field instead of an ACF field, and additionally has a DAT OFF field.


FIG. 13 represents the common cache directory 160 and shows an example of a selected one of its 4-way set-associative congruence classes. The compares in directory 160 are illustrated as four comparator circuits 171A-D for respectively comparing each of the four entries in the selected congruence class with the DAT OFF field and LA field 1:13. Each compare circuit 171 comprises sub-compare circuits 172 and 173, AND circuits 174 and 177, and OR circuit 176. When any AND circuit 177 provides an output signal, it indicates an "equal" comparison that determines the respective directory entry has a cache hit. When AND circuit 177 does not provide an output signal in response to a congruence class selection, it indicates an "unequal" comparison that determines the respective directory entry does not have a cache hit. If the DAT OFF field is one, the CPU requested address in LA is a non-translatable address, and whatever value is in the STO ID field is disregarded in the comparison by providing a signal from AND circuit 174 through OR circuit 176 to enable AND circuit 177 regardless of the STO or STO ID values. If the DAT OFF value is zero, the STO ID (whether zero or non-zero) is used in the comparison operation, since AND circuit 174 receives a zero value as the current DAT signal with the CPU request and therefore does not provide any output signal to OR circuit 176, so that the output of the STO ID compare circuit 172 controls the enablement of AND circuit 177. An L2 miss signal is generated by an AND circuit 180 which receives the inverted outputs of AND circuits 177.

In this manner, each of the cache directory compare circuits 171 automatically operates under control of the current DAT state signal to determine whether the comparison is between: (1) a translatable LA and an untranslatable entry-represented address (UERAD), (2) an untranslatable LA and a translatable entry-represented address (TERAD), (3) an untranslatable LA and an UERAD, or (4) a translatable LA and a TERAD. This invention requires that comparison (1) or (2) always be declared an unequal comparison, even though the compared LA values are equal.
Only comparison (3) or (4) may be declared an equal comparison when the compared LA values are equal.
Accordingly an L2 cache miss is determined whenever the requested LA and each of the TERADs in the selected congruence class have different translation characteristics, regardless of whether any of their TERADs equal the requested LA.
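For the second embodiment, the compare of FIG. 13 can be sketched as below. It assumes, as the unequal-comparison requirement above implies, that the entry's DAT OFF bit is compared with the request's DAT OFF bit, and that the STO ID comparison is bypassed only when both indicate an untranslatable address; the names and field widths are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one directory 160 entry (FIG. 14 style). */
struct dir_entry_v2 {
    uint8_t  invalid;
    uint8_t  dat_off;  /* 1 = untranslatable (real/absolute) address     */
    uint8_t  sto_id;   /* meaningful only when dat_off == 0              */
    uint32_t la_hi;    /* LA 1:13 representation                         */
    uint8_t  lpf;
};

/* Sketch of compare circuit 171: returns 1 on an L2 hit for this entry. */
static int l2_hit_v2(const struct dir_entry_v2 *e,
                     uint8_t req_dat_off, uint8_t req_sto_id,
                     uint32_t req_la_1_13)
{
    if (e->invalid || e->la_hi != req_la_1_13)
        return 0;
    if (e->dat_off != req_dat_off)
        return 0;                    /* mixed real/virtual: never equal   */
    if (req_dat_off)
        return 1;                    /* both untranslatable: ignore STO ID*/
    return e->sto_id == req_sto_id;  /* both translatable: STO IDs decide */
}
```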

Whenever an L2 cache hit is determined, the directory 160 also determines whether an L1 cache hit exists. The L1 hit circuit 75 in FIG. 12 is provided by select and decode circuits 175A-D in FIG. 13, one of which can receive an L2 hit output from one of AND circuits 177A-D if there is an L2 cache hit in the selected congruence class. Thus any L1 cache hit is determined by operation of the L1 hit determination circuitry 175A-D when an L2 hit is determined by the output of its respective AND circuit 177. Circuits 175A-D may be the same as circuits 75A-D in FIG. 4, and they operate in the same manner to locate the part of the LPF needed for determining if any L1 cache hit exists.
Also the AND circuit 181 signals an L1 miss and is the same as the circuit 81 in FIG. 4.

AND circuits 178 each receive the L1 miss signal from circuit 181 and the respective L2 hit signal from circuits 177 to output a signal indicating an L2 hit with an L1 miss.

Any L1 hit signal from any decoder 175 is provided on one of four L1 cache hit signal lines 179A, B, C or D to the L1 cache 63 to select the correct data line of the four data lines in the congruence class currently being addressed in L1 cache 63 by the LA bits 18:25 being provided on signal lines 47. An activated one of the four data lines A, B, C, & D in the required L1 congruence class thereby selects a requested bus unit (e.g. four quadwords) in the line, and the unit is outgated as the requested L1 data on the data bus to the CPU.

In FIG. 12 (as in FIG. 3), no output from Translation Lookaside Buffer (TLB) 162 is needed for cache accessing until an L2 cache miss occurs (which happens statistically infrequently).

FIG. 15 shows more detail for the TLB circuit used in the embodiment of FIG. 12. The FIG. 15 TLB circuit differs from the TLB of FIG. 10 in that FIG. 15 uses a DAT OFF
field and a STO ID field in each entry instead of the ACF
field in FIG. 10. Each of compare circuits 80A and 80B in FIG. 15 is structured and operates the same as each compare circuit 171 in FIG. 13. Otherwise the TLB arrangements in FIG. 15 and FIG. 10 are the same.

While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (23)

1. A logical address private cache arrangement in a data processing system including a processor, a translation lookaside buffer (TLB) and a main storage, the processor being able to switch its mode of addressing between real and virtual in its requests for data units from main storage, the cache arrangement comprising:
a first cache directly accessible to the processor, the first cache having locations for containing a plurality of lines of data initially copied from main storage, a cache directory having a plurality of directory entries, each cache directory entry including a logical address representation with an associated indicator of whether the representation is of a real/absolute address or is of a virtual address, means providing to the cache directory each logical address (LA) requested by the processor with an indicator signal of whether the requested LA is a real/absolute address or a virtual address, directory selecting means for receiving each logical address (LA) requested by the processor and selecting a set containing one or more potential hit entries in the cache directory, cache hit determining means for examining each potential hit entry in the set by comparing the logical address (LA) and the indicator signal requested by the processor with a LA representation and the associated indicator in each valid entry in the set for a match condition in order to determine if any entry is a hit entry without using any address translation from the TLB.
2. A logical address cache arrangement in a data processing system as defined in Claim 1, further comprising:
plural caches private to the processor including the first cache and one or more other caches up to an Nth cache, the caches being at different hierarchy levels in relation to the processor, the cache directory being a common cache directory for the plural caches for receiving each indicator signal and its logical address (LA) request from the processor to determine if data requested by the processor exists in any data line in one or more of the plural caches.
3. A logical address cache arrangement in a data processing system as defined in Claim 2, further comprising:
the directory selecting means receiving a LA
representation from the processor for each new storage request for selecting a congruence class of set-associative entries in the common directory, the cache hit determining means comparing each requested indicator signal and its LA signal with each indicator and its LA representation in each valid entry in the selected congruence class to determine if a hit entry exists in at least one of the caches.
4. A logical address cache arrangement in a data processing system as defined in Claim 2, further comprising:
means for providing to the TLB each LA and for inserting in the TLB the LA and its translated or untranslated real/absolute address in accordance with a processor request for translation, common cache directory miss signal means for generating a miss signal when no common directory entry is found for the indicator and LA representation requested by the processor, TLB output means actuated by a common cache directory miss signal to provide the translated or untranslated real/absolute address, associated with a requested LA, for a main storage access.
5. A logical address cache arrangement in a data processing system as defined in Claim 2, further comprising:

at least one control array associated with the first cache to contain information for determining locations in another cache to receive castouts from locations in the first cache selected for replacement.
6. A logical address cache arrangement in a data processing system as defined in Claim 4, further comprising:
control array addressing means receiving a requested LA from the processor for each new storage request for selecting a congruence class of set-associative entries in the control array.
7. A logical address cache arrangement in a data processing system as defined in Claim 3, further comprising:
at least one control array associated with the first cache to contain information supplementary to the common directory for determining locations in another cache for castouts of the first cache.
8. A logical address cache arrangement in a data processing system as defined in Claim 5, further comprising:
line presence fields (LPFS) in each entry in the common cache directory for indicating whether any part of an associated line in the Nth cache is available in another cache more directly accessible to the processor.
9. A logical address cache arrangement in a data processing system as defined in Claim 8, in which:
each line presence field (LPF) has first subfields for indicating whether or not one or more locations in the other cache contain part(s) of the line in the Nth cache associated with the entry in the common cache directory containing the respective line presence field (LPF).
10. A logical address cache arrangement in a data processing system as defined in Claim 9, in which:

a second subfield is provided with each first subfield in each line presence field (LPF) for indicating the particular location in the other cache that contains an associated part of the line in the Nth cache indicated to exist in the other cache.
11. A logical address cache arrangement in a data processing system as defined in Claim 10, further comprising:
a logical address (LA) field being contained in each entry in the control array for containing a LA for locating a line in another cache to be updated by a cast out of the corresponding line in the first cache if the corresponding line has been changed.
12. A logical address cache arrangement in a data processing system as defined in Claim 11, further comprising:
a change field (CH) being contained in each entry in the control array for indicating any change previously made in the associated line being cast out of the first cache to update a line in another cache located by the LA
field in the same control array entry.
13. A logical address cache arrangement in a data processing system as defined in Claim 11, further comprising:
an exclusive/readonly field (EX) being contained in each entry in the control array for indicating the exclusive/readonly state designated for an associated line in the first cache.
14. A logical address cache arrangement in a data processing system as defined in Claim 11, further comprising:
a bin number field being contained in each entry in the control array to locate a set-associative position in a congruence class located by the LA field in the same control array entry in order to find the line in another cache to be updated by receiving a cast out of the corresponding line in the first cache array if the corresponding line has been changed.
15. A logical address cache arrangement in a data processing system as defined in Claim 4, further comprising:
a synonym directory containing a plurality of entries and being associated with the common directory, a LA field in each synonym directory entry for enabling the locating of a common directory entry which caused the generation of the respective synonym directory entry, means for locating in the synonym directory entry a real/absolute address representation that is the same as a real/absolute address provided for a requested LA, causing a common cache directory miss signal requiring a main storage access, means for generating in the synonym directory a new entry when no synonym entry is found, the new entry having a real/absolute address representation that is the same as the received translated or untranslated real/absolute address associated with the requested LA
that caused a current common cache directory miss signal.
16. A logical address cache arrangement in a data processing system as defined in Claim 15, the locating means further comprising:
synonym directory addressing means for receiving each translated or untranslated real/absolute address (associated with a requested LA) for a main storage access due to a common cache directory miss signal to select a congruence class in the synonym directory which may contain a synonym entry, set associative comparison means for comparing the received translated or untranslated real/absolute address with the real/absolute address representation in each entry in the selected congruence class to find any synonym entry by an equal comparison.
17. A logical address cache arrangement in a multiprocessing (MP) system, including a plurality of data processing systems as defined in Claim 4, all CPUs in the MP system having their caches access data in a common main storage, the MP system further comprising:
a plurality of main storage request registers respectively receiving cache miss requests from the CPUs, each cache miss request including at least a real/absolute address representation and a LA
representation of the request that missed in the respective cache, a plurality of synonym/cross-interrogate (S/XI) directories each containing a plurality of S/XI entries and being associated with the common directory of a respective CPU, each S/XI entry including at least a real/absolute address representation and a LA
representation found in a current entry in the respective cache, a priority circuit receiving the cache miss requests provided to the main storage request registers and priority selecting based on a priority, a received cache miss request of an identified CPU for a S/XI
determination, S/XI search means with each S/XI directory for receiving a real/absolute address representation provided with a priority selected cache miss request for searching the S/XI entries in each S/XI directory for any real/absolute address representation, the S/XI search means providing an unequal signal if no S/XI entry is found or providing S/XI hit signals if a S/XI entry is found, the S/XI hit signals including the LA
representation in the S/XI entry found by a S/XI search, a S/XI bus logic transmitting the LA representation with hit signals to the CPU requesting the S/XI search.
18. A logical address cache arrangement in a multiprocessing (MP) system, as defined in Claim 17, the MP system further comprising:
a synonym signal being provided by the S/XI bus logic to the requesting CPU when the CPU making a request is identified as the CPU associated with the S/XI
directory providing the hit signals found by a S/XI
search.
19. A logical address cache arrangement in a multiprocessing (MP) system, as defined in Claim 17, the MP system further comprising:
a cast out signal being provided by the S/XI bus logic to the requesting CPU when the CPU making a request is identified as not being the CPU associated with the S/XI directory providing the hit entry found by a S/XI
search.
20. A logical address cache arrangement in a multiprocessing (MP) system, as defined in Claim 17, the MP system further comprising:
a cast out signal being provided by the S/XI bus logic to the requesting CPU, when the CPU making a request is identified as not being the CPU associated with the S/XI directory providing the hit entry, and the hit entry identifies exclusive data.
21. A logical address cache arrangement in a multiprocessing (MP) system, as defined in Claim 17, the MP system further comprising:
an invalidate signal being provided by the S/XI bus logic to the requesting CPU when the CPU making a request is identified as not being the CPU associated with the S/XI directory providing the hit entry, and the hit entry identifies readonly data when the current request is for exclusive data.
22. A logical address cache arrangement in a multiprocessing (MP) system, as defined in Claim 17, the MP system further comprising:
a set-associative bin number field and a LA representation field and a real/absolute address representation field being provided in each of the main storage request registers respectively receiving cache miss requests from the CPUs, each S/XI entry in each synonym/cross-interrogate (S/XI) directory containing a set-associative bin number field and a LA representation field and a real/absolute address representation field, each S/XI entry being associated with the common directory of a respective CPU, the S/XI search means including a S/XI priority register receiving the priority selected output of the priority circuit for searching the S/XI entries in each S/XI directory, the S/XI priority register containing a set-associative bin number field and a LA representation field and a real/absolute address representation field, the bin number in a found S/XI entry identifying a set-associative location in a CPU cache required by a CPU request currently in the S/XI priority register.
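A C sketch of the register layout claim 22 describes: each main storage request register, each S/XI entry, and the S/XI priority register carry a set-associative bin number, an LA representation, and a real/absolute address representation. The field widths are assumptions.

/* Sketch of the three-field layout of claim 22. */
#include <stdint.h>

typedef struct {
    uint8_t  bin_number;   /* set-associative bin number field      */
    uint32_t la_rep;       /* LA representation field               */
    uint32_t abs_addr;     /* real/absolute address representation  */
} ms_request_reg_t;

/* One request register per CPU feeds the priority circuit; the priority
 * selected output is latched into the S/XI priority register, which holds
 * the same three fields while the S/XI directories are searched. */
typedef ms_request_reg_t sxi_priority_reg_t;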
23. A logical address cache arrangement in a multiprocessing (MP) system, as defined in Claim 22, the MP system further comprising:
set associative location selection means of the CPU identified by hit signals on the S/XI bus to ingate a bin number being transmitted on the bus for selecting a set-associative location in an Nth cache of the CPU, the selected location containing LPF fields for locating any set-associative entry(s) in any faster-access cache for obtaining data lines needed for cast out and/or invalidation, means for casting out and/or invalidating any lines found in the faster-access cache at locations indicated by the LPF fields and then casting out and/or invalidating a line found at the selected location in the Nth cache.
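A C sketch of the cast out/invalidate flow of claim 23, in which the bin number received on the S/XI bus selects a set-associative location in the Nth (slowest) cache, and the LPF fields at that location point to any copies in the faster-access cache, which are handled before the Nth-cache line itself. The structures, sizes, and field names are assumptions.

/* Sketch of the LPF-guided cast out / invalidate of claim 23. */
#include <stdbool.h>

#define LPF_SLOTS 4         /* assumed LPF capacity per Nth-cache entry */

typedef struct {
    bool valid;
    /* ... line data, tags ... */
} fast_line_t;

typedef struct {
    bool valid;
    struct { bool present; int set; int slot; } lpf[LPF_SLOTS]; /* LPF fields */
    /* ... line data, tags ... */
} nth_line_t;

void cast_out_or_invalidate(nth_line_t *nth_cache_set, int bin_number,
                            fast_line_t fast_cache[][LPF_SLOTS])
{
    nth_line_t *line = &nth_cache_set[bin_number];   /* ingated bin number */
    if (!line->valid)
        return;
    /* First cast out / invalidate any copies in the faster-access cache
     * at the locations indicated by the LPF fields. */
    for (int i = 0; i < LPF_SLOTS; i++) {
        if (line->lpf[i].present) {
            fast_cache[line->lpf[i].set][line->lpf[i].slot].valid = false;
            line->lpf[i].present = false;
        }
    }
    /* Then cast out / invalidate the line at the selected location in the
     * Nth cache itself. */
    line->valid = false;
}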
CA000534687A 1986-05-01 1987-04-14 Variable address mode cache Expired - Fee Related CA1283218C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US06/858,322 US4797814A (en) 1986-05-01 1986-05-01 Variable address mode cache
US858,322 1986-05-01

Publications (1)

Publication Number Publication Date
CA1283218C true CA1283218C (en) 1991-04-16

Family

ID=25328033

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000534687A Expired - Fee Related CA1283218C (en) 1986-05-01 1987-04-14 Variable address mode cache

Country Status (4)

Country Link
US (1) US4797814A (en)
EP (1) EP0243724A3 (en)
JP (1) JPS62260248A (en)
CA (1) CA1283218C (en)

Families Citing this family (142)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4985829A (en) * 1984-07-31 1991-01-15 Texas Instruments Incorporated Cache hierarchy design for use in a memory management unit
US5317717A (en) * 1987-07-01 1994-05-31 Digital Equipment Corp. Apparatus and method for main memory unit protection using access and fault logic signals
US4942520A (en) * 1987-07-31 1990-07-17 Prime Computer, Inc. Method and apparatus for indexing, accessing and updating a memory
US5179687A (en) * 1987-09-26 1993-01-12 Mitsubishi Denki Kabushiki Kaisha Semiconductor memory device containing a cache and an operation method thereof
US4996641A (en) * 1988-04-15 1991-02-26 Motorola, Inc. Diagnostic mode for a cache
US5247649A (en) * 1988-05-06 1993-09-21 Hitachi, Ltd. Multi-processor system having a multi-port cache memory
US4965720A (en) * 1988-07-18 1990-10-23 International Business Machines Corporation Directed address generation for virtual-address data processors
US5317716A (en) * 1988-08-16 1994-05-31 International Business Machines Corporation Multiple caches using state information indicating if cache line was previously modified and type of access rights granted to assign access rights to cache line
JPH0290259A (en) * 1988-09-27 1990-03-29 Hitachi Ltd Multiprocessor system
US5261074A (en) * 1988-10-03 1993-11-09 Silicon Graphics, Inc. Computer write-initiated special transfer operation
US5058003A (en) * 1988-12-15 1991-10-15 International Business Machines Corporation Virtual storage dynamic address translation mechanism for multiple-sized pages
US5016168A (en) * 1988-12-23 1991-05-14 International Business Machines Corporation Method for storing into non-exclusive cache lines in multiprocessor systems
EP0389151A3 (en) * 1989-03-22 1992-06-03 International Business Machines Corporation System and method for partitioned cache memory management
JP3038781B2 (en) * 1989-04-21 2000-05-08 日本電気株式会社 Memory access control circuit
US5130922A (en) * 1989-05-17 1992-07-14 International Business Machines Corporation Multiprocessor cache memory system using temporary access states and method for operating such a memory
GB9008145D0 (en) * 1989-05-31 1990-06-06 Ibm Microcomputer system employing address offset mechanism to increase the supported cache memory capacity
JPH0340046A (en) * 1989-07-06 1991-02-20 Hitachi Ltd Cache memory control system and information processor
US5077826A (en) * 1989-08-09 1991-12-31 International Business Machines Corporation Cache performance in an information handling system employing page searching
US5214765A (en) * 1989-08-31 1993-05-25 Sun Microsystems, Inc. Method and apparatus for executing floating point instructions utilizing complimentary floating point pipeline and multi-level caches
US5230070A (en) * 1989-09-08 1993-07-20 International Business Machines Corporation Access authorization table for multi-processor caches
JP2833062B2 (en) * 1989-10-30 1998-12-09 株式会社日立製作所 Cache memory control method, processor and information processing apparatus using the cache memory control method
US5307477A (en) * 1989-12-01 1994-04-26 Mips Computer Systems, Inc. Two-level cache memory system
US5136700A (en) * 1989-12-22 1992-08-04 Digital Equipment Corporation Apparatus and method for reducing interference in two-level cache memories
JPH03216744A (en) * 1990-01-22 1991-09-24 Fujitsu Ltd Built-in cache memory control system
US5261066A (en) * 1990-03-27 1993-11-09 Digital Equipment Corporation Data processing system and method with small fully-associative cache and prefetch buffers
US5197139A (en) * 1990-04-05 1993-03-23 International Business Machines Corporation Cache management for multi-processor systems utilizing bulk cross-invalidate
US5269009A (en) * 1990-09-04 1993-12-07 International Business Machines Corporation Processor system with improved memory transfer means
JPH04230549A (en) * 1990-10-12 1992-08-19 Internatl Business Mach Corp <Ibm> Multilevel cache
GB9024692D0 (en) * 1990-11-13 1991-01-02 Int Computers Ltd Virtual memory system
US5412787A (en) * 1990-11-21 1995-05-02 Hewlett-Packard Company Two-level TLB having the second level TLB implemented in cache tag RAMs
US5249282A (en) * 1990-11-21 1993-09-28 Benchmarq Microelectronics, Inc. Integrated cache memory system with primary and secondary cache memories
JPH04246745A (en) * 1991-02-01 1992-09-02 Canon Inc Memory access system
EP0506236A1 (en) * 1991-03-13 1992-09-30 International Business Machines Corporation Address translation mechanism
EP0508577A1 (en) * 1991-03-13 1992-10-14 International Business Machines Corporation Address translation mechanism
US5293608A (en) * 1991-04-19 1994-03-08 Legent Corporation System and method for optimizing cache memory utilization by selectively inhibiting loading of data
JPH04328657A (en) * 1991-04-30 1992-11-17 Toshiba Corp Cache memory
GB2256512B (en) * 1991-06-04 1995-03-15 Intel Corp Second level cache controller unit and system
JP2839060B2 (en) * 1992-03-02 1998-12-16 インターナショナル・ビジネス・マシーンズ・コーポレイション Data processing system and data processing method
JP2737820B2 (en) * 1992-09-24 1998-04-08 インターナショナル・ビジネス・マシーンズ・コーポレイション Memory access method and system
US5450562A (en) * 1992-10-19 1995-09-12 Hewlett-Packard Company Cache-based data compression/decompression
US5584002A (en) * 1993-02-22 1996-12-10 International Business Machines Corporation Cache remapping using synonym classes
US5689679A (en) * 1993-04-28 1997-11-18 Digital Equipment Corporation Memory system and method for selective multi-level caching using a cache level code
EP0622738B1 (en) * 1993-04-30 1998-08-05 Siemens Nixdorf Informationssysteme Aktiengesellschaft Method for carrying out requests to a multilevel cache memory in a data processing system and cache memory configurated accordingly
US5809525A (en) * 1993-09-17 1998-09-15 International Business Machines Corporation Multi-level computer cache system providing plural cache controllers associated with memory address ranges and having cache directories
US5530832A (en) * 1993-10-14 1996-06-25 International Business Machines Corporation System and method for practicing essential inclusion in a multiprocessor and cache hierarchy
US5539893A (en) * 1993-11-16 1996-07-23 Unisys Corporation Multi-level memory and methods for allocating data most likely to be used to the fastest memory level
JP3169155B2 (en) * 1993-12-22 2001-05-21 インターナショナル・ビジネス・マシーンズ・コーポレ−ション Circuit for caching information
US5870599A (en) * 1994-03-01 1999-02-09 Intel Corporation Computer system employing streaming buffer for instruction prefetching
JPH07287668A (en) * 1994-04-19 1995-10-31 Hitachi Ltd Data processor
US5539895A (en) * 1994-05-12 1996-07-23 International Business Machines Corporation Hierarchical computer cache system
US6226722B1 (en) * 1994-05-19 2001-05-01 International Business Machines Corporation Integrated level two cache and controller with multiple ports, L1 bypass and concurrent accessing
US5604889A (en) * 1994-06-15 1997-02-18 Texas Instruments Incorporated Memory management system for checkpointed logic simulator with increased locality of data
US5890221A (en) * 1994-10-05 1999-03-30 International Business Machines Corporation Method and system for offset miss sequence handling in a data cache array having multiple content addressable field per cache line utilizing an MRU bit
US5630087A (en) * 1994-11-02 1997-05-13 Sun Microsystems, Inc. Apparatus and method for efficient sharing of virtual memory translations
US5649155A (en) * 1995-03-31 1997-07-15 International Business Machines Corporation Cache memory accessed by continuation requests
US5680598A (en) * 1995-03-31 1997-10-21 International Business Machines Corporation Millicode extended memory addressing using operand access control register to control extended address concatenation
US5850534A (en) * 1995-06-05 1998-12-15 Advanced Micro Devices, Inc. Method and apparatus for reducing cache snooping overhead in a multilevel cache system
US5778434A (en) 1995-06-07 1998-07-07 Seiko Epson Corporation System and method for processing multiple requests and out of order returns
US5740399A (en) * 1995-08-23 1998-04-14 International Business Machines Corporation Modified L1/L2 cache inclusion for aggressive prefetch
US5758119A (en) * 1995-08-23 1998-05-26 International Business Machines Corp. System and method for indicating that a processor has prefetched data into a primary cache and not into a secondary cache
US5778426A (en) * 1995-10-23 1998-07-07 Symbios, Inc. Methods and structure to maintain a two level cache in a RAID controller and thereby selecting a preferred posting method
US5895487A (en) * 1996-11-13 1999-04-20 International Business Machines Corporation Integrated processing and L2 DRAM cache
US6006210A (en) 1997-03-27 1999-12-21 Pitney Bowes Inc. Mailing machine including dimensional rating capability
US6049849A (en) * 1997-04-14 2000-04-11 International Business Machines Corporation Imprecise method and system for selecting an alternative cache entry for replacement in response to a conflict between cache operation requests
US6026470A (en) * 1997-04-14 2000-02-15 International Business Machines Corporation Software-managed programmable associativity caching mechanism monitoring cache misses to selectively implement multiple associativity levels
US5978888A (en) * 1997-04-14 1999-11-02 International Business Machines Corporation Hardware-managed programmable associativity caching mechanism monitoring cache misses to selectively implement multiple associativity levels
US5983322A (en) * 1997-04-14 1999-11-09 International Business Machines Corporation Hardware-managed programmable congruence class caching mechanism
US6312072B1 (en) 1997-05-01 2001-11-06 Pitney Bowes Inc. Disabling a printing mechanism in response to an out of ink condition
US6175899B1 (en) * 1997-05-19 2001-01-16 International Business Machines Corporation Method for providing virtual atomicity in multi processor environment having access to multilevel caches
US6138209A (en) * 1997-09-05 2000-10-24 International Business Machines Corporation Data processing system and multi-way set associative cache utilizing class predict data structure and method thereof
US6192458B1 (en) * 1998-03-23 2001-02-20 International Business Machines Corporation High performance cache directory addressing scheme for variable cache sizes utilizing associativity
US6574698B1 (en) * 1998-04-17 2003-06-03 International Business Machines Corporation Method and system for accessing a cache memory within a data processing system
US6981096B1 (en) * 1998-10-02 2005-12-27 International Business Machines Corporation Mapping and logic for combining L1 and L2 directories and/or arrays
US6324617B1 (en) 1999-08-04 2001-11-27 International Business Machines Corporation Method and system for communicating tags of data access target and castout victim in a single data transfer
US6349367B1 (en) 1999-08-04 2002-02-19 International Business Machines Corporation Method and system for communication in which a castout operation is cancelled in response to snoop responses
US6321305B1 (en) 1999-08-04 2001-11-20 International Business Machines Corporation Multiprocessor system bus with combined snoop responses explicitly cancelling master allocation of read data
US6343347B1 (en) 1999-08-04 2002-01-29 International Business Machines Corporation Multiprocessor system bus with cache state and LRU snoop responses for read/castout (RCO) address transaction
US6353875B1 (en) 1999-08-04 2002-03-05 International Business Machines Corporation Upgrading of snooper cache state mechanism for system bus with read/castout (RCO) address transactions
US6343344B1 (en) 1999-08-04 2002-01-29 International Business Machines Corporation System bus directory snooping mechanism for read/castout (RCO) address transaction
US6502171B1 (en) * 1999-08-04 2002-12-31 International Business Machines Corporation Multiprocessor system bus with combined snoop responses explicitly informing snoopers to scarf data
US6338124B1 (en) 1999-08-04 2002-01-08 International Business Machines Corporation Multiprocessor system bus with system controller explicitly updating snooper LRU information
US6629207B1 (en) 1999-10-01 2003-09-30 Hitachi, Ltd. Method for loading instructions or data into a locked way of a cache memory
US6553460B1 (en) 1999-10-01 2003-04-22 Hitachi, Ltd. Microprocessor having improved memory management unit and cache memory
US6598128B1 (en) 1999-10-01 2003-07-22 Hitachi, Ltd. Microprocessor having improved memory management unit and cache memory
US6412043B1 (en) 1999-10-01 2002-06-25 Hitachi, Ltd. Microprocessor having improved memory management unit and cache memory
US6772325B1 (en) * 1999-10-01 2004-08-03 Hitachi, Ltd. Processor architecture and operation for exploiting improved branch control instruction
US6339813B1 (en) * 2000-01-07 2002-01-15 International Business Machines Corporation Memory system for permitting simultaneous processor access to a cache line and sub-cache line sectors fill and writeback to a system memory
US6851038B1 (en) * 2000-05-26 2005-02-01 Koninklijke Philips Electronics N.V. Background fetching of translation lookaside buffer (TLB) entries
US7073044B2 (en) * 2001-03-30 2006-07-04 Intel Corporation Method and apparatus for sharing TLB entries
US6728858B2 (en) * 2001-03-30 2004-04-27 Intel Corporation Method and apparatus including heuristic for sharing TLB entries
AU2002357420A1 (en) * 2002-12-30 2004-07-22 Intel Corporation Cache victim sector tag buffer
US7254681B2 (en) * 2003-02-13 2007-08-07 Intel Corporation Cache victim sector tag buffer
JP3936672B2 (en) * 2003-04-30 2007-06-27 富士通株式会社 Microprocessor
CN100484148C (en) * 2003-09-29 2009-04-29 株式会社日立制作所 Information terminals, information sharing method and p2p system and point system using the same
US7093100B2 (en) * 2003-11-14 2006-08-15 International Business Machines Corporation Translation look aside buffer (TLB) with increased translational capacity for multi-threaded computer processes
WO2005078586A2 (en) * 2004-02-09 2005-08-25 Continental Teves Ag & Co. Ohg Method and device for analyzing integrated systems for critical safety computing systems in motor vehicles
US7293157B1 (en) * 2004-11-24 2007-11-06 Sun Microsystems, Inc. Logically partitioning different classes of TLB entries within a single caching structure
US7930514B2 (en) * 2005-02-09 2011-04-19 International Business Machines Corporation Method, system, and computer program product for implementing a dual-addressable cache
US7305522B2 (en) * 2005-02-12 2007-12-04 International Business Machines Corporation Victim cache using direct intervention
US7788642B2 (en) * 2005-05-16 2010-08-31 Texas Instruments Incorporated Displaying cache information using mark-up techniques
US7616218B1 (en) * 2005-12-05 2009-11-10 Nvidia Corporation Apparatus, system, and method for clipping graphics primitives
US20070271421A1 (en) * 2006-05-17 2007-11-22 Nam Sung Kim Reducing aging effect on memory
US8352709B1 (en) 2006-09-19 2013-01-08 Nvidia Corporation Direct memory access techniques that include caching segmentation data
US8347064B1 (en) 2006-09-19 2013-01-01 Nvidia Corporation Memory access techniques in an aperture mapped memory space
US8543792B1 (en) 2006-09-19 2013-09-24 Nvidia Corporation Memory access techniques including coalesing page table entries
US8601223B1 (en) 2006-09-19 2013-12-03 Nvidia Corporation Techniques for servicing fetch requests utilizing coalesing page table entries
US8707011B1 (en) 2006-10-24 2014-04-22 Nvidia Corporation Memory access techniques utilizing a set-associative translation lookaside buffer
US8700883B1 (en) * 2006-10-24 2014-04-15 Nvidia Corporation Memory access techniques providing for override of a page table
US8706975B1 (en) 2006-11-01 2014-04-22 Nvidia Corporation Memory access management block bind system and method
US8347065B1 (en) * 2006-11-01 2013-01-01 Glasco David B System and method for concurrently managing memory access requests
US8607008B1 (en) 2006-11-01 2013-12-10 Nvidia Corporation System and method for independent invalidation on a per engine basis
US8533425B1 (en) 2006-11-01 2013-09-10 Nvidia Corporation Age based miss replay system and method
US8504794B1 (en) 2006-11-01 2013-08-06 Nvidia Corporation Override system and method for memory access management
US8700865B1 (en) 2006-11-02 2014-04-15 Nvidia Corporation Compressed data access system and method
WO2008053053A1 (en) * 2006-11-03 2008-05-08 Intel Corporation Reduction of effect of ageing on registers
US8347037B2 (en) * 2008-10-22 2013-01-01 International Business Machines Corporation Victim cache replacement
US8209489B2 (en) * 2008-10-22 2012-06-26 International Business Machines Corporation Victim cache prefetching
US8225045B2 (en) * 2008-12-16 2012-07-17 International Business Machines Corporation Lateral cache-to-cache cast-in
US8499124B2 (en) * 2008-12-16 2013-07-30 International Business Machines Corporation Handling castout cache lines in a victim cache
US8117397B2 (en) * 2008-12-16 2012-02-14 International Business Machines Corporation Victim cache line selection
US8489819B2 (en) * 2008-12-19 2013-07-16 International Business Machines Corporation Victim cache lateral castout targeting
US8364898B2 (en) * 2009-01-23 2013-01-29 International Business Machines Corporation Optimizing a cache back invalidation policy
US8949540B2 (en) * 2009-03-11 2015-02-03 International Business Machines Corporation Lateral castout (LCO) of victim cache line in data-invalid state
US8285939B2 (en) * 2009-04-08 2012-10-09 International Business Machines Corporation Lateral castout target selection
US8327073B2 (en) * 2009-04-09 2012-12-04 International Business Machines Corporation Empirically based dynamic control of acceptance of victim cache lateral castouts
US8347036B2 (en) * 2009-04-09 2013-01-01 International Business Machines Corporation Empirically based dynamic control of transmission of victim cache lateral castouts
US8312220B2 (en) * 2009-04-09 2012-11-13 International Business Machines Corporation Mode-based castout destination selection
EP3287187A1 (en) 2009-08-04 2018-02-28 CO2 Solutions Inc. Process for co2 capture using carbonates and biocatalysts
US9189403B2 (en) * 2009-12-30 2015-11-17 International Business Machines Corporation Selective cache-to-cache lateral castouts
US8677050B2 (en) * 2010-11-12 2014-03-18 International Business Machines Corporation System, method and computer program product for extending a cache using processor registers
US10146545B2 (en) 2012-03-13 2018-12-04 Nvidia Corporation Translation address cache for a microprocessor
US9880846B2 (en) 2012-04-11 2018-01-30 Nvidia Corporation Improving hit rate of code translation redirection table with replacement strategy based on usage history table of evicted entries
US10241810B2 (en) 2012-05-18 2019-03-26 Nvidia Corporation Instruction-optimizing processor with branch-count table in hardware
US20140082252A1 (en) * 2012-09-17 2014-03-20 International Business Machines Corporation Combined Two-Level Cache Directory
US20140189310A1 (en) 2012-12-27 2014-07-03 Nvidia Corporation Fault detection in instruction translations
US10108424B2 (en) 2013-03-14 2018-10-23 Nvidia Corporation Profiling code portions to generate translations
US9753883B2 (en) * 2014-02-04 2017-09-05 Netronome Systems, Inc. Network interface device that maps host bus writes of configuration information for virtual NIDs into a small transactional memory
US10387314B2 (en) * 2015-08-25 2019-08-20 Oracle International Corporation Reducing cache coherence directory bandwidth by aggregating victimization requests
US10698836B2 (en) 2017-06-16 2020-06-30 International Business Machines Corporation Translation support for a virtual cache
US10606762B2 (en) * 2017-06-16 2020-03-31 International Business Machines Corporation Sharing virtual and real translations in a virtual cache
US10831664B2 (en) 2017-06-16 2020-11-10 International Business Machines Corporation Cache structure using a logical directory
CN115495392B (en) * 2022-11-17 2023-03-24 深圳市楠菲微电子有限公司 Memory multiplexing method and device in multi-stage starting, storage medium and processor

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5072542A (en) * 1973-10-30 1975-06-16
US4070706A (en) * 1976-09-20 1978-01-24 Sperry Rand Corporation Parallel requestor priority determination and requestor address matching in a cache memory system
JPS5849945B2 (en) * 1977-12-29 1983-11-08 富士通株式会社 Buffer combination method
US4322795A (en) * 1980-01-24 1982-03-30 Honeywell Information Systems Inc. Cache memory utilizing selective clearing and least recently used updating
EP0051745B1 (en) * 1980-11-10 1988-01-27 International Business Machines Corporation Cache storage hierarchy for a multiprocessor system
US4456954A (en) * 1981-06-15 1984-06-26 International Business Machines Corporation Virtual machine system with guest architecture emulation using hardware TLB's for plural level address translations
JPS586570A (en) * 1981-07-02 1983-01-14 Nec Corp Buffer memory device
US4464712A (en) * 1981-07-06 1984-08-07 International Business Machines Corporation Second level cache replacement method and apparatus
JPS5898893A (en) * 1981-12-09 1983-06-11 Toshiba Corp Information processing device
US4484267A (en) * 1981-12-30 1984-11-20 International Business Machines Corporation Cache sharing control in a multiprocessor
US4654790A (en) * 1983-11-28 1987-03-31 Amdahl Corporation Translation of virtual and real addresses to system addresses

Also Published As

Publication number Publication date
JPH0555898B2 (en) 1993-08-18
EP0243724A2 (en) 1987-11-04
JPS62260248A (en) 1987-11-12
EP0243724A3 (en) 1990-05-23
US4797814A (en) 1989-01-10

Similar Documents

Publication Publication Date Title
CA1283218C (en) Variable address mode cache
EP0144121B1 (en) Virtually addressed cache
US4731739A (en) Eviction control apparatus
US5426750A (en) Translation lookaside buffer apparatus and method with input/output entries, page table entries and page table pointers
US3723976A (en) Memory system with logical and real addressing
US6216214B1 (en) Apparatus and method for a virtual hashed page table
KR920005280B1 (en) High speed cache system
US6772316B2 (en) Method and apparatus for updating and invalidating store data
US5265232A (en) Coherence control by data invalidation in selected processor caches without broadcasting to processor caches not having the data
US5526504A (en) Variable page size translation lookaside buffer
KR920004400B1 (en) Virtual computing system
US6658538B2 (en) Non-uniform memory access (NUMA) data processing system having a page table including node-specific data storage and coherency control
US6493812B1 (en) Apparatus and method for virtual address aliasing and multiple page size support in a computer system having a prevalidated cache
US6874077B2 (en) Parallel distributed function translation lookaside buffer
US5003459A (en) Cache memory system
JPH03142644A (en) Cache memory control system
EP0327798A2 (en) Control method and apparatus for zero-origin data spaces
JPH0683711A (en) Data processing system and method of data processing
US6298411B1 (en) Method and apparatus to share instruction images in a virtual cache
US20020156989A1 (en) Method for sharing a translation lookaside buffer between CPUs
EP0468804A2 (en) Hierarchical memory control system
US5341485A (en) Multiple virtual address translation per computer cycle
US6574698B1 (en) Method and system for accessing a cache memory within a data processing system
JPH0769867B2 (en) Directory look-up table logic for virtual storage systems
JP2000339221A (en) System and method for invalidating entry of conversion device

Legal Events

Date Code Title Description
MKLA Lapsed