WO1997011419A2 - Synchronous multi-port random access memory - Google Patents

Synchronous multi-port random access memory

Info

Publication number: WO1997011419A2
Authority: WO (WIPO/PCT)
Prior art keywords: memory, arrays, ports, array, data
Application number: PCT/US1996/014311
Other languages: French (fr)
Other versions: WO1997011419A3 (en)
Inventors: Tom North, Francis Siu
Original Assignee: Shablamm Computer, Inc.
Application filed by Shablamm Computer, Inc.


Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
            • G06F 12/02 - Addressing or allocation; Relocation
              • G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
                • G06F 12/0607 - Interleaved addressing
              • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
                • G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
                  • G06F 12/0844 - Multiple simultaneous or quasi-simultaneous cache accessing
                    • G06F 12/0853 - Cache with multiport tag or data arrays
                  • G06F 12/0893 - Caches characterised by their organisation or structure
          • G06F 2212/00 - Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
            • G06F 2212/30 - Providing cache or TLB in specific location of a processing system
              • G06F 2212/304 - In main memory subsystem
                • G06F 2212/3042 - In main memory subsystem being part of a memory device, e.g. cache DRAM

Definitions

  • This invention relates to a memory for single processor and multiple processor computers, and, more particularly, to crossbar interleaving between processors, memory, and a local bus, and within the memory itself.
  • PCs, workstations, and servers bottleneck at main memory because there is only one path to main memory, and that path is slow.
  • In many non-blocking secondary cache architectures of "high" performance systems, main memory is the bottleneck that limits system performance, especially in multi-processor systems.
  • Recent systems, such as multi-media systems and shared memory video, require transfers of increasingly large amounts of data. These transfer requirements demand a new memory architecture.
  • Beyond the conventional multiplexed-address dynamic random access memory (DRAM) and the video random access memory (VRAM), recently developed enhanced architectures include RAMBus, Extended Data Out (EDO), Burst EDO, Synchronous DRAM (SDRAM), such as manufactured by Micron or Samsung, CDRAM by Mitsubishi, EDRAM by RAMTRON, and Multi-bank DRAM (MDRAM), such as manufactured by Mosys.
  • a synchronous multi-port dynamic random access memory (SMPDRAM) couples main memory directly to at least one central processing unit (CPU), a video accelerator, or at least one input/output (I/O) processor, or a combination thereof.
  • the SMPDRAM provides a direct port for each of these devices and provides a higher performance implementation of the multi-bank interleave protocol in U.S. patent application Serial No. 08/414,118, filed March 31, 1995, the subject matter of which is incorporated herein by reference, to reduce contention and enhance performance.
  • the crossbar of the SMPDRAM is incorporated into the main memory chip.
  • the memory chip includes a direct interface to the CPU without intervening logic or a chip set.
  • the memory chip is reconfigurable, with the configuration information supplied through a JTAG port.
  • each CPU or processor can access memory at the same time, as opposed to each having its own memory that may need to be synchronized or each having to wait to get access to memory if it is being accessed by another.
  • FIGs. 1a, 1b, 1c, 1d, and 1e are pictorial diagrams illustrating addressing of a conventional non-interleaved memory, a page interleaved architecture memory, a single cache line size architecture memory, a double cache line size architecture memory, and a quad cache line size architecture memory, respectively.
  • FIG. 2 is a pictorial diagram illustrating a memory chip in accordance with the present invention.
  • FIG. 3 is a block diagram illustrating a memory array of the memory chip of FIG. 2.
  • FIG. 4 is a block diagram illustrating a memory subarray of the memory chip of FIG. 2.
  • FIGs. 5a, 5b, and 5c are block diagrams illustrating one, two and four single in-line memory module systems.
  • FIG. 6 is a block diagram illustrating a personal computer system.
  • FIG. 7 is a block diagram illustrating a dual CPU computer system.
  • FIG. 8 is a block diagram illustrating a quad CPU system.
  • FIG. 9 is a flowchart illustrating the reading of data from the memory chip using multi-array interleaving.
  • FIG. 10 is a flowchart illustrating the writing of data into the memory chip using interleaving.
  • FIG. 11 is a block diagram illustrating an interface between a single in-line memory module and a motherboard.
  • FIG. 12 is a block diagram illustrating the data bus/socket connections of the single in-line modules in a memory of a two processor system.
  • FIG. 13 is a block diagram illustrating the data bus/socket connections of the single in-line modules in a memory of a three or four processor system.
  • Referring to FIGs. 1a, 1b, 1c, 1d, and 1e, there are shown pictorial diagrams illustrating addressing of a conventional non-interleaved memory, a page interleaved architecture memory, a single cache line size architecture memory, a double cache line size architecture memory, and a quad cache line size architecture memory, respectively.
  • the conventional non-interleaved memory comprises memory arrays 100-0 through 100-7 that are addressed by dividing the address space into equal consecutive blocks of addresses and assigning the blocks to the memory arrays 100.
  • the memory array 100-0, the memory array 100-1, through memory array 100-7 are addressed as 0-1M, 1-2M, through 7-8M, respectively, for an 8M memory.
  • memory arrays 102-0 through 102-7 are addressed by dividing the address space into pages and sequentially assigning the pages to a memory array.
  • the memory array 102-0 is assigned addresses 0-127, 256-383, 512-639, through 2096896-2097023;
  • the memory array 102-1 is assigned addresses 128-255, 384-511, 640-767, through 2097024-2097151;
  • the memory array 102-7 is assigned addresses 6291584-6291711, 6292608-6292735, 6293632-6293759 through 8388480-8388607.
  • the memory arrays 102-0 through 102-7 are organized on a single cache line basis. Memory arrays 102-0 through 102-7 are addressed by dividing the address space on a cache line basis and assigning subsequent cache lines to subsequent memory arrays.
  • the memory array 102-0 is assigned addresses 0-3, 32-35, 64-67, through 8388576-8388579; the memory array 102-1 is assigned addresses 4-7, 36-39, 68-71 through 8388580-8388583; and the memory array 102-7 is assigned addresses 28-31, 60-63, 92-95 through 8388604-8388607.
  • the memory arrays 102-0 through 102-7 are organized on a double cache line basis.
  • Memory arrays 102-0 through 102-7 are addressed by dividing the address space on a cache line basis and assigning the subsequent cache lines to subsequent memory arrays. For example, for 8 words per cache line, the memory array 102-0 is assigned addresses 0-7, 64-71, 128-135 through 8388544-8388551; the memory array 102-1 is assigned addresses 8-15, 72-79, 136-143 through 8388552-8388559; and the memory array 102-7 is assigned addresses 56-63, 120-127, 184-191 through 8388600-8388607.
  • the memory arrays 102-0 through 102-7 are organized on a quad cache line basis.
  • Memory arrays 102-0 through 102-7 are addressed by dividing the address space on a 4 cache line basis and assigning the subsequent cache lines to subsequent memory arrays. For example, for 16 words per cache line, the memory array 102-0 is assigned addresses 0-15, 128-143, 256-271 through 8388480-8388495; the memory array 102-1 is assigned addresses 16-31, 144-159, 272-287 through 8388496-8388511; and the memory array 102-7 is assigned addresses 112-127, 240-255, 368-383 through 8388592-8388607.
  • This organization of the memory 100 reduces memory array contention in multiple central processing unit (CPU) multi-threaded applications in which each CPU, because of locality, may commonly be accessing the same memory array while running the same application.
  • An interleaved architecture evenly spreads the addressing of the application across the memory arrays to reduce the likelihood of both CPUs accessing the same memory array.
  • the interleave pattern is adjustable based on the type of operating system and the type of application being executed by the system.
  • the interleaving may be the interleaving described in the U.S. patent application Serial No. 08/414,118, filed March 31, 1995, the subject matter of which is incorporated herein by reference.
  • Referring to FIG. 2, there is shown a pictorial diagram illustrating a memory chip 200 in accordance with the present invention.
  • Referring to FIG. 3, there is shown a block diagram illustrating a memory array of the memory chip 200.
  • Referring to FIG. 4, there is shown a block diagram illustrating a memory subarray of the memory chip 200.
  • the architecture of the memory chip 200 is described for a synchronous multi-port dynamic random access memory (SMPDRAM). However, the architecture may be applied to other types of RAM, such as Static Random Access Memory (SRAM) or Flash RAM.
  • the memory chip may be organized as a 64 Mb DRAM organized with eight 8-bit or 9-bit ports.
  • the ports can be grouped into four 16-bit ports, two 32-bit ports or one 64-bit port.
  • the memory chip 200 uses the multi-array interleave protocol described above in conjunction with FIGs. 1b through 1e for reducing the timeline losses incurred when two processors attempt to access the same memory array.
  • each processor ties up a memory array for a shorter time and then releases the memory array for another processor that may be waiting.
  • the access by one CPU is delayed.
  • the delay is reduced because the first CPU which gained access to the array most likely moves to the next array if the next data that it seeks is sequential to the first.
  • a non-interleaved architecture where the first CPU ties up the same bank extends the delay of access of the second CPU as long as the first CPU continues to access sequential information in the same array.
  • the memory chip 200 includes a plurality of bi-directional input/output (I/O) ports 201-0 through 201-7, an I/O bus bar 207, a crossbar link 209, a plurality of cache selectors 222-0 through 222-7, a plurality of embedded caches 204-0 through 204-7, a plurality of crossbar switches 206-0 through 206-7, a plurality of sense amplifiers 214-0 through 214-7, a plurality of memory arrays 208-0 through 208-7, and a reprogrammable controller 212.
  • the memory chip 200 has eight I/O ports 201 and eight memory arrays 208.
  • Each memory array 208-0 through 208-7 is coupled to a respective error checking and correction (ECC) circuit 210-0 through 210-7.
  • ECC error checking and correction
  • the memory chip 200 supports error checking and correction for CPUs without direct ECC support.
  • PCs personal computers
  • the ECC in the memory chip 200 corrects defects and therefore increases manufacturing yield and lowers manufacturing cost.
  • the memory chip 200 also includes row decoders 216-0 through 216-3.
  • Each of the bi-directional I/O ports 201-0 through 201-7 has a register 202-0 through 202-7, respectively.
  • the I/O bus bar 207 couples each register 202 to each of the plurality of cache selectors 222 via the crossbar link 209.
  • each cache selector 222-0 through 222-7 receives 8/9 bits from each register 202-0 through 202-7.
  • An I/O controller 242 provides control signals to the registers 202 for controlling the transfer of data between the ports 201 and the memory arrays 208 in response to control signals from the reprogrammable controller 212.
  • each array 208 comprises a plurality of subarrays 308-0 through 308-7.
  • Each of the cache selectors 222-0 through 222-7 comprises subcache selectors 302-0 through 302-15 for controlling the transfer of data between the crossbar link 209 and a respective one of the caches 204.
  • the cache selector 222 may be, for example, a plurality of pass transistors that couple one bit from the crossbar link 209 to a subcache 304.
  • Each cache 204 comprises a plurality of subcaches 304-0 through 304-8 for storing data being transferred between the memory and the ports 201.
  • Each crossbar switch 206 comprises a plurality of crossbar switches 306-0 through 306-15 for selectively coupling the subcaches 304 to a respective memory subarray 308.
  • the ECC circuit 210 comprises a plurality of ECC circuits 310-0 through 310-7. Each of the plurality of ECC circuits 310-0 through 310-7 provides error checking and correction for a corresponding pair of the crossbar switches 306-0 through 306-15.
  • Each column of the subarray has a corresponding one of a plurality of sense amplifiers 314. Data is communicated to memory cells in the memory subarrays 308 per addressing described below.
  • at power up, the memory chip 200 receives, through an interface port 224, array and control signal configuration information for programming the reprogrammable controller 212.
  • the interface port 224 is preferably a JTAG port.
  • the control signal information may be configured for specific processors, such as one family of processors, and for selecting the signal configuration of the data, such as the voltage levels of the I/O signals, e.g. low voltage transistor-transistor logic (LVTTL) or Gunning Transceiver Logic Plus (GTL+).
  • the reprogrammable controller 212 provides a separate byte enable (BE) signal 223 to each port 201 for writes.
  • the reprogrammable controller 212 provides a separate ready (BRDY) signal 226, which is programmable to be associated with any one port 201, as part of the configuration information at power up.
  • the reprogrammable controller 212 receives address (A0-A24) signals 228 for addressing the memory arrays 208 in response thereto and in accordance with the array control information provided at power up. Such addressing is described in greater detail below.
  • the reprogrammable controller 212 provides the address signals to the row decoders 216-0 through 216-3 for selecting rows of the memory arrays 208.
  • the reprogrammable controller 212 provides selection signals to array/cache line selectors 218-0 through 218-3 for enabling the selective coupling of the crossbar switches 206 to the cache selectors 222.
  • Port identification (ID) signals 230 program the reprogrammable controller 212 to define a port number (such as port 0 through port 7) of the ports 201-0 through 201-7.
  • each memory chip 200 preferably operates identically so that all memory chips 200 in a bank make identical arbitration choices. Consequently, the memory chip 200 gives priority based on the port number. For example, Port 0 has the highest priority and Port 7 has the lowest priority. This allows the processors to be connected in order. The programmability of the priority allows user tuning.
  • a clock (Clk) signal 233 provides timing control for read and write cycles.
  • a pair of select (SEL) signals 240 provides an identification of the memory chip 200 for addressing as described below in conjunction with FIG. 6.
  • the memory chip 200 has an interface for receiving control signals.
  • the control signals include: ADS, CACHE, M/IO, D/C, and W/R.
  • the reprogrammable controller 212 can configure the memory arrays 208 and the I/O ports 201 in any of a number of possible configurations.
  • the format n1×n2/n2'×n3×n4 is used to indicate a configuration having n1 ports, a port width of n2 bits (or n2' bits if parity is used), n3 arrays, and an array depth of n4 bits.
  • the configuration may be 2×32/36×8×256Kb as described below in conjunction with FIG. 6; the configuration may be 4×16/18×8×512Kb as described below in conjunction with FIG. 7; the configuration may be 8×8/9×8×1Mb as described below in conjunction with FIG. 8; or the configuration may be 1×64/72×4×256Kb (not shown). If the memory chip 200 is configured as 2×32/36×8×256Kb, for example, four ports access the selected array. In the 1×64/72×4×256Kb configuration, two memory arrays are accessed in parallel.
  • Control signals can be tailored for specific processors.
  • the configuration of the memory chip 200 shown in FIG. 2 is a default configuration and is compatible with the X86 family of processors manufactured by Intel Corporation of Santa Clara, California; it has 8 bit wide ports, a 2×32×8×256K organization, ECC, and a single cache line interleave protocol.
  • the size of the memory can be incrementally expanded by adding memory chips 200 in parallel to other memory chips 200.
  • One such embodiment is shown in FIG. 6.
  • the memory chip 200 may be used as a conventional 64-bit wide memory to provide memory increments of 8MB or (as in FIG. 6 below) 2×32-bits wide utilizing the crossbar 206 to separate I/O and CPU accesses.
  • a 2×32/36-bit wide configuration provides less loading but has memory increments of 16MB.
  • if the memory chip 200 is used as a 4×16/18 memory, memory increments are 32MB, and, if used as an 8×8 memory, memory increments are 64MB.
  • Each memory array 208 has a plurality of memory cells (not shown) typically connected in rows and columns.
  • the memory cells are, for example, conventional dynamic random access memory cells. For example, for a memory of 8 arrays, the cells may be connected in 8K rows and 1,152 columns.
  • the columns of the cells are interlaced so that the error checking and correction circuit 210 detects and corrects any single defect in the memory array 208, even ones that affect adjacent memory cells, which will be in different interlace groups.
  • the column sense amplifiers 214 are selectively connected to the crossbar switches 206 for distributing the column data to the caches 204.
  • the memory array 208 has three sections: a data section, an ECC section, and a hybrid section that is used for ECC if configured for ECC, or is used for additional bits per port if configured without ECC.
  • the memory array may have 8K×1K for data, and 8K×128 that is used for ECC or, if configured without ECC, is the 9th bit per port.
  • the arrays 208 are interlaced to ensure that no single defect or alpha hit causes an ECC failure. So, for example, for a two way interlaced array, the columns are divided into two groups each with its own ECC data bits. The two groups span alternating columns. So, the columns might be labeled: A0, B0, A1, B1, A2, B2, A3, B3, and so forth.
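A small sketch of the two-way column interlace just described (an illustration only, not the chip's physical layout): alternating columns fall into groups A and B, each group carrying its own ECC bits, so a defect spanning two adjacent columns corrupts only one bit in each group and each group remains single-bit correctable.

    def interlace_group(column):
        # Two-way interlace: even columns in group A, odd columns in group B,
        # each group checked by its own ECC bits.
        return "AB"[column % 2], column // 2

    # Columns label out as A0, B0, A1, B1, ... exactly as in the text above.
    assert [interlace_group(c)[0] for c in range(8)] == list("ABABABAB")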
  • Each ECC circuit 210-0 through 210-7 has a conventional ECC generator for writes and a conventional checker for reads.
  • the ECC circuit 210 corrects a single bit error and detects a double bit error.
  • the ECC circuit 210 therefore checks ECC during an array read and generates ECC during writes. ECC failures are reported via the interface port 224.
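The patent does not disclose the particular code used by the ECC circuits 210/310, but a standard single-error-correct, double-error-detect (SECDED) scheme behaves as described: a Hamming code corrects one bit and an added overall parity bit exposes double-bit errors. A minimal sketch over 8 data bits, for illustration only:

    DATA_POS = [p for p in range(1, 13) if p & (p - 1)]  # non-power-of-2 slots

    def ecc_encode(data):                 # data: list of 8 bits (0/1)
        code = [0] * 13                   # code[1..12] Hamming, code[0] overall
        for p, b in zip(DATA_POS, data):
            code[p] = b
        for p in (1, 2, 4, 8):            # parity over positions containing bit p
            code[p] = sum(code[i] for i in range(1, 13) if i & p and i != p) % 2
        code[0] = sum(code) % 2           # overall parity for double detection
        return code

    def ecc_decode(code):
        code = list(code)
        syndrome = 0
        for p in (1, 2, 4, 8):
            if sum(code[i] for i in range(1, 13) if i & p) % 2:
                syndrome |= p
        overall = sum(code) % 2
        if syndrome and overall:          # single-bit error: correct it
            code[syndrome] ^= 1
        elif syndrome:                    # double-bit error: detect and report
            raise ValueError("uncorrectable double-bit error")
        return [code[p] for p in DATA_POS]

    word = [1, 0, 1, 1, 0, 0, 1, 0]
    stored = ecc_encode(word)
    stored[6] ^= 1                        # single-bit upset (e.g. an alpha hit)
    assert ecc_decode(stored) == word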
  • a subarray 308 comprises a plurality of column groups 401-0 through 401-17.
  • each subarray is divided into 144 columns.
  • FIG. 4 shows the subarray 308-0 having columns 0 through 143, the crossbar switches 306-0 and 306-1, the caches 304-0 and 304-1, and the cache selectors 302-0 and 302-1.
  • the other subarrays 308, crossbar switches 306, caches 304, and cache selectors 302 have identical architecture.
  • Each I/O port 201-0 through 201-7 is coupled through a respective interconnect group 402-0 through 402-7 of the crossbar link 209 to each of the cache selectors 302-0 and 302-1 for selective coupling to a respective cache 304-0 and 304-1.
  • Each cache 304 comprises subcaches 404-0 through 404-8.
  • Each cache 304 is 4 words deep ⁇ 36 bits wide and can store at least four cache lines of a processor in the X86 family of processors manufactured by Intel Corporation of Santa Clara, California.
  • Each cache 304 can post data for writes until the memory array 208 is available or prefetch the next consecutive cache line for reads.
  • Each subcache 404 has an associated tag used by the reprogrammable controller 212 to determine if there is a cache hit.
  • Each cache selector 302 selectively couples the interconnect groups, and thus the I/O ports 201-0 through 201-7, to the subcaches 404.
  • the pair of crossbar switches 306-0 and 306-1 comprises crossbar switches 406-0 through 406-17 for selectively coupling the subcaches 404-0 through 404-8 of both caches 304-0 and 304-1 through the ECC circuit 310-0 to the sense amplifiers 314.
  • the size of the cache is a trade off of the economics of a smaller cache versus the storage capacity of a larger cache.
  • The greater the number of cache lines, the less likely it is that an array access is required and, consequently, that a page miss is risked.
  • the cache size can be altered by changing the number of columns in each array. A larger cache requires more columns and consequently fewer rows.
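As a rough model of the per-port caching just described, the following hypothetical sketch keeps a tag per cached line; a tag hit serves data without an array access, avoiding the page-miss risk. The tag format and placement policy are assumptions, since the text does not specify them.

    class LineCache:
        """Four cache lines with tags, standing in for a port's cache 204."""
        def __init__(self, lines=4):
            self.tags = [None] * lines
            self.data = [None] * lines

        def lookup(self, line_addr):
            way = line_addr % len(self.tags)      # assumed placement policy
            return self.data[way] if self.tags[way] == line_addr else None

        def fill(self, line_addr, words):
            way = line_addr % len(self.tags)
            self.tags[way], self.data[way] = line_addr, words

    c = LineCache()
    assert c.lookup(9) is None                    # miss: array access needed
    c.fill(9, ["w0", "w1", "w2", "w3"])
    assert c.lookup(9) == ["w0", "w1", "w2", "w3"]  # hit: no array access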
  • To access data in an array 208, first its row (or "page") is selected. The data is sensed and latched in the corresponding column sense amplifiers 214. While the error checking and correction circuit 210 checks the data for each row of the memory array 208 as described earlier herein, the data in the addressed columns is routed via the crossbar switch 206 to the appropriate cache 204. The data goes through the cache select 222 of the port, through the link 209, to the I/O bus bar 207, through the I/O port 201, and to the I/O. This avoids having to access the array 208 and incur a potential page miss penalty if the array 208 has subsequently been accessed by another processor to another page.
  • the crossbar switches 206 also facilitate SNARFing, in which one CPU can be reading the data that another CPU is writing to the array.
  • one of the memory arrays 208 is linked to both one of the ports 201-0 through 201-7 where the data is being written and at least one of the ports 201-0 through 201-7 where the data is to be read. Similarly, data can be transferred from one port to another, such as when a CPU accesses I/O directly.
  • the columns of the memory arrays 208 are grouped, for example, in groups of 8. Each column group is connected to the crossbar switch, which selectively connects the columns to the caches 204 responsive to the array and control configuration
  • the crossbar switch is an 8 x 8 switch.
  • When a port 201 is connected to an array 208, then based on the cache select bits (described below in conjunction with Tables I through IV), the A0 and A1 address signals 228, and, if a burst, the cache line interleave protocol (linear or Gray code), one of the 16 subcaches 300 for that port 201 is connected through its cache select 222 to the I/O bus bar 207 of that port for each cycle, until the transaction is complete.
  • Each subcache has one bit for each I/O bit. Consequently, it takes 4 subcaches to supply a cache line, and 4 cache lines can be cached in each array for each port or group of ports at a time.
  • the interleave protocol may provide, for example, single, double, or quad interleaving of 4, 8 or 16 words cached, respectively.
  • the number in parentheses equals the number of words cached; the A0-A1 address signals 228 are used in non-burst operations to select an individual word within a cache line.
  • the distribution of cache lines forms the pattern shown in Table II.
  • the most significant address (A_H) is the A22 address signal 228.
  • the memory arrays 208 may be split into two groups working together; each group of memory arrays 208 has its own interleave pattern.
  • the memory arrays 208 may be split into a first group of two arrays 208-0 through 208-1 for shared memory video and a second group of six arrays 208-2 through 208-7 for main memory.
  • the six arrays 208 interleave among themselves and the two arrays interleave among themselves.
  • the possible groupings are: 2/6, 3/5, 4/4. However, only the 4/4 or 8/0 grouping is used if there is more than one bank of chips.
  • the grouping affects the array and cache line selects. Each grouping has its own unique decoding to ensure no two consecutive cache lines are in the same array and to simplify the decoding.
  • the cache lines and arrays are addressed in 4-array interleaving as shown in Table III.
  • $A_H$ is the most significant address and $A_L$ is the least significant address.
  • For the 4-array interleaving, the address selection signals are defined as:
    $CLS_1 = A_H\overline{A_{H-1}} + A_{L+1}A_L + A_H A_L$
  • For the second group of arrays, the AS signals are inverted.
  • For 8-array interleaving, the address selection signals are defined as:
    $AS_2 = A_{L+1}A_L\overline{A_H} + A_H A_{H-1} + A_H A_{H-2}$
    $AS_1 = \overline{A_H}\,\overline{A_{L+1}} + A_H\overline{A_{H-1}}\,\overline{A_L}A_{H-2} + A_H A_L A_{H-1} + A_H A_{H-1}\overline{A_{H-2}}$
    $AS_0 = \overline{A_{L+1}}\,\overline{A_L}\,\overline{A_H} + \overline{A_H}\,\overline{A_L}A_{H-1} + A_H\overline{A_L} + A_H\overline{A_{H-1}}A_{H-2}$
    $CLS_1 = A_{H-1}A_{H-2} + A_{H-1}\overline{A_{H-2}}\,\overline{A_L} + \overline{A_{H-1}}\,\overline{A_{H-2}}A_L$
  • For the second group of arrays, the AS signals are inverted.
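For concreteness, here is a direct Python transcription of the 8-array select equations above (an illustrative sketch; the arguments are the address bits A_H, A_H-1, A_H-2, A_L+1, and A_L named in the text, and the complement of a bit x is written as 1 - x):

    def selects(ah, ah1, ah2, al1, al):
        """Array selects AS2..AS0 and cache line select CLS1 for 0/1 inputs."""
        AS2 = (al1 & al & (1 - ah)) | (ah & ah1) | (ah & ah2)
        AS1 = ((1 - ah) & (1 - al1)) | (ah & (1 - ah1) & (1 - al) & ah2) \
            | (ah & al & ah1) | (ah & ah1 & (1 - ah2))
        AS0 = ((1 - al1) & (1 - al) & (1 - ah)) | ((1 - ah) & (1 - al) & ah1) \
            | (ah & (1 - al)) | (ah & (1 - ah1) & ah2)
        CLS1 = (ah1 & ah2) | (ah1 & (1 - ah2) & (1 - al)) \
            | ((1 - ah1) & (1 - ah2) & al)
        return AS2, AS1, AS0, CLS1

    assert selects(1, 0, 1, 1, 0) == (1, 1, 1, 0)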
  • Referring to FIGs. 5a, 5b, and 5c, there are shown block diagrams illustrating one, two, and four single in-line memory module systems 500, 501, and 502, respectively.
  • the system 500 comprises a pair of processors 504-0 and 504-1, and a single in-line memory module 506-0 which comprises synchronous multi-port dynamic random access memories (SMPDRAMs) 508-0 through 508-3.
  • the module 506 may be the SIMM 1102 described below in conjunction with FIG. 11.
  • the connections of the system 500 may be as described below in conjunction with FIG. 12.
  • a data bus 510 of the processor 504-0 is divided into groups 510-0 through 510-3, each group having a predetermined number of bits.
  • a data bus 512 of the processor 504-1 is divided into groups 512-0 through 512-3, each group having a predetermined number of bits.
  • the groups 510-0 through 510-3 and 512-0 through 512-3 preferably each include the same bits of the respective data bus 510 and 512.
  • Each of the groups 510-0 through 510-3 is coupled to a respective SMPDRAM 508-0 through 508-3; and similarly each of the groups 512-0 through 512-3 is coupled to a respective SMPDRAM 508-0 through 508-3.
  • the system 501 comprises a pair of processors 504-0 and 504-1, and a pair of single in-line memory modules 506-0 and 506-1, each module 506 comprising SMPDRAMs 508-0 through 508-3.
  • the module 506 may be the SIMM 1102 described below in conjunction with FIG. 11.
  • the connections of the system 501 may be as described below in conjunction with FIG. 12.
  • a data bus 514 of the processor 504-0 is divided into groups 514-0 through 514-7, each group having a predetermined number of bits.
  • a data bus 516 of the processor 504-1 is divided into groups 516-0 through 516-7, each group having a predetermined number of bits.
  • the groups 514-0 through 514-7 and 516-0 through 516-7 preferably each include the same bits of the respective data bus 514 and 516.
  • the groups 514-0, 514-2, 514-4, and 514-6 and the groups 516-0, 516-2, 516-4, and 516-6 are each coupled to respective SMPDRAMs 518-0 through 518-3 of the module 506-1.
  • the groups 514-1, 514-3, 514-5, and 514-7 and the groups 516-1, 516-3, 516-5, and 516-7 are each coupled to respective SMPDRAMs 518-0 through 518-3 of the module 506-0.
  • the system 502 comprises a pair of processors 504-0 and 504-1, and four single in-line memory modules 506-0 through 506-3, each module 506 comprising SMPDRAMs 538-0 through 538-3.
  • a data bus 524 of the processor 504-0 is divided into groups 524-0 through 524-7, each group having a predetermined number of bits.
  • a data bus 526 of the processor 504-1 is divided into groups 526-0 through 526-7, each group having a predetermined number of bits.
  • the groups 524-0 through 524-7 and 526-0 through 526-7 preferably each include the same bits of the respective data bus 524 and 526.
  • the groups 524-0, 524-2, 524-4, and 524-6 and the groups 526-0, 526-2, 526-4, and 526-6 are each coupled to both SMPDRAMs 528-2 and 528-3 of the modules 536-0 through 536-3.
  • The groups 524-1, 524-3, 524-5, and 524-7 and the groups 526-1, 526-3, 526-5, and 526-7 are each coupled to both SMPDRAMs 528-0 and 528-1 of the modules 536-0 through 536-3.
  • Referring to FIG. 6, there is shown a block diagram illustrating a personal computer (PC) system 600 having a memory 602 organized in a 2×32/36 configuration, a central processing unit (CPU) 604, and an I/O processor 606.
  • the memory 602 includes banks 608-0 through 608-3, each bank 608 comprising memory chips 200.
  • each bank is 64-bits wide.
  • Two additional address lines (A23-A24) that match the two select (SEL) pins 240 of the memory chips 200 provide the unique address for each bank 608 of chips 200. With these additional address lines, up to four banks of chips can be accommodated without additional external decoding.
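The bank decode implied above can be sketched as follows (assumed logic: each chip compares address bits A23-A24 against its two SEL straps and responds only on a match, which is how up to four banks avoid external decoding):

    def bank_selected(a24, a23, sel):
        # sel is the chip's 2-bit SEL strap value (0..3), matched against A24:A23.
        return ((a24 << 1) | a23) == sel

    # The bank strapped as 2 responds only when A24:A23 == binary 10.
    assert bank_selected(1, 0, 2) and not bank_selected(0, 1, 2)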
  • Referring to FIG. 7, there is shown a block diagram illustrating a dual CPU computer system 700.
  • Such a system may be used for a personal computer or workstation.
  • An I/O bus 702 connects CPUs 704-1 and 704-2, and a video processor 708 for direct reading or writing by either CPU 704 to an I/O (not shown) through an I/O processor 706.
  • the I/O bus 702 may be, for example, a high speed I/O bus, such as a RAMBus, or a mini-I/O bus, such as that used in the Triton chip set manufactured by Intel Corporation of Santa Clara, California. I/O memory transfers are handled through the I/O processor 706 and the memory bus associated with the I/O processor 706.
  • Memory buses 714-1 and 714-2 couple the CPUs 704-1 and 704-2, respectively, to a plurality of SMPDRAM memories 711-0 through 711-3 for communicating data.
  • the memories 711 may be the memory chip 200.
  • For simplicity, the memories 711 are shown with four data ports and four memory arrays; each of the four data ports comprises two of the eight 8-bit ports already described, and there are actually 8 arrays in each chip instead of the four shown.
  • the memories 711-0 through 711-3 include arrays 712-0 through 712-3, 712-4 through 712-7, 712-8 through 712-11, and 712-12 through 712-15, respectively.
  • the interconnections within the memories 711 are shown diagrammatically as a crossbar interconnection in FIG. 7.
  • a memory bus 716 couples the video processor 708 to the memories 711-0 through 711-3 for communicating data.
  • a memory bus 718 couples the I/O processor 706 to the memories 711-0 through 711-3 for communicating data.
  • the same bits of the data bus are coupled to the same memories 711.
  • bits 0 through 15 of the data buses 714 for each CPU 704, bits 0 through 15 of the data bus 716 for the video processor 708, and bits 0 through 15 of the data buses 718 for the I/O processor 706 are each coupled to the memory 711-0.
  • Each memory array 712-0 through 712-15 provides a BRDY signal to a respective processor.
  • the CPUs 704-1 and 704-2, the I/O processor 706, and the video processor 708 provide address signals to the memory arrays 712-0 through 712-15 over an address bus 714.
  • the control signals (not shown) allow the processors to arbitrate for the address bus (as done with the P6 processor of Intel) and then control the memories directly.
  • each memory array 711 provides a separate ready signal (BRDY) 226 to each CPU 704, the video processor 708, and I/O processor 706.
  • Each memory array 711-0 through 711-3 provides the BRDY signal 226 from a different port.
  • the port may be programmed at power up as part of the programming of the reprogrammable controller 212. This is sufficient since all memory chips comprising a bank of memory chips respond identically to each read or write operation.
  • the system 700 has at least as many memory chips as processors.
  • FIG. 8 there is shown a block diagram illustrating a quad CPU system 800.
  • the quad CPU system 800 may be used, for example, as a server.
  • the quad CPU system 800 has four CPUs 802-0 through 802-3 and four I/O processors 804-0 through 804-3.
  • Memory banks 807-0 and 807-1 each comprise memory chips 808-0 through 808-7, which may be the memory chip 200.
  • a port 810 of each memory chip 808-0 through 808-7 of the memory bank 807-0 is coupled to the same port 810 of a respective memory chip 808-0 through 808-7 of the memory bank 807-1.
  • Data buses 805-0 through 805-3 of respective CPUs 802-0 through 802-3 couple each CPU 802 to respective ports 810 of each memory chip 200.
  • Data buses 806-0 through 806-3 of respective I/O processors 804-0 through 804-3 couple each I/O processor 804 to respective ports 810 of each memory chip 200.
  • the quad CPU, quad I/O processor configuration utilizes every port of the memory chip 200.
  • a 32-bit wide memory chip 200 may be organized as 8 ports × 4 bits to reduce data bus loading.
  • the memory chip 200 could be reconfigured with more ports: for example, sixteen 4-bit ports.
  • For each data bus 805 and 806, the same bits of the data bus are coupled to the same memory chip 808. For example, bits 0 through 15 of the data buses 805-0 through 805-3 for each CPU 802 and bits 0 through 15 of the data buses 806-0 through 806-3 for each I/O processor 804 are coupled to memory chips 808-0 of the banks 807.
  • Referring to FIG. 9, there is shown a flowchart illustrating the reading of data from the memory chip 200 using multi-array interleaving. In a read cycle 900, first, the port is connected 902 to the link through the corresponding I/O. If 904 the data requested is in the cache 204, a cache hit, it is moved 906 to the port and is supplied to the I/O. The memory chip 200 provides a BRDY signal 226 to notify the corresponding CPU that the data is now present on the I/O. The burst counter is incremented 907 and data from the cache 204 continues to cycle to the I/O until the burst is complete 908.
  • On a cache miss, the cache 204 is connected 912 to the array through the crossbar switch 206 as soon as it is available 910. If 914 the data is in the page currently accessed by the array 208, the cache line requested is delivered to the cache 204 and moved 906 onto the I/O as described above. If 914 the page is not already accessed, a page access 916 is initiated. Once the page is available, the data moves to the crossbar 206/cache 204/I/O as already described while its ECC is checked.
  • If 920 the next cache line is not in the array 208, then when 922 the next array is not busy, that array 208 is linked 924 to the cache 204. If 926 the correct page is not being accessed, the appropriate page is accessed 928 with the next cache line and ECC is performed. This next cache line is then placed 930 in its cache for the same port in anticipation that the port will next request it.
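A condensed, runnable sketch of this FIG. 9 read flow follows. The dict-based chip state is a hypothetical stand-in for the cache 204, memory array 208, and the open page; the step numbers in the comments refer to the flowchart, and the prefetch of the next cache line is simplified.

    PAGE_LINES = 32                              # assumed cache lines per page

    def read_line(chip, line_addr):
        if line_addr in chip["cache"]:           # 904: cache hit
            return chip["cache"][line_addr]      # 906: move to port, BRDY
        page = line_addr // PAGE_LINES
        if chip["open_page"] != page:            # 914: correct page open?
            chip["open_page"] = page             # 916: page access
        data = chip["mem"].get(line_addr, 0)     # array read, ECC checked here
        chip["cache"][line_addr] = data          # 912: link array to cache
        chip["cache"][line_addr + 1] = chip["mem"].get(line_addr + 1, 0)  # 930: prefetch
        return data

    chip = {"cache": {}, "open_page": None, "mem": {5: 0xAB, 6: 0xCD}}
    assert read_line(chip, 5) == 0xAB and 6 in chip["cache"]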
  • Referring to FIG. 10, there is shown a flowchart illustrating the writing of data into the memory chip 200 using interleaving.
  • In a write cycle 1000, if the port is enabled, the port is linked 1002 to the cache 204 and the write data is posted 1004 to the cache throughout the burst.
  • the memory chip 200 provides a BRDY signal to notify the corresponding CPU that the data has been posted.
  • the burst counter is incremented and write data continues to be posted to the cache 204 until the burst is complete.
  • the array 208 is linked 1012 to the cache 204 to access the correct page to prepare for a write. If 1014 the next cycle is a write cycle to the same page, write data is posted 1004 to the cache 204 as described above. Otherwise 1014, the ECC is checked 1016 and generated as soon as all the data to be written for the interlace group is present. The write to the array 208 is completed when there is no new data to be written to the page.
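The FIG. 10 posted-write flow can be sketched the same way (again with hypothetical stand-in structures): data is accepted into the cache immediately, and the array write completes later, once no further writes target the page and the ECC for the interlace group has been generated.

    def post_write(chip, line_addr, data):
        chip["posted"][line_addr] = data         # 1004: post to cache, BRDY

    def flush_page(chip, page, page_lines=32):
        # 1016: ECC would be generated here, once all data for the
        # interlace group is present; then the array write completes.
        for addr in [a for a in chip["posted"] if a // page_lines == page]:
            chip["mem"][addr] = chip["posted"].pop(addr)

    chip = {"posted": {}, "mem": {}}
    post_write(chip, 7, 0xEE)
    flush_page(chip, 0)
    assert chip["mem"][7] == 0xEE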
  • FIG. 11 there is shown a block diagram illustrating an interface between a single in-line memory module (SIMM) 1102 and a motherboard 1104.
  • the motherboard 1104 may be, for example, a conventional motherboard of a conventional personal computer.
  • the SIMM 1102 and the motherboard 1104 may be used in the systems 500, 501, and 502 (FIGs. 5a-5c).
  • the single in-line memory module (SIMM) 1102 comprises a plurality of memory chips 1106-0 through 1106-3.
  • the memory chip 1106 may be the memory chip 200.
  • the SIMM 1102 has eight 16-bit ports having one load per data line and four 16-bit ports with two loads per data line.
  • memory chips 1106-0 and 1106-1 each have ports A, C, E, and G coupled to respective data buses A, C, E, and G.
  • memory chips 1106-2 and 1106-3 each have ports B, D, F, and H coupled to respective data buses B, D, F, and H.
  • Memory chips 1106-0 through 1106-3 each have ports A', B', C', and D' coupled to respective data buses A', B', C', and D'.
  • FIG. 12 there is shown a block diagram illustrating the data bus/socket connections of the SIMMs of a two processor system 1200.
  • Data buses 1203-0 through 1203-3 couple respective ports A and A', B and B', E and A', F and B' of sockets 1202-1 through 1202-4 to a processor 1201-0.
  • Data buses 1203-4 through 1203-7 couple respective ports C and C', D and D', G and C', H and D' of sockets 1202-1 through 1202-4 to a processor 1201-1.
  • Table VIII shows the addressing of the sockets 1202.
  • each socket 1202-1 through 1202-4 may receive a bank of one SIMM 1102.
  • the system 1200 may have one to four banks.
  • a system 1200 having one bank of SIMMs 1102 has the data buses A-H of the one bank coupled to the socket 1202-1.
  • a system 1200 having two banks of SIMMs 1102 has the data buses A-H of the two banks coupled to the sockets 1202-2 and 1202-3.
  • a system 1200 having three banks of SIMMs 1102 has the data buses A-H of the three banks coupled to the sockets 1202-1 through 1202-3.
  • a system 1200 having four banks of SIMMs 1102 has the data buses A'-D' of the four banks coupled to the sockets 1202-1 through 1202-4.
  • Referring to FIG. 13, there is shown a block diagram illustrating the data bus/socket connections of the SIMMs of a three or four processor system 1300.
  • the data buses on the second SIMM 1102 are labeled I through P and match the A through H buses of FIG. 11.
  • Data buses 1303-0 through 1303-3 couple respective ports A and A', B and B', I and I', J and J' of the sockets 1302-1 through 1302-4 to a processor 1301-0.
  • Data buses 1303-4 through 1303-7 couple respective ports C and C', D and D', K and K', L and L' of the sockets 1302-1 through 1302-4 to a processor 1301-1.
  • Data buses 1303-8 through 1303-11 couple respective ports M and I', N and J', E and A', F and B' of the sockets 1302-1 through 1302-4 to a processor 1301-2.
  • Data buses 1303-12 through 1303-15 couple respective ports O and K', P and L', G and C', H and D' of the sockets 1302-1 through 1302-4 to a processor 1301-3.
  • each socket 1302-1 through 1302-4 may receive a bank of two SIMMs 1102.
  • the system 1300 may have one to four banks.
  • a system 1300 having one bank of SIMMs 1102 has the data buses A-P of the one bank coupled to the socket 1302-1.
  • a system 1300 having two banks of SIMMs 1102 has the data buses A-P of the two banks coupled to the sockets 1302-2 and 1302-3.
  • a system 1300 having three banks of SIMMs 1102 has the data buses A-P of the three banks coupled to the sockets 1302-1 through 1302-3.
  • a system 1300 having four banks of SIMMs 1102 has the data buses A'-D' and I'-L' of the four banks coupled to the sockets 1302-1 through 1302-4.
  • the dots on the sockets 1202-1 and 1302-1 represent an electrical connection for a system having one bank 1201 and 1301 of SIMMs.
  • the O's on the sockets 1202-2 and 1202-3 and 1302-2 and 1302-3 represent an electrical connection for a system having two banks 1201 and 1301 of SIMMs.
  • the X's on the sockets 1202-1 through 1202-4 and 1302-1 through 1302-4 represent an electrical connection for a system having four banks 1201 and 1301 of SIMMs.
  • a system having three banks has electrical connections represented by both the dots and the O's; the loading in this system is doubled. All buses A-H are enabled in the one-bank system. In a two-bank system, only the indicated buses are enabled.
  • on sockets 1202-2 and 1302-2, buses E, F, G, and H are enabled.
  • on sockets 1202-3 and 1302-3, buses A, B, C, and D are enabled.
  • the SIMMs are connected using the A' through D' buses, which are on the side of the SIMM opposite the side of the A through D buses.
  • the memory chip provides configurable connections between memory arrays and memory ports with interleaved addressing of the memory arrays. This allows multiple concurrent accesses to the memory arrays.

Abstract

A synchronous multi-port random access memory (200) has a plurality of memory arrays (208), each memory array having a plurality of memory cells arranged in a predetermined number of rows and a predetermined number of columns. The columns of each memory array are interleaved. Each of a plurality of memory ports (201) has a cache (204) coupled thereto for each connection between the memory port and each of the memory arrays. A programmable controller (212) enables the memory cells in interleaved groups responsive to address signals and applies control signals to the memory arrays, to the memory ports, and to the caches.

Description

Synchronous Multi-Port Random Access Memory
Field Of The Invention
This invention relates to a memory for single processor and multiple processor computers, and, more particularly, to crossbar interleaving between processors, memory, and a local bus, and within the memory itself.
Background Of The Invention
PCs, workstations, and servers bottleneck at main memory because there is only one path to main memory, and that path is slow. In many non-blocking secondary cache architectures of "high" performance systems, main memory is the bottleneck that limits system performance, especially in multi-processor systems. Recent systems, such as multi-media systems and shared memory video, require transfers of increasingly large amounts of data. These transfer requirements demand a new memory architecture.
Until just a few years ago, dynamic random access memory (DRAM) architecture had been fairly stagnant. Beyond the conventional multiplexed address DRAM, the major architectural improvement had been the video random access memory (VRAM) for video. To improve system performance, several new enhanced architectures for memory were recently developed. These architectures include
RAMBus, Extended Data Out (EDO), Burst EDO, Synchronous DRAM (SDRAM), such as manufactured by Micron or Samsung, CDRAM by Mitsubishi, EDRAM by
RAMTRON, and Multi-bank (MDRAM), such as manufactured by Mosys. These later architectures improve system performance with enhanced architectural features and higher raw performance.
It is desirable to have a memory that provides sufficient bandwidth for applications which transfer large amounts of data, such as database, file, and print servers and multi-media, for multiprocessor workstations, servers, and personal computers.
Summary Of The Invention
In the present invention, a synchronous multi-port dynamic random access memory (SMPDRAM) couples main memory directly to at least one central processing unit (CPU), a video accelerator, or at least one input/output (I/O) processor, or a combination thereof. The SMPDRAM provides a direct port for each of these devices and provides a higher performance implementation of the multi-bank interleave protocol in U.S. patent application Serial No. 08/414,118, filed March 31, 1995, the subject matter of which is incorporated herein by reference, to reduce contention and enhance performance. The crossbar of the SMPDRAM is incorporated into the main memory chip. The memory chip includes a direct interface to the CPU without intervening logic or a chip set. The memory chip is reconfigurable with the
configuration information supplied through a JTAG port.
With a multi-port DRAM, each CPU or processor can access memory at the same time as opposed to each having its own memory that may need to be
synchronized or each having to wait to get access to memory if it is being accessed by another. This allows the system to have greater throughput for applications where the cache hit rate is low, such as database applications, multi-media, multi-tasking, and multi-processor applications.
Brief Description Of The Drawings
FIGs. 1a, 1b, 1c, 1d, and 1e are pictorial diagrams illustrating addressing of a conventional non-interleaved memory, a page interleaved architecture memory, a single cache line size architecture memory, a double cache line size architecture memory, and a quad cache line size architecture memory, respectively.
FIG. 2 is a pictorial diagram illustrating a memory chip in accordance with the present invention.
FIG. 3 is a block diagram illustrating a memory array of the memory chip of FIG. 2.
FIG. 4 is a block diagram illustrating a memory subarray of the memory chip of FIG. 2.
FIGs. 5a, 5b, and 5c are block diagrams illustrating one, two and four single in-line memory module systems.
FIG. 6 is a block diagram illustrating a personal computer system.
FIG. 7 is a block diagram illustrating a dual CPU computer system.
FIG. 8 is a block diagram illustrating a quad CPU system.
FIG. 9 is a flowchart illustrating the reading of data from the memory chip using multi-array interleaving.
FIG. 10 is a flowchart illustrating the writing of data into the memory chip using interleaving.
FIG. 11 is a block diagram illustrating an interface between a single in-line memory module and a motherboard.
FIG. 12 is a block diagram illustrating the data bus/socket connections of the single in-line modules in a memory of a two processor system.
FIG. 13 is a block diagram illustrating the data bus/socket connections of the single in-line modules in a memory of a three or four processor system.
Detailed Description Of The Preferred Embodiments
Referring to FIG. 1a, 1b, 1c, 1d, and 1e, there are shown pictorial diagrams illustrating addressing of a conventional non-interleaved memory, a page interleaved architecture memory, a single cache line size architecture memory, a double cache line size architecture memory, and a quad cache line size architecture memory,
respectively. Referring in particular to FIG. 1a, the conventional non-interleaved memory comprises memory arrays 100-0 through 100-7 that are addressed by dividing the address space into equal consecutive blocks of addresses and assigning the blocks to the memory arrays 100. For example, the memory array 100-0, the memory array 100-1, through memory array 100-7 are addressed as 0-1M, 1-2M, through 7-8M, respectively, for an 8M memory.
Referring in particular to FIG. 1b, memory arrays 102-0 through 102-7 are addressed by dividing the address space into pages and sequentially assigning the pages to a memory array. For example, for 128 word pages, the memory array 102-0 is assigned addresses 0-127, 256-383, 512-639, through 2096896-2097023; the memory array 102-1 is assigned addresses 128-255, 384-511, 640-767, through 2097024-2097151; and the memory array 102-7 is assigned addresses 6291584-6291711, 6292608-6292735, 6293632-6293759 through 8388480-8388607.
Referring in particular to FIG. 1c, the memory arrays 102-0 through 102-7 are organized on a single cache line basis. Memory arrays 102-0 through 102-7 are addressed by dividing the address space on a cache line basis and assigning
subsequent cache lines to subsequent memory arrays. For example, for 4 words per cache line, the memory array 102-0 is assigned addresses 0-3, 32-35, 64-67, through 8388576-8388579; the memory array 102-1 is assigned addresses 4-7, 36-39, 68-71 through 8388580-8388583; and the memory array 102-7 is assigned addresses 28-31, 60-63, 92-95 through 8388604-8388607.
Referring in particular to FIG. 1d, the memory arrays 102-0 through 102-7 are organized on a double cache line basis. Memory arrays 102-0 through 102-7 are addressed by dividing the address space on a cache line basis and assigning the subsequent cache lines to subsequent memory arrays. For example, for 8 words per cache line, the memory array 102-0 is assigned addresses 0-7, 64-71, 128-135 through 8388544-8388551; the memory array 102-1 is assigned addresses 8-15, 72-79, 136-143 through 8388552-8388559; and the memory array 102-7 is assigned addresses 56-63, 120-127, 184-191 through 8388600-8388607.
Referring in particular to FIG. 1e, the memory arrays 102-0 through 102-7 are organized on a quad cache line basis. Memory arrays 102-0 through 102-7 are addressed by dividing the address space on a 4 cache line basis and assigning the subsequent cache lines to subsequent memory arrays. For example, for 16 words per cache line, the memory array 102-0 is assigned addresses 0-15, 128-143, 256-271 through 8388480-8388495; the memory array 102-1 is assigned addresses 16-31, 144-159, 272-287 through 8388496-8388511; and the memory array 102-7 is assigned addresses 112-127, 240-255, 368-383 through 8388592-8388607. This organization of the memory 100 reduces memory array contention in multiple central processing unit (CPU) multi-threaded applications in which each CPU, because of locality, may commonly be accessing the same memory array while running the same application. An interleaved architecture evenly spreads the addressing of the application across the memory arrays to reduce the likelihood of both CPUs accessing the same memory array. The interleave pattern is adjustable based on the type of operating system and the type of application being executed by the system. The interleaving may be the interleaving described in the U.S. patent application Serial No. 08/414,118, filed March 31, 1995, the subject matter of which is incorporated herein by reference.
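To make the addressing schemes of FIGs. 1a through 1e concrete, the following Python sketch maps a word address to an (array, offset) pair. It is an illustration with assumed parameters (8 arrays, an 8M-word space, 128-word pages, 4-word cache lines); the page-interleaved case is a simplified round-robin that differs in detail from the exact pattern of FIG. 1b.

    ARRAYS = 8
    WORDS = 8 * 1024 * 1024      # 8M-word memory, as in the examples
    PAGE = 128                   # words per page (FIG. 1b example)
    LINE = 4                     # words per cache line (FIG. 1c example)

    def non_interleaved(addr):
        # FIG. 1a: equal consecutive 1M-word blocks, one block per array.
        block = WORDS // ARRAYS
        return addr // block, addr % block

    def page_interleaved(addr):
        # FIG. 1b style (simplified): consecutive pages rotate across arrays.
        page = addr // PAGE
        return page % ARRAYS, (page // ARRAYS) * PAGE + addr % PAGE

    def line_interleaved(addr, lines_per_chunk=1):
        # FIGs. 1c-1e: consecutive chunks of 1, 2, or 4 cache lines rotate
        # across the arrays (single, double, and quad interleave).
        chunk = LINE * lines_per_chunk
        n = addr // chunk
        return n % ARRAYS, (n // ARRAYS) * chunk + addr % chunk

    # Address 36 falls in array 102-1's 36-39 range in FIG. 1c
    # (cache line 9, and 9 mod 8 = 1).
    assert line_interleaved(36)[0] == 1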
Referring to FIG. 2, there is shown a pictorial diagram illustrating a memory chip 200 in accordance with the present invention. Referring to FIG. 3, there is shown a block diagram illustrating a memory array of the memory chip 200. Referring to FIG. 4, there is shown a block diagram illustrating a memory subarray of the memory chip 200. The architecture of the memory chip 200 is described for a synchronous multi-port dynamic random access memory (SMPDRAM). However, the architecture may be applied to other types of RAM, such as Static Random Access Memory (SRAM) or Flash RAM. The memory chip 200 may be in a single semiconductor device package.
For illustrative purposes, the memory chip may be organized as a 64 Mb DRAM with eight 8-bit or 9-bit ports. The ports can be grouped into four 16-bit ports, two 32-bit ports or one 64-bit port. In addition, there are eight 8 or 9 Mb memories, although the memory chip 200 may have other numbers of ports and memories.
The memory chip 200 uses the multi-array interleave protocol described above in conjunction with FIGs. 1b through 1e for reducing the timeline losses incurred when two processors attempt to access the same memory array. By interleaving on a cache line basis, each processor ties up a memory array for a shorter time and then releases the memory array for another processor that may be waiting. In the event that two CPUs attempt to access the same memory array, the access by one CPU is delayed. However, by interleaving, the delay is reduced because the first CPU which gained access to the array most likely moves to the next array if the next data that it seeks is sequential to the first. In contrast, a non-interleaved architecture where the first CPU ties up the same bank extends the delay of access of the second CPU as long as the first CPU continues to access sequential information in the same array.
The memory chip 200 includes a plurality of bi-directional input/output (I/O) ports 201-0 through 201-7, an I/O bus bar 207, a crossbar link 209, a plurality of cache selectors 222-0 through 222-7, a plurality of embedded caches 204-0 through 204-7, a plurality of crossbar switches 206-0 through 206-7, a plurality of sense amplifiers 214-0 through 214-7, a plurality of memory arrays 208-0 through 208-7, and a reprogrammable controller 212. For illustrative purposes, the memory chip 200 has eight I/O ports 201 and eight memory arrays 208. Each memory array 208-0 through 208-7 is coupled to a respective error checking and correction (ECC) circuit 210-0 through 210-7. In workstations and servers, the memory chip 200 supports error checking and correction for CPUs without direct ECC support. For personal computers (PCs) for which ECC is not required, the ECC in the memory chip 200 corrects defects and therefore increases manufacturing yield and lowers
manufacturing cost. In those CPUs with direct support or for systems that support parity, the extra bits provide a ninth bit for each port 201 as described below. The memory chip 200 also includes row decoders 216-0 through 216-3.
Each of the bi-directional I/O ports 201-0 through 201-7 has a register 202-0 through 202-7, respectively. The I/O bus bar 207 couples each register 202 to each of the plurality of cache selectors 222 via the crossbar link 209. Thus, for an 8/9 bit input to each register 202, each cache selector 222-0 through 222-7 receives 8/9 bits from each register 202-0 through 202-7. An I/O controller 242 provides control signals to the registers 202 for controlling the transfer of data between the ports 201 and the memory arrays 208 in response to control signals from the reprogrammable controller 212.
Referring in particular to FIG. 3, each array 208 comprises a plurality of subarrays 308-0 through 308-7. Each of the cache selectors 222-0 through 222-7 comprises subcache selectors 302-0 through 302-15 for controlling the transfer of data between the crossbar link 209 and a respective one of the caches 204. The cache selector 222 may be, for example, a plurality of pass transistors that couple one bit from the crossbar link 209 to a subcache 304. Each cache 204 comprises a plurality of subcaches 304-0 through 304-8 for storing data being transferred between the memory and the ports 201. Each crossbar switch 206 comprises a plurality of crossbar switches 306-0 through 306-15 for selectively coupling the subcaches 304 to a respective memory subarray 308. The ECC circuit 210 comprises a plurality of ECC circuits 310-0 through 310-7. Each of the plurality of ECC circuits 310-0 through 310-7 provides error checking and correction for a corresponding pair of the crossbar switches 306-0 through 306-15. Each column of the subarray has a corresponding one of a plurality of sense amplifiers 314. Data is communicated to memory cells in the memory subarrays 308 per addressing described below.
Referring again to FIG. 2, at power up, the memory chip 200 receives, through an interface port 224, array and control signal configuration information for
programming the reprogrammable controller 212. The interface port 224 is preferably a JTAG port. The control signal information may be configured for specific processors, such as one family of processors, and for selecting the signal configuration of the data, such as the voltage levels of the I/O signals, e.g. low voltage transistor-transistor logic (LVTTL) or Gunning Transceiver Logic Plus (GTL+). To provide byte write capability, the reprogrammable controller 212 provides a separate byte enable (BE) signal 223 to each port 201 for writes. The reprogrammable controller 212 provides a separate ready (BRDY) signal 226, which is programmable to be associated with any one port 201, as part of the configuration information at power up. The reprogrammable controller 212 receives address (A0-A24) signals 228 for addressing the memory arrays 208 in response thereto and in accordance with the array control information provided at power up. Such addressing is described in greater detail below. The reprogrammable controller 212 provides the address signals to the row decoders 216-0 through 216-3 for selecting rows of the memory arrays 208. The reprogrammable controller 212 provides selection signals to array/cache line selectors 218-0 through 218-3 for enabling the selective coupling of the crossbar switches 206 to the cache selectors 222.
Port identification (ID) signals 230 program the reprogrammable controller 212 to define a port number (such as port 0 through port 7) of the ports 201-0 through 201-7. To avoid bus contention, each memory chip 200 preferably operates identically so that all memory chips 200 in a bank make identical arbitration choices. Consequently, the memory chip 200 gives priority based on the port number. For example, Port 0 has the highest priority and Port 7 has the lowest priority. This allows the processors to be connected in order. The programmability of the priority allows user tuning. A clock (Clk) signal 233 provides timing control for read and write cycles.
A pair of select (SEL) signals 240 provides an identification of the memory chip 200 for addressing as described below in conjunction with FIG. 6. The memory chip 200 has an interface for receiving control signals. For interfacing with a Pentium processor, the control signals used to decode memory read and write operations include: ADS, CACHE, M/IO, D/C, and W/R.
Responsive to the array control information, the reprogrammable controller 212 can configure the memory arrays 208 and the I/O ports 201 in any of a number of possible configurations. For clarity, the format n1 × n2/n2' × n3 × n4 is used to indicate a configuration having n1 ports, a port width of n2 bits (or n2' bits when the ninth, parity, bit per byte is used), n3 arrays, and an array depth of n4 bits. For example, the configuration may be 2×32/36×8×256Kb as described below in conjunction with FIG. 6; the configuration may be 4×16/18×8×512Kb as described below in conjunction with FIG. 7; the configuration may be 8×8/9×8×1Mb as described below in conjunction with FIG. 8; or the configuration may be 1×64/72×4×256Kb (not shown). If the memory chip 200 is configured as 2×32/36×8×256Kb, for example, four of the eight internal ports together access the selected array. In the 1×64/72×4×256Kb configuration, two memory arrays are accessed in parallel.
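By way of illustration only, the following Python sketch parses configuration strings written in this notation. The names PortConfig and parse_config are hypothetical, and the final check encodes an observation about the four configurations listed above: the total I/O width with parity, n1 · n2', is 72 bits in each case.

    # Illustrative only: a parser for the n1 x n2/n2' x n3 x n4 notation.
    # PortConfig and parse_config are hypothetical names, not from the patent.
    from dataclasses import dataclass

    @dataclass
    class PortConfig:
        ports: int         # n1: number of external ports
        width: int         # n2: port width in bits, data only
        width_parity: int  # n2': port width including the ninth (parity) bit per byte
        arrays: int        # n3: number of memory arrays
        depth: str         # n4: array depth, e.g. "256Kb"

    def parse_config(s: str) -> PortConfig:
        n1, widths, n3, n4 = s.split("x")
        n2, n2p = widths.split("/")
        cfg = PortConfig(int(n1), int(n2), int(n2p), int(n3), n4)
        # In the four configurations listed above, the total I/O width
        # with parity is constant: n1 * n2' == 72 bits.
        assert cfg.ports * cfg.width_parity == 72
        return cfg

    print(parse_config("2x32/36x8x256Kb"))
    print(parse_config("8x8/9x8x1Mb"))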
Control signals can be tailored for specific processors. The configuration of the memory chip 200 shown in FIG. 2 is a default configuration that is compatible with the X86 family of processors manufactured by Intel Corporation of Santa Clara, California; it has 8-bit wide ports, a 2×32×8×256K organization, ECC, and a single cache line interleave protocol.
The size of the memory can be incrementally expanded by adding memory chips 200 in parallel to other memory chips 200. One such embodiment is shown in FIG. 6. Different densities and configurations may be implemented. In a single CPU system, such as a PC, the memory chip 200 may be used as a conventional 64-bit wide memory to provide memory increments of 8MB, or (as in FIG. 6 below) as a 2×32-bit wide memory utilizing the crossbar 206 to separate I/O and CPU accesses. A 2×32/36-bit wide configuration presents less loading but has memory increments of 16MB. Similarly, if the memory chip 200 is used as a 4×16/18 memory, memory increments are 32MB, and, if used as an 8×8 memory, memory increments are 64MB.
Each memory array 208 has a plurality of memory cells (not shown) typically connected in rows and columns. The memory cells are, for example, conventional dynamic random access memory cells. For example, for a memory of 8 arrays, the cells may be connected in 8K rows and 1,152 columns. The columns of the cells are interlaced so that the error checking and correction circuit 210 detects and corrects any single defect in the memory array 208, including defects that affect adjacent memory cells, because adjacent cells fall in different interlace groups. The column sense amplifiers 214 are selectively connected to the crossbar switches 206 for distributing the column data to the caches 204.
The memory array 208 has three sections: a data section, an ECC section, and a hybrid section that is used for ECC if configured for ECC, or is used for additional bits per port if configured without ECC. For example, the memory array may have 8K×1K for data, and 8K×128 that is used for ECC or, if configured without ECC, is the 9th bit per port.
The arrays 208 are interlaced to ensure that no single defect or alpha hit causes an ECC failure. For example, in a two-way interlaced array, the columns are divided into two groups, each with its own ECC data bits, and the two groups span alternating columns. The columns might thus be labeled: A0, B0, A1, B1, A2, B2, A3, B3, and so forth.
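By way of illustration only, the following Python sketch assigns columns to interlace groups by simple modulo labeling, reproducing the A0, B0, A1, B1, ... pattern above; the function name is hypothetical.

    # Sketch: assign columns to interlace groups so that physically adjacent
    # columns always land in different groups (here two-way: A and B).
    def interlace_labels(num_columns: int, ways: int = 2) -> list[str]:
        group_names = "ABCDEFGH"
        return [f"{group_names[col % ways]}{col // ways}" for col in range(num_columns)]

    print(interlace_labels(8))  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2', 'A3', 'B3']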
Each ECC circuit 210-0 through 210-7 has a conventional ECC generator for writes and a conventional checker for reads. The ECC circuit 210 corrects a single bit error and detects a double bit error. The ECC circuit 210 checks ECC during an array read and generates ECC during an array write. ECC failures are reported via the interface port 224.
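The ECC generator and checker are characterized above only as conventional. As an illustrative sketch, the following Python implements a textbook extended-Hamming SECDED (single-error-correct, double-error-detect) code over 8 data bits; the actual code, width, and circuit structure of the ECC circuits 310 are not specified here and may differ.

    # Textbook SECDED over 8 data bits (extended Hamming): illustrative only.
    DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]    # non-power-of-two positions

    def secded_encode(data):                  # data: 8 bits, each 0 or 1
        code = [0] * 13                       # index 0 unused; positions 1..12
        for p, b in zip(DATA_POS, data):
            code[p] = b
        for pp in (1, 2, 4, 8):               # Hamming parity bits
            code[pp] = sum(code[p] for p in range(1, 13) if p & pp) & 1
        overall = sum(code[1:]) & 1           # extra bit for double-error detection
        return code[1:] + [overall]           # 13-bit codeword

    def secded_decode(word):                  # word: 13 bits
        code = [0] + word[:12]
        syndrome = 0
        for pp in (1, 2, 4, 8):
            if sum(code[p] for p in range(1, 13) if p & pp) & 1:
                syndrome |= pp
        parity_odd = (sum(word) & 1) == 1
        if syndrome == 0 and not parity_odd:
            status = "ok"
        elif parity_odd:                      # single-bit error: correctable
            if 1 <= syndrome <= 12:
                code[syndrome] ^= 1           # (syndrome 0: the overall bit flipped)
            status = "corrected"
        else:
            status = "double error detected"  # reported via the interface port 224
        return [code[p] for p in DATA_POS], status

    cw = secded_encode([1, 0, 1, 1, 0, 0, 1, 0])
    cw[4] ^= 1                                # inject a single-bit error
    print(secded_decode(cw))                  # data recovered, status 'corrected'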
Referring in particular to FIG. 4, a subarray 308 comprises a plurality of column groups 401-0 through 401-17. For illustrative purposes, each subarray is divided into 144 columns (eighteen groups of eight). Here, FIG. 4 shows the subarray 308-0 having columns 0 through 143, the crossbar switches 306-0 and 306-1, the caches 304-0 and 304-1, and the cache selectors 302-0 and 302-1. The other subarrays 308, crossbar switches 306, caches 304, and cache selectors 302 have identical architecture.
Each I/O port 201-0 through 201-7 is coupled through a respective interconnect group 402-0 through 402-7 of the crossbar link 209 to each of the cache selectors 302-0 and 302-1 for selective coupling to a respective cache 304-0 and 304-1.
Each cache 304 comprises subcaches 404-0 through 404-9. Each cache 304 is 4 words deep × 36 bits wide and can store at least four cache lines of a processor in the X86 family of processors manufactured by Intel Corporation of Santa Clara, California. Each cache 304 can post data for writes until the memory array 208 is available or prefetch the next consecutive cache line for reads. Each subcache 404 has an associated tag used by the reprogrammable controller 212 to determine if there is a cache hit. Each cache selector 302 selectively couples the interconnect groups, and thus the I/O ports 201-0 through 201-7, to the subcaches 404. The pair of crossbar switches 306-0 and 306-1 comprises crossbar switches 406-0 through 406-17 for selectively coupling the subcaches 404-0 through 404-8 of both caches 304-0 and 304-1 through the ECC circuit 310-0 to the sense amplifiers 314.
Referring to FIGs. 2-4 together, the size of the cache is a trade-off between the economics of a smaller cache and the storage capacity of a larger cache. The greater the number of cache lines, the less likely it is that an array access, with its attendant risk of a page miss, is required. The fewer cache lines that are stored, the more quickly the array is released for any pending access from another port. The cache size can be altered by changing the number of columns in each array. A larger cache requires more columns and consequently fewer rows.
To access data in an array 208, first its row (or "page") is selected. The data is sensed and latched in the corresponding column sense amplifiers 214. While the error checking and correction circuit 210 checks the data for each row of the memory array 208 as described earlier herein, the data in the addressed columns is routed via the crossbar switch 206 to the appropriate cache 204. The data goes through the cache selector 222 of the port, through the link 209, to the I/O bus bar 207, through the I/O port 201, and to the I/O. This avoids having to access the array 208, and incur a potential page miss penalty, if the array 208 has subsequently been accessed by another processor to another page.
The crossbar switches 206 also facilitate SNARFing, in which one CPU can be reading the data that another CPU is writing to the array. In that case, one of the memory arrays 208 is linked both to the one of the ports 201-0 through 201-7 where the data is being written and to at least one of the ports 201-0 through 201-7 where the data is to be read. Similarly, data can be transferred from one port to another, such as when a CPU accesses I/O directly. The columns of the memory arrays 208 are grouped, for example, in groups of 8. Each column group is connected to the crossbar switch, which selectively connects the columns to the caches 204 responsive to the array and control configuration information. Here, the crossbar switch is an 8 × 8 switch. When a port 201 is connected to an array 208, based on the cache select bits, described below in conjunction with Tables I through IV, the A0 and A1 address signals 228, and, if a burst, the cache line interleave protocol (linear or Gray), one of the 16 subcaches 304 for that port 201 is connected through its cache selector 222 to the I/O bus bar 207 of the port 201 for each cycle, until the transaction is complete. Each subcache has one bit for each I/O bit. Consequently, it takes 4 subcaches to supply a cache line, and 4 cache lines can be cached in each array for each port or group of ports at a time.
Depending on the port configuration, the cache lines and array are addressed as shown below in Table I. The interleave protocol may provide, for example, single, double, or quad interleaving of 4, 8 or 16 words cached, respectively.
[Table I appears only as an image in the published document.]
In Table I, the number in parentheses equals the number of words cached; the A0-A1 address signals 228 are used in nonburst operations to select an individual word within a cache line.
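By way of illustration only, the following Python sketch shows the two burst orders referred to above, assuming that the Gray (interleaved) protocol denotes the Pentium-style order in which each transfer address is the starting word XORed with the beat number; the linear order simply wraps. This reading of the protocol names is an assumption based on the chip's stated Pentium compatibility.

    # Sketch of the two cache-line burst orders named above (assumed semantics).
    def burst_order(start_word: int, gray: bool, length: int = 4) -> list[int]:
        if gray:  # Pentium-style interleaved order: start XOR beat number
            return [start_word ^ i for i in range(length)]
        return [(start_word + i) % length for i in range(length)]  # linear wrap

    print(burst_order(1, gray=True))   # [1, 0, 3, 2]
    print(burst_order(1, gray=False))  # [1, 2, 3, 0]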
The distribution of cache lines forms the pattern shown in Table II. The most significant address (AH) is the A22 address signal 228.
[Table II appears only as an image in the published document.]
The memory arrays 208 may be split into two groups working together; each group has its own interleave pattern. Each group of memory arrays 208 has
interleaved addresses that are different than the interleaved addresses of the other group of memory arrays. For example, the memory arrays 208 may be split into a first group of two arrays 208-0 through 208-1 for shared memory video and a second group of six arrays 208-2 through 208-7 for main memory. The six arrays 208 interleave among themselves and the two arrays interleave among themselves. The possible groupings are: 2/6, 3/5, and 4/4. However, only the 4/4 or 8/0 grouping is used if there is more than one bank of chips. The grouping affects the array and cache line selects. Each grouping has its own unique decoding to ensure that no two consecutive cache lines are in the same array and to simplify the decoding. The cache lines and array are addressed in 4 array interleaving as shown in Table III.
[Table III appears only as an image in the published document.]
The 4/4 distribution of cache lines forms the pattern shown in Table IV.
[Table IV appears only as an image in the published document.]
For 2/6 and 3/5 interleaving, a pattern similar to the pattern shown in Table IV may be used. However, Table V shows an alternate pattern which simplifies the address decoding.
[Table V appears only as an image in the published document.]
where AH is the most significant address and AL is the least significant address.
For 6L/2U (6 arrays are the lower arrays and 2 arrays are the upper arrays) interleaving, the address selection signals are defined as:
AS2 = /A(L+1)·/A(L) + A(H)·A(H-1) + A(H)·/A(L);
AS1 = /A(H)·A(L) + A(H)·A(H-1);
AS0 = A(H-2);
CLS1 = A(H)·/A(H-1) + A(L+1)·A(L) + A(H)·A(L);
CLS0 = /A(H)·A(H-1) + A(H)·A(L+1),
where "/" denotes logical complement, "·" denotes logical AND, "+" denotes logical OR, and A(H-1), A(H-2), and A(L+1) denote the address bits adjacent to A(H) and A(L).
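Read this way, the equations translate directly into logic. The following Python sketch assumes that reading and the subscript interpretation noted above (ah, ah1, ah2 for A(H), A(H-1), A(H-2); al1, al for A(L+1), A(L)); the 5L/3U equations below translate the same way.

    # Sketch of the 6L/2U array/cache-line select equations above, assuming
    # ah = A(H), ah1 = A(H-1), ah2 = A(H-2), al1 = A(L+1), al = A(L); 0 or 1.
    def decode_6l2u(ah, ah1, ah2, al1, al):
        as2 = (not al1 and not al) or (ah and ah1) or (ah and not al)
        as1 = (not ah and al) or (ah and ah1)
        as0 = ah2
        cls1 = (ah and not ah1) or (al1 and al) or (ah and al)
        cls0 = (not ah and ah1) or (ah and al1)
        return tuple(int(bool(b)) for b in (as2, as1, as0, cls1, cls0))

    # AS2..AS0 select one of the eight arrays; CLS1..CLS0 select the cache line.
    print(decode_6l2u(ah=0, ah1=0, ah2=0, al1=0, al=0))  # (1, 0, 0, 0, 0)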
The 6L/2U distribution of cache lines forms the pattern shown in Table VI.
[Table VI appears only as an image in the published document.]
For 2L/6U interleaving, the AS signals are inverted.
For 5L/3U interleaving, the address selection signals are defined as:
AS2 = A(L+1)·A(L)·/A(H) + A(H)·A(H-1) + A(H)·A(H-2);
AS1 = /A(H)·/A(L+1) + A(H)·/A(H-1)·/A(L)·A(H-2) + A(H)·A(L)·A(H-1) + A(H)·A(H-1)·/A(H-2);
AS0 = /A(L+1)·/A(L)·/A(H) + /A(H)·/A(L)·A(H-1) + A(H)·/A(L) + A(H)·/A(H-1)·A(H-2);
CLS1 = A(H-1)·A(H-2) + A(H-1)·/A(H-2)·/A(L) + /A(H-1)·/A(H-2)·A(L);
CLS0 = /A(H)·A(H-2) + A(H-1)·/A(L+1)·A(L) + A(L+1)·A(L)·A(H)
The 5L/3U distribution of cache lines forms the pattern shown in Table VII.
[Table VII appears only as an image in the published document.]
For 3L/5U interleaving, the AS signals are inverted.
Other patterns may be used. These patterns simplify the decoding while ensuring that no two consecutive cache lines are in the same array. Each pattern is in 4 cache line increments and is repeated through each page. The number of increments per page depends on the port grouping.
Referring to FIGs. 5a, 5b, and 5c, there are shown block diagrams illustrating one, two and four single in-line memory module systems 500, 501, and 502,
respectively. Referring in particular to FIG. 5a, the system 500 comprises a pair of processors 504-0 and 504-1, and a single in line memory module 506-0 which comprises synchronous multi-port dynamic random access memories (SMPDRAMs) 508-0 through 508-3. The module 506 may be the SIMM 1102 described below in conjunction with FIG. 11. The connections of the system 500 may be as described below in conjunction with FIG. 12. A data bus 510 of the processor 504-0 is divided into groups 510-0 through 510-3, each group having a predetermined number of bits. Similarly, a data bus 512 of the processor 504-1 is divided into groups 512-0 through 512-3, each group having a predetermined number of bits. The groups 510-0 through 510-3 and 512-0 through 512-3 preferably each include the same bits of the respective data bus 510 and 512. Each of the groups 510-0 through 510-3 is coupled to a respective SMPDRAM 508-0 through 508-3; and similarly each of the groups 512-0 through 512-3 is coupled to a respective SMPDRAM 508-0 through 508-3.
Referring in particular to FIG. 5b, the system 501 comprises a pair of processors 504-0 and 504-1, and a pair of single in-line memory modules 506-0 and 506-1, each module 506 comprising SMPDRAMs 508-0 through 508-3. The module 506 may be the SIMM 1102 described below in conjunction with FIG. 11. The connections of the system 501 may be as described below in conjunction with FIG. 12. A data bus 514 of the processor 504-0 is divided into groups 514-0 through 514-7, each group having a predetermined number of bits. Similarly, a data bus 516 of the processor 504-1 is divided into groups 516-0 through 516-7, each group having a predetermined number of bits. The groups 514-0 through 514-7 and 516-0 through 516-7 preferably each include the same bits of the respective data bus 514 and 516. The groups 514-0, 514-2, 514-4, and 514-6 and the groups 516-0, 516-2, 516-4, and 516-6 are coupled to respective SMPDRAMs 508-0 through 508-3 of the module 506-1. The groups 514-1, 514-3, 514-5, and 514-7 and the groups 516-1, 516-3, 516-5, and 516-7 are coupled to respective SMPDRAMs 508-0 through 508-3 of the module 506-0.
Referring in particular to FIG. 5c, the system 502 comprises a pair of processors 504-0 and 504-1, and four single in-line memory modules 506-0 through 506-3, each module 506 comprising SMPDRAMs 538-0 through 538-3. A data bus 524 of the processor 504-0 is divided into groups 524-0 through 524-7, each group having a predetermined number of bits. Similarly, a data bus 526 of the processor 504-1 is divided into groups 526-0 through 526-7, each group having a predetermined number of bits. The groups 524-0 through 524-7 and 526-0 through 526-7 preferably each include the same bits of the respective data bus 524 and 526. The groups 524-0, 524-2, 524-4, and 524-6 and the groups 526-0, 526-2, 526-4, and 526-6 are each coupled to both SMPDRAMs 538-2 and 538-3 of the modules 506-0 through 506-3. The groups 524-1, 524-3, 524-5, and 524-7 and the groups 526-1, 526-3, 526-5, and 526-7 are each coupled to both SMPDRAMs 538-0 and 538-1 of the modules 506-0 through 506-3.
Referring to FIG. 6, there is shown a block diagram illustrating a personal computer (PC) system 600 having a memory 602 organized in a 2x32/36 configuration, a central processing unit (CPU) 604, and an I/O processor 606. The memory 602 includes banks 608-0 through 608-3, each bank 608 comprising memory chips 200. For a 64-bit processor, such as a Pentium processor, each bank is 64-bits wide. Two additional address lines (A23-A24) that match the two select (SEL) pins 240 of the memory chips 200 provide the unique address for each bank 608 of chips 200. With these additional address lines, up to four banks of chips can be accommodated without additional external decoding.
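By way of illustration only, the following Python sketch shows the implied bank decode: each chip compares the high address bits against its SEL straps and responds only on a match. The function and variable names are hypothetical.

    # Sketch: bank selection via the SEL pins 240 matched against A23-A24.
    def chip_selected(a24: int, a23: int, sel: tuple) -> bool:
        """A memory chip responds only when (A24, A23) equals its SEL strap."""
        return (a24, a23) == sel

    # Four banks, straps 00..11: exactly one bank responds to any address.
    straps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print([chip_selected(1, 0, s) for s in straps])  # [False, False, True, False]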
Referring to FIG. 7, there is shown a block diagram illustrating a dual CPU computer system 700. Such a system may be used for a personal computer or workstation. An I/O bus 702 connects CPUs 704-1 and 704-2, and a video processor 708 for direct reading or writing by either CPU 704 to an I/O (not shown) through an I/O processor 706. The I/O bus 702 may be, for example, a high speed I/O bus, such as a RAMBus, or a mini-I/O bus, such as that used in the Triton chip set manufactured by Intel Corporation of Santa Clara, California. I/O memory transfers are handled through the I/O processor 706 and the memory bus associated with the I/O processor 706.
Memory buses 714-1 and 714-2 couple the CPUs 704-1 and 704-2, respectively, to a plurality of SMPDRAM memories 711-0 through 711-3 for communicating data. The memories 711 may be the memory chip 200. For simplicity, the memories 711 are shown with four data ports and four memory arrays. Each of the four data ports comprises two of the eight 8-bit ports already described, and each chip actually contains eight arrays rather than the four shown. The memories 711-0 through 711-3 include arrays 712-0 through 712-3, 712-4 through 712-7, 712-8 through 712-11, and 712-12 through 712-15, respectively. The interconnections within the memories 711 are shown diagrammatically as a crossbar interconnection in FIG. 7.
A memory bus 716 couples the video processor 708 to the memories 711-0 through 711-3 for communicating data. A memory bus 718 couples the I/O processor 706 to the memories 711-0 through 711-3 for communicating data. For each data bus 714, 716, and 718, the same bits of the data bus are coupled to the same memories 711. For example, bits 0 through 15 of the data buses 714 for each CPU 704, bits 0 through 15 of the data bus 716 for the video processor 708, and bits 0 through 15 of the data buses 718 for the I/O processor 706 each are coupled to the memory 711-0.
Each memory array 712-0 through 712-15 provides a BRDY signal to a respective processor. The CPUs 704-1 and 704-2, the I/O processor 706, and the video processor 708 provide address signals to the memory arrays 712-0 through 712-15 over an address bus 714. The control signals (not shown) allow the processors to arbitrate for the address bus (as done with the P6 processor of Intel) and then control the memories directly.
Because each array 208 has its own page hit or miss, each memory 711 provides a separate ready signal (BRDY) 226 to each CPU 704, the video processor 708, and the I/O processor 706. Each memory 711-0 through 711-3 provides the BRDY signal 226 from a different port. The port may be programmed at power up as part of the programming of the reprogrammable controller 212. This is sufficient since all memory chips comprising a bank of memory chips respond identically to each read or write operation. The system 700 has at least as many memory chips as processors.
Referring to FIG. 8, there is shown a block diagram illustrating a quad CPU system 800. The quad CPU system 800 may be used, for example, as a server. The quad CPU system 800 has four CPUs 802-0 through 802-3 and four I/O processors 804-0 through 804-3. Memory banks 807-0 and 807-1 each comprise memory chips 808-0 through 808-7, which may be the memory chip 200. A port 810 of each memory chip 808-0 through 808-7 of the memory bank 807-0 is coupled to the same port 810 of a respective memory chip 808-0 through 808-7 of the memory bank 807-1. Data buses 805-0 through 805-3 of the respective CPUs 802-0 through 802-3 couple each CPU 802 to respective ports 810 of each memory chip 200. Data buses 806-0 through 806-3 of the respective I/O processors 804-0 through 804-3 couple each I/O processor 804 to respective ports 810 of each memory chip 200. Thus, for an 8-port memory chip 200, the quad CPU, quad I/O processor configuration utilizes every port of the memory chip 200. For large arrays, a 32-bit wide memory chip 200 may be organized as 8 ports × 4 bits to reduce data bus loading. To allow for more CPUs 802 or I/O processors 804, the memory chip 200 could be reconfigured with more ports: for example, sixteen 4-bit ports. For each data bus 805 and 806, the same bits of the data bus are coupled to the same memory chip 808. For example, bits 0 through 15 of the data buses 805-0 through 805-3 for each CPU 802 and bits 0 through 15 of the data buses 806-0 through 806-3 for each I/O processor 804 are coupled to the memory chips 808-0 of the banks 807.
Referring to FIG. 9, there is shown a flowchart illustrating the reading of data from the memory chip 200 using multi-array interleaving. In a read cycle 900, first, the port is connected 902 to the link through the corresponding I/O. If 904 the data requested is in the cache 204, a cache hit, it is moved 906 to the port and is supplied to the I/O. The memory chip 200 provides a BRDY signal 226 to notify the corresponding CPU that the data is now present on the I/O. The burst counter is incremented 907 and data from the cache 204 continues to cycle to the I/O until the burst is complete 908.
If 904 the data requested is not in the cache 204, the cache 204 is connected 912 to the array through the crossbar switch 206 as soon as it is available 910. If 914 the data is in the page currently accessed by the array 208, the cache line requested is delivered to the cache 204 and moved 906 onto the I/O as described above. If 914 the page is not already accessed, a page access 916 is initiated. Once the page is available, the data moves to the crossbar 206/cache 204/I/O as already described while its ECC is checked.
In parallel, if 920 the next cache line is not already cached from its array 208, then when 922 that array is not busy, the array 208 is linked 924 to the cache 204. If 926 the correct page is not being accessed, the appropriate page is accessed 928 with the next cache line and ECC is performed. This next cache line is then placed 930 in its cache for the same port in anticipation that the port will next request it.
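By way of illustration only, the read flow of FIG. 9 reduces to the following behavioral Python sketch. All class and method names are hypothetical, and the prefetched line is read from the same model object although, in the chip, it resides in the next interleaved array and is fetched in parallel.

    # Behavioral sketch of the read cycle of FIG. 9; illustrative names only.
    class ArrayModel:
        def __init__(self, contents):
            self.contents = contents          # {(page, line): data}
            self.open_page = None

        def read_line(self, page, line):
            if self.open_page != page:        # 914/916: page miss -> page access
                self.open_page = page         # (ECC check would happen here)
            return self.contents.get((page, line), 0)

    class PortCacheModel:
        def __init__(self, array):
            self.array = array
            self.lines = {}                   # cached lines, tagged by (page, line)

        def read_burst(self, page, line, burst_len=4):
            if (page, line) not in self.lines:                   # 904: cache miss
                self.lines[(page, line)] = self.array.read_line(page, line)  # 910-916
                # 920-930: prefetch the next consecutive cache line (in the chip,
                # from the next interleaved array, in parallel)
                self.lines[(page, line + 1)] = self.array.read_line(page, line + 1)
            data = self.lines[(page, line)]
            # 906-908: cycle words from the cache to the I/O, asserting BRDY
            return [(data >> (8 * i)) & 0xFF for i in range(burst_len)]

    arr = ArrayModel({(5, 0): 0x44332211, (5, 1): 0x88776655})
    port = PortCacheModel(arr)
    print([hex(w) for w in port.read_burst(5, 0)])  # ['0x11', '0x22', '0x33', '0x44']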
Referring to FIG. 10, there is shown a flowchart illustrating the writing of data into the memory chip 200 using interleaving. In a write cycle 1000, if the port is enabled, the port is linked 1002 to the cache 204 and the write data is posted 1004 to the cache throughout the burst. The memory chip 200 provides a BRDY signal to notify the corresponding CPU that the data has been posted. The burst counter is
incremented 1008 and data is posted 1004 to the cache until 1006 the burst is complete. After the burst is finished, when 1010 the array is available, the array 208 is linked 1012 to the cache 204 to access the correct page to prepare for a write. If 1014 the next cycle is a write cycle to the same page, write data is posted 1004 to the cache 204 as described above. Otherwise 1014, the ECC is checked 1016 and generated as soon as all the data to be written for the interlace group is present. The write to the array 208 is completed when there is no new data to be written to the page.
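By way of illustration only, the posted-write flow of FIG. 10 reduces to the following behavioral Python sketch: the port is released as soon as the data is posted to the cache, and the array write, with ECC generation, completes later. All names are hypothetical, and ECC generation for the interlace group is omitted.

    # Behavioral sketch of the posted-write cycle of FIG. 10; illustrative names.
    class PostedWriteModel:
        def __init__(self):
            self.posted = []                  # (page, line, data) awaiting writeback
            self.mem = {}                     # {(page, line): data}, the array

        def write_burst(self, page, line, data):
            self.posted.append((page, line, data))   # 1002/1004: post to the cache
            return "BRDY"                     # CPU is released before the array write

        def flush(self):
            # 1010-1016: when the array is free, open the page, generate ECC for
            # the interlace group (omitted here), and complete the array write.
            for page, line, data in self.posted:
                self.mem[(page, line)] = data
            self.posted.clear()

    m = PostedWriteModel()
    m.write_burst(5, 0, 0xDEADBEEF)           # returns "BRDY" immediately
    m.flush()                                 # the array write happens later
    print(hex(m.mem[(5, 0)]))                 # 0xdeadbeef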
Referring to FIG. 11, there is shown a block diagram illustrating an interface between a single in-line memory module (SIMM) 1102 and a motherboard 1104. The motherboard 1104 may be, for example, a conventional motherboard of a conventional personal computer. The SIMM 1102 and the motherboard 1104 may be used in the systems 500, 501, and 502 (FIGs. 5a-5c). The single in-line memory module (SIMM) 1102 comprises a plurality of memory chips 1106-0 through 1106-3. The memory chip 1106 may be the memory chip 200. The SIMM 1102 has eight 16-bit ports having one load per data line and four 16-bit ports with two loads per data line. More specifically, memory chips 1106-0 and 1106-1 each have ports A, C, E, and G coupled to respective data buses A, C, E, and G. Similarly, memory chips 1106-2 and 1106-3 each have ports B, D, F, and H coupled to respective data buses B, D, F, and H. Memory chips 1106-0 through 1106-3 each have ports A', B', C', and D' coupled to respective data buses A', B', C', and D'.
Referring to FIG. 12, there is shown a block diagram illustrating the data bus/socket connections of the SIMMs of a two processor system 1200. Data buses 1203-0 through 1203-3 couple respective ports A and A', B and B', E and A', F and B' of sockets 1202-1 through 1202-4 to a processor 1201-0. Data buses 1203-4 through 1203-7 couple respective ports C and C', D and D', G and C', H and D' of sockets 1202-1 through 1202-4 to a processor 1201-1. Table VIII shows the addressing of the sockets 1202.
[Table VIII appears only as an image in the published document.]
In the two processor system 1200, each socket 1202-1 through 1202-4 may receive a bank of one SIMM 1102. The system 1200 may have one to four banks. A system 1200 having one bank of SIMMs 1102 has the data buses A-H of the one bank coupled to the socket 1202-1. A system 1200 having two banks of SIMMs 1102 has the data buses A-H of the two banks coupled to the sockets 1202-2 and 1202-3. A system 1200 having three banks of SIMMs 1102 has the data buses A-H of the three banks coupled to the sockets 1202-1 through 1202-3. A system 1200 having four banks of SIMMs 1102 has the data buses A'-D' of the four banks coupled to the sockets 1202-1 through 1202-4.
Referring to FIG. 13, there is shown a block diagram illustrating the data bus/socket connections of the SIMMs of a memory in a three or four processor system 1300. In the three or four processor system 1300, each socket 1302-1 through 1302-4 may receive a bank of two SIMMs 1102. For a two-SIMMs-per-bank architecture, the data buses on the second SIMM 1102 are labeled I through P and match the A through H buses of FIG. 11. Data buses 1303-0 through 1303-3 couple respective ports A and A', B and B', I and I', J and J' of the sockets 1302-1 through 1302-4 to a processor 1301-0. Data buses 1303-4 through 1303-7 couple respective ports C and C', D and D', K and K', L and L' of the sockets 1302-1 through 1302-4 to a processor 1301-1. Data buses 1303-8 through 1303-11 couple respective ports M and I', N and J', E and A', F and B' of the sockets 1302-1 through 1302-4 to a processor 1301-2. Data buses 1303-12 through 1303-15 couple respective ports O and K', P and L', G and C', H and D' of the sockets 1302-1 through 1302-4 to a processor 1301-3. The system 1300 may have one to four banks. A system 1300 having one bank of SIMMs 1102 has the data buses A-P of the one bank coupled to the socket 1302-1. A system 1300 having two banks of SIMMs 1102 has the data buses A-P of the two banks coupled to the sockets 1302-2 and 1302-3. A system 1300 having three banks of SIMMs 1102 has the data buses A-P of the three banks coupled to the sockets 1302-1 through 1302-3. A system 1300 having four banks of SIMMs 1102 has the data buses A'-D' and I'-L' of the four banks coupled to the sockets 1302-1 through 1302-4.
Referring now to both FIGs. 12 and 13, the dots on the sockets 1202-1 and 1302-1 represent an electrical connection for a system having one bank 1201 and 1301 of SIMMs. The O's on the sockets 1202-2, 1202-3, 1302-2, and 1302-3 represent an electrical connection for a system having two banks 1201 and 1301 of SIMMs. The X's on the sockets 1202-1 through 1202-4 and 1302-1 through 1302-4 represent an electrical connection for a system having four banks 1201 and 1301 of SIMMs. A system having three banks has electrical connections represented by both the dots and the O's; the loading in such a system is doubled. All buses A-H are enabled in the one bank system. In a two bank system, only the indicated buses are enabled: in the sockets 1202-2 and 1302-2, buses E, F, G, and H are enabled; in the sockets 1202-3 and 1302-3, buses A, B, C, and D are enabled. In a four bank system, the SIMMs are connected using the A' through D' buses, which are on the side of the SIMM opposite the side of the A through D buses.
In summary, the memory chip provides configurable connections between memory arrays and memory ports with interleaved addressing of the memory arrays. This allows multiple concurrent accesses to the memory arrays.

Claims

We Claim:
1. A memory comprising:
a plurality of dynamic random access memory arrays, each memory array having a plurality of memory cells arranged in a predetermined number of rows and a predetermined number of columns;
a plurality of memory ports; and
a crossbar switch selectively connecting the plurality of memory arrays to the plurality of memory ports.
2. The memory of claim 1 implemented in a semiconductor device.
3. The memory of claim 1 further comprising:
a programmable controller coupled to the plurality of memory cells for applying enabling signals thereto to enable the cells in interleaved groups responsive to address signals, the interleaved groups being selected responsive to array control information; and
an interface for receiving the array control information for programming the programmable controller.
4. The memory of claim 1 implemented on a single in-line memory module that is reconfigurable to allow point to point connections between the plurality of memory ports and one of a plurality of external processors, each processor having its own interface.
5. The memory of claim 1 further comprising
a plurality of caches, one cache being coupled to each of the plurality of arrays for each port-array connection.
6. The memory of claim 5 implemented in a semiconductor device.
7. The memory of claim 5 further comprising:
a programmable controller coupled to the plurality of memory cells for applying enabling signals thereto to enable the cells in interleaved groups responsive to address signals and for applying control signals to the memory arrays, to the memory ports, and to the caches to enable transfers between the memory arrays, the memory ports, and the caches, the interleaved groups being selected responsive to array control information; and an interface for receiving the array control information for programming the programmable controller.
8. The memory of claim 5 implemented on a single in-line memory module that is reconfigurable to allow point to point connections between the plurality of memory ports and one of a plurality of external processors, each processor having a different interface.
9. The memory of claim 5 further comprising an error checking and correction circuit coupled to the memory arrays.
10. The memory of claim 9 implemented in a semiconductor device.
11. The memory of claim 9 further comprising:
a programmable controller coupled to the plurality of memory cells for applying enabling signals thereto to enable the cells in interleaved groups responsive to address signals and for applying control signals to the memory arrays, to the memory ports, and to the caches to enable transfers between the memory arrays, the memory ports, and the caches, the interleaved groups being selected responsive to array control information; and
an interface for receiving the array control information for programming the programmable controller.
12. The memory of claim 9 implemented on a single in-line memory module that is reconfigurable to allow point to point connections between the plurality of memory ports and one of a plurality of external processors, each processor having a different interface.
13. The memory of claim 5 wherein the interleaving of the memory arrays has an address for one cache line assigned to one of the plurality of memory arrays and has an address for a subsequent cache line assigned to another of the plurality of memory arrays, said address for one cache line enabling said one of the plurality of memory arrays for a data transfer, and said address for a subsequent cache line enabling said another of the plurality of memory arrays to prefetch a subsequent cache line or a group of cache lines for a subsequent data transfer.
14. The memory of claim 1 further comprising an error checking and correction circuit coupled to the plurality of memory arrays with the columns of each memory array being interlaced.
15. The memory of claim 1 wherein the crossbar switch selectively connects at least two of the plurality of ports to the same one of the plurality of arrays for accessing thereof by said at least two of the plurality of ports.
16. The memory of claim 1 wherein the plurality of memory arrays comprises first and second groups of memory arrays, each group of memory arrays having interleaved addresses different than the interleaved addresses of the other group of memory arrays.
17. A reconfigurable memory comprising:
a plurality of memory cells arranged in a predetermined number of rows and in a predetermined number of columns;
a plurality of memory ports selectively coupled to the plurality of memory cells responsive to control signals, and having an output for providing data signals having one of a plurality of selectable data signal formats responsive to a selection signal, each of the plurality of selectable data signal formats corresponding to a protocol of an external processor;
a programmable controller coupled to the plurality of memory cells for applying enabling signals thereto to enable the cells in interleaved groups responsive to address signals, for generating the selection signal, and for generating the control signals, the control signals being configurable to match the control signal protocol of one of a plurality of external processors; and
an interface for receiving information for programming the programmable controller.
18. The reconfigurable memory of claim 17 wherein the data signal formats correspond to a voltage level for states of the data signal.
19. A computer system comprising:
a plurality of processors;
a plurality of data buses, each of the plurality of data buses being directly connected to one of the plurality of processors; and
a plurality of memories, each of the plurality of memories being directly connected to a portion of the separate data buses that is the same for each of the plurality of processors.
PCT/US1996/014311 1995-09-08 1996-09-06 Synchronous multi-port random access memory WO1997011419A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US52585695A 1995-09-08 1995-09-08
US08/525,856 1995-09-08

Publications (2)

Publication Number Publication Date
WO1997011419A2 true WO1997011419A2 (en) 1997-03-27
WO1997011419A3 WO1997011419A3 (en) 1997-04-24

Family

ID=24094880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1996/014311 WO1997011419A2 (en) 1995-09-08 1996-09-06 Synchronous multi-port random access memory

Country Status (1)

Country Link
WO (1) WO1997011419A2 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4783731A (en) * 1982-07-15 1988-11-08 Hitachi, Ltd. Multicomputer system having dual common memories
US4930066A (en) * 1985-10-15 1990-05-29 Agency Of Industrial Science And Technology Multiport memory system
US5127014A (en) * 1990-02-13 1992-06-30 Hewlett-Packard Company Dram on-chip error correction/detection
US5283877A (en) * 1990-07-17 1994-02-01 Sun Microsystems, Inc. Single in-line DRAM memory module including a memory controller and cross bar switches
US5386511A (en) * 1991-04-22 1995-01-31 International Business Machines Corporation Multiprocessor system and data transmission apparatus thereof

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999005604A1 (en) * 1997-07-28 1999-02-04 Nexabit Networks, Llc Multi-port internally cached drams
AU748133B2 (en) * 1997-07-28 2002-05-30 Nexabit Networks, Llc Multi-port internally cached drams
DE19937176A1 (en) * 1999-08-06 2001-02-15 Siemens Ag Multiprocessor system
WO2010126658A2 (en) 2009-04-29 2010-11-04 Micron Technology, Inc. Multi-port memory devices and methods
EP2425346A2 (en) * 2009-04-29 2012-03-07 Micron Technology, Inc. Multi-port memory devices and methods
CN102414669A (en) * 2009-04-29 2012-04-11 美光科技公司 Multi-port memory devices and methods
EP2425346A4 (en) * 2009-04-29 2014-05-07 Micron Technology Inc Multi-port memory devices and methods
US8930642B2 (en) 2009-04-29 2015-01-06 Micron Technology, Inc. Configurable multi-port memory device and method thereof

Also Published As

Publication number Publication date
WO1997011419A3 (en) 1997-04-24

Similar Documents

Publication Publication Date Title
US6108745A (en) Fast and compact address bit routing scheme that supports various DRAM bank sizes and multiple interleaving schemes
US6415364B1 (en) High-speed memory storage unit for a multiprocessor system having integrated directory and data storage subsystems
US6272594B1 (en) Method and apparatus for determining interleaving schemes in a computer system that supports multiple interleaving schemes
US5752260A (en) High-speed, multiple-port, interleaved cache with arbitration of multiple access addresses
US6854043B2 (en) System and method for multi-modal memory controller system operation
US5421000A (en) Memory subsystem having a static row memory and a dynamic RAM
US4577293A (en) Distributed, on-chip cache
US8244952B2 (en) Multiple processor system and method including multiple memory hub modules
US6957285B2 (en) Data storage system
KR100201057B1 (en) Integrated circuit i/o using a high performance bus interface
US5896404A (en) Programmable burst length DRAM
KR100626223B1 (en) A memory expansion module with stacked memory packages
US6356991B1 (en) Programmable address translation system
KR101428844B1 (en) Multi-mode memory device and method
US6715025B2 (en) Information processing apparatus using index and tag addresses for cache
US6049855A (en) Segmented memory system employing different interleaving scheme for each different memory segment
US5848258A (en) Memory bank addressing scheme
JP2648548B2 (en) Computer memory
US5329489A (en) DRAM having exclusively enabled column buffer blocks
US6202133B1 (en) Method of processing memory transactions in a computer system having dual system memories and memory controllers
JPH0766350B2 (en) High speed cache memory array architecture
JPH04233050A (en) Cache-memory exchanging protcol
US5761714A (en) Single-cycle multi-accessible interleaved cache
US6535966B1 (en) System and method for using a page tracking buffer to reduce main memory latency in a computer system
JPH0198044A (en) Control method for digital memory system and memory function of digital computer

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): CA JP KR

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA