WO1997011419A2

WO1997011419A2 - Synchronous multi-port random access memory

Info

Publication number: WO1997011419A2
Application number: PCT/US1996/014311
Authority: WO
Inventors: Tom North; Francis Siu
Original assignee: Shablamm Computer, Inc.
Priority date: 1995-09-08
Filing date: 1996-09-06
Publication date: 1997-03-27
Also published as: WO1997011419A3

Abstract

A synchronous multi-port random access memory (200) has a plurality of memory arrays (208). Each memory array having a plurality of memory cells arranged in a predetermined number of rows and a predetermined number of columns. The columns of each memory array are interleaved. Each of a plurality of memory ports (201) has a subcache (204) coupled thereto for each connection between each of the plurality of memory ports. A programmable controller (212) enables the memory cells to enable the cells in interleaved groups responsive to address signals and applies control signals to the memory arrays, to the memory ports, and the caches.

Description

Synchronous Multi-Port Random Access Memory

Field Of The Invention

This invention relates to a memory for single processor and multiple processor computers, and, more particularly, to crossbar interleaving between processors, memory, and local bus and within the memory.

Background Of The Invention

PCs, workstations, and servers bottleneck when using main memory because there is only one path to main memory, which is slow. In many non-blocking secondary cache architecture of "high" performance systems, the main memory is the bottleneck that limits system performance, especially in multi-processor systems. Recent systems, such as multi-media systems and shared memory video, require data transfers of increasingly greater amounts of data. The requirements for data transfers requires new memory architecture.

Until just a few years ago, dynamic random access memory (DRAM) architecture has been fairly stagnant. Beyond the conventional multiplexed address DRAM, the major architectural improvement had been the video random access memory (VRAM) for video. To improve system performance, several new enhanced architectures for memory were recently developed. These architectures include

RAMBus, Extended Data Out (EDO), Burst EDO, Synchronous DRAM (SDRAM), such as manufactured by Micron or Samsung, CDRAM by Mitsubishi, EDRAM by

RAMTRON, and Multi-bank (MDRAM), such as manufactured by Mosys. These later architectures improve system performance with enhanced architectural features and higher raw performance.

It is desirable to have a memory that has a bandwidth for applications which transfer large amounts of data, such as database, file, and print servers, and multi-media, for multiprocessor workstations, servers, and personal computers. Summary Of The Invention

In the present invention, a synchronous multi-port dynamic random access memory (SMPDRAM) couples main memory directly to at least one central processing unit (CPU), a video accelerator, or at least one input/ output (I/O) processor, or a combination thereof. The SMPDRAM provides a direct port for each of these devices and provides a higher performance implementation of the multi-bank interleave protocol in U.S. patent application Serial No. 08/414,118, filed March 31, 1995, the subject matter of which is incorporated herein by reference, to reduce contention and enhance performance. The crossbar of the SMPDRAM is incorporated into the main memory chip. The memory chip includes a direct interface to the CPU without intervening logic or a chip set. The memory chip is reconfigurable with the

configuration information supplied through a JTAG port.

With a multi-port DRAM, each CPU or processor can access memory at the same time as opposed to each having its own memory that may need to be

synchronized or each having to wait to get access to memory if it is being accessed by another. This allows the system to have greater throughput for applications where the cache hit rate is low, such as database applications, multi-media, multi-tasking, and multi-processor. Brief Description Of The Drawings

FIGs. 1a, 1b, 1c, 1d, and 1e are pictorial diagrams illustrating addressing of a conventional non-interleaved memory, a page interleaved architecture memory, a single cache line size architecture memory, a double cache line size architecture memory, and a quad cache line size architecture memory, respectively.

FIG. 2 is a pictorial diagram illustrating a memory chip in accordance with the present invention.

FIG. 3 is a block diagram illustrating a memory array of the memory chip of FIG.2.

FIG. 4 is a block diagram illustrating a memory subarray of the memory chip of FIG.2.

FIGs. 5a, 5b, and 5c are block diagrams illustrating one, two and four single in-line memory module systems.

FIG. 6 is a block diagram illustrating a personal computer system.

FIG. 7 is a block diagram illustrating a dual CPU computer system.

FIG. 8 is a block diagram illustrating a quad CPU system.

FIG. 9 is a flowchart illustrating the reading of data from the memory chip using multi-array interleaving.

FIG. 10 is a flowchart illustrating the writing of data into the memory chip using interleaving.

FIG. 11 is a block diagram illustrating an interface between a single in-line memory module and a motherboard.

FIG. 12 is a block diagram illustrating the data bus/socket connections of the single in-line modules in a memory of a two processor system.

FIG. 13 is a block diagram illustrating the data bus /socket connections of the single in-line modules in a memory of a three or four processor system.

Detailed Description Of The Preferred Embodiments

Referring to FIG. 1a, 1b, 1c, 1d, and 1e, there are shown pictorial diagrams illustrating addressing of a conventional non-interleaved memory, a page interleaved architecture memory, a single cache line size architecture memory, a double cache line size architecture memory, and a quad cache line size architecture memory,

respectively. Referring in particular to FIG. 1a, the conventional non-interleaved memory comprises memory arrays 100-0 through 100-7 that are addressed by dividing the address space into equal consecutive blocks of addresses and assigning the blocks to the memory arrays 100. For example, the memory array 100-0, the memory array 100-1, through memory array 100-7 are addressed as 0-1M, 1-2M, through 7-8M, respectively, for an 8M memory.

Referring in particular to FIG. 1b, memory arrays 102-0 through 102-7 are addressed by dividing the address space into pages and sequentially assigning the pages to a memory array. For example, for 128 word pages, the memory array 102-0 is assigned addresses 0-127, 256-383, 512-639, through 2096896-2097023; the memory array 102-1 is assigned addresses 128-255, 384-511, 640-767, through 2097024-2097151; and the memory array 102-7 is assigned addresses 6291584-6291711, 6292608-6292735, 6293632-6293759 through 8388480-8388607.

Referring in particular to FIG. 1c, the memory arrays 102-0 through 102-7 are organized on a single cache line basis. Memory arrays 102-0 through 102-7 are addressed by dividing the address space on a cache line basis and assigning

subsequent cache lines to subsequent memory arrays. For example, for 4 words per cache line, the memory array 102-0 is assigned addresses 0-3, 32-35, 64-67, through 8388576-8388579; the memory array 102-1 is assigned addresses 4-7, 36-39, 68-71 through 8388580-8388583; and the memory array 102-7 is assigned addresses 28-31, 60-63, 92-95 through 8388604-8388607.

Referring in particular to FIG. 1d, the memory arrays 102-0 through 102-7 are organized on a double cache line basis. Memory arrays 102-0 through 102-7 are addressed by dividing the address space on a cache line basis and assigning the subsequent cache lines to subsequent memory arrays. For example, for 8 words per cache line, the memory array 102-0 is assigned addresses 0-7, 64-71, 128-135 through 8388544-8388551; the memory array 102-1 is assigned addresses 8-15, 72-79, 136-143 through 8388552-8399559; and the memory array 102-7 is assigned addresses 112-127, 176-191, 304-319 through 8388592-8388607.

Referring in particular to FIG. 1e, the memory arrays 102-0 through 102-7 are organized on a quad cache line basis. Memory arrays 102-1 through 102-8 are addressed by dividing the address space on a 4 cache line basis and assigning the subsequent cache lines to subsequent memory arrays. For example, for 16 words per cache line, the memory array 102-0 is assigned addresses 0-15, 128-143, 192-207 through 8388480-8388495; the memory array 102-1 is assigned addresses 16-31, 144-159, 208-223 through 8388496-8388511; and the memory array 102-7 is assigned addresses 112-127, 176-191, 304-319 through 8388592-8388607. This organization of the memory 100 reduces memory array contention in multiple central processing unit (CPU) multi-threaded applications in which each CPU, because of locality, may commonly be accessing the same memory array while running the same application. An interleaved architecture evenly spreads the addressing of the application across the memory arrays to reduce the likelihood of both CPUs accessing the same memory array. The interleave pattern is adjustable based on the type of operating system and the type of application being executed by the system. The interleaving may be the interleaving described in the U.S. patent application Serial No. 08/414,118, filed March 31, 1995, the subject matter of which is incorporated herein by reference.

Referring to FIG. 2, there is shown a pictorial diagram illustrating a memory chip 200 in accordance with the present invention. Referring to FIG. 3, there is shown a block diagram illustrating a memory array of the memory chip 200. Referring to FIG. 4, there is shown a block diagram illustrating a memory subarray of the memory chip 200. The architecture of the memory chip 200 is described for a synchronous multi-port dynamic random access memory (SMPDRAM). However, the architecture may be applied to other types of RAM, such as Static Random Access Memory (SRAM) or Flash RAM. The memory chip 200 may be in a single semiconductor device package.

For illustrative purposes, the memory chip may be organized as a 64 Mb DRAM organized with eight 8-bit or 9-bit ports. The ports can be grouped into four 16-bit ports, two 32-bit ports or one 64-bit port. In addition, there are eight 8 or 9 Mb memories; although the memory chip 200 may have other numbers of ports and memories.

The memory chip 200 uses the multi-array interleave protocol described above in conjunction with FIGs. 1b through le for reducing the timeline losses incurred when two processors attempt to access the same memory array. By interleaving on a cache line basis, each processor ties up a memory array for a shorter time and then releases the memory array for another processor that may be waiting. In the event that two CPUs attempt to access the same memory array, the access by one CPU is delayed. However, by interleaving, the delay is reduced because the first CPU which gained access to the array most likely moves to the next array if the next data that it seeks is sequential to the first. In contrast, a non-interleaved architecture where the first CPU ties up the same bank extends the delay of access of the second CPU as long as the first CPU continues to access sequential information in the same array.

The memory chip 200 includes a plurality of bi-directional input/output (I/O) ports 201-0 through 201-7, an I/O bus bar 207, a crossbar link 209, a plurality of cache selectors 222-0 through 222-7, a plurality of embedded caches 204-0 through 204-7, a plurality of crossbar switches 206-0 through 206-7, a plurality of sense amplifiers 214-0 through 214-7, a plurality of memory arrays 208-0 through 208-7, and a reprogrammable controller 212. For illustrative purposes, the memory chip 200 has eight I/O ports 201 and eight memory arrays 208. Each memory array 208-0 through 208-7 is coupled to a respective error checking and correction (ECC) circuit 210-0 through 210-7. In workstations and servers, the memory chip 200 supports error checking and correction for CPUs without direct ECC support. For personal computers (PCs) for which ECC is not required, the ECC in the memory chip 200 corrects defects and therefore increases manufacturing yield and lowering

manufacturing cost. In those CPUs with direct support or for systems that support parity, the extra bits provide a ninth bit for each port 201 as described below. The memory chip 200 also includes row decoders 216-0 through 216-3.

Each of the bi-directional I/O ports 201-0 through 201-7 has a register 202-0 through 202-7, respectively. The I/O bus bar 207 couples each register 202 to each of the plurality of cache selectors 222 via the crossbar link 209. Thus, for an 8/9 bit input to each register 202, each cache selector 222-0 through 222-7 receives 8/9 bits from each register 202-0 through 7. An I/O controller 242 provides control signals to the registers 202 for controlling the transfer of data between the ports 201 and the memory arrays 208 in response to control signals from the reprogrammable controller 212.

Referring in particular to FIG. 3, each array 208 comprises a plurality of subarrays 308-0 through 308-7. Each of the cache selectors 222-0 through 222-7 comprises subcache selectors 302-0 through 302-15 for controlling the transfer of data between the crossbar link 209 and a respective one of the caches 204. The cache selector 222 may be, for example, a plurality of pass transistors that couple one bit from the crossbar link 209 to a subcache 304. Each cache 204 comprises a plurality of subcaches 304-0 through 304-8 for storing data being transferred between the memory and the ports 201 Each crossbar switches 206 comprises a plurality of crossbar switches 306-0 through 306-15 for selectively coupling the subcaches 302 to a respective memory subarray 308. The ECC circuit 210 comprises a plurality of ECC circuits 310-0 through 310-7. Each of the plurality of ECC circuits 310-0 through 310-7 provides error checking and correction for a corresponding pair of crossbar switches 306-0 through 306-7. Each column of the subarray has a corresponding one of a plurality of sense amplifiers 314. Data is communicated to memory cells in the memory subarrays 308 per addressing described below.

Referring again to FIG. 2, at power up, the memory chip 200 receives through an interface port 224, array, and control signal configuration information for

programming the reprogrammable controller 212. The interface port 224 is preferably a JTAG port. The control signal information may be configured for specific processors, such as one family of processors, and for selecting the signal configuration of the data, such as the voltage levels of the I/O signals, e.g. low voltage transistor- transistor logic (LVTTL) or Enhanced Gunning Logic (GTL+). To provide byte write capability, the reprogrammable controller 212 provides a separate byte enable (BE) signal 223 to each port 201 for writes. The reprogrammable controller 212 provides a separate ready (BRDY) signal 226, which is programmable to be associated with any one port 201, as part of the configuration information at power up. The reprogrammable controller 212 receives address (A0-A24) signals 228 for addressing the memory arrays 208 in response thereto and in accordance with the array control information provided at power up. Such addressing is described in greater detail below. The reprogrammable controller 212 provides the address signals to the row decoders 216-0 through 216-3 for selecting rows of the memory arrays 208. The reprogrammable controller 212 provides selection signals to array /cache line selectors 218-0 through 218-3 for enabling the selective coupling of the crossbar switches 206 to the cache selectors 222.

Port identification (ID) signals 230 program the reprogrammable controller 212 to define a port number (such as port 0 through port 7) of the ports 201-0 through 201-7. To avoid bus contention, each memory chip 200 preferably operates identically so that all memory chips 200 in a bank make identical arbitration choices. Consequently, the memory chip 200 gives priority based on the port number. For example, Port 0 has the highest priority and Port 7 has the lowest priority. This allows the processors to be connected in order. The programmability of the priority allows user tuning. A clock (Clk) signal 233 provides timing control for read and write cycles.

A pair of select (SEL) signals 240 provides an identification of the memory chip 200 for addressing as described below in conjunction with FIG. 6. The memory chip 200 has an interface for receiving control signals. For interfacing with a Pentium processor for decode memory read and write operations, the control signals include: ADS, CACHE, M/IO, D/C, and W/R.

Responsive to the array control information, the reprogrammable controller 212 can configure the memory arrays 208 and the I/O ports 201 in any of a number of possible configurations. For clarity, the format of n₁ × n₂/n₂' × n₃ × n₄ is used to indicate a configuration having n₁ ports, a port width of n2 bits (or n₂' bits if no parity is used), n₃ arrays, and an array depth of n₄ bits. For example, the configuration may be 2×32/36×8×256Kb as described below in conjunction with FIG. 6; the configuration may be 4×16/18×8×512Kb as described below in conjunction with FIG. 7; the

configuration may be 8×8/9×8×1Mb as described below in conjunction with FIG. 8; or the configuration may be 1×64/72×4×256Kb (not shown). If the memory chip 200 is configured as 2×32/36×8×256Kb, for example, four ports access the selected array. In the 1×64/72×4×256Kb configuration, two memory arrays are accessed in parallel.

Control signals can be tailored for specific processors. The configuration of the memory chip 200 shown in FIG. 2 is a default configuration and is compatible with the X86 family of processors manufactured by Intel Corporation of Santa Clara, California, has 8 bit wide ports, has a 2×32×8×256K organization, has ECC, and has a single cache line interleave protocol.

The size of the memory can be incrementally expanded by adding memory chips 200 in parallel to other memory chips 200. One such embodiment is shown in FIG. 6. Different densities and configurations may be implemented. In a single CPU system, such as a PC, the memory chip 200 may be used as a conventional 64-bit wide memory to provide memory increments of 8MB or (as in FIG. 6 below) 2×32-bits wide utilizing the crossbar 206 to separate I/O and CPU accesses. A 2×32/36-bit wide configuration provides less loading but has memory increments of 16MB. Similarly, if the memory chip 200 is used as a 4×16/18 memory, memory increments are 32MB, and, if used as a 8×8 memory, memory increments are 64MB.

Each memory array 208 has a plurality of memory cells (not shown) typically connected in rows and columns. The memory cells are, for example, conventional dynamic random access memory cells. For example, for a memory of 8 arrays, the cells may be connected in 8K rows and 1,152 columns. The columns of the cells are interlaced so that the error checking and correction circuit 210 detects and corrects any single defect in the memory array 208 even ones that affect adjacent memory cells which will be in different interlace groups. The column sense amplifiers 214 are selectively connected to the crossbar switches 206 for distributing the column data to the caches 204.

The memory array 208 has three sections: a data section, an ECC section, and a hybrid section that is used for ECC if configured for ECC, or is used for additional bits per port if configured without ECC. For example, the memory array may have 8K×1K for data, and 8K×128 that is used for ECC or, if configured without ECC, is the 9th bit per port.

The arrays 208 are interlaced to ensure that no single defect or alpha hit causes an ECC failure. So, for example, for a two way interlaced array, the columns are divided into two groups each with its own ECC data bits. The two groups span alternating columns. So, the columns might be labeled: A0, B0, A1, B1, A2, B2, A3, B3, and so forth.

Each ECC circuit 210-0 through 210-7 has a conventional ECC generator for writes and a conventional checker for reads. The ECC circuit 210 corrects a single bit error and detects a double bit error. The ECC circuit 210 therefore checks ECC during an array read and generates ECC during writes. ECC failures are reported via the interface port 224.

Referring in particular to FIG. 4, a subarray 308 comprises a plurality of column groups 401-0 through 401-17. For illustrative purposes each sub array is divided into 144 columns. Here, FIG. 4 shows the subarray 308-0 having columns 0 through 143, the crossbar switches 306-0 and 306-1, the caches 304-0 and 304-1, and the cache selectors 302-0 and 302-1. The other subarrays 308, crossbar switches 306, caches 304, and cache selectors 302 have identical architecture.

Each I/O port 201-0 through 201-7 is coupled through a respective interconnect group 402-0 through 402-7 of the crossbar link 209 to each of the cache selectors 302-0 and 302-1 for selective coupling to a respective cache 304-0 and 304-1.

Each cache 304 comprises subcaches 404-0 through 404-9. Each cache 304 is 4 words deep × 36 bits wide and can store at least four cache lines of a processor in the X86 family of processors manufactured by Intel Corporation of Santa Clara, California. Each cache 304 can post data for writes until the memory array 208 is available or prefetch the next consecutive cache line for reads. Each subcache 404 has an associated tag used by the reprogrammable controller 212 to determine if there is a cache hit. Each cache selector 302 selectively couples the interconnect groups, and thus the I/O ports 201-0 through 201-7, to the subcaches 404. The pair of crossbar switches 306-0 and 306-1 comprises crossbar switches 406-0 through 406-17 for selectively coupling the subcaches 404-0 through 404-8 of both caches 304-0 and 304-1 through the ECC circuit 310-0 to the sense amplifiers 314.

Referring to FIGs. 2-4 together, the size of the cache is a trade off of the economics of a smaller cache versus the storage capacity of a larger cache. The greater number of cache lines, the less likely that an array access is required and consequently a page miss is risked. The fewer cache lines that are stored, the more quickly the array is released for any pending access from another port. The cache size can be altered by changing the number of columns in each array. A larger cache requires more columns and consequently fewer rows.

To access data in an array 208, first its row (or "page") is selected. The data is sensed and latched in the corresponding column sense amplifiers 214. While the error checking and correction circuit 210 checks the data for each row of the memory array 208 as described earlier herein, the data in the addressed columns is routed via the crossbar switch 206 to the appropriate cache 204. The data goes through the cache select 222 of the port, through the link 209, to the I/O bus bar 207, through the I/O port 201, and to the I/O. This avoids having to access the array 208 and incur a potential page miss penalty if the array 208 has subsequently been accessed by another processor to another page.

The crossbar switches 206 also facilitate SNARFing, in which one CPU can be reading the data that another CPU is writing to the array. In that case, one of the memory arrays 208 is linked to both one of the ports 201-0 through 201-7 where the data is being written and at least one of the ports 201-0 through 201-7 where the data is to be read. Similarily, data can be transferred from one port to another such as when a CPU accesses I/O directly. The columns of the memory arrays 208 are grouped, for example, in groups of 8. Each column group is connected to the crossbar switch, which selectively connects the columns to the caches 204 responsive to the array and control configuration

information. Here, the crossbar switch is an 8 x 8 switch. When a port 201 is connected to an array 208, based on cache select bits, described below in conjunction with Tables I through IV, the A0 and A1 address signals 228, and, if a burst, the cache line interleave protocol (linear or gray scale), one of the 16 subcaches 300 for that port 201 are connected through its cache select 222 to the I/O bus bar 207 of the port 202 for each cycle, until the transaction is complete. Each subcache has one bit for each I/O bit. Consequently, it takes 4 subcaches to supply a cache line and 4 caches lines can be cached in each array for each port or group of ports at a time.

Depending on the port configuration, the cache lines and array are addressed as shown below in Table I. The interleave protocol may provide, for example, single, double, or quad interleaving of 4, 8 or 16 words cached, respectively.

In Table I, the number in the parentheses, ( ), equals the number of words cached; the A0-A1 address signals 228 are used in nonburst operations to select an individual word within a cache line.

The distribution of cache lines forms the pattern shown in Table II. The most significant address (A_H) is the A22 address signal 228.

The memory arrays 208 may be split into two groups working together; each group has its own interleave pattern. Each group of memory arrays 208 has

interleaved addresses that are different than the interleaved addresses of the other groups of memory arrays. For example, the memory arrays 208 may be split into a first group of two arrays 208-0 through 208-1 for shared memory video and a second group of six arrays 208-2 through 208-7 for main memory. The six arrays 208 interleave among themselves and the two arrays interleave among themselves. The possible groupings are: 2/6, 3/5, 4/4. However, only the 4/4 or 8/0 grouping is used if there are more than one bank of chips. The grouping affects the array and cache line selects. Each grouping has its own unique decoding to ensure no two consecutive cache lines are in the same array and to simplify the decoding. The cache lines and array are addressed in 4 array interleaving as shown in Table III.

The 4/4 distribution of cache lines forms the pattern shown in Table IV.

For 2 / 6 and 3/5 interleavvng, a pattern similar to the pattern shown in Table IV may be used. However, Table V shows an alternate pattern which simplifies the address decoding.

where A_H is the most significant address and A_L is the least significant address.

For 6L/2U (6 arrays are the lower arrays and 2 arrays are the upper arrays) interleaving, the address selection signals are defined as:

AS₂ = /A_L+1/A_L+A_HA_H-1+A_H/A_L;

AS₁ = /A_HA_L+A_HA_H-1;

AS₀ = A_H-2;

CLS₁ = A_H/A_H-1+A_L+1A_L+A_HA_L;

CLS₀ = /A_HA_H-1+A_HA_L+1

The 6L/2U distribution of cache lines forms the pattern shown in Table VI.

For 2L/6U interleaving, the AS signals are inverted.

For 5L/3U interleaving, the address selection signals are defined as:

AS₂=A_L+1A_L/A_H+A_HA_H-1+A_HA_H-2;

AS₁=/A_H/A_L+1+A_H/A_H-1/A_LA_H-2+A_HA_LA_H-1+A_HA_H-1/A_H-2; AS₀=/A_L+1/A_L/A_H + /A_H/A_LA_H-1 + A_H/A_L + A_H/A_H-1A_H-2;

CLS₁ = A_H-1A_H-2 + A_H-1/A_H-2/A_L + /A_H-1/A_H-2A_L;

CLS₀ = /A_HA_H-2 + A_H-1/A_L+1A_L + A_L+1A_LA_H

The 5L/3U distribution of cache lines forms the pattern shown in Table VII.

For 3L/5U interleaving, the AS signals are inverted.

Other patterns may be used. These patterns simplify the decoding while ensuring that no two consecutive cache lines are in the same array. Each pattern is in 4 cache line increments. It is repeated through each page. The number of increments per page depends on the port grouping.

Referring to FIGs. 5a, 5b, and 5c, there are shown block diagrams illustrating one, two and four single in-line memory module systems 500, 501, and 502,

respectively. Referring in particular to FIG. 5a, the system 500 comprises a pair of processors 504-0 and 504-1, and a single in line memory module 506-0 which comprises synchronous multi-port dynamic random access memories (SMPDRAMs) 508-0 through 508-3. The module 506 may be the SIMM 1102 described below in conjunction with FIG. 11. The connections of the system 500 may be as described below in conjunction with FIG. 12. A data bus 510 of the processor 504-0 is divided into groups 510-0 through 510-3, each group having a predetermined number of bits. Similarly, a data bus 512 of the processor 504-1 is divided into groups 512-0 through 512-3, each group having a predetermined number of bits. The groups 510-0 through 510-3 and 512-0 through 512-3 preferably each include the same bits of the respective data bus 510 and 512. Each of the groups 510-0 through 510-3 is coupled to a respective SMPDRAM 508-0 through 508-3; and similarly each of the groups 512-0 through 512-3 is coupled to a respective SMPDRAM 508-0 through 508-3.

Referring in particular to FIG. 5b, the system 501 comprises a pair of processors 504-0 and 504-1, and a pair of single in line memory modules 506-0 and 506-1, each module 506 comprising SMPDRAMs 508-0 through 508-3. The module 506 may be the SIMM 1102 described below in conjunction with FIG. 11. The connections of the system 501 may be as described below in conjunction with FIG. 12. A data bus 514 of the processor 504-0 is divided into groups 514-0 through 514-7, each group having a predetermined number of bits. Similarly, a data bus 516 of the processor 504-1 is divided into groups 516-0 through 516-7, each group having a predetermined number of bits. The groups 514-0 through 514-7 and 516-0 through 516-7 preferably each include the same bits of the respective data bus 514 and 516. The group 514-0, 514-2, 514-4, 514-6 and the group 516-0, 516-2, 516-4, 516-6 each are coupled to respective SMPDRAMs 518-0 and 518-3 of the module 506-1. The group 514-1, 514-3, 514-5, 514-7 and the group 516-1, 516-3, 516-5, 516-7 each are coupled to respective SMPDRAMs 518-0 and 518-3 of the module 506-0.

Referring in particular to FIG. 5c, the system 502 comprises a pair of processors 504-0 and 504-1, and four single in line memory modules 506-0 through 506-3, each module 506 comprising SMPDRAMs 538-0 through 538-3. A data bus 524 of the processor 504-0 is divided into groups 524-0 through 524-7, each group having a predetermined number of bits. Similarly, a data bus 526 of the processor 504-1 is divided into groups 526-0 through 526-7, each group having a predetermined number of bits. The groups 524-0 through 524-7 and 526-0 through 526-7 preferably each include the same bits of the respective data bus 524 and 526. The group 524-0, 524-2, 524-4, 524-6 and the group 526-0, 526-2, 526-4, 526-6 each are coupled to both

SMPDRAMs 528-2 and 528-3 of the modules 536-0 through 536-3. The group 524-1, 524-3, 524-5, 524-7 and the group 526-1, 526-3, 526-5, 526-7 each are coupled to both SMPDRAMs 528-0 and 528-1 of the modules 536-0 through 536-3.

Referring to FIG. 6, there is shown a block diagram illustrating a personal computer (PC) system 600 having a memory 602 organized in a 2x32/36 configuration, a central processing unit (CPU) 604, and an I/O processor 606. The memory 602 includes banks 608-0 through 608-3, each bank 608 comprising memory chips 200. For a 64-bit processor, such as a Pentium processor, each bank is 64-bits wide. Two additional address lines (A23-A24) that match the two select (SEL) pins 240 of the memory chips 200 provide the unique address for each bank 608 of chips 200. With these additional address lines, up to four banks of chips can be accommodated without additional external decoding.

Referring to FIG. 7, there is shown a block diagram illustrating a dual CPU computer system 700. Such a system may be used for a personal computer or workstation. An I/O bus 702 connects CPUs 704-1 and 704-2, and a video processor 708 for direct reading or writing by either CPU 704 to an I/O (not shown) through an I/O processor 706. The I/O bus 702 may be, for example, a high speed I/O bus, such as a RAMBus, or a mini-I/O bus, such as that used in the Triton chip set manufactured by Intel Corporation of Santa Clara, California. I/O memory transfers are handled through the I/O processor 706 and the memory bus associated with the I/O processor 706.

Memory buses 714-1 and 714-2 couples the CPUs 704-1 and 704-2, respectively, to a plurality of SMPDRAM memories 711-0 through 711-3 for communicating data. The memories 711 may be the memory chip 200. For simplicity, memories 711 are shown with four data ports and four memory arrays. Each of the four data ports comprises two of the eight 8 bit ports as already described. And, there are 8 arrays in each chip instead of the four shown for simplicity. The memories 711-0 through 711-3 include arrays 712-0 through 712-3, 712-4 through 712-7, 712-8through 712-11, and 712-12 through 712-15, respectively. The interconnections within the memories 711 is shown diagrammatically as a cross bar interconnection in FIG. 7.

A memory bus 716 couples the video processor 708 to the memories 711-0 through 711-3 for communicating data. A memory bus 718 couples the I/O processor 706 to the memories 711-0 through 711-3 for communicating data. For each data bus 714, 716, and 718, the same bits of the data bus are coupled to the same memories 711. For example, bits 0 through 15 of the data buses 714 for each CPU 704, bits 0 through 15 of the data bus 716 for the video processor 708, and bits 0 through 15 of the data buses 718 for the I/O processor 706 each are coupled to the memory 711-0.

Each memory array 712-0 through 712-15 provides a BRDY signal to a respective processor. The CPUs 704-1 and 704-2, the I/O processor 706, and the video processor 708 provide address signals to the memory arrays 712-0 through 712-15 over an address bus 714. The control signals (not shown) allow the processors to arbitrate for the address bias (as done with the P6 processor of Intel) and then control the memories directly.

Because each array 208 has its own page hit or miss, each memory array 711 provides a separate ready signal (BRDY) 226 to each CPU 704, the video processor 708, and I/O processor 706. Each memory array 711-0 through 711-3 provides the BRDY signal 226 from a different port. The port may be programmed at power up as part of the programming of the reprogrammable controller 212. This is sufficient since all memory chips comprising a bank of memory chips respond identically to each read or write operation. The system 700 has at least as many memory chips as processors.

Referring to FIG. 8, there is shown a block diagram illustrating a quad CPU system 800. The quad CPU system 800 may be used, for example, as a server. The quad CPU system 800 has four CPUs 802-0 through 802-3 and four I/O processors 804- 0 through 804-3. Memory banks 807-0 and 807-1 each comprise memory chips 808-0 through 808-7, which may be the memory chip 200. A port 810 of each memory chip 808-0 through 808-7 of the memory bank 807-0 is coupled to the same port 810 of a respective memory chip 808-0 through 808-7 of the memory bank 807-1. Data buses 805-0 through 805-3 of respective CPUs 802-0 through 802-3 couples each CPU 802 to respective ports 810 of each memory chip 200. Data buses 806-0 through 806-3 of respective I/O processor 804-0 through 804-3 couples each I/O processor 804 to respective ports 810 of each memory chip 200. Thus, for an 8 port memory chip 200, the quad CPU, quad I/O processor configuration utilizes every port of the memory chip 200. For large arrays, a 32-bit wide memory chip 200 may be organized as 8 ports x 4 bits to reduce data bus loading. To allow for more CPUs 802 or I/O processors 804, memory chip 200 could be reconfigured with more ports: for example, 164-bit ports. For each data bus 805 and 806, the same bits of the data bus are coupled to the same memory chip 808. For example, bits 0 through 15 of the data buses 805-0 through 805-3 for each CPU 802 and bits 0 through 15 of the data buses 806-0 through 806-3 for each I/O processor 804 are coupled to memory chips 808-0 of the banks 807.

Referring to FIG. 9, there is shown a flowchart illustrating the reading of data from the memory chip 200 using multi-array interleaving. In a read cycle 900, first, the port is connected 902 to the link through the corresponding I/O. If 904 the data requested is in the cache 204, a cache hit, it is moved 906 to the port and is supplied to the I/O. The memory chip 200 provides a BRDY signal 228 to notify the corresponding CPU that the data is now present on the I/O. The burst counter is incremented 907 and data from the cache 204 continues to cycle to the I/O until the burst is complete 908.

If 904 the data requested is not in the cache 204, the cache 204 is connected 912 to the array through the crossbar switch 206 as soon as it is available 910. If 914 the data is in the page currently accessed by the array 208, the cache line requested is delivered to the cache 204 and moved 906 onto the I/O as described above. If 914 the page is not already accessed, a page access 916 is initiated. Once the page is available, the data moves to the crossbar 206/cache 204/I/O as already described while its ECC is checked.

In parallel, if 920 the next cache line is not in the array 208, then when 922 the next array is not busy, the array 208 is linked 924 to the cache 204. If 926 the correct page is not being accessed, the appropriate page is accessed 928 with the next cache line and ECC is performed. This next cache line is then placed 930 in its cache for the same port in anticipation that the port will next request it.

Referring to FIG. 10, there is shown a flowchart illustrating the writing of data into the memory chip 200 using interleaving. In a write cycle 1000, if the port is enabled, the port is linked 1002 to the cache 204 and the write data is posted 1004 to the cache throughout the burst. The memory chip 200 provides a BRDY signal to notify the corresponding CPU that the data has been posted. The burst counter is

incremented 1008 and data is posted 1004 to the cache until 1006 the burst is complete. After the burst is finished, when 1010 the array is available, the array 208 is linked 1012 to the cache 204 to access the correct page to prepare for a write. If 1014 the next cycle is a write cycle to the same page, write data is posted 1004 to the cache 204 as described above. Otherwise 1014, the ECC is checked 1016 and generated as soon as all the data to be written for the interlace group is present. The write to the array 208 is completed when there is no new data to be written to the page.

Referring to FIG. 11, there is shown a block diagram illustrating an interface between a single in-line memory module (SIMM) 1102 and a motherboard 1104. The motherboard 1104 may be, for example, a conventional motherboard of a conventional personal computer. The SIMM 1102 and the motherboard 1104 may be used in the systems 501, 502 and 503 (FIGs. 5a-5c). The single in-line memory module (SIMM) 1102 comprises a plurality of memory chips 1106-0 through 1106-3. The memory chip 1106 may be the memory chip 200. The SIMM 1102 has 8 16-bit ports having one load per data line and 4 16-bit ports with two loads per data line. More specifically, memory chips 1106-0 and 1106-1 each have ports A, C, E, and G coupled to respective data buses A, C, E, and G. Similarly, memory chips 1106-2 and 1106-3 each have ports B, D, F, and H coupled to respective data buses B, D, F, and H. Memory chips 1106-0 through 1106-3 each have ports A', B', C, and D' coupled to respective data buses A', B', C, and D'.

Referring to FIG. 12, there is shown a block diagram illustrating the data bus/socket connections of the SIMMs of a two processor system 1200. Data buses 1203-0 through 1203-3 couple respective ports A and A', B and B', E and A', F and B' of sockets 1202-1 through 1202-4 to a processor 1201-0. Data buses 1203-4 through 1203-7 couple respective ports C and C, D and D', G and C, H and D' of sockets 1202-1 through 1202-4 to a processor 1201-1. Table VIII shows the addressing of the sockets 1202.

In the two processor system 1200, each socket 1201-1 through 1202-4 may receive a bank of one SIMM 1102. The system 1200 may have one to four banks. A system 1200 having one bank of SIMMS 1102 has the databuses A-G of the one bank coupled to the socket 1202-1. A system 1200 having two banks of SIMMS 1102 has the databuses A-G of the two banks coupled to the sockets 1202-2 and 1202-3. A system 1200 having three banks of SIMMS 1102 has the databuses A-G of the three banks coupled to the sockets 1202-1 through 1202-3. A system 1200 having four banks of SIMMS 1102 has the databuses A'-D' of the four banks coupled to the sockets 1202-1 through 1202-4.

Referring to FIG. 13, there is shown a block diagram illustrating the data bus/socket connections of the SIMMs of a memory 1300 in a three or four processor system 1300. In the three or four processor system 1300, each socket 1302-1 through 1302-4 may receive a bank of two SIMMs 1102. For a two SIMMs 1102 per bank architecture, the data buses on the second SIMM 1102 are labeled I through P and match the A through H buses of FIG. 11. Data buses 1303-0 through 1203-3 couple respective ports A and A', B and B', I and I", J and J' of the sockets 1202-1 through 1202-4 to a processor 1301-0. Data buses 1303-4 through 1203-7 couple respective ports C and C, D and D', K and K', L and L' of the sockets 1202-1 through 1202-4 to a processor 1301-1. Data buses 1303-8 through 1203-11 couple respective ports M and I', N and J', E and A', F and B" of the sockets 1202-1 through 1202-4 to a processor 1301-2. Data buses 1303-12 through 1203-15 couple respective ports O and K", P and L', G and C, H and D' of the sockets 1202-1 through 1202-4 to a processor 1301-3. In the three or four processor system 1300, each socket 1302-1 through 1302-4 may receive a bank of two SIMMs 1102. The system 1200 may have one to four banks. A system 1300 having one bank of SIMMS 1102 has the databuses A-P of the one bank coupled to the socket 1302-1. A system 1300 having two banks of SIMMS 1102 has the databuses A-P of the two banks coupled to the sockets 1302-2 and 1302-3. A system 1300 having three banks of SIMMS 1102 has the databuses A-P of the three banks coupled to the sockets 1302-1 through 1302-3. A system 1300 having four banks of SIMMS 1102 has the databuses A'-D' and F-L' of the four banks coupled to the sockets 1302-1 through 1302-4.

Referring now to both FIGs. 12 and 13, the dots on the sockets 1202-1 and 1302-1 represent an electrical connection for a system having one bank 1201 and 1301 of SIMMs. The O's on the sockets 1202-2 and 1202-3 and 1302-2 and 1302-3 represent an electrical connection for a system having two banks 1201 and 1301 of SIMMs. The X's on the sockets 1202-1 through 1202-4 and 1302-1 through 1302-4 represent an electrical connection for a system having four banks 1201 and 1301 of SIMMs. A system having three banks has electrical connections represented by both the dots and the O's. The loading in this system is doubled. All buses A-H are enabled in the one bank system. In a two bank system, only the indicated buses are enabled. In the sockets 1202-2 and 1302-2 buses E, F, G, and H are enabled. In the sockets 1202-3 and 1302-3 buses A, B, C, and D are enabled. In a four bank system, the SIMMs are connected using the A' through D' buses, which are on the side of the SIMM opposite the side of the A through D buses.

In summary, the memory chip provides configurable connections between memory arrays and memory ports with interleaved addressing of the memory arrays. This allows multiple concurrent accesses to the memory arrays.

Claims

We Claim:

1. A memory comprising:

a plurality of dynamic random access memory arrays, each memory array having a plurality of memory cells arranged in a predetermined number of rows and a predetermined number of columns;

a plurality of memory ports; and

a crossbar switch selectively connecting the plurality of memory arrays to the plurality of memory ports.

2. The memory of claim 1 implemented in a semiconductor device.

3. The memory of claim 1 further comprising:

a programmable controller coupled to the plurality of memory cells for applying enabling signals thereto to enable the cells in interleaved groups responsive to address signals, the interleaved groups being selected responsive to array control information; and

an interface for receiving the array control information for programming the programmable controller.

4. The memory of claim 1 implemented on a single in-line memory module that is reconfigurable to allow point to point connections between the plurality of memory ports and one of a plurality of external processors, each processor having its own interface.

5. The memory of claim 1 further comprising

a plurality of caches, one cache being coupled to each of the plurality of arrays for each port-array connection.

6. The memory of claim 5 implemented in a semiconductor device.

7. The memory of claim 5 further comprising:

a programmable controller coupled to the plurality of memory cells for applying enabling signals thereto to enable the cells in interleaved groups responsive to address signals and for applying control signals to the memory arrays, to the memory ports, and to the caches to enable transfers between the memory arrays, the memory ports, and the caches, the interleaved groups being selected responsive to array control information; and an interface for receiving the array control information for programming the programmable controller.

8. The memory of claim 5 implemented on a single in-line memory module that is reconfigurable to allow point to point connections between the plurality of memory ports and one of a plurality of external processors, each processor having a different interface.

9. The memory of claim 5 further comprising an error checking and correction circuit coupled to the memory arrays.

10. The memory of claim 9 implemented in a semiconductor device.

11. The memory of claim 9 further comprising:

a programmable controller coupled to the plurality of memory cells for applying enabling signals thereto to enable the cells in interleaved groups responsive to address signals and for applying control signals to the memory arrays, to the memory ports, and to the caches to enable transfers between the memory arrays, the memory ports, and the caches, the interleaved groups being selected responsive to array control information; and

12. The memory of claim 9 implemented on a single in-line memory module that is reconfigurable to allow point to point connections between the plurality of memory ports and one of a plurality of external processors, each processor having a different interface.

13. The memory of claim 5 wherein the interleving of the memory arrays has an address for one cache line assigned to one of the plurality of memory arrays and has an address for a subsequent cache line assigned to another of the plurality of memory arrays, said address for one cache line enabling said one of the plurality of memory arrays for a data transfer, and said address for a subsequent cache line enabling said another of the plurality of memory arrays to prefetch a subsequent cache line or a group of cache lines for a subsequent data transfer.

14. The memory of claim 1 further comprising an error checking and correction circuit coupled to the plurality of memory arrays with the columns of each memory array being interlaced .

15. The memory of claim 1 wherein the crossbar switch selectively connects at least two of the plurality of ports to the same one of the plurality of arrays for accessing thereof by said at least two of the plurality of ports.

16. The memory of claim 1 wherein the plurality of memory arrays comprises first and second groups of memory arrays, each group of memory arrays having interleaved addresses different than the interleaved addresses of the other group of memory arrays.

17. A reconfigurable memory comprising:

a plurality of memory cells arranged in a predetermined number of rows and in a predetermined number of columns;

a plurality of memory ports selectively coupled to the plurality of memory cells responsive to control signals, and having an output for providing data signals having one of a plurality of selectable data signal formats responsive to a selection signal, each of the plurality of selectable data signal formats corresponding to a protocol of an external processor;

a programmable controller coupled to the plurality of memory cells for applying enabling signals thereto to enable the cells in interleaved groups responsive to address signals, for generating the selection signal, and for generating the control signals, the control signals being configurable to match the control signal protocol of one of a plurality of external processors; and

an interface for receiving information for programming the programmable controller.

18. The reconfigurable memory of claim 17 wherein the data signal formats correspond to a voltage level for states of the data signal.

19. A computer system comprising:

a plurality of processors,

a plurality of data buses, each of the plurality of data buses being directly connected to one of the plurality of processors; and

a plurality of memories, each of the plurality of memories being directly connected to a portion of the separate data buses that is the same for each of the plurality of processors.