Novel Row and Column Select Pre-Decoding Scheme for Semiconductor Memories
Technical Field
This invention relates to the field of semiconductor memories and, more particularly, to pre-decoding circuits for random-access semiconductor memories.
Description of Related Art
A: Background of the Invention
As widely described in the literature, the storage cells of a semiconductor memory device are arranged in one or several rectangular arrays. To access a particular cell in the rectangular array, its X and Y coordinates are referenced. For example, a 16,777,216 (16M) cell storage array can be implemented as a rectangular array of 4,096 rows by 4,096 columns of cells. Each element is then accessed by activating one of 4,096 horizontal Row-Select lines and by selecting one of 4,096 vertical column lines. A subset of the address lines (in this example, the most significant 12 bits) is used to specify the horizontal line, and another subset (for example, the lower 12 bits) is used to specify the vertical column. If more than a single array is used, a third subset of the address lines may be used to select one of several arrays.
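The coordinate mapping in the 16M-cell example can be made concrete with a minimal Python sketch. It assumes a 24-bit cell address in which the upper 12 bits select the row and the lower 12 bits select the column. The names `split_address`, `ROW_BITS` and `COL_BITS` are only illustrative.

```python
# Minimal sketch, assuming a 24-bit cell address for a 4,096 x 4,096 array:
# the most significant 12 bits select the row, the least significant 12 the column.
ROW_BITS = 12
COL_BITS = 12


def split_address(addr: int) -> tuple[int, int]:
    """Return (row, column) for a 24-bit cell address."""
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range")
    row = addr >> COL_BITS                # upper 12 bits -> Row-Select line
    col = addr & ((1 << COL_BITS) - 1)    # lower 12 bits -> column line
    return row, col


if __name__ == "__main__":
    print((1 << ROW_BITS) * (1 << COL_BITS))   # 16777216 cells
    print(split_address(0xABC123))             # (2748, 291)
```

With more than one array, a third field could be peeled off above the row bits in the same way.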
The selected column is usually routed to a sense-amplifier, which converts the voltage or current on the selected line to a binary representation, corresponding to the programming of the selected memory cell.
Memory devices include a decoder which activates one of the horizontal lines. This decoder is commonly referred to as X-Decode or Row Decoder in the literature. The memory device also includes a selector to select one of the vertical columns. This selector is commonly referred to as the Y-Select or Column-Selector in the literature.
Note that memories are often organized so that the read word is more than one bit wide, most notably 8 bits, or a byte. This fits the description above and the descriptions below if we define each of the vertical columns mentioned in this disclosure as possibly including more than one bit and more than one physical line. For example, when the memory is byte-wide, each column comprises 8 lines, and 8 bits are read concurrently.
In some cases the Y selection is done before the selected column is routed to the sense amplifiers. In others, each column has its own sense amplifier and the Y selection is done after the sense amplifiers. In yet others, it is done partly before and partly after the sense amplifiers.
One prior art example of how a semiconductor memory is accessed can be found in US patent 4,104,735 by Hoffmann et al.
The purpose of the current invention is to improve the speed, power consumption and density of the X-Decode circuits and the Y select circuits. Hence, the prior art for those circuits will now be described in more detail.
B: Prior Art X-Decode Circuits
As described above, the function of the X-Decode circuit is to drive one horizontal line, as specified by a subset of the address lines. If this subset includes n lines, then this decoder drives one of the $2^n$ lines active and keeps the others inactive.
The straightforward way to implement the X-Decode circuit is to generate, for each address input Ai, a pair of complementary signals, Xi and ~Xi, where Xi is logically equivalent to Ai and ~Xi is equivalent to its inverse. Next, for each row in the array, an n-input AND gate is implemented, with its inputs connected, for each address bit i, to either Xi or ~Xi. Thus, for example, for a 4,096-row array, the X-Decode has 12 address lines. The X-Decode AND gate of a row whose address is (from MSB to LSB) 000011010100 has its inputs connected to ~X11, ~X10, ~X9, ~X8, X7, X6, ~X5, X4, ~X3, X2, ~X1, ~X0.
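This connection rule can be expressed as a short behavioral model in Python. The sketch lists each row's AND-gate inputs and evaluates all $2^n$ gates for one address. The gates are modeled as ideal logic rather than as precharged NOR circuits, and the helper names are illustrative.

```python
def gate_inputs(row: int, n: int) -> list[str]:
    """AND-gate input labels for `row`, MSB first: Xi if bit i is 1, else ~Xi."""
    return [f"X{i}" if (row >> i) & 1 else f"~X{i}" for i in range(n - 1, -1, -1)]


def decode(address: int, n: int) -> list[int]:
    """Evaluate every row's n-input AND gate for the given address."""
    levels = {f"X{i}": (address >> i) & 1 for i in range(n)}         # Xi = Ai
    levels.update({f"~X{i}": 1 - ((address >> i) & 1) for i in range(n)})
    return [int(all(levels[s] for s in gate_inputs(row, n))) for row in range(1 << n)]


if __name__ == "__main__":
    print(", ".join(gate_inputs(0b000011010100, 12)))
    # ~X11, ~X10, ~X9, ~X8, X7, X6, ~X5, X4, ~X3, X2, ~X1, ~X0
    out = decode(0b1011, 4)          # 4 address lines, 16 rows, as in Figure 1
    print(out.index(1), sum(out))    # 11 1
```

Each of the 2n vertical lines Xi and ~Xi feeds exactly half of the $2^n$ gates. This is the heavy loading discussed next.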
A logic schematic of such an X-Decode circuit is depicted in Figure 1. For clarity, a smaller decoder is shown, with four address lines and 16 rows.
It should be noted that, in practice, the AND gates are typically implemented as NOR gates of the inverted inputs. It should also be noted that the NOR gates are usually not implemented as a fully complementary CMOS gate but, rather, as a precharged two-phase gate (a practical implementation will be depicted in Figure 2, along with the Pre-Decoding concept).
Each vertical line in Figure 1 drives one input on each of 8 of the 16 AND gates. For the complete 12-bit X-decoder described above, each vertical line drives one input on each of 2,048 of the 4,096 AND gates, and each such AND gate has 12 inputs. These numbers represent heavy loading and result in a poor power-delay product. In other words, for a given power budget the decoder will be slow, and for a given speed it will consume a lot of power.
To overcome this problem, the concept of pre-decoding was introduced and is widely used in the prior art. See, for example, US Patent 4,194,130 by Moench.
For the 4,096-row example, one possible pre-decoding scheme generates the four possible combinations of each pair of address bits Ai, Aj:
• ~Ai*~Aj
• ~Ai*Aj
• Ai*~Aj
• Ai*Aj
Now, the 24 vertical lines A11..A0 and ~A11..~A0 are replaced by 6 groups of four pre-decoded address lines:
• [~A11*~A10, ~A11*A10, A11*~A10, A11*A10]
• [~A9*~A8, ~A9*A8, A9*~A8, A9*A8]
• ...
• [~A1*~A0, ~A1*A0, A1*~A0, A1*A0]
Figure 2 depicts a 4,096-row decoder using the pre-decoding described above. Six pre-decoders are used, each for two of the 12 address lines. The structure of each pre-decoder is illustrated in the box at the bottom of the figure. Note that the vertical lines are at logic Low when the Clock input is High. When the Clock is Low, one of the four outputs switches to logic High, according to the logic state of the two input address lines.
Each group of four vertical lines drives the gates of N-MOS transistors. Every horizontal X-decoder line has one pass transistor per group, with its gate connected to one of that group's four lines. Only those horizontal lines whose pass transistors connect to the vertical lines at logic High will conduct. In each group, the vertical line at logic High is the output that corresponds to the logic state of the two address lines input to the relevant pre-decoder.
Each horizontal line connects to 6 transistors in series, one for each group of four pre-decoded address lines, and then to a ground terminal at the left end. As a result, a horizontal X-decoder line is pulled to ground if and only if all six of its transistors have their gates driven by vertical lines at logic High. This corresponds to a specific combination of the 12 address lines. The transistors are assigned so that exactly one horizontal line is pulled low for every combination of the 12 address lines.
For example, D4095, which corresponds to all 12 address bits at High, connects to the left line of each of the six groups of pre-decoded lines. These lines are at logic High only when the two corresponding address lines are High. As a result, D4095 is pulled low only when all 12 address lines are at logic High.
The transistors at the right end of the horizontal lines pre-charge the horizontal lines to High when Clock is at logic High.
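The selection rule of Figure 2 can be checked with a small behavioral model in Python. This sketch assumes the following:
• group g pre-decodes bits A(2g+1) and A(2g);
• the index of the High line within a group equals that 2-bit value.

The real line ordering in the figure may differ, and clocking and precharge are not modeled. "Selected" stands for the row that the real circuit pulls low.

```python
N_BITS, GROUP = 12, 2
N_GROUPS = N_BITS // GROUP          # six 2-bit pre-decoders
LINES = 1 << GROUP                  # four vertical lines per group


def predecode(address: int) -> list[list[int]]:
    """Levels of the 6 x 4 pre-decoded lines: exactly one High per group."""
    groups = []
    for g in range(N_GROUPS):
        field = (address >> (GROUP * g)) & (LINES - 1)   # bits A(2g+1), A(2g)
        groups.append([int(v == field) for v in range(LINES)])
    return groups


def row_transistor_gates(row: int) -> list[int]:
    """For each group, the index of the line driving this row's transistor."""
    return [(row >> (GROUP * g)) & (LINES - 1) for g in range(N_GROUPS)]


def selected_rows(address: int) -> list[int]:
    """Rows whose six series transistors all conduct."""
    lines = predecode(address)
    return [row for row in range(1 << N_BITS)
            if all(lines[g][sel] for g, sel in enumerate(row_transistor_gates(row)))]


if __name__ == "__main__":
    print(selected_rows(4095))   # [4095]: D4095 needs all 12 address bits High
    print(selected_rows(212))    # [212]
```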
As can be observed, thanks to the pre-decoding, the load of the vertical lines is now only 1,024 inputs. Moreover, each AND gate has six inputs only.
The advantage is significant. The pre-decoding is done using a small number of gates, and its contribution to power consumption, delay and area is relatively small. The area of the decoder remains basically unchanged (24 vertical lines in both cases) if we assume that the width of the X-Decoder is set by the number of vertical metal lines. The power-delay product improves significantly, because the capacitive loading on the vertical lines and the number of inputs in the NOR gates are both reduced by 50%.
Other pre-decoding schemes are also used, for example pre-decoding groups of 3 bits. This results in more area (32 rather than 24 vertical lines) but a better power-delay product. The improvement comes from lower loading of the vertical lines (512 gates each) and from NOR gates with 4 rather than 6 inputs.
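The figures quoted in the last few paragraphs follow from simple counting. Here is a minimal Python sketch. It assumes an n-bit decoder split into equal groups of g bits, where g = 1 means no pre-decoding (Xi and ~Xi for each bit). Load is counted as the number of AND gates attached to each vertical line. The function name is illustrative.

```python
def prior_art_metrics(n: int, g: int) -> dict[str, int]:
    """Vertical lines, per-line load and gate fan-in for g-bit pre-decoding."""
    if n % g:
        raise ValueError("this sketch assumes g divides n")
    groups = n // g
    return {
        "vertical_lines": groups * (1 << g),   # 2**g lines per group
        "load_per_line": (1 << n) >> g,        # rows sharing one High line
        "inputs_per_gate": groups,             # one series transistor per group
    }


if __name__ == "__main__":
    for g in (1, 2, 3, 4):
        print(g, prior_art_metrics(12, g))
```

For n = 12 the sketch reproduces the numbers above:
• no pre-decoding: 24 lines, 2,048 loads, 12 inputs;
• 2-bit groups: 24 lines, 1,024 loads, 6 inputs;
• 3-bit groups: 32 lines, 512 loads, 4 inputs.

It also shows that 4-bit groups would need 48 lines to reach 3 inputs per gate.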
In general, the various prior-art pre-decoding schemes provide different trade-offs between power-delay product and area.
The disadvantage of prior-art X-decoders is that the trade-offs they can offer between power-delay product and area are limited. The new invention provides significantly better trade-offs.
C: Prior Art Y select Circuits
Figure 3 depicts a prior-art Y-select circuit for 4,096 columns. It is composed of 12 layers. The top layer has 4,096 transistors controlled by A0, the second layer has 2,048 transistors controlled by A1, and so on, down to the bottom layer, which has two transistors controlled by A11. There are 24 horizontal lines, which directly affect the size of the Y-Select circuit. In addition, there are 12 transistors in series on the route from the sense amplifier to each of the columns.
Figure 4 depicts the same using a pre-decoded configuration. One should note the following:
• The speed of the circuit is, to a large extent, a function of the number of series transistors on the path from the column in the array to the sense amplifier. In this respect, the number of devices is 12 in the non-pre-decoded case, versus 6 in the pre-decoded case.
• The speed is usually NOT a function of the loading of the pre-decode lines. In a memory device the Y decoding can easily be completed in the time it takes to perform X-decoding and read the bit values onto the columns of the array, so by the time the column is ready, the Y selection is done. (This is not the case for some types of devices in which the address is applied to the memory in two parts, for example DRAM.)
• The area of the Y-selector is proportional to the number of select lines that go through the selector (the horizontal lines in Figures 3 and 4). In this respect, the area of Figure 3, with 24 horizontal lines, is the same as that of Figure 4.
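These figures scale in a predictable way, which the following Python function illustrates. For a tree whose layers each decode g address bits, it counts:
• the series transistors on each column path;
• the horizontal select lines;
• the total transistors;
• the number of branches joined at each internal node, whose diffusions load that node.

It assumes equal-sized layers. The function name is illustrative.

```python
def y_tree_metrics(n: int, g: int) -> dict[str, int]:
    """Resources of a prior-art Y-select tree for 2**n columns, g bits per layer."""
    if n % g:
        raise ValueError("this sketch assumes g divides n")
    layers = n // g
    return {
        # one transistor per layer on every column-to-sense-amplifier path
        "series_transistors": layers,
        # 2**g horizontal select lines per layer
        "select_lines": layers * (1 << g),
        # layer k (counting from the columns) holds 2**n / 2**(g*k) transistors
        "transistors": sum((1 << n) >> (g * k) for k in range(layers)),
        # branches merged at each internal node
        "branches_per_node": 1 << g,
    }


if __name__ == "__main__":
    for g in (1, 2, 3, 4, 6, 12):
        print(g, y_tree_metrics(12, g))
```

For g = 1 (Figure 3) and g = 2 (Figure 4) the sketch gives 12 and 6 series transistors, with 24 select lines in both cases. At the other extreme, g = 12 gives a single series transistor, but at the cost of 4,096 select lines and 4,096 branches on one node.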
The disadvantage of the prior-art Y-selector is that it is slow (in terms of the number of series transistors) and that it consumes a large area, compared with the new invention.
We will next describe the current invention.
Description of the Current Invention
A: Brief Description
According to the new invention, the k address bits of the semiconductor device, which are to be decoded to $2^k$ word lines, are coded into a special representation of n bits in which a fixed number of bits, m, is always at logic high. The number of bits in this representation, n, is greater than k but less than 2k. We will show that when this representation is used, a simpler X-decoder can be implemented, in which the number of vertical lines is n and the number of transistors per AND gate is m.
B: Brief Description of the Drawings
Figure 1 depicts a prior art X-decoder, without pre-decoding
Figure 2 depicts a prior art X-decoder, with 2-2-2-2-2-2 pre-decoding
Figure 3 depicts a prior art Y-select circuit, without pre-decoding
Figure 4 depicts a prior art Y-select circuit with 2-2-2-2-2-2 pre-decoding
Figure 5 depicts a simple X-decoder, for 16 word lines, according to the new invention
Figure 6 illustrates area/power-delay points for several embodiments of the new invention and of the prior art
Figure 7 depicts a simple 8-input column selector built according to the new invention
Figure 8 depicts the same, with merged lines for power saving
A: Coding
The current invention converts the binary representation of a k bit address sub-field to a code where m bits out of n are at logic high, and the others are at logic low.
Here is a basic example, showing how a four bit address space is represented by 3/6 code:
If this code is used to pre-decode a group of 4 bits, 6 lines are needed, compared with the 8 lines needed for prior-art 2-bit pre-decoding.
An example of a simple 16-row X-Decoder based on this coding is depicted in Figure 5. The rectangle titled "4 bit => 3/6 pre-decoder" is a straightforward implementation of the table described above, with the addition that when EN is high, all outputs are Low. Pull-down transistors are denoted by small rectangles with a Ground connection, while pre-charge transistors are denoted by small rectangles with a Vdd connection.
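The 16-row decoder can be modeled behaviorally in Python. This sketch uses a hypothetical codeword assignment: address a receives the a-th 3-of-6 codeword in lexicographic order, which need not match the table used in Figure 5. Each row's AND gate has one series transistor on each of its codeword's three lines. EN High forces all six lines Low.

```python
from itertools import combinations

M, N, K = 3, 6, 4   # 3 of 6 code for a 4-bit address field (16 rows)
# Hypothetical assignment: first 16 of the 20 codewords, lexicographic order.
CODEWORDS = [frozenset(c) for c in combinations(range(N), M)][: 1 << K]


def predecode(address: int, en: int = 0) -> list[int]:
    """Levels of the 6 vertical lines; all Low while EN is High."""
    if en:
        return [0] * N
    return [int(i in CODEWORDS[address]) for i in range(N)]


def rows_selected(lines: list[int]) -> list[int]:
    """A row is selected only if all 3 of its series transistors conduct."""
    return [row for row, word in enumerate(CODEWORDS)
            if all(lines[i] for i in word)]


if __name__ == "__main__":
    for a in range(1 << K):
        assert rows_selected(predecode(a)) == [a]
    print(rows_selected(predecode(5, en=1)))   # []: no row while EN is High
    print(len(CODEWORDS), N, M)                # 16 rows, 6 lines, 3 inputs per gate
```

Every codeword has exactly m lines High, so one codeword can contain another only when the two are equal. This is why exactly one row is selected for each address, whatever codeword assignment is used.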
The cost of this new invention is a more complex pre-decoder. However, as the pre-decoder is local and does not drive heavy capacitive loads, it can be implemented in a small area consuming little energy and adding negligible time to the propagation delay, when compared to the savings in those parameters that it brings about.
B: Example - 4 bit to 3 of 6 Translation Logic
A simple 4 bit to 3 of 6 translation logic can be implemented, for example, using the following Boolean equations:
M1 = A3 * ~A2 + ~A3 * A2 * ~A1 + ~A3 * A2 * ~A0 + ~A3 * ~A1 * ~A0
M2 = ~A3 * ~A1 + A2 * ~A1 + ~A3 * ~A0
M3 = ~A3 * ~A2 + A3 * ~A0 + A3 * A2 * A1 + A3 * ~A2 * ~A1
M4 = A2 * ~A1 * ~A0 + A3 * ~A1 * ~A0 + A2 * A1 * A0 + A3 * A2 * A1
M5 = ~A2 * A1 + ~A3 * A1 + A1 * ~A0 + A3 * A2 * ~A1 * A0
M6 = A0
The table corresponding to the above equations is:
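Every row of that table follows mechanically from the equations above, so it can be regenerated rather than transcribed. The Python sketch below does three things:
• transcribes M1 through M6 term by term, with the illustrative helper `inv(x)` standing in for ~x;
• prints the 16 codewords;
• checks that each codeword has exactly three lines High and that no two addresses share a codeword.

```python
def inv(x: int) -> int:
    """Logical complement of a 0/1 value (the ~ of the equations)."""
    return 1 - x


def to_3_of_6(a3: int, a2: int, a1: int, a0: int) -> tuple[int, ...]:
    """Evaluate M1..M6 exactly as written in the equations above."""
    m1 = ((a3 & inv(a2)) | (inv(a3) & a2 & inv(a1))
          | (inv(a3) & a2 & inv(a0)) | (inv(a3) & inv(a1) & inv(a0)))
    m2 = (inv(a3) & inv(a1)) | (a2 & inv(a1)) | (inv(a3) & inv(a0))
    m3 = ((inv(a3) & inv(a2)) | (a3 & inv(a0))
          | (a3 & a2 & a1) | (a3 & inv(a2) & inv(a1)))
    m4 = ((a2 & inv(a1) & inv(a0)) | (a3 & inv(a1) & inv(a0))
          | (a2 & a1 & a0) | (a3 & a2 & a1))
    m5 = ((inv(a2) & a1) | (inv(a3) & a1)
          | (a1 & inv(a0)) | (a3 & a2 & inv(a1) & a0))
    m6 = a0
    return (m1, m2, m3, m4, m5, m6)


if __name__ == "__main__":
    table = {v: to_3_of_6((v >> 3) & 1, (v >> 2) & 1, (v >> 1) & 1, v & 1)
             for v in range(16)}
    for v, code in table.items():
        print(f"{v:04b} -> {''.join(map(str, code))}")   # A3A2A1A0 -> M1..M6
    assert all(sum(code) == 3 for code in table.values())   # exactly 3 lines High
    assert len(set(table.values())) == 16                   # one codeword per address
```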
C: Example - 6 bit to 2 of 12 Translation Logic
Another example is translation from 6 bits to 2/12, per the following equations. Here we use a shortcut representation in which a 1 in the i-th position means Ai, a 0 means ~Ai, and an X means that that particular Ai input does not affect the minterm:
M1 = 00X101 + 0X0101 + 111XXX
M2 = 000101 + X10100 + 110XXX
M3 = 01X101 + 01010X + 101XXX
M4 = 0X1101 + 100XXX
M5 = 00X110 + 0X0110 + 0110XX + 1XX111
M6 = X00110 + 0X1100 + 0100XX + 1XX110
M7 = 01X110 + 001X00 + 0010XX + 1XX101
M8 = 0X1110 + 0111X0 + 0000XX + 1XX100
M9 = 0X0X11 + 00XX11 + XXX011
M10 = 000111 + 000100 + XXX010
M11 = 01X111 + 000100 + XXX001
M12 = 0X1111 + XXX000
The table of the values is as follows:
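The same kind of check can be run on the 2 of 12 equations. In this Python sketch each minterm string is matched against a 6-bit address. The bit ordering (A5 leftmost, A0 rightmost) is an assumption, chosen to match the MSB-to-LSB convention used earlier in this description. Under that reading, the checks in the sketch pass: every one of the 64 addresses raises exactly two lines, and no two addresses share a codeword.

```python
# Minterms transcribed from the equations above (assumed order: A5 ... A0).
MINTERMS = {
    1: ["00X101", "0X0101", "111XXX"],
    2: ["000101", "X10100", "110XXX"],
    3: ["01X101", "01010X", "101XXX"],
    4: ["0X1101", "100XXX"],
    5: ["00X110", "0X0110", "0110XX", "1XX111"],
    6: ["X00110", "0X1100", "0100XX", "1XX110"],
    7: ["01X110", "001X00", "0010XX", "1XX101"],
    8: ["0X1110", "0111X0", "0000XX", "1XX100"],
    9: ["0X0X11", "00XX11", "XXX011"],
    10: ["000111", "000100", "XXX010"],
    11: ["01X111", "000100", "XXX001"],
    12: ["0X1111", "XXX000"],
}


def matches(pattern: str, value: int) -> bool:
    """True if the 6-bit value satisfies a minterm such as '0X1101'."""
    bits = format(value, "06b")                  # A5 ... A0
    return all(p in ("X", b) for p, b in zip(pattern, bits))


def to_2_of_12(value: int) -> tuple[int, ...]:
    """Levels of M1..M12 for a 6-bit address (OR of each line's minterms)."""
    return tuple(int(any(matches(t, value) for t in MINTERMS[k])) for k in range(1, 13))


if __name__ == "__main__":
    codes = [to_2_of_12(v) for v in range(64)]
    assert all(sum(c) == 2 for c in codes)       # exactly two lines High
    assert len(set(codes)) == 64                 # all 64 addresses distinguishable
    print(codes[0])                              # address 000000 raises M8 and M12
```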
One knowledgeable in the art will be able to use this and similar tables and sets of equations to implement the translation functions, for example using "domino logic".
D: Optimal and Sub-Optimal Coding
From basic combinatorics, the number of combinations with m bits set out of n is given by

$$\binom{n}{m} = \frac{n!}{m!\,(n-m)!}$$
For example, there are 1,287 combinations for 5 set bits out of 13, and this code can be used to represent a 10-bit address field (1,024 combinations). As another example, a 6/15 code has 5,005 combinations and can be used to map a 12-bit address field (4,096 combinations).
Suppose a given m of n code is used to represent a k-bit address field, and the number of possible combinations for that code satisfies

$$2^k \le \binom{n}{m} < 2^{k+1}$$

(that is, the code is large enough for k bits but not for k+1 bits). We then call the coding optimal.
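Both the combination count and the optimality condition reduce to one-liners with Python's `math.comb`. In this minimal sketch, `address_bits_supported` and `is_optimal` are illustrative names. The first returns the largest k with $2^k \le \binom{n}{m}$, which is the field size for which the code is optimal under the definition above.

```python
from math import comb


def address_bits_supported(m: int, n: int) -> int:
    """Largest k with 2**k <= C(n, m), i.e. floor(log2 C(n, m))."""
    return comb(n, m).bit_length() - 1


def is_optimal(m: int, n: int, k: int) -> bool:
    """The condition above: 2**k <= C(n, m) < 2**(k + 1)."""
    return (1 << k) <= comb(n, m) < (1 << (k + 1))


if __name__ == "__main__":
    for m, n in [(3, 6), (5, 13), (6, 15)]:
        print(f"{m}/{n}: {comb(n, m)} combinations, "
              f"k = {address_bits_supported(m, n)}")
```

For the examples in the text, it gives:
• 1,287 combinations and k = 10 for the 5/13 code;
• 5,005 combinations and k = 12 for the 6/15 code;
• 20 combinations and k = 4 for the 3/6 code.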
Optimal coding yields the best performance if we ignore the translation logic. However, the translation logic may be relatively complex. Therefore, the current invention also comprises non-optimal coding, where less saving is achieved in the array, but the translation logic is simple.
E: X Decoder Examples
The following table gives several examples of X-decoder implementations for a 12-bit address field, with prior-art pre-decoding (1 through 4) and the new invention (5 through 10). One should note the following:
• Because the number of horizontal lines is set by the number of address bits, the area of the X-decoder depends, to a large extent, only on the number of vertical lines.
• The power-delay product depends, to a large extent, on the number of inputs per gate. This is true both for the AND-gate capacitance and for the total capacitive load on the vertical lines.
• Starred combination 5 is sub-optimal. It is the example of Section C above (6 bit to 2 of 12 translation), duplicated for two groups of 6 bits each. The implementation of the translation logic is simple, as shown above.
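The first two points can be turned into a rough, first-order comparison using the simplified model sketched below in Python. The model is an assumption, not the method used to produce the table or Figure 6. It works as follows:
• vertical lines stand in for area;
• inputs per gate, together with the total number of transistor gates attached to the vertical lines, stand in for power-delay;
• a prior-art g-bit pre-decoder is treated as a 1 of $2^g$ code, so both families can be compared with one function.

```python
from math import comb


# Each group is an (m, n, k) code: k address bits -> n vertical lines, m High.
def decoder_metrics(groups: list[tuple[int, int, int]]) -> dict[str, float]:
    for m, n, k in groups:
        if comb(n, m) < (1 << k):
            raise ValueError(f"{m}/{n} code cannot represent {k} bits")
    rows = 1 << sum(k for _, _, k in groups)
    lines = sum(n for _, n, _ in groups)      # proxy for decoder area
    inputs = sum(m for m, _, _ in groups)     # series transistors per row
    total = rows * inputs                     # transistor gates on vertical lines
    return {
        "vertical_lines": lines,
        "inputs_per_gate": inputs,
        "total_connections": total,
        "avg_load_per_line": total / lines,
    }


CONFIGS = {
    "2-2-2-2-2-2 (prior art)": [(1, 4, 2)] * 6,
    "3-3-3-3 (prior art)": [(1, 8, 3)] * 4,
    "2 x (6 bit -> 2/12), entry 5": [(2, 12, 6)] * 2,
    "12 bit -> 6/15": [(6, 15, 12)],
}

if __name__ == "__main__":
    for name, groups in CONFIGS.items():
        print(name, decoder_metrics(groups))
```

Under this model, two groups of the 2 of 12 code keep the 24 vertical lines of 2-2-2-2-2-2 pre-decoding while cutting the inputs per gate from 6 to 4. That matches the fan-in of 3-3-3-3 pre-decoding with 24 rather than 32 lines.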
Figure 6 is a graph showing that the power-delay product and area achievable with the new invention (bottom curve) are superior to those achieved with the prior art (top curve). The squares on the bottom curve represent points of optimum coding of the current invention. They are clearly superior to the squares on the top curve, which represent prior-art results. The solid dot represents sub-optimal entry 5 in the table. It has simple translation logic and is still clearly better than the curve attainable with the prior art.
F: Better Y Select circuits.
The same principle can be applied to the Y select circuits, yielding better results in terms of speed, power and area.
Figure 7 shows a Y select circuit to select one of 8 columns. Note that a 3 bit to 2/5 code is used in this example.
In Figure 8 we reduce the circuit of Figure 7 without losing functionality, by merging column lines at the point from which their control transistors, all the way to the sense amplifier, are the same. Similar merging is needed in all embodiments of the current invention when used for Y-selection.
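The selection and merging rules can be modeled behaviorally in Python. This sketch makes two illustrative choices that need not match Figure 7 or Figure 8:
• it uses the first eight 2 of 5 codewords in lexicographic order;
• it treats the lower-numbered line of each pair as the transistor nearest the column.

Merging is modeled as sharing every transistor whose downstream path to the sense amplifier is identical, which turns the set of column paths into a tree.

```python
from itertools import combinations

N_LINES, M_HIGH, N_COLS = 5, 2, 8
# Hypothetical assignment: column c uses the c-th 2-of-5 codeword; each path
# is ordered from the column toward the sense amplifier.
PATHS = list(combinations(range(N_LINES), M_HIGH))[:N_COLS]


def conducting_columns(address: int) -> list[int]:
    """Columns whose whole series path is on for this 3-bit address."""
    high = set(PATHS[address])
    return [c for c, path in enumerate(PATHS) if all(g in high for g in path)]


def transistor_count(paths: list[tuple], merged: bool) -> int:
    """Unmerged: one transistor per path element. Merged: paths sharing the
    same remaining transistors toward the sense amplifier share them, so each
    distinct path suffix needs exactly one transistor."""
    if not merged:
        return sum(len(p) for p in paths)
    return len({p[i:] for p in paths for i in range(len(p))})


if __name__ == "__main__":
    print([conducting_columns(a) for a in range(N_COLS)])   # [[0], [1], ..., [7]]
    print(transistor_count(PATHS, merged=False))             # 16
    print(transistor_count(PATHS, merged=True))              # 12
    # Binary tree of Figure 3 style for 8 columns: A0 layer nearest the columns.
    binary = [tuple((i, (c >> i) & 1) for i in range(3)) for c in range(8)]
    print(transistor_count(binary, merged=True))             # 14 = 8 + 4 + 2
```

In this sketch the 2 of 5 selector uses 2 series transistors and 5 select lines. The binary tree for the same 8 columns uses 3 series transistors and 6 select lines.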
One knowledgeable in the art will see that this principle can be extended to any number of columns, using sufficiently large m of n codes.
The table below compares several implementations of a 4,096-to-1 Y selection. Note that, as we assume the address is known in advance, the delay is mainly a function of the number of series transistors. Also note that prior-art pre-decoding schemes with large groups are not practical (and are therefore not shown). In such schemes, each node is loaded with a large capacitance contributed by the diffusions of all the parallel control transistors. For example, a full 4,096-way pre-decoding would have only one series transistor, but the sense amplifier would "see" the capacitance of all 4,096 transistors and would be slower than other configurations. In addition, such pre-decoding is obviously impractical because of the large area taken by 4,096 horizontal lines.
Legend for the table:
• PA = Prior Art
• CI = Current Invention
• PD = Pre-decoding
The results are similar to those presented for the X-Decode and, again, the advantage of the current invention is obvious.
While the invention has been particularly taught and described with reference to several embodiments, those versed in the art will appreciate that minor modifications in form and detail may be made without departing from the spirit and scope of the invention. Accordingly, all such modifications are embodied within the scope of this patent as properly come within any contribution to the art, and are particularly pointed out by the following Claims.