DUAL BANK SHARED DATA RAM FOR EFFICIENT PIPELINED VIDEO AND DATA PROCESSING
BACKGROUND OF THE INVENTION
1. Field of the Invention The present invention relates to high performance multiprocessor design, and, more particularly, to dual bank shared data RAM for efficient pipelined video and data processing.
2. Description of the Related Art The performance of modern multiprocessor computer systems, which may be configured in a variety of multiprocessor architectures, is frequently limited by the data exchange delays between the processors. Data exchange delays are particularly troublesome in high performance applications, such as those needed for video and data processing, where processing speed is crucial. Even the most powerful processors can process data only as fast as the data can be accessed. Traditionally, data exchange has been implemented in two ways. In one method, a FIFO (first-in-first-out) is used for individual processors having their own memories. Typically, the FIFO includes both logic elements and memory elements. The memory elements generally include a fast memory (hereinafter referred to as a "FIFO memory"), such as a fast static RAM. Suppose data needs to be transferred from a first processor to a second processor. In the FIFO method, the first processor will begin writing data to the FIFO memory. As the FIFO memory fills up, the logic elements of the FIFO generate an interrupt to the second processor and the second processor will begin reading data from the FIFO. In some instances, an additional step may be necessary to write the data to the second processor for further processing.
Data exchange may also be implemented using a shared common memory. The shared common memory is generally implemented using some control logic and a dedicated datapath routing logic to logically connect the respective bus structures of two processors. A special mechanism, such as a semaphore, may be implemented to prevent bus contentions. Like the FIFO, data exchange between processors with a shared common memory requires at least one read and one write instruction from the shared common memory. Additional steps to write or read from the processors may also be necessary. The FIFO and shared common memory methods have several drawbacks. Each transfer of data from one processor to another processor requires at least two actions: a write to the FIFO or shared common memory and a read from the FIFO or shared common memory. Furthermore, if data needs to be processed, a processor may need an extra step to write the data to the processor RAM. These extra steps are time-consuming and may be particularly troublesome in high performance designs where fast data access speeds are crucial. Therefore, a need exists for an apparatus that provides efficient data exchange between processors. The data should be transferred fast enough to support a high performance design, such as needed for video processing. Duplication of data among processors should also be avoided. The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.
SUMMARY OF THE INVENTION In one aspect of the present invention, a dual bank memory device is provided. The dual bank memory device includes a first memory; a second memory; a first switch that operatively couples the first memory to a first processor, wherein the first switch is capable of operatively coupling the second memory to the first processor, wherein the first switch directs data and address information from the first processor to the first memory and directs data information from the first memory to the first processor, and wherein the first switch is capable of directing data and address information from the first processor to the second memory and directing data information from the second memory to the first processor; a second switch that operatively couples the second memory to a second processor, wherein the second switch is capable of operatively coupling the first memory to the second processor, wherein the second switch directs data and address information from the second processor to the second memory and directs data information from the second memory to the second processor, and wherein the second switch is capable of directing data and address information from the second processor to the first memory and directing data information from the first memory to the second processor; and a control logic operatively coupled to the first switch and the second switch, wherein the control logic instructs the first switch and the second switch to swap memories. In another aspect of the present invention, a method of acquiring data from a dual bank shared memory device is provided. The method includes transmitting address information from a first processor to a first memory bank, transmitting address information from a second processor to a second memory bank, transmitting data information from the first processor to the first memory bank, transmitting data information from the second processor to the second memory bank, and swapping the first memory bank and the second memory bank. In yet another aspect of the present invention, a multiprocessor architecture is provided.
The multiprocessor architecture comprises a first processor; a second processor; one or more main memories operatively coupled to the first processor and the second processor; one or more cache memories coupled to the first processor and the second processor; a first memory bank; a second memory bank; a first switch that operatively couples the first memory bank to the first processor, wherein the first switch is capable of operatively coupling the second memory bank to the first processor, wherein the first switch directs data and address information from the first processor to the first memory bank and directs data information from the first memory bank to the first processor, and wherein the first switch is capable of directing data and address information from the first processor to the second memory bank and directing data information from the second memory bank to the first processor; a second switch that operatively couples the second memory bank to the second processor, wherein the second switch is capable of operatively coupling the first memory bank to the second processor, wherein the second switch directs data and address information from the second processor to the second memory bank and directs data information from the second memory bank to the second processor, and wherein the second switch is capable of directing data and address information from the second processor to the first memory bank and directing data information from the first memory bank to the second processor; and a control logic operatively coupled to the first switch and the second switch, wherein the control logic instructs the first switch and the second switch to swap memory banks. These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings, in which like reference numerals identify like elements, and in which:
FIG. 1 depicts a memory configuration of a multiprocessor architecture in accordance with one embodiment of the present invention; FIG. 2 depicts an alternate memory configuration of the multiprocessor architecture of FIG. 1; FIG. 3 depicts, in further detail, the connection of the DBSM to the processors, as described in FIG. 1; and FIG. 4 depicts, in further detail, a schematic diagram of the DBSM architecture, as described in FIG. 1.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS Referring now to Figure 1, an exemplary embodiment of a memory configuration of a multiprocessor architecture 100 in accordance with the present invention is illustrated. As shown, the multiprocessor architecture 100 comprises a first processor 105 and a second processor 110 sharing a main memory 115. Although only two processors are shown in Figure 1, it is known to those skilled in the art that a typical multiprocessor architecture may include any number of processors depending on the implementation. It is also understood that one or more of the processors may comprise its own memory without sharing the main memory 115. The main memory 115 is typically implemented using slower, cheaper dynamic random access memory ("DRAM") devices. The main memory 115 may include random access memory ("RAM"), read-only memory ("ROM"), or both. For example, a computer may boot up using the contents of the ROM and thereafter use the RAM for temporary storage of data associated with applications and the operating system. Whenever "main memory" is mentioned, it is contemplated that it may include any combination of the ROM and RAM. The first processor 105 and second processor 110 (collectively known as "processors") each comprise a bifurcated level 1 ("L1") instruction cache memory 120 and data cache memory 125 (collectively known as a "cache"). Although not illustrated in Figure 1, in other embodiments, a level 2 ("L2") cache may be implemented on the processors 105, 110 as necessary. It is understood that the multiprocessor architecture 100 may include one or more cache memories for further reducing latency between the main memory 115 and the processors 105, 110. The cache 120, 125 is implemented using faster random access memory devices, such as static random access memory devices ("SRAMs"), so that accessing the cache 120, 125 takes much less time than accessing the main memory 115. SRAMs typically require a
greater number of devices per bit of information stored, and thus are more expensive than
DRAM. To further reduce the memory latency period, in other embodiments, the cache 120,
125 may be located on the same chip as the processors 105, 110. A dual bank shared memory (hereinafter referred to as a "DBSM") 130 is shown operatively coupled to the first processor 105 and the second processor 110. The DBSM 130 is preferably implemented using a fast memory, such as the memory used for the cache 120, 125. It is understood that additional DBSMs may be coupled to additional processors in a multiprocessor architecture. For example, referring to Figure 1, a second DBSM (not shown) may be coupled to the second processor 110 and a third processor (not shown). Furthermore, in an alternate embodiment, the multiprocessor architecture 100 of Figure 1 may be implemented as a processor ring 200, as illustrated in Figure 2. As shown in Figure 2, four processors 205, 210, 215, 220 are coupled to each other via four respective DBSMs 130, 135, 140, 145. Although not shown, each processor 205, 210, 215, 220 may be operatively coupled to its own main memory or a shared memory and one or more cache memories, similar to the configuration described for Figure 1. Referring now to Figure 3, an exemplary embodiment of the connection 300 between the processors 105, 110 and the DBSM 130 in accordance with the present invention is illustrated. The connection between the DBSM 130 and each of the processors 105, 110 includes a unidirectional address bus 305 for specifying a data read/write address and a bidirectional data bus 310 for transmitting and receiving data. An interrupt signal (described in greater detail in Figure 4) is generated by the DBSM 130 and transmitted to the processors 105, 110 via their respective interrupt lines 315, 320. It is understood that any of a variety of signals known to those skilled in the art may be used in place of interrupts.
Referring now to Figure 4, a schematic diagram of the DBSM 130 in accordance with the present invention is illustrated. The DBSM 130 comprises a first memory bank 405 and a second memory bank 410. On one side of the first memory bank 405 is a first multiplexer/switch 415, and on the other side is a second multiplexer/switch 420. The first multiplexer/switch 415 is coupled to the first processor 105 (not shown in Fig. 4) via the address bus 305 and the data bus 310. The second multiplexer/switch 420 is coupled to the second processor 110 (not shown in Fig. 4). As illustrated, the first multiplexer/switch 415 directs address and data information via the unidirectional address bus 305 and the bidirectional data bus 310, respectively, to and from the first memory bank 405. The second multiplexer/switch 420 directs address and data information via the unidirectional address bus 305 and the bidirectional data bus 310, respectively, to and from the second memory bank 410. The first multiplexer/switch 415 and the second multiplexer/switch 420 (collectively referred to as "switches") are commonly known in the art and may be built using standard logic gates. In alternate embodiments, the switches 415, 420 may be implemented using a combination of one or more standard cells, such as the 74LS138, the 74LS153, and the like. A control logic 425 directs the switches 415, 420 to send data and address information to either the first memory bank 405 or the second memory bank 410. As illustrated, the first multiplexer/switch 415 directs address and data information to and from the first memory bank 405, and the second multiplexer/switch 420 directs address and data information to and from the second memory bank 410.
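The routing performed by the switches 415, 420 under the control logic 425 can be pictured with a small software model. The following Python sketch is illustrative only and is not part of the disclosed hardware: the class name `DualBankSwitchModel`, the bank size, and the single `swapped` select bit are assumptions made for the example. Processor index 0 stands in for the first processor 105 and index 1 for the second processor 110.

```python
class DualBankSwitchModel:
    """Toy software model of two memory banks behind two 2-way switches.

    Hypothetical illustration; the names and sizes are not from the disclosure.
    """

    def __init__(self, bank_size=8):
        # banks[0] stands in for memory bank 405, banks[1] for memory bank 410.
        self.banks = [[0] * bank_size, [0] * bank_size]
        # swapped == False models the configuration illustrated in Figure 4.
        self.swapped = False

    def _bank_for(self, processor):
        # Each switch selects one of the two banks for its processor.
        return processor ^ 1 if self.swapped else processor

    def write(self, processor, address, value):
        # Address and data travel from the processor to the selected bank.
        self.banks[self._bank_for(processor)][address] = value

    def read(self, processor, address):
        # Data travels from the selected bank back to the processor.
        return self.banks[self._bank_for(processor)][address]

    def swap(self):
        # The control logic flips both switches together.
        self.swapped = not self.swapped


dbsm = DualBankSwitchModel()
dbsm.write(0, 3, 0xAB)    # first processor writes into bank 405
before = dbsm.read(1, 3)  # second processor still sees bank 410
dbsm.swap()
after = dbsm.read(1, 3)   # second processor now sees bank 405
```

In this model, no data is copied when the banks change hands; only the select bit changes, which is the property that distinguishes the arrangement from the FIFO and shared common memory methods described in the background.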
The control logic 425 may "swap" memory banks by activating the switches 415, 420 such that the first multiplexer/switch 415 directs address and data information to and from the second memory bank 410, and the second multiplexer/switch 420 directs address and data information to and from the first memory bank 405. The DBSM 130 further comprises a first processor flag 430 and a second processor flag 435. The first processor flag 430 indicates whether the first processor 105 is done using
its associated memory bank. As illustrated, the associated memory bank for the first processor 105 is the first memory bank 405. The second processor flag 435 indicates whether the second processor 110 is done using its associated memory bank. As illustrated, the associated memory bank for the second processor 110 is the second memory bank 410. The first processor flag 430 receives a first signal 440 from a first address and data decoding logic 445, and the second processor flag 435 receives a second signal 450 from a second address and data decoding logic 455. The first signal 440 notifies the first processor flag 430 whether the first processor 105 is done using its associated memory bank. The second signal 450 notifies the second processor flag 435 whether the second processor 110 is done using its associated memory bank. The first address and data decoding logic 445 and the second address and data decoding logic 455 receive address and data information from the first processor 105 and second processor 110, respectively, via the unidirectional address bus 305 and the bidirectional data bus 310. When the first address and data decoding logic 445 and the second address and data decoding logic 455 receive a proper signal from the first processor 105 and the second processor 110, respectively, the address and data decoding logics 445, 455 notify the first and second processor flags 430, 435 that the processors 105, 110 are done using their associated memory banks. It is understood that any of a variety of signals known to those skilled in the art may be used to notify the address and data decoding logics 445, 455 that the processors 105, 110 are done. For example, as illustrated in Figure 4, the address and data decoding logics 445, 455 send a signal if they receive a particular address and a particular data value from the processors 105, 110, respectively.
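The address-and-data decoding just described can be sketched in software as follows. This Python model is a hedged illustration, not the disclosed circuit: the class name `DoneDecoder` and the constants `DONE_ADDRESS` and `DONE_VALUE` are hypothetical, since the description leaves the particular address and data value unspecified.

```python
DONE_ADDRESS = 0xFF  # hypothetical "done" address; not specified in the description
DONE_VALUE = 0x01    # hypothetical "done" data value; not specified in the description


class DoneDecoder:
    """Toy model of one address and data decoding logic plus its processor flag."""

    def __init__(self):
        # Models a processor flag (430 or 435); False means "still using the bank".
        self.flag = False

    def observe(self, address, data):
        # The decoder watches every access on the address and data buses and
        # sets the flag only when both the address and the data value match.
        if address == DONE_ADDRESS and data == DONE_VALUE:
            self.flag = True
        return self.flag


decoder_445 = DoneDecoder()
ordinary_access = decoder_445.observe(0x10, 0x01)  # wrong address, flag stays clear
wrong_value = decoder_445.observe(0xFF, 0x00)      # wrong data value, flag stays clear
done_access = decoder_445.observe(0xFF, 0x01)      # both match, flag is set
```

Requiring both a matching address and a matching data value, as in this sketch, means an ordinary access to the bank is not mistaken for the completion signal unless it uses the reserved combination.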
Once both flags 430, 435 indicate that both processors 105, 110 are done using their associated memory banks, the control logic 425 performs three tasks. The control logic 425 instructs the switches 415, 420 to swap memory banks, sends a reset signal 460 to the processor flags 430, 435 resetting them to their original configuration, and sends an interrupt
signal 465 to both processors 105, 110 indicating that the processors 105, 110 can start using the DBSM 130. As mentioned previously, it is understood that any of a variety of signals known to those skilled in the art may be used in place of interrupts. The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.
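By way of further illustration only, the three tasks performed by the control logic 425 once both processor flags are set (swap, reset, interrupt) can be sketched in Python. The class name `DbsmControlModel`, the method `signal_done`, and the use of interrupt counters are assumptions made for this example and do not limit the embodiments described above.

```python
class DbsmControlModel:
    """Toy model of the control logic acting on the two processor flags."""

    def __init__(self):
        self.swapped = False         # state of the switches 415, 420
        self.flags = [False, False]  # processor flags 430 and 435
        self.interrupts = [0, 0]     # interrupts delivered to each processor

    def signal_done(self, processor):
        # A processor reports that it is done with its associated bank.
        self.flags[processor] = True
        if not all(self.flags):
            return False  # the other processor is still working; no swap yet
        self.swapped = not self.swapped                     # task 1: swap the banks
        self.flags = [False, False]                         # task 2: reset both flags
        self.interrupts = [n + 1 for n in self.interrupts]  # task 3: interrupt both
        return True


ctrl = DbsmControlModel()
first_result = ctrl.signal_done(0)   # only the first processor is done
second_result = ctrl.signal_done(1)  # both are done, so the swap sequence runs
```

In this sketch the swap waits for the slower of the two processors, so neither processor loses access to a bank it is still using.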