WO2011048400A1 - Memory interface compression - Google Patents

Memory interface compression

Info

Publication number
WO2011048400A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
memory
blocks
data values
data
Prior art date
Application number
PCT/GB2010/051692
Other languages
French (fr)
Inventor
Stuart David Biles
Martinus Cornelis Wezelenburg
Original Assignee
Arm Limited
Priority date
Filing date
Publication date
Application filed by Arm Limited filed Critical Arm Limited
Publication of WO2011048400A1 publication Critical patent/WO2011048400A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/0223 User address space allocation, e.g. contiguous or non-contiguous base addressing
    • G06F 12/023 Free address space management
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/10 Providing a specific technical effect
    • G06F 2212/1016 Performance improvement
    • G06F 2212/1028 Power efficiency
    • G06F 2212/1041 Resource optimization
    • G06F 2212/40 Specific encoding of data in memory or cache
    • G06F 2212/401 Compressed data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This invention relates to the field of data processing systems. More particularly, this invention relates to the field of compressing blocks of data values for storage within a memory of a data processing system.
  • Compressing the blocks of data values that are stored in the memory also has the advantages that less time is taken and less energy is consumed when writing those blocks of data values to the memory and reading those blocks of data values from the memory.
  • This advantage is due to the fact that reading or writing a compressed block of data will involve transferring a smaller amount of data across the memory interface than if uncompressed data were transferred. This is of particular interest where the interface to a memory introduces a large latency or incurs a significant energy cost for transferring data, e.g. an off-chip memory interface.
  • each compressed block may occupy a different amount of space in the memory.
  • writes to a block of data values will require that the block of data values be recompressed before being written to the memory again; the recompressed block of data may occupy more space than before, such that the compression scheme will need to provide management features that can cope with varying compressed block size and resultant fragmentation of the compressed data blocks stored in the memory.
  • a compressed block may require more space than the original uncompressed data.
  • a segment based management scheme permits relocation of compressed data blocks and the use of multiple segments for data that does not compress well. It is typical to choose the segment size based on average compression ratio for the data patterns typically observed in the application.
  • a segment based management scheme incurs an overhead in terms of additional state to record the relationship between the two address spaces and a level of indirection that adds to latency when processing memory accesses with unpredictable address sequences.
  • the present invention provides a method of storing data in a memory, said method comprising the steps of:
  • each of said blocks of memory storage locations has a storage capacity at least as large as required for a corresponding block of data values before compression.
  • the present technique recognises that the use of compression for blocks of data values is not constrained to situations in which it is desired to minimise the amount of storage space required within the memory.
  • Each block of memory storage locations has sufficient capacity to store a block of data values or a corresponding compressed block of data values.
  • the memory storage locations may be divided into blocks which are large enough to store corresponding uncompressed blocks of data values; such an arrangement offers no potential for storage space reduction but still offers other appreciable benefits.
  • the reduced volume of data which needs to be written to or read from the memory when accessing such a compressed block of data values has advantageous latency and energy savings, even though there is typically unused space within the allocated block of memory storage locations.
  • the allocation of blocks of memory storage locations at least as large as the uncompressed data size means that the uncompressed data can be stored instead.
  • the uncompressed block may be stored because a compressed version of the same block would be larger or, for example, because compression has been discontinued after other blocks, when compressed, proved larger.
  • the efficiency of the system may be enhanced if a set of physical access parameters are associated with each stored block.
  • the physical access parameters may be used to influence the handling of a particular stored block by the memory system.
  • the mixing of compressed and uncompressed blocks is facilitated if a physical access parameter is associated with each stored block to indicate if the block is compressed or uncompressed.
  • This physical access parameter can be used to control whether decompression is used when the block is read.
  • the one or more physical access parameters associated with a block of memory storage locations may also include an end-of-block code stored with a compressed block within said block of memory storage locations and indicating an end position of the compressed block. This is useful in controlling the processing of the compressed block.
  • reading of the compressed block may be terminated in dependence upon detection of the end-of-block code.
  • energy may be saved by not reading data values within the block of memory storage locations beyond the end-of-block code even though the compressed block of data has yet to be decompressed and the size of the recovered data is as yet unknown.
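  • The early-termination behaviour can be sketched in a few lines of Python: the reader stops transferring words from the allocated block as soon as it sees the end-of-block code, so the unused remainder of the block never crosses the memory interface. This is a minimal illustration under the assumption of a reserved sentinel value that cannot occur in the compressed payload; the names and values are illustrative, not from the patent.

```python
# Sketch of end-of-block (EOB) early read termination, assuming a reserved
# sentinel word that cannot occur inside the compressed payload.

EOB = 0xFFFF  # hypothetical reserved end-of-block code

def read_compressed_block(storage_block):
    """Read words from a fixed-size storage block, stopping at the EOB code.

    Returns the compressed payload and the number of words actually
    transferred (payload plus the EOB word itself)."""
    payload = []
    transferred = 0
    for word in storage_block:
        transferred += 1
        if word == EOB:
            break            # stop reading: the rest of the block is unused
        payload.append(word)
    return payload, transferred

# A 16-word allocated block holding a 5-word compressed payload: only 6 of
# the 16 words need cross the memory interface.
block = [7, 3, 9, 1, 4, EOB] + [0] * 10
payload, transferred = read_compressed_block(block)
```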
  • the compression techniques used can vary from block to block.
  • the different types of compression used may be selected in dependence upon a particular device associated with a given compressed block of data values (e.g. as a generator or consumer of that block of data values).
  • the compression used may be selected in dependence upon the block type of the block of data values to be compressed (e.g. image data may be compressed using one compression scheme with audio data being compressed using a different compression scheme).
  • the system should keep track of which compression scheme is used for which compressed block of data values.
  • the physical access parameter(s) referred to above may indicate whether and what compression scheme was used for a block of data values stored in the memory, an access sequence pattern, etc.
  • the compression used may be adaptively determined based on other blocks.
  • the physical access parameter may further indicate that the size (directly or indirectly in some coded form) of the associated compressed data block is shorter than that of the corresponding allocated block of storage locations; this parameter may be used to prevent the prefetching of a subsequent memory read (bursts) and trigger the early termination of a current read (burst).
  • burst accesses to the memory can be made more efficient, leading to a reduction in energy used and increase in available memory bandwidth, e.g. burst read may be early terminating and/or prefetching may be terminated.
  • the physical access parameters may be implemented in a variety of different ways, including the use of parameter storage circuitry outside of the memory, parameter storage circuitry inside of the memory, a table of physical access parameters stored within storage locations of the memory, or a physical access parameter stored within a block of memory storage locations for the respective compressed block of data values; they can also be encoded in different ways (i.e. where the data is stored and what it represents). Storing the physical access parameters separately from the block of data to which they correspond has practical advantages in systems where compression and the other access parameters are controlled adaptively, e.g. in dependence upon device type and/or data type.
  • the present technique may be useful and advantageous as discussed above for implementations in which the memory is an on-chip memory
  • the performance and energy efficiency advantages are strong in embodiments in which the storing of the compressed blocks of data values is an off-chip storage operation including generation of memory storage signals upon a memory bus coupled to the memory.
  • a memory bus will typically have a relatively high capacitance and accordingly consume a significant amount of energy in being driven, whilst also introducing significant latency to the access.
  • with external synchronous DRAM memory there may be further latency introduced due to the internal page management of such memory, requiring page open and close events.
  • the control of the system may be simplified when a linear function exists which determines the location of a compressed block of data values corresponding to a block of data values.
  • the memory storage locations may be set to have a fixed size, e.g. a size equal to the size of the blocks of data values before they are compressed. This permits simple address calculation to locate a block in memory.
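  • With fixed-size blocks the linear function reduces to base-plus-index-times-size; a sketch, where `BASE` and `BLOCK_SIZE` are purely illustrative values:

```python
# Sketch of the linear placement function enabled by fixed-size blocks of
# memory storage locations: no indirection table is needed to locate a block.

BLOCK_SIZE = 256          # bytes per allocated block (>= uncompressed size)
BASE = 0x8000_0000        # hypothetical base address of the compressed region

def block_address(index):
    """Start address of block `index` in memory."""
    return BASE + index * BLOCK_SIZE

def index_for_address(addr):
    """Inverse mapping: which block an address falls within."""
    return (addr - BASE) // BLOCK_SIZE
```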
  • the compression performed on the blocks of data values may be performed by various processing hardware in accordance with a variety of different compression algorithms.
  • a general purpose processor running compression software may compress blocks of data values before they are written out to memory.
  • the compression performed may be undertaken by a memory controller which couples other elements within the system to a memory. In this way, the compression of blocks of data values may be made more transparent to the rest of the system with the compression being performed under hardware or firmware control by the memory controller.
  • Efficiency gains can be made in some embodiments in which the memory has a main store and a buffer store and a read operation to the memory fills the buffer store with a contiguous block of data values from the main store with subsequent read operations within the contiguous block of data values being made to the buffer store while the buffer store continues to hold that contiguous block of data values.
  • This type of arrangement is, for example, useful in a DRAM where the banks of memory bit cells are accessed and the contents of a complete row of bit cells are stored within a buffer store within the DRAM, with subsequent accesses to that row of bit cells being made to the data values stored within the buffer store.
  • each of the blocks of memory storage locations has a storage capacity equal to a storage capacity of the buffer store within the memory.
  • This arrangement may be further exploited by providing a processing circuitry buffer store within the processing circuitry that issues a data read request. In this way, a contiguous block of data values may be stored into the processing circuitry buffer store as they are streamed back from the memory.
  • Overall energy efficiency may be improved if a data read request to a contiguous block of data values is allocated a block-read priority level such that the storing of that contiguous block of data values within the processing circuitry buffer store is uninterrupted by memory access requests to the memory having a lower priority than the block-read priority level.
  • the block-read priority level may be set relatively high so that the reading of the contiguous block of data values is unlikely to be interrupted. Once these values have been stored into the buffer store within the memory, they will all be read back to the processing circuitry and stored within the processing circuitry buffer store without intervening memory accesses evicting the contiguous block of data values from the buffer store of the memory; such eviction would require them to be reloaded, wasting energy in the reloading operation.
  • the present invention provides apparatus for processing data, said apparatus comprising:
  • allocating circuitry configured to allocate memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations
  • partitioning circuitry configured to partition data into a plurality of blocks of data values
  • compression circuitry configured to compress one or more of said plurality of blocks of data values to form corresponding one or more compressed data blocks; and storage circuitry configured to store respective ones of said compressed blocks of data values having a storage size less than a storage capacity of one of said blocks of memory storage locations within a corresponding block of memory storage locations; wherein each of said blocks of memory storage locations has a storage capacity at least as large as required for a corresponding block of data values before compression.
  • the present invention provides apparatus for processing data, said apparatus comprising:
  • memory means for storing data values
  • allocating means for allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations
  • partition means for partitioning data into a plurality of blocks of data values; compression means for compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed data blocks;
  • each of said blocks of memory storage locations has a storage capacity at least as large as required for a corresponding block of data values before compression.
  • the invention provides a method of storing data in a memory, said method comprising the steps of:
  • said memory has a main store and a buffer store and a read operation to said memory fills said buffer store with a contiguous block of data values from said main store with subsequent read operations within said contiguous block of data values being made to said buffer store while said buffer store continues to hold said contiguous block of data values and each of said blocks of memory storage locations has a storage capacity equal to a storage capacity of said buffer store.
  • the present technique recognises that in some embodiments the constraints on the relative sizes of the blocks of memory storage locations and the compressed blocks may be removed and advantage may be gained by arranging that the memory has a main store and a buffer store with a read operation to the memory filling the buffer store with a contiguous block of data values from the main store with subsequent read operations within the contiguous block of data values being made to the buffer store and with an arrangement such that the blocks of memory storage locations into which the data is compressed have a storage capacity equal to the storage capacity of the buffer store.
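  • The main store / buffer store behaviour described above can be modelled as a simple row buffer; the following sketch uses illustrative sizes and names, with a counter showing how reads within the held block avoid the main store:

```python
# Sketch of a memory with a main store and a buffer store: a read fills the
# buffer with a whole contiguous block, and subsequent reads within that
# block are served from the buffer rather than the main store.

ROW = 8  # buffer store capacity in words == block size (illustrative)

class BufferedMemory:
    def __init__(self, main):
        self.main = main
        self.buf_row = None      # which row the buffer store currently holds
        self.buf = []
        self.row_fills = 0       # accesses that had to go to the main store

    def read(self, addr):
        row = addr // ROW
        if row != self.buf_row:  # miss: fill the buffer from the main store
            self.buf_row = row
            self.buf = self.main[row * ROW:(row + 1) * ROW]
            self.row_fills += 1
        return self.buf[addr % ROW]
```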
  • the present invention provides an apparatus for processing data, said apparatus comprising:
  • allocating circuitry configured to allocate memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations
  • partitioning circuitry configured to partition data into a plurality of blocks of data values
  • compression circuitry configured to compress one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks
  • said memory has a main store and a buffer store and a read operation to said memory fills said buffer store with a contiguous block of data values from said main store with subsequent read operations within said contiguous block of data values being made to said buffer store while said buffer store continues to hold said contiguous block of data values and each of said blocks of memory storage locations has a storage capacity equal to a storage capacity of said buffer store.
  • the present invention provides an apparatus for processing data, said apparatus comprising: memory means for storing data values;
  • allocating means for allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations
  • partition means for partitioning data into a plurality of blocks of data values; compression means for compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks;
  • said memory means has main store means for storing data values and buffer store means for storing data values and a read operation to said memory fills said buffer store means with a contiguous block of data values from said main store means with subsequent read operations within said contiguous block of data values being made to said buffer store means while said buffer store means continues to hold said contiguous block of data values and each of said blocks of memory storage locations has a storage capacity equal to a storage capacity of said buffer store means.
  • the present invention provides a method of storing data in a memory, said method comprising the steps of:
  • control codes may include an end-of-block code for use in controlling terminating of reading of a compressed block of data.
  • control codes which may be added include an exception code indicating that a storage size of a compressed block is greater than a storage capacity of one of the blocks of memory storage locations, leaving an overflow portion of the compressed block. This overflow portion of the compressed block may be stored in an adjacent block of memory storage locations. If greater freedom is desired, then the exception code may have a pointer field associated with it indicating an unused portion of a block of memory storage locations into which the overflow portion may be stored, with this unused portion not being constrained to lie within an adjacent block of memory storage locations.
  • the unused portion of the block of memory storage locations into which the overflow portion may be stored can be within a block of memory storage locations that is allocated for the exclusive use of the overflow portion or may be within a block of memory storage locations that is also storing another compressed block and has spare capacity.
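  • The exception code and pointer field might be sketched as follows; the record layout, capacity, and names are assumptions made for illustration, not the patent's encoding:

```python
# Sketch of overflow handling: if a compressed block exceeds the capacity of
# its allocated block of storage locations, the tail (overflow portion) is
# stored in another block, and an exception record with a pointer field is
# left in the main block so a reader can follow it.

CAPACITY = 8  # words per allocated block (illustrative)

def store_block(slots, index, compressed, spill_index):
    """Store `compressed` into slots[index]; on overflow, put the tail into
    slots[spill_index] and end the main block with ('EXC', pointer)."""
    if len(compressed) <= CAPACITY:
        slots[index] = list(compressed)
        return None
    head = compressed[:CAPACITY - 1]        # leave one word for the record
    slots[spill_index] = list(compressed[CAPACITY - 1:])
    slots[index] = head + [("EXC", spill_index)]
    return spill_index

def load_block(slots, index):
    block = slots[index]
    last = block[-1] if block else None
    if isinstance(last, tuple) and last[0] == "EXC":
        return block[:-1] + slots[last[1]]  # follow the pointer field
    return list(block)
```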
  • the present invention provides an apparatus for processing data, said apparatus comprising:
  • allocating circuitry configured to allocate memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations
  • partitioning circuitry configured to partition data into a plurality of blocks of data values
  • compression circuitry configured to compress one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks and to add one or more control codes to at least one of said one or more compressed blocks, said one or more control codes providing control for a subsequent read operation of said at least one of said one or more compressed blocks; and storage circuitry configured to store respective ones of said compressed blocks of data values within a corresponding block of memory storage locations.
  • the present invention provides an apparatus for processing data, said apparatus comprising:
  • memory means for storing data values
  • allocating means for allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations
  • partition means for partitioning data into a plurality of blocks of data values; compression means for compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks and for adding one or more control codes to at least one of said one or more compressed blocks, said one or more control codes providing control for a subsequent read operation of said at least one of said one or more compressed blocks; and
  • storage means for storing respective ones of said compressed blocks of data values within a corresponding block of memory storage locations.
  • Figure 1 schematically illustrates an apparatus for processing data including a system-on-chip integrated circuit coupled to an off-chip memory;
  • Figure 2 schematically illustrates a known technique for compressing blocks of data values into corresponding compressed blocks of data values, contiguously stored within a memory
  • Figure 3 schematically illustrates compression of blocks of data values in accordance with one example of the present technique
  • Figure 4 is a flow diagram schematically illustrating the present storage technique
  • Figure 5 is a diagram schematically illustrating various positions for storing data indicating a type of compression used on a block-by-block basis
  • Figure 6 is a flow diagram schematically illustrating storing data to a memory on a block-by-block basis and selectively using compression
  • Figure 7 schematically illustrates a memory controller that may be used to perform interface compression
  • Figure 8 schematically illustrates a bus controller providing memory compression in accordance with the present techniques at one bus interface
  • Figure 9 schematically illustrates a plurality of blocks of memory storage locations each including a compressed block and having control codes associated therewith, in this case being end-of-block codes;
  • Figure 10 is a flow diagram schematically illustrating the termination of a read operation in response to detection of an end-of-block code
  • Figure 11 schematically illustrates a memory in the form of a DRAM including a main store and a buffer store coupled to a system-on-chip processing circuit utilising the present techniques
  • Figure 12 schematically illustrates a plurality of blocks of memory storage locations in which the size of a compressed block exceeds the size of a block of memory storage locations giving rise to an overflow portion which is stored in another block of memory storage locations pointed to by a pointer field associated with an exception code stored with the main portion of the compressed block;
  • Figure 13 schematically illustrates a frame buffer compression circuit for use in compressing image data being passed between processing circuitry and a memory
  • Figure 14 illustrates a further embodiment similar to that of Figure 13 but including a processing circuitry buffer store within the frame buffer compression circuit.
  • Figure 1 schematically illustrates a data processing apparatus 2 comprising a system-on-chip integrated circuit 4, coupled via an off-chip memory bus 6 to a separate memory 8.
  • the integrated circuit 4 includes a variety of devices such as a microprocessor 10, a graphics processing unit 12, a cache memory 14 and a memory controller 16.
  • the microprocessor 10, the graphics processor 12, the cache memory 14 and the memory controller 16 are linked via a system bus 18.
  • Memory accesses (either reads or writes) to the memory 8 take place via the memory controller 16.
  • the memory controller 16 generates signals upon the memory bus 6 to either write data to or read data from the memory 8.
  • Figure 2 illustrates a known technique for compressing blocks of data values for storage within a memory.
  • the data values 20 to be stored may be divided into a plurality of blocks of data values 22, 24, 26, 28 which all have the same fixed block size. Each of these blocks of data values 22, 24, 26, 28 may then be separately compressed to form a corresponding compressed block of data values 30, 32, 34, 36.
  • the memory storage locations within the memory 8 will then be allocated to storage of the compressed blocks of data values 30, 32, 34, 36 such that these occupy the minimum space within the memory 8, and are arranged contiguously as illustrated in Figure 2. This reduces the storage capacity consumed within the memory 8 for the storage of the data concerned.
  • the amount of data traffic on the memory bus 6 is also reduced when writing or reading the data, providing the compression and/or decompression is performed within the memory controller 16 or elsewhere on the integrated circuit 4.
  • Figure 3 illustrates one example of the present storage technique.
  • the data 38 to be stored is again divided into blocks of data values 40, 42, 44, 46, which all have the same fixed size. Each of these blocks of data values 40, 42, 44, 46 is then subject to data compression.
  • the compression used can be the same for all the data blocks 40, 42, 44, 46 or may be varied on a block-by-block basis. Some blocks may not be compressed, as compression processing would actually increase their size.
  • One way of selecting which compression technique is to be used would be to associate each block of data with a device within the integrated circuit 4 and then allocate the compression technique in dependence upon the associated device.
  • the graphics processing unit 12 may be associated with graphics-type data, and a compression technique associated with graphics data, such as JPEG or MPEG, may be selected for use for all data blocks to be written from the graphics processing unit 12 or read by the graphics processing unit 12.
  • a different compression technique may be selected for the blocks associated with the microprocessor 10, such as a general purpose ZIP compression technique.
  • a component within the integrated circuit 4 may be provided for specific audio data support, and in this circumstance, the compression technique used for blocks of data values associated with this audio hardware may be selected to be MP3 compression, or another known audio compression technique.
  • Different hardware components within the integrated circuit 4 will typically access data in different ways.
  • Graphics processing unit 12 may access data in a data stream, and in this circumstance a particular compression type may be associated with a particular data stream.
  • the allocated blocks of memory locations 48, 50, 52, 54 all have the same fixed size and are larger than is required to store the corresponding compressed blocks of data values 56, 58, 60, 62. This leaves an unused portion 64, 66, 68, 70 within each allocated block of memory locations 48, 50, 52, 54.
  • the allocated blocks of memory locations are contiguously disposed within the memory address space of the memory 8.
  • a block of data values 40, 42, 44, 46 is modified and re-compressed to form a modified compressed block of data values 56, 58, 60, 62
  • the unused portion of the allocated block of memory locations will provide sufficient storage capacity that the modified compressed block of data values 56, 58, 60, 62 can be accommodated within the originally allocated block of memory locations 48, 50, 52, 54 without requiring memory fragmentation.
  • A comparison of Figure 3 with Figure 2 illustrates that memory storage capacity has not been optimised in Figure 3, since unused portions within the allocated blocks of memory locations 48, 50, 52, 54 are deliberately provided. This is counter to the technical prejudice in the field, which would normally seek to make maximum use of memory capacity.
  • the compressed blocks of data values 56, 58, 60, 62 can be written to the memory 8 and read from the memory 8 with reduced traffic and energy consumption on the memory bus 6.
  • the power saving advantages associated with the compressed data form on the memory bus 6 may be achieved even though the storage capacity of the memory 8 is not optimised.
  • the deliberate provision of the unused portions within the allocated blocks of memory locations does have the advantage that management such as start address calculation is simplified and memory fragmentation may be avoided, as a modified compressed block of data values can normally be accommodated within its original allocated block of memory locations.
  • the allocated block of memory locations may have a fixed size. This fixed size is set to be at least equal to the size of the block of data values in uncompressed form. This should ensure that the compressed block of data values can substantially always be accommodated within the allocated block of memory locations (if not, the data is stored in uncompressed form, e.g. due to a trial compression not reducing the size of the block in question or a related block).
  • the size of the block of memory locations is thus independent of the compression ratio achieved for each compressed block of data values 56, 58, 60, 62.
  • one or more physical access parameters associated with that compressed block of data values may be read (e.g. from a header of the compressed block or a separate table) to indicate the size of the compressed block of data values. In this way the read can be terminated when the compressed block of data values has been read without wasting time and energy reading the unused portion.
  • Other access parameters can indicate, for example, an access sequence with a compressed block, whether or not a block is compressed etc. These access parameters may be determined adaptively in dependence upon other blocks, e.g. the access sequence may be dependent upon that used for a preceding block.
  • FIG. 4 is a flow diagram schematically illustrating the present technique.
  • blocks of memory locations are allocated to form contiguous blocks of memory locations of fixed size M.
  • Step 74 waits until a block of data values of size M has been formed to write to the memory.
  • Step 76 then compresses the block of data values to form a compressed block of data values of a size less than M.
  • Step 78 stores the compressed block of data values to a block of memory locations.
  • the compressed block of data values has a size smaller than the allocated block of memory locations, and accordingly an unused portion is left within the allocated block of memory locations.
  • Step 80 waits until a modification of a compressed block of data values is required.
  • Step 82 compresses the modified block of data values to form a modified compressed block of data values.
  • This modified compressed block of data values will substantially always fit within the original allocated block of memory locations, and accordingly step 84 overwrites the block of memory locations where the original compressed block of data values was stored with the modified compressed block of data values formed at step 82.
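The Figure 4 flow of fixed-size slots with in-place overwrite can be sketched as below. The class name, slot size M and use of zlib are all illustrative assumptions, not the claimed implementation; a real system would fall back to uncompressed storage if a block failed to compress below M, as described elsewhere in this document.

```python
import zlib

M = 256  # fixed block size M (illustrative)

class BlockStore:
    """Sketch of the Figure 4 flow: blocks of data values of size M are
    compressed into fixed-size slots of M bytes, so a modified block can
    normally be rewritten in place (steps 80-84) without relocation."""

    def __init__(self, num_blocks: int):
        self.mem = bytearray(num_blocks * M)   # contiguous blocks (step 72)
        self.lengths = [0] * num_blocks        # compressed length per slot

    def write(self, i: int, data: bytes) -> None:
        assert len(data) == M                  # block formed (step 74)
        comp = zlib.compress(data)             # compress (step 76)
        self.lengths[i] = len(comp)
        # store / overwrite in place (steps 78 and 84); bytes beyond
        # len(comp) within the slot form the unused portion
        self.mem[i * M : i * M + len(comp)] = comp

    def read(self, i: int) -> bytes:
        n = self.lengths[i]
        return zlib.decompress(bytes(self.mem[i * M : i * M + n]))

store = BlockStore(4)
store.write(1, b"a" * M)            # initial store
store.write(1, b"ab" * (M // 2))    # modified block overwrites in place
assert store.read(1) == b"ab" * (M // 2)
```

Note that the start address of slot `i` is simply `i * M`, reflecting the simplified management the fixed-size allocation provides.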
  • different types of data compression may be used for different blocks of data. When this technique is employed, the system can track which block of data has been compressed by which compression technique in a variety of different ways.
  • Compression table circuitry 86 may be provided outside of the memory 8.
  • Compression table circuitry 88 may alternatively be provided inside the memory 8.
  • each of the blocks of memory locations 56, 58, 60, 62 may include within it, data identifying the type of compression used for that block of data. The otherwise unused portion within each block can provide a place to store this compression type identifying data.
  • a table of indicators 90 may be provided separately within the data stored by the memory 8. It will be appreciated that the different places to store the compression identifying data illustrated in Figure 5 are alternatives and an embodiment would normally use one of these storage locations.
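Whichever of the storage locations above is chosen, the tracking itself amounts to a per-block mapping from block index to compression type. A minimal sketch, with an assumed table and with zlib standing in for whichever codecs a real system would register:

```python
import zlib

# Hypothetical per-block compression-type table, playing the role of
# compression table circuitry 86/88 or the table of indicators 90.
DECOMPRESSORS = {
    "none": lambda b: b,          # block stored uncompressed
    "zlib": zlib.decompress,      # illustrative codec
}

codec_table = {}  # block index -> codec identifier

def record_codec(block_index: int, codec: str) -> None:
    codec_table[block_index] = codec

def load_block(block_index: int, stored_bytes: bytes) -> bytes:
    """Dispatch to the right decompressor for this block."""
    codec = codec_table.get(block_index, "none")
    return DECOMPRESSORS[codec](stored_bytes)

payload = zlib.compress(b"abc" * 50)
record_codec(7, "zlib")
assert load_block(7, payload) == b"abc" * 50
record_codec(8, "none")
assert load_block(8, b"raw") == b"raw"
```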
  • Figure 6 is a flow diagram schematically illustrating the process of storing data within a memory.
  • blocks of memory storage locations are allocated within the memory. These blocks of memory storage locations have a size Z.
  • data to be stored is partitioned into blocks of data values. These blocks of data values have a size Z.
  • the blocks of data values have an uncompressed size which is equal to the size of the blocks of memory storage locations that have been allocated.
  • the allocated blocks of memory storage locations can be larger than the blocks of data values in order to accommodate additional information, such as indicators of whether or not compression has been performed and the particular type of compression used, as well as other information relating to the management of those data blocks.
  • Step 98 selects the compression technique to be used to compress that block of data values.
  • This compression technique may be selected in dependence upon the type of data block or the device which will be a consumer or producer of that data block. As an example, if the data block relates to image data, then the compression technique that may be employed is JPEG or MPEG. If the data block is a portion of a general purpose database, then ZIP compression may be more appropriate. If the device which is to consume the data block is an audio interface, then MP3 compression may be used.
  • Step 100 compresses the block of data values to form a compressed block of data values in accordance with the compression technique selected at step 98.
  • Step 102 determines whether or not the compressed size is smaller than Z, i.e. the compressed size is smaller than the uncompressed size. It will be appreciated by those in this technical field that compression processing may in some cases not actually result in a reduction in data size. It may be, for example, that the data is already compressed and applying another compression processing algorithm will result in an increase in data size since all the redundancy has already been removed in the data and/or the second compression technique is ill-suited to the data concerned.
  • step 104 stores the compressed block of data values within the memory. If the determination at step 102 was that the compressed size was greater than Z, then step 106 stores the block of data values in uncompressed form to the memory.
  • Step 108 determines whether or not there are any more blocks of data values that were partitioned at step 94 that need to be stored to the memory. If there are further blocks, then step 110 selects the next block of data values and processing returns to step 98, otherwise processing terminates.
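The Figure 6 flow (steps 94 to 110) can be sketched as follows. The block size Z and the use of zlib as a stand-in codec are assumptions for the sake of the example; step 98's per-block codec selection (JPEG, ZIP, MP3, etc.) is collapsed into a single codec here.

```python
import zlib

Z = 128  # block size Z (illustrative)

def store_all(data: bytes):
    """Sketch of the Figure 6 flow: partition the data into blocks of
    size Z (step 94), compress each block (step 100), and keep the
    compressed form only if it is actually smaller than the block
    (steps 102-106). zlib stands in for the codec chosen at step 98."""
    stored = []
    for off in range(0, len(data), Z):        # steps 108/110: next block
        block = data[off:off + Z]
        comp = zlib.compress(block)           # step 100
        if len(comp) < len(block):            # step 102
            stored.append(("compressed", comp))      # step 104
        else:
            stored.append(("uncompressed", block))   # step 106
    return stored

stored = store_all(bytes(256) + bytes(range(128)))
assert stored[0][0] == "compressed"      # run of zeros compresses well
assert stored[2][0] == "uncompressed"    # incompressible block kept raw
```

This also illustrates the point made at step 102: data with little redundancy can grow under compression, in which case the uncompressed form is stored instead.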
  • Figure 7 schematically illustrates an arrangement of a data processing system 112 (e.g. portion of a SoC) coupled via a memory controller 114 to a memory 116.
  • the memory controller 114 can support the present techniques by compressing blocks of data being written to the memory 116 through this channel and decompressing blocks of data being read from the memory 116 through this channel.
  • the blocks of data written and read by the data processing system as transferred over data bus 118 may all be uncompressed.
  • the memory controller 114 is responsible for performing all compression and decompression of the blocks of data and generating accesses to the memory 116 (which may be off-chip).
  • the data processing system 112 can be unaware of whether or not compression is being used.
  • Figure 8 schematically illustrates a further example embodiment formed of a data processing system 120 coupled via a bus controller 122 and a memory controller 124 to a memory 126.
  • Further processing elements may be coupled via bus interface 128 of the bus controller and via the memory controller 124 to the memory 126.
  • These further processing elements may be performing accesses to fragmented data (e.g. different fields within a database) and so have a pattern of use ill-suited to compression of blocks of data values.
  • accesses via bus interface 128 may be serviced by the memory controller 124 as always requiring the use of uncompressed blocks of data values.
  • the data processing system 120 may have a nature (e.g. media processor, network processor, etc) in which the pattern of access is more weighted to sequential access of blocks of data values.
  • the data processing system 120 may provide a feed forward signal to the bus controller 122 indicating the nature of the data being accessed over a data bus 130.
  • the bus controller 122 can then perform appropriate compression/decompression processing, e.g. with the compression scheme selected to match the type of data (e.g. image, audio, etc) being accessed.
  • the bus controller 122 can then transfer the compressed block of data values over the data bus 132 to the memory controller accompanied by physical access parameters (e.g. compressed length) on parameter bus 134.
  • the memory controller 124 can use the access parameters to control the access to the memory 126, e.g. terminating a read once all the compressed values have been read, as indicated by the length.
  • Figure 9 schematically illustrates a plurality of blocks of memory storage locations 200, 202, 204, 206 each including a compressed block 208, 210, 212, 214.
  • the compressed blocks may differ in size and in this example embodiment are all smaller in size than the size of the block of memory storage locations 200, 202, 204, 206.
  • each of the compressed blocks 208, 210, 212, 214 includes a control code in the form of an end-of-block code 216 positioned at the end of the compressed block, i.e. at the boundary between the compressed block and the unused portion of the block of memory storage locations 200, 202, 204, 206.
  • Other control codes may be associated with the compressed blocks 208, 210, 212, 214 as will be discussed below.
  • Figure 10 is a flow diagram schematically illustrating the termination of a read operation in response to detection of an end-of-block code 216.
  • the read operation reads N bytes of compressed data from within a compressed block 208, 210, 212, 214.
  • Step 220 then examines these N bytes of compressed data to see if the bit pattern matches an end-of-block code 216. If there is a match, then the read operation terminates. If there is not a match, then processing proceeds to step 222 where the N bytes of compressed data are decompressed. Processing then returns to step 218 for the next N bytes of compressed data to be read from the compressed block 208, 210, 212, 214.
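The Figure 10 loop can be sketched as below. The end-of-block bit pattern, the read granularity N and the slot layout are all illustrative assumptions; decompression at step 222 is elided so that only the read-termination logic is shown.

```python
EOB = b"\xff\xfe\xff\xfe"    # assumed end-of-block bit pattern 216
N = 4                        # read granularity in bytes (illustrative)

def read_compressed(block: bytes) -> bytes:
    """Figure 10 loop: read N bytes at a time (step 218) and terminate
    the read as soon as the end-of-block code is seen (step 220),
    leaving the unused portion of the slot unread."""
    out = bytearray()
    for off in range(0, len(block), N):
        chunk = block[off:off + N]
        if chunk == EOB:         # step 220: match -> terminate read
            break
        out += chunk             # step 222 would decompress this chunk
    return bytes(out)

# A 64-byte slot: 8 bytes of payload, the EOB code, then unused space.
slot = b"payload!" + EOB + b"\x00" * 52
assert read_compressed(slot) == b"payload!"
```

Because the read stops at the code rather than at a recorded length, the compressed size need not be known in advance, which is the energy saving noted above.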
  • FIG 11 schematically illustrates a memory 224 in the form of a dynamic random access memory (DRAM) coupled to a system-on-chip integrated circuit 226.
  • the memory 224 contains four memory banks 228, 230, 232 and 234 each containing an array of memory bit cells 236.
  • Each memory bank 228, 230, 232 and 234 has an associated array of sense amplifiers 238, 240, 242 and 244 which serve to sense signal values upon bit lines 246 passing through the memory banks and representing bit values stored within the bit cells 236 of a row of bit cells.
  • the row of bit cells is selected for access by wordline signal WL asserted across the memory banks 228, 230, 232 and 234.
  • This type of arrangement of a memory 224 will be familiar to those in this technical field and will not be described further herein.
  • the output from these sense amplifiers 238, 240, 242 and 244 is stored into a buffer store 248 within the memory 224.
  • This buffer store 248 contains the bit values from a complete row of bit cells as selected by the wordline signal WL. While the bit values continue to be stored in the buffer store 248, subsequent accesses to those same bit values will take place via the buffer store 248 rather than requiring a fresh read of the row of bit cells selected by the wordline signal WL. When a new row of bit cells is to be accessed, then the contents of the buffer store 248 may be written back into the row of bit cells concerned if any of the bit values have been changed.
  • the buffer store 248 thus serves as a form of cache memory.
  • the system-on-chip integrated circuit 226 contains a processor core 250, a graphics processing unit 252, a cache memory 254 and a memory controller 256.
  • the memory controller 256 is responsible for controlling any memory accesses from the system-on-chip integrated circuit 226 to the memory 224.
  • Arbitration circuitry 258 is associated with the memory controller 256 and serves to arbitrate between memory access requests received from different ones of the processor core 250, the graphics processing unit 252 and the cache memory 254.
  • a processing circuitry buffer store 260 is present on the system-on-chip integrated circuit 226 and has a storage size equal in size to that of the buffer store 248 within the memory 224.
  • the storage size of the buffer store 248 and of the processing circuitry buffer store 260 is selected to be the same size as a block of memory storage locations into which data values are compressed as previously discussed. By matching the size of the buffer store 248 and the processing circuitry buffer store 260 to this size of the block of memory storage locations, energy efficiency gains may be made. Thus, when a block of memory storage locations is to be read from the memory 224, this may be arranged to form one complete row of bit cells within the memory banks 228, 230, 232, 234 which is read into the buffer store 248 in one operation. The transfer of the data values from that block of memory storage locations can then take place directly from the buffer store 248 to the memory controller 256 without requiring any further reads from the bit cells 236. This saves energy and increases speed.
  • the processing circuitry buffer store 260 has the same size as the buffer store 248 and the memory controller operates such that once a read of data values from a block of memory storage locations which has been placed into the buffer store 248 has started, this block of data values can be streamed and stored within the processing circuitry buffer store 260 without interruption, pending decompression as may be necessary.
  • the memory access transaction associated with this streaming of data values of a block of memory storage locations can be given a block-read priority level for use by the arbitration circuitry 258.
  • the arbitration circuitry 258 responds to the different priority levels of memory access transactions to decide which transactions can interrupt which other transactions and the scheduling of transactions.
  • the block-read priority level associated with the streaming of a compressed block may be set high such that this will normally proceed uninterrupted in an efficient manner.
  • Compressed blocks will normally need to be transferred as a whole entity, and interrupting such transfers would be relatively inefficient, even if proceeding uninterrupted increases the latency of other transactions which are delayed behind the access to the compressed block.
  • Figure 12 schematically illustrates a plurality of blocks of memory storage locations in which the size of a compressed block may exceed the size of a block of memory storage locations giving rise to an overflow portion.
  • each of the blocks of memory storage locations 262, 264, 266 and 268 has the same size.
  • the compressed blocks 270, 272 and 274 all have a size smaller than the size of the block of memory storage locations 262, 264, 266 and 268.
  • the compressed block 276 has a size too large to fit within the block of memory storage locations 266 and accordingly an overflow portion 278 needs to be stored elsewhere.
  • An exception code 280 associated with the compressed block 276 includes a pointer field indicating where this overflow portion 278 is stored.
  • this overflow portion 278 is always stored in an adjacent block of memory storage locations when this has an unused portion.
  • more flexibility may be provided and the pointer field associated with the exception code may point to any other block of memory storage locations.
  • the block of memory storage locations 264 already has stored therein its own compressed block 272 but it nevertheless has sufficient space left over from storage of this compressed block 272 to leave sufficient room to store the overflow portion 278.
  • an overflow portion may not share a block of memory storage locations with another compressed block but may instead have its own dedicated block of memory storage locations allocated to it, such as under operating system control when the overflow has been detected.
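The Figure 12 overflow scheme can be sketched as below. The slot size, the shape of the "pointer field" and the choice of where the overflow portion lands are all illustrative assumptions standing in for the exception code 280 and its pointer field.

```python
SLOT = 32  # block of memory storage locations (illustrative size)

slots = [bytearray(SLOT) for _ in range(4)]
overflow_ptr = {}  # slot index -> (slot holding overflow, offset, length)
                   # plays the role of exception code 280's pointer field

def store(i: int, payload: bytes, spare_slot: int, spare_off: int) -> None:
    """Store a compressed payload; if it exceeds the slot, place the
    overflow portion in the unused tail of another slot and record a
    pointer to it (cf. overflow portion 278)."""
    if len(payload) <= SLOT:
        slots[i][:len(payload)] = payload
        return
    head, tail = payload[:SLOT], payload[SLOT:]
    slots[i][:] = head
    slots[spare_slot][spare_off:spare_off + len(tail)] = tail
    overflow_ptr[i] = (spare_slot, spare_off, len(tail))

def load(i: int, length: int) -> bytes:
    data = bytes(slots[i][:min(length, SLOT)])
    if i in overflow_ptr:                  # follow the pointer field
        s, off, n = overflow_ptr[i]
        data += bytes(slots[s][off:off + n])
    return data

big = bytes(range(40))                     # 40 bytes > 32-byte slot
store(2, big, spare_slot=1, spare_off=20)  # tail shares slot 1's unused space
assert load(2, 40) == big
```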
  • the embodiments described above make use of control codes such as the end-of-block codes 216 and the exception code 280.
  • these control codes may not be necessary as the decompression of the compressed block will allow the size of the decompressed data to be known and accordingly it may be determined when an end of block has been reached or an overflow has occurred without a requirement to detect a control code within the compressed block.
  • the use of control codes may nevertheless be convenient, as it relaxes the requirement to decompress the compressed block rapidly in order to exercise appropriate control over the reading of data, e.g. to terminate the read early (thereby saving energy) or to cope with an overflow.
  • Figure 13 schematically illustrates a frame buffer compression circuit for use in accessing image data stored within a memory 282.
  • This frame buffer compression circuit includes encoders 284 and a decoder 286 for dealing with the compression and decompression of the compressed blocks.
  • the decoder 286 also serves to detect the control codes EOB and EC.
  • Figure 14 is similar to the circuit of Figure 13 except that it includes a processing circuitry buffer store 288 into which data values from a block of memory storage locations are streamed from a memory 290 prior to being decompressed.
  • two dimensional compression techniques may also be used to assist in the compression of the compressed blocks as these represent image data which will often show correlation between different lines of the image which may be exploited to achieve data compression.
  • This memory controller may have the form of one of the frame buffer compression circuits of Figures 13 or 14. It may also have a more general form as illustrated in Figures 1, 5 and 11.
  • This memory controller may in some embodiments be responsible for the allocation of memory storage locations to form contiguous blocks of memory storage locations, the partitioning of data into a plurality of blocks of data values to be compressed and the compression of those blocks of data values to form the compressed blocks. These tasks of allocation, partitioning and compression may be achieved in a variety of different ways and with a variety of different embodiments. The example embodiments shown in Figures 13 and 14 illustrate some of the ways in which this may be achieved.

Abstract

Data storage within a memory is performed on a block-by-block basis. A block of memory locations is allocated within the memory. In some embodiments the block size does not depend upon the degree of compression of individual blocks and is at least as large as required for the uncompressed data thereby simplifying memory management and access. A block of data values to be stored within the memory can be compressed to form a compressed block of data values. This compressed block of data values normally has a storage capacity requirement smaller than the storage capacity of the allocated block of memory locations. This leaves an unused portion of the allocated block of memory locations. When the block of data values stored in an allocated block of memory locations is modified, then a modified compressed block of data values is formed and this can be accommodated within the originally allocated block of memory locations by utilising some of the unused portion. Whether compression is used or the particular compression scheme used may vary on a block-by-block basis, such as in dependence upon whether compression processing actually reduces size, the device generating or consuming the data concerned, the data stream identity or in some other way. The blocks of memory storage locations may be matched in size to the size of a buffer store within the memory. Control codes, such as an end-of-block code or an exception code, may be stored with the compressed blocks to control their processing.

Description

MEMORY INTERFACE COMPRESSION
This invention relates to the field of data processing systems. More particularly, this invention relates to the field of compressing blocks of data values for storage within a memory of a data processing system.
It is known to provide data processing systems with memories in which data values are stored. These data values may conveniently be divided into blocks of data values. In order to reduce the data storage space required within such systems, it is known to compress blocks of data values before they are stored into the memory. Compressed blocks of data values will occupy less space within the memory.
Compressing the blocks of data values that are stored in the memory also has the advantages that less time is taken and less energy is consumed when writing those blocks of data values to the memory and reading those blocks of data values from the memory. This advantage is due to the fact that reading or writing a compressed block of data will involve transferring a smaller amount of data across the memory interface than if uncompressed data were transferred. This is of particular interest where the interface to a memory introduces a large latency or incurs a significant energy cost for transferring data, e.g. an off-chip memory interface.
In data compression schemes typically used with memory systems, it is normal to locate the compressed blocks of data values that are stored in the memory such that each compressed block lies adjacent in the memory to the previous compressed block. This arrangement of compressed data blocks results in the compressed data blocks using a smaller total amount of memory than if the uncompressed data blocks were used.
In practice, memory compression schemes have to cope with a number of complications. As the level of compression achieved depends on the nature of the original data, each compressed block may occupy a different amount of space in the memory. Further, writes to a block of data values will require that the block of data values be recompressed before being written to the memory again; the recompressed block of data may occupy more space than before, such that the compression scheme will need to provide management features that can cope with varying compressed block size and resultant fragmentation of the compressed data blocks stored in the memory. Finally, it is possible that a compressed block may require more space than the original uncompressed data.
To cope with these complications, it is typical to manage the memory in segments and introduce an address translation scheme that maps from uncompressed data block address to compressed data block address. A segment based management scheme permits relocation of compressed data blocks and the use of multiple segments for data that does not compress well. It is typical to choose the segment size based on average compression ratio for the data patterns typically observed in the application. A segment based management scheme incurs an overhead in terms of additional state to record the relationship between the two address spaces and a level of indirection that adds to latency when processing memory accesses with unpredictable address sequences.
Viewed from one aspect, the present invention provides a method of storing data in a memory, said method comprising the steps of:
allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning said data into a plurality of blocks of data values;
compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed data blocks; and
storing respective ones of said compressed blocks of data values having a storage size less than a storage capacity of one of said blocks of memory storage locations within a corresponding block of memory storage locations; wherein
each of said blocks of memory storage locations has a storage capacity at least as large as required for a corresponding block of data values before compression.
The present technique recognises that the use of compression for blocks of data values is not constrained to situations in which it is desired to minimise the amount of storage space required within the memory. Each block of memory storage locations has sufficient capacity to store a block of data values or a corresponding compressed block of data values. In particular, the memory storage locations may be divided into blocks which are large enough to store corresponding uncompressed blocks of data values; such an arrangement offers no potential for storage space reduction but still offers other appreciable benefits. The reduced volume of data which needs to be written to or read from the memory when accessing such a compressed block of data values has advantageous latency and energy savings, even though there is typically unused space within the allocated block of memory storage locations. Furthermore, even though it is against the technical prejudice in the field to size allocated blocks of storage locations within the memory in a system employing compression within the memory such that the allocated blocks of storage locations have at least the capacity to store corresponding uncompressed blocks of data values, such an arrangement has the significant advantage that the unused portions of the allocated blocks of storage locations enable a simplification in the management of the memory, e.g. fragmentation within the memory to be advantageously reduced.
In a conventional system in which an allocated block of storage locations is sized to be smaller than a corresponding uncompressed block of data values, then should that compressed block of data values be altered, there is a likelihood that it will no longer fit within the original allocated block of storage locations and accordingly an additional allocated block of storage locations will have to be provided. This fragmentation will tend to lead to a less efficient memory system. In contrast, the provision of the unused portions within the allocated blocks of memory storage locations has the result that, in many cases, when a compressed block of data values is altered, it is possible to store the modified compressed block of data values into the same allocated block of storage locations as was previously used, thereby reducing memory fragmentation.
If the compression attempted has the effect of increasing the data size, then the allocation of blocks of memory storage locations at least as large as the uncompressed data size means that the uncompressed data can be stored instead. The uncompressed block may be stored because a compressed version of the same block would be larger or, for example, because compression has been discontinued after other blocks proved larger when compressed.
In some embodiments, the efficiency of the system may be enhanced if a set of physical access parameters are associated with each stored block. The physical access parameters may be used to influence the handling of a particular stored block by the memory system.
In some embodiments, the mixing of compressed and uncompressed blocks is facilitated if a physical access parameter is associated with each stored block to indicate if the block is compressed or uncompressed.
This physical access parameter can be used to control if decompression is used when the block is read.
The one or more physical access parameters associated with a block of memory storage locations may also include an end-of-block code stored with a compressed block within said block of memory storage locations and indicating an end position of the compressed block. This is useful in controlling the processing of the compressed block.
More particularly, in some embodiments, reading of the compressed block may be terminated in dependence upon detection of the end-of-block code. Thus, energy may be saved by not reading data values within the block of memory storage locations beyond the end-of-block code even though the compressed block of data has yet to be decompressed and the size of the recovered data is as yet unknown.
In some embodiments, the compression techniques used can vary from block to block. The different types of compression used may be selected in dependence upon a particular device associated with a given compressed block of data values (e.g. as a generator or consumer of that block of data values). In other embodiments, the compression used may be selected in dependence upon the block type of the block of data values to be compressed (e.g. image data may be compressed using one compression scheme with audio data being compressed using a different compression scheme).
When more than one compression scheme is used then the system should keep track of which compression scheme is used for which compressed block of data values. The physical access parameter(s) referred to above may indicate whether and what compression scheme was used for a block of data values stored in the memory, an access sequence pattern, etc. The compression used may be adaptively determined based on other blocks.
In some embodiments, the physical access parameter may further indicate (directly or indirectly in some coded form) that the size of the associated compressed data block is shorter than that of the corresponding allocated block of storage locations; this parameter may be used to prevent the prefetching of a subsequent memory read burst and trigger the early termination of a current read burst. By providing an indication of compressed data block length, burst accesses to the memory can be made more efficient, leading to a reduction in energy used and an increase in available memory bandwidth, e.g. a burst read may be terminated early and/or prefetching may be suppressed.
The physical access parameters may be implemented in a variety of different ways, including the use of parameter storage circuitry outside of the memory, parameter storage circuitry inside of the memory, a table of physical access parameters stored within storage locations of the memory, and a physical access parameter stored within a block of memory storage locations for a respective compressed block of data values, and can be encoded in different ways (i.e. where data is stored and what it represents). Storing the physical access parameters separately from the block of data to which they correspond has practical advantages in the context of systems where the compression and other access parameters are controlled adaptively, e.g. in dependence upon device type and/or data type.
While the present technique may be useful and advantageous as discussed above for implementations in which the memory is an on-chip memory, the performance and energy efficiency advantages are strong in embodiments in which the storing of the compressed blocks of data values is an off-chip storage operation including generation of memory storage signals upon a memory bus coupled to the memory. Such a memory bus will typically have a relatively high capacitance and accordingly consume a significant amount of energy in being driven, whilst also introducing significant latency to the access. In the case of external synchronous DRAM memory, there may be further latency introduced due to the internal page management of such memory, requiring page open and close events. The control of the system may be simplified when a linear function exists which determines the location of a compressed block of data values corresponding to a block of data values.
In some simple embodiments, the blocks of memory storage locations may be set to have a fixed size, e.g. a size equal to the size of the blocks of data values before they are compressed. This permits simple address calculation to locate a block in memory.
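With fixed-size blocks, the start address of any block is a simple linear function of its index, with no indirection table. A sketch (the base address and block size are illustrative assumptions):

```python
BLOCK_SIZE = 4096          # fixed block size (illustrative)
BASE = 0x8000_0000         # base address of the region (illustrative)

def block_address(index: int) -> int:
    """Linear address calculation: no translation table or
    segment-based indirection is needed to locate a block."""
    return BASE + index * BLOCK_SIZE

assert block_address(0) == 0x8000_0000
assert block_address(3) == 0x8000_3000
```

This contrasts with the segment-based schemes described in the introduction, which must consult additional state to map an uncompressed block address to a compressed block address.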
The compression performed on the blocks of data values may be performed by various processing hardware in accordance with a variety of different compression algorithms. In some embodiments, a general purpose processor running compression software may compress blocks of data values before they are written out to memory. In other embodiments, the compression performed may be undertaken by a memory controller which couples other elements within the system to a memory. In this way, the compression of blocks of data values may be made more transparent to the rest of the system with the compression being performed under hardware or firmware control by the memory controller.
Efficiency gains can be made in some embodiments in which the memory has a main store and a buffer store and a read operation to the memory fills the buffer store with a contiguous block of data values from the main store with subsequent read operations within the contiguous block of data values being made to the buffer store while the buffer store continues to hold that contiguous block of data values. This type of arrangement is, for example, useful in a DRAM where the banks of memory bit cells are accessed and the contents of a complete row of bit cells stored within a buffer store within the DRAM with subsequent accesses to that row of bit cells being made to the data values stored within the buffer store.
This arrangement of the memory can be matched to the behaviour of the current technique by arranging that each of the blocks of memory storage locations has a storage capacity equal to a storage capacity of the buffer store within the memory. Thus, when a block of memory storage locations is to be accessed, it is first read into the buffer store within the memory and then accessed from within that buffer store within the memory for improved energy efficiency and speed.
This arrangement may be further exploited by providing a processing circuitry buffer store within the processing circuitry that issues a data read request. In this way, a contiguous block of data values may be stored into the processing circuitry buffer store as they are streamed back from the memory.
Overall energy efficiency may be improved if a data read request to a contiguous block of data values is allocated a block-read priority level such that the storing of that contiguous block of data values within the processing circuitry buffer store is uninterrupted by memory access requests having a lower priority than the block-read priority level. The block-read priority level may be set relatively high so that the reading of the contiguous block of data values is unlikely to be interrupted. Once the contiguous block of data values has been stored into the buffer store within the memory, it will then all be read back to the processing circuitry and stored within the processing circuitry buffer store without intervening memory accesses evicting the contiguous block of data values from the buffer store of the memory, which would require the data values to be reloaded and waste energy in such a reloading operation.
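The arbitration behaviour described above can be sketched as a simple predicate: a new request may pre-empt an in-flight block read only if its priority exceeds the block-read priority level. The priority scale and the chosen level below are assumptions for illustration; the source does not specify numeric values.

```python
# Sketch of block-read priority arbitration; the 0..7 priority scale and the
# chosen block-read level are assumed, not values from the source.
BLOCK_READ_PRIORITY = 6   # set relatively high so block reads rarely lose

def may_interrupt(block_read_in_flight: bool, request_priority: int) -> bool:
    """Decide whether a new memory access request may interrupt the current
    transfer. Requests at or below the block-read level must wait, so the
    contiguous block streams into the processing circuitry buffer store
    without being evicted from the memory's buffer store part-way through."""
    if block_read_in_flight:
        return request_priority > BLOCK_READ_PRIORITY
    return True  # no block read in progress: normal arbitration applies
```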
Viewed from another aspect the present invention provides apparatus for processing data, said apparatus comprising:
a memory;
allocating circuitry configured to allocate memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning circuitry configured to partition data into a plurality of blocks of data values;
compression circuitry configured to compress one or more of said plurality of blocks of data values to form corresponding one or more compressed data blocks; and
storage circuitry configured to store respective ones of said compressed blocks of data values having a storage size less than a storage capacity of one of said blocks of memory storage locations within a corresponding block of memory storage locations; wherein
each of said blocks of memory storage locations has a storage capacity at least as large as required for a corresponding block of data values before compression.
Viewed from a further aspect the present invention provides apparatus for processing data, said apparatus comprising:
memory means for storing data values;
allocating means for allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partition means for partitioning data into a plurality of blocks of data values;
compression means for compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed data blocks; and
storage means for storing respective ones of said compressed blocks of data values having a storage size less than a storage capacity of one of said blocks of memory storage locations within a corresponding block of memory storage locations; wherein
each of said blocks of memory storage locations has a storage capacity at least as large as required for a corresponding block of data values before compression.
Viewed from a further aspect the invention provides a method of storing data in a memory, said method comprising the steps of:
allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning said data into a plurality of blocks of data values;
compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks; and
storing respective ones of said compressed blocks of data values within a corresponding block of memory storage locations; wherein
said memory has a main store and a buffer store and a read operation to said memory fills said buffer store with a contiguous block of data values from said main store with subsequent read operations within said contiguous block of data values being made to said buffer store while said buffer store continues to hold said contiguous block of data values and each of said blocks of memory storage locations has a storage capacity equal to a storage capacity of said buffer store.

The present technique recognises that in some embodiments the constraints on the relative sizes of the blocks of memory storage locations and the compressed blocks may be removed and advantage may be gained by arranging that the memory has a main store and a buffer store with a read operation to the memory filling the buffer store with a contiguous block of data values from the main store with subsequent read operations within the contiguous block of data values being made to the buffer store and with an arrangement such that the blocks of memory storage locations into which the data is compressed have a storage capacity equal to the storage capacity of the buffer store. In this way, when compressed data is to be read, a block of memory storage locations will be filled into the buffer store of the memory from the main store of the memory and then the memory accesses to that block of memory storage locations (containing the compressed data) will be serviced from the buffer store in a more energy efficient manner.
Viewed from a further aspect the present invention provides an apparatus for processing data, said apparatus comprising:
a memory;
allocating circuitry configured to allocate memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning circuitry configured to partition data into a plurality of blocks of data values;
compression circuitry configured to compress one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks; and
storage circuitry configured to store respective ones of said compressed blocks of data values within a corresponding block of memory storage locations; wherein
said memory has a main store and a buffer store and a read operation to said memory fills said buffer store with a contiguous block of data values from said main store with subsequent read operations within said contiguous block of data values being made to said buffer store while said buffer store continues to hold said contiguous block of data values and each of said blocks of memory storage locations has a storage capacity equal to a storage capacity of said buffer store.
Viewed from a further aspect the present invention provides an apparatus for processing data, said apparatus comprising:
memory means for storing data values;
allocating means for allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partition means for partitioning data into a plurality of blocks of data values;
compression means for compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks; and
storage means for storing respective ones of said compressed blocks of data values within a corresponding block of memory storage locations; wherein
said memory means has main store means for storing data values and buffer store means for storing data values and a read operation to said memory fills said buffer store means with a contiguous block of data values from said main store means with subsequent read operations within said contiguous block of data values being made to said buffer store means while said buffer store means continues to hold said contiguous block of data values and each of said blocks of memory storage locations has a storage capacity equal to a storage capacity of said buffer store means.
Viewed from a further aspect the present invention provides a method of storing data in a memory, said method comprising the steps of:
allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning said data into a plurality of blocks of data values;
compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks;
adding one or more control codes to at least one of said one or more compressed blocks, said one or more control codes providing control for a subsequent read operation of said at least one of said one or more compressed blocks; and
storing respective ones of said compressed blocks of data values within a corresponding block of memory storage locations.
The present technique also recognises that without the constraints on the relative sizes of the blocks of memory storage locations and the compressed blocks it may be advantageous to add one or more control codes for providing control of read operations of compressed blocks from the memory. Such control codes may include an end-of-block code for use in controlling terminating of reading of a compressed block of data.
Other examples of control codes which may be added include an exception code indicating that a storage size of a compressed block is greater than a storage capacity of one of the blocks of memory storage locations, leaving an overflow portion of the compressed block. This overflow portion of the compressed block may be stored in an adjacent block of memory storage locations. If greater freedom is desired, then the exception code may have a pointer field associated with it indicating an unused portion of a block of memory storage locations into which the overflow portion may be stored, with this unused portion of a block of memory storage locations not being constrained to being within an adjacent block of memory storage locations.
The unused portion of the block of memory storage locations into which the overflow portion may be stored can be within a block of memory storage locations that is allocated for the exclusive use of the overflow portion or may be within a block of memory storage locations that is also storing another compressed block and has spare capacity.
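One possible encoding of this overflow mechanism is sketched below: when a compressed block exceeds its block of memory storage locations, the tail is split off and a pointer field records which block holds the overflow portion. The block size, the bytes reserved for the exception code and pointer field, and the dictionary layout are all assumptions for illustration only.

```python
# Sketch of overflow handling with an exception code and pointer field.
# BLOCK_SIZE and the 8-byte reservation for code + pointer are assumed.
BLOCK_SIZE = 4096
CODE_BYTES = 8  # room reserved for the exception code and pointer field

def place_compressed_block(compressed: bytes, overflow_block_index: int) -> dict:
    """Place a compressed block, splitting off an overflow portion when it
    exceeds the storage capacity of one block of memory storage locations."""
    if len(compressed) <= BLOCK_SIZE:
        return {"main": compressed, "exception": False}
    split = BLOCK_SIZE - CODE_BYTES  # keep space for the control code
    return {"main": compressed[:split],
            "exception": True,                # exception code present
            "pointer": overflow_block_index,  # block holding the overflow
            "overflow": compressed[split:]}
```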
Viewed from a further aspect the present invention provides an apparatus for processing data, said apparatus comprising:
a memory;
allocating circuitry configured to allocate memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning circuitry configured to partition data into a plurality of blocks of data values;
compression circuitry configured to compress one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks and to add one or more control codes to at least one of said one or more compressed blocks, said one or more control codes providing control for a subsequent read operation of said at least one of said one or more compressed blocks; and
storage circuitry configured to store respective ones of said compressed blocks of data values within a corresponding block of memory storage locations.
Viewed from a further aspect, the present invention provides an apparatus for processing data, said apparatus comprising:
memory means for storing data values;
allocating means for allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partition means for partitioning data into a plurality of blocks of data values;
compression means for compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks and for adding one or more control codes to at least one of said one or more compressed blocks, said one or more control codes providing control for a subsequent read operation of said at least one of said one or more compressed blocks; and
storage means for storing respective ones of said compressed blocks of data values within a corresponding block of memory storage locations.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
Figure 1 schematically illustrates an apparatus for processing data including a system-on-chip integrated circuit coupled to an off-chip memory;
Figure 2 schematically illustrates a known technique for compressing blocks of data values into corresponding compressed blocks of data values, contiguously stored within a memory;
Figure 3 schematically illustrates compression of blocks of data values in accordance with one example of the present technique;
Figure 4 is a flow diagram schematically illustrating the present storage technique;
Figure 5 is a diagram schematically illustrating various positions for storing data indicating a type of compression used on a block-by-block basis;
Figure 6 is a flow diagram schematically illustrating storing data to a memory on a block-by-block basis and selectively using compression;
Figure 7 schematically illustrates a memory controller that may be used to perform interface compression;
Figure 8 schematically illustrates a bus controller providing memory compression in accordance with the present techniques at one bus interface;
Figure 9 schematically illustrates a plurality of blocks of memory storage locations each including a compressed block and having control codes associated therewith, in this case being end-of-block codes;
Figure 10 is a flow diagram schematically illustrating the termination of a read operation in response to detection of an end-of-block code;
Figure 11 schematically illustrates a memory in the form of a DRAM including a main store and a buffer store coupled to a system-on-chip processing circuit utilising the present techniques;
Figure 12 schematically illustrates a plurality of blocks of memory storage locations in which the size of a compressed block exceeds the size of a block of memory storage locations giving rise to an overflow portion which is stored in another block of memory storage locations pointed to by a pointer field associated with an exception code stored with the main portion of the compressed block;
Figure 13 schematically illustrates a frame buffer compression circuit for use in compressing image data being passed between processing circuitry and a memory; and
Figure 14 illustrates a further embodiment similar to that of Figure 13 but including a processing circuitry buffer store within the frame buffer compression circuit.
Figure 1 schematically illustrates a data processing apparatus 2 comprising a system-on-chip integrated circuit 4, coupled via an off-chip memory bus 6 to a separate memory 8. The integrated circuit 4 includes a variety of devices such as a microprocessor 10, a graphics processing unit 12, a cache memory 14 and a memory controller 16. The microprocessor 10, the graphics processing unit 12, the cache memory 14 and the memory controller 16 are linked via a system bus 18. Memory accesses (either reads or writes) to the memory 8 take place via the memory controller 16. The memory controller 16 generates signals upon the memory bus 6 to either write data to or read data from the memory 8.

Figure 2 illustrates a known technique for compressing blocks of data values for storage within a memory. The data values 20 to be stored may be divided into a plurality of blocks of data values 22, 24, 26, 28 which all have the same fixed block size. Each of these blocks of data values 22, 24, 26, 28 may then be separately compressed to form a corresponding compressed block of data values 30, 32, 34, 36. The memory storage locations within the memory 8 will then be allocated to storage of the compressed blocks of data values 30, 32, 34, 36 such that these occupy the minimum space within the memory 8, and are arranged contiguously as illustrated in Figure 2. This reduces the storage capacity consumed within the memory 8 for the storage of the data concerned. The amount of data traffic on the memory bus 6 is also reduced when writing or reading the data, providing the compression and/or decompression is performed within the memory controller 16 or elsewhere on the integrated circuit 4.
Figure 3 illustrates one example of the present storage technique. The data 38 to be stored is again divided into blocks of data values 40, 42, 44, 46, which all have the same fixed size. Each of these blocks of data values 40, 42, 44, 46 is then subject to data compression. The compression used can be the same for all the data blocks 40, 42, 44, 46 or may be varied on a block-by-block basis. Some blocks may not be compressed at all, as compression processing would actually increase their size. One way of selecting which compression technique is to be used would be to associate each block of data with a device within the integrated circuit 4 and then allocate the compression technique in dependence upon the associated device. For example, the graphics processing unit 12 may be associated with graphics type data and a compression technique associated with graphics data, such as JPEG or MPEG, may be selected for use for all data blocks to be written from the graphics processing unit 12 or read to the graphics processing unit 12. In the case of the microprocessor 10, it might be appropriate to use a different compression technique for the blocks associated with the microprocessor 10, such as a general purpose ZIP compression technique. In other embodiments a component within the integrated circuit 4 may be provided for specific audio data support, and in this circumstance the compression technique used for blocks of data values associated with this audio hardware may be selected to be MP3 compression, or another known audio compression technique. Different hardware components within the integrated circuit 4 will typically access data in different ways. The graphics processing unit 12 may access data in a data stream, and in this circumstance a particular compression type may be associated with a particular data stream.
Returning to Figure 3, it will be seen that the allocated blocks of memory locations 48, 50, 52, 54 all have the same fixed size and are larger than is required to store the corresponding compressed blocks of data values 56, 58, 60, 62. This leaves an unused portion 64, 66, 68, 70 within each allocated block of memory locations 48, 50, 52, 54. The allocated blocks of memory locations are contiguously disposed within the memory address space of the memory 8. When a block of data values 40, 42, 44, 46 is modified and re-compressed to form a modified compressed block of data values 56, 58, 60, 62, then the unused portion of the allocated block of memory locations will provide sufficient storage capacity that the modified compressed block of data values 56, 58, 60, 62 can be accommodated within the originally allocated block of memory locations 48, 50, 52, 54 without requiring memory fragmentation.
A comparison of Figure 3 with Figure 2 illustrates that memory storage capacity has not been optimised in Figure 3 since unused portions within the allocated blocks of memory locations 48, 50, 52, 54 are deliberately provided. This is counter to the technical prejudice in the field which would normally seek to make maximum use of memory capacity. However, the compressed blocks of data values 56, 58, 60, 62 can be written to the memory 8 and read from the memory 8 with reduced traffic and energy consumption on the memory bus 6. Thus, the power saving advantages associated with the compressed data form on the memory bus 6 may be achieved even though the storage capacity of the memory 8 is not optimised. The deliberate provision of the unused portions within the allocated blocks of memory locations does have the advantage that management such as start address calculation is simplified and memory fragmentation may be avoided, as a modified compressed block of data values can normally be accommodated within its original allocated block of memory locations.
The allocated block of memory locations may have a fixed size. This fixed size is set to be at least equal to the size of the block of data values in uncompressed form. This should ensure that the compressed block of data values can substantially always be accommodated within the allocated block of memory locations (if not, the data is stored in uncompressed form, e.g. due to a trial compression not reducing the block size of the block in question or a related block). The size of the block of memory locations is thus independent of the compression ratio achieved for each compressed block of data values 56, 58, 60, 62.
When a compressed block of data values is read from the memory, one or more physical access parameters associated with that compressed block of data values may be read (e.g. from a header of the compressed block or a separate table) to indicate the size of the compressed block of data values. In this way the read can be terminated when the compressed block of data values has been read, without wasting time and energy reading the unused portion. Other access parameters can indicate, for example, an access sequence within a compressed block, whether or not a block is compressed, etc. These access parameters may be determined adaptively in dependence upon other blocks, e.g. the access sequence may be dependent upon that used for a preceding block.
Figure 4 is a flow diagram schematically illustrating the present technique. At step 72, blocks of memory locations are allocated to form contiguous blocks of memory locations of fixed size M. Step 74 waits until a block of data values of size M has been formed to write to the memory. Step 76 then compresses the block of data values to form a compressed block of data values of a size less than M. Step 78 stores the compressed block of data values to a block of memory locations. The compressed block of data values has a size smaller than the allocated block of memory locations, and accordingly an unused portion is left within the allocated block of memory locations.
Step 80 waits until a modification of a compressed block of data values is required. Step 82 compresses the modified block of data values to form a modified compressed block of data values. This modified compressed block of data values will substantially always fit within the original allocated block of memory locations, and accordingly step 84 serves to overwrite the block of memory locations where the original compressed block of data values was stored with the modified compressed block of data values formed at step 82. As previously discussed, different types of data compression may be used for different blocks of data. When this technique is employed, the system can track which block of data has been compressed by which compression technique in a variety of different ways.
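The modify-and-overwrite flow of steps 80 to 84 can be sketched as follows. Here zlib stands in for whichever compression technique is in use, the flat bytearray model of the memory and the block size M are assumptions for illustration, and the fallback to uncompressed storage covers the case where compression does not shrink the block.

```python
import zlib

BLOCK_SIZE = 4096  # M: fixed size of each allocated block (assumed value)

def rewrite_in_place(memory: bytearray, block_index: int, new_values: bytes) -> None:
    """Recompress a modified block of data values and overwrite its originally
    allocated block of memory locations. Because the allocation is at least
    the uncompressed size M, the result always fits, so no reallocation and
    no memory fragmentation occurs."""
    assert len(new_values) <= BLOCK_SIZE
    compressed = zlib.compress(new_values)  # stand-in compression technique
    payload = compressed if len(compressed) < len(new_values) else new_values
    start = block_index * BLOCK_SIZE
    memory[start:start + len(payload)] = payload  # tail of the slot stays unused
```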
Figure 5 illustrates various positions in which data identifying the type of compression used may be stored. Compression table circuitry 86 may be provided outside of the memory 8. Compression table circuitry 88 may alternatively be provided inside the memory 8. A further alternative is that each of the blocks of memory locations 56, 58, 60, 62 may include within it data identifying the type of compression used for that block of data. The otherwise unused portion within each block can provide a place to store this compression type identifying data. As an alternative, a table of indicators 90 may be provided separately within the data stored by the memory 8. It will be appreciated that the different places to store the compression identifying data illustrated in Figure 5 are alternatives and an embodiment would normally use one of these storage locations.
Figure 6 is a flow diagram schematically illustrating the process of storing data within a memory. At step 92 blocks of memory storage locations are allocated within the memory. These blocks of memory storage locations have a size Z. At step 94 data to be stored is partitioned into blocks of data values. These blocks of data values have a size Z. Thus, the blocks of data values have an uncompressed size which is equal to the size of the blocks of memory storage locations that have been allocated. Thus, even if compression of the blocks of data values cannot be achieved, it is possible to store them within the allocated blocks of memory storage locations. The allocated blocks of memory storage locations can be larger than the blocks of data values in order to accommodate additional information, such as indicators of whether or not compression has been performed and the particular type of compression used, as well as other information relating to the management of those data blocks.
At step 96 the first block of data values is selected. Step 98 then selects the compression technique to be used to compress that block of data values. This compression technique may be selected in dependence upon the type of data block or the device which will be a consumer or producer of that data block. As an example, if the data block relates to image data, then the compression technique that may be employed is JPEG or MPEG. If the data block is a portion of a general purpose database, then ZIP compression may be more appropriate. If the device which is to consume the data block is an audio interface, then MP3 compression may be used.
Step 100 compresses the block of data values to form a compressed block of data values in accordance with the compression technique selected at step 98. Step 102 determines whether or not the compressed size is smaller than Z, i.e. the compressed size is smaller than the uncompressed size. It will be appreciated by those in this technical field that compression processing may in some cases not actually result in a reduction in data size. It may be, for example, that the data is already compressed and applying another compression processing algorithm will result in an increase in data size since all the redundancy has already been removed in the data and/or the second compression technique is ill-suited to the data concerned.
If the determination at step 102 was that the compressed size is smaller than Z, then step 104 stores the compressed block of data values within the memory. If the determination at step 102 was that the compressed size was not smaller than Z, then step 106 stores the block of data values in uncompressed form to the memory.
Step 108 determines whether or not there are any more blocks of data values that were partitioned at step 94 that need to be stored to the memory. If there are further blocks, then step 110 selects the next block of data values and processing returns to step 98; otherwise processing terminates.
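The Figure 6 flow can be sketched end to end: partition the data into size-Z blocks, compress each block, and store the compressed form only when it is genuinely smaller than Z. Here zlib stands in for the per-block choice of JPEG, MPEG, ZIP or MP3 described above, and the block size is an assumed example value.

```python
import zlib

BLOCK_SIZE = 4096  # Z: block size == allocated storage block size (assumed)

def store_blocks(data: bytes) -> list:
    """Return (is_compressed, payload) pairs, one per partitioned block,
    keeping a block uncompressed when compression would not shrink it."""
    stored = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        compressed = zlib.compress(block)  # stand-in for the selected technique
        if len(compressed) < len(block):   # step 102: compressed size < Z?
            stored.append((True, compressed))  # step 104: store compressed
        else:
            stored.append((False, block))      # step 106: store uncompressed
    return stored
```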
Figure 7 schematically illustrates an arrangement of a data processing system 112 (e.g. a portion of a system-on-chip) coupled via a memory controller 114 to a memory 116. The memory controller 114 can support the present techniques by compressing blocks of data being written to the memory 116 through this channel and decompressing blocks of data being read from the memory 116 through this channel. The blocks of data written and read by the data processing system as transferred over data bus 118 may all be uncompressed. The memory controller 114 is responsible for performing all compression and decompression of the blocks of data and generating accesses to the memory 116 (which may be off-chip). The data processing system 112 can be unaware of whether or not compression is employed as this can be entirely devolved to the memory controller 114.
Figure 8 schematically illustrates a further example embodiment formed of a data processing system 120 coupled via a bus controller 122 and a memory controller 124 to a memory 126. Further processing elements (not illustrated) may be coupled via bus interface 128 of the bus controller and via the memory controller 124 to the memory 126. These further processing elements may be performing accesses to fragmented data (e.g. different fields within a database) and so have a pattern of use ill-suited to compression of blocks of data values. Thus, accesses via bus interface 128 may be serviced by the memory controller 124 as always requiring the use of uncompressed blocks of data values.
In contrast the data processing system 120 may have a nature (e.g. media processor, network processor, etc) in which the pattern of access is more weighted to sequential access of blocks of data values. The data processing system 120 may provide a feed forward signal to the bus controller 122 indicating the nature of the data being accessed over a data bus 130. The bus controller 122 can then perform appropriate compression/decompression processing, e.g. with the compression scheme selected to match the type of data (e.g. image, audio, etc) being accessed. The bus controller 122 can then transfer the compressed block of data values over the data bus 132 to the memory controller accompanied by physical access parameters (e.g. compressed length) on parameter bus 134. The memory controller 124 can use the access parameters to control the access to the memory 126, e.g. terminating a read once all the compressed values have been read as indicated by the length.
Figure 9 schematically illustrates a plurality of blocks of memory storage locations 200, 202, 204, 206 each including a compressed block 208, 210, 212, 214. The compressed blocks may differ in size and in this example embodiment are all smaller in size than the size of the block of memory storage locations 200, 202, 204, 206. As shown in Figure 9, each of the compressed blocks 208, 210, 212, 214 includes a control code in the form of an end-of-block code 216 positioned at the end of the compressed block, i.e. at the boundary between the compressed block and the unused portion of the block of memory storage locations 200, 202, 204, 206. Other control codes may be associated with the compressed blocks 208, 210, 212, 214 as will be discussed below.
Figure 10 is a flow diagram schematically illustrating the termination of a read operation in response to detection of an end-of-block code 216. At step 218 the read operation reads N bytes of compressed data from within a compressed block 208, 210, 212, 214. Step 220 then examines this N bytes of compressed data to see if the bit pattern matches an end-of-block code 216. If there is a match, then the read operation terminates. If there is not a match, then processing proceeds to step 222 where the N bytes of compressed data is decompressed. Processing then returns to step 218 for the next N bytes of compressed data to be read from the compressed block 208, 210, 212, 214.
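The read-termination loop of Figure 10 can be sketched directly: read N bytes per burst, compare against the end-of-block pattern, and stop at the first match. The end-of-block bit pattern and the burst size N below are assumed example values; the source does not specify them.

```python
EOB = b"\xff\xfe\xff\xfe"  # assumed end-of-block code bit pattern
N = 4                      # assumed read burst size in bytes

def read_compressed_block(storage_block: bytes) -> bytes:
    """Read N bytes at a time, terminating at the end-of-block code so the
    unused portion of the block of memory storage locations is never read."""
    out = bytearray()
    for offset in range(0, len(storage_block), N):
        chunk = storage_block[offset:offset + N]
        if chunk == EOB:   # step 220: bit pattern matches -> terminate read
            break
        out += chunk       # step 222: decompression of the chunk goes here
    return bytes(out)
```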
Figure 11 schematically illustrates a memory 224 in the form of a dynamic random access memory (DRAM) coupled to a system-on-chip integrated circuit 226. The memory 224 contains four memory banks 228, 230, 232 and 234 each containing an array of memory bit cells 236. Each memory bank 228, 230, 232 and 234 has an associated array of sense amplifiers 238, 240, 242 and 244 which serve to sense signal values upon bit lines 246 passing through the memory banks and representing bit values stored within the bit cells 236 of a row of bit cells. The row of bit cells is selected for access by wordline signal WL asserted across the memory banks 228, 230, 232 and 234. This type of arrangement of a memory 224 will be familiar to those in this technical field and will not be described further herein.
The output from these sense amplifiers 238, 240, 242 and 244 is stored into a buffer store 248 within the memory 224. This buffer store 248 contains the bit values from a complete row of bit cells as selected by the wordline signal WL. While the bit values continue to be stored in the buffer store 248, subsequent accesses to those same bit values will take place via the buffer store 248 rather than requiring a fresh read of the row of bit cells selected by the wordline signal WL. When a new row of bit cells is to be accessed, then the contents of the buffer store 248 may be written back into the row of bit cells concerned if any of the bit values have been changed. The buffer store 248 thus serves as a form of cache memory.

The system-on-chip integrated circuit 226 contains a processor core 250, a graphics processing unit 252, a cache memory 254 and a memory controller 256. The memory controller 256 is responsible for controlling any memory accesses from the system-on-chip integrated circuit 226 to the memory 224. Arbitration circuitry 258 is associated with the memory controller 256 and serves to arbitrate between memory access requests received from different ones of the processor core 250, the graphics processing unit 252 and the cache memory 254. A processing circuitry buffer store 260 is present on the system-on-chip integrated circuit 226 and has a storage size equal in size to that of the buffer store 248 within the memory 224.
The storage size of the buffer store 248 and of the processing circuitry buffer store 260 is selected to be the same size as a block of memory storage locations into which data values are compressed as previously discussed. By matching the size of the buffer store 248 and the processing circuitry buffer store 260 to this size of the block of memory storage locations, energy efficiency gains may be made. Thus, when a block of memory storage locations is to be read from the memory 224, this may be arranged to form one complete row of bit cells within the memory banks 228, 230, 232, 234 which is read into the buffer store 248 in one operation. The transfer of the data values from that block of memory storage locations can then take place directly from the buffer store 248 to the memory controller 256 without requiring any further reads from the bit cells 236. This saves energy and increases speed.
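A further consequence of the fixed-size blocks of memory storage locations is that the location of the block holding a given block of data values can be computed directly from its address, independent of the compression ratio achieved. A minimal sketch, with an assumed base address and block size:

```python
BLOCK_SIZE = 1024   # capacity of one block of memory storage locations (assumed)
BASE = 0x8000_0000  # base address of the allocated region (assumed)

def block_location(first_address):
    """Start address of the block of memory storage locations holding the
    (possibly compressed) block of data values containing first_address."""
    index = (first_address - BASE) // BLOCK_SIZE
    return BASE + index * BLOCK_SIZE

print(hex(block_location(0x8000_0433)))  # → 0x80000400
```

Because the capacity of each block is independent of any compression ratio, this calculation needs no lookup table.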
Within the system-on-chip integrated circuit 226 the processing circuitry buffer store 260 has the same size as the buffer store 248, and the memory controller 256 operates such that once a read of data values from a block of memory storage locations placed into the buffer store 248 has started, this block of data values can be streamed and stored within the processing circuitry buffer store 260 without interruption, awaiting decompression as necessary. The memory access transaction associated with this streaming of the data values of a block of memory storage locations can be given a block-read priority level for use by the arbitration circuitry 258. The arbitration circuitry 258 responds to the different priority levels of memory access transactions to decide which transactions can interrupt which other transactions and to schedule the transactions. The block-read priority level associated with the streaming of a compressed block may be set high such that this streaming will normally proceed uninterrupted in an efficient manner. Compressed blocks will normally need to be transferred as a whole entity, and interrupting such transfers would be relatively inefficient, even if allowing them to proceed uninterrupted increases the latency of other transactions delayed behind the access to the compressed block.
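The role of the block-read priority level in the arbitration circuitry 258 may be sketched as follows. The priority encoding, queue structure and transaction names are assumptions made for illustration; real arbitration circuitry would operate cycle by cycle rather than draining a software queue.

```python
import heapq

BLOCK_READ_PRIORITY = 0   # lower number = higher priority (assumed encoding)
NORMAL_PRIORITY = 1

class Arbiter:
    """Model arbiter: higher-priority transactions go first; equal
    priorities are served in request order (the sequence number breaks ties)."""
    def __init__(self):
        self.queue = []
        self.seq = 0

    def request(self, priority, name):
        heapq.heappush(self.queue, (priority, self.seq, name))
        self.seq += 1

    def schedule(self):
        order = []
        while self.queue:
            _, _, name = heapq.heappop(self.queue)
            order.append(name)
        return order

arb = Arbiter()
arb.request(NORMAL_PRIORITY, "cpu-load")
arb.request(BLOCK_READ_PRIORITY, "stream-beat-0")  # beats of one block read...
arb.request(BLOCK_READ_PRIORITY, "stream-beat-1")
arb.request(NORMAL_PRIORITY, "gpu-load")
print(arb.schedule())  # the block-read beats run first, back to back
```

Because both streaming beats carry the block-read priority level, no normal-priority transaction is scheduled between them, modelling the uninterrupted transfer described above.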
Figure 12 schematically illustrates a plurality of blocks of memory storage locations in which the size of a compressed block may exceed the size of a block of memory storage locations, giving rise to an overflow portion. In this example, each of the blocks of memory storage locations 262, 264, 266 and 268 has the same size. The compressed blocks 270, 272 and 274 all have a size smaller than the size of the blocks of memory storage locations 262, 264, 266 and 268. However, the compressed block 276 has a size too large to fit within the block of memory storage locations 266 and accordingly an overflow portion 278 needs to be stored elsewhere. An exception code 280 associated with the compressed block 276 includes a pointer field indicating where this overflow portion 278 is stored. In some embodiments this overflow portion 278 is always stored in an adjacent block of memory storage locations when that block has an unused portion. Alternatively, more flexibility may be provided and the pointer field associated with the exception code may point to any other block of memory storage locations. In this example, the block of memory storage locations 264 already stores its own compressed block 272, but nevertheless has sufficient space left over to store the overflow portion 278.
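The overflow handling of Figure 12 may be modelled as follows. This is a sketch under assumed block sizes; the dictionary-based exception code and the first-fit search for a target block are illustrative choices, not the disclosed mechanism.

```python
BLOCK_CAPACITY = 64  # bytes per block of memory storage locations (assumed)

blocks = [{"data": b"", "exception": None} for _ in range(4)]

def free_space(i):
    return BLOCK_CAPACITY - len(blocks[i]["data"])

def store_compressed(i, payload):
    """Store a compressed payload in block i; if it does not fit, spill the
    overflow portion into another block with enough unused space and record
    an exception code whose pointer field says where the overflow went."""
    blocks[i]["data"] = payload[:BLOCK_CAPACITY]
    overflow = payload[BLOCK_CAPACITY:]
    if overflow:
        target = next(j for j in range(len(blocks))
                      if j != i and free_space(j) >= len(overflow))
        blocks[target]["data"] += overflow
        blocks[i]["exception"] = {"overflow_at": target}  # pointer field

store_compressed(0, b"A" * 64)  # exactly fills its block
store_compressed(1, b"B" * 40)  # fits, leaving unused space
store_compressed(2, b"C" * 70)  # 6-byte overflow shares block 1's unused space
print(blocks[2]["exception"])   # → {'overflow_at': 1}
```

As in Figure 12, the overflow portion ends up in a block that already holds its own compressed block but has room to spare.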
In other embodiments it is possible that an overflow portion may not share a block of memory storage locations with another compressed block but may instead have its own dedicated block of memory storage locations allocated to it, such as under operating system control when the overflow has been detected.
It will be appreciated that in the above there have been described control codes such as the end-of-block codes 216 and the exception code 280. In some embodiments these control codes may not be necessary, as the decompression of the compressed block will allow the size of the decompressed data to be known, and accordingly it may be determined when an end of block has been reached or an overflow has occurred without a requirement to detect a control code within the compressed block. Nevertheless, the use of control codes may be convenient, as this relaxes the requirement to decompress the compressed block rapidly: appropriate control may be exercised over the reading of data so as to terminate the read early (thereby saving energy) or to cope with an overflow.
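The early termination enabled by an end-of-block code may be sketched as follows; the EOB value and the word counts are assumptions chosen for illustration.

```python
EOB = 0xFF       # end-of-block control code (assumed value)
BLOCK_WORDS = 16 # words per block of memory storage locations (assumed)

def read_compressed(block):
    """Return the compressed payload, stopping at the end-of-block code
    rather than reading out the whole block."""
    payload = []
    for word in block:      # each word read from memory costs energy...
        if word == EOB:
            break           # ...so stop as soon as the EOB code is seen
        payload.append(word)
    return payload

block = [7, 3, 9, EOB] + [0] * (BLOCK_WORDS - 4)  # payload fills 3 of 16 words
print(len(read_compressed(block)))                # → 3
```

Only three of the sixteen words are consumed before the read terminates, without the reader needing to decompress the payload to discover its length.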
Figure 13 schematically illustrates a frame buffer compression circuit for use in accessing image data stored within a memory 282. This frame buffer compression circuit includes encoders 284 for compressing blocks of data values and a decoder 286 for decompressing the compressed blocks. The decoder 286 also serves to detect the control codes EOB and EC.
The circuit of Figure 14 is similar to that of Figure 13 except that it includes a processing circuitry buffer store 288 into which data values from a block of memory storage locations are streamed from a memory 290 prior to being decompressed. In this embodiment, two-dimensional compression techniques may also be used to assist in forming the compressed blocks, as these represent image data which will often show correlation between different lines of the image that may be exploited to achieve data compression.
Various embodiments have been described herein. The primary actions of the present techniques are carried out by a memory controller as previously described. This memory controller may have the form of one of the frame buffer compression circuits of Figures 13 or 14. It may also have a more general form as illustrated in Figures 1, 5 and 11. This memory controller may in some embodiments be responsible for the allocation of memory storage locations to form contiguous blocks of memory storage locations, the partitioning of data into a plurality of blocks of data values to be compressed and the compression of those blocks of data values to form the compressed blocks. These tasks of allocation, partitioning and compression may be achieved in a variety of different ways and with a variety of different embodiments. The example embodiments shown in Figures 13 and 14 are illustrative of some examples of how this may be achieved.

Claims

1. A method of storing data in a memory, said method comprising the steps of:
allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning said data into a plurality of blocks of data values;
compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks; and
storing respective ones of said compressed blocks of data values having a storage size less than a storage capacity of one of said blocks of memory storage locations within a corresponding block of memory storage locations; wherein
each of said blocks of memory storage locations has a storage capacity at least as large as required for a corresponding block of data values before compression.
2. A method as claimed in claim 1, wherein
a given block of data values is associated with a first address; and
said step of storing respective compressed blocks of data stores said compressed blocks of data at a memory storage location whose address is calculable from said first address.
3. A method as claimed in any one of claims 1 and 2, wherein if a given compressed block of data values has a size exceeding a capacity of a corresponding block of memory storage locations, then said step of storing instead stores a corresponding given block of data values in an uncompressed form.
4. A method as claimed in any one of claims 1, 2 and 3 comprising storing one or more of said plurality of blocks of data values within corresponding blocks of memory storage locations without compressing said one or more of said plurality of blocks of data values.
5. A method as claimed in any one of the preceding claims, comprising associating one or more physical access parameters with each block of memory storage locations.
6. A method as claimed in claim 5, wherein said one or more physical access parameters associated with a block of memory storage locations indicate one or more of:
(i) whether or not a compressed block of data values or a block of data values in uncompressed form is stored therein; and
(ii) an access sequence pattern of a compressed block of data values stored therein.
7. A method as claimed in any one of claims 5 and 6, wherein said one or more physical access parameters associated with a block of memory storage locations provide an indication of the compression scheme used to generate the compressed block of data values stored therein.
8. A method as claimed in any one of claims 5, 6 and 7, wherein said one or more physical access parameters associated with a block of memory storage locations provide an indication of the size of the compressed block of data values stored therein.
9. A method as claimed in claim 8, wherein said one or more physical access parameters associated with a block of memory storage locations include an end-of-block code stored with a compressed block within said block of memory storage locations and indicating an end position of said compressed block.
10. A method as claimed in claim 9, comprising terminating reading of said compressed block from said memory in dependence upon detection of said end-of-block code.
11. A method as claimed in claim 8, wherein said indication of size of the compressed block of data values is used to control an access to read said compressed block of data values from said memory.
12. A method as claimed in any one of claims 5 to 8, wherein said physical access parameter is stored in one of:
table circuitry outside of said memory;
table circuitry inside of said memory;
a table of physical access parameters stored within storage locations of said memory; and
a field stored within a block of memory storage locations for a respective compressed block of data values.
13. A method as claimed in any one of claims 5 and 6, comprising reading data from a block of memory storage locations and
(i) if said physical access parameter indicates a compressed block of data values, then decompressing said data; and
(ii) if said physical access parameter indicates a block of data values, then returning said data without decompression.
14. A method as claimed in claim 7, wherein said plurality of compressed blocks of data values are associated with a plurality of devices and said step of compressing uses a compression scheme for a given compressed block of data selected in dependence upon which of said plurality of devices is associated with said given compressed block of data values.
15. A method as claimed in claim 7, wherein said plurality of blocks of data values have a plurality of different block types and said step of compressing uses a compression scheme for a given block of data values selected in dependence upon a block type of said given block of data values.
16. A method as claimed in any one of the preceding claims, wherein said step of storing is an off-chip storage operation including generation of memory storage signals upon a memory bus coupled to said memory.
17. A method as claimed in any one of the preceding claims, wherein said step of allocating yields a storage capacity of a block of memory storage locations that is independent of a compression ratio of a compressed block of data values stored within said block of memory storage locations.
18. A method as claimed in any one of the preceding claims, wherein said blocks of memory storage locations have a fixed size.
19. A method as claimed in any one of the preceding claims, comprising calculating a location of a block of memory storage locations using a fixed block size for said blocks of memory storage locations.
20. A method as claimed in any one of the preceding claims, wherein said step of compressing is performed within a memory controller.
21. A method as claimed in any one of the preceding claims, wherein said memory has a main store and a buffer store and a read operation to said memory fills said buffer store with a contiguous block of data values from said main store with subsequent read operations within said contiguous block of data values being made to said buffer store while said buffer store continues to hold said contiguous block of data values.
22. A method as claimed in claim 21, wherein each of said blocks of memory storage locations has a storage capacity equal to a storage capacity of said buffer store.
23. A method as claimed in claim 22, wherein said memory is a dynamic random access memory having a plurality of rows of bit cells and said contiguous block of data values corresponds to data read from one row of said plurality of rows of bit cells.
24. A method as claimed in any one of claims 21 to 23, comprising issuing a data read request from processing circuitry to said memory and storing said contiguous block of data values within a processing circuitry buffer store of said processing circuitry.
25. A method as claimed in claim 24, comprising allocating a block-read priority level to said data read request and controlling subsequent memory access requests to said memory such that said storing of said contiguous block of data values within said processing circuitry buffer store is uninterrupted by memory access requests to said memory having a lower priority than said block-read priority level.
26. Apparatus for processing data, said apparatus comprising:
a memory;
allocating circuitry configured to allocate memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning circuitry configured to partition data into a plurality of blocks of data values;
compression circuitry configured to compress one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks; and
storage circuitry configured to store respective ones of said compressed blocks of data values having a storage size less than a storage capacity of one of said blocks of memory storage locations within a corresponding block of memory storage locations; wherein
each of said blocks of memory storage locations has a storage capacity at least as large as required for a corresponding block of data values.
27. Apparatus for processing data, said apparatus comprising:
memory means for storing data values;
allocating means for allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partition means for partitioning data into a plurality of blocks of data values;
compression means for compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks; and
storage means for storing respective ones of said compressed blocks of data values having a storage size less than a storage capacity of one of said blocks of memory storage locations within a corresponding block of memory storage locations; wherein each of said blocks of memory storage locations has a storage capacity at least as large as required for a corresponding block of data values.
28. A method of storing data in a memory, said method comprising the steps of:
allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning said data into a plurality of blocks of data values;
compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks; and
storing respective ones of said compressed blocks of data values within a corresponding block of memory storage locations; wherein
said memory has a main store and a buffer store and a read operation to said memory fills said buffer store with a contiguous block of data values from said main store with subsequent read operations within said contiguous block of data values being made to said buffer store while said buffer store continues to hold said contiguous block of data values and each of said blocks of memory storage locations has a storage capacity equal to a storage capacity of said buffer store.
29. A method as claimed in claim 28, wherein said memory is a dynamic random access memory having a plurality of rows of bit cells and said contiguous block of data values corresponds to data read from one row of said plurality of rows of bit cells.
30. A method as claimed in any one of claims 28 and 29, comprising issuing a data read request from processing circuitry to said memory and storing said contiguous block of data values within a processing circuitry buffer store of said processing circuitry.
31. A method as claimed in claim 30, wherein a priority level is associated with said data read request and comprising allocating a first priority level to said data read request if said data read request is to a compressed block and allocating a second priority level to said data read request if said data read request is not to a compressed block, said first priority level being higher than said second priority level such that said storing of said compressed block within said processing circuitry buffer store is uninterrupted by memory access requests to said memory having a lower priority than said first priority level.
32. Apparatus for processing data, said apparatus comprising:
a memory;
allocating circuitry configured to allocate memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning circuitry configured to partition data into a plurality of blocks of data values;
compression circuitry configured to compress one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks; and
storage circuitry configured to store respective ones of said compressed blocks of data values within a corresponding block of memory storage locations; wherein
said memory has a main store and a buffer store and a read operation to said memory fills said buffer store with a contiguous block of data values from said main store with subsequent read operations within said contiguous block of data values being made to said buffer store while said buffer store continues to hold said contiguous block of data values and each of said blocks of memory storage locations has a storage capacity equal to a storage capacity of said buffer store.
33. Apparatus for processing data, said apparatus comprising:
memory means for storing data values;
allocating means for allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partition means for partitioning data into a plurality of blocks of data values;
compression means for compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks; and
storage means for storing respective ones of said compressed blocks of data values within a corresponding block of memory storage locations; wherein
said memory means has main store means for storing data values and buffer store means for storing data values and a read operation to said memory fills said buffer store means with a contiguous block of data values from said main store means with subsequent read operations within said contiguous block of data values being made to said buffer store means while said buffer store means continues to hold said contiguous block of data values and each of said blocks of memory storage locations has a storage capacity equal to a storage capacity of said buffer store means.
34. A method of storing data in a memory, said method comprising the steps of:
allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning said data into a plurality of blocks of data values;
compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks;
adding one or more control codes to at least one of said one or more compressed blocks, said one or more control codes providing control for a subsequent read operation of said at least one of said one or more compressed blocks; and
storing respective ones of said compressed blocks of data values within a corresponding block of memory storage locations.
35. A method as claimed in claim 34, wherein said one or more control codes include an end-of-block code indicating an end position of said compressed block.
36. A method as claimed in claim 35, comprising terminating reading of said compressed block from said memory in dependence upon detection of said end-of-block code.
37. A method as claimed in claim 34, wherein said one or more control codes include an exception code indicating that a storage size of said compressed block having said exception code is greater than a storage capacity of one of said blocks of memory storage locations leaving an overflow portion of said compressed block.
38. A method as claimed in claim 37, comprising storing said overflow portion in an adjacent block of memory storage locations.
39. A method as claimed in claim 37, comprising storing said overflow portion in an unused portion of a block of memory storage locations pointed to by a pointer field associated with said exception code.
40. A method as claimed in claim 37, comprising storing said overflow portion in an unused portion of a block of memory storage locations that is also storing another compressed block.
41. A method as claimed in claim 37, comprising storing said overflow portion in a block of memory storage locations allocated for exclusive use to store said overflow portion.
42. Apparatus for processing data, said apparatus comprising:
a memory;
allocating circuitry configured to allocate memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partitioning circuitry configured to partition data into a plurality of blocks of data values;
compression circuitry configured to compress one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks and to add one or more control codes to at least one of said one or more compressed blocks, said one or more control codes providing control for a subsequent read operation of said at least one of said one or more compressed blocks; and
storage circuitry configured to store respective ones of said compressed blocks of data values within a corresponding block of memory storage locations.
43. Apparatus for processing data, said apparatus comprising:
memory means for storing data values;
allocating means for allocating memory storage locations within said memory to form a plurality of contiguous blocks of memory storage locations;
partition means for partitioning data into a plurality of blocks of data values;
compression means for compressing one or more of said plurality of blocks of data values to form corresponding one or more compressed blocks and for adding one or more control codes to at least one of said one or more compressed blocks, said one or more control codes providing control for a subsequent read operation of said at least one of said one or more compressed blocks; and
storage means for storing respective ones of said compressed blocks of data values within a corresponding block of memory storage locations.
44. A method of storing data substantially as hereinbefore described with reference to the accompanying drawings.
45. Apparatus for processing data substantially as hereinbefore described with reference to the accompanying drawings.
PCT/GB2010/051692 2009-10-20 2010-10-08 Memory interface compression WO2011048400A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0918373.2 2009-10-20
GBGB0918373.2A GB0918373D0 (en) 2009-10-20 2009-10-20 Memory interface compression

Publications (1)

Publication Number Publication Date
WO2011048400A1 true WO2011048400A1 (en) 2011-04-28

Family

ID=41462635

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2010/051692 WO2011048400A1 (en) 2009-10-20 2010-10-08 Memory interface compression

Country Status (2)

Country Link
GB (1) GB0918373D0 (en)
WO (1) WO2011048400A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004092960A2 (en) * 2003-04-16 2004-10-28 Koninklijke Philips Electronics N.V. Selectable procession / decompression for data stored in memory
US20060069879A1 (en) * 2004-09-28 2006-03-30 Sony Computer Entertainment Inc. Methods and apparatus for providing a compressed network in a multi-processing system
US7190284B1 (en) * 1994-11-16 2007-03-13 Dye Thomas A Selective lossless, lossy, or no compression of data based on address range, data type, and/or requesting agent


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9141631B2 (en) 2012-04-16 2015-09-22 International Business Machines Corporation Table boundary detection in data blocks for compression
US9514179B2 (en) 2012-04-16 2016-12-06 International Business Machines Corporation Table boundary detection in data blocks for compression
US9514178B2 (en) 2012-04-16 2016-12-06 International Business Machines Corporation Table boundary detection in data blocks for compression
US9043293B2 (en) 2012-04-16 2015-05-26 International Business Machines Corporation Table boundary detection in data blocks for compression
US9864536B2 (en) 2013-10-24 2018-01-09 Qualcomm Incorporated System and method for conserving power consumption in a memory system
CN105659503A (en) * 2013-10-24 2016-06-08 高通股份有限公司 System and method for providing multi-user power saving codebook optimization
US10620861B2 (en) 2015-04-30 2020-04-14 Hewlett Packard Enterprise Development Lp Retrieve data block from determined devices
TWI656443B (en) * 2016-02-23 2019-04-11 谷歌有限責任公司 Method and device for storing memory segment storage for hardware assisted data compression
CN108139972A (en) * 2016-02-23 2018-06-08 谷歌有限责任公司 Memory fragmentation management in the compression of hardware auxiliary data
US10474385B2 (en) 2016-02-23 2019-11-12 Google Llc Managing memory fragmentation in hardware-assisted data compression
WO2017146898A1 (en) * 2016-02-23 2017-08-31 Google Inc. Managing memory fragmentation in hardware-assisted data compression
CN108139972B (en) * 2016-02-23 2023-02-28 谷歌有限责任公司 Method and apparatus for managing memory fragmentation in hardware assisted data compression
WO2017222801A1 (en) * 2016-06-24 2017-12-28 Qualcomm Incorporated Pre-fetch mechanism for compressed memory lines in a processor-based system

Also Published As

Publication number Publication date
GB0918373D0 (en) 2009-12-02


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10766324

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10766324

Country of ref document: EP

Kind code of ref document: A1