US20080301400A1 - Method and Arrangement for Efficiently Accessing Matrix Elements in a Memory - Google Patents
- Publication number
- US20080301400A1 (application US12/095,166)
- Authority
- US
- United States
- Prior art keywords
- memory
- matrix
- elements
- address
- memory block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/06—Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
- G06F12/0607—Interleaved addressing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0207—Addressing or allocation; Relocation with multidimensional access, e.g. row/column, matrix
Description
- The invention relates to a method and an arrangement for accessing matrix elements in a memory, in particular in a general purpose memory.
- According to the invention, accessing does also mean storing, i.e. reading and writing.
- Implementing a matrix in memory is usually done by assigning one memory element of width W to each matrix element. The matrix has M*N elements, where M denotes the number of columns and N the number of rows. Obviously, a memory for storing this matrix needs a size of M*N entries, each of width W. For the implementation, all rows or columns are concatenated to a single chain of matrix elements which is mapped to a range of addresses of the memory. The matrix is accessible, for example, by a relative address in relation to the beginning of the chain in the memory. Depending on whether the rows or the columns of the matrix are chained up, incrementing the address will provide row wise or column wise access, respectively. In order to access concatenated rows column wise, the relative address has to be increased by the number of columns in each step and vice versa. For example, if the rows are chained up, an element in column m, row n can be accessed using the relative address n*M+m, where m=0 . . . M−1, n=0 . . . N−1.
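The conventional chained-rows addressing described above can be sketched as follows. This is a minimal illustration only; the dimensions M=N=16 and the function names are assumptions chosen to match the example used later in the description.

```python
# Conventional addressing with rows chained up (row-major order).
M, N = 16, 16  # assumed example dimensions: M columns, N rows


def relative_address(m: int, n: int) -> int:
    """Relative address n*M+m of the element in column m, row n."""
    assert 0 <= m < M and 0 <= n < N
    return n * M + m


def row_addresses(n: int) -> list[int]:
    """Row wise access: the relative address increases by 1 per step."""
    return [relative_address(m, n) for m in range(M)]


def column_addresses(m: int) -> list[int]:
    """Column wise access: the relative address increases by M per step."""
    return [relative_address(m, n) for n in range(N)]
```

With a single wide memory cell holding several consecutive addresses, only the row wise walk touches neighbouring addresses; the column wise walk jumps by M each time.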
- The control logic for such a row wise or column wise access is relatively straightforward if only one matrix element is accessed at a time. If several adjacent elements are to be read or written at the same time, a bandwidth loss occurs for at least one access type. Assuming, for example, that the rows are concatenated, l adjacent matrix elements within one row could be located in a single memory cell of width l*W. In this case, for row wise access, l elements could be read or written in parallel. For column wise access, the elements are distributed over several memory cells and cannot be accessed at the same time. This assumes a single-ported memory, which is the most area and cost efficient implementation.
- It is thus an object of this invention to specify a method and an arrangement for accessing matrix elements by which it is possible to access several adjacent elements at the same time without bandwidth loss for row as well as column wise access.
- The problem is solved by a method comprising the features given in claim 1 and by an arrangement comprising the features given in claim 9.
- Advantageous embodiments of the invention are given in the respective dependent claims.
- According to the invention, accesses to two matrix elements that are adjacent in a row or in a column of a matrix and that are each specified by a respective relative address are performed for the first of said elements in a first memory block using a first local address and for the second of said elements in a different second memory block using a second local address. In comparison to the prior art, the invention essentially performs a reordering of the matrix elements before they are written to the different memory blocks and after they have been read from these memory blocks, respectively, wherein no two adjacent matrix elements are stored in the same memory block, regardless if they are adjacent in a row or in a column. In other words, elements that are horizontally or vertically adjacent in the matrix are distributed to different memory blocks. The invention can be easily extended to a certain number of adjacent matrix elements greater than two if no adjacent matrix elements of this number are stored in the same memory block, i.e. if there is an equal number of memory blocks available. Such accesses can be granted row wise or column wise. This enables simultaneous access to several adjacent elements of a matrix without bandwidth loss. Besides, the number of bus transactions is minimised by this method. Both results lead to a reduction in power consumption of a system utilising the principle according to the invention. For example, in a system for digital video broadcasting for handheld appliances, the power consumption is reduced by minimising power-on time of burst based wireless transmission systems as well as reducing power consumption during power-on times.
- In an advantageous embodiment, the number of columns and the number of rows of the matrix each are a multiple of the number of memory blocks used. Otherwise, the average bandwidth is reduced, since accesses to the matrix boundaries do not utilize the bandwidth of all memories at the same time. For example, for a matrix of size 10×10 and four memories, accessing one row or column takes three accesses, so that only 10/(4*3) of the memory bandwidth is utilized.
- In a first possible embodiment, for each of said matrix elements said respective memory block and/or said respective local address are determined from a look-up table using said respective relative address as an index. This is a fast way of obtaining the memory blocks and/or the local addresses, but an additional memory is needed for the look-up table.
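The look-up table embodiment could be sketched as below. This is a hedged illustration: the table contents assume the 16×16 matrix with four blocks, the diagonal block assignment and the local address a′=ar DIV P from the example developed later in the description; the names `build_lut` and `LUT` are hypothetical.

```python
# Look-up table embodiment: one entry per relative address.
M, N, P = 16, 16, 4  # assumed example parameters


def build_lut() -> list[tuple[int, int]]:
    """Table indexed by the row wise relative address a_r.

    Each entry is (memory block number p, local address a').
    """
    lut = []
    for a_r in range(M * N):
        n, m = divmod(a_r, M)   # row n and column m encoded in a_r
        block = (m + n) % P     # assumed diagonal block assignment
        local = a_r // P        # a' = a_r DIV P
        lut.append((block, local))
    return lut


LUT = build_lut()
```

The table needs M*N entries, which is the additional memory the text refers to; every (block, local address) pair occurs exactly once, so no two matrix elements collide.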
- In a second possible embodiment, for each of said matrix elements said respective memory block is determined from a first sub-group of bits of the respective relative address and/or said respective local address is determined from a second sub-group of bits of the respective relative address. This is a fast way for obtaining the memory blocks and/or the local addresses, too. A lookup-table is not required and thus less memory is needed.
- In a third possible embodiment, for each of said matrix elements said respective memory block and/or said respective local address are calculationally determined from said respective global linear address. This is an easy way for obtaining the memory blocks and/or the local addresses. Memory for a look-up table is not needed.
- The determination can be advantageously performed by shifting or swapping bits of said respective relative address for obtaining said respective memory block and/or for obtaining said respective local address, the local addresses having a narrower address space than the relative addresses. Such bit shifting or swapping operations can be performed without time-consuming additions, subtractions, divisions and multiplications.
- Preferably, a bit rotation is performed as said shifting or swapping operation. This way, only one operation is necessary to obtain a respective memory block and/or a respective local address.
- The three embodiments and their enhancements mentioned above can, of course, be combined. For example, if the memory blocks are assigned to relative addresses according to a repeated pattern, a memory block can be determined using a small look-up table of the same size as the pattern, after the relative address has been calculationally reduced to the pattern size. As one possibility, the local address is then determined from a sub-group of bits of the relative address after rotating the bits.
- Preferably, a number of memory blocks is used that is a power of two. Several simplifications in determining the memory blocks and the local addresses can be used then. It is necessary to use memory blocks that are accessible simultaneously and independently from each other.
- The arrangement according to the invention comprises a plurality of memory blocks and a memory controller connected to said memory blocks, wherein the memory controller, in case of accesses to two matrix elements that are adjacent in a row or in a column of a matrix and that are each specified by a respective relative address, performs a first sub-access for the first of said elements in a first memory block using a first local address and a second sub-access for the second of said elements in a different second memory block using a second local address. Depending on the parameters chosen, results from one address calculation might be used to determine other addresses. For certain accesses for example, the local addresses might be the same for each memory.
- Preferably, for each of said matrix elements said memory controller determines said respective memory block and/or said respective local address with said respective relative address.
- In an advantageous embodiment, the number of memory blocks, the width of the matrix and the height of the matrix are powers of two. Several simplifications in determining the memory blocks and the local addresses can be used for a fast memory access then.
- Necessarily, said first memory block and said second memory block are accessible simultaneously and independently from each other.
- In the following, the invention is explained in further detail with drawings.
- FIG. 1 shows a block diagram of an arrangement according to the invention,
- FIG. 2 shows a corresponding scheme of matrix elements, related memory blocks and local addresses and
- FIG. 3 shows a second scheme of matrix elements, related memory blocks and local addresses.
- The arrangement A of FIG. 1 comprises four memory blocks Bp with P=4, numbered from p=0 to p=3 and connected to a memory controller C. The arrangement A provides 32-bit read/write capability for a matrix having (M=16)*(N=16)=256 elements of 8 bits size. The arrangement A, especially the memory controller C, is connected to a central processing unit U via a system bus S.
- The matrix is stored in the memory blocks Bp by the memory controller C in such a way that for any group of four adjacent matrix elements, regardless of whether they are adjacent in a row r or in a column c, each member of such a group is stored in a different one of the four memory blocks Bp. This enables accessing four adjacent matrix elements with one single bus request R to the memory controller C.
- If a matrix element (m,n), where m=0 . . . M−1 and n=0 . . . N−1, is to be accessed by the central processing unit U, the central processing unit U calculates a relative address ar for a row wise access or ac for a column wise access according to the instructions it is programmed with. The central processing unit U then sends a request R to the memory controller C via the system bus S, the request R containing the type of access to the matrix, i.e. row wise or column wise in read or write mode, a relative address ar for a row wise access or ac for a column wise access and, in case of a write request, a value for the matrix element to be written. If the memory controller C receives such a request R, it uses the relative address ar or ac specified in the request R to determine the number of the corresponding memory block Bp into which to write or from which to read the requested matrix element and the local address of the corresponding memory cell within the determined memory block Bp, both according to the type of access specified in the request R.
- In an advantageous embodiment, the type of the access, row wise or column wise is determined by a higher address line. The matrix is then visible to the programmer of the central processing unit twice, with row access and column access starting at two different base addresses.
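One way to picture the two base addresses is the following sketch. The text does not say which address line is used; here it is assumed, purely for illustration, that the line directly above the 8-bit relative address (bit 8) selects the access type, so the names `MODE_BIT`, `ROW_BASE`, `COL_BASE` and `decode` are hypothetical.

```python
# Access type selected by a higher address line (assumed: bit 8).
MODE_BIT = 8                 # hypothetical position of the selecting address line
ROW_BASE = 0                 # assumed base address of the row wise view
COL_BASE = 1 << MODE_BIT     # assumed base address of the column wise view


def decode(bus_address: int) -> tuple[str, int]:
    """Split a bus address into (access type, relative address)."""
    mode = "column" if (bus_address >> MODE_BIT) & 1 else "row"
    relative = bus_address & ((1 << MODE_BIT) - 1)
    return mode, relative
```

Under these assumptions the same 256 matrix elements appear once at offsets 0 to 255 for row wise access and once at offsets 256 to 511 for column wise access.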
- In general, the invention can be implemented using the following steps:
- a) Organising a memory, in particular a general purpose memory, into P independently and simultaneously accessible memory blocks of depth N*M/P elements having width W. To simplify the address generation logic, the parameters N, M and P should be chosen to be powers of 2 (for more detail see FIGS. 3 and 4).
- b) Arranging the relationship between matrix and memory elements, for example as follows:
  - The associated memory block Bp for each matrix element is cycled from 0 to P−1, starting from p=0 for row r with n=0 and column c with m=0, starting at p=1 for row r with n=1 and column c with m=1 and so on. Rows n=0 to n=P−1 of column m=0 are assigned to the memory blocks Bp with p=0 to p=P−1, respectively; the same applies to rows n=i*P to n=(i+1)*P−1, until the column is fully assigned.
  - The rows of column m=1 are assigned to the memory blocks Bp with p=1 to p=P−1 and p=0, so the association for the second row n=1 is repeated with the same pattern, but starting at p=1 instead of p=0. These patterns are repeated throughout the matrix. This cycling applies to both the row wise and the column wise view. Of course, there are several other possibilities for assigning the memory blocks Bp to matrix elements, for example simply the other way round or even randomly. The essential condition is that no P adjacent matrix elements are stored in the same memory block Bp.
- c) Implementing shuffle logic in the memory controller C for accessing the matrix elements. This can be done, for example, by means of a look-up table, by rotating the elements during a row wise or column wise access, or by calculating the number p of the respective memory block Bp and the respective local address a′ otherwise.
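The cyclic assignment of step b) and its essential condition can be checked with a short sketch. The closed form p=(m+n) mod P is an assumption derived from the cycling rule described above, and the function names are hypothetical.

```python
# Cyclic block assignment from step b) and a check of the essential condition.
M, N, P = 16, 16, 4  # assumed example parameters


def block_of(m: int, n: int) -> int:
    """Block number for column m, row n: the cycle start shifts by one per row."""
    return (m + n) % P


def all_windows_distinct() -> bool:
    """True if any P horizontally or vertically adjacent elements use P different blocks."""
    for n in range(N):
        for m in range(M):
            if m + P <= M and len({block_of(m + k, n) for k in range(P)}) != P:
                return False
            if n + P <= N and len({block_of(m, n + k) for k in range(P)}) != P:
                return False
    return True


# First three rows of the block pattern, eight columns each.
pattern = [[block_of(m, n) for m in range(8)] for n in range(3)]
```

Other assignments would satisfy the same check; this one is merely the simplest to write down.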
- Because no P adjacent matrix elements are stored in the same memory block Bp and because all of the memory blocks Bp can be simultaneously accessed by the memory controller C, the memory controller C will provide access to the rows and columns of the matrix without any loss in bandwidth. The number of bus transactions on the arrangement A is minimised.
- In the example of FIG. 1, any 4 horizontally or vertically adjacent matrix elements can be simultaneously accessed by one single 32-bit bus request R to the arrangement A. If, for example, four horizontally adjacent matrix elements having relative addresses
- ar1=81, ar2=ar1+1=82, ar3=ar1+2=83, ar4=ar1+3=84
- are row wise requested by the central processing unit U, the memory controller C determines the related first, second, third and fourth memory blocks Bp1, Bp2, Bp3, Bp4 and the related first, second, third and fourth local addresses a′1, a′2, a′3, a′4 from the respective relative addresses ar1, ar2, ar3, ar4, resulting in p=2, 3, 0, 1 and a′=20, 20, 20, 21, respectively.
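The numbers of this worked example can be reproduced with a few lines. The sketch assumes the cyclic assignment p=(m+n) mod P and the local address a′=ar DIV P for row wise access; `translate_row_wise` is a hypothetical helper name.

```python
# Reproducing the worked example for relative addresses 81..84.
M, P = 16, 4  # assumed example parameters


def translate_row_wise(a_r: int) -> tuple[int, int]:
    """Return (block number p, local address a') for a row wise relative address."""
    n, m = divmod(a_r, M)       # 81 -> row n=5, column m=1
    return (m + n) % P, a_r // P


requests = [81, 82, 83, 84]
result = [translate_row_wise(a) for a in requests]
# result == [(2, 20), (3, 20), (0, 20), (1, 21)]
```

All four block numbers differ, so the four elements can be served in the same cycle even though the last one uses a different local address.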
- If the arrangement A is used in a burst based wireless transmission system, this leads to a reduction in power consumption by minimising power-on time as well as reducing power consumption during power-on times.
- FIG. 2 illustrates the schema for the example of M=16, N=16, P=4 as described above. It can be easily adapted to numbers like M=256 and N=1024 as used in digital video broadcasting for handheld appliances. The elements of rows n=0, 4, 8 . . . are associated with memory blocks Bp with p=0, 1, 2, 3, 0, 1, 2, 3 . . . . The elements of rows n=1, 5, 9 . . . are associated with memory blocks Bp with p=1, 2, 3, 0, 1, 2, 3, 0 . . . , the elements of rows n=2, 6, 10 . . . are associated with memory blocks Bp with p=2, 3, 0, 1, 2, 3, 0, 1 . . . . The association of row and column elements changes with each row and column, periodically every P columns and rows.
- Section S1 shows which element of the matrix is stored in which memory block Bp.
- Section S2 denotes the relative addresses ar that are specified by a processor accessing the matrix row wise.
- Section S3 shows the relative addresses ac that are specified by the processor accessing the matrix column wise.
- Section S4 illustrates the local addresses a′ that are used for selecting the matrix element within the corresponding memory block Bp. Obviously, no two matrix elements have both the same memory block Bp and the same address a′ associated at the same time. The first P elements of row 0 are accessed via a local address a′=0, the next P elements via a local address a′=1. The first P elements of row 1 are accessed using a local address a′=P=4. The same rules apply for both row wise and column wise access, of course.
- Section S5 is equal to section S4, but the local addresses a′ are determined from relative addresses ar according to section S2 by dividing the relative addresses ar by P:
- a′=ar DIV P.
- Thus, this division is the operation that has to be performed on the specified relative address ar given to the memory controller C to create the local address a′ in the related memory block Bp. The division can be replaced by a corresponding bit shifting operation as P is a power of 2 in this example: a′=ar SHR 2. So the local address a′ is determined from a group of the upper six bits of ar in row wise access mode.
- Section S6 is equal to sections S4 and S5, of course, but is calculated from the relative addresses ac of section S3 for column wise access. For example, the element having m=7, n=6 is specified by the relative address
- ac=7*16+6=118
- in column wise access mode. The local address a′ is then determined from:
- a′=(ac SHL 2) OR (ac SHR 6),
- of course narrowed to the address space of the memory blocks Bp, i.e.
- a′=((ac SHL 2) OR (ac SHR 6)) AND 63.
- This combination of shifting operations can be expressed as a single rotation operation: a′=ac ROTL 2 and a′=(ac ROTL 2) AND 63, respectively. The rotation has to be carried out using the bit width of the relative address space, i.e. eight bits in this example.
- For both row and column access the address translation can be performed with high speed. It is worth noting that no addition or multiplication is necessary to determine the local address a′, thus avoiding carry-chains and therefore keeping the critical paths short. This is valid as long as M, N and P are powers of two.
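The two local address rules can be compared in a short sketch for the M=N=16, P=4 example. The helper names are hypothetical; the rotation is carried out on eight bits and the result is masked to the 6-bit local address space, as the text requires.

```python
# Local address a' for row wise and column wise access (M=N=16, P=4).
ADDR_BITS = 8    # width of the relative address space (M*N = 256)
LOCAL_MASK = 63  # 6-bit local address space (M*N/P = 64 cells per block)


def rotl(value: int, shift: int, width: int = ADDR_BITS) -> int:
    """Rotate `value` left by `shift` bits within `width` bits."""
    mask = (1 << width) - 1
    value &= mask
    return ((value << shift) | (value >> (width - shift))) & mask


def local_row_wise(a_r: int) -> int:
    """a' = a_r SHR 2."""
    return a_r >> 2


def local_column_wise(a_c: int) -> int:
    """a' = (a_c ROTL 2) AND 63."""
    return rotl(a_c, 2) & LOCAL_MASK


# Element m=7, n=6: a_c = 7*16+6 = 118 and a_r = 6*16+7 = 103.
same_cell = local_column_wise(118) == local_row_wise(103)
```

Both views of the element m=7, n=6 resolve to the local address 25, which is what makes sections S5 and S6 identical.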
- In this example, the first elements of rows n=0, 4, 8 are located in memory block B0, whereas the first elements of rows n=1, 5, 9 are located in memory block B1. Therefore, the P inputs and outputs of the memory blocks Bp have to be rotated according to the relative address ar or ac, respectively, for creating the input and output data of the memory controller C. For example, the number p of the respective memory block Bp can be calculationally determined by: p=((ar,c MOD P)+(ar,c DIV P)), followed by MOD P if applicable. This rule applies both for row wise and for column wise access requests R. As in this example P is a power of 2, this calculation can be performed using fast bit operations:
- p=((ar,c AND 3)+(ar,c SHR 2)) [AND 3 if applicable].
- The rule implies reduction of the relative address to the smallest repeating pattern of memory blocks Bp within section S1. Of course, instead of such a rule a look-up table could be used for determining the number p of the respective memory block Bp. Such a look-up table can be as small as the smallest repeating pattern if the relative address is reduced to it first.
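The block number rule can be sketched as follows. The sketch rests on one reading of the text, stated here as an assumption: the relative address is first reduced to the smallest repeating P×P pattern by keeping only the two low bits of its row field and of its column field (for M=N=16, bits 0-1 and bits 4-5), and the rule is then applied to that 4-bit reduced address. The function names are hypothetical.

```python
# Block number p via bit operations and via a small pattern-sized look-up table.
def reduce_to_pattern(a: int) -> int:
    """Reduce an 8-bit relative address (a_r or a_c) to a 4-bit index into the 4x4 pattern."""
    low = a & 3          # position within the pattern along the access direction
    high = (a >> 4) & 3  # position within the pattern across the access direction
    return (high << 2) | low


def block_number(a: int) -> int:
    """p = ((a AND 3) + (a SHR 2)) AND 3, applied to the reduced address."""
    r = reduce_to_pattern(a)
    return ((r & 3) + (r >> 2)) & 3


# Alternative: a look-up table as small as the repeating pattern (16 entries).
PATTERN_LUT = [((r & 3) + (r >> 2)) & 3 for r in range(16)]


def block_number_lut(a: int) -> int:
    """Same result as block_number, read from the pattern-sized table."""
    return PATTERN_LUT[reduce_to_pattern(a)]
```

Because the reduction treats the row and column fields symmetrically, the same function serves row wise and column wise relative addresses; for the addresses 81 to 84 it yields the blocks 2, 3, 0, 1 stated earlier.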
-
FIGS. 3 and 4 show an arrangement A simplified in comparison to that of FIG. 1 and the schema related thereto, respectively. The arrangement A comprises two memory blocks Bp (P=2), numbered from p=0 to p=1 and connected to a memory controller C. Both memory blocks Bp are accessible independently and simultaneously. The arrangement A provides 32-bit read/write capability for a matrix having (M=4)*(N=4)=16 elements of 8 bits each. The arrangement A, especially the memory controller C, is connected to a central processing unit U via a system bus S in the same way as in FIG. 1. It serves row wise and/or column wise access requests R as proposed by the invention.
- The numbers p=0, p=1 of the memory blocks Bp assigned to the matrix elements alternate in all rows and all columns. Therefore, no two matrix elements adjacent in a row or in a column are stored in the same memory block Bp. Both memory blocks Bp can be accessed simultaneously by the memory controller C, which provides access to the rows and columns of the matrix without any loss in bandwidth. The number of bus transactions on the arrangement A is minimised.
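The alternating assignment can be pictured with a short sketch. It assumes the checkerboard rule p=(m+n) MOD 2, which is one assignment consistent with the description; the text itself only states that the block numbers alternate along rows and columns. The helper names are hypothetical.

```python
M, N, P = 4, 4, 2  # columns, rows, memory blocks


def block_of(m: int, n: int) -> int:
    """Assumed checkerboard rule: p = (m + n) MOD 2."""
    return (m + n) % P


def neighbours_differ(grid) -> bool:
    """True if no two horizontally or vertically adjacent cells are equal."""
    rows, cols = len(grid), len(grid[0])
    for n in range(rows):
        for m in range(cols):
            if m + 1 < cols and grid[n][m] == grid[n][m + 1]:
                return False
            if n + 1 < rows and grid[n][m] == grid[n + 1][m]:
                return False
    return True


# layout[n][m] is the block number of the element in row n, column m.
layout = [[block_of(m, n) for m in range(M)] for n in range(N)]
for row in layout:
    print(" ".join(f"B{p}" for p in row))
# prints:
# B0 B1 B0 B1
# B1 B0 B1 B0
# B0 B1 B0 B1
# B1 B0 B1 B0

print(neighbours_differ(layout))  # prints True
```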
- For a row wise access, the local addresses a′ can be determined from a respective sub-group of bits of the relative addresses ar according to section S2 by:
-
a′=ar SHR 1.
- For column wise access mode, the local address a′ can be determined from a respective sub-group of bits of the relative addresses ac according to section S2 by:
-
a′=(ac SHL 1) OR (ac SHR 3).
- This combination of shifting operations can be expressed as a single rotation operation in a 4-bit address space: a′=ac ROTL 1.
- The number p of the respective memory block Bp can be determined for row wise and for column wise access requests R by:
-
p=((ar,c AND 1)+(ar,c SHR 1)).
- All calculations and bit operations are restricted to the 3-bit address space of the memory blocks Bp.
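The simplified example is small enough to enumerate completely. The sketch below assumes the relative addresses ar=n*4+m and ac=m*4+n, and it applies the block number rule to an address first reduced to a 2*2 repeating pattern; that reduction is an assumption, as the text does not detail it. All function names are hypothetical.

```python
def rotl4(value: int, shift: int) -> int:
    """Rotate a 4-bit value left by `shift` positions."""
    value &= 0xF
    return ((value << shift) | (value >> (4 - shift))) & 0xF


def local_row(a_r: int) -> int:
    """Row wise access: a' = a_r SHR 1, within the 3-bit block address space."""
    return (a_r >> 1) & 7


def local_col(a_c: int) -> int:
    """Column wise access: a' = a_c ROTL 1, narrowed to three bits."""
    return rotl4(a_c, 1) & 7


def block_number(a: int) -> int:
    """p = ((r AND 1) + (r SHR 1)) AND 1 on the reduced address r."""
    r = (((a >> 2) & 1) << 1) | (a & 1)  # assumed reduction to a 2x2 pattern
    return ((r & 1) + (r >> 1)) & 1


placement = {}
for n in range(4):
    for m in range(4):
        a_r = n * 4 + m  # assumed row wise relative address
        a_c = m * 4 + n  # assumed column wise relative address
        slot_row = (block_number(a_r), local_row(a_r))
        slot_col = (block_number(a_c), local_col(a_c))
        assert slot_row == slot_col  # both access modes reach the same slot
        placement[(m, n)] = slot_row

print(len(set(placement.values())))  # prints 16
```

Under these assumptions every one of the 16 elements lands in its own (block, local address) slot, so the two blocks of eight entries each are filled exactly once.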
-
- A Arrangement
- ar Relative address for row wise access
- ac Relative address for column wise access
- a′ Local address
- Bp Memory blocks
- C Memory controller
- M Number of columns
- m Column
- N Number of rows
- n Row
- P Number of memory blocks
- p Number of memory block
- R Request
- S System bus
- U Central processing unit
Claims (12)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05111546 | 2005-12-01 | ||
EP05111546.7 | 2005-12-01 | ||
PCT/IB2006/054500 WO2007063501A2 (en) | 2005-12-01 | 2006-11-29 | Method and arrangement for efficiently accessing matrix elements in a memory |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080301400A1 true US20080301400A1 (en) | 2008-12-04 |
Family
ID=38090785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/095,166 Abandoned US20080301400A1 (en) | 2005-12-01 | 2006-11-29 | Method and Arrangement for Efficiently Accessing Matrix Elements in a Memory |
Country Status (5)
Country | Link |
---|---|
US (1) | US20080301400A1 (en) |
EP (1) | EP1958069A2 (en) |
JP (1) | JP2009517763A (en) |
CN (1) | CN101322107A (en) |
WO (1) | WO2007063501A2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102541749A (en) * | 2011-12-31 | 2012-07-04 | 中国科学院自动化研究所 | Multi-granularity parallel storage system |
US20140223445A1 (en) * | 2013-02-07 | 2014-08-07 | Advanced Micro Devices, Inc. | Selecting a Resource from a Set of Resources for Performing an Operation |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101782878B (en) * | 2009-04-03 | 2011-11-16 | 北京理工大学 | Data storing method based on distributed memory |
CN108053852B (en) * | 2017-11-03 | 2020-05-19 | 华中科技大学 | Writing method of resistive random access memory based on cross point array |
CN111176582A (en) * | 2019-12-31 | 2020-05-19 | 北京百度网讯科技有限公司 | Matrix storage method, matrix access device and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4918600A (en) * | 1988-08-01 | 1990-04-17 | Board Of Regents, University Of Texas System | Dynamic address mapping for conflict-free vector access |
US6297857B1 (en) * | 1994-03-24 | 2001-10-02 | Discovision Associates | Method for accessing banks of DRAM |
US20050071409A1 (en) * | 2003-09-29 | 2005-03-31 | International Business Machines Corporation | Method and structure for producing high performance linear algebra routines using register block data format routines |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6386061A (en) * | 1986-09-30 | 1988-04-16 | Hitachi Ltd | Memory allocating method for multi-processor |
JPH08194641A (en) * | 1995-01-17 | 1996-07-30 | Fujitsu Ltd | Method for storing two-dimensional data into synchronizing dram and synchronizing dram access controller |
US6604166B1 (en) * | 1998-12-30 | 2003-08-05 | Silicon Automation Systems Limited | Memory architecture for parallel data access along any given dimension of an n-dimensional rectangular data array |
JP3985797B2 (en) * | 2004-04-16 | 2007-10-03 | ソニー株式会社 | Processor |
-
2006
- 2006-11-29 WO PCT/IB2006/054500 patent/WO2007063501A2/en active Application Filing
- 2006-11-29 US US12/095,166 patent/US20080301400A1/en not_active Abandoned
- 2006-11-29 JP JP2008542915A patent/JP2009517763A/en not_active Withdrawn
- 2006-11-29 EP EP06831995A patent/EP1958069A2/en not_active Ceased
- 2006-11-29 CN CNA2006800451086A patent/CN101322107A/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102541749A (en) * | 2011-12-31 | 2012-07-04 | 中国科学院自动化研究所 | Multi-granularity parallel storage system |
CN102541749B (en) * | 2011-12-31 | 2014-09-17 | 中国科学院自动化研究所 | Multi-granularity parallel storage system |
US20140223445A1 (en) * | 2013-02-07 | 2014-08-07 | Advanced Micro Devices, Inc. | Selecting a Resource from a Set of Resources for Performing an Operation |
US9183055B2 (en) * | 2013-02-07 | 2015-11-10 | Advanced Micro Devices, Inc. | Selecting a resource from a set of resources for performing an operation |
US20160062803A1 (en) * | 2013-02-07 | 2016-03-03 | Advanced Micro Devices, Inc. | Selecting a resource from a set of resources for performing an operation |
US9766936B2 (en) * | 2013-02-07 | 2017-09-19 | Advanced Micro Devices, Inc. | Selecting a resource from a set of resources for performing an operation |
Also Published As
Publication number | Publication date |
---|---|
WO2007063501A3 (en) | 2007-11-15 |
WO2007063501A2 (en) | 2007-06-07 |
CN101322107A (en) | 2008-12-10 |
JP2009517763A (en) | 2009-04-30 |
EP1958069A2 (en) | 2008-08-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
| | AS | Assignment | Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:038017/0058 Effective date: 20160218 |
| | AS | Assignment | Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12092129 PREVIOUSLY RECORDED ON REEL 038017 FRAME 0058. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:039361/0212 Effective date: 20160218 |
| | AS | Assignment | Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12681366 PREVIOUSLY RECORDED ON REEL 039361 FRAME 0212. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:042762/0145 Effective date: 20160218 Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12681366 PREVIOUSLY RECORDED ON REEL 038017 FRAME 0058. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:042985/0001 Effective date: 20160218 |
| | AS | Assignment | Owner name: NXP B.V., NETHERLANDS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC.;REEL/FRAME:050745/0001 Effective date: 20190903 |
| | AS | Assignment | Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 042762 FRAME 0145. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051145/0184 Effective date: 20160218 Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 039361 FRAME 0212. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051029/0387 Effective date: 20160218 Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 042985 FRAME 0001. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051029/0001 Effective date: 20160218 Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 038017 FRAME 0058. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051030/0001 Effective date: 20160218 Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION12298143 PREVIOUSLY RECORDED ON REEL 042985 FRAME 0001. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051029/0001 Effective date: 20160218 Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION12298143 PREVIOUSLY RECORDED ON REEL 039361 FRAME 0212. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051029/0387 Effective date: 20160218 Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION12298143 PREVIOUSLY RECORDED ON REEL 042762 FRAME 0145. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051145/0184 Effective date: 20160218 |