US20080301400A1 - Method and Arrangement for Efficiently Accessing Matrix Elements in a Memory - Google Patents

Publication number
US20080301400A1
Authority
US
United States
Prior art keywords
memory
matrix
elements
address
memory block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/095,166
Inventor
Dietmar Gassmann
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Morgan Stanley Senior Funding Inc
Original Assignee
NXP BV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NXP BV
Publication of US20080301400A1
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. SECURITY AGREEMENT SUPPLEMENT. Assignors: NXP B.V.
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12092129 PREVIOUSLY RECORDED ON REEL 038017 FRAME 0058. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT. Assignors: NXP B.V.
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12681366 PREVIOUSLY RECORDED ON REEL 039361 FRAME 0212. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT. Assignors: NXP B.V.
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12681366 PREVIOUSLY RECORDED ON REEL 038017 FRAME 0058. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT. Assignors: NXP B.V.
Assigned to NXP B.V. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: MORGAN STANLEY SENIOR FUNDING, INC.
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 042985 FRAME 0001. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT. Assignors: NXP B.V.
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 038017 FRAME 0058. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT. Assignors: NXP B.V.
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 039361 FRAME 0212. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT. Assignors: NXP B.V.
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 042762 FRAME 0145. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT. Assignors: NXP B.V.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0607 Interleaved addressing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0207 Addressing or allocation; Relocation with multidimensional access, e.g. row/column, matrix

Definitions

  • the invention relates to a method and an arrangement for accessing matrix elements in a memory, in particular in a general purpose memory.
  • accessing does also mean storing, i.e. reading and writing.
  • a matrix in memory is usually done by assigning one memory element of width W to each matrix element.
  • the matrix has M*N elements, where M denotes the number of columns and N the number of rows.
  • a memory for storing this matrix needs a size of M*N entries, each of width W.
  • all rows or columns are concatenated to a single chain of matrix elements which is mapped to a range of addresses of the memory.
  • the matrix is accessible, for example, by a relative address in relation to the beginning of the chain in the memory. Depending on whether the rows or the columns of the matrix are chained up, incrementing the address will provide row wise or column wise access, respectively.
  • control logic for such a row wise or column wise access is relatively straight-forward in case only one matrix element shall be accessed at a time. If several adjacent elements shall be read or written at the same time, there occurs a bandwidth loss for at least one access type. Assuming for example that the rows are concatenated, adjacent matrix elements within one row could be located in a single memory cell of width l*W. In this case, for row-wise access, l-elements could be read or written in parallel. For column-wise access, the elements are distributed over several memory cells and can not be accessed at the same time. This assumes a single-ported memory, which is the most area and cost efficient implementation.
  • the problem is solved by a method comprising the features given in claim 1 and by an arrangement comprising the features given in claim 9 .
  • accesses to two matrix elements that are adjacent in a row or in a column of a matrix and that are each specified by a respective relative address are performed for the first of said elements in a first memory block using a first local address and for the second of said elements in a different second memory block using a second local address.
  • the invention essentially performs a reordering of the matrix elements before they are written to the different memory blocks and after they have been read from these memory blocks, respectively, wherein no two adjacent matrix elements are stored in the same memory block, regardless if they are adjacent in a row or in a column. In other words, elements that are horizontally or vertically adjacent in the matrix are distributed to different memory blocks.
  • the invention can be easily extended to a certain number of adjacent matrix elements greater than two if no adjacent matrix elements of this number are stored in the same memory block, i.e. if there is an equal number of memory blocks available. Such accesses can be granted row wise or column wise. This enables simultaneous access to several adjacent elements of a matrix without bandwidth loss. Besides, the number of bus transactions is minimised by this method. Both results lead to a reduction in power consumption of a system utilising the principle according to the invention. For example, in a system for digital video broadcasting for handheld appliances, the power consumption is reduced by minimising power-on time of burst based wireless transmission systems as well as reducing power consumption during power-on times.
  • the number of columns and the number of rows of the matrix each are a multiple of the number of memory blocks used. Otherwise, the average bandwidth is reduced, since accesses to the matrix boundaries do not utilize the bandwidth of all memories at the same time. For example, a matrix with size 10×10 and four memories, when accessing one row or column, there will be three accesses, utilizing the memory bandwidth by 10/(4*3).
  • said respective memory block and/or said respective local address are determined from a look-up table using said respective relative address for an index. This is a fast way for obtaining the memory blocks and/or the local addresses, but an additional memory is needed for the look-up table.
  • said respective memory block is determined from a first sub-group of bits of the respective relative address and/or said respective local address is determined from a second sub-group of bits of the respective relative address.
  • said respective memory block and/or said respective local address are calculationally determined from said respective global linear address. This is an easy way for obtaining the memory blocks and/or the local addresses. Memory for a look-up table is not needed.
  • the determination can be advantageously performed by shifting or swapping bits of said respective relative address for obtaining said respective memory block and/or for obtaining said respective local address, the local addresses having a narrower address space than the relative addresses.
  • bit shifting or swapping operations can be performed without time-consuming additions, subtractions, divisions and multiplications.
  • a bit rotation is performed as said shifting or swapping operation. This way, only one operation is necessary to obtain a respective memory block and/or a respective local address.
  • a memory block can be determined using a small look-up table having the same size as the pattern after the relative address has been calculationally reduced to the pattern size.
  • the local address is then determined from a sub-group of bits of the relative address after rotating the bits.
  • a number of memory blocks is used that is a power of two. Several simplifications in determining the memory blocks and the local addresses can be used then. It is necessary to use memory blocks that are accessible simultaneously and independently from each other.
  • the arrangement according to the invention comprises a plurality of memory blocks and a memory controller connected to said memory blocks, wherein the memory controller, in case of accesses to two matrix elements that are adjacent in a row or in a column of a matrix and that are each specified by a respective relative address, performs a first sub-access for the first of said elements in a first memory block using a first local address and a second sub-access for the second of said elements in a different second memory block using a second local address.
  • results from one address calculation might be used to determine other addresses. For certain accesses for example, the local addresses might be the same for each memory.
  • said memory controller determines said respective memory block and/or said respective local address with said respective relative address.
  • the number of memory blocks, the width of the matrix and the height of the matrix are powers of two.
  • said first memory block and said second memory block are accessible simultaneously and independently from each other.
  • FIG. 1 shows a block diagram of an arrangement according to the invention
  • FIG. 2 shows a corresponding scheme of matrix elements, related memory blocks and local addresses and
  • FIG. 3 shows a second scheme of matrix elements, related memory blocks and local addresses.
  • the arrangement A, especially the memory controller C is connected to a central processing unit U via a system bus S.
  • the matrix is stored in the memory blocks B p by the memory controller C in such a way that for any group of four adjacent matrix elements, regardless if they are adjacent in a row r or in a column c, each member of such a group is stored in a different one of the four memory blocks B p . This enables accessing four adjacent matrix elements with one single bus request R to the memory controller C.
  • the central processing unit U then sends a request R to the memory controller C via the system bus S, the request R containing the type of access to the matrix, i.e. row wise or column wise in read or write mode, a relative address a r for a row wise access or a c for a column wise access and, in case of a write request, a value for the matrix element to be written.
  • the memory controller C uses the relative address a r or a c specified in the request R to determine the number of the corresponding memory block B p into which to write or from which to read the requested matrix element and the local address of the corresponding memory cell within the determined memory block B p , both according to the type of access specified in the request R.
  • the type of the access, row wise or column wise is determined by a higher address line.
  • the matrix is then visible to the programmer of the central processing unit twice, with row access and column access starting at two different base addresses.
  • the memory controller C Because no P adjacent matrix elements are stored in the same memory block B p and because all of the memory blocks B p can be simultaneously accessed by the memory controller C, the memory controller C will provide access to the rows and columns of the matrix without any loss in bandwidth. The number of bus transactions on the arrangement A is minimised.
  • any 4 horizontally or vertically adjacent matrix elements can be simultaneously accessed by one single 32-bit bus request R to the arrangement A. If, for example, four horizontally adjacent matrix elements having relative addresses:
  • the association of row and column elements changes with each row and column, periodically every P columns and rows.
  • Section S 1 shows which element of the matrix is stored in which memory block B p .
  • Section S 2 denotes the relative addresses a r that are specified by a processor accessing the matrix row wise.
  • Section S 3 shows the relative addresses a c that are specified by the processor accessing the matrix column wise.
  • Section S 4 illustrates the local addresses a′ that are used for selecting the matrix element within the corresponding memory block B p . Obviously, no two matrix elements have both the same memory block B p and the same address a′ associated at the same time.
  • Section S 5 is equal to section S 4 , but the local addresses a′ are determined from relative addresses a r according to section S 2 by dividing the relative addresses a r by P:
  • this division is the operation that has to be performed on the specified relative address a r given to the memory controller C to create the local address a′ in the related memory buffer B.
  • Section S 6 is equal to sections S 4 and S 5 , of course, but is calculated from the relative addresses a c of section S 3 for column wise access.
  • the local address a′ is determined then from:
  • a′=((ac SHL 2) OR (ac SHR 6)) AND 63.
  • the rotation has to be carried out using the bit width of the relative address space, i.e. eight bits in this example.
  • p=((ar,c AND 3)+(ar,c SHR 2)) [AND 3 if applicable].
  • the rule implies reduction of the relative address to the smallest repeating pattern of memory blocks B p within section S 1 .
  • a look-up table could be used for determining the number p of the respective memory block B p .
  • Such a look-up table can be as small as the smallest repeating pattern if the relative address is reduced to it first.
  • FIGS. 3 and 4 show an arrangement A simplified in comparison to that of FIG. 1 and the schema related thereto, respectively.
  • the arrangement A, especially the memory controller C is connected to a central processing unit U via a system bus S in the same way as in FIG. 1 . It serves for row wise and/or column wise access requests R as proposed by the invention.
  • the local addresses a′ can be determined from a respective sub-group of bits of the relative addresses a r according to section S 2 by:
  • the local address a′ can be determined from a respective sub-group of bits of the relative addresses a r according to section S 2 by:
  • the number p of the respective memory block B P can be determined for row wise and for column wise access requests R by:

Abstract

The invention relates to a method for accessing matrix elements, wherein accesses to two matrix elements that are adjacent in a row or in a column of a matrix and that are each specified by a respective relative address (ar, ac) are performed for the first of said elements in a first memory block (Bp1) using a first local address (a′1) and for the second of said elements in a different second memory block (Bp2) using a second local address (a′2).

Description

  • The invention relates to a method and an arrangement for accessing matrix elements in a memory, in particular in a general purpose memory.
  • According to the invention, accessing also covers storing, i.e. it means both reading and writing.
  • Implementing a matrix in memory is usually done by assigning one memory element of width W to each matrix element. The matrix has M*N elements, where M denotes the number of columns and N the number of rows. Obviously, a memory for storing this matrix needs a size of M*N entries, each of width W. For the implementation, all rows or columns are concatenated to a single chain of matrix elements which is mapped to a range of addresses of the memory. The matrix is accessible, for example, by a relative address in relation to the beginning of the chain in the memory. Depending on whether the rows or the columns of the matrix are chained up, incrementing the address will provide row wise or column wise access, respectively. In order to access concatenated rows column wise, the relative address has to be increased by the number of columns in each step and vice versa. For example, if the rows are chained up, an element in column m, row n can be accessed using the relative address n*M+m, where m=0 . . . M−1, n=0 . . . N−1.
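  • The chained addressing described above can be illustrated with a minimal sketch. The function name and the 16-column example size are illustrative assumptions; only the formula n*M+m is taken from the text.

```python
def relative_address_row_chained(m: int, n: int, M: int) -> int:
    """Relative address of the element in column m, row n when rows are chained."""
    return n * M + m


M = 16  # number of columns (example value, assumed for this sketch)

# Row wise walk along row n=5: the address grows by 1 per element.
row_walk = [relative_address_row_chained(m, 5, M) for m in range(4)]

# Column wise walk down column m=1: the address grows by M per element.
col_walk = [relative_address_row_chained(1, n, M) for n in range(4)]

print(row_walk)  # [80, 81, 82, 83]
print(col_walk)  # [1, 17, 33, 49]
```

  • The stride of M in the column wise walk is what spreads the elements of one column over different memory cells in a conventional single-ported memory.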
  • The control logic for such a row wise or column wise access is relatively straightforward if only one matrix element is to be accessed at a time. If several adjacent elements are to be read or written at the same time, a bandwidth loss occurs for at least one access type. Assuming, for example, that the rows are concatenated, l adjacent matrix elements within one row could be located in a single memory cell of width l*W. In this case, for row-wise access, l elements could be read or written in parallel. For column-wise access, the elements are distributed over several memory cells and cannot be accessed at the same time. This assumes a single-ported memory, which is the most area and cost efficient implementation.
  • It is thus an object of this invention to specify a method and an arrangement for accessing matrix elements by which it is possible to access several adjacent elements at the same time without bandwidth loss for row as well as column wise access.
  • The problem is solved by a method comprising the features given in claim 1 and by an arrangement comprising the features given in claim 9.
  • Advantageous embodiments of the invention are given in the respective dependent claims.
  • According to the invention, accesses to two matrix elements that are adjacent in a row or in a column of a matrix and that are each specified by a respective relative address are performed for the first of said elements in a first memory block using a first local address and for the second of said elements in a different second memory block using a second local address. In comparison to the prior art, the invention essentially performs a reordering of the matrix elements before they are written to the different memory blocks and after they have been read from these memory blocks, respectively, wherein no two adjacent matrix elements are stored in the same memory block, regardless if they are adjacent in a row or in a column. In other words, elements that are horizontally or vertically adjacent in the matrix are distributed to different memory blocks. The invention can be easily extended to a certain number of adjacent matrix elements greater than two if no adjacent matrix elements of this number are stored in the same memory block, i.e. if there is an equal number of memory blocks available. Such accesses can be granted row wise or column wise. This enables simultaneous access to several adjacent elements of a matrix without bandwidth loss. Besides, the number of bus transactions is minimised by this method. Both results lead to a reduction in power consumption of a system utilising the principle according to the invention. For example, in a system for digital video broadcasting for handheld appliances, the power consumption is reduced by minimising power-on time of burst based wireless transmission systems as well as reducing power consumption during power-on times.
  • In an advantageous embodiment, the number of columns and the number of rows of the matrix each are a multiple of the number of memory blocks used. Otherwise, the average bandwidth is reduced, since accesses at the matrix boundaries do not utilize the bandwidth of all memories at the same time. For example, with a matrix of size 10×10 and four memories, accessing one row or column takes three accesses, so the memory bandwidth is utilized only to 10/(4*3), i.e. about 83%.
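  • The utilization figure can be reproduced with a short calculation. This is a sketch under the assumption that each access transfers at most P elements, one per memory block; the function name is illustrative.

```python
import math


def boundary_utilization(length: int, P: int) -> float:
    """Fraction of the available block bandwidth used when reading one
    row or column of `length` elements from P parallel memory blocks."""
    accesses = math.ceil(length / P)  # the last access may be only partly filled
    return length / (P * accesses)


print(boundary_utilization(10, 4))  # 10 elements in 3 accesses of 4 -> 10/12
print(boundary_utilization(16, 4))  # a multiple of P uses every access fully -> 1.0
```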
  • In a first possible embodiment, for each of said matrix elements said respective memory block and/or said respective local address are determined from a look-up table using said respective relative address for an index. This is a fast way for obtaining the memory blocks and/or the local addresses, but an additional memory is needed for the look-up table.
  • In a second possible embodiment, for each of said matrix elements said respective memory block is determined from a first sub-group of bits of the respective relative address and/or said respective local address is determined from a second sub-group of bits of the respective relative address. This is a fast way for obtaining the memory blocks and/or the local addresses, too. A lookup-table is not required and thus less memory is needed.
  • In a third possible embodiment, for each of said matrix elements said respective memory block and/or said respective local address are calculationally determined from said respective global linear address. This is an easy way for obtaining the memory blocks and/or the local addresses. Memory for a look-up table is not needed.
  • The determination can be advantageously performed by shifting or swapping bits of said respective relative address for obtaining said respective memory block and/or for obtaining said respective local address, the local addresses having a narrower address space than the relative addresses. Such bit shifting or swapping operations can be performed without time-consuming additions, subtractions, divisions and multiplications.
  • Preferably, a bit rotation is performed as said shifting or swapping operation. This way, only one operation is necessary to obtain a respective memory block and/or a respective local address.
  • The three embodiments and their enhancements mentioned above can be combined, of course. For example, if the memory blocks are assigned to relative addresses according to a repeated pattern a memory block can be determined using a small look-up table having the same size as the pattern after the relative address has been calculationally reduced to the pattern size. As one possibility, the local address is then determined from a sub-group of bits of the relative address after rotating the bits.
  • Preferably, a number of memory blocks is used that is a power of two. Several simplifications in determining the memory blocks and the local addresses can be used then. It is necessary to use memory blocks that are accessible simultaneously and independently from each other.
  • The arrangement according to the invention comprises a plurality of memory blocks and a memory controller connected to said memory blocks, wherein the memory controller, in case of accesses to two matrix elements that are adjacent in a row or in a column of a matrix and that are each specified by a respective relative address, performs a first sub-access for the first of said elements in a first memory block using a first local address and a second sub-access for the second of said elements in a different second memory block using a second local address. Depending on the parameters chosen, results from one address calculation might be used to determine other addresses. For certain accesses for example, the local addresses might be the same for each memory.
  • Preferably, for each of said matrix elements said memory controller determines said respective memory block and/or said respective local address with said respective relative address.
  • In an advantageous embodiment, the number of memory blocks, the width of the matrix and the height of the matrix are powers of two. Several simplifications in determining the memory blocks and the local addresses can be used for a fast memory access then.
  • Necessarily, said first memory block and said second memory block are accessible simultaneously and independently from each other.
  • In the following, the invention is explained in further detail with drawings.
  • FIG. 1 shows a block diagram of an arrangement according to the invention,
  • FIG. 2 shows a corresponding scheme of matrix elements, related memory blocks and local addresses and
  • FIG. 3 shows a second scheme of matrix elements, related memory blocks and local addresses.
  • The arrangement A of FIG. 1 comprises four memory blocks Bp with P=4, numbered from p=0 to p=3 and connected to a memory controller C. The arrangement A provides 32-bit read/write capability for a matrix having (M=16)*(N=16)=256 elements of 8 bits size. The arrangement A, especially the memory controller C is connected to a central processing unit U via a system bus S.
  • The matrix is stored in the memory blocks Bp by the memory controller C in such a way that for any group of four adjacent matrix elements, regardless if they are adjacent in a row r or in a column c, each member of such a group is stored in a different one of the four memory blocks Bp. This enables accessing four adjacent matrix elements with one single bus request R to the memory controller C.
  • If a matrix element (m,n), where m=0 . . . M−1 and n=0 . . . N−1, is to be accessed by the central processing unit U, the central processing unit U calculates a relative address ar for a row wise access or ac for a column wise access according to the instructions it is programmed with. The central processing unit U then sends a request R to the memory controller C via the system bus S, the request R containing the type of access to the matrix, i.e. row wise or column wise in read or write mode, a relative address ar for a row wise access or ac for a column wise access and, in case of a write request, a value for the matrix element to be written. If the memory controller C receives such a request R, it uses the relative address ar or ac specified in the request R to determine the number of the corresponding memory block Bp into which to write or from which to read the requested matrix element and the local address of the corresponding memory cell within the determined memory block Bp, both according to the type of access specified in the request R.
  • In an advantageous embodiment, the type of the access, row wise or column wise is determined by a higher address line. The matrix is then visible to the programmer of the central processing unit twice, with row access and column access starting at two different base addresses.
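  • One way to picture the higher address line is the following sketch. The text does not say which line is used or which level selects which mode; here it is assumed that the line directly above the 8-bit relative address selects column wise access when set. All names are hypothetical.

```python
ADDRESS_BITS = 8              # width of the relative address for a 16*16 matrix
MODE_BIT = 1 << ADDRESS_BITS  # assumed position of the higher address line


def split_bus_address(bus_address: int):
    """Split a bus address into (access type, relative address).

    Assumption for this sketch: mode line set means column wise access.
    """
    mode = "column" if bus_address & MODE_BIT else "row"
    relative = bus_address & (MODE_BIT - 1)
    return mode, relative


# The same matrix appears at two base addresses, 0x000 and 0x100.
print(split_bus_address(0x051))  # ('row', 81)
print(split_bus_address(0x176))  # ('column', 118)
```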
  • In general, the invention can be implemented using the following steps:
      • a) Organising a memory, in particular a general purpose memory, into P independently and simultaneously accessible memory blocks of depth N*M/P elements having width W. To simplify the address generation logic, the parameters N, M and P should be chosen to be powers of 2 (see for more detail FIGS. 3 and 4).
      • b) Arranging the relationship between matrix and memory elements, for example as follows:
        • The associated memory block Bp for each matrix element is cycled from 0 to P−1, starting from p=0 for row r with n=0 and column c with m=0, starting at p=1 for row r with n=1 and column c with m=1 and so on. Row n=0 to n=P−1 of column m=0 are assigned to the memory blocks Bp with p=0 to p=P−1, respectively, the same is applied to row n=i*P to n=(i+1)*P−1, until the column is fully assigned.
        • The rows of column m=1 are assigned to the memory blocks Bp with p=1 to p=P−1 and p=0, so the association for the second row n=1 is repeated with the same pattern, but starting at p=1 instead of p=0. These patterns are repeated throughout the matrix. This cycling applies to both row wise and column wise view. Of course, there are several other possibilities for assigning the memory buffers Bp to matrix elements, for example simply the other way round or even randomly. The essential condition is that no P adjacent matrix elements are stored in the same memory block Bp.
      • c) Implementing shuffle logic in the memory controller C for accessing the matrix elements. This can be done, for example, by means of a look-up table, by rotating the elements during a row wise or column wise access, or by calculating the number p of the respective memory block Bp and the respective local address a′ otherwise.
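  • The cyclic assignment of step b) can be written compactly as p=(m+n) mod P. That closed form is a reading of the pattern described above, not a formula quoted from the text, and the names below are illustrative. The sketch checks the essential condition for the FIG. 1 parameters.

```python
M, N, P = 16, 16, 4  # parameters of the FIG. 1 example


def block(m: int, n: int) -> int:
    """Memory block for the element in column m, row n (assumed closed form)."""
    return (m + n) % P


def runs_hit_distinct_blocks() -> bool:
    """True if every run of P horizontally or vertically adjacent
    elements is spread over P different memory blocks."""
    for n in range(N):
        for m in range(M):
            if m + P <= M and len({block(m + k, n) for k in range(P)}) != P:
                return False
            if n + P <= N and len({block(m, n + k) for k in range(P)}) != P:
                return False
    return True


print([block(m, 0) for m in range(8)])  # row n=0: [0, 1, 2, 3, 0, 1, 2, 3]
print([block(m, 1) for m in range(8)])  # row n=1: [1, 2, 3, 0, 1, 2, 3, 0]
print(runs_hit_distinct_blocks())       # True
```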
  • Because no P adjacent matrix elements are stored in the same memory block Bp and because all of the memory blocks Bp can be simultaneously accessed by the memory controller C, the memory controller C will provide access to the rows and columns of the matrix without any loss in bandwidth. The number of bus transactions on the arrangement A is minimised.
  • In the example of FIG. 1, any 4 horizontally or vertically adjacent matrix elements can be simultaneously accessed by one single 32-bit bus request R to the arrangement A. If, for example, four horizontally adjacent matrix elements having relative addresses:

  • ar1=81, ar2=ar1+1=82, ar3=ar1+2=83, ar4=ar1+3=84
  • are row wise requested by the central processing unit U, the memory controller C determines the related first, second, third and fourth memory blocks Bp1, Bp2, Bp3, Bp4 and the related first, second, third and fourth local addresses a′1, a′2, a′3, a′4 from the respective relative addresses ar1, ar2, ar3, ar4, resulting in p=2, 3, 0, 1 and a′=20, 20, 20, 21, respectively.
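  • The numbers of this example can be reproduced with a minimal sketch. It assumes the block rule p=(m+n) mod P as a closed form of the cyclic assignment and the local address rule a′=ar DIV P; the function name is hypothetical.

```python
M, P = 16, 4  # columns and number of memory blocks in the FIG. 1 example


def decode_row_wise(a_r: int):
    """Return (memory block p, local address a') for a row wise relative address."""
    n, m = divmod(a_r, M)   # row and column of the requested element
    p = (m + n) % P         # assumed closed form of the cyclic block assignment
    a_local = a_r // P      # local address inside the selected block
    return p, a_local


requests = [81, 82, 83, 84]
decoded = [decode_row_wise(a) for a in requests]
blocks = [p for p, _ in decoded]
local_addresses = [a for _, a in decoded]

print(blocks)           # [2, 3, 0, 1]
print(local_addresses)  # [20, 20, 20, 21]
```

  • All four blocks are distinct, so the four sub-accesses can run in parallel even though the request is not aligned to a multiple of P.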
  • If the arrangement A is used in a burst based wireless transmission system, this leads to a reduction in power consumption by minimising power-on time as well as reducing power consumption during power-on times.
  • FIG. 2 illustrates the schema for the example of M=16, N=16, P=4 as described above. It can be easily adapted to numbers like M=256 and N=1024 as used in digital video broadcasting for handheld appliances. The elements of rows n=0, 4, 8 . . . are associated with memory blocks Bp with p=0, 1, 2, 3, 0, 1, 2, 3 . . . . The elements of rows n=1, 5, 9 . . . are associated with memory blocks Bp with p=1, 2, 3, 0, 1, 2, 3, 0 . . . , the elements of rows n=2, 6, 10 . . . are associated with memory blocks Bp with p=2, 3, 0, 1, 2, 3, 0, 1 . . . . The association of row and column elements changes with each row and column, periodically every P columns and rows.
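  • Because the association repeats every P columns and rows, a table covering one P*P tile is enough to look up the memory block. The sketch below assumes row wise relative addresses and fills the table from the assumed closed form (m+n) mod P; the table layout and names are illustrative, not taken from the figures.

```python
M, P = 16, 4

# One P*P tile of the repeating pattern, indexed by n_low * P + m_low.
PATTERN = [(m + n) % P for n in range(P) for m in range(P)]


def block_from_table(a_r: int) -> int:
    """Reduce a row wise relative address to the tile and look up the block."""
    m_low = a_r & (P - 1)         # column within the tile (low two bits of m)
    n_low = (a_r // M) & (P - 1)  # row within the tile (low two bits of n)
    return PATTERN[n_low * P + m_low]


print(len(PATTERN))                                  # 16
print([block_from_table(a) for a in (81, 82, 83, 84)])  # [2, 3, 0, 1]
```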
  • Section S1 shows which element of the matrix is stored in which memory block Bp.
  • Section S2 denotes the relative addresses ar that are specified by a processor accessing the matrix row wise.
  • Section S3 shows the relative addresses ac that are specified by the processor accessing the matrix column wise.
  • Section S4 illustrates the local addresses a′ that are used for selecting the matrix element within the corresponding memory block Bp. Obviously, no two matrix elements have both the same memory block Bp and the same address a′ associated at the same time. The first P elements of row 0 are accessed via a local address a′=0, the next P elements via a local address a′=1. The first P elements of row 1 are accessed using a local address a′=P=4. The same rules apply for both row wise and column wise access, of course.
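  • The properties stated for sections S1 and S4 can be checked exhaustively for the M=16, N=16, P=4 example. The following is a sketch only: the rule p=(m+n) MOD P is inferred from the row patterns listed for FIG. 2, and the names location and cells are hypothetical.

```python
M, N, P = 16, 16, 4


def location(m, n):
    """Return (p, a_local) for column m and row n (p rule is an assumption)."""
    p = (m + n) % P
    a_local = (n * M + m) // P
    return p, a_local


cells = {(m, n): location(m, n) for n in range(N) for m in range(M)}

# Every element has its own (block, local address) pair.
all_unique = len(set(cells.values())) == M * N

# Any P horizontally adjacent elements lie in P different blocks.
rows_ok = all(
    len({cells[(m + k, n)][0] for k in range(P)}) == P
    for n in range(N) for m in range(M - P + 1)
)

# Any P vertically adjacent elements lie in P different blocks.
cols_ok = all(
    len({cells[(m, n + k)][0] for k in range(P)}) == P
    for m in range(M) for n in range(N - P + 1)
)

print(all_unique, rows_ok, cols_ok)  # True True True
```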
  • Section S5 is equal to section S4, but the local addresses a′ are determined from relative addresses ar according to section S2 by dividing the relative addresses ar by P:

  • a′=ar DIV P.
  • Thus, this division is the operation that has to be performed on the specified relative address ar given to the memory controller C to create the local address a′ in the related memory block Bp. The division can be replaced by a corresponding bit shifting operation as P is a power of 2 in this example: a′=ar SHR 2. So the local address a′ is determined from a group of the upper six bits of ar in row wise access mode.
  • Section S6 is equal to sections S4 and S5, of course, but is calculated from the relative addresses ac of section S3 for column wise access. For example, the element having m=7, n=6 is specified by the relative address

  • ac=7*16+6=118
  • in column wise access mode. The local address a′ is determined then from:

  • a′=(ac SHL 2) OR (ac SHR 6),
  • of course narrowed to the address space of the memory blocks Bp, i.e.

  • a′=((ac SHL 2) OR (ac SHR 6)) AND 63.
  • This combination of shifting operations can be expressed as a single rotation operation: a′=ac ROTL 2 and a′=(ac ROTL 2) AND 63, respectively. The rotation has to be carried out using the bit width of the relative address space, i.e. eight bits in this example.
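  • As an illustration, the row wise shift and the column wise rotation can be compared for the element m=7, n=6 and for the whole 16*16 matrix. This is a sketch under the parameters of the example (eight address bits, P=4); the helper names rotl, local_from_row_address and local_from_column_address are hypothetical.

```python
ADDR_BITS = 8  # width of the relative address space for 16*16 = 256 elements
P_BITS = 2     # log2(P) for P = 4 memory blocks
LOCAL_MASK = 63  # six-bit address space of each memory block


def rotl(value, shift, width):
    """Rotate value left by shift bits within a word of the given width."""
    mask = (1 << width) - 1
    value &= mask
    return ((value << shift) | (value >> (width - shift))) & mask


def local_from_row_address(a_r):
    return a_r >> P_BITS  # a' = ar SHR 2


def local_from_column_address(a_c):
    return rotl(a_c, P_BITS, ADDR_BITS) & LOCAL_MASK  # a' = (ac ROTL 2) AND 63


a_c = 7 * 16 + 6  # column wise relative address of m=7, n=6 (118)
a_r = 6 * 16 + 7  # row wise relative address of the same element (103)
print(local_from_column_address(a_c), local_from_row_address(a_r))  # 25 25

# Both access modes agree for every element of the matrix.
consistent = all(
    local_from_column_address(m * 16 + n) == local_from_row_address(n * 16 + m)
    for m in range(16) for n in range(16)
)
```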
  • For both row and column access the address translation can be performed with high speed. It is worth noting that no addition or multiplication is necessary to determine the local address a′, thus avoiding carry-chains and therefore keeping the critical paths short. This is valid as long as M, N and P are powers of two.
  • In this example, the first elements of rows n=0, 4, 8 are located in memory block B0, whereas the first elements of rows n=1, 5, 9 are located in memory block B1. Therefore, the P inputs and outputs of the memory blocks Bp have to be rotated according to the relative address ar or ac, respectively, for creating the input and output data of the memory controller C. For example, the number p of the respective memory block Bp can be calculationally determined by: p=((ar,c MOD P)+(ar,c DIV P)) followed by MOD P if applicable. This rule applies both for row wise and for column wise access requests R. As in this example P is a power of 2, this calculation can be performed using fast bit operations:
  • p=((ar,c AND 3)+(ar,c SHR 2)) [AND 3 if applicable]. The rule implies reduction of the relative address to the smallest repeating pattern of memory blocks Bp within section S1. Of course, instead of such a rule a look-up table could be used for determining the number p of the respective memory block Bp. Such a look-up table can be as small as the smallest repeating pattern if the relative address is reduced to it first.
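  • One possible reading of the reduction to the smallest repeating pattern is sketched below: the row and column indices are first reduced modulo P, which yields a position inside a P*P tile, and the rule or a P*P look-up table is then applied to that reduced position. The tile interpretation, the table LUT and all function names are assumptions made for illustration.

```python
M, N, P = 16, 16, 4

# Look-up table covering only the assumed smallest repeating P x P pattern.
LUT = [[(col + row) % P for col in range(P)] for row in range(P)]


def block_row_wise(a_r):
    """Block number for a row wise relative address a_r = n*M + m."""
    n, m = divmod(a_r, M)
    return LUT[n % P][m % P]


def block_column_wise(a_c):
    """Block number for a column wise relative address a_c = m*N + n."""
    m, n = divmod(a_c, N)
    return LUT[n % P][m % P]


def block_by_rule(reduced):
    """Rule from the text, applied to an address inside the P x P tile."""
    return ((reduced & (P - 1)) + (reduced >> 2)) & (P - 1)


def reduce_row_wise(a_r):
    """Reduce a row wise address to its position inside the P x P tile."""
    n, m = divmod(a_r, M)
    return (n % P) * P + (m % P)


print([block_row_wise(a) for a in (81, 82, 83, 84)])  # [2, 3, 0, 1]
rule_matches_lut = all(
    block_by_rule(reduce_row_wise(a)) == block_row_wise(a) for a in range(M * N)
)
```

  • With this reading the table needs only P*P=16 entries, independent of the matrix size.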
  • FIGS. 3 and 4 show an arrangement A simplified in comparison to that of FIG. 1 and the schema related thereto, respectively. The arrangement A comprises two memory blocks Bp with P=2, numbered from p=0 to p=1 and connected to a memory controller C. Both memory blocks Bp are accessible independently and simultaneously. The arrangement A provides 32-bit read/write capability for a matrix having (M=4)*(N=4)=16 elements of 8 bits size. The arrangement A, especially the memory controller C, is connected to a central processing unit U via a system bus S in the same way as in FIG. 1. It serves for row wise and/or column wise access requests R as proposed by the invention.
  • The numbers p=0, p=1 of the memory blocks Bp assigned to the matrix elements are alternating in all rows and all columns. No two matrix elements adjacent in a row or in a column are therefore stored in the same memory block Bp. Both memory blocks Bp can be simultaneously accessed by the memory controller C. The memory controller C will provide access to the rows and columns of the matrix without any loss in bandwidth. The number of bus transactions on the arrangement A is minimised.
  • For a row wise access, the local addresses a′ can be determined from a respective sub-group of bits of the relative addresses ar according to section S2 by:

  • a′=ar SHR 1.
  • For column wise access mode, the local address a′ can be determined from a respective sub-group of bits of the relative addresses ac according to section S3 by:

  • a′=(ac SHL 1) OR (ac SHR 3).
  • This combination of shifting operations can be expressed as a single rotation operation in a 4-bits address space: a′=ac ROTL 1.
  • The number p of the respective memory block Bp can be determined for row wise and for column wise access requests R by:

  • p=((ar,c AND 1)+(ar,c SHR 1))
  • All calculations and bit operations are restricted to the 3-bits address space of the memory blocks Bp.
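  • The simplified arrangement with P=2 and a 4*4 matrix is small enough to enumerate completely. The sketch below assumes the alternating block pattern p=(m+n) MOD 2 described above and narrows the rotated column wise address to three bits; the names rotl, from_row and from_col are hypothetical.

```python
M, N, P = 4, 4, 2
ADDR_BITS = 4  # relative address space for 16 elements
LOCAL_MASK = 7  # three-bit address space of each memory block


def rotl(value, shift, width):
    """Rotate value left by shift bits within a word of the given width."""
    mask = (1 << width) - 1
    value &= mask
    return ((value << shift) | (value >> (width - shift))) & mask


def from_row(a_r):
    """(p, a_local) for a row wise relative address a_r = n*M + m."""
    n, m = divmod(a_r, M)
    return (m + n) % P, a_r >> 1  # a' = ar SHR 1


def from_col(a_c):
    """(p, a_local) for a column wise relative address a_c = m*N + n."""
    m, n = divmod(a_c, N)
    return (m + n) % P, rotl(a_c, 1, ADDR_BITS) & LOCAL_MASK  # a' = ac ROTL 1


# Both access modes must resolve every element to the same location.
modes_agree = all(
    from_row(n * M + m) == from_col(m * N + n)
    for m in range(M) for n in range(N)
)
# All 16 elements occupy distinct (block, local address) pairs.
locations = {from_row(a_r) for a_r in range(M * N)}
print(modes_agree, len(locations))  # True 16
```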
  • LIST OF REFERENCE NUMERALS
    • A Arrangement
    • ar Relative address for row wise access
    • ac Relative address for column wise access
    • a′ Local address
    • Bp Memory blocks
    • C Memory controller
    • M Number of columns
    • m Column
    • N Number of rows
    • n Row
    • P Number of memory blocks
    • p Number of memory block
    • R Request
    • S System bus
    • U Central processing unit

Claims (12)

1. A method for accessing matrix elements, wherein accesses to two matrix elements that are adjacent in a row or in a column of a matrix and that are each specified by a respective relative address (ar, ac) are performed for the first of said elements in a first memory block (Bp1) using a first local address (a′1) and for the second of said elements in a different second memory block (Bp2) using a second local address (a′2).
2. The method according to claim 1, wherein for each of said matrix elements said respective memory block (Bp) and/or said respective local address (a′) are determined from a look-up table using said respective relative address (ar, ac) for an index.
3. The method according to claim 1, wherein for each of said matrix elements said respective memory block (Bp) is determined from a first sub-group of bits of the respective relative address (ar, ac) and/or said respective local address (a′) is determined from a second sub-group of bits of the respective relative address (ar, ac).
4. The method according to claim 1, wherein for each of said matrix elements said respective memory block (Bp) and/or said respective local address (a′) are calculationally determined from said respective relative address (ar, ac).
5. The method according to claim 3 or 4, wherein bits of said respective relative address (ar, ac) are shifted and/or swapped for obtaining said respective memory block (Bp) and/or for obtaining said respective local address (a′), the local addresses (a′) having a narrower address space than the relative addresses (ar, ac).
6. The method according to claim 5, wherein a bit rotation is performed as said swapping operation.
7. The method according to one of the preceding claims, wherein a number (P) of memory blocks (Bp) is used that is a power of two.
8. The method according to one of the preceding claims, wherein memory blocks (Bp) are used that are accessible simultaneously and independently from each other.
9. An arrangement (A) for accessing matrix elements, comprising a plurality of memory blocks (Bp) and a memory controller (C) connected to said memory blocks (Bp), wherein the memory controller (C), in case of accesses to two matrix elements that are adjacent in a row or in a column of a matrix and that are each specified by a respective relative address (ar, ac), performs a first sub-access for the first of said elements in a first memory block (Bp1) using a first local address (a′1) and a second sub-access for the second of said elements in a different second memory block (Bp2) using a second local address (a′2).
10. The arrangement (A) according to claim 9, wherein for each of said matrix elements said memory controller determines said respective memory block (Bp) and/or said respective local address (a′) with said respective relative address (ar, ac).
11. The arrangement (A) according to claim 9 or 10, wherein the number (P) of memory blocks (Bp), the width (M) of the matrix and the height (N) of the matrix are powers of two.
12. The arrangement (A) according to one of the claims 9 to 11, wherein said first memory block (Bp1) and said second memory block (Bp2) are accessible simultaneously and independently from each other.
US12/095,166 2005-12-01 2006-11-29 Method and Arrangement for Efficiently Accessing Matrix Elements in a Memory Abandoned US20080301400A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP05111546 2005-12-01
EP05111546.7 2005-12-01
PCT/IB2006/054500 WO2007063501A2 (en) 2005-12-01 2006-11-29 Method and arrangement for efficiently accessing matrix elements in a memory

Publications (1)

Publication Number Publication Date
US20080301400A1 true US20080301400A1 (en) 2008-12-04

Family

ID=38090785

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/095,166 Abandoned US20080301400A1 (en) 2005-12-01 2006-11-29 Method and Arrangement for Efficiently Accessing Matrix Elements in a Memory

Country Status (5)

Country Link
US (1) US20080301400A1 (en)
EP (1) EP1958069A2 (en)
JP (1) JP2009517763A (en)
CN (1) CN101322107A (en)
WO (1) WO2007063501A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541749A (en) * 2011-12-31 2012-07-04 中国科学院自动化研究所 Multi-granularity parallel storage system
US20140223445A1 (en) * 2013-02-07 2014-08-07 Advanced Micro Devices, Inc. Selecting a Resource from a Set of Resources for Performing an Operation

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101782878B (en) * 2009-04-03 2011-11-16 北京理工大学 Data storing method based on distributed memory
CN108053852B (en) * 2017-11-03 2020-05-19 华中科技大学 Writing method of resistive random access memory based on cross point array
CN111176582A (en) * 2019-12-31 2020-05-19 北京百度网讯科技有限公司 Matrix storage method, matrix access device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4918600A (en) * 1988-08-01 1990-04-17 Board Of Regents, University Of Texas System Dynamic address mapping for conflict-free vector access
US6297857B1 (en) * 1994-03-24 2001-10-02 Discovision Associates Method for accessing banks of DRAM
US20050071409A1 (en) * 2003-09-29 2005-03-31 International Business Machines Corporation Method and structure for producing high performance linear algebra routines using register block data format routines

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6386061A (en) * 1986-09-30 1988-04-16 Hitachi Ltd Memory allocating method for multi-processor
JPH08194641A (en) * 1995-01-17 1996-07-30 Fujitsu Ltd Method for storing two-dimensional data into synchronizing dram and synchronizing dram access controller
US6604166B1 (en) * 1998-12-30 2003-08-05 Silicon Automation Systems Limited Memory architecture for parallel data access along any given dimension of an n-dimensional rectangular data array
JP3985797B2 (en) * 2004-04-16 2007-10-03 ソニー株式会社 Processor

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541749A (en) * 2011-12-31 2012-07-04 中国科学院自动化研究所 Multi-granularity parallel storage system
CN102541749B (en) * 2011-12-31 2014-09-17 中国科学院自动化研究所 Multi-granularity parallel storage system
US20140223445A1 (en) * 2013-02-07 2014-08-07 Advanced Micro Devices, Inc. Selecting a Resource from a Set of Resources for Performing an Operation
US9183055B2 (en) * 2013-02-07 2015-11-10 Advanced Micro Devices, Inc. Selecting a resource from a set of resources for performing an operation
US20160062803A1 (en) * 2013-02-07 2016-03-03 Advanced Micro Devices, Inc. Selecting a resource from a set of resources for performing an operation
US9766936B2 (en) * 2013-02-07 2017-09-19 Advanced Micro Devices, Inc. Selecting a resource from a set of resources for performing an operation

Also Published As

Publication number Publication date
WO2007063501A3 (en) 2007-11-15
WO2007063501A2 (en) 2007-06-07
CN101322107A (en) 2008-12-10
JP2009517763A (en) 2009-04-30
EP1958069A2 (en) 2008-08-20

Similar Documents

Publication Publication Date Title
US6381668B1 (en) Address mapping for system memory
US6144604A (en) Simultaneous addressing using single-port RAMs
EP0507577B1 (en) Flexible N-way memory interleaving
US6662285B1 (en) User configurable memory system having local and global memory blocks
US6430672B1 (en) Method for performing address mapping using two lookup tables
CN110096450B (en) Multi-granularity parallel storage system and storage
US20080301400A1 (en) Method and Arrangement for Efficiently Accessing Matrix Elements in a Memory
CN106846255A (en) Image rotation implementation method and device
US6233665B1 (en) Mapping shared DRAM address bits by accessing data memory in page mode cache status memory in word mode
US9146696B2 (en) Multi-granularity parallel storage system and storage
US6453380B1 (en) Address mapping for configurable memory system
KR20180006645A (en) Semiconductor device including a memory buffer
KR20110121641A (en) Multimode accessible storage facility
US6906978B2 (en) Flexible integrated memory
US20110208939A1 (en) Memory access system and memory access control method
JP5059330B2 (en) Memory address generation circuit and memory controller including the same
KR20060113019A (en) System for controlling memory
EP0837474B1 (en) Method for optimising a memory cell matrix for a semiconductor integrated microcontroller
US20030031072A1 (en) Memory with row-wise write and column-wise read
US20030009642A1 (en) Data storing circuit and data processing apparatus
US20230307036A1 (en) Storage and Accessing Methods for Parameters in Streaming AI Accelerator Chip
CN116150046B (en) Cache circuit
US7457937B1 (en) Method and system for implementing low overhead memory access in transpose operations
KR20170100415A (en) Memory controller and integrated circuit system
US6834334B2 (en) Method and apparatus for address decoding of embedded DRAM devices

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:038017/0058

Effective date: 20160218

AS Assignment

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12092129 PREVIOUSLY RECORDED ON REEL 038017 FRAME 0058. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:039361/0212

Effective date: 20160218

AS Assignment

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12681366 PREVIOUSLY RECORDED ON REEL 039361 FRAME 0212. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:042762/0145

Effective date: 20160218

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12681366 PREVIOUSLY RECORDED ON REEL 038017 FRAME 0058. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:042985/0001

Effective date: 20160218

AS Assignment

Owner name: NXP B.V., NETHERLANDS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC.;REEL/FRAME:050745/0001

Effective date: 20190903

AS Assignment

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 042762 FRAME 0145. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051145/0184

Effective date: 20160218

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 039361 FRAME 0212. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051029/0387

Effective date: 20160218

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 042985 FRAME 0001. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051029/0001

Effective date: 20160218

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION 12298143 PREVIOUSLY RECORDED ON REEL 038017 FRAME 0058. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051030/0001

Effective date: 20160218

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION12298143 PREVIOUSLY RECORDED ON REEL 042985 FRAME 0001. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051029/0001

Effective date: 20160218

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION12298143 PREVIOUSLY RECORDED ON REEL 039361 FRAME 0212. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051029/0387

Effective date: 20160218

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE APPLICATION12298143 PREVIOUSLY RECORDED ON REEL 042762 FRAME 0145. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY AGREEMENT SUPPLEMENT;ASSIGNOR:NXP B.V.;REEL/FRAME:051145/0184

Effective date: 20160218