US20050225557A1 - Method and apparatus for reading texture data from a cache - Google Patents

Publication number
US20050225557A1
Authority
US
United States
Prior art keywords
cache
memory
cache memory
texture
texture data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/147,621
Inventor
Satyaki Koneru
Steven Spangler
Val Cook
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US11/147,621
Publication of US20050225557A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/001 Texturing; Colouring; Generation of texture or colour

Definitions

  • Each row has four sub-rows identified by tags 0 , 1 , 2 , and 3 .
  • Each tagged sub-row has an odd and even sub-row associated with it.
  • Each comparator cache controller provides the mapping from U, V, and LOD to the proper tag location for access to necessary texels. This is performed by the four stages in each cache controller.
  • cache controller 400 and cache memory 500 work together as address decoder and memory storage, respectively.
  • When the cache controller 400 is presented with U, V, Q, LOD and other address parameters, it responds with the proper tags where the proper A, B, C, and D texels can be retrieved from cache memory 500. This retrieval process can happen per clock since the data has been pre-fetched and is residing in the texture cache memory.
  • The cache controller 400 uses the most significant bits of the texture address to determine its location and hit/miss information, while the selection of the unique location of the A, B, C, and D types, and the partition block descriptor W, X, Y, Z, is determined from the least significant bits.
  • FIG. 6 illustrates a diagram of an embodiment 600 of the back end of the texture reading apparatus.
  • Cache 602 outputs the appropriate texel data into read multiplexor 604 to assemble the accessed texture data with the incoming pixels.
  • Read multiplexor 604 takes into account how the ports were mapped with the pixels during the assembling process. For example, in a typical implementation, cache 602 includes output ports A and B. Port A reads cache lines for pixels 0 and 1 and port B reads cache lines for pixels 2 and 3 . Read multiplexor 604 expands the texel data back out to four pixels.

Abstract

A texture data reading apparatus includes a cache memory including a plurality of read ports and a plurality of regions to store pixel texture data. An address comparator includes a plurality of input ports to receive incoming pixels, wherein the address comparator compares the memory addresses associated with the incoming pixels to determine which regions of cache memory are accessed. A cache lookup device accesses new texture data from the cache memory for the incoming pixels in the same clock cycle in response to the number of memory regions accessed being less than or equal to the number of cache memory read ports.

Description

    BACKGROUND
  • A graphics engine is commonly used to display images on a display screen. The images can comprise two-dimensional data and/or three-dimensional graphical objects that are rendered to a two-dimensional surface in memory. This rendering is typically accomplished by breaking the objects up into a series of polygons, typically triangles. At each vertex, attribute values such as color, lighting, fog, and depth, together with texture coordinates, are assigned. By using texture mapping in addition to interpolation of attributes such as color, depth, lighting, and fog, significant detail can be applied to each pixel of a polygon to make it appear more realistic.
  • The texture map can combine a pattern or image with the interpolated attributes of the polygon to produce a modified color per pixel with the added detail of the texture map. For example, given the outline of a featureless cube and a texture map containing a wood-grain pattern, texture mapping can be used to map the wood-grain pattern onto the cube. Typically, a two-dimensional texture pattern is mapped or warped onto a three-dimensional surface. Perspective transformations are used to calculate the addresses within the texture map of the texels (pixels within the texture map) needed to render the individual pixels of the primitive (triangle, line, point) on the display screen. Once texture addresses have been calculated for each pixel to be rendered, the texture map stored in main memory is accessed, or fetched into a cache on the graphics engine. Conventionally, the number of cache read ports must equal the number of pixels that can be accessed in parallel. Often, however, this many read ports are not necessary because of the spatial locality of the pixels.
    BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates a functional block diagram of an embodiment of an exemplary computer system including a graphics engine embodying the present invention.
  • FIG. 2 illustrates a functional block diagram of an embodiment of a texture reading apparatus.
  • FIG. 3 illustrates a logic diagram of an embodiment of an address comparator.
  • FIG. 4 illustrates a diagram of an embodiment of a cache controller.
  • FIG. 5 illustrates a diagram of an embodiment of a cache memory organization.
  • FIG. 6 illustrates a diagram of an embodiment of the back end of the texture reading apparatus.
    DETAILED DESCRIPTION
  • In the following description, numerous specific details are set forth such as specific memory configurations, address ranges, protection schemes, etc., in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well known apparatus and steps have not been described in detail in order to avoid obscuring the invention.
  • Embodiments of the present invention provide for selectively reading texture data for a greater number of pixels per clock than the number of available cache read ports. For example, the present invention is able to selectively process four pixels per clock, instead of just two pixels per clock, with a two-port read cache. This allows an almost doubling in pixel rate with less die growth than would be required by doubling the number of cache ports. Embodiments of the invention default to reading texture for two pixels per clock from the two-port read cache.
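The "almost doubling" claim can be checked with a little arithmetic. The sketch below is a simplified model rather than anything stated in the specification: it assumes pixels arrive in groups of four, that a group whose cache lines fit on the two ports takes one clock, and that every other group takes two clocks. The function name `average_pixels_per_clock` and the parameter `share_fraction` are hypothetical.

```python
def average_pixels_per_clock(share_fraction, group_size=4, fallback_clocks=2):
    """Estimate the average pixel rate under a simplified two-outcome model.

    share_fraction: assumed fraction of pixel groups that can be read in one
                    clock because their cache lines fit on the read ports.
    group_size: pixels presented together (four in the example embodiment).
    fallback_clocks: clocks used when the group must be split (two here).
    """
    clocks_per_group = share_fraction * 1 + (1 - share_fraction) * fallback_clocks
    return group_size / clocks_per_group


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 0.9, 1.0):
        print(fraction, average_pixels_per_clock(fraction))
```

Under this model the rate rises from two pixels per clock when no group can share ports toward four pixels per clock as spatial locality improves, which is why the gain is described as almost, rather than exactly, a doubling.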
  • FIG. 1 illustrates a functional block diagram of an embodiment 100 of an exemplary computer system including a graphics processor 108 embodying the present invention. This system generally includes processing unit 102, bridge 104, main memory 106, graphics processor 108, display 110, graphics memory 112 and input devices 114. Graphics processor 108 determines the graphical information to be sent to display 110 based on inputs from processing unit 102 and data in main memory 106 and graphics memory 112. Processing unit 102 has access to data stored on disk, networks, or CD-ROM, etc. and based on power on sequences, programs booted at start up, and user inputs by the input devices, processing unit 102 will determine the data stream sent to the graphics processor 108. Graphics processor 108 uses the data stream to create the desired image on display 110. The user is linked to the computer image generation process through input control device 114 such as a keyboard, mouse, joystick, etc.
  • In particular, processing unit 102 obtains database information from one of its data inputs, loads texture maps into main memory or graphics memory, and then preprocesses the database information for graphics processor 108. Graphics processor 108 then receives state data and triangle, line, or point (primitive) information. From this input data, graphics processor 108 determines attribute data (such as diffuse red, green, and blue colors, alpha, fog, depth, texture coordinates, etc.) for each pixel of the primitive. The texture coordinate attributes and pixel screen location are used to read texture, previous color and depth information. This data is then used to determine the new color and depth of each pixel to be stored in either graphics memory 112 or main memory 106. When the primitives have been rendered, processing unit 102 schedules the resulting rendered scene to be displayed on display 110 if desired.
  • Texture mapping is used to place texture data, such as patterns or natural images, on an object in computer graphics. The object is typically formed from a plurality of polygons, such as triangles or squares. By using texture mapping, a realistic picture can be generated. Since an object is generally formed from a plurality of polygons such as triangles, texture data is mapped polygon by polygon. When mip mapping or trilinear interpolation is enabled, the projected pixel size on the texel map approaches the texel size of the properly selected texture level of detail (LOD). If the projected pixel increases or decreases in size appreciably, the next level-of-detail texture resolution map is used. This can be either a higher or lower map. With respect to such texture data mapping, the texture data is stored in a memory beforehand.
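As an illustration of level-of-detail selection, the sketch below picks a mip level from the size of the projected pixel measured in texels. The specification does not give a formula; the base-2 logarithm used here is the conventional mip-mapping rule, and the function name `select_lod`, its parameters, and the convention that level 0 is the highest-resolution map are assumptions made for the example.

```python
from math import floor, log2


def select_lod(texels_per_pixel, num_levels):
    """Pick the mip level whose texel size is closest to the projected pixel.

    texels_per_pixel: width of the projected pixel measured in level-0 texels
                      (hypothetical input, normally derived from derivatives).
    num_levels: number of maps in the mip chain; level 0 is assumed to be the
                highest-resolution map.
    """
    if texels_per_pixel <= 1.0:
        # Magnification: the highest-resolution map is already fine enough.
        return 0
    # Each successive level halves the resolution, hence the base-2 logarithm.
    return min(floor(log2(texels_per_pixel)), num_levels - 1)


if __name__ == "__main__":
    for footprint in (0.5, 1.0, 2.0, 5.0, 1000.0):
        print(footprint, select_lod(footprint, num_levels=8))
```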
  • FIG. 2 illustrates a functional block diagram of an embodiment 200 of an exemplary apparatus for reading texture data from a memory. Texture data reading apparatus 200 includes memory address comparator 202, cache lookup 204, cache 206 and read multiplexor 208. Texture data is mapped in such a manner that texture data reading apparatus 200 reads texture data from cache 206, which temporarily stores texture data. Texture data reading apparatus 200 reads texture data from cache 206 at high speed. Embodiments of the present invention provide for selectively reading texture data for a greater number of pixels per clock than the number of available cache read ports.
  • In particular, memory address comparator 202 compares the memory addresses of incoming pixels (for which texture data is read from cache 206) and determines whether or not one or more pixels have memory addresses that access the same cache region (for example, cache line). If the number of cache regions accessed is less than or equal to the number of read ports on cache 206, all of the incoming pixels can be accessed in the same clock cycle. However, if the number of cache regions accessed is greater than the number of read ports on cache 206, then the incoming pixels are read in more than one clock cycle. For example, the present invention is able to selectively process four pixels per clock, instead of just two pixels per clock, with a two-port read cache. This allows an almost doubling in pixel rate with less die growth than would be required by doubling the number of cache ports. Embodiments of the invention default to reading texture for two pixels per clock from the two-port read cache.
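The port-count test described above can be sketched in a few lines. This is a minimal software illustration, not the hardware comparator: the function name `cycles_needed`, the use of an integer to identify a cache line, and the fallback of issuing one pixel per port per clock are assumptions made for the example.

```python
from math import ceil


def cycles_needed(line_ids, num_ports=2):
    """Return how many clocks a group of pixels needs to read its texture data.

    line_ids: one cache-line identifier per incoming pixel (hypothetical
              encoding; real hardware compares address bits).
    num_ports: number of read ports on the cache.
    """
    distinct_lines = set(line_ids)
    if len(distinct_lines) <= num_ports:
        # Every required line fits on an available port: one clock suffices.
        return 1
    # Otherwise fall back to the default rate of num_ports pixels per clock.
    return ceil(len(line_ids) / num_ports)


if __name__ == "__main__":
    print(cycles_needed([7, 7, 9, 9]))  # two distinct lines, two ports
    print(cycles_needed([1, 2, 3, 3]))  # three distinct lines, two ports
```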
  • Each pixel supplies a texture map memory address to address comparator 202 through input terminals. The memory address includes U, V, W, LOD, and other parameters. For example, a texture address calculator calculates the texture memory addresses (U, V) for each pixel and also the specific LODs from which the texture addresses are to be retrieved. For texture mapping, the texture data is read from the memory in accordance with the calculated memory addresses as follows:
  • Texture coordinates (S1, T1, W1), (S2, T2, W2), and (S3, T3, W3) are assigned to the vertices of a triangle.
  • By linearly interpolating the texture coordinates of the vertices of the triangle, texture coordinates (S, T, W) of an inner point of the triangle are obtained.
  • By performing dividing operations of U=S/W and V=T/W, a memory address (U, V) is obtained.
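The three steps above can be sketched as follows. The specification says only that the vertex coordinates are linearly interpolated; expressing that interpolation with barycentric weights, and the names `texture_address`, `bary` and `verts`, are assumptions made for this example.

```python
def texture_address(bary, verts):
    """Interpolate (S, T, W) inside a triangle and return the address (U, V).

    bary: three interpolation weights that sum to 1 (assumed to be barycentric
          weights of the interior point).
    verts: three (S, T, W) tuples, one per triangle vertex.
    """
    # Linear interpolation of each coordinate across the triangle.
    s = sum(b * v[0] for b, v in zip(bary, verts))
    t = sum(b * v[1] for b, v in zip(bary, verts))
    w = sum(b * v[2] for b, v in zip(bary, verts))
    # Perspective divide: U = S / W and V = T / W.
    return s / w, t / w


if __name__ == "__main__":
    triangle = [(0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (0.0, 8.0, 1.0)]
    print(texture_address((0.25, 0.25, 0.5), triangle))
```

Interpolating S, T and W separately and dividing afterwards is what makes the result perspective-correct; dividing at the vertices and interpolating U and V directly would give a different answer whenever the W values differ.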
  • As shown in FIG. 2, for example, four pixels are applied to address comparator 202 for processing by two-port cache 206. When two of the incoming pixels have memory addresses that access the same cache line and the remaining two pixels have memory addresses that access another cache line, data for all four of the incoming pixels can be read in one clock cycle since two cache lines can be read at the same time. Address comparator 202 thus determines whether it can read all of the data out of cache 206 in a single clock cycle or in more than one cycle (for example, by defaulting to sequencing two pixels at a time).
  • Referring to FIG. 2, address comparator 202 determines whether or not one or more pixels have memory addresses that access the same cache region (for example, cache line). Once it is determined that the appropriate texture data can be read from cache 206 in a single cycle because the same cache regions are being accessed, cache lookup 204 performs the cache lookup on the selected addresses and accesses the data from cache 206 based on those addresses. The compared result is provided to cache lookup 204. Cache lookup 204 selects only the memory addresses of the texture data that should be read from cache 206 in accordance with the compared result. For example, in a typical implementation, cache lookup 204 accesses just two addresses for four pixels and accesses data out of cache 206 for those two selected addresses. Thus, where there is a need to access only two cache lines, address comparator 202 takes the four incoming addresses and consolidates them into two addresses that are applied to cache lookup 204.
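The consolidation of four incoming addresses into two port addresses can be sketched as below. This is an illustrative software model under stated assumptions: the function name `consolidate`, the integer line identifiers, and the first-come assignment of lines to ports are hypothetical and are not taken from the specification.

```python
def consolidate(line_ids, num_ports=2):
    """Collapse per-pixel cache-line addresses into at most num_ports addresses.

    Returns (port_addresses, pixel_to_port), where pixel_to_port[i] is the
    index of the port that serves pixel i, or None when the pixels touch more
    distinct lines than there are ports.
    """
    port_addresses = []
    pixel_to_port = []
    for line in line_ids:
        if line not in port_addresses:
            if len(port_addresses) == num_ports:
                # A further distinct line is needed but no port is free.
                return None
            port_addresses.append(line)
        pixel_to_port.append(port_addresses.index(line))
    return port_addresses, pixel_to_port


if __name__ == "__main__":
    print(consolidate([7, 9, 7, 9]))
    print(consolidate([1, 2, 3, 1]))
```

Keeping the pixel-to-port mapping alongside the consolidated addresses matters because the back end needs it to hand each pixel the data read on its port.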
  • If the address comparator determines that more cache lines are accessed than available cache ports, the first two pixels are processed via the two ports on a first clock cycle, and then on the next clock cycle, the other two are processed via the two ports. The texture data accessed is reassembled into four pixels at the back end. The address comparator stalls the pipeline to allow for two clock cycles, rather than one clock cycle, to process four pixels.
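The two-clock fallback can be sketched as a small scheduler that lists which pixels are issued on each clock. The function name `schedule` and the representation of pixels by their index within the group are assumptions for illustration; the pipeline stall itself is not modelled.

```python
def schedule(line_ids, num_ports=2):
    """Return a list of clocks, each a list of pixel indices issued together.

    line_ids: one cache-line identifier per incoming pixel (hypothetical).
    """
    pixels = list(range(len(line_ids)))
    if len(set(line_ids)) <= num_ports:
        # All lines fit on the ports, so every pixel is served in one clock.
        return [pixels]
    # Too many lines: issue num_ports pixels per clock, in arrival order.
    return [pixels[i:i + num_ports] for i in range(0, len(pixels), num_ports)]


if __name__ == "__main__":
    print(schedule([1, 1, 2, 2]))
    print(schedule([1, 2, 3, 4]))
```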
  • Cache 206 outputs the appropriate texel data into read multiplexor 208 to assemble the accessed texture data with the incoming pixels. Read multiplexor 208 takes into account how the ports were mapped with the pixels during the assembling process. For example, in a typical implementation, cache 206 includes output ports A and B. Port A reads cache lines for pixels 0 and 1 and port B reads cache lines for pixels 2 and 3. Read multiplexor 208 expands the texel data back out to four pixels.
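The expansion performed by the read multiplexor can be sketched as a lookup from pixel to port. The function name `expand_to_pixels` and the string placeholders standing in for cache-line data are hypothetical; the port assignment in the usage lines follows the example in the text, with port A (index 0) serving pixels 0 and 1 and port B (index 1) serving pixels 2 and 3.

```python
def expand_to_pixels(port_data, pixel_to_port):
    """Give each pixel the data read on the port it was mapped to.

    port_data: data read on each port this clock (one entry per port).
    pixel_to_port: for each pixel, the index of the port that served it.
    """
    return [port_data[port] for port in pixel_to_port]


if __name__ == "__main__":
    data_on_ports = ["line_from_port_A", "line_from_port_B"]
    mapping = [0, 0, 1, 1]  # pixels 0 and 1 on port A, pixels 2 and 3 on port B
    print(expand_to_pixels(data_on_ports, mapping))
```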
  • FIG. 3 illustrates a logic diagram of an embodiment 300 of address comparator 302 and port select 304. The address comparator receives cache line addresses. In a cache having four sectors, each pixel may need four cache lines, but the four sectors (for example, W, X, Y, Z) can be treated independently. For each sector, only two cache lines can be read at a time. Thus, in a two-port read cache configuration, 2 W's, 2 X's, 2 Y's and 2 Z's can be read in a typical implementation. The address comparator compares the addresses (for example, the U's and V's) for all four incoming pixels and determines whether it can read the cache lines for the four pixels out of the two-port read cache. In some cases, the incoming pixels may have addresses that do not access data from the W sector at all and only access data from the X sector. The cache lines for the X, Y and Z sectors are considered in the same manner before determining whether one or two clocks are needed (i.e., whether all four pixels can be accessed in the same clock cycle). However, if the four incoming pixels have addresses that access data from 3 or 4 different W's, then the pixels are processed two, rather than four, pixels per clock.
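The per-sector check can be sketched by testing each sector independently and falling back to two clocks if any one sector needs more lines than there are ports. The data layout below (one dictionary per pixel, mapping a sector name to the line that pixel needs, with unneeded sectors simply absent) and the function name `needs_two_clocks` are assumptions made for the example.

```python
SECTORS = ("W", "X", "Y", "Z")


def needs_two_clocks(pixel_lines, num_ports=2):
    """Return True if any sector requires more distinct lines than ports.

    pixel_lines: one dict per pixel mapping sector name -> cache-line id;
                 a sector the pixel does not need is omitted (hypothetical).
    """
    for sector in SECTORS:
        lines = {needed[sector] for needed in pixel_lines if sector in needed}
        if len(lines) > num_ports:
            return True
    return False


if __name__ == "__main__":
    shared = [{"W": 1, "X": 5}, {"W": 1, "X": 5}, {"W": 2, "X": 5}, {"W": 2, "X": 5}]
    spread = [{"W": 1}, {"W": 2}, {"W": 3}, {"W": 3}]
    print(needs_two_clocks(shared), needs_two_clocks(spread))
```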
  • Address comparator 302 compares addresses for a single sector (W/X/Y/Z), taking into account the need bits for each pixel. Output is a horizontal (“horz”) and vertical (“vert”) compare indicator, which, when asserted, indicates that this sector can share ports in this direction. For example, “horz” indicates that a port can be shared between each pair of horizontally adjacent pixels and “vert” indicates the same for vertically adjacent pixels. Port select 304 selects the address to be used for this sector on each of the two ports. This determination takes into account the global “pair” bit and two of the need bits as well as the global horizontal indicator. The “pair” signal indicates that the result of the comparison on all four sectors indicated a need to run the subspan as two pixel pairs. In this case, “pairclk” indicates which clock (0 or 1) of the pair is on. One skilled in the art will recognize that the above comparison method is for exemplary purposes only. The present invention can be implemented with any viable comparison method compatible with the invention.
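One possible reading of the "horz", "vert" and "pair" signals is sketched below. Several details are assumptions and not stated in the text: the four pixels are taken to form a 2x2 block with pixels 0 and 1 on the top row and pixels 2 and 3 on the bottom row, a pixel whose need bit is clear is treated as matching any neighbour, and "pair" is taken to be asserted when neither direction allows sharing across all sectors. The function names are hypothetical.

```python
def compare_sector(addrs, need):
    """Compute (horz, vert) for one sector of a 2x2 pixel block.

    addrs: four cache-line addresses, one per pixel, for this sector.
    need: four booleans (need bits); False means the pixel does not use
          this sector, so it can share a port with any neighbour.
    """
    def can_share(a, b):
        return (not need[a]) or (not need[b]) or addrs[a] == addrs[b]

    horz = can_share(0, 1) and can_share(2, 3)  # horizontally adjacent pairs
    vert = can_share(0, 2) and can_share(1, 3)  # vertically adjacent pairs
    return horz, vert


def global_signals(per_sector):
    """Combine per-sector (horz, vert) results into global indicators.

    Returns (global_horz, global_vert, pair); pair=True means the block is
    run as two pixel pairs over two clocks (assumed interpretation).
    """
    global_horz = all(h for h, _ in per_sector)
    global_vert = all(v for _, v in per_sector)
    pair = not (global_horz or global_vert)
    return global_horz, global_vert, pair


if __name__ == "__main__":
    all_needed = [True, True, True, True]
    w = compare_sector([1, 1, 2, 2], all_needed)
    x = compare_sector([5, 5, 5, 5], all_needed)
    print(w, x, global_signals([w, x]))
```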
  • FIG. 4 illustrates a diagram of an embodiment 400 of a cache controller. Cache controller 400 includes four separate controllers, W, X, Y and Z cache controllers 402, 404, 406 and 408, one for each of the W, X, Y and Z partitions. Each of these controllers contains a plurality of stages, with each stage referencing a double quad word in the cache memory bank. The controllers regulate and keep track of what is accessed and stored in the sectors of the cache.
  • Embodiments of the present invention compare the memory addresses of incoming pixels (for which texture data is read from the cache) and determine whether or not one or more pixels have memory addresses that access the same cache region (for example, cache line). If the number of cache regions accessed is less than or equal to the number of read ports on the cache, all of the incoming pixels can be accessed in the same clock cycle. However, if the number of cache regions accessed is greater than the number of read ports on the cache, then the incoming pixels will have to be read in more than one clock cycle. For example, as noted above, addresses for a single sector (W/X/Y/Z) are compared, taking into account the need bits for each pixel. The output is a horizontal (“horz”) and a vertical (“vert”) compare indicator, each of which, when asserted, indicates that this sector can share ports in that direction.
  • FIG. 5 illustrates a diagram of an embodiment 500 of an exemplary cache memory organization. One skilled in the art will recognize that the particular configuration of the cache is not critical to the invention. In one exemplary configuration, the cache memory storage organization is indexed by parameters including W, X, Y and Z and tag 0, tag 1, tag 2 and tag 3. The cache memory includes four sectors of memory: W, X, Y and Z. Each of the W, X, Y and Z sectors contains 8 cache lines, each cache line containing 8 texels.
  • There are four rows of data corresponding to the four cache controllers labeled W, X, Y, and Z shown in FIG. 4. Each row has four sub-rows identified by tags 0, 1, 2, and 3. Each tagged sub-row has an odd and even sub-row associated with it. Each comparator cache controller provides the mapping from U, V, and LOD to the proper tag location for access to necessary texels. This is performed by the four stages in each cache controller.
  • Referring to FIGS. 4 and 5, cache controller 400 and cache memory 500 work together as address decoder and memory storage, respectively. When the cache controller 400 is presented with U, V, Q, LOD and other address parameters, it responds with the proper tags where the proper A, B, C, and D texels can be retrieved from cache memory 500. This retrieval process can happen once per clock since the data has been pre-fetched and is residing in the texture cache memory. The cache controller 400 uses the most significant bits of the texture address to determine its location and hit/miss information, while the selection of the unique location of the A, B, C, and D types and of the partition block descriptor W, X, Y, Z is determined from the least significant bits.
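The split between most significant and least significant address bits can be illustrated with a small sketch. The bit widths and the mapping here are hypothetical and are not given in the text: two low bits are assumed to select the W/X/Y/Z partition, the remaining high bits are treated as the tag, and each per-sector controller is assumed to hold four tags (tag 0 through tag 3). `split_address` and `SectorController` are illustrative names only.

```python
from typing import List, Optional, Tuple

SECTORS = "WXYZ"
SECTOR_BITS = 2  # assumed width of the low-order partition selector


def split_address(line_addr: int) -> Tuple[str, int]:
    """Split a cache line address into (sector, tag).

    Low bits pick the partition; the remaining high bits form the tag
    that is compared for hit/miss.
    """
    sector = SECTORS[line_addr & ((1 << SECTOR_BITS) - 1)]
    tag = line_addr >> SECTOR_BITS
    return sector, tag


class SectorController:
    """Tracks which tags are resident in one partition (four tag slots assumed)."""

    def __init__(self, slots: int = 4) -> None:
        self.tags: List[Optional[int]] = [None] * slots

    def fill(self, slot: int, tag: int) -> None:
        """Record that `tag` has been pre-fetched into the given slot."""
        self.tags[slot] = tag

    def lookup(self, tag: int) -> Optional[int]:
        """Return the slot holding `tag` on a hit, or None on a miss."""
        return self.tags.index(tag) if tag in self.tags else None


sector, tag = split_address(0b10110)  # low bits 0b10, high bits 0b101
controller = SectorController()
controller.fill(2, tag)
print(sector, tag, controller.lookup(tag), controller.lookup(9))
```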
  • FIG. 6 illustrates a diagram of an embodiment 600 of the back end of the texture reading apparatus. Cache 602 outputs the appropriate texel data into read multiplexor 604 to assemble the accessed texture data with the incoming pixels. Read multiplexor 604 takes into account how the ports were mapped to the pixels during the assembling process. For example, in a typical implementation, cache 602 includes output ports A and B. Port A reads cache lines for pixels 0 and 1 and port B reads cache lines for pixels 2 and 3. Read multiplexor 604 expands the texel data back out to four pixels.
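The expansion step performed by the read multiplexor can be sketched as a simple lookup from each pixel to the port that served it. This is a simplified model under stated assumptions: the function name `expand_to_pixels`, the string port names and the use of `None` for a pixel that needed no data are all hypothetical.

```python
from typing import Any, List, Mapping, Optional, Sequence


def expand_to_pixels(port_data: Mapping[str, Any],
                     port_map: Sequence[Optional[str]]) -> List[Any]:
    """Fan the data read on each port back out to the pixels that shared it.

    `port_map[i]` names the port that served pixel i, or None if pixel i
    did not need this sector.
    """
    return [port_data[p] if p is not None else None for p in port_map]


# Typical mapping from the text: port A serves pixels 0 and 1, port B serves 2 and 3.
per_pixel = expand_to_pixels({"A": "line_a", "B": "line_b"}, ["A", "A", "B", "B"])
print(per_pixel)
```

Keeping the port-to-pixel mapping as explicit data mirrors the statement that the multiplexor must know how the ports were mapped during address comparison.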
  • Having now described the invention in accordance with the requirements of the patent statutes, those skilled in the art will understand how to make changes and modifications to the present invention to meet their specific requirements or conditions. Such changes and modifications may be made without departing from the scope and spirit of the invention as set forth in the following claims.

Claims (17)

1. A texture data reading apparatus, comprising:
a cache memory including a plurality of read ports and a plurality of regions to store pixel texture data;
an address comparator including a plurality of input ports to receive incoming pixels, wherein the address comparator compares the memory addresses associated with the incoming pixels to determine which regions of cache memory are accessed; and
a cache lookup device to access new texture data from the cache memory for the incoming pixels in the same clock cycle in response to the number of memory regions accessed being less than or equal to the number of cache memory read ports.
2. The texture data reading apparatus claimed in claim 1, further comprising:
a multiplexor to associate the pixel texture data accessed from the cache memory region associated with each incoming pixel.
3. The texture data reading apparatus claimed in claim 1, wherein the cache regions include cache lines.
4. The texture data reading apparatus claimed in claim 1, wherein the number of cache read ports is less than the number of address comparator input ports.
5. The texture data reading apparatus, wherein the cache lookup device accesses new texture data from the cache memory for the incoming pixels in more than one clock cycle in response to the number of memory regions accessed being greater than the number of cache memory read ports.
6. A rendering apparatus for generating drawing image data, comprising:
a coordinate processing unit for receiving vertex data of a polygon including coordinates of the vertices, and for generating coordinate data representing coordinates in the polygon from the coordinates of the vertices of the polygon;
a cache memory including a plurality of memory read ports and a plurality of regions to store pixel texture data;
an address comparator to receive vertex data of the polygon including texture coordinates of the vertices and to generate texture addresses in the polygon from the texture coordinates of the vertices of the polygon, the texture addresses referring to texture data in the cache memory, wherein the address comparator compares the texture addresses associated with incoming pixels to determine which regions of cache memory are accessed; and
a cache lookup device to access new texture data from the cache memory for the incoming pixels in the same clock cycle in response to the number of memory regions accessed being less than or equal to the number of cache memory read ports.
7. A rendering apparatus for generating drawing image data, comprising:
a cache memory including a plurality of memory read ports and a plurality of regions to store pixel texture data; and
a plurality of rendering units for receiving vertices data of a polygon and for generating data for drawing an image each rendering unit including a texture memory and a reading unit for reading texture data from the texture memory; and wherein each reading unit includes:
an address comparator to receive vertex data of the polygon including texture coordinates of the vertices and to generate texture addresses in the polygon from the texture coordinates of the vertices of the polygon, the texture addresses referring to texture data in the cache memory, wherein the address comparator compares the texture addresses associated with incoming pixels to determine which regions of cache memory are accessed; and
a cache lookup device to access new texture data from the cache memory for the incoming pixels in the same clock cycle in response to the number of memory regions accessed being less than or equal to the number of cache memory read ports.
8. A machine readable medium having stored therein a plurality of machine readable instructions executable by a processor to read texture data, comprising:
instructions to compare the memory addresses associated with incoming pixels to determine which regions of cache memory are accessed;
instructions to access new texture data from the cache memory for the incoming pixels in the same clock cycle in response to the number of memory regions accessed being less than or equal to the number of cache memory read ports; and
instructions to read cache ports and a plurality of regions to store pixel texture data.
9. The machine readable medium claimed in claim 8, further comprising:
instructions to associate the pixel texture data accessed from the cache memory region associated with each incoming pixel.
10. The machine readable medium claimed in claim 8, wherein the cache regions include cache lines.
11. The machine readable medium claimed in claim 8, wherein the number of cache read ports is less than the number of address comparator input ports.
12. The machine readable medium claimed in claim 8, further comprising:
instructions to access new texture data from the cache memory for the incoming pixels in more than one clock cycle in response to the number of memory regions accessed being greater than the number of cache memory read ports.
13. A method to read texture data, comprising:
comparing the memory addresses associated with incoming pixels to determine which regions of cache memory are accessed;
accessing new texture data from the cache memory for the incoming pixels in the same clock cycle in response to the number of memory regions accessed being less than or equal to the number of cache memory read ports; and
reading cache ports and a plurality of regions to store pixel texture data.
14. The method claimed in claim 13, further comprising:
associating the pixel texture data accessed from the cache memory region associated with each incoming pixel.
15. The method claimed in claim 13, wherein the cache regions include cache lines.
16. The method claimed in claim 13, wherein the number of cache read ports is less than the number of address comparator input ports.
17. The method claimed in claim 13, further comprising:
accessing new texture data from the cache memory for the incoming pixels in more than one clock cycle in response to the number of memory regions accessed being greater than the number of cache memory read ports.
US11/147,621 2002-12-24 2005-06-07 Method and apparatus for reading texture data from a cache Abandoned US20050225557A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/147,621 US20050225557A1 (en) 2002-12-24 2005-06-07 Method and apparatus for reading texture data from a cache

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/328,988 US6924812B2 (en) 2002-12-24 2002-12-24 Method and apparatus for reading texture data from a cache
US11/147,621 US20050225557A1 (en) 2002-12-24 2005-06-07 Method and apparatus for reading texture data from a cache

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/328,988 Continuation US6924812B2 (en) 2002-12-24 2002-12-24 Method and apparatus for reading texture data from a cache

Publications (1)

Publication Number Publication Date
US20050225557A1 true US20050225557A1 (en) 2005-10-13

Family

ID=32594642

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/328,988 Expired - Fee Related US6924812B2 (en) 2002-12-24 2002-12-24 Method and apparatus for reading texture data from a cache
US11/147,621 Abandoned US20050225557A1 (en) 2002-12-24 2005-06-07 Method and apparatus for reading texture data from a cache

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10/328,988 Expired - Fee Related US6924812B2 (en) 2002-12-24 2002-12-24 Method and apparatus for reading texture data from a cache

Country Status (1)

Country Link
US (2) US6924812B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150097851A1 (en) * 2013-10-09 2015-04-09 Nvidia Corporation Approach to caching decoded texture data with variable dimensions

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7249242B2 (en) * 2002-10-28 2007-07-24 Nvidia Corporation Input pipeline registers for a node in an adaptive computing engine
JP3780954B2 (en) * 2002-02-06 2006-05-31 ソニー株式会社 Image generating apparatus and method
US7936359B2 (en) * 2006-03-13 2011-05-03 Intel Corporation Reconfigurable floating point filter
US20150228106A1 (en) * 2014-02-13 2015-08-13 Vixs Systems Inc. Low latency video texture mapping via tight integration of codec engine with 3d graphics engine
US20150279055A1 (en) * 2014-03-28 2015-10-01 Nikos Kaburlasos Mipmap compression
US20220414011A1 (en) * 2021-06-23 2022-12-29 Intel Corporation Opportunistic late depth testing to prevent stalling for overlapping cache lines
CN117795946A (en) * 2021-07-28 2024-03-29 华为技术有限公司 Data reading device and related method

Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4794384A (en) * 1984-09-27 1988-12-27 Xerox Corporation Optical translator device
US4905141A (en) * 1988-10-25 1990-02-27 International Business Machines Corporation Partitioned cache memory with partition look-aside table (PLAT) for early partition assignment identification
US5189403A (en) * 1989-09-26 1993-02-23 Home Row, Inc. Integrated keyboard and pointing device system with automatic mode change
US5578813A (en) * 1995-03-02 1996-11-26 Allen; Ross R. Freehand image scanning device which compensates for non-linear movement
US5786804A (en) * 1995-10-06 1998-07-28 Hewlett-Packard Company Method and system for tracking attitude
US5994710A (en) * 1998-04-30 1999-11-30 Hewlett-Packard Company Scanning mouse for a computer system
US6094190A (en) * 1997-03-03 2000-07-25 Telefonaktiebolaget Lm Ericsson Device for controlling a position indicator on a visual display
US6151015A (en) * 1998-04-27 2000-11-21 Agilent Technologies Pen like computer pointing device
US6243080B1 (en) * 1998-07-14 2001-06-05 Ericsson Inc. Touch-sensitive panel with selector
US6351657B2 (en) * 1996-11-29 2002-02-26 Sony Corporation Information input device, cursor moving device and portable telephone
US6433789B1 (en) * 2000-02-18 2002-08-13 Neomagic Corp. Steaming prefetching texture cache for level of detail maps in a 3D-graphics engine
US20020180880A1 (en) * 2001-06-01 2002-12-05 Bean Heather Noel Conductively coated and grounded optics to eliminate dielectric dust attraction
US20020190953A1 (en) * 1998-03-30 2002-12-19 Agilent Technologies, Inc. Seeing eye mouse for a computer system
US20030001078A1 (en) * 2001-06-28 2003-01-02 Izhak Baharav Bad pixel detection and correction in an image sensing device
US20030006965A1 (en) * 2001-07-06 2003-01-09 Bohn David D. Method and apparatus for indicating an operating mode of a computer-pointing device
US6507540B1 (en) * 1999-08-31 2003-01-14 Terastor Corporation Hybrid optical head for data storage
US20030028688A1 (en) * 2001-04-10 2003-02-06 Logitech Europe S.A. Hybrid presentation controller and computer input device
US20030103037A1 (en) * 2001-12-05 2003-06-05 Em Microelectronic-Marin Sa Sensing device for optical pointing devices such as an optical mouse
US6650314B2 (en) * 2000-09-04 2003-11-18 Telefonaktiebolaget Lm Ericsson (Publ) Method and an electronic apparatus for positioning a cursor on a display
US20040051798A1 (en) * 2002-09-18 2004-03-18 Ramakrishna Kakarala Method for detecting and correcting defective pixels in a digital image sensor
US6744438B1 (en) * 1999-06-09 2004-06-01 3Dlabs Inc., Ltd. Texture caching with background preloading
US20040123001A1 (en) * 1998-12-28 2004-06-24 Alps Electric Co., Ltd. Dual pointing device used to control a cursor having absolute and relative pointing devices
US20040212586A1 (en) * 2003-04-25 2004-10-28 Denny Trueman H. Multi-function pointing device

Patent Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4794384A (en) * 1984-09-27 1988-12-27 Xerox Corporation Optical translator device
US4905141A (en) * 1988-10-25 1990-02-27 International Business Machines Corporation Partitioned cache memory with partition look-aside table (PLAT) for early partition assignment identification
US5189403A (en) * 1989-09-26 1993-02-23 Home Row, Inc. Integrated keyboard and pointing device system with automatic mode change
US5644139A (en) * 1995-03-02 1997-07-01 Allen; Ross R. Navigation technique for detecting movement of navigation sensors relative to an object
US5578813A (en) * 1995-03-02 1996-11-26 Allen; Ross R. Freehand image scanning device which compensates for non-linear movement
US5786804A (en) * 1995-10-06 1998-07-28 Hewlett-Packard Company Method and system for tracking attitude
US6281882B1 (en) * 1995-10-06 2001-08-28 Agilent Technologies, Inc. Proximity detector for a seeing eye mouse
US6351657B2 (en) * 1996-11-29 2002-02-26 Sony Corporation Information input device, cursor moving device and portable telephone
US6094190A (en) * 1997-03-03 2000-07-25 Telefonaktiebolaget Lm Ericsson Device for controlling a position indicator on a visual display
US20020190953A1 (en) * 1998-03-30 2002-12-19 Agilent Technologies, Inc. Seeing eye mouse for a computer system
US6151015A (en) * 1998-04-27 2000-11-21 Agilent Technologies Pen like computer pointing device
US5994710A (en) * 1998-04-30 1999-11-30 Hewlett-Packard Company Scanning mouse for a computer system
US6243080B1 (en) * 1998-07-14 2001-06-05 Ericsson Inc. Touch-sensitive panel with selector
US20040123001A1 (en) * 1998-12-28 2004-06-24 Alps Electric Co., Ltd. Dual pointing device used to control a cursor having absolute and relative pointing devices
US6744438B1 (en) * 1999-06-09 2004-06-01 3Dlabs Inc., Ltd. Texture caching with background preloading
US6507540B1 (en) * 1999-08-31 2003-01-14 Terastor Corporation Hybrid optical head for data storage
US6433789B1 (en) * 2000-02-18 2002-08-13 Neomagic Corp. Steaming prefetching texture cache for level of detail maps in a 3D-graphics engine
US6650314B2 (en) * 2000-09-04 2003-11-18 Telefonaktiebolaget Lm Ericsson (Publ) Method and an electronic apparatus for positioning a cursor on a display
US20030028688A1 (en) * 2001-04-10 2003-02-06 Logitech Europe S.A. Hybrid presentation controller and computer input device
US20020180880A1 (en) * 2001-06-01 2002-12-05 Bean Heather Noel Conductively coated and grounded optics to eliminate dielectric dust attraction
US20030001078A1 (en) * 2001-06-28 2003-01-02 Izhak Baharav Bad pixel detection and correction in an image sensing device
US20030006965A1 (en) * 2001-07-06 2003-01-09 Bohn David D. Method and apparatus for indicating an operating mode of a computer-pointing device
US20030103037A1 (en) * 2001-12-05 2003-06-05 Em Microelectronic-Marin Sa Sensing device for optical pointing devices such as an optical mouse
US20040051798A1 (en) * 2002-09-18 2004-03-18 Ramakrishna Kakarala Method for detecting and correcting defective pixels in a digital image sensor
US20040212586A1 (en) * 2003-04-25 2004-10-28 Denny Trueman H. Multi-function pointing device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150097851A1 (en) * 2013-10-09 2015-04-09 Nvidia Corporation Approach to caching decoded texture data with variable dimensions
US10032246B2 (en) * 2013-10-09 2018-07-24 Nvidia Corporation Approach to caching decoded texture data with variable dimensions

Also Published As

Publication number Publication date
US6924812B2 (en) 2005-08-02
US20040119719A1 (en) 2004-06-24

Similar Documents

Publication Publication Date Title
US7050063B1 (en) 3-D rendering texture caching scheme
KR101034925B1 (en) Method and apparatus for encoding texture information
US7456835B2 (en) Register based queuing for texture requests
US8189007B2 (en) Graphics engine and method of distributing pixel data
US20050225557A1 (en) Method and apparatus for reading texture data from a cache
US8233006B2 (en) Texture level tracking, feedback, and clamping system for graphics processors
US7580042B2 (en) Systems and methods for storing and fetching texture data using bank interleaving
US5757374A (en) Method and apparatus for performing texture mapping
US20080150951A1 (en) 3-d rendering engine with embedded memory
EP0613098B1 (en) Image processing apparatus and method of controlling the same
US6661424B1 (en) Anti-aliasing in a computer graphics system using a texture mapping subsystem to down-sample super-sampled images
EP1994506A1 (en) Texture unit for multi processor environment
JPH08212382A (en) Z-buffer tag memory constitution
US6812928B2 (en) Performance texture mapping by combining requests for image data
US7069387B2 (en) Optimized cache structure for multi-texturing
US6091428A (en) Frame buffer memory system for reducing page misses when rendering with color and Z buffers
US10019349B2 (en) Cache memory and method of managing the same
US20020171672A1 (en) Graphics data accumulation for improved multi-layer texture performance
US6982719B2 (en) Switching sample buffer context in response to sample requests for real-time sample filtering and video generation
KR20140056146A (en) Method for estimation of occlusion in a virtual environment
US7053902B1 (en) Image processing apparatus and method of processing images that stops operation of pixel processing circuits when pixel data to be processed is not needed
US6819320B2 (en) Reading or writing a non-super sampled image into a super sampled buffer
US6590579B1 (en) System for low miss rate replacement of texture cache lines
US20030231180A1 (en) Image processing apparatus and method of same
JP3548648B2 (en) Drawing apparatus and drawing method

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION