US20160071234A1 - Block-based lossless compression of geometric data - Google Patents


Info

Publication number
US20160071234A1
Authority
US
United States
Prior art keywords
geometric
vertex
data
compression block
compressed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/737,343
Inventor
Jaakko T. Lehtinen
Timo Oskari Aila
Tero Tapani KARRAS
Alexander Keller
Nikolaus Binder
Carsten Alexander Waechter
Samuli Matias Laine
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nvidia Corp
Original Assignee
Nvidia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nvidia Corp filed Critical Nvidia Corp
Priority to US14/737,343
Assigned to NVIDIA CORPORATION. Assignment of assignors interest (see document for details). Assignors: KELLER, ALEXANDER; AILA, TIMO OSKARI; KARRAS, TERO TAPANI; LAINE, SAMULI MATIAS; LEHTINEN, JAAKKO T.; BINDER, NIKOLAUS; WAECHTER, CARSTEN ALEXANDER
Publication of US20160071234A1
Priority to US16/502,415, published as US10866990B2


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9027Trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/60Memory management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/06Ray-tracing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/08Volume rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/50Lighting effects
    • G06T15/80Shading
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/001Model-based coding, e.g. wire frame
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/40Tree coding, e.g. quadtree, octree
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/174Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/129Scanning of coding units, e.g. zig-zag scan of transform coefficients or flexible macroblock ordering [FMO]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96Tree coding, e.g. quad-tree coding

Definitions

  • the present invention relates to numerical geometric data representation, and more particularly to block-based lossless compression and decompression of numerical geometric data.
  • Three-dimensional (3D) computer graphics rendering techniques may generate a two-dimensional (2D) representation of a 3D scene.
  • a given 3D scene is typically represented as a collection of geometric primitives (e.g., points, lines, triangles, quads, meshes, etc.).
  • Each geometric primitive may include vertex information represented as floating-point values.
  • a triangle primitive may include three vertices, and each one of the three vertices may include a 3D coordinate represented as an ordered set of three floating-point values.
  • Object-based rasterization and ray tracing are two commonly implemented techniques for generating a 2D representation of a 3D scene. Both techniques frequently access geometric primitive data stored in memory and generate intensive memory bandwidth demands. Because the number of geometric primitives in a typical scene may be quite large (e.g., on the order of many millions of triangles, etc.), memory bandwidth limitations may constrain overall rendering performance. Thus, there is a need for addressing these issues and/or other issues associated with the prior art.
  • An apparatus, a computer readable medium, and a method are provided for generating decompressed geometric data from a compression block.
  • the method comprises receiving a compression block configured to store a header and compressed geometric data for at least two geometric primitives and identifying a location within the compression block of a first set of compressed geometric data for a first geometric primitive of the at least two geometric primitives, based on a first local index.
  • the method also includes generating a first set of decompressed geometric data for the first geometric primitive by decompressing the first set of compressed geometric data, based on at least a first anchor value, where the first set of decompressed geometric data comprises more bits of data than the first set of compressed geometric data.
  • the apparatus may comprise circuitry within a processing unit, such as a graphics processing unit (GPU) or a parallel processing unit, or within a decompression unit or memory interface unit therein.
  • the apparatus may include circuitry to implement one or more decompression techniques for decompressing vertex information associated with triangle primitives.
  • Other embodiments include software, hardware, and systems configured to perform method steps for generating decompressed geometric data from the compression block.
  • FIG. 1A illustrates a flowchart of a method for generating decompressed geometric data from a compression block, in accordance with one embodiment
  • FIG. 1B illustrates a compression block structure configured to store uncompressed triangle data, in accordance with one embodiment
  • FIG. 1C illustrates a compression block structure configured to store compressed triangle data, in accordance with one embodiment
  • FIG. 1D illustrates a flowchart of a method for identifying a compression block based on a global identifier, in accordance with one embodiment
  • FIG. 1E illustrates an indirection data structure comprising a plurality of indirection blocks, in accordance with one embodiment
  • FIG. 1F illustrates an exemplary structure of an indirection block, in accordance with one embodiment
  • FIG. 1G illustrates a geometric data processing system configured to decompress geometric data from a compression block residing within memory, in accordance with one embodiment
  • FIG. 2 illustrates a parallel processing unit, in accordance with one embodiment
  • FIG. 3 illustrates a general processing cluster of the parallel processing unit of FIG. 2 , in accordance with one embodiment
  • FIG. 4 illustrates an exemplary system in which the various architecture and/or functionality of the various previous embodiments may be implemented.
  • Three-dimensional (3D) graphics rendering techniques typically represent a 3D scene as a collection of geometric primitives.
  • Each geometric primitive may include geometric data such as vertex coordinates, texture coordinates, or any other technically relevant information.
  • the collection of geometric primitives may be stored in a memory subsystem and accessed from the memory subsystem to render the scene.
  • scene rendering is performed, at least in part, by a graphics processing unit (GPU), and the collection of geometric primitives representing a given 3D scene is stored in a memory subsystem coupled to the GPU.
  • Geometric data for one or more geometric primitives may be stored within a compression block.
  • Each compression block may correspond in size to a cache line within the GPU.
  • the collection of geometric primitives for the 3D scene may be stored in a plurality of compression blocks, with a variable number of geometric primitives stored in any one compression block.
  • the number of geometric primitives stored within a given compression block is a function of data similarity of geometric data values for associated geometric primitives.
  • the compression blocks may be identified by a compression block number, with sequential compression blocks having corresponding sequential compression block numbers. Furthermore, sequential compression blocks may provide storage for sequentially identified geometric primitives.
  • Each geometric primitive may be identified using a unique identifier, such as a unique thirty-two bit integer value.
  • the unique identifier may also be described as a global identifier because each value is globally unique within an identifier space for primitives.
  • Accessing data for a given geometric primitive specified by an associated identifier involves first locating an appropriate compression block within the memory subsystem where the geometric primitive resides.
  • a mapping data structure may be constructed to locate the appropriate compression block and data for the geometric primitive. The mapping data structure accounts for the variable number of geometric primitives stored in each compression block associated with the 3D scene.
  • Rendering techniques based on ray tracing may organize 3D primitives occupying a 3D space using a bounding volume hierarchy (BVH), a data structure designed to efficiently encode spatial relationships among 3D objects comprising sets of 3D primitives.
  • Each 3D primitive within the BVH may be represented as a bounding volume, such as an axis-aligned bounding box (AABB), defined by a pair of bounding planes in each of three dimensions.
  • Geometric primitives within a given AABB may include spatially similar coordinate positions and corresponding numeric representations of associated geometric data, such as vertex coordinates, may include similar bit patterns. In certain usage cases, the similar coordinate positions may align to powers of two fractional increments corresponding to an authoring tool grid resolution.
  • geometric primitives associated with fans or meshes may include common vertex coordinates. Similar and common numeric information associated with geometric primitives may be identified as the basis for compression of the numeric information.
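The bit-pattern similarity described above can be made concrete with a short Python sketch (illustrative only, not part of the patent): two spatially close coordinates share their high-order IEEE-754 bits, so the shared portion can be stored once as an anchor while only the differing low-order bits are stored per vertex.

```python
import struct

def float_bits(f):
    """Return the IEEE-754 binary32 encoding of f as an unsigned 32-bit int."""
    return struct.unpack(">I", struct.pack(">f", f))[0]

# Two spatially close x-coordinates, e.g. from vertices of adjacent triangles.
a = float_bits(10.25)
b = float_bits(10.375)

# XOR exposes which bits differ; all higher-order bits are shared and
# could be stored once in an anchor value.
diff = a ^ b
shared_high_bits = 32 - diff.bit_length()
print(f"{a:032b}")
print(f"{b:032b}")
print("shared high-order bits:", shared_high_bits)
```

Here the two encodings differ only in a single mantissa bit, so 14 high-order bits (sign, exponent, and leading mantissa bits) are common and compress away.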
  • geometric data stored within a compression block is decompressed.
  • Certain embodiments of the present invention implement logic circuitry within the GPU that receives a primitive identifier associated with a geometric primitive and returns geometric data for the geometric primitive.
  • the logic circuitry may be associated with a memory controller or a processing core within the GPU to provide transparent decompression of geometric data. Compression of uncompressed geometric data may be implemented using any technically feasible technique that generates suitably formatted compression blocks.
  • FIG. 1A illustrates a flowchart of a method 100 for generating decompressed geometric data from a compression block, in accordance with one embodiment.
  • a decompression unit such as decompression unit 196 of FIG. 1G is configured to perform method 100 .
  • the decompression unit may reside within memory partition unit 280 of FIG. 2 , or within any other technically feasible circuitry associated with parallel processing unit (PPU) 200 of FIG. 2 .
  • the decompression unit may reside within any technically feasible functional unit or units associated with a computer system architecture.
  • the decompression unit may be implemented using function-specific logic circuitry, such as a function-specific portion of a processing pipeline configured to perform at least method 100 .
  • the decompression unit may be implemented as instructions or microcode for controlling a processing unit.
  • the instructions may be encoded within non-transitory computer-readable medium such as a read-only solid-state memory or a programmable solid-state flash memory.
  • Method 100 begins at step 102 , where the decompression unit receives a compression block configured to store a header and compressed geometric data for at least two geometric primitives.
  • Each of the at least two geometric primitives is associated with a local index within the compression block.
  • the local index may be determined based on a global identifier of the primitive that uniquely identifies a geometric primitive within a set of geometric primitives that collectively define a 3D scene.
  • the header may include at least one mode bit that indicates whether geometric data within the compression block is stored in an uncompressed format or in a compressed format.
  • the uncompressed format may be compatible with the compressed format used to represent other geometric data.
  • the geometric data compresses according to a data-dependent compression ratio, allowing geometric data representing a variable number of geometric primitives to be stored within the compression block.
  • An uncompressed format for representing geometric data is described in more detail in conjunction with FIG. 1B and a compressed format for representing geometric data is described in more detail in conjunction with FIG. 1C .
  • method 100 is applied to data in the compressed format illustrated in FIG. 1C .
  • Each of the multiple compression blocks may represent geometric data in the compressed format or the uncompressed format, as indicated by the at least one mode bit.
  • Each of the multiple compression blocks may include geometric data for multiple geometric primitives, such as triangles. All geometric data for any one geometric primitive (e.g. one triangle) may reside entirely within one associated compression block.
  • geometric data for a varying number of geometric primitives may reside within the compression block. Consequently, geometric data for a specific geometric primitive may be located at a variable location within the compression block.
  • the variable location is a function of the number of geometric primitives represented within the compression block. The variable location, along with location information for geometric data associated with other geometric primitives within the compression block, is recorded within a topology field of the compression block.
  • the decompression unit, based on the first local index, identifies a location within the compression block of a first set of compressed geometric data for a first geometric primitive of the at least two geometric primitives.
  • the first local index is received in conjunction with receiving the compression block.
  • the first geometric primitive is a triangle and the first set of compressed geometric data comprises three vertex positions, each of which includes a three-dimensional coordinate.
  • Each three-dimensional coordinate may include three floating-point values, which may be stored in a compressed format. Each of the three floating-point values may be stored using a compressed representation of a thirty-two bit floating-point encoding. Alternatively, each three-dimensional coordinate may include three fixed-point values, three integer values, or three values defined by any technically feasible numeric representation, any of which may be stored in a compressed format.
  • a second local index may be received in conjunction with receiving the compression block for identifying a second set of compressed geometric data for a second geometric primitive.
  • One or more vertex positions associated with the second geometric primitive may be represented as references to equivalent vertex positions associated with the first geometric primitive.
  • the decompression unit generates a first set of decompressed geometric data for the first geometric primitive by decompressing the first set of compressed geometric data based on at least a first anchor value.
  • the first anchor value is one of three anchor values of a three-dimensional anchor position. Each one of the three anchor values may correspond to one of the dimensions of the three-dimensional anchor position. Additional geometric primitive vertex positions may be represented using three-dimensional offsets relative to the three-dimensional anchor position.
  • the anchor position may serve as one vertex position (e.g. vertex position zero), while other vertex positions are defined as offsets relative to the anchor position.
  • the first set of decompressed geometric data includes three vertex positions, each comprising a three-dimensional position.
  • Each of the three vertex positions may be represented within the compression block as a three-dimensional offset position relative to the three-dimensional anchor position.
  • Each three-dimensional offset position may be represented as a set of compressed numeric values, and each of the compressed numeric values may be compressed according to a different compression ratio.
  • FIG. 1B illustrates a compression block 140 configured to store uncompressed triangle data, in accordance with one embodiment.
  • compression block 140 includes one thousand twenty-four (1024 or 2^10) bits, starting at bit 0 and ending at bit 1023.
  • compression block 140 includes a header field 148 , a triangle 0 field 142 , a triangle 1 field 144 , and a triangle 2 field 146 .
  • Each field 142 , 144 , 146 , 148 includes subfields, and each subfield is labeled with a bit count on a second line.
  • the subfield labeled “Mode” of header field 148 includes three bits, as indicated by the “3” on the second line for the subfield.
  • the mode subfield specifies how to interpret other bits within compression block 140 . At least one of the eight possible bit codes for the mode subfield specifies that compression block 140 should be interpreted as shown here, having data for three different triangles stored in an uncompressed format.
  • Header field 148 may also include three alpha (α) bits, an MD2 subfield having 32 bits, an MD1 subfield having 32 bits, and an MD0 subfield having 32 bits.
  • subfield MD2 stores an application-specific triangle metadata value associated with a triangle 2
  • subfield MD1 stores a triangle metadata value associated with a triangle 1
  • subfield MD0 stores a triangle metadata value associated with triangle 0.
  • each of the three alpha bits may indicate whether a corresponding triangle (e.g., triangle 2, triangle 1, triangle 0) is fully opaque (or, alternatively, partially transparent).
  • triangle 0 field 142 includes three vertices.
  • a first of the three vertices may include coordinates (X0, Y0, Z0), specified by corresponding 32-bit values.
  • a second of the three vertices may include coordinates (X1, Y1, Z1), specified by corresponding 32-bit values.
  • a third of the three vertices may include coordinates (X2, Y2, Z2), specified by corresponding 32-bit values.
  • compression block 140 may include a different number of bits specified as a power of two, such as 512 (2^9) bits, 2048 (2^11) bits, or 4096 (2^12) bits. In alternative embodiments, compression block 140 may include a number of bits that is not an integer power of two. In certain embodiments, the number of bits included within compression block 140 corresponds to the number of bits included within a cache line for an associated processing unit. Compression block 140 is structured to be compatible with other formats that store geometric data in a compressed format, as illustrated below in FIG. 1C .
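The uncompressed layout of FIG. 1B can be sketched in Python by treating the 1024-bit block as one large integer. Only the fact that the mode subfield occupies the top three bits (bits 1021 through 1023) is stated in the text; the positions assumed below for the alpha and MD subfields are illustrative guesses, not the patent's exact layout.

```python
def bits(block, lo, width):
    """Extract `width` bits starting at bit position `lo` of the block."""
    return (block >> lo) & ((1 << width) - 1)

def unpack_uncompressed(block):
    mode = bits(block, 1021, 3)   # format selector (position stated in the text)
    alpha = bits(block, 1018, 3)  # assumed: alpha bits directly below mode
    # assumed: MD2, MD1, MD0 packed as consecutive 32-bit words below alpha
    md = [bits(block, 1018 - 32 * (i + 1), 32) for i in range(3)]
    return {"mode": mode, "alpha": alpha, "metadata": md}

# A block whose top three bits encode mode 5 (0b101):
blk = 0b101 << 1021
print(unpack_uncompressed(blk)["mode"])  # prints 5
```

The same extraction helper applies to the compressed format of FIG. 1C, since both formats place the mode subfield in the same upper bits.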
  • FIG. 1C illustrates a compression block 150 configured to store compressed triangle data, in accordance with one embodiment.
  • compression block 150 includes one thousand twenty-four (1024 or 2^10) bits, starting at bit 0 and ending at bit 1023.
  • compression block 150 includes a vertex positions field 152 , a topology field 154 , and header field 156 .
  • Header field 156 includes a precision subfield 160 , a number of triangles subfield 161 , a shift subfield 162 , and a mode subfield 163 .
  • mode subfield 163 includes three bits and specifies how to interpret the remaining bits of compression block 150 from a set of enumerated compression block formats. Compression block 140 illustrates one such format, and compression block 150 illustrates another such format. Mode subfield 163 and the mode subfield of compression block 140 occupy the same upper bits (bits 1023, 1022, and 1021) of a 1024-bit compression block format. In other embodiments, mode subfield 163 may include a different number of bits.
  • Precision subfield 160 includes subfields P.X, P.Y, P.Z, and P.MD.
  • the subfields P.X, P.Y, P.Z, and P.MD of precision subfield 160 each include five bits.
  • Precision subfield P.X specifies a number of bits for representing vertex position offsets in the x-dimension within compression block 150
  • precision subfield P.Y specifies a number of bits for representing vertex position offsets in the y-dimension within compression block 150
  • precision subfield P.Z specifies a number of bits for representing vertex position offsets in the z-dimension within compression block 150 .
  • Precision subfield P.MD specifies a number of bits for a triangle metadata offset.
  • the number of triangles stored within compression block 150 is indicated by the number of triangles subfield 161 .
  • Precision subfields P.X, P.Y, P.Z, and P.MD, along with number of triangles subfield 161, may store a given value as the value minus one.
  • for example, to indicate eight bits of precision, precision subfield P.X may store a value of seven.
  • shift subfield 162 indicates the lowest bit position affected when position offsets 169 are combined with values in vertex position anchor subfield 167 .
  • Vertex positions field 152 includes a vertex position anchor subfield 167 and a vertex position offset subfield 168 .
  • vertex position anchor subfield 167 includes subfields for X, Y, and Z.
  • vertex position anchor subfield 167 comprises the three-dimensional anchor position of FIG. 1A .
  • each of the subfields X, Y, Z within the vertex position anchor subfield 167 may represent a thirty-two bit floating-point value.
  • Three-dimensional position offsets 169 from the three-dimensional position anchor are represented by X, Y, and Z offsets within vertex position offset subfield 168 .
  • Each three-dimensional position offset 169 represents a corresponding vertex position within a three-dimensional space.
  • Position offset 169 ( 1 ) is associated with vertex position one, and position offset 169 ( 2 ) is associated with vertex position two.
  • Each position offset 169 may be combined with the three-dimensional anchor position to generate a corresponding vertex position.
  • the three-dimensional anchor position may be associated with vertex position zero so that a reference to vertex position zero refers to the anchor position.
  • a vertex position x coordinate is generated by combining a vertex position offset x value 169 with a vertex position anchor x value from vertex position anchor 167 by replacing the P.X lowest bits of the vertex position anchor x value by the vertex position offset x value 169
  • a vertex position y coordinate is generated by combining a vertex position offset y value 169 with a vertex position anchor y value from vertex position anchor 167 by replacing the P.Y lowest bits of the vertex position anchor y value by the vertex position offset y value 169
  • a vertex position z coordinate is generated by combining a vertex position offset z value 169 with a vertex position anchor z value from vertex position anchor 167 by replacing the P.Z lowest bits of the vertex position anchor z value by the vertex position offset z value 169 .
  • a vertex position x coordinate is generated by combining a vertex position offset x value 169 with a vertex position anchor x value from vertex position anchor 167 by replacing bits SHIFT 162 . . . SHIFT+P.X−1 of the vertex position anchor x value by the vertex position offset x value 169
  • a vertex position y coordinate is generated by combining a vertex position offset y value 169 with a vertex position anchor y value from vertex position anchor 167 by replacing bits SHIFT 162 . . . SHIFT+P.Y−1 of the vertex position anchor y value by the vertex position offset y value 169
  • a vertex position z coordinate is generated by combining a vertex position offset z value 169 with a vertex position anchor z value from vertex position anchor 167 by replacing bits SHIFT 162 . . . SHIFT+P.Z−1 of the vertex position anchor z value by the vertex position offset z value 169 .
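The bit-replacement combining rule above can be sketched in Python as follows (function name and 32-bit word handling are illustrative, not from the patent). With a shift of zero, it reduces to replacing the P lowest bits of the anchor value, as described in the preceding variant.

```python
def replace_bits(anchor, offset, precision, shift=0):
    """Replace bits [shift, shift + precision) of a 32-bit anchor word
    with a precision-bit offset value. With shift == 0 this is exactly
    the P-lowest-bits replacement variant."""
    mask = ((1 << precision) - 1) << shift
    return (anchor & ~mask & 0xFFFFFFFF) | ((offset << shift) & mask)

# Replace the 8 lowest bits of an anchor x value (P.X = 8, SHIFT = 0):
print(hex(replace_bits(0x41234500, 0xAB, 8)))          # 0x412345ab
# Replace bits 8..11 instead (P.X = 4, SHIFT = 8):
print(hex(replace_bits(0x41234500, 0xA, 4, shift=8)))  # 0x41234a00
```

Because replacement only overwrites a contiguous bit field, the scheme is lossless whenever all vertex coordinates in the block agree with the anchor outside that field.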
  • a vertex position x coordinate is generated by combining a vertex position offset x value 169 with a vertex position anchor x value from vertex position anchor 167 by performing a binary integer addition of vertex position anchor x value and the vertex position offset x value 169
  • a vertex position y coordinate is generated by combining a vertex position offset y value 169 with vertex position anchor y value 167 by performing a binary integer addition of vertex position anchor y value and vertex position offset y value 169
  • a vertex position z coordinate is generated by combining a vertex position offset z value 169 with vertex position anchor z value 167 by performing a binary integer addition of the vertex position anchor z value and the vertex position offset z value 169 .
  • the vertex position offset values 169 may be sign-extended to 32 bits before the binary integer addition is performed.
  • a vertex position x coordinate is generated by combining a vertex position offset x value 169 with a vertex position anchor x value from vertex position anchor 167 by performing a binary integer addition of the vertex position offset x value 169 shifted left by a number of bit positions specified by a shift value stored in the SHIFT subfield 162 and a vertex position anchor x value from vertex position anchor 167
  • a vertex position y coordinate is generated by combining a vertex position offset y value 169 with a vertex position anchor y value from vertex position anchor 167 by performing a binary integer addition of the vertex position offset y value 169 shifted left by a number of bit positions specified by a shift value stored in the SHIFT subfield 162 and a vertex position anchor y value from vertex position anchor 167
  • a vertex position z coordinate is generated by combining a vertex position offset z value 169 with a vertex position anchor z value from vertex position anchor 167 by performing a binary integer addition of the vertex position offset z value 169 shifted left by a number of bit positions specified by a shift value stored in the SHIFT subfield 162 and the vertex position anchor z value from vertex position anchor 167 .
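The integer-addition variants above can be sketched as follows (illustrative names; the wrap-around to 32 bits mirrors hardware integer addition and is an assumption). The offset is first sign-extended from its P-bit field, and optionally shifted left by the SHIFT value before the addition.

```python
def sign_extend(value, width):
    """Interpret a width-bit field as a two's-complement signed value."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def add_offset(anchor, offset, precision, shift=0):
    """Combine anchor and offset by binary integer addition, with the
    offset sign-extended to full width and optionally shifted left by
    the SHIFT value; the result wraps modulo 2^32."""
    return (anchor + (sign_extend(offset, precision) << shift)) & 0xFFFFFFFF

print(hex(add_offset(0x41234500, 0x7F, 8)))           # +127 -> 0x4123457f
print(hex(add_offset(0x41234500, 0x80, 8)))           # -128 -> 0x41234480
print(hex(add_offset(0x41234500, 0x01, 8, shift=4)))  # +16  -> 0x41234510
```

Unlike bit replacement, addition can carry into higher-order bits, so offsets may reach vertex positions that differ from the anchor above the replaced field.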
  • Each vertex position may be referenced by one or more triangles stored within the compression block 150 .
  • a first triangle forming a quad may share two vertices with a second triangle forming the quad. If the first triangle and the second triangle are stored within the same compression block 150 , then vertex position information for each of the two shared vertices need only be stored once within compression block 150 .
  • Vertex positions for the first triangle may include references to the two shared vertex positions as well as a reference to a third vertex position.
  • vertex positions for the second triangle may include references to the two shared vertex positions as well as a reference to a fourth vertex position. In total, the quad needs only four vertex positions represented within compression block 150 rather than six because two are shared.
  • Each dimension of each vertex position offset may include a different number of bits of precision.
  • the x-dimension offset may be specified by a number of bits shown as P.X, which corresponds to a value (stored as the value minus one) in the X subfield of precision subfield 160 of header field 156 .
  • Very different precision may be required in each dimension, based on triangle positions.
• a set of vertex positions may be narrow in the x-dimension, but wider in the y- and z-dimensions. In such a situation, the x-dimension may require fewer bits of precision to represent an offset from the vertex anchor without loss.
  • Topology field 154 associates triangles with vertex position data. Each triangle may be associated with an application-specific triangle metadata (MD) value.
  • a triangle metadata anchor subfield 166 indicates an anchor value for triangle metadata values for triangles stored within compression block 150 .
  • triangle metadata anchor subfield 166 includes a thirty-two bit value.
  • Triangle metadata offset subfield 164 includes a set of offset values that may be used in conjunction with triangle metadata anchor subfield 166 for associating a metadata value for each triangle stored within compression block 150 .
  • Each metadata offset value includes a number of bits specified by the P.MD subfield of precision subfield 160 . For example, if the P.MD subfield specifies five bits, then each subfield within the triangle metadata offset subfield 164 includes five bits.
  • thirty-two bit metadata values for triangles stored within compression block 150 may be represented using only five bits each rather than thirty-two bits each.
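As a sketch of this anchor-plus-offset scheme for metadata (assuming, for illustration only, that the first value serves as the anchor; the text does not specify how the anchor is chosen):

```python
def encode_metadata(values):
    """Delta-encode 32-bit triangle metadata values against an anchor.
    Returns (anchor, offsets, p_md), where p_md is the number of bits
    needed to store the widest signed offset (a conservative estimate)."""
    anchor = values[0]
    offsets = [v - anchor for v in values]
    # A signed value v fits in abs(v).bit_length() + 1 two's-complement bits.
    p_md = max(1, max(abs(o).bit_length() + 1 for o in offsets))
    return anchor, offsets, p_md

def decode_metadata(anchor, offsets):
    """Recover the original metadata values by integer addition."""
    return [anchor + o for o in offsets]
```

Metadata values that cluster near each other (e.g., sequential material IDs) then need only a few bits each instead of thirty-two.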
  • Each triangle stored within compression block 150 may be identified by a local index.
• vertex IDs subfield 165 includes an alpha (α) bit for each triangle within compression block 150 to indicate whether the triangle is fully opaque (or, alternatively, partially transparent). Furthermore, a set of three vertex indices is included within vertex IDs subfield 165 for each triangle 1 through M−1 within compression block 150.
  • the three vertex indices of triangle 0 within compression block 150 may be fixed to values 0, 1, and 2.
  • each vertex index within a set of three vertex indices is allocated four bits (twelve bits per triangle), providing an index space for referencing sixteen different vertex positions. For a given triangle, a first vertex position is determined by a first vertex index into vertex positions field 152 . A second vertex position is determined by a second vertex index into vertex positions field 152 , and a third vertex position is determined by a third vertex index into vertex positions field 152 .
• When vertex positions are shared among triangles, as is common in meshes and fans, more triangles may fit within compression block 150 because fewer vertex positions may be needed per triangle.
• When vertex positions can be represented as relatively small offsets to the vertex position anchor, fewer bits may be needed per vertex position offset 169, and more triangles may fit within compression block 150.
  • vertex positions are snapped to a grid, whereby lower mantissa values for the vertex positions are constant, thereby requiring fewer bits to represent position offsets 169 .
• subfields within header field 156 may be written to indicate an appropriate number of bits needed to represent vertex positions and an appropriate number of triangles.
• triangle metadata offset subfield 164 includes (M−1)*P.MD bits, where M is the number of triangles and P.MD is the number of bits specified by the MD subfield within precision subfield 160.
• vertex IDs subfield 165 includes (M−1)*13+1 bits.
• vertex position offset subfield 168 includes (N−1)*(P.X+P.Y+P.Z) bits, where N corresponds to the total number of vertex positions represented in the compression block 150. Consequently, a highly variable (three to sixteen) number of triangles may fit within compression block 150.
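Putting the three formulas together, the variable-size portion of a block can be budgeted as below (a sketch: the 160-bit figure covering the header, the three 32-bit position anchor components, and the 32-bit metadata anchor is an assumption for illustration, since the exact header layout is not fully specified here):

```python
def payload_bits(m: int, n: int, p_x: int, p_y: int, p_z: int, p_md: int) -> int:
    """Bits used by the variable-size subfields of compression block 150,
    for M triangles and N distinct vertex positions."""
    md_offsets = (m - 1) * p_md                 # triangle metadata offset subfield 164
    vertex_ids = (m - 1) * 13 + 1               # 3 four-bit indices + alpha bit per triangle;
                                                # triangle 0 has fixed indices 0, 1, 2
    vtx_offsets = (n - 1) * (p_x + p_y + p_z)   # the anchor vertex stores no offset
    return md_offsets + vertex_ids + vtx_offsets

def fits(m, n, p_x, p_y, p_z, p_md, fixed_bits=160, block_bits=1024) -> bool:
    """True if the set fits in one cache-line-sized block
    (fixed_bits covers the header and anchors; an assumed value)."""
    return fixed_bits + payload_bits(m, n, p_x, p_y, p_z, p_md) <= block_bits
```

For instance, two triangles forming a quad (four vertex positions) with 8-bit offsets per dimension and 5-bit metadata offsets consume 91 payload bits and fit comfortably in a 1024-bit block.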
  • compression block 150 includes a larger number of bits (e.g. 2048, 4096), and more triangles may be stored therein.
• the process of generating compression block 150 may be performed using any technically feasible technique. For example, in ray-tracing systems that implement a bounding volume hierarchy (BVH) tree, triangles are organized according to spatial locality. In such a system, generating a compression block 150 with candidate triangles for compression involves linearly scanning through a list of triangles in BVH leaf order and adding sequential triangles to a compression block until no more triangles can fit. That is, if T triangles may be successfully encoded into the compression block, then encoding T+1 triangles is attempted. If encoding T+1 succeeds, then encoding T+2 triangles is attempted, and so on. When encoding one more triangle fails, the previous encoding is used.
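The greedy scan can be sketched as follows (try_encode is a stand-in for the real block encoder and is assumed to return None when the candidate set cannot be encoded into one block):

```python
def pack_blocks(triangles, try_encode):
    """Greedily pack BVH-leaf-ordered triangles into compression blocks:
    if T triangles encode successfully, attempt T+1; when one more
    triangle fails, keep the previous encoding and start a new block."""
    blocks, start = [], 0
    while start < len(triangles):
        count = 1
        encoded = try_encode(triangles[start:start + 1])
        while start + count < len(triangles):
            attempt = try_encode(triangles[start:start + count + 1])
            if attempt is None:
                break  # adding one more triangle failed; keep previous encoding
            count += 1
            encoded = attempt
        blocks.append(encoded)
        start += count
    return blocks
```

With a toy encoder that accepts at most three triangles per block, seven triangles pack into blocks of sizes 3, 3, and 1, preserving their BVH leaf order.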
  • each compression block 150 is self-contained in that vertex positions for each triangle are available. This approach preserves locality in that triangles that were near to each other in an uncompressed representation remain near each other after compression.
  • each compression block 150 is sized according to a host system's cache line size. In the above examples, this size is assumed to be 1024 bits, but the teachings disclosed herein will be readily understood by persons of ordinary skill in the art as being equally applicable to smaller cache lines (e.g., 512 bits), and larger cache lines (2048 bits, 4096 bits, or more).
  • the first technique involves opportunistic indexing.
  • the second technique is lossless delta encoding of vertex positions.
  • Each of the two techniques may be implemented alone or in combination. In one embodiment, both techniques are implemented to generate compression block 150 .
  • Opportunistic indexing involves checking whether a newly added triangle shares any vertices with any other triangles already added to compression block 150 . If any vertices are shared, then the shared vertex positions are referenced rather than explicitly added as new vertex offset positions 169 when including the newly added triangle to compression block 150 .
  • lossless delta encoding involves encoding floating-point values associated with a particular dimension (x, y, z) relative to corresponding values for the anchor position. For example, encoding a new floating-point value associated with the x-dimension involves encoding the new value relative to a floating-point anchor position value for the x-dimension. If the new value and the anchor position value are close to each other, their binary representations typically differ only in some number of the lowest-order bits. For example, a bit-wise difference between two nearby floating-point values frequently requires less than twenty-three bits.
  • the X subfield of precision subfield 160 indicates how many bits are necessary to store all vertex position offsets in the x-dimension without loss for triangles stored within compression block 150 .
  • the Y subfield of precision subfield 160 indicates how many bits are necessary to store all vertex position offsets in the y-dimension without loss for triangles stored within compression block 150
  • the Z subfield of precision subfield 160 indicates how many bits are necessary to store all vertex position offsets in the z-dimension without loss.
  • the value for each subfield P.X, P.Y, and P.Z is data-dependent and may vary accordingly.
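A sketch of how P.X (and likewise P.Y and P.Z) could be derived from the raw float bit patterns (assumes IEEE-754 single precision and ignores the SHIFT optimization; the helper names are illustrative):

```python
import struct

def float_bits(f: float) -> int:
    """Raw IEEE-754 single-precision bit pattern as an unsigned integer."""
    return struct.unpack('<I', struct.pack('<f', f))[0]

def required_offset_bits(anchor: float, coords) -> int:
    """Smallest two's-complement width that stores every coordinate's
    bit-pattern offset from the anchor without loss (data-dependent)."""
    bits = 1
    for c in coords:
        delta = float_bits(c) - float_bits(anchor)
        # abs(delta).bit_length() + 1 bits is sufficient (slightly
        # conservative for exact negative powers of two).
        bits = max(bits, abs(delta).bit_length() + 1)
    return bits
```

Because nearby floats differ only in their low-order bits, the result is typically far smaller than 32 for the tight vertex clusters produced by BVH leaf ordering.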
  • the above technique for compressing vertex data into compression blocks generates compression blocks that contain a potentially variable number of vertices and triangles each. If random access to this data is needed, a mechanism for mapping global primitive indices to compression blocks and further to individual triangle primitives through local indices within the compression blocks is needed.
  • This mapping is achieved using an indirection data structure that comprises a set of indirection blocks. Each such indirection block stores a header field and a payload field that includes one bit per triangle. The one bit per triangle indicates whether a corresponding triangle begins a new compression block.
  • the header identifies a compression block by index and a local index within the identified compression block for the first triangle of the indirection block.
  • FIG. 1D illustrates a flowchart of a method 170 for identifying a compression block based on a global identifier, in accordance with one embodiment.
  • a decompression unit such as decompression unit 196 of FIG. 1G is configured to perform method 170 .
  • the decompression unit may reside within memory partition unit 280 of FIG. 2 , or within any other technically feasible circuitry associated with parallel processing unit (PPU) 200 of FIG. 2 .
  • the decompression unit may reside within any technically feasible functional unit or units associated with a computer system architecture.
  • the decompression unit may be implemented using function-specific logic circuitry, such as a function-specific portion of a processing pipeline configured to perform at least method 170 .
  • the decompression unit is realized by reconfigurable logic that may include (but is not restricted to) field programmable gate arrays (FPGAs).
  • the decompression unit may be implemented as instructions or microcode for controlling a processing unit.
  • the instructions may be encoded within non-transitory computer-readable medium such as a read-only solid-state memory or a programmable solid-state flash memory.
  • Method 170 begins at step 172 , where the decompression unit receives a global identifier associated with a geometric primitive.
  • the global identifier comprises a global triangle index and the geometric primitive comprises a triangle.
  • the decompression unit identifies an indirection block based on the global identifier.
  • the decompression unit identifies a compression block and a local index based on the global identifier and the indirection block.
  • An exemplary data structure for implementing method 170 is described below in FIGS. 1E and 1F .
  • FIG. 1E illustrates an indirection data structure 180 comprising a plurality of indirection blocks 182 , in accordance with one embodiment.
  • Each indirection block 182 may include a number of bits equal to a cache line size.
  • each indirection block 182 may include one-thousand twenty-four (1024) bits for systems with cache lines sized to have 1024 bits.
  • Indirection data structure 180 may include a number of indirection blocks 182 depending on the number of global identifiers needed to represent a complete scene.
• Each sequential indirection block 182 may be identified as having an indirection block number, and may be disposed in corresponding contiguous memory addresses, or further mapped through another level of indirection that maps an indirection block number to an indirection block 182 at a memory location.
  • FIG. 1F illustrates an exemplary structure of an indirection block 182 , in accordance with one embodiment.
  • each indirection block 182 includes a header field 184 and a payload field 186 .
  • Header field 184 comprises a compression block index subfield 187 , and a local index subfield 188 .
  • Payload field 186 includes a number of bits (P) equal to line size (L, e.g. 1024 bits) minus header size (H).
• Each payload bit corresponds to a unique global identifier number and may further correspond to a geometric primitive identified by the global identifier number.
• bits within payload field 186 may be identified as payload bit 0 or PB[0] through payload bit P−1 or PB[P−1].
  • indirection data structure 180 may include a number of indirection blocks 182 equal to the number of global identifiers in the scene divided by the number of payload bits (P), with the resulting quotient rounded up to the next integer.
• a first indirection block 182(0) includes P payload bits corresponding to global identifier values from 0 to P−1; a second indirection block 182(1) includes P payload bits corresponding to global identifier values from P to 2P−1; a third indirection block 182(2) includes P payload bits corresponding to global identifier values from 2P to 3P−1, and so forth.
• a fixed mapping from a global identifier (e.g., a global triangle index) to an indirection block 182 may be performed by dividing the global identifier by the number of payload bits (P) and rounding the quotient down to the nearest integer. In one embodiment, the fixed mapping is performed in step 174 of method 170.
• mapping from a global identifier to an indirection block 182 is fixed and direct. For example, a global identifier within the range of 4P to 5P−1 will map directly to a fifth indirection block 182(4). However, mapping the global identifier further from indirection block 182(4) to a specific compression block 150 is variable because a variable number of compression blocks 150 may be needed to store geometric data for the geometric primitives (e.g., triangles) preceding, in order, the geometric primitive identified by the global identifier number. Such variability depends on the actual geometric data values and their compressibility.
• indirection block 182(4) includes payload bits PB[0] through PB[P−1], corresponding to global identifiers 4P through 5P−1. To map an arbitrary global identifier in the range 4P to 5P−1, the payload bits of indirection block 182(4) need to be examined along with header field 184.
  • a compression block 150 is identified by compression block index subfield 187 . Global identifiers ranging from 4P to the first occurrence of a payload bit value of one (“1”) map into this identified compression block 150 .
  • Local index subfield 188 indicates how many global identifiers are mapped to the identified compression block 150 from a prior indirection block 182 . Thus, local index subfield 188 provides an offset for locating geometric data within the identified compression block 150 .
  • compression block index subfield 187 contains the value one-hundred ninety-seven (“197”) and local index field 188 contains the value three (“3”).
  • a global identifier with value 4P maps to compression block number “197”, with a local index of “3”.
  • geometric objects 0, 1, and 2 stored in compression block number “197” are associated with a previous indirection block 182 mapping.
  • global identifiers 4P through 4P+4 are mapped to compression block number “197” with corresponding local index values of “3” through “7”, respectively.
• Global identifiers in the range 4P+5 through 4P+13 map to compression block number “198” with local index values “0” through “8”, respectively.
  • global identifier 4P+9 may be assigned a local index value of “4”.
  • Global identifier 4P+14 maps to compression block number “199”, and so forth.
  • method 170 performs step 175 in the context of the above description for indirection data structure 180 and indirection block 182 .
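The worked example above can be expressed as a lookup routine (a sketch: headers maps an indirection block number to its (compression block index, local index) header, and payloads maps it to its list of payload bits; a hardware implementation would likely use a population count rather than a loop):

```python
def lookup(global_id: int, p: int, headers: dict, payloads: dict):
    """Map a global identifier to (compression block index, local index)
    using the indirection data structure."""
    blk = global_id // p        # fixed, direct mapping to an indirection block
    i = global_id % p           # payload-bit position within that block
    cb_index, local0 = headers[blk]
    cb, local = cb_index, local0 + i
    # Each set payload bit after position 0 begins a new compression block,
    # and the local index restarts at zero from that bit position.
    for j in range(1, i + 1):
        if payloads[blk][j]:
            cb += 1
            local = i - j
    return cb, local
```

Reproducing the example (using P=16 for brevity, so indirection block 4 covers global identifiers 4P=64 through 5P−1=79): with header (197, 3) and payload bits set at positions 5 and 14, global identifier 4P maps to (197, 3), 4P+9 to (198, 4), and 4P+14 to (199, 0).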
  • FIG. 1G illustrates a geometric data processing system 190 configured to decompress geometric data from compression blocks residing within memory, in accordance with one embodiment.
  • geometric data processing system 190 comprises a decompression unit 196 coupled to a processing unit 198 and to a memory interface 194 , which may be further coupled to a memory subsystem 192 .
  • one or more of the decompression unit 196 , the processing unit 198 , and the memory interface 194 is realized by reconfigurable logic that may include (but is not restricted to) FPGAs.
  • Processing unit 198 may include a multi-threaded processor, such as a multi-threaded processor comprising a graphics processing unit (GPU).
  • processing unit 198 is configured to perform graphics rendering based on ray-tracing of scene data comprising triangles that are organized within a BVH.
  • Data for the triangles may be stored in a compressed format within compression blocks 150 , as described in FIG. 1C .
  • certain data for the triangles within the BVH may be stored in an uncompressed format within compression blocks 140 , as described in FIG. 1B .
  • Triangles stored in compression blocks 140 may be poorly suited for compression using the techniques disclosed herein, while triangles stored in compression blocks 150 may be more suitable for compression.
  • Processing unit 198 may generate access requests 195 to receive decompressed triangle data 197 corresponding to compressed triangle vertex data residing within compression blocks 150 .
  • Access requests 195 may comprise a global triangle index per triangle requested.
  • an access request may include a compression block index and a local index for embodiments where direct access to compression blocks is provided without indirection.
  • Decompressed triangle data 197 may comprise three-dimensional vertex position information represented as numeric values in each of three dimensions. As discussed previously, the numeric values may be represented as floating-point numbers.
  • Decompression unit 196 may perform method 170 to identify a specific compression block 150 as an access request target based on a global triangle index. Method 170 may access indirection data structure 180 to identify the specific compression block 150 . Decompression unit 196 may then perform method 100 to decompress vertex data to generate decompressed triangle data 197 .
  • Memory interface 194 may operate to receive access requests from decompression unit 196 and generate appropriate media-specific signals 193 , such as DRAM control protocol signals for accessing memory subsystem 192 .
  • decompression unit 196 resides within a memory control subsystem, such as a memory partition unit 280 (U) of FIG. 2 .
  • Memory interface 194 may further include an additional port (not shown) for receiving conventional access requests from processing unit 198 .
  • Memory interface 194 may include cache memory for caching blocks of data residing within memory subsystem 192 .
  • decompression unit 196 and processing unit 198 may each include cache memory for caching related data, such as decompressed triangle data 197 .
  • FIG. 2 illustrates a parallel processing unit (PPU) 200 , in accordance with one embodiment.
  • the PPU 200 is a multi-threaded processor that is implemented on one or more integrated circuit devices.
  • the PPU 200 is a latency hiding architecture designed to process a large number of threads in parallel.
  • the PPU 200 is a graphics processing unit (GPU) configured to implement a graphics rendering pipeline for processing three-dimensional (3D) graphics data in order to generate two-dimensional (2D) image data for display on a display device such as a liquid crystal display (LCD) device.
• the PPU 200 may be utilized for performing general-purpose computations. While one exemplary parallel processor is provided herein for illustrative purposes, any processor may be employed to supplement and/or substitute for it.
  • the PPU 200 includes an Input/Output (I/O) unit 205 , a host interface unit 210 , a front end unit 215 , a compute scheduler unit (CSU) 220 , a compute work distribution unit (CWDU) 225 , a graphics primitive distribution unit (GPDU) 230 , a hub 235 , a crossbar (Xbar) 270 , one or more general processing clusters (GPCs) 250 , and one or more memory partition units 280 .
  • the PPU 200 may be connected to a host processor or other peripheral devices via a system bus 202 .
  • the PPU 200 may also be connected to a local memory comprising a number of memory devices 204 .
  • the local memory may comprise a number of dynamic random access memory (DRAM) devices.
  • the I/O unit 205 is configured to transmit and receive communications (i.e., commands, data, etc.) from a host processor (not shown) over the system bus 202 .
  • the I/O unit 205 may communicate with the host processor directly via the system bus 202 or through one or more intermediate devices such as a memory bridge.
  • the I/O unit 205 implements a Peripheral Component Interconnect Express (PCIe) interface for communications over a PCIe bus.
  • the I/O unit 205 may implement other types of well-known interfaces for communicating with external devices.
  • the I/O unit 205 is coupled to a host interface unit 210 that decodes packets received via the system bus 202 .
  • the packets represent commands configured to cause the PPU 200 to perform various operations.
  • the host interface unit 210 transmits the decoded commands to various other units of the PPU 200 as the commands may specify. For example, some commands may be transmitted to the front end unit 215 . Other commands may be transmitted to the hub 235 or other units of the PPU 200 such as one or more copy engines, a video encoder, a video decoder, a power management unit, etc. (not explicitly shown).
  • the host interface unit 210 is configured to route communications between and among the various logical units of the PPU 200 .
  • a program executed by the host processor encodes a command stream in a buffer that provides workloads to the PPU 200 for processing.
  • a workload may comprise a number of instructions and pointers to data to be processed by those instructions.
  • the buffer is a region in a memory that is accessible (i.e., read/write) by both the host processor and the PPU 200 .
  • the host interface unit 210 may be configured to access the buffer in a system memory connected to the system bus 202 via memory requests transmitted over the system bus 202 by the I/O unit 205 .
  • the host processor writes the command stream to the buffer and then transmits a pointer to the start of the command stream to the PPU 200 .
  • the host interface unit 210 manages the scheduling of instructions from one or more command streams written by the host processor (i.e., channels) on the various sub-units of the PPU 200 .
  • the front end unit 215 receives instructions from the host interface unit 210 from one or more command streams and forwards those instructions to the correct sub-unit of the PPU 200 . Instructions associated with a compute pipeline may be received by the front end unit 215 . These compute instructions are then forwarded to a compute scheduler unit 220 .
  • the compute scheduler unit 220 is configured to track state information related to the various tasks managed by the compute scheduler unit 220 . The state may indicate which GPC 250 a task is assigned to, whether the task is active or inactive, a priority level associated with the task, and so forth.
  • the compute scheduler unit 220 manages the execution of a plurality of tasks on the one or more GPCs 250 .
  • the compute scheduler unit 220 is coupled to a compute work distribution unit 225 that is configured to dispatch tasks for execution on the GPCs 250 .
  • the compute work distribution unit 225 may track a number of scheduled tasks received from the compute scheduler unit 220 .
  • the compute work distribution unit 225 manages a pending task pool and an active task pool for each of the GPCs 250 .
  • the pending task pool may comprise a number of slots (e.g., 16 slots) that contain tasks assigned to be processed by a particular GPC 250 .
  • the active task pool may comprise a number of slots (e.g., 4 slots) for tasks that are actively being processed by the GPCs 250 .
• when a GPC 250 finishes the execution of a task, that task is evicted from the active task pool for the GPC 250 and one of the other tasks from the pending task pool is selected and scheduled for execution on the GPC 250. If an active task has been idle on the GPC 250, such as while waiting for a data dependency to be resolved, then the active task may be evicted from the GPC 250 and returned to the pending task pool while another task in the pending task pool is selected and scheduled for execution on the GPC 250.
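The pending/active pool behavior described above can be sketched as follows (the slot counts are the example values from the text; the class shape and first-in selection policy are illustrative assumptions, not the actual scheduler):

```python
from collections import deque

class GpcTaskPools:
    """Per-GPC pending and active task pools (a sketch)."""
    def __init__(self, pending_slots: int = 16, active_slots: int = 4):
        self.pending = deque(maxlen=pending_slots)
        self.active = []
        self.active_slots = active_slots

    def submit(self, task):
        """Assign a task to this GPC; it runs when an active slot frees up."""
        self.pending.append(task)
        self._refill()

    def finish(self, task):
        """Evict a completed task and schedule a pending one in its place."""
        self.active.remove(task)
        self._refill()

    def idle(self, task):
        """Return an idle task (e.g., blocked on a data dependency) to the
        pending pool so another task can be scheduled."""
        self.active.remove(task)
        self.pending.append(task)
        self._refill()

    def _refill(self):
        # Fill free active slots from the pending pool.
        while self.pending and len(self.active) < self.active_slots:
            self.active.append(self.pending.popleft())
```

This keeps the active slots busy whenever work is pending, while idle tasks do not hold an active slot hostage.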
  • instructions associated with a graphics pipeline may be received by the front end unit 215 . These graphics instructions are then forwarded to a graphics primitive distribution unit 230 .
  • the graphics primitive distribution unit 230 fetches vertex data from the memory 204 or the system memory via the system bus 202 for various graphics primitives. Graphics primitives may include points, lines, triangles, quads, triangle strips, and the like.
  • the graphics primitive distribution unit 230 groups the vertices into batches of primitives and dispatches tasks to the GPCs 250 for processing the batches of primitives.
  • Processing may involve executing a shader (i.e., a Vertex Shader, Tessellation Shader, Geometry Shader, etc.) on a programmable processing unit as well as performing fixed function operations on the vertices such as clipping, culling, and viewport transformation using a fixed function unit.
  • the compute work distribution unit 225 and the graphics primitive distribution unit 230 communicate with the one or more GPCs 250 via a XBar 270 .
  • the XBar 270 is an interconnect network that couples many of the units of the PPU 200 to other units of the PPU 200 .
  • the XBar 270 may be configured to couple the compute work distribution unit 225 to a particular GPC 250 .
• one or more other units of the PPU 200 are coupled to the host interface unit 210 .
  • the other units may also be connected to the XBar 270 via a hub 235 .
  • the tasks associated with the compute pipeline are managed by the compute scheduler unit 220 and dispatched to a GPC 250 by the compute work distribution unit 225 .
  • the tasks associated with the graphics pipeline are managed and distributed to a GPC 250 by the graphics primitive distribution unit 230 .
  • the GPC 250 is configured to process the tasks and generate results.
  • the results may be consumed by other tasks within the GPC 250 , routed to a different GPC 250 via the XBar 270 , or stored in the memory 204 .
  • the results can be written to the memory 204 via the memory partition units 280 , which implement a memory interface for reading and writing data to/from the memory 204 .
  • the PPU 200 includes a number U of memory partition units 280 that is equal to the number of separate and distinct memory devices 204 coupled to the PPU 200 .
  • a host processor executes a driver kernel that implements an application programming interface (API) that enables one or more applications executing on the host processor to schedule operations for execution on the PPU 200 .
  • An application may generate instructions (i.e., API calls) that cause the driver kernel to generate one or more tasks for execution by the PPU 200 .
  • the driver kernel outputs tasks to one or more streams being processed by the PPU 200 .
  • Each task may comprise one or more groups of related threads, referred to herein as a warp.
  • a thread block may refer to a plurality of groups of threads including instructions to perform the task. Threads in the same thread block may exchange data through shared memory.
  • a warp comprises 32 related threads.
  • FIG. 3 illustrates a GPC 250 of the PPU 200 of FIG. 2 , in accordance with one embodiment.
  • each GPC 250 includes a number of hardware units for processing tasks.
  • each GPC 250 includes a pipeline manager 310 , a pre raster operations unit (PROP) 315 , a raster engine 325 , a work distribution crossbar (WDX) 380 , a memory management unit (MMU) 390 , and one or more Texture Processing Clusters (TPCs) 320 .
  • the operation of the GPC 250 is controlled by the pipeline manager 310 .
  • the pipeline manager 310 manages the configuration of the one or more TPCs 320 for processing tasks allocated to the GPC 250 .
  • the pipeline manager 310 may configure at least one of the one or more TPCs 320 to implement at least a portion of a graphics rendering pipeline.
  • a TPC 320 may be configured to execute a vertex shader program on the programmable streaming multiprocessor (SM) 340 .
  • the pipeline manager 310 may also be configured to route packets received from the Xbar 270 to the appropriate logical units within the GPC 250 . For example, some packets may be routed to fixed function hardware units in the PROP 315 and/or raster engine 325 while other packets may be routed to the TPCs 320 for processing by the primitive engine 335 or the SM 340 .
  • the PROP unit 315 is configured to route data generated by the raster engine 325 and the TPCs 320 to a Raster Operations (ROP) unit in the memory partition unit 280 , described in more detail below.
  • the PROP unit 315 may also be configured to perform optimizations for color blending, organize pixel data, perform address translations, and the like.
  • the raster engine 325 includes a number of fixed function hardware units configured to perform various raster operations.
  • the raster engine 325 includes a setup engine, a coarse raster engine, a culling engine, a clipping engine, a fine raster engine, and a tile coalescing engine. Primitives lying outside a viewing frustum may be clipped by the clipping engine.
  • the setup engine receives transformed vertices that lie within the viewing plane and generates edge equations associated with the geometric primitive defined by the vertices. The edge equations are transmitted to the coarse raster engine to determine the set of pixel tiles covered by the primitive.
  • the output of the coarse raster engine may be transmitted to the culling engine where tiles associated with the primitive that fail a hierarchical z-test are culled. Those fragments that survive culling may be passed to a fine raster engine to generate coverage information (e.g., a coverage mask for each tile) based on the edge equations generated by the setup engine.
• the output of the raster engine 325 comprises fragments to be processed, for example, by a fragment shader implemented within a TPC 320 .
  • Each TPC 320 included in the GPC 250 includes an M-Pipe Controller (MPC) 330 , a primitive engine 335 , an SM 340 , and one or more texture units 345 .
  • the MPC 330 controls the operation of the TPC 320 , routing packets received from the pipeline manager 310 to the appropriate units in the TPC 320 . For example, packets associated with a vertex may be routed to the primitive engine 335 , which is configured to fetch vertex attributes associated with the vertex from the memory 204 . In contrast, packets associated with a shader program may be transmitted to the SM 340 .
  • the texture units 345 are configured to load texture maps (e.g., a 2D array of texels) from the memory 204 and sample the texture maps to produce sampled texture values for use in shader programs executed by the SM 340 .
  • the texture units 345 implement texture operations such as filtering operations using mip-maps (i.e., texture maps of varying levels of detail).
  • each TPC 320 includes two (2) texture units 345 .
  • the SM 340 comprises a programmable streaming processor that is configured to process tasks represented by a number of threads. Each SM 340 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular group of threads concurrently. In one embodiment, the SM 340 implements a SIMD (Single-Instruction, Multiple-Data) architecture where each thread in a group of threads (i.e., a warp) is configured to process a different set of data based on the same set of instructions. All threads in the group of threads execute the same instructions.
  • the SM 340 implements a SIMT (Single-Instruction, Multiple Thread) architecture where each thread in a group of threads is configured to process a different set of data based on the same set of instructions, but where individual threads in the group of threads are allowed to diverge during execution.
  • some threads in the group of threads may be active, thereby executing the instruction, while other threads in the group of threads may be inactive, thereby performing a no-operation (NOP) instead of executing the instruction.
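The active/inactive lane behavior described above can be sketched in software. This is an illustrative model only: real hardware masks lanes in the instruction pipeline, and every name below is invented for illustration.

```python
# Illustrative software model of SIMT branch divergence: every lane in a
# warp sees the same instruction stream, and an active mask selects which
# lanes execute each side of a branch while the remaining lanes NOP.

WARP_SIZE = 32

def simt_select(condition, then_fn, else_fn, data):
    """Run a divergent if/else over one warp (assumes len(data) == WARP_SIZE)."""
    mask = [condition(x) for x in data]   # per-lane predicate
    results = list(data)
    # "Then" pass: only lanes with the mask set are active.
    for lane in range(WARP_SIZE):
        if mask[lane]:
            results[lane] = then_fn(data[lane])
    # "Else" pass: the mask is inverted; previously active lanes now NOP.
    for lane in range(WARP_SIZE):
        if not mask[lane]:
            results[lane] = else_fn(data[lane])
    return results
```

For example, doubling even-valued lanes while negating odd-valued lanes takes two serialized passes over the warp, which is exactly the cost divergence imposes.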
  • the MMU 390 provides an interface between the GPC 250 and the memory partition unit 280 .
  • the MMU 390 may provide translation of virtual addresses into physical addresses, memory protection, and arbitration of memory requests.
  • the MMU 390 provides one or more translation lookaside buffers (TLBs) for improving translation of virtual addresses into physical addresses in the memory 204 .
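The role of a TLB can be illustrated with a minimal software model. This is a sketch under assumed 4 KiB pages, a flat page table, and a naive eviction policy; the internal organization of the PPU's actual MMU 390 is not disclosed at this level of detail.

```python
# Minimal TLB model: cache recent virtual-page -> physical-page translations
# so that repeated accesses avoid a page-table walk.

PAGE_SHIFT = 12                 # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT

class TLB:
    def __init__(self, page_table, capacity=64):
        self.page_table = page_table   # maps virtual page number -> physical page number
        self.capacity = capacity
        self.entries = {}              # cached translations
        self.hits = self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & (PAGE_SIZE - 1)
        if vpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:   # naive eviction
                self.entries.pop(next(iter(self.entries)))
            self.entries[vpn] = self.page_table[vpn]  # the "page-table walk"
        return (self.entries[vpn] << PAGE_SHIFT) | offset
```

A second access to the same page hits in the cached entries and skips the walk, which is the latency the hardware structure exists to hide.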
  • the PPU 200 described above may be configured to perform highly parallel computations much faster than conventional CPUs.
  • Parallel computing has advantages in graphics processing, data compression, biometrics, stream processing algorithms, and the like.
  • the PPU 200 comprises a graphics processing unit (GPU).
  • the PPU 200 is configured to receive commands that specify shader programs for processing graphics data.
  • Graphics data may be defined as a set of primitives such as points, lines, triangles, quads, triangle strips, and the like.
  • a primitive includes data that specifies a number of vertices for the primitive (e.g., in a model-space coordinate system) as well as attributes associated with each vertex of the primitive.
  • the PPU 200 can be configured to process the graphics primitives to generate a frame buffer (i.e., pixel data for each of the pixels of the display).
  • An application writes model data for a scene (i.e., a collection of vertices and attributes) to a memory such as a system memory or memory 204 .
  • the model data defines each of the objects that may be visible on a display.
  • the application then makes an API call to the driver kernel that requests the model data to be rendered and displayed.
  • the driver kernel reads the model data and writes commands to the one or more streams to perform operations to process the model data.
  • the commands may reference different shader programs to be executed on the SMs 340 of the PPU 200 including one or more of a vertex shader, hull shader, domain shader, geometry shader, and a pixel shader.
  • one or more of the SMs 340 may be configured to execute a vertex shader program that processes a number of vertices defined by the model data.
  • the different SMs 340 may be configured to execute different shader programs concurrently.
  • a first subset of SMs 340 may be configured to execute a vertex shader program while a second subset of SMs 340 may be configured to execute a pixel shader program.
  • the first subset of SMs 340 processes vertex data to produce processed vertex data and writes the processed vertex data to the L2 cache 360 and/or the memory 204 .
  • the second subset of SMs 340 executes a pixel shader to produce processed fragment data, which is then blended with other processed fragment data and written to the frame buffer in memory 204 .
  • the vertex shader program and pixel shader program may execute concurrently, processing different data from the same scene in a pipelined fashion until all of the model data for the scene has been rendered to the frame buffer. Then, the contents of the frame buffer are transmitted to a display controller for display on a display device.
  • the PPU 200 may be included in a desktop computer, a laptop computer, a tablet computer, a smart-phone (e.g., a wireless, hand-held device), personal digital assistant (PDA), digital camera, a hand-held electronic device, and the like.
  • the PPU 200 is embodied on a single semiconductor substrate.
  • the PPU 200 is included in a system-on-a-chip (SoC) along with one or more other logic units such as a reduced instruction set computer (RISC) CPU, a memory management unit (MMU), a digital-to-analog converter (DAC), and the like.
  • the PPU 200 may be included on a graphics card that includes one or more memory devices 204 such as GDDR5 SDRAM.
  • the graphics card may be configured to interface with a PCIe slot on a motherboard of a desktop computer that includes, e.g., a northbridge chipset and a southbridge chipset.
  • the PPU 200 may be an integrated graphics processing unit (iGPU) included in the chipset (i.e., Northbridge) of the motherboard.
  • TPC 320 includes one or more tree traversal units (TTUs) 395 , in accordance with one embodiment.
  • the TTUs 395 are each configured to perform tree traversal operations. Tree traversal operations are commonly utilized in, for example, ray tracing algorithms in computer graphics. However, the TTUs 395 may be optimized for general tree traversal operations and are not limited, specifically, to ray tracing techniques.
  • each TPC 320 included in the PPU 200 may include one or more TTUs 395 for performing tree traversal operations.
  • the TTUs 395 are coupled to the SM 340 similar to the texture units 345 .
  • one or more TTUs 395 may be implemented within the PPU 200 and shared by one or more GPCs 250 or one or more SMs 340 .
  • a tree traversal operation may include any operation performed by traversing the nodes of a tree data structure.
  • a tree data structure may include, but is not limited to, a binary tree, an octree, a four-ary tree, a k-d tree, a binary space partitioning (BSP) tree, and a bounding volume hierarchy (BVH) tree.
  • the tree traversal operation includes a number of instructions for intersecting a query shape with the tree.
  • the query shapes may be, e.g., rays, bounding boxes, frustums, cones, spheres, and the like.
  • a query shape may be specified by a query data structure.
  • the query data structure may include any technically feasible technique for specifying the query shape to intersect with the tree.
  • the query data structure may specify the starting and ending points of a ray using two three-coordinate vectors.
  • the query data structure may specify the six planes of an axis-aligned bounding box using six 32-bit floating point coordinates.
  • the various query data structures may include any number of fields for specifying the attributes of the query shape.
  • one type of tree traversal operation for which the TTU 395 may be optimized is to intersect a ray with a BVH data structure that represents each of the geometric primitives in a 3D scene or 3D model.
  • the TTU 395 may be particularly useful in ray-tracing applications in which millions or even billions of rays are intersected with the geometric primitives of a 3D model represented by a BVH data structure.
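The core query performed during such a traversal, testing a ray against an axis-aligned bounding box, can be sketched with the standard slab method. This is an illustrative formulation; the TTU 395 is not limited to it, and the function name and parameter conventions are assumptions.

```python
# Slab test: intersect a ray (origin + t * dir, t >= 0) with an AABB given
# by its min corner `lo` and max corner `hi`. `inv_dir` is the componentwise
# reciprocal of the ray direction (assumed nonzero in every dimension),
# precomputed once per ray so each box test uses only multiplies.

def ray_aabb_hit(origin, inv_dir, lo, hi):
    tmin, tmax = 0.0, float('inf')
    for axis in range(3):
        # Parametric distances to the two bounding planes of this axis.
        t0 = (lo[axis] - origin[axis]) * inv_dir[axis]
        t1 = (hi[axis] - origin[axis]) * inv_dir[axis]
        if t0 > t1:
            t0, t1 = t1, t0
        # The ray is inside the box where all three axis intervals overlap.
        tmin, tmax = max(tmin, t0), min(tmax, t1)
    return tmin <= tmax
```

During BVH traversal this test decides whether a node's children need to be visited at all, which is why it is executed millions to billions of times per frame in ray-tracing workloads.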
  • FIG. 4 illustrates an exemplary system 400 in which the various architecture and/or functionality of the various previous embodiments may be implemented.
  • a system 400 is provided including at least one central processor 401 that is connected to a communication bus 402 .
  • the communication bus 402 may be implemented using any suitable protocol, such as PCI (Peripheral Component Interconnect), PCI-Express, AGP (Accelerated Graphics Port), HyperTransport, or any other bus or point-to-point communication protocol(s).
  • the system 400 also includes a main memory 404 . Control logic (software) and data are stored in the main memory 404 which may take the form of random access memory (RAM).
  • the system 400 also includes input devices 412 , a graphics processor 406 , and a display 408 , i.e. a conventional CRT (cathode ray tube), LCD (liquid crystal display), LED (light emitting diode), plasma display or the like.
  • User input may be received from the input devices 412 , e.g., keyboard, mouse, touchpad, microphone, and the like.
  • the graphics processor 406 may include a plurality of shader modules, a rasterization module, etc. Each of the foregoing modules may even be situated on a single semiconductor platform to form a graphics processing unit (GPU).
  • a single semiconductor platform may refer to a sole unitary semiconductor-based integrated circuit or chip. It should be noted that the term single semiconductor platform may also refer to multi-chip modules with increased connectivity which simulate on-chip operation, and make substantial improvements over utilizing a conventional central processing unit (CPU) and bus implementation. Of course, the various modules may also be situated separately or in various combinations of semiconductor platforms per the desires of the user.
  • the system 400 may also include a secondary storage 410 .
  • the secondary storage 410 includes, for example, a hard disk drive and/or a removable storage drive, representing a floppy disk drive, a magnetic tape drive, a compact disk drive, a digital versatile disk (DVD) drive, a recording device, or universal serial bus (USB) flash memory.
  • the removable storage drive reads from and/or writes to a removable storage unit in a well-known manner.
  • Computer programs, or computer control logic algorithms may be stored in the main memory 404 and/or the secondary storage 410 . Such computer programs, when executed, enable the system 400 to perform various functions.
  • the memory 404 , the storage 410 , and/or any other storage are possible examples of non-transitory computer-readable media.
  • the architecture and/or functionality of the various previous figures may be implemented in the context of the central processor 401 , the graphics processor 406 , an integrated circuit (not shown) that is capable of at least a portion of the capabilities of both the central processor 401 and the graphics processor 406 , a chipset (i.e., a group of integrated circuits designed to work and be sold as a unit for performing related functions, etc.), and/or any other integrated circuit for that matter.
  • the architecture and/or functionality of the various previous figures may be implemented in the context of a general computer system, a circuit board system, a game console system dedicated for entertainment purposes, an application-specific system, and/or any other desired system.
  • the system 400 may take the form of a desktop computer, laptop computer, server, workstation, game consoles, embedded system, and/or any other type of logic.
  • the system 400 may take the form of various other devices including, but not limited to a personal digital assistant (PDA) device, a mobile phone device, a television, etc.
  • system 400 may be coupled to a network (e.g., a telecommunications network, local area network (LAN), wireless network, wide area network (WAN) such as the Internet, peer-to-peer network, cable network, or the like) for communication purposes.

Abstract

An apparatus, computer readable medium, and method are disclosed for decompressing compressed geometric data stored in a lossless compression format. The compressed geometric data resides within a compression block sized according to a system cache line. An indirection technique maps a global identifier value in a linear identifier space to corresponding variable rate compressed data. The apparatus may include decompression circuitry within a graphics processing unit configured to perform ray-tracing.

Description

    CLAIM OF PRIORITY
  • This application claims the benefit of U.S. Provisional Application No. 62/046,093 titled “Bounding Volume Hierarchy Representation and Traversal,” filed Sep. 4, 2014, the entire contents of which are incorporated herein by reference.
  • FIELD OF THE INVENTION
  • The present invention relates to numerical geometric data representation, and more particularly to block-based lossless compression and decompression of numerical geometric data.
  • BACKGROUND
  • Three-dimensional (3D) computer graphics rendering techniques may generate a two-dimensional (2D) representation of a 3D scene. A given 3D scene is typically represented as a collection of geometric primitives (e.g., points, lines, triangles, quads, meshes, etc.). Each geometric primitive may include vertex information represented as floating-point values. For example, a triangle primitive may include three vertices, and each one of the three vertices may include a 3D coordinate represented as an ordered set of three floating-point values.
  • Object-based rasterization and ray tracing are two commonly implemented techniques for generating a 2D representation of a 3D scene. Both techniques frequently access geometric primitive data stored in memory and generate intensive memory bandwidth demands. Because the number of geometric primitives in a typical scene may be quite large (e.g., on the order of many millions of triangles, etc.), memory bandwidth limitations may constrain overall rendering performance. Thus, there is a need for addressing these issues and/or other issues associated with the prior art.
  • SUMMARY
  • An apparatus, computer readable medium, and method are disclosed for generating decompressed geometric data from a compression block. The method comprises receiving a compression block configured to store a header and compressed geometric data for at least two geometric primitives and identifying a location within the compression block of a first set of compressed geometric data for a first geometric primitive of the at least two geometric primitives, based on a first local index. The method also includes generating a first set of decompressed geometric data for the first geometric primitive by decompressing the first set of compressed geometric data, based on at least a first anchor value, where the first set of decompressed geometric data comprises more bits of data than the first set of compressed geometric data.
  • The apparatus may comprise circuitry within a processing unit, such as a graphics processing unit (GPU), or a parallel processing unit, decompression unit, or memory interface unit therein. The apparatus may include circuitry to implement one or more decompression techniques for decompressing vertex information associated with triangle primitives. Other embodiments include software, hardware, and systems configured to perform method steps for generating decompressed geometric data from the compression block.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1A illustrates a flowchart of a method for generating decompressed geometric data from a compression block, in accordance with one embodiment;
  • FIG. 1B illustrates a compression block structure configured to store uncompressed triangle data, in accordance with one embodiment;
  • FIG. 1C illustrates a compression block structure configured to store compressed triangle data, in accordance with one embodiment;
  • FIG. 1D illustrates a flowchart of a method for identifying a compression block based on a global identifier, in accordance with one embodiment;
  • FIG. 1E illustrates an indirection data structure comprising a plurality of indirection blocks, in accordance with one embodiment;
  • FIG. 1F illustrates an exemplary structure of an indirection block, in accordance with one embodiment;
  • FIG. 1G illustrates a geometric data processing system configured to decompress geometric data from a compression block residing within memory, in accordance with one embodiment;
  • FIG. 2 illustrates a parallel processing unit, in accordance with one embodiment;
  • FIG. 3 illustrates a general processing cluster of the parallel processing unit of FIG. 2, in accordance with one embodiment; and
  • FIG. 4 illustrates an exemplary system in which the various architecture and/or functionality of the various previous embodiments may be implemented.
  • DETAILED DESCRIPTION
  • Three-dimensional (3D) graphics rendering techniques typically represent a 3D scene as a collection of geometric primitives. Each geometric primitive may include geometric data such as vertex coordinates, texture coordinates, or any other technically relevant information. The collection of geometric primitives may be stored in a memory subsystem and accessed from the memory subsystem to render the scene. In certain embodiments of the present invention, scene rendering is performed, at least in part, by a graphics processing unit (GPU), and the collection of geometric primitives representing a given 3D scene is stored in a memory subsystem coupled to the GPU.
  • Geometric data for one or more geometric primitives may be stored within a compression block. Each compression block may correspond in size to a cache line within the GPU. The collection of geometric primitives for the 3D scene may be stored in a plurality of compression blocks, with a variable number of geometric primitives stored in any one compression block. The number of geometric primitives stored within a given compression block is a function of the similarity of the geometric data values of the associated geometric primitives. The compression blocks may be identified by a compression block number, with sequential compression blocks having corresponding sequential compression block numbers. Furthermore, sequential compression blocks may provide storage for sequentially identified geometric primitives.
  • Each geometric primitive may be identified using a unique identifier, such as a unique thirty-two bit integer value. The unique identifier may also be described as a global identifier because each value is globally unique within an identifier space for primitives. Accessing data for a given geometric primitive specified by an associated identifier involves first locating an appropriate compression block within the memory subsystem where the geometric primitive resides. A mapping data structure may be constructed to locate the appropriate compression block and data for the geometric primitive. The mapping data structure accounts for the variable number of geometric primitives stored in each compression block associated with the 3D scene.
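One way to realize such a mapping, assuming only a per-block primitive count is known, is a prefix-sum table searched by global identifier. This is a sketch for illustration; the indirection blocks of FIGS. 1E-1F may be organized differently, and both function names are invented.

```python
import bisect

# Map a global primitive identifier to a (compression block number,
# local index) pair when each block holds a variable number of primitives.

def build_block_starts(prims_per_block):
    """Prefix sums: starts[i] = global id of the first primitive in block i."""
    starts, total = [], 0
    for count in prims_per_block:
        starts.append(total)
        total += count
    return starts

def locate(global_id, starts):
    """Return (block_number, local_index) for a valid global primitive id."""
    block = bisect.bisect_right(starts, global_id) - 1
    return block, global_id - starts[block]
```

For example, with blocks holding 3, 1, and 2 primitives, global identifier 5 resolves to local index 1 of block 2; the prefix-sum table is what absorbs the variable per-block occupancy.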
  • Rendering techniques based on ray tracing may organize 3D primitives occupying a 3D space using a bounding volume hierarchy (BVH), a data structure designed to efficiently encode spatial relationships among 3D objects comprising sets of 3D primitives. Each 3D primitive within the BVH may be represented as a bounding volume, such as an axis-aligned bounding box (AABB), defined by a pair of bounding planes in each of three dimensions. Geometric primitives within a given AABB may include spatially similar coordinate positions, and corresponding numeric representations of associated geometric data, such as vertex coordinates, may include similar bit patterns. In certain usage cases, the similar coordinate positions may align to power-of-two fractional increments corresponding to an authoring tool grid resolution. Furthermore, geometric primitives associated with fans or meshes may include common vertex coordinates. Similar and common numeric information associated with geometric primitives may be identified as the basis for compression of the numeric information.
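The bit-level similarity that motivates compression can be seen directly in the IEEE-754 encodings of nearby coordinates. The following sketch (the two coordinate values are illustrative only) counts how many high bits two vertex coordinates that might fall within one AABB have in common:

```python
import struct

def float_bits(f):
    """IEEE-754 single-precision bit pattern of f, as an unsigned 32-bit int."""
    return struct.unpack('<I', struct.pack('<f', f))[0]

# Two x-coordinates of spatially close vertices (both are multiples of a
# power-of-two grid increment, as an authoring tool might produce).
a = float_bits(10.125)
b = float_bits(10.375)

# Sign, exponent, and high mantissa bits agree; only low mantissa bits differ.
shared_high_bits = 32 - (a ^ b).bit_length()
```

Here the two 32-bit encodings share their top 13 bits, so storing one value as an anchor and the other as a short low-bit offset loses nothing, which is the premise of the compressed block format described below.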
  • In one embodiment of the present invention, geometric data stored within a compression block is decompressed. Certain embodiments of the present invention implement logic circuitry within the GPU that receives a primitive identifier associated with a geometric primitive and returns geometric data for the geometric primitive. The logic circuitry may be associated with a memory controller or a processing core within the GPU to provide transparent decompression of geometric data. Compression of uncompressed geometric data may be implemented using any technically feasible technique that generates suitably formatted compression blocks.
  • FIG. 1A illustrates a flowchart of a method 100 for generating decompressed geometric data from a compression block, in accordance with one embodiment. Although method 100 is described in conjunction with the systems of FIGS. 1G-4, persons of ordinary skill in the art will understand that any system that performs method 100 is within the scope of embodiments of the present invention. In one embodiment, a decompression unit, such as decompression unit 196 of FIG. 1G, is configured to perform method 100. The decompression unit may reside within memory partition unit 280 of FIG. 2, or within any other technically feasible circuitry associated with parallel processing unit (PPU) 200 of FIG. 2. In other embodiments, the decompression unit may reside within any technically feasible functional unit or units associated with a computer system architecture. The decompression unit may be implemented using function-specific logic circuitry, such as a function-specific portion of a processing pipeline configured to perform at least method 100. Alternatively, the decompression unit may be implemented as instructions or microcode for controlling a processing unit. The instructions may be encoded within a non-transitory computer-readable medium such as a read-only solid-state memory or a programmable solid-state flash memory.
  • Method 100 begins at step 102, where the decompression unit receives a compression block configured to store a header and compressed geometric data for at least two geometric primitives. Each of the at least two geometric primitives is associated with a local index within the compression block. The local index may be determined based on a global identifier of the primitive that uniquely identifies a geometric primitive within a set of geometric primitives that collectively define a 3D scene.
  • The header may include at least one mode bit that indicates whether geometric data within the compression block is stored in an uncompressed format or in a compressed format. In certain cases, it may be desirable to store the geometric data in an uncompressed format. The uncompressed format may be compatible with a compressed format for representing other geometric data that is compressed. In other cases, the geometric data compresses according to a data-dependent compression ratio, allowing geometric data representing a variable number of geometric primitives to be stored within the compression block. An uncompressed format for representing geometric data is described in more detail in conjunction with FIG. 1B and a compressed format for representing geometric data is described in more detail in conjunction with FIG. 1C. In one embodiment, method 100 is applied to data in the compressed format illustrated in FIG. 1C.
  • Multiple compression blocks may be stored in a memory subsystem, and each of the multiple compression blocks may represent geometric data in the compressed format or the uncompressed format, as indicated by the at least one mode bit. Each of the multiple compression blocks may include geometric data for multiple geometric primitives, such as triangles. All geometric data for any one geometric primitive (e.g., one triangle) may reside entirely within one associated compression block. In the compressed format, geometric data for a varying number of geometric primitives may reside within the compression block. Consequently, geometric data for a specific geometric primitive may be located at a variable location within the compression block. In one embodiment, the variable location is a function of the number of geometric primitives represented within the compression block. The variable location, along with location information for geometric data associated with other geometric primitives within the compression block, is recorded within a topology field of the compression block.
  • At step 104, based on the first local index, the decompression unit identifies a location within the compression block of a first set of compressed geometric data for a first geometric primitive of the at least two geometric primitives. In one embodiment, the first local index is received in conjunction with receiving the compression block. In one embodiment, the first geometric primitive is a triangle and the first set of compressed geometric data comprises three vertex positions, each of which includes a three-dimensional coordinate.
  • Each three-dimensional coordinate may include three floating-point values, which may be stored in a compressed format. Each of the three floating-point values may be stored using a compressed representation of a thirty-two bit floating-point encoding. Alternatively, each three-dimensional coordinate may include three fixed-point values, three integer values, or three values defined by any technically feasible numeric representation, any of which may be stored in a compressed format. A second local index may be received in conjunction with receiving the compression block for identifying a second set of compressed geometric data for a second geometric primitive. One or more vertex positions associated with the second geometric primitive may be represented as references to equivalent vertex positions associated with the first geometric primitive.
  • At step 106, the decompression unit generates a first set of decompressed geometric data for the first geometric primitive by decompressing the first set of compressed geometric data based on at least a first anchor value. In one embodiment, the first anchor value is one of three anchor values of a three-dimensional anchor position. Each one of the three anchor values may correspond to one of the dimensions of the three-dimensional anchor position. Additional geometric primitive vertex positions may be represented using three-dimensional offsets relative to the three-dimensional anchor position. In one embodiment, the anchor position may serve as one vertex position (e.g. vertex position zero), while other vertex positions are defined as offsets relative to the anchor position. In such an embodiment, the first set of decompressed geometric data includes three vertex positions, each comprising a three-dimensional position. Each of the three vertex positions may be represented within the compression block as a three-dimensional offset position relative to the three-dimensional anchor position. Each three-dimensional offset position may be represented as a set of compressed numeric values, and each of the compressed numeric values may be compressed according to a different compression ratio.
  • FIG. 1B illustrates a compression block 140 configured to store uncompressed triangle data, in accordance with one embodiment. In such an embodiment, compression block 140 includes one-thousand twenty-four (1024 or 2^10) bits, starting at bit 0 and ending at bit 1023. As shown, compression block 140 includes a header field 148, a triangle 0 field 142, a triangle 1 field 144, and a triangle 2 field 146. Each field 142, 144, 146, 148 includes subfields, and each subfield is labeled with a bit count on a second line. For example, the subfield labeled “Mode” of header field 148 includes three bits, as indicated by the “3” on the second line for the subfield. The mode subfield specifies how to interpret other bits within compression block 140. At least one of the eight possible bit codes for the mode subfield specifies that compression block 140 should be interpreted as shown here, having data for three different triangles stored in an uncompressed format.
  • Header 148 may also include three alpha (α) bits, an MD2 subfield having 32 bits, an MD1 subfield having 32 bits, and an MD0 subfield having 32 bits. In one embodiment, subfield MD2 stores an application-specific triangle metadata value associated with a triangle 2, subfield MD1 stores a triangle metadata value associated with a triangle 1, and subfield MD0 stores a triangle metadata value associated with triangle 0. Furthermore, each of the three alpha bits may indicate whether a corresponding triangle (e.g., triangle 2, triangle 1, triangle 0) is fully opaque (or, alternatively, partially transparent).
  • Geometric data for triangle 0 is stored within triangle 0 field 142, geometric data for triangle 1 is stored within triangle 1 field 144, and geometric data for triangle 2 is stored within triangle 2 field 146. As shown, triangle 0 field 142 includes three vertices. A first of the three vertices may include coordinates (X0, Y0, Z0), specified by corresponding 32-bit values. A second of the three vertices may include coordinates (X1, Y1, Z1), specified by corresponding 32-bit values. A third of the three vertices may include coordinates (X2, Y2, Z2), specified by corresponding 32-bit values.
  • In other embodiments, compression block 140 may include a different number of bits specified as a power of two, such as 512 (2^9) bits, 2048 (2^11) bits, or 4096 (2^12) bits. In alternative embodiments, compression block 140 may include a number of bits that is not an integer power of two. In certain embodiments, the number of bits included within compression block 140 corresponds to the number of bits included within a cache line of an associated processing unit. Compression block 140 is structured to be compatible with other formats that store geometric data in a compressed format, as illustrated below in FIG. 1C.
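As a consistency check on the uncompressed layout, the subfield sizes stated above can be totaled against the 1024-bit block. This is a worked example using only sizes given in the text; any remainder is unused padding.

```python
# Bit budget of the uncompressed layout of FIG. 1B, using the subfield
# sizes stated in the text: mode (3 bits), alpha (3 bits), MD2/MD1/MD0
# (32 bits each), and nine 32-bit vertex coordinates per triangle.

HEADER_BITS = 3 + 3 + 3 * 32        # mode + alpha + MD2/MD1/MD0 = 102 bits
TRIANGLE_BITS = 3 * 3 * 32          # 3 vertices x 3 coords x 32 bits = 288 bits
BLOCK_BITS = 1024

used = HEADER_BITS + 3 * TRIANGLE_BITS   # three uncompressed triangles
spare = BLOCK_BITS - used                # leftover bits are padding
```

The three triangles plus the header consume 966 of the 1024 bits, leaving 58 bits unused, which is consistent with a fixed three-triangle uncompressed mode.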
  • FIG. 1C illustrates a compression block 150 configured to store compressed triangle data, in accordance with one embodiment. In such an embodiment, compression block 150 includes one-thousand twenty-four (1024 or 2^10) bits, starting at bit 0 and ending at bit 1023. As shown, compression block 150 includes a vertex positions field 152, a topology field 154, and a header field 156.
  • Header field 156 includes a precision subfield 160, a number of triangles subfield 161, a shift subfield 162, and a mode subfield 163. Consistent with compression block 140 of FIG. 1B, mode subfield 163 includes three bits and specifies how to interpret the remaining bits of compression block 150 from a set of enumerated compression block formats. Compression block 140 illustrates one such format, and compression block 150 illustrates another such format. Mode subfield 163 and the mode subfield of compression block 140 occupy the same upper bits (bit 1023, bit 1022, and bit 1021) of a 1024 bit compression block format. In other embodiments, mode subfield 163 may include a different number of bits.
  • Precision subfield 160 includes subfields P.X, P.Y, P.Z, and P.MD. In one embodiment, the subfields P.X, P.Y, P.Z, and P.MD of precision subfield 160 each include five bits. Precision subfield P.X specifies a number of bits for representing vertex position offsets in the x-dimension within compression block 150, precision subfield P.Y specifies a number of bits for representing vertex position offsets in the y-dimension within compression block 150, and precision subfield P.Z specifies a number of bits for representing vertex position offsets in the z-dimension within compression block 150. Precision subfield P.MD specifies a number of bits for a triangle metadata offset. The number of triangles stored within compression block 150 is indicated by the number of triangles subfield 161. Precision subfields P.X, P.Y, P.Z, and P.MD, along with number of triangles subfield 161, may store a given value represented by the value minus one. For example, to indicate eight bits of precision for position offsets in the x-dimension, precision subfield P.X may store a value of seven. In one embodiment, shift subfield 162 indicates the lowest bit position affected when position offsets 169 are combined with values in vertex position anchor subfield 167.
  • Vertex positions field 152 includes a vertex position anchor subfield 167 and a vertex position offset subfield 168. As shown, vertex position anchor subfield 167 includes subfields for X, Y, and Z. In one embodiment, vertex position anchor subfield 167 comprises the three-dimensional anchor position of FIG. 1A. Furthermore, each of the subfields X, Y, Z within the vertex position anchor subfield 167 may represent a thirty-two bit floating-point value. Three-dimensional position offsets 169 from the three-dimensional position anchor are represented by X, Y, and Z offsets within vertex position offset subfield 168. Each three-dimensional position offset 169 represents a corresponding vertex position within a three-dimensional space. Position offset 169(1) is associated with vertex position one, position offset 169(2) with vertex position two, and so forth. Each position offset 169 may be combined with the three-dimensional anchor position to generate a corresponding vertex position. The three-dimensional anchor position may be associated with vertex position zero so that a reference to vertex position zero refers to the anchor position.
  • In one embodiment, a vertex position x coordinate is generated by combining a vertex position offset x value 169 with a vertex position anchor x value from vertex position anchor 167 by replacing the P.X lowest bits of the vertex position anchor x value by the vertex position offset x value 169, a vertex position y coordinate is generated by combining a vertex position offset y value 169 with a vertex position anchor y value from vertex position anchor 167 by replacing the P.Y lowest bits of the vertex position anchor y value by the vertex position offset y value 169, and a vertex position z coordinate is generated by combining a vertex position offset z value 169 with a vertex position anchor z value from vertex position anchor 167 by replacing the P.Z lowest bits of the vertex position anchor z value by the vertex position offset z value 169.
  • In another embodiment, a vertex position x coordinate is generated by combining a vertex position offset x value 169 with a vertex position anchor x value from vertex position anchor 167 by replacing bits SHIFT 162 . . . SHIFT+P.X−1 of the vertex position anchor x value by the vertex position offset x value 169, a vertex position y coordinate is generated by combining a vertex position offset y value 169 with a vertex position anchor y value from vertex position anchor 167 by replacing bits SHIFT 162 . . . SHIFT+P.Y−1 of the vertex position anchor y value by the vertex position offset y value 169, and a vertex position z coordinate is generated by combining a vertex position offset z value 169 with a vertex position anchor z value from vertex position anchor 167 by replacing bits SHIFT 162 . . . SHIFT+P.Z−1 of the vertex position anchor z value by the vertex position offset z value 169.
  • In yet another embodiment, a vertex position x coordinate is generated by combining a vertex position offset x value 169 with a vertex position anchor x value from vertex position anchor 167 by performing a binary integer addition of vertex position anchor x value and the vertex position offset x value 169, a vertex position y coordinate is generated by combining a vertex position offset y value 169 with vertex position anchor y value 167 by performing a binary integer addition of vertex position anchor y value and vertex position offset y value 169, and a vertex position z coordinate is generated by combining a vertex position offset z value 169 with vertex position anchor z value 167 by performing a binary integer addition of the vertex position anchor z value and the vertex position offset z value 169. Optionally, the vertex position offset values 169 may be sign-extended to 32 bits before the binary integer addition is performed.
  • In still yet another embodiment, a vertex position x coordinate is generated by combining a vertex position offset x value 169 with a vertex position anchor x value from vertex position anchor 167 by performing a binary integer addition of the vertex position offset x value 169 shifted left by a number of bit positions specified by a shift value stored in the SHIFT subfield 162 and a vertex position anchor x value from vertex position anchor 167, a vertex position y coordinate is generated by combining a vertex position offset y value 169 with a vertex position anchor y value from vertex position anchor 167 by performing a binary integer addition of the vertex position offset y value 169 shifted left by a number of bit positions specified by a shift value stored in the SHIFT subfield 162 and a vertex position anchor y value from vertex position anchor 167, a vertex position z coordinate is generated by combining a vertex position offset z value 169 with a vertex position anchor z value from vertex position anchor 167 by performing a binary integer addition of the vertex position offset z value 169 shifted left by a number of bit positions specified by a shift value stored in the SHIFT subfield 162 and a vertex position anchor z value from vertex position anchor 167. Optionally, the vertex position offset values 169 may be sign-extended to 32 bits before the left shift and the binary integer addition are performed.
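The four combining embodiments above may be sketched in Python as follows. This is an illustrative sketch only: the function names are not part of the disclosed format, and the bit manipulation is performed on the 32-bit IEEE 754 bit patterns of the anchor coordinates, as the embodiments describe.

```python
import struct

def float_bits(f):
    """Reinterpret a 32-bit float as its unsigned-integer bit pattern."""
    return struct.unpack("<I", struct.pack("<f", f))[0]

def bits_float(u):
    """Reinterpret a 32-bit unsigned-integer bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", u & 0xFFFFFFFF))[0]

def combine_replace(anchor, offset, precision, shift=0):
    """Replace bits shift .. shift+precision-1 of the anchor's bit pattern
    with the stored offset (the first two embodiments; shift=0 gives the
    lowest-bits variant)."""
    mask = ((1 << precision) - 1) << shift
    a = float_bits(anchor)
    return bits_float((a & ~mask) | ((offset << shift) & mask))

def combine_add(anchor, offset, shift=0):
    """Binary integer addition of the (optionally left-shifted) offset to
    the anchor's bit pattern (the last two embodiments)."""
    return bits_float(float_bits(anchor) + (offset << shift))
```

For example, replacing the eight lowest bits of the anchor 1.0 (bit pattern 0x3F800000) with the offset 0x7F yields the bit pattern 0x3F80007F; the same offset applied by integer addition yields a nearby but distinct value.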
  • Each vertex position may be referenced by one or more triangles stored within the compression block 150. For example, a first triangle forming a quad may share two vertices with a second triangle forming the quad. If the first triangle and the second triangle are stored within the same compression block 150, then vertex position information for each of the two shared vertices need only be stored once within compression block 150. Vertex positions for the first triangle may include references to the two shared vertex positions as well as a reference to a third vertex position. Similarly, vertex positions for the second triangle may include references to the two shared vertex positions as well as a reference to a fourth vertex position. In total, the quad needs only four vertex positions represented within compression block 150 rather than six because two are shared.
  • Each dimension of each vertex position offset may include a different number of bits of precision. For example, the x-dimension offset may be specified by a number of bits shown as P.X, which corresponds to a value (stored as the value minus one) in the X subfield of precision subfield 160 of header field 156. Very different precision may be required in each dimension, based on triangle positions. In one data-dependent scenario, a set of vertex positions may be narrow in the x-dimension, but wider in the y-dimension and z-dimensions. In such a situation, the x-dimension may require fewer bits of precision to represent an offset from the vertex anchor without loss.
  • Topology field 154 associates triangles with vertex position data. Each triangle may be associated with an application-specific triangle metadata (MD) value. A triangle metadata anchor subfield 166 indicates an anchor value for triangle metadata values for triangles stored within compression block 150. In one embodiment, triangle metadata anchor subfield 166 includes a thirty-two bit value. Triangle metadata offset subfield 164 includes a set of offset values that may be used in conjunction with triangle metadata anchor subfield 166 for associating a metadata value for each triangle stored within compression block 150. Each metadata offset value includes a number of bits specified by the P.MD subfield of precision subfield 160. For example, if the P.MD subfield specifies five bits, then each subfield within the triangle metadata offset subfield 164 includes five bits. In such an example, thirty-two bit metadata values for triangles stored within compression block 150 may be represented using only five bits each rather than thirty-two bits each. Each triangle stored within compression block 150 may be identified by a local index. The metadata value for triangle zero (local index=0) may be equal to the value of metadata anchor field 166. A metadata value for each remaining triangle (local index=1, 2, . . . ) within compression block 150 may be calculated by combining the value of triangle metadata anchor subfield 166 and a corresponding metadata offset from triangle metadata offset subfield 164.
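The anchor-plus-offset reconstruction of triangle metadata may be sketched as follows. The text leaves the combining operation open, so binary addition is assumed here; the function name is illustrative.

```python
def triangle_metadata(md_anchor, md_offsets, local_index):
    """Return the 32-bit metadata value for a triangle in a block.
    Triangle 0 (local index 0) takes the anchor directly; each remaining
    triangle combines the anchor with its stored offset.  Binary addition
    is an assumption -- the disclosure says only that the values are
    "combined"."""
    if local_index == 0:
        return md_anchor
    return (md_anchor + md_offsets[local_index - 1]) & 0xFFFFFFFF
```

With a five-bit P.MD, each offset spans 0 to 31, so all metadata values in the block must lie within that range of the anchor for this scheme to apply.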
  • In one embodiment, vertex IDs subfield 165 includes an alpha (a) bit for each triangle within compression block 150 to indicate whether the triangle is fully opaque (or, alternatively, partially transparent). Furthermore, a set of three vertex indices is included within vertex IDs subfield 165 for each triangle 1 through M−1 within compression block 150. The three vertex indices of triangle 0 within compression block 150 may be fixed to values 0, 1, and 2. In one embodiment, each vertex index within a set of three vertex indices is allocated four bits (twelve bits per triangle), providing an index space for referencing sixteen different vertex positions. For a given triangle, a first vertex position is determined by a first vertex index into vertex positions field 152. A second vertex position is determined by a second vertex index into vertex positions field 152, and a third vertex position is determined by a third vertex index into vertex positions field 152.
  • In data-dependent scenarios where vertex positions are shared among triangles, as is common in meshes and fans, more triangles may fit within compression block 150 because fewer vertex positions may be needed per triangle. Furthermore, in scenarios where vertex positions may be represented as relatively small offsets to the vertex position anchor, fewer bits may be needed per vertex position offset 169, and more triangles may fit within compression block 150. In certain scenarios, vertex positions are snapped to a grid, whereby lower mantissa values for the vertex positions are constant, thereby requiring fewer bits to represent position offsets 169. In each data-dependent scenario, subfields within header field 156 may be written to indicate an appropriate number of bits needed to represent vertex positions and an appropriate number of triangles. Furthermore, subfields within vertex position field 152 and topology field 154 are adjusted to be properly and contiguously packed. For example, triangle metadata offset subfield 164 includes (M−1)*P.MD bits, where M is a number of triangles and P.MD is the number of bits specified by the MD subfield within the precision subfield 160. Furthermore, vertex IDs subfield 165 includes (M−1)*13+1 bits. Additionally, vertex position offset subfield 168 includes (N−1)*(P.X+P.Y+P.Z) bits, where N corresponds to the total number of vertex positions represented in the compression block 150. Consequently, a highly variable (three to sixteen) number of triangles may fit within compression block 150. In other embodiments, compression block 150 includes a larger number of bits (e.g. 2048, 4096), and more triangles may be stored therein.
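The bit budget above can be checked with a short sketch. The variable-size subfield costs follow the formulas in the text; the widths assumed for the fixed header subfields (a 4-bit triangle count and a 5-bit shift) are assumptions, since the disclosure does not spell them out.

```python
def fits_in_block(m, n, p_x, p_y, p_z, p_md, block_bits=1024):
    """Check whether M triangles over N vertex positions pack into one
    compression block."""
    fixed = 3 * 32       # vertex position anchor X, Y, Z (32-bit floats)
    fixed += 32          # triangle metadata anchor
    fixed += 4 * 5       # precision subfields P.X, P.Y, P.Z, P.MD
    fixed += 3           # mode subfield
    fixed += 4 + 5       # assumed triangle-count and shift subfield widths
    variable = (m - 1) * p_md                 # metadata offsets
    variable += (m - 1) * 13 + 1              # vertex IDs plus alpha bits
    variable += (n - 1) * (p_x + p_y + p_z)   # vertex position offsets
    return fixed + variable <= block_bits
```

Under these assumptions, three triangles with full 23-bit mantissa offsets fit easily, while sixteen triangles with 23-bit offsets and 32-bit metadata offsets do not, illustrating why the triangle count is data-dependent.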
  • The process of generating compression block 150 may be performed using any technically feasible technique. For example, in ray-tracing systems that implement a bounding volume hierarchy (BVH) tree, triangles are organized according to spatial locality. In such a system, generating a compression block 150 with candidate triangles for compression involves linearly scanning through a list of triangles in BVH leaf order and adding sequential triangles to a compression block until no more triangles can fit. That is, if T triangles may be successfully encoded into the compression block then encoding T+1 triangles is attempted. If encoding T+1 triangles succeeds, then encoding T+2 triangles is attempted, and so on. When encoding one more triangle fails, then the previous encoding is used. Each compression block 150 is self-contained in that vertex positions for each triangle are available. This approach preserves locality in that triangles that were near to each other in an uncompressed representation remain near each other after compression. In one embodiment, each compression block 150 is sized according to a host system's cache line size. In the above examples, this size is assumed to be 1024 bits, but the teachings disclosed herein will be readily understood by persons of ordinary skill in the art as being equally applicable to smaller cache lines (e.g., 512 bits) and larger cache lines (2048 bits, 4096 bits, or more).
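The greedy scan described above may be sketched as follows. The `try_encode` callback is hypothetical: it stands in for any encoder that returns an encoded block, or None when the given triangles do not fit; a single triangle is assumed to always fit.

```python
def pack_blocks(triangles, try_encode):
    """Greedy packing in BVH leaf order: whenever T triangles encode
    successfully, attempt T+1; on the first failure, emit the last
    successful encoding and start a new block with the next triangle."""
    blocks, start = [], 0
    while start < len(triangles):
        count = 1
        encoded = try_encode(triangles[start:start + 1])
        while start + count < len(triangles):
            attempt = try_encode(triangles[start:start + count + 1])
            if attempt is None:
                break
            encoded, count = attempt, count + 1
        blocks.append(encoded)
        start += count
    return blocks
```

Because each block takes a contiguous run of the BVH leaf order, triangles that were near each other before compression stay near each other afterward, as the text notes.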
  • Two different techniques may be used for encoding compression block 150. The first technique involves opportunistic indexing. The second technique is lossless delta encoding of vertex positions. Each of the two techniques may be implemented alone or in combination. In one embodiment, both techniques are implemented to generate compression block 150.
  • Opportunistic indexing involves checking whether a newly added triangle shares any vertices with any other triangles already added to compression block 150. If any vertices are shared, then the shared vertex positions are referenced rather than explicitly added as new vertex position offsets 169 when the newly added triangle is included in compression block 150.
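Opportunistic indexing may be sketched as a deduplicating append; the function and variable names are illustrative, not part of the disclosed format.

```python
def add_triangle(block_vertices, index_of, triangle):
    """Add one triangle's vertex positions to a block under construction.
    A position already present in the block is referenced by its existing
    index; a new position is appended and assigned the next index."""
    indices = []
    for pos in triangle:
        if pos not in index_of:
            index_of[pos] = len(block_vertices)
            block_vertices.append(pos)
        indices.append(index_of[pos])
    return indices
```

Applied to the quad example above, two triangles sharing an edge store four vertex positions rather than six, and the second triangle's vertex IDs simply reference the shared positions.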
  • In one embodiment, lossless delta encoding involves encoding floating-point values associated with a particular dimension (x, y, z) relative to corresponding values for the anchor position. For example, encoding a new floating-point value associated with the x-dimension involves encoding the new value relative to a floating-point anchor position value for the x-dimension. If the new value and the anchor position value are close to each other, their binary representations typically differ only in some number of the lowest-order bits. For example, a bit-wise difference between two nearby floating-point values frequently requires less than twenty-three bits. The X subfield of precision subfield 160 (P.X) indicates how many bits are necessary to store all vertex position offsets in the x-dimension without loss for triangles stored within compression block 150. Similarly, the Y subfield of precision subfield 160 (P.Y) indicates how many bits are necessary to store all vertex position offsets in the y-dimension without loss for triangles stored within compression block 150, and the Z subfield of precision subfield 160 (P.Z) indicates how many bits are necessary to store all vertex position offsets in the z-dimension without loss. The value for each subfield P.X, P.Y, and P.Z is data-dependent and may vary accordingly.
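The per-dimension precision P.X (and likewise P.Y, P.Z) can be computed as the highest bit in which any value's bit pattern differs from the anchor's, for the SHIFT = 0, replace-lowest-bits variant. A sketch (names illustrative):

```python
import struct

def float_bits(f):
    """Reinterpret a 32-bit float as its unsigned-integer bit pattern."""
    return struct.unpack("<I", struct.pack("<f", f))[0]

def required_precision(anchor, values):
    """Smallest P such that replacing the P lowest bits of the anchor's
    bit pattern reproduces every value exactly: the position of the
    highest differing bit across all values (0 if all equal the anchor)."""
    a = float_bits(anchor)
    return max(((a ^ float_bits(v)).bit_length() for v in values), default=0)
```

Nearby floating-point values typically differ from the anchor only in the low mantissa bits, so the result is frequently well under twenty-three bits, as the text observes.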
  • While the embodiments described above illustrate an implementation for a floating-point geometric data type for vertex position information, persons skilled in the art will recognize that the teachings disclosed herein may also be applied equally to other data types such as fixed-point and integer data types.
  • The above technique for compressing vertex data into compression blocks generates compression blocks that contain a potentially variable number of vertices and triangles each. If random access to this data is needed, a mechanism for mapping global primitive indices to compression blocks and further to individual triangle primitives through local indices within the compression blocks is needed. This mapping is achieved using an indirection data structure that comprises a set of indirection blocks. Each such indirection block stores a header field and a payload field that includes one bit per triangle. The one bit per triangle indicates whether a corresponding triangle begins a new compression block. The header identifies a compression block by index and a local index within the identified compression block for the first triangle of the indirection block.
  • FIG. 1D illustrates a flowchart of a method 170 for identifying a compression block based on a global identifier, in accordance with one embodiment. Although method 170 is described in conjunction with the systems of FIGS. 1G-4, persons of ordinary skill in the art will understand that any system that performs method 170 is within the scope of embodiments of the present invention. In one embodiment, a decompression unit, such as decompression unit 196 of FIG. 1G, is configured to perform method 170. The decompression unit may reside within memory partition unit 280 of FIG. 2, or within any other technically feasible circuitry associated with parallel processing unit (PPU) 200 of FIG. 2. In other embodiments, the decompression unit may reside within any technically feasible functional unit or units associated with a computer system architecture. The decompression unit may be implemented using function-specific logic circuitry, such as a function-specific portion of a processing pipeline configured to perform at least method 170. In one embodiment, the decompression unit is realized by reconfigurable logic that may include (but is not restricted to) field programmable gate arrays (FPGAs). Alternatively, the decompression unit may be implemented as instructions or microcode for controlling a processing unit. The instructions may be encoded within a non-transitory computer-readable medium such as a read-only solid-state memory or a programmable solid-state flash memory.
  • Method 170 begins at step 172, where the decompression unit receives a global identifier associated with a geometric primitive. In one embodiment, the global identifier comprises a global triangle index and the geometric primitive comprises a triangle. At step 174, the decompression unit identifies an indirection block based on the global identifier. At step 175, the decompression unit identifies a compression block and a local index based on the global identifier and the indirection block. An exemplary data structure for implementing method 170 is described below in FIGS. 1E and 1F.
  • FIG. 1E illustrates an indirection data structure 180 comprising a plurality of indirection blocks 182, in accordance with one embodiment. Each indirection block 182 may include a number of bits equal to a cache line size. For example, each indirection block 182 may include one-thousand twenty-four (1024) bits for systems with cache lines sized to have 1024 bits. Indirection data structure 180 may include a number of indirection blocks 182 depending on the number of global identifiers needed to represent a complete scene. Each sequential indirection block 182 may be identified as having an indirection block number, and may be disposed in corresponding contiguous memory addresses, or further mapped through another level of indirection that maps an indirection block number to an indirection block 182 at a memory location.
  • FIG. 1F illustrates an exemplary structure of an indirection block 182, in accordance with one embodiment. As shown, each indirection block 182 includes a header field 184 and a payload field 186. Header field 184 comprises a compression block index subfield 187 and a local index subfield 188. Payload field 186 includes a number of bits (P) equal to line size (L, e.g. 1024 bits) minus header size (H). Each payload bit corresponds to a unique global identifier number and may further correspond to a geometric primitive identified by the global identifier number. Within any one indirection block 182, bits within payload field 186 may be identified as payload bit 0 or PB[0] through payload bit P−1 or PB[P−1].
  • To allocate a sufficient number of indirection blocks 182 to provide one payload bit per global identifier number in a scene, indirection data structure 180 may include a number of indirection blocks 182 equal to the number of global identifiers in the scene divided by the number of payload bits (P), with the resulting quotient rounded up to the next integer. In such a configuration, a first indirection block 182(0) includes P payload bits corresponding to global identifier values from 0 to P−1; a second indirection block 182(1) includes P payload bits corresponding to global identifier values from P to 2P−1; a third indirection block 182(2) includes P payload bits corresponding to global identifier values from 2P to 3P−1, and so forth. A fixed mapping from a global identifier (e.g., a global triangle index) to an indirection block 182 may be performed by dividing the global identifier by the number of payload bits (P) and rounding the quotient down to the nearest integer. In one embodiment, the fixed mapping is performed in step 174 of method 170.
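The allocation and the fixed mapping above reduce to a ceiling division and a floor division, respectively. A sketch (the payload width of 992 bits used in the example below assumes a hypothetical 32-bit header within a 1024-bit line):

```python
def num_indirection_blocks(num_ids, payload_bits):
    """One payload bit per global identifier, rounded up to whole blocks."""
    return (num_ids + payload_bits - 1) // payload_bits

def indirection_block_of(global_id, payload_bits):
    """Fixed mapping: divide by P and round the quotient down."""
    return global_id // payload_bits
```

For example, with P = 992, a scene of 5000 global identifiers needs six indirection blocks, and every identifier in the range 4P to 5P−1 maps to indirection block 182(4).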
  • The mapping from a global identifier to an indirection block 182 is fixed and direct. For example, a global identifier within the range of 4P to 5P−1 will map directly to a fifth indirection block 182(4). However, mapping the global identifier further from indirection block 182(4) to a specific compression block 150 is variable because a variable number of compression blocks 150 may be needed to store geometric data for the geometric primitives (e.g. triangles) preceding, in order, the geometric primitive identified by the global identifier number. Such variability depends on the actual geometric data values and their compressibility.
  • As shown, indirection block 182(4) includes payload bits PB[0] through PB[P−1], corresponding to global identifiers 4P through 5P−1. To map an arbitrary global identifier in the range 4P to 5P−1, the payload bits of indirection block 182(4) need to be examined along with header field 184. A compression block 150 is identified by compression block index subfield 187. Global identifiers ranging from 4P to the first occurrence of a payload bit value of one (“1”) map into this identified compression block 150. Local index subfield 188 indicates how many global identifiers are mapped to the identified compression block 150 from a prior indirection block 182. Thus, local index subfield 188 provides an offset for locating geometric data within the identified compression block 150.
  • In one example, compression block index subfield 187 contains the value one-hundred ninety-seven (“197”) and local index subfield 188 contains the value three (“3”). As shown, a global identifier with value 4P maps to compression block number “197”, with a local index of “3”. In other words, geometric objects 0, 1, and 2 stored in compression block number “197” are associated with a previous indirection block 182 mapping. Furthermore, global identifiers 4P through 4P+4 are mapped to compression block number “197” with corresponding local index values of “3” through “7”, respectively. Global identifiers in the range 4P+5 through 4P+13 map to compression block number “198” with local index values “0” through “8”, respectively. For example, global identifier 4P+9 may be assigned a local index value of “4”. Global identifier 4P+14 maps to compression block number “199”, and so forth.
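The scan implied by this worked example may be sketched as follows. The interpretation (a 1 bit marks a triangle that begins a new compression block, resetting the local index; the header seeds the scan for the indirection block's first triangle) is reconstructed from the example above, and the parameter names are illustrative.

```python
def locate(global_id, hdr_cb_index, hdr_local_index, payload):
    """Map a global identifier to (compression block index, local index)
    by walking the payload bits up to the identifier's slot: a 1 bit
    advances to the next compression block with local index 0; a 0 bit
    continues the current block."""
    b = global_id % len(payload)
    cb, local = hdr_cb_index, hdr_local_index
    for i in range(1, b + 1):
        if payload[i]:
            cb, local = cb + 1, 0
        else:
            local += 1
    return cb, local
```

Seeding the scan with compression block “197” and local index “3”, with 1 bits at payload positions 5 and 14, reproduces the mappings in the example: 4P+9 lands in block “198” at local index “4”, and 4P+14 begins block “199”.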
  • In one embodiment, method 170 performs step 175 in the context of the above description for indirection data structure 180 and indirection block 182.
  • FIG. 1G illustrates a geometric data processing system 190 configured to decompress geometric data from compression blocks residing within memory, in accordance with one embodiment. As shown, geometric data processing system 190 comprises a decompression unit 196 coupled to a processing unit 198 and to a memory interface 194, which may be further coupled to a memory subsystem 192. In one embodiment, one or more of the decompression unit 196, the processing unit 198, and the memory interface 194 is realized by reconfigurable logic that may include (but is not restricted to) FPGAs. Processing unit 198 may include a multi-threaded processor, such as a multi-threaded processor comprising a graphics processing unit (GPU). In one embodiment, processing unit 198 is configured to perform graphics rendering based on ray-tracing of scene data comprising triangles that are organized within a BVH. Data for the triangles may be stored in a compressed format within compression blocks 150, as described in FIG. 1C. Furthermore, certain data for the triangles within the BVH may be stored in an uncompressed format within compression blocks 140, as described in FIG. 1B. Triangles stored in compression blocks 140 may be poorly suited for compression using the techniques disclosed herein, while triangles stored in compression blocks 150 may be more suitable for compression.
  • Processing unit 198 may generate access requests 195 to receive decompressed triangle data 197 corresponding to compressed triangle vertex data residing within compression blocks 150. Access requests 195 may comprise a global triangle index per triangle requested. Alternatively, an access request may include a compression block index and a local index for embodiments where direct access to compression blocks is provided without indirection. Decompressed triangle data 197 may comprise three-dimensional vertex position information represented as numeric values in each of three dimensions. As discussed previously, the numeric values may be represented as floating-point numbers. Decompression unit 196 may perform method 170 to identify a specific compression block 150 as an access request target based on a global triangle index. Method 170 may access indirection data structure 180 to identify the specific compression block 150. Decompression unit 196 may then perform method 100 to decompress vertex data to generate decompressed triangle data 197.
  • Memory interface 194 may operate to receive access requests from decompression unit 196 and generate appropriate media-specific signals 193, such as DRAM control protocol signals for accessing memory subsystem 192. In certain embodiments, decompression unit 196 resides within a memory control subsystem, such as a memory partition unit 280(U) of FIG. 2. Memory interface 194 may further include an additional port (not shown) for receiving conventional access requests from processing unit 198. Memory interface 194 may include cache memory for caching blocks of data residing within memory subsystem 192. Similarly, decompression unit 196 and processing unit 198 may each include cache memory for caching related data, such as decompressed triangle data 197.
  • More illustrative information will now be set forth regarding various optional architectures and features with which the foregoing framework may or may not be implemented, per the desires of the user. It should be strongly noted that the following information is set forth for illustrative purposes and should not be construed as limiting in any manner. Any of the following features may be optionally incorporated with or without the exclusion of other features described.
  • SYSTEM OVERVIEW
  • FIG. 2 illustrates a parallel processing unit (PPU) 200, in accordance with one embodiment. In one embodiment, the PPU 200 is a multi-threaded processor that is implemented on one or more integrated circuit devices. The PPU 200 is a latency hiding architecture designed to process a large number of threads in parallel. A thread (i.e., a thread of execution) is an instantiation of a set of instructions configured to be executed by the PPU 200. In one embodiment, the PPU 200 is a graphics processing unit (GPU) configured to implement a graphics rendering pipeline for processing three-dimensional (3D) graphics data in order to generate two-dimensional (2D) image data for display on a display device such as a liquid crystal display (LCD) device. In other embodiments, the PPU 200 may be utilized for performing general-purpose computations. While one exemplary parallel processor is provided herein for illustrative purposes, it should be strongly noted that such processor is set forth for illustrative purposes only, and that any processor may be employed to supplement and/or substitute for the same.
  • As shown in FIG. 2, the PPU 200 includes an Input/Output (I/O) unit 205, a host interface unit 210, a front end unit 215, a compute scheduler unit (CSU) 220, a compute work distribution unit (CWDU) 225, a graphics primitive distribution unit (GPDU) 230, a hub 235, a crossbar (Xbar) 270, one or more general processing clusters (GPCs) 250, and one or more memory partition units 280. The PPU 200 may be connected to a host processor or other peripheral devices via a system bus 202. The PPU 200 may also be connected to a local memory comprising a number of memory devices 204. In one embodiment, the local memory may comprise a number of dynamic random access memory (DRAM) devices.
  • The I/O unit 205 is configured to transmit and receive communications (i.e., commands, data, etc.) from a host processor (not shown) over the system bus 202. The I/O unit 205 may communicate with the host processor directly via the system bus 202 or through one or more intermediate devices such as a memory bridge. In one embodiment, the I/O unit 205 implements a Peripheral Component Interconnect Express (PCIe) interface for communications over a PCIe bus. In alternative embodiments, the I/O unit 205 may implement other types of well-known interfaces for communicating with external devices.
  • The I/O unit 205 is coupled to a host interface unit 210 that decodes packets received via the system bus 202. In one embodiment, the packets represent commands configured to cause the PPU 200 to perform various operations. The host interface unit 210 transmits the decoded commands to various other units of the PPU 200 as the commands may specify. For example, some commands may be transmitted to the front end unit 215. Other commands may be transmitted to the hub 235 or other units of the PPU 200 such as one or more copy engines, a video encoder, a video decoder, a power management unit, etc. (not explicitly shown). In other words, the host interface unit 210 is configured to route communications between and among the various logical units of the PPU 200.
  • In one embodiment, a program executed by the host processor encodes a command stream in a buffer that provides workloads to the PPU 200 for processing. A workload may comprise a number of instructions and pointers to data to be processed by those instructions. The buffer is a region in a memory that is accessible (i.e., read/write) by both the host processor and the PPU 200. For example, the host interface unit 210 may be configured to access the buffer in a system memory connected to the system bus 202 via memory requests transmitted over the system bus 202 by the I/O unit 205. In one embodiment, the host processor writes the command stream to the buffer and then transmits a pointer to the start of the command stream to the PPU 200. The host interface unit 210 manages the scheduling of instructions from one or more command streams written by the host processor (i.e., channels) on the various sub-units of the PPU 200.
  • The front end unit 215 receives instructions from the host interface unit 210 from one or more command streams and forwards those instructions to the correct sub-unit of the PPU 200. Instructions associated with a compute pipeline may be received by the front end unit 215. These compute instructions are then forwarded to a compute scheduler unit 220. The compute scheduler unit 220 is configured to track state information related to the various tasks managed by the compute scheduler unit 220. The state may indicate which GPC 250 a task is assigned to, whether the task is active or inactive, a priority level associated with the task, and so forth. The compute scheduler unit 220 manages the execution of a plurality of tasks on the one or more GPCs 250.
  • The compute scheduler unit 220 is coupled to a compute work distribution unit 225 that is configured to dispatch tasks for execution on the GPCs 250. The compute work distribution unit 225 may track a number of scheduled tasks received from the compute scheduler unit 220. In one embodiment, the compute work distribution unit 225 manages a pending task pool and an active task pool for each of the GPCs 250. The pending task pool may comprise a number of slots (e.g., 16 slots) that contain tasks assigned to be processed by a particular GPC 250. The active task pool may comprise a number of slots (e.g., 4 slots) for tasks that are actively being processed by the GPCs 250. As a GPC 250 finishes the execution of a task, that task is evicted from the active task pool for the GPC 250 and one of the other tasks from the pending task pool is selected and scheduled for execution on the GPC 250. If an active task has been idle on the GPC 250, such as while waiting for a data dependency to be resolved, then the active task may be evicted from the GPC 250 and returned to the pending task pool while another task in the pending task pool is selected and scheduled for execution on the GPC 250.
  • Returning to the front end unit 215, instructions associated with a graphics pipeline may be received by the front end unit 215. These graphics instructions are then forwarded to a graphics primitive distribution unit 230. The graphics primitive distribution unit 230 fetches vertex data from the memory 204 or the system memory via the system bus 202 for various graphics primitives. Graphics primitives may include points, lines, triangles, quads, triangle strips, and the like. The graphics primitive distribution unit 230 groups the vertices into batches of primitives and dispatches tasks to the GPCs 250 for processing the batches of primitives. Processing may involve executing a shader (i.e., a Vertex Shader, Tessellation Shader, Geometry Shader, etc.) on a programmable processing unit as well as performing fixed function operations on the vertices such as clipping, culling, and viewport transformation using a fixed function unit.
  • The compute work distribution unit 225 and the graphics primitive distribution unit 230 communicate with the one or more GPCs 250 via an XBar 270. The XBar 270 is an interconnect network that couples many of the units of the PPU 200 to other units of the PPU 200. For example, the XBar 270 may be configured to couple the compute work distribution unit 225 to a particular GPC 250. Although not shown explicitly, one or more other units of the PPU 200 are coupled to the host interface unit 210. The other units may also be connected to the XBar 270 via a hub 235.
  • The tasks associated with the compute pipeline are managed by the compute scheduler unit 220 and dispatched to a GPC 250 by the compute work distribution unit 225. The tasks associated with the graphics pipeline are managed and distributed to a GPC 250 by the graphics primitive distribution unit 230. The GPC 250 is configured to process the tasks and generate results. The results may be consumed by other tasks within the GPC 250, routed to a different GPC 250 via the XBar 270, or stored in the memory 204. The results can be written to the memory 204 via the memory partition units 280, which implement a memory interface for reading and writing data to/from the memory 204. In one embodiment, the PPU 200 includes a number U of memory partition units 280 that is equal to the number of separate and distinct memory devices 204 coupled to the PPU 200.
  • In one embodiment, a host processor executes a driver kernel that implements an application programming interface (API) that enables one or more applications executing on the host processor to schedule operations for execution on the PPU 200. An application may generate instructions (i.e., API calls) that cause the driver kernel to generate one or more tasks for execution by the PPU 200. The driver kernel outputs tasks to one or more streams being processed by the PPU 200. Each task may comprise one or more groups of related threads, referred to herein as a warp. A thread block may refer to a plurality of groups of threads including instructions to perform the task. Threads in the same thread block may exchange data through shared memory. In one embodiment, a warp comprises 32 related threads.
  • FIG. 3 illustrates a GPC 250 of the PPU 200 of FIG. 2, in accordance with one embodiment. As shown in FIG. 3, each GPC 250 includes a number of hardware units for processing tasks. In one embodiment, each GPC 250 includes a pipeline manager 310, a pre raster operations unit (PROP) 315, a raster engine 325, a work distribution crossbar (WDX) 380, a memory management unit (MMU) 390, and one or more Texture Processing Clusters (TPCs) 320. It will be appreciated that the GPC 250 of FIG. 3 may include other hardware units in lieu of or in addition to the units shown in FIG. 3.
  • In one embodiment, the operation of the GPC 250 is controlled by the pipeline manager 310. The pipeline manager 310 manages the configuration of the one or more TPCs 320 for processing tasks allocated to the GPC 250. In one embodiment, the pipeline manager 310 may configure at least one of the one or more TPCs 320 to implement at least a portion of a graphics rendering pipeline. For example, a TPC 320 may be configured to execute a vertex shader program on the programmable streaming multiprocessor (SM) 340. The pipeline manager 310 may also be configured to route packets received from the XBar 270 to the appropriate logical units within the GPC 250. For example, some packets may be routed to fixed function hardware units in the PROP 315 and/or raster engine 325 while other packets may be routed to the TPCs 320 for processing by the primitive engine 335 or the SM 340.
  • The PROP unit 315 is configured to route data generated by the raster engine 325 and the TPCs 320 to a Raster Operations (ROP) unit in the memory partition unit 280, described in more detail below. The PROP unit 315 may also be configured to perform optimizations for color blending, organize pixel data, perform address translations, and the like.
  • The raster engine 325 includes a number of fixed function hardware units configured to perform various raster operations. In one embodiment, the raster engine 325 includes a setup engine, a coarse raster engine, a culling engine, a clipping engine, a fine raster engine, and a tile coalescing engine. Primitives lying outside a viewing frustum may be clipped by the clipping engine. The setup engine receives transformed vertices that lie within the viewing plane and generates edge equations associated with the geometric primitive defined by the vertices. The edge equations are transmitted to the coarse raster engine to determine the set of pixel tiles covered by the primitive. The output of the coarse raster engine may be transmitted to the culling engine where tiles associated with the primitive that fail a hierarchical z-test are culled. Those fragments that survive culling may be passed to the fine raster engine to generate coverage information (e.g., a coverage mask for each tile) based on the edge equations generated by the setup engine. The output of the raster engine 325 comprises fragments to be processed, for example, by a fragment shader implemented within a TPC 320.
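The division of labor above (setup engine producing edge equations, fine raster stage producing per-pixel coverage) can be illustrated with the classic edge-function formulation. This is a toy sketch sampling at pixel centers, not the hardware algorithm:

```python
def edge_fn(a, b, p):
    """Edge equation: signed area test, > 0 when p lies to the left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def coverage_mask(v0, v1, v2, tile_origin, tile_size=4):
    """Per-pixel coverage for one tile, evaluating the three edge equations
    (generated once by a setup stage) at each pixel center."""
    mask = []
    for y in range(tile_size):
        for x in range(tile_size):
            p = (tile_origin[0] + x + 0.5, tile_origin[1] + y + 0.5)
            inside = (edge_fn(v0, v1, p) >= 0 and
                      edge_fn(v1, v2, p) >= 0 and
                      edge_fn(v2, v0, p) >= 0)
            mask.append(inside)
    return mask

# Counter-clockwise triangle covering the lower-left half of a 4x4 tile.
mask = coverage_mask((0, 0), (4, 0), (0, 4), (0, 0))
print(sum(mask))   # 10 samples covered
```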
  • Each TPC 320 included in the GPC 250 includes an M-Pipe Controller (MPC) 330, a primitive engine 335, an SM 340, and one or more texture units 345. The MPC 330 controls the operation of the TPC 320, routing packets received from the pipeline manager 310 to the appropriate units in the TPC 320. For example, packets associated with a vertex may be routed to the primitive engine 335, which is configured to fetch vertex attributes associated with the vertex from the memory 204. In contrast, packets associated with a shader program may be transmitted to the SM 340.
  • In one embodiment, the texture units 345 are configured to load texture maps (e.g., a 2D array of texels) from the memory 204 and sample the texture maps to produce sampled texture values for use in shader programs executed by the SM 340. The texture units 345 implement texture operations such as filtering operations using mip-maps (i.e., texture maps of varying levels of detail). In one embodiment, each TPC 320 includes two (2) texture units 345.
  • The SM 340 comprises a programmable streaming processor that is configured to process tasks represented by a number of threads. Each SM 340 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular group of threads concurrently. In one embodiment, the SM 340 implements a SIMD (Single-Instruction, Multiple-Data) architecture where each thread in a group of threads (i.e., a warp) is configured to process a different set of data based on the same set of instructions. All threads in the group of threads execute the same instructions. In another embodiment, the SM 340 implements a SIMT (Single-Instruction, Multiple Thread) architecture where each thread in a group of threads is configured to process a different set of data based on the same set of instructions, but where individual threads in the group of threads are allowed to diverge during execution. In other words, when an instruction for the group of threads is dispatched for execution, some threads in the group of threads may be active, thereby executing the instruction, while other threads in the group of threads may be inactive, thereby performing a no-operation (NOP) instead of executing the instruction.
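The divergence behavior described for the SIMT embodiment can be illustrated with a toy interpreter: the branch predicate is evaluated once per lane, then both paths are issued to the whole warp under complementary active masks, with inactive lanes performing a NOP. All names below are illustrative:

```python
def run_warp(threads, branch_pred, then_op, else_op):
    """One divergent if/else over a warp: a single instruction stream,
    per-lane data, and an active mask selecting which lanes commit."""
    mask = [branch_pred(v) for v in threads]        # per-lane branch outcome
    # "then" path: lanes with mask True execute, the rest NOP.
    state = [then_op(v) if m else v for v, m in zip(threads, mask)]
    # "else" path: the complementary mask executes, the rest NOP.
    state = [else_op(v) if not m else v for v, m in zip(state, mask)]
    return state

# if (x % 2 == 0) x //= 2; else x = 3*x + 1 -- both paths are issued to
# every lane; the mask decides which result is kept per thread.
result = run_warp([4, 5, 6, 7],
                  branch_pred=lambda x: x % 2 == 0,
                  then_op=lambda x: x // 2,
                  else_op=lambda x: 3 * x + 1)
print(result)   # [2, 16, 3, 22]
```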
  • The MMU 390 provides an interface between the GPC 250 and the memory partition unit 280. The MMU 390 may provide translation of virtual addresses into physical addresses, memory protection, and arbitration of memory requests. In one embodiment, the MMU 390 provides one or more translation lookaside buffers (TLBs) for improving translation of virtual addresses into physical addresses in the memory 204.
  • The PPU 200 described above may be configured to perform highly parallel computations much faster than conventional CPUs. Parallel computing has advantages in graphics processing, data compression, biometrics, stream processing algorithms, and the like.
  • In one embodiment, the PPU 200 comprises a graphics processing unit (GPU). The PPU 200 is configured to receive commands that specify shader programs for processing graphics data. Graphics data may be defined as a set of primitives such as points, lines, triangles, quads, triangle strips, and the like. Typically, a primitive includes data that specifies a number of vertices for the primitive (e.g., in a model-space coordinate system) as well as attributes associated with each vertex of the primitive. The PPU 200 can be configured to process the graphics primitives to generate a frame buffer (i.e., pixel data for each of the pixels of the display).
  • An application writes model data for a scene (i.e., a collection of vertices and attributes) to a memory such as a system memory or memory 204. The model data defines each of the objects that may be visible on a display. The application then makes an API call to the driver kernel that requests the model data to be rendered and displayed. The driver kernel reads the model data and writes commands to the one or more streams to perform operations to process the model data. The commands may reference different shader programs to be executed on the SMs 340 of the PPU 200 including one or more of a vertex shader, hull shader, domain shader, geometry shader, and a pixel shader. For example, one or more of the SMs 340 may be configured to execute a vertex shader program that processes a number of vertices defined by the model data. In one embodiment, the different SMs 340 may be configured to execute different shader programs concurrently. For example, a first subset of SMs 340 may be configured to execute a vertex shader program while a second subset of SMs 340 may be configured to execute a pixel shader program. The first subset of SMs 340 processes vertex data to produce processed vertex data and writes the processed vertex data to the L2 cache 360 and/or the memory 204. After the processed vertex data is rasterized (i.e., transformed from three-dimensional data into two-dimensional data in screen space) to produce fragment data, the second subset of SMs 340 executes a pixel shader to produce processed fragment data, which is then blended with other processed fragment data and written to the frame buffer in memory 204. The vertex shader program and pixel shader program may execute concurrently, processing different data from the same scene in a pipelined fashion until all of the model data for the scene has been rendered to the frame buffer. Then, the contents of the frame buffer are transmitted to a display controller for display on a display device.
  • The PPU 200 may be included in a desktop computer, a laptop computer, a tablet computer, a smart-phone (e.g., a wireless, hand-held device), personal digital assistant (PDA), digital camera, a hand-held electronic device, and the like. In one embodiment, the PPU 200 is embodied on a single semiconductor substrate. In another embodiment, the PPU 200 is included in a system-on-a-chip (SoC) along with one or more other logic units such as a reduced instruction set computer (RISC) CPU, a memory management unit (MMU), a digital-to-analog converter (DAC), and the like.
  • In one embodiment, the PPU 200 may be included on a graphics card that includes one or more memory devices 204 such as GDDR5 SDRAM. The graphics card may be configured to interface with a PCIe slot on a motherboard of a desktop computer that includes, e.g., a northbridge chipset and a southbridge chipset. In yet another embodiment, the PPU 200 may be an integrated graphics processing unit (iGPU) included in the chipset (i.e., Northbridge) of the motherboard.
  • In one embodiment, the TPC 320 includes one or more tree traversal units (TTUs) 395. The TTUs 395 are each configured to perform tree traversal operations. Tree traversal operations are commonly utilized in, for example, ray tracing algorithms in computer graphics. However, the TTUs 395 may be optimized for general tree traversal operations and are not limited, specifically, to ray tracing techniques.
  • In one embodiment, each TPC 320 included in the PPU 200 may include one or more TTUs 395 for performing tree traversal operations. The TTUs 395 are coupled to the SM 340 in a manner similar to the texture units 345. Alternately, one or more TTUs 395 may be implemented within the PPU 200 and shared by one or more GPCs 250 or one or more SMs 340.
  • A tree traversal operation may include any operation performed by traversing the nodes of a tree data structure. A tree data structure may include, but is not limited to, a binary tree, an octree, a four-ary tree, a k-d tree, a binary space partitioning (BSP) tree, and a bounding volume hierarchy (BVH) tree. In one embodiment, the tree traversal operation includes a number of instructions for intersecting a query shape with the tree. The query shapes may be, e.g., rays, bounding boxes, frustums, cones, spheres, and the like. In various embodiments, a query shape may be specified by a query data structure. The query data structure may include any technically feasible technique for specifying the query shape to intersect with the tree. For example, the query data structure may specify the starting and ending points of a ray using two three-coordinate vectors. In another example, the query data structure may specify the six planes of an axis-aligned bounding box using six 32-bit floating point coordinates. The various query data structures may include any number of fields for specifying the attributes of the query shape.
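The ray query of the example above (a segment given by its start and end points) can be intersected against an axis-aligned bounding box with the standard slab method. The structure layouts below are assumptions for the sketch, not the patent's encoding:

```python
from dataclasses import dataclass

@dataclass
class RayQuery:
    """Ray specified by start and end points (two 3-coordinate vectors)."""
    start: tuple
    end: tuple

@dataclass
class AABBQuery:
    """Axis-aligned bounding box given by six coordinates (two corners)."""
    lo: tuple
    hi: tuple

def slab_test(ray, box):
    """Segment/AABB intersection via the slab method: clip the parametric
    range [0, 1] against each axis-aligned slab in turn."""
    tmin, tmax = 0.0, 1.0
    for axis in range(3):
        o = ray.start[axis]
        d = ray.end[axis] - o
        if d == 0.0:                       # ray parallel to this slab
            if not (box.lo[axis] <= o <= box.hi[axis]):
                return False
            continue
        t0 = (box.lo[axis] - o) / d
        t1 = (box.hi[axis] - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        tmin, tmax = max(tmin, t0), min(tmax, t1)
        if tmin > tmax:                    # slabs no longer overlap
            return False
    return True

ray = RayQuery((0, 0, 0), (10, 10, 10))
print(slab_test(ray, AABBQuery((4, 4, 4), (6, 6, 6))))   # True
print(slab_test(ray, AABBQuery((4, 4, 8), (6, 6, 9))))   # False
```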
  • For example, one type of tree traversal operation for which the TTU 395 may be optimized is to intersect a ray with a BVH data structure that represents each of the geometric primitives in a 3D scene or 3D model. The TTU 395 may be particularly useful in ray-tracing applications in which millions or even billions of rays are intersected with the geometric primitives of a 3D model represented by a BVH data structure.
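A BVH traversal of the kind the TTU 395 accelerates can be sketched with an explicit stack. The sketch below uses a point query for brevity and plain tuples for nodes; both choices are illustrative, not the TTU's representation:

```python
def traverse(root, point):
    """Iterative tree traversal with an explicit stack that collects the
    primitives of every leaf whose bounds contain the query point."""
    hits, stack = [], [root]
    while stack:
        kind, (lo, hi), payload = stack.pop()
        if not all(l <= p <= h for l, p, h in zip(lo, point, hi)):
            continue                  # bounds miss: cull the whole subtree
        if kind == "leaf":
            hits.extend(payload)      # payload: primitive identifiers
        else:
            stack.extend(payload)     # payload: child nodes
    return sorted(hits)

# A two-leaf BVH: ("node"/"leaf", (lo_corner, hi_corner), payload).
leaf_a = ("leaf", ((0, 0, 0), (1, 1, 1)), [101])
leaf_b = ("leaf", ((2, 2, 2), (3, 3, 3)), [202])
root   = ("node", ((0, 0, 0), (3, 3, 3)), [leaf_a, leaf_b])
print(traverse(root, (0.5, 0.5, 0.5)))   # [101]
```

In a ray-tracing workload the same loop runs per ray, with the point-in-box test replaced by a ray/box test and the leaf test by ray/primitive intersection.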
  • EXEMPLARY SYSTEM
  • FIG. 4 illustrates an exemplary system 400 in which the various architecture and/or functionality of the various previous embodiments may be implemented. As shown, a system 400 is provided including at least one central processor 401 that is connected to a communication bus 402. The communication bus 402 may be implemented using any suitable protocol, such as PCI (Peripheral Component Interconnect), PCI-Express, AGP (Accelerated Graphics Port), HyperTransport, or any other bus or point-to-point communication protocol(s). The system 400 also includes a main memory 404. Control logic (software) and data are stored in the main memory 404 which may take the form of random access memory (RAM).
  • The system 400 also includes input devices 412, a graphics processor 406, and a display 408, i.e. a conventional CRT (cathode ray tube), LCD (liquid crystal display), LED (light emitting diode), plasma display or the like. User input may be received from the input devices 412, e.g., keyboard, mouse, touchpad, microphone, and the like. In one embodiment, the graphics processor 406 may include a plurality of shader modules, a rasterization module, etc. Each of the foregoing modules may even be situated on a single semiconductor platform to form a graphics processing unit (GPU).
  • In the present description, a single semiconductor platform may refer to a sole unitary semiconductor-based integrated circuit or chip. It should be noted that the term single semiconductor platform may also refer to multi-chip modules with increased connectivity which simulate on-chip operation, and make substantial improvements over utilizing a conventional central processing unit (CPU) and bus implementation. Of course, the various modules may also be situated separately or in various combinations of semiconductor platforms per the desires of the user.
  • The system 400 may also include a secondary storage 410. The secondary storage 410 includes, for example, a hard disk drive and/or a removable storage drive, representing a floppy disk drive, a magnetic tape drive, a compact disk drive, digital versatile disk (DVD) drive, recording device, universal serial bus (USB) flash memory. The removable storage drive reads from and/or writes to a removable storage unit in a well-known manner.
  • Computer programs, or computer control logic algorithms, may be stored in the main memory 404 and/or the secondary storage 410. Such computer programs, when executed, enable the system 400 to perform various functions. The memory 404, the storage 410, and/or any other storage are possible examples of non-transitory computer-readable media.
  • In one embodiment, the architecture and/or functionality of the various previous figures may be implemented in the context of the central processor 401, the graphics processor 406, an integrated circuit (not shown) that is capable of at least a portion of the capabilities of both the central processor 401 and the graphics processor 406, a chipset (i.e., a group of integrated circuits designed to work and sold as a unit for performing related functions, etc.), and/or any other integrated circuit for that matter.
  • Still yet, the architecture and/or functionality of the various previous figures may be implemented in the context of a general computer system, a circuit board system, a game console system dedicated for entertainment purposes, an application-specific system, and/or any other desired system. For example, the system 400 may take the form of a desktop computer, laptop computer, server, workstation, game consoles, embedded system, and/or any other type of logic. Still yet, the system 400 may take the form of various other devices including, but not limited to a personal digital assistant (PDA) device, a mobile phone device, a television, etc.
  • Further, while not shown, the system 400 may be coupled to a network (e.g., a telecommunications network, local area network (LAN), wireless network, wide area network (WAN) such as the Internet, peer-to-peer network, cable network, or the like) for communication purposes.
  • While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (20)

What is claimed is:
1. A method, comprising:
receiving a compression block configured to store a header and compressed geometric data for at least two geometric primitives;
identifying a location within the compression block of a first set of compressed geometric data for a first geometric primitive of the at least two geometric primitives, based on a first local index; and
generating a first set of decompressed geometric data for the first geometric primitive by decompressing the first set of compressed geometric data, based on at least a first anchor value,
wherein the first set of decompressed geometric data comprises more bits of data than the first set of compressed geometric data.
2. The method of claim 1, wherein the header includes a mode subfield that controls interpretation of bits in the compression block according to two or more enumerated formats.
3. The method of claim 2, wherein at least one of the two or more enumerated formats includes an uncompressed format and at least one of the two or more enumerated formats includes a compressed format.
4. The method of claim 1, wherein the compressed geometric data includes a vertex positions field comprising a vertex position anchor subfield that specifies a three-dimensional anchor position that includes the first anchor value, and two or more vertex position offset subfields that each specifies a three-dimensional position offset relative to the three-dimensional anchor position.
5. The method of claim 4, wherein the header includes a set of three subfields for specifying a bit precision for each of the three dimensions associated with the two or more vertex position offset subfields.
6. The method of claim 1, wherein the compressed geometric data includes a topology field comprising vertex identifiers, and three different vertex identifiers are associated with at least one of the at least two geometric primitives, wherein each vertex identifier references either one three-dimensional position offset or the three-dimensional anchor position.
7. The method of claim 6, wherein a first vertex identifier associated with a first of the at least two geometric primitives refers to a first three-dimensional position offset, and a second vertex identifier associated with a second of the at least two geometric primitives refers to the first three-dimensional position offset.
8. The method of claim 1, wherein the compressed geometric data includes a topology field comprising alpha bits, wherein a different alpha bit is associated with each of the at least two geometric primitives.
9. The method of claim 1, wherein the compressed geometric data includes a topology field comprising primitive metadata values, wherein a different primitive metadata value is associated with each of the at least two geometric primitives.
10. The method of claim 1, wherein the first geometric primitive is associated with a global identifier, and the global identifier determines an indirection block number that identifies an indirection block.
11. The method of claim 10, wherein the indirection block number is determined by dividing the value of the global identifier by a number of payload bits and rounding the quotient down to the nearest integer.
12. The method of claim 10, wherein the indirection block includes a compression block index subfield, a local index subfield, and a payload field.
13. The method of claim 12, wherein the compression block and the first local index are determined by the global identifier in conjunction with the compression block index subfield, the local index subfield, and the payload field.
14. A system, comprising:
a memory;
a processing unit configured to generate a global identifier; and
a decompression unit coupled to the processing unit and the memory, and configured to:
receive a compression block configured to store a header and compressed geometric data for at least two geometric primitives;
identify a location within the compression block of a first set of compressed geometric data for a first geometric primitive of the at least two geometric primitives, based on a first local index; and
generate a first set of decompressed geometric data for the first geometric primitive by decompressing the first set of compressed geometric data, based on at least a first anchor value,
wherein the first set of decompressed geometric data comprises more bits of data than the first set of compressed geometric data, and
wherein the memory is configured to store the compression block.
15. The system of claim 14, wherein the compressed geometric data includes a vertex positions field comprising a vertex position anchor subfield that specifies a three-dimensional anchor position that includes the first anchor value, and two or more vertex position offset subfields that each specifies a three-dimensional position offset relative to the three-dimensional anchor position, and wherein the header includes a set of three subfields for specifying a bit precision for each of the three dimensions associated with the two or more vertex position offset subfields.
16. The system of claim 14, wherein the compressed geometric data includes a topology field comprising vertex identifiers, wherein three different vertex identifiers are associated with at least one of the at least two geometric primitives, and wherein each vertex identifier references either one three-dimensional position offset or the three-dimensional anchor position, and wherein a first vertex identifier associated with a first of the at least two geometric primitives refers to a first three-dimensional position offset, and a second vertex identifier associated with a second of the at least two geometric primitives refers to the first three-dimensional position offset.
17. The system of claim 14, wherein the compressed geometric data includes a topology field comprising primitive metadata values, and a different primitive metadata value is associated with each of the at least two geometric primitives.
18. The system of claim 14, wherein the memory is further configured to store an indirection block, and wherein the first geometric primitive is associated with a global identifier, and wherein the global identifier determines an indirection block number that identifies the indirection block, and wherein the indirection block number is determined by dividing the value of the global identifier by a number of payload bits and rounding the quotient down to the nearest integer, and wherein the indirection block includes a compression block index subfield, a local index subfield, and a payload field, and wherein the compression block and the first local index are determined by the global identifier in conjunction with the compression block index subfield, the local index subfield, and the payload field.
19. The system of claim 14, further comprising a caching system with cache lines configured to store a number of cache line bits, and wherein the compression block includes the number of cache line bits, and wherein an indirection block includes the number of cache line bits.
20. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform steps comprising:
receiving a compression block configured to store a header and compressed geometric data for at least two geometric primitives;
identifying a location within the compression block of a first set of compressed geometric data for a first geometric primitive of the at least two geometric primitives, based on a first local index; and
generating a first set of decompressed geometric data for the first geometric primitive by decompressing the first set of compressed geometric data, based on at least a first anchor value,
wherein the first set of decompressed geometric data comprises more bits of data than the first set of compressed geometric data.
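The anchor-plus-offset reconstruction and the indirection-block arithmetic recited in claims 4, 6, 10, and 11 can be sketched as follows. The concrete numbers, the 0-based identifier convention, and the field layouts are illustrative assumptions; the claims do not fix an encoding:

```python
def decompress_vertex(anchor, offsets, vertex_id):
    """Reconstruct a vertex position: in this sketch, id 0 names the anchor
    itself and id k > 0 names the k-th offset, added component-wise to the
    anchor. The offsets need fewer bits than full-width positions, which is
    why the decompressed data comprises more bits than the compressed data."""
    if vertex_id == 0:
        return anchor
    off = offsets[vertex_id - 1]
    return tuple(a + o for a, o in zip(anchor, off))

def indirection_block_number(global_id, payload_bits):
    """Claim 11: divide the global identifier by the number of payload bits
    and round the quotient down to the nearest integer."""
    return global_id // payload_bits

anchor  = (1000, 2000, 3000)
offsets = [(3, -1, 2), (5, 0, -4)]     # few-bit deltas vs. full-width anchor
tri = [decompress_vertex(anchor, offsets, v) for v in (0, 1, 2)]
print(tri)   # [(1000, 2000, 3000), (1003, 1999, 3002), (1005, 2000, 2996)]
print(indirection_block_number(200, 64))   # 3
```

Note how vertex identifiers let two primitives share the same offset (claim 7): both simply reference the same index into `offsets`.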
US14/737,343 2014-09-04 2015-06-11 Block-based lossless compression of geometric data Abandoned US20160071234A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/737,343 US20160071234A1 (en) 2014-09-04 2015-06-11 Block-based lossless compression of geometric data
US16/502,415 US10866990B2 (en) 2014-09-04 2019-07-03 Block-based lossless compression of geometric data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201462046093P 2014-09-04 2014-09-04
US14/737,343 US20160071234A1 (en) 2014-09-04 2015-06-11 Block-based lossless compression of geometric data

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/502,415 Continuation US10866990B2 (en) 2014-09-04 2019-07-03 Block-based lossless compression of geometric data

Publications (1)

Publication Number Publication Date
US20160071234A1 true US20160071234A1 (en) 2016-03-10

Family

ID=55437691

Family Applications (6)

Application Number Title Priority Date Filing Date
US14/563,872 Active 2037-04-05 US10235338B2 (en) 2014-09-04 2014-12-08 Short stack traversal of tree data structures
US14/589,904 Active 2035-01-27 US9582607B2 (en) 2014-09-04 2015-01-05 Block-based bounding volume hierarchy
US14/662,090 Active 2035-03-20 US9569559B2 (en) 2014-09-04 2015-03-18 Beam tracing
US14/697,480 Active 2036-09-23 US10025879B2 (en) 2014-09-04 2015-04-27 Tree data structures based on a plurality of local coordinate systems
US14/737,343 Abandoned US20160071234A1 (en) 2014-09-04 2015-06-11 Block-based lossless compression of geometric data
US16/502,415 Active US10866990B2 (en) 2014-09-04 2019-07-03 Block-based lossless compression of geometric data




Families Citing this family (82)

Publication number Priority date Publication date Assignee Title
US9607426B1 (en) * 2013-12-20 2017-03-28 Imagination Technologies Limited Asynchronous and concurrent ray tracing and rasterization rendering processes
US9552664B2 (en) 2014-09-04 2017-01-24 Nvidia Corporation Relative encoding for a block-based bounding volume hierarchy
US10235338B2 (en) 2014-09-04 2019-03-19 Nvidia Corporation Short stack traversal of tree data structures
US10242485B2 (en) * 2014-09-04 2019-03-26 Nvidia Corporation Beam tracing
WO2016038858A1 (en) * 2014-09-09 2016-03-17 日本電気株式会社 Data management system, data management device, data management method, and program
KR102244619B1 (en) * 2014-09-30 2021-04-26 삼성전자 주식회사 Method for generating and traverse acceleration structure
US10133763B2 (en) 2015-10-20 2018-11-20 International Business Machines Corporation Isolation of concurrent operations on tree-based data structures
US10223409B2 (en) * 2015-10-20 2019-03-05 International Business Machines Corporation Concurrent bulk processing of tree-based data structures
US10102231B2 (en) * 2015-10-20 2018-10-16 International Business Machines Corporation Ordering heterogeneous operations in bulk processing of tree-based data structures
KR102604737B1 (en) * 2016-01-11 2023-11-22 삼성전자주식회사 METHOD AND APPARATUS for generating acceleration structure
US9858704B2 (en) * 2016-04-04 2018-01-02 Intel Corporation Reduced precision ray traversal with plane reuse
US9922396B2 (en) * 2016-04-04 2018-03-20 Intel Corporation Reduction of BVH-node bandwidth with incremental traversal
WO2017200527A1 (en) * 2016-05-16 2017-11-23 Hewlett-Packard Development Company, L.P. Generating a shape profile for a 3d object
CN105979211B (en) * 2016-06-07 2019-01-22 中国地质大学(武汉) A kind of three-dimensional coverage rate calculation method suitable for multi-view point video monitoring system
US9881389B1 (en) * 2017-02-07 2018-01-30 8i Limited Data compression for visual elements
US10664286B2 (en) * 2017-03-13 2020-05-26 Board Of Supervisors Of Louisiana State University And Agricultural And Mechanical College Enhanced performance for graphical processing unit transactional memory
US10726514B2 (en) * 2017-04-28 2020-07-28 Intel Corporation Compute optimizations for low precision machine learning operations
US10417807B2 (en) * 2017-07-13 2019-09-17 Imagination Technologies Limited Hybrid hierarchy of bounding and grid structures for ray tracing
US10586374B2 (en) * 2017-07-26 2020-03-10 Alvin D. Zimmerman Bounding volume hierarchy using virtual grid
US10861196B2 (en) 2017-09-14 2020-12-08 Apple Inc. Point cloud compression
US11818401B2 (en) 2017-09-14 2023-11-14 Apple Inc. Point cloud geometry compression using octrees and binary arithmetic encoding with adaptive look-up tables
US10897269B2 (en) 2017-09-14 2021-01-19 Apple Inc. Hierarchical point cloud compression
US11113845B2 (en) 2017-09-18 2021-09-07 Apple Inc. Point cloud compression using non-cubic projections and masks
US10909725B2 (en) 2017-09-18 2021-02-02 Apple Inc. Point cloud compression
US10607373B2 (en) 2017-11-22 2020-03-31 Apple Inc. Point cloud compression with closed-loop color conversion
US11010928B2 (en) 2018-04-10 2021-05-18 Apple Inc. Adaptive distance based point cloud compression
US10939129B2 (en) 2018-04-10 2021-03-02 Apple Inc. Point cloud compression
US10909726B2 (en) 2018-04-10 2021-02-02 Apple Inc. Point cloud compression
US10909727B2 (en) 2018-04-10 2021-02-02 Apple Inc. Hierarchical point cloud compression with smoothing
US11017566B1 (en) 2018-07-02 2021-05-25 Apple Inc. Point cloud compression with adaptive filtering
US11202098B2 (en) 2018-07-05 2021-12-14 Apple Inc. Point cloud compression with multi-resolution video encoding
US11012713B2 (en) 2018-07-12 2021-05-18 Apple Inc. Bit stream structure for compressed point cloud data
US10672178B2 (en) * 2018-08-08 2020-06-02 Alvin D. Zimmerman Ray traversal with virtual grids
US10580196B1 (en) 2018-08-10 2020-03-03 Nvidia Corporation Method for continued bounding volume hierarchy traversal on intersection without shader intervention
US11367224B2 (en) 2018-10-02 2022-06-21 Apple Inc. Occupancy map block-to-patch information compression
US10762686B2 (en) 2018-12-28 2020-09-01 Intel Corporation Apparatus and method for a hierarchical beam tracer
US11062500B2 (en) * 2018-12-28 2021-07-13 Intel Corporation Apparatus and method for ray tracing with grid primitives
US10930051B2 (en) 2018-12-28 2021-02-23 Intel Corporation Apparatus and method for general ray tracing queries
US10755469B2 (en) * 2018-12-28 2020-08-25 Intel Corporation Apparatus and method for ray tracing instruction processing and execution
US11500841B2 (en) * 2019-01-04 2022-11-15 International Business Machines Corporation Encoding and decoding tree data structures as vector data structures
US11363249B2 (en) 2019-02-22 2022-06-14 Avalon Holographics Inc. Layered scene decomposition CODEC with transparency
US11537581B2 (en) * 2019-03-22 2022-12-27 Hewlett Packard Enterprise Development Lp Co-parent keys for document information trees
US11057564B2 (en) 2019-03-28 2021-07-06 Apple Inc. Multiple layer flexure for supporting a moving image sensor
US11321910B2 (en) * 2019-04-04 2022-05-03 Intel Corporation Apparatus and method for reduced precision bounding volume hierarchy construction
KR102151444B1 (en) * 2019-04-11 2020-09-03 주식회사 실리콘아츠 Ray tracing device using mimd based t&i scheduling
WO2020229394A1 (en) * 2019-05-10 2020-11-19 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Matrix-based intra prediction
US11711544B2 (en) 2019-07-02 2023-07-25 Apple Inc. Point cloud compression with supplemental information messages
US11627314B2 (en) 2019-09-27 2023-04-11 Apple Inc. Video-based point cloud compression with non-normative smoothing
US11562507B2 (en) 2019-09-27 2023-01-24 Apple Inc. Point cloud compression using video encoding with time consistent patches
US11316951B2 (en) * 2019-09-30 2022-04-26 Citrix Systems, Inc. Polytree queue for synchronizing data with a shared resource
US11538196B2 (en) 2019-10-02 2022-12-27 Apple Inc. Predictive coding for point cloud compression
US11895307B2 (en) 2019-10-04 2024-02-06 Apple Inc. Block-based predictive coding for point cloud compression
GB2589372B (en) * 2019-11-29 2022-04-13 Sony Interactive Entertainment Inc Image generation system and method
US11017581B1 (en) 2020-01-04 2021-05-25 Adshir Ltd. Method for constructing and traversing accelerating structures
US11798196B2 (en) 2020-01-08 2023-10-24 Apple Inc. Video-based point cloud compression with predicted patches
US11625866B2 (en) 2020-01-09 2023-04-11 Apple Inc. Geometry encoding using octrees and predictive trees
US11681545B2 (en) 2020-03-11 2023-06-20 Cisco Technology, Inc. Reducing complexity of workflow graphs through vertex grouping and contraction
US11295508B2 (en) 2020-06-10 2022-04-05 Nvidia Corporation Hardware-based techniques applicable for ray tracing for efficiently representing and processing an arbitrary bounding volume
US11373358B2 (en) * 2020-06-15 2022-06-28 Nvidia Corporation Ray tracing hardware acceleration for supporting motion blur and moving/deforming geometry
US11620768B2 (en) * 2020-06-24 2023-04-04 Apple Inc. Point cloud geometry compression using octrees with multiple scan orders
US11615557B2 (en) 2020-06-24 2023-03-28 Apple Inc. Point cloud compression using octrees with slicing
US11238640B2 (en) * 2020-06-26 2022-02-01 Advanced Micro Devices, Inc. Early culling for ray tracing
EP3940649A1 (en) * 2020-07-14 2022-01-19 Imagination Technologies Limited Methods and systems for constructing ray tracing acceleration structures
US11494969B2 (en) 2020-08-20 2022-11-08 Sony Interactive Entertainment LLC System and method for accelerated ray tracing with asynchronous operation and ray transformation
US11704859B2 (en) * 2020-08-20 2023-07-18 Sony Interactive Entertainment LLC System and method for accelerated ray tracing
US11755366B2 (en) * 2020-09-01 2023-09-12 EMC IP Holding Company LLC Parallel handling of a tree data structure for multiple system processes
US20220134222A1 (en) * 2020-11-03 2022-05-05 Nvidia Corporation Delta propagation in cloud-centric platforms for collaboration and connectivity
GB2599183B (en) * 2021-03-23 2022-10-12 Imagination Tech Ltd Intersection testing in a ray tracing system
GB2599188B (en) 2021-03-23 2022-10-12 Imagination Tech Ltd Intersection testing in a ray tracing system
CN113259624A (en) * 2021-03-24 2021-08-13 北京潞电电气设备有限公司 Monitoring equipment and method thereof
US11948338B1 (en) 2021-03-29 2024-04-02 Apple Inc. 3D volumetric content encoding using 2D videos and simplified 3D meshes
EP4113448A1 (en) * 2021-06-29 2023-01-04 Imagination Technologies Limited Scheduling processing in a ray tracing system
US20230108967A1 (en) 2021-09-16 2023-04-06 Nvidia Corporation Micro-meshes, a structured geometry for computer graphics
US20230252717A1 (en) * 2022-02-04 2023-08-10 Qualcomm Incorporated Ray tracing processor
US20230252685A1 (en) * 2022-02-04 2023-08-10 Qualcomm Incorporated Leaf node compression with compressibility prediction
US20230334750A1 (en) * 2022-03-31 2023-10-19 Imagination Technologies Limited Methods and hardware logic for loading ray tracing data into a shader processing unit of a graphics processing unit
US11893677B1 (en) * 2022-07-29 2024-02-06 Qualcomm Incorporated Bounding volume hierarchy (BVH) widening based on node compressibility
US20240087211A1 (en) 2022-09-09 2024-03-14 Nvidia Corporation Generation and Traversal of Partial Acceleration Structures for Ray Tracing
US20240095995A1 (en) 2022-09-16 2024-03-21 Nvidia Corporation Reducing false positive ray traversal using ray clipping
US20240095996A1 (en) 2022-09-16 2024-03-21 Nvidia Corporation Efficiency of ray-box tests
US20240095994A1 (en) 2022-09-16 2024-03-21 Nvidia Corporation Reducing false positive ray traversal using point degenerate culling
US20240095993A1 (en) 2022-09-16 2024-03-21 Nvidia Corporation Reducing false positive ray traversal in a bounding volume hierarchy

Citations (15)

Publication number Priority date Publication date Assignee Title
US20030184555A1 (en) * 2002-03-26 2003-10-02 Christopher Fraser Display list compression for a tiled 3-D rendering system
US20040100474A1 (en) * 2002-11-27 2004-05-27 Eric Demers Apparatus for generating anti-aliased and stippled 3d lines, points and surfaces using multi-dimensional procedural texture coordinates
US20070085714A1 (en) * 2005-09-30 2007-04-19 Intel Corporation Apparatus, system, and method of data compression
US20080228933A1 (en) * 2007-03-12 2008-09-18 Robert Plamondon Systems and methods for identifying long matches of data in a compression history
US20090110305A1 (en) * 2007-10-30 2009-04-30 Simon Fenney Method and apparatus for compressing and decompressing data
US20090189890A1 (en) * 2008-01-27 2009-07-30 Tim Corbett Methods and systems for improving resource utilization by delaying rendering of three dimensional graphics
US20100169382A1 (en) * 2008-12-30 2010-07-01 Gad Sheaffer Metaphysical address space for holding lossy metadata in hardware
US20100316113A1 (en) * 2006-11-17 2010-12-16 Euee-S Jang Recorded medium having program for coding and decoding using bit-precision, and apparatus thereof
US20110310102A1 (en) * 2010-06-17 2011-12-22 Via Technologies, Inc. Systems and methods for subdividing and storing vertex data
US20120229464A1 (en) * 2011-03-09 2012-09-13 Steven Fishwick Compression of a tessellated primitive index list in a tile rendering system
US20130326190A1 (en) * 2012-05-11 2013-12-05 Samsung Electronics Co., Ltd. Coarse-grained reconfigurable processor and code decompression method thereof
US20130339472A1 (en) * 2012-06-19 2013-12-19 Canon Kabushiki Kaisha Methods and systems for notifying a server with cache information and for serving resources based on it
US20140354666A1 (en) * 2013-05-09 2014-12-04 Imagination Technologies Limited Vertex parameter data compression
US20140358876A1 (en) * 2013-05-29 2014-12-04 International Business Machines Corporation Managing a multi-version database
US20150070372A1 (en) * 2013-09-12 2015-03-12 Arm Limited Image processing apparatus and a method of storing encoded data blocks generated by such an image processing apparatus

Family Cites Families (40)

Publication number Priority date Publication date Assignee Title
US6184897B1 (en) 1997-01-15 2001-02-06 International Business Machines Corporation Compressed representation of changing meshes and method to decompress
US6097394A (en) * 1997-04-28 2000-08-01 Board Of Trustees, Leland Stanford, Jr. University Method and system for light field rendering
US6326963B1 (en) 1998-01-22 2001-12-04 Nintendo Co., Ltd. Method and apparatus for efficient animation and collision detection using local coordinate systems
US6373488B1 (en) 1999-10-18 2002-04-16 Sierra On-Line Three-dimensional tree-structured data display
IT1311443B1 (en) 1999-11-16 2002-03-12 St Microelectronics Srl METHOD OF CLASSIFICATION OF DIGITAL IMAGES ON THE BASIS OF THEIR CONTENT.
US7495664B2 (en) 2000-06-19 2009-02-24 Mental Images Gmbh Instant ray tracing
US7499053B2 (en) 2000-06-19 2009-03-03 Mental Images Gmbh Real-time precision ray tracing
US8411088B2 (en) 2000-06-19 2013-04-02 Nvidia Corporation Accelerated ray tracing
US7161599B2 (en) 2001-10-18 2007-01-09 Microsoft Corporation Multiple-level graphics processing system and method
US7619633B2 (en) 2002-06-27 2009-11-17 Microsoft Corporation Intelligent caching data structure for immediate mode graphics
US7443401B2 (en) 2001-10-18 2008-10-28 Microsoft Corporation Multiple-level graphics processing with animation interval generation
US7064766B2 (en) 2001-10-18 2006-06-20 Microsoft Corporation Intelligent caching data structure for immediate mode graphics
US7337163B1 (en) 2003-12-04 2008-02-26 Hyperion Solutions Corporation Multidimensional database query splitting
DE102004007835A1 (en) 2004-02-17 2005-09-15 Universität des Saarlandes Device for displaying dynamic complex scenes
US7145562B2 (en) 2004-05-03 2006-12-05 Microsoft Corporation Integration of three dimensional scene hierarchy into two dimensional compositing system
US7792817B2 (en) 2005-04-19 2010-09-07 International Business Machines Corporation System and method for managing complex relationships over distributed heterogeneous data sources
KR20090028706A (en) 2006-06-30 2009-03-19 텔레 아틀라스 노스 아메리카, 인크. Nearest search on adaptive index with variable compression
US7773087B2 (en) 2007-04-19 2010-08-10 International Business Machines Corporation Dynamically configuring and selecting multiple ray tracing intersection methods
US8502819B1 (en) * 2007-12-17 2013-08-06 Nvidia Corporation System and method for performing ray tracing node traversal in image rendering
TWI358647B (en) 2007-12-28 2012-02-21 Ind Tech Res Inst Data classification system and method for building
US8217935B2 (en) 2008-03-31 2012-07-10 Caustic Graphics, Inc. Apparatus and method for ray tracing with block floating point data
CN101478551B (en) 2009-01-19 2011-12-28 清华大学 Multi-domain network packet classification method based on multi-core processor
US9424370B2 (en) * 2009-03-12 2016-08-23 Siemens Product Lifecycle Management Software Inc. System and method for spatial partitioning of CAD models
US8248412B2 (en) 2009-03-19 2012-08-21 International Business Machines Corporation Physical rendering with textured bounding volume primitive mapping
WO2011035800A2 (en) * 2009-07-24 2011-03-31 Uws Ventures Ltd. Direct ray tracing of 3d scenes
US8669977B2 (en) 2009-10-01 2014-03-11 Intel Corporation Hierarchical mesh quantization that facilitates efficient ray tracing
US10163187B2 (en) 2009-10-30 2018-12-25 Intel Corporation Graphics rendering using a hierarchical acceleration structure
KR101697238B1 (en) 2010-08-26 2017-01-17 삼성전자주식회사 Image processing apparatus and method
US8791945B2 (en) 2011-05-18 2014-07-29 Intel Corporation Rendering tessellated geometry with motion and defocus blur
US8638331B1 (en) * 2011-09-16 2014-01-28 Disney Enterprises, Inc. Image processing using iterative generation of intermediate images using photon beams of varying parameters
US9183667B2 (en) 2011-07-15 2015-11-10 Kirill Garanzha Out-of-core ray tracing with memory-efficient page generation
US9013484B1 (en) * 2012-06-01 2015-04-21 Disney Enterprises, Inc. Progressive expectation-maximization for hierarchical volumetric photon mapping
US9146957B2 (en) 2012-12-20 2015-09-29 Business Objects Software Ltd. Method and system for generating optimal membership-check queries
AU2013200051B2 (en) 2013-01-04 2016-02-11 Canon Kabushiki Kaisha Method, apparatus and system for de-blocking video data
KR101993835B1 (en) * 2013-02-25 2019-06-27 삼성전자주식회사 Method and apparatus for adaptive stack management
KR102193683B1 (en) 2013-10-22 2020-12-21 삼성전자주식회사 Apparatus and method for traversing acceleration structure in a ray tracing system
KR20150057868A (en) 2013-11-20 2015-05-28 삼성전자주식회사 Method and apparatus for traversing binary tree in a ray tracing system
US10235338B2 (en) 2014-09-04 2019-03-19 Nvidia Corporation Short stack traversal of tree data structures
US9552664B2 (en) 2014-09-04 2017-01-24 Nvidia Corporation Relative encoding for a block-based bounding volume hierarchy
US9928640B2 (en) 2015-12-18 2018-03-27 Intel Corporation Decompression and traversal of a bounding volume hierarchy


Cited By (47)

Publication number Priority date Publication date Assignee Title
US20230038653A1 (en) * 2014-05-29 2023-02-09 Imagination Technologies Limited Allocation of primitives to primitive blocks
US11481952B2 (en) * 2014-05-29 2022-10-25 Imagination Technologies Limited Allocation of primitives to primitive blocks
US11127169B2 (en) * 2016-06-14 2021-09-21 Panasonic Intellectual Property Corporation Of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
US11593970B2 (en) 2016-06-14 2023-02-28 Panasonic Intellectual Property Corporation Of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
US20190108656A1 (en) * 2016-06-14 2019-04-11 Panasonic Intellectual Property Corporation Of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
US10861216B2 (en) * 2017-04-07 2020-12-08 Intel Corporation Ray tracing apparatus and method for memory access and register operations
US10204441B2 (en) * 2017-04-07 2019-02-12 Intel Corporation Apparatus and method for hierarchical beam tracing and packet compression in a ray tracing system
US11436785B2 (en) * 2017-04-07 2022-09-06 Intel Corporation Apparatus and method for hierarchical beam tracing and packet compression in a ray tracing system
US11367243B2 (en) * 2017-04-07 2022-06-21 Intel Corporation Ray tracing apparatus and method for memory access and register operations
US10580197B2 (en) * 2017-04-07 2020-03-03 Intel Corporation Apparatus and method for hierarchical beam tracing and packet compression in a ray tracing system
US10977853B2 (en) * 2017-04-07 2021-04-13 Intel Corporation Apparatus and method for hierarchical beam tracing and packet compression in a ray tracing system
US20180293784A1 (en) * 2017-04-07 2018-10-11 Carsten Benthin Ray tracing apparatus and method for memory access and register operations
US11334762B1 (en) 2017-09-07 2022-05-17 Aurora Operations, Inc. Method for image analysis
US11748446B2 (en) 2017-09-07 2023-09-05 Aurora Operations, Inc. Method for image analysis
US11170254B2 (en) 2017-09-07 2021-11-09 Aurora Innovation, Inc. Method for image analysis
KR102646818B1 (en) 2017-11-02 2024-03-13 어드밴스드 마이크로 디바이시즈, 인코포레이티드 Compressing and Decompressing Indexes in the Graphics Pipeline
EP3704665A4 (en) * 2017-11-02 2021-08-25 Advanced Micro Devices, Inc. Compression and decompression of indices in a graphics pipeline
US10600142B2 (en) * 2017-11-02 2020-03-24 Advanced Micro Devices, Inc. Compression and decompression of indices in a graphics pipeline
WO2019089160A1 (en) 2017-11-02 2019-05-09 Advanced Micro Devices, Inc. Compression and decompression of indices in a graphics pipeline
US20190172173A1 (en) * 2017-11-02 2019-06-06 Advanced Micro Devices, Inc. Compression and decompression of indices in a graphics pipeline
KR20200067222A (en) * 2017-11-02 2020-06-11 어드밴스드 마이크로 디바이시즈, 인코포레이티드 Compression and decompression of indexes in the graphics pipeline
US11189075B2 (en) 2018-08-10 2021-11-30 Nvidia Corporation Query-specific behavioral modification of tree traversal
US10825230B2 (en) 2018-08-10 2020-11-03 Nvidia Corporation Watertight ray triangle intersection
US11790595B2 (en) 2018-08-10 2023-10-17 Nvidia Corporation Method for handling of out-of-order opaque and alpha ray/primitive intersections
US10810785B2 (en) 2018-08-10 2020-10-20 Nvidia Corporation Method for forward progress tree traversal mechanisms in hardware
US11328472B2 (en) 2018-08-10 2022-05-10 Nvidia Corporation Watertight ray triangle intersection
US11804000B2 (en) 2018-08-10 2023-10-31 Nvidia Corporation Query-specific behavioral modification of tree traversal
US10740952B2 (en) 2018-08-10 2020-08-11 Nvidia Corporation Method for handling of out-of-order opaque and alpha ray/primitive intersections
US11164360B2 (en) 2018-08-10 2021-11-02 Nvidia Corporation Method for handling of out-of-order opaque and alpha ray/primitive intersections
US11704863B2 (en) 2018-08-10 2023-07-18 Nvidia Corporation Watertight ray triangle intersection
US10885698B2 (en) 2018-08-10 2021-01-05 Nvidia Corporation Method for programmable timeouts of tree traversal mechanisms in hardware
US11928772B2 (en) 2018-08-10 2024-03-12 Nvidia Corporation Method for forward progress and programmable timeouts of tree traversal mechanisms in hardware
US11455768B2 (en) 2018-08-10 2022-09-27 Nvidia Corporation Method for forward progress and programmable timeouts of tree traversal mechanisms in hardware
US11170556B2 (en) * 2019-07-04 2021-11-09 Lg Electronics Inc. Apparatus for transmitting point cloud data, a method for transmitting point cloud data, an apparatus for receiving point cloud data and a method for receiving point cloud data
US11394979B2 (en) * 2020-01-09 2022-07-19 Lg Electronics Inc. Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US11631158B2 (en) * 2020-03-18 2023-04-18 Lg Electronics Inc. Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US11321903B2 (en) * 2020-03-27 2022-05-03 Advanced Micro Devices, Inc. Bounding volume hierarchy compression
US11270495B2 (en) 2020-05-21 2022-03-08 Nvidia Corporation Scattered geometry compression for ray tracing acceleration structures
US11823320B2 (en) 2020-05-21 2023-11-21 Nvidia Corporation Scattered geometry compression for ray tracing acceleration structures
US11282261B2 (en) 2020-06-10 2022-03-22 Nvidia Corporation Ray tracing hardware acceleration with alternative world space transforms
US11302056B2 (en) 2020-06-10 2022-04-12 Nvidia Corporation Techniques for traversing data employed in ray tracing
US11380041B2 (en) 2020-06-11 2022-07-05 Nvidia Corporation Enhanced techniques for traversing ray tracing acceleration structures
US11816783B2 (en) 2020-06-11 2023-11-14 Nvidia Corporation Enhanced techniques for traversing ray tracing acceleration structures
US11450057B2 (en) 2020-06-15 2022-09-20 Nvidia Corporation Hardware acceleration for ray tracing primitives that share vertices
US11508112B2 (en) 2020-06-18 2022-11-22 Nvidia Corporation Early release of resources in ray tracing hardware
US11854141B2 (en) 2020-06-18 2023-12-26 Nvidia Corporation Early release of resources in ray tracing hardware
CN112559040A (en) * 2020-12-02 2021-03-26 北京百度网讯科技有限公司 Instruction execution method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
US20160070767A1 (en) 2016-03-10
US10866990B2 (en) 2020-12-15
US9582607B2 (en) 2017-02-28
US10235338B2 (en) 2019-03-19
US20160071310A1 (en) 2016-03-10
US20190324991A1 (en) 2019-10-24
US20160071312A1 (en) 2016-03-10
US20160070820A1 (en) 2016-03-10
US9569559B2 (en) 2017-02-14
US10025879B2 (en) 2018-07-17

Similar Documents

Publication Publication Date Title
US10866990B2 (en) Block-based lossless compression of geometric data
US10032289B2 (en) Relative encoding for a block-based bounding volume hierarchy
US10242485B2 (en) Beam tracing
US10217183B2 (en) System, method, and computer program product for simultaneous execution of compute and graphics workloads
US9946666B2 (en) Coalescing texture access and load/store operations
US10984049B2 (en) Performing traversal stack compression
US10417817B2 (en) Supersampling for spatially distributed and disjoined large-scale data
US9224235B2 (en) System, method, and computer program product for compression of a bounding volume hierarchy
US10055883B2 (en) Frustum tests for sub-pixel shadows
US10114760B2 (en) Method and system for implementing multi-stage translation of virtual addresses
US11941752B2 (en) Streaming a compressed light field
US10068366B2 (en) Stereo multi-projection implemented using a graphics processing pipeline
US10699427B2 (en) Method and apparatus for obtaining sampled positions of texturing operations
US10861230B2 (en) System-generated stable barycentric coordinates and direct plane equation access
US9905037B2 (en) System, method, and computer program product for rejecting small primitives
US20140267276A1 (en) System, method, and computer program product for generating primitive specific attributes
US11379420B2 (en) Decompression techniques for processing compressed data suitable for artificial neural networks
US11501467B2 (en) Streaming a light field compressed utilizing lossless or lossy compression
CN109643279B (en) Method and apparatus for memory compression using virtual-to-virtual address tables
US11823318B2 (en) Techniques for interleaving textures

Legal Events

Date Code Title Description
AS Assignment

Owner name: NVIDIA CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEHTINEN, JAAKKO T.;AILA, TIMO OSKARI;KARRAS, TERO TAPANI;AND OTHERS;SIGNING DATES FROM 20150608 TO 20150610;REEL/FRAME:036262/0771

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION