CN100442264C

CN100442264C - Packet processing systems and methods

Info

Publication number: CN100442264C
Application number: CNB2006101361708A
Authority: CN
Inventors: 陈文中; 李亮; S-Y·乔伊斯·程
Original assignee: Via Technologies Inc
Current assignee: Asahi Electronics (Shanghai) Co., Ltd.
Priority date: 2005-10-14
Filing date: 2006-10-13
Publication date: 2008-12-10
Anticipated expiration: 2026-10-13
Also published as: CN1952918A; TWI326830B; TW200715130A; US20070088877A1; US7657679B2

Abstract

Packet processing system and method embodiments implemented in a peripheral component interconnect-express (PCIE) compliant system are disclosed. One method embodiment, among others, comprises receiving a packet having at least a first type of data and a second type of data over a PCIE connection, and segregating the entire packet into two contiguous groups, a first group comprising the first type of data and a second group comprising the second type of data.

Description

Packet processing systems and methods

Technical field

The present invention relates to computer systems, and in particular to data transfer systems and methods within a computer system.

Background art

A central processing unit (CPU) exchanges data with memory and other components of a computer system over wires or an internal bus. Internal components can exchange data with external devices over a bus, commonly called an expansion bus. Many standards define how data is transferred over a bus. For example, the peripheral component interconnect (PCI) standard is a local bus standard developed by Intel. A local bus comprises a data bus that is connected directly to the microprocessor. Another standard, PCI-Express (PCIE), is an input/output (I/O) interconnect bus standard that defines a communication protocol and an architecture. The PCIE standard extends the PCI standard, for example by doubling the data transfer rate. Specifically, PCIE is a two-way serial connection that carries data in packets along two pairs of point-to-point data paths, in contrast to the single parallel data bus of PCI. PCIE targets the high data transfer rates of high-speed interconnects such as 1394b, USB 2.0, InfiniBand, and Gigabit Ethernet.

One challenge with PCIE is that bytes cannot be omitted during a write operation to a component (for example, a write to memory). In graphics processing applications, stencil (s-data) and depth (z-data) operations, or color/alpha operations, do not need the content of the entire packet. For stencil and depth operations, for example, the z-data occupies three of every four bytes and the s-data occupies one of every four bytes, and the z-value computation (which excludes the s-data) is the important operation. Conventional solutions to this problem fall into two broad classes. One solution performs a read operation before the write operation to enable a merged write, so that every position is rewritten whether or not it needs to be written (for example, the stencil bytes). However, such a read-then-write method is inefficient and degrades performance.
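The read-before-write approach described above can be sketched as follows. This is an illustrative model only: the 64-byte buffer, the byte layout (three z bytes followed by one s byte per pixel), and the function name `read_merge_write` are assumptions chosen for the example, not details taken from the patent or from the PCIE specification.

```python
# Illustrative model of the conventional read-before-write (merge) approach.
# Assumed layout: each 4-byte pixel is three z bytes followed by one s byte.
BYTES_PER_PIXEL = 4


def read_merge_write(memory: bytearray, new_z: bytes) -> int:
    """Update only the z bytes in `memory`, rewriting every byte.

    Returns the number of bytes transferred (read plus written).
    """
    current = bytes(memory)            # 1. read the whole packet back
    merged = bytearray(current)
    z_iter = iter(new_z)
    for i in range(len(merged)):
        if i % BYTES_PER_PIXEL != 3:   # z byte positions
            merged[i] = next(z_iter)   # 2. merge in the new z bytes
    memory[:] = merged                 # 3. write every byte, s bytes included
    return len(current) + len(merged)


memory = bytearray(b"zzzs" * 16)       # 64-byte packet already in memory
moved = read_merge_write(memory, b"Z" * 48)
```

In this model, 128 bytes are transferred in order to change 48 of them, which is the kind of overhead the paragraph above refers to.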

The other approach splits the packet into manageable units in order to use the byte-enable feature of the conventional PCI standard. That is, the conventional PCI standard provides a byte mask only at the beginning and end portions of a packet (i.e., for only part of the entire packet content). For example, a 512-bit packet can be split into eight transactions of eight bytes each (for example, with a 4-bit mask at the beginning and a 4-bit mask at the end). For each segment, the byte mask can then be enabled for the first bytes only or the last bytes only, to handle selective write operations. The drawback of this approach is that a header must be added to each split packet, and the extra packet headers reduce performance.
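The splitting approach described above can be sketched as follows. The `Transaction` class and its field names are hypothetical and exist only for this example; the sketch simply shows that a 512-bit packet becomes eight 8-byte transactions, each of which would need its own header.

```python
# Illustrative model of splitting one packet into masked transactions.
# The Transaction fields are hypothetical names for this sketch.
from dataclasses import dataclass

PACKET_BYTES = 512 // 8      # a 512-bit packet
TRANSACTION_BYTES = 8        # eight bytes per transaction


@dataclass
class Transaction:
    offset: int        # byte offset within the original packet
    payload: bytes     # eight payload bytes
    first_mask: str    # 4-bit byte mask for the first four bytes
    last_mask: str     # 4-bit byte mask for the last four bytes


def split_packet(packet: bytes, pixel_mask: str = "1110") -> list:
    """Split `packet` into 8-byte transactions, each needing its own header."""
    return [
        Transaction(off, packet[off:off + TRANSACTION_BYTES], pixel_mask, pixel_mask)
        for off in range(0, len(packet), TRANSACTION_BYTES)
    ]


transactions = split_packet(b"zzzs" * PACKET_BYTES // 4 if False else b"zzzs" * 16)
header_count = len(transactions)   # one header per transaction
```

Under these assumptions, a single packet that needed one header now needs eight, one per transaction.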

Summary of the invention

In view of this, the present invention provides packet processing systems and methods for a PCI-Express (PCIE) environment. Such a system receives at least one packet of data. A packet may contain more than one type of data, and a given operation may not need to access every type. For example, a graphics processing environment using the PCIE protocol may use two different data types: depth data (z-data) and stencil data (s-data). Depth processing may need to read or write the z-data in a packet that also contains stencil data, while the stencil data does not need to be written during depth processing. Some embodiments of the packet processing system segregate the z-data and the stencil data into contiguous bytes, forming a stencil data group and a z-data group. The grouping enables the packet processing system to selectively write one group while excluding the other.

In brief, the present invention discloses a packet processing system arranged in a PCIE system. A receiver receives a packet over a PCIE connection, the packet having at least first-type data and second-type data. A segregator segregates the packet into two contiguous groups, a first group comprising the first-type data and a second group comprising the second-type data. A swap logic unit is configured to swap the first-type data and the second-type data. A mask logic unit is configured to mask the first-type data and the second-type data.

The present invention further discloses a packet processing method for processing a packet in a PCIE system, comprising: receiving the packet over a PCIE connection, the packet having at least first-type data and second-type data; and segregating the packet into two contiguous groups, a first group comprising the first-type data and a second group comprising the second-type data.

The present invention further discloses a graphics processing system comprising a PCIE connection and a graphics processing unit. The graphics processing unit is coupled to the PCIE connection and comprises a packet logic unit. The packet logic unit is configured to receive, over the PCIE connection, a packet having at least two types of data and to segregate the packet into two contiguous groups, a first group comprising first-type data and a second group comprising second-type data.

Description of the drawings

Fig. 1 is a block diagram of a graphics processing system according to an embodiment of the present invention, which serves as an environment in which the packet processing system (and method) is implemented.

Fig. 2A is a functional block diagram of selected portions of the graphics processing system of Fig. 1 and of the packet processing system according to an embodiment of the present invention.

Fig. 2B is a functional block diagram of the packet processing system of Fig. 2A according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of the pre-processing packet format and the byte masks used by the packet processing system of Fig. 2B.

Fig. 4 shows the post-processing packet format produced by performing the mask and swap operations on the pre-processing packet of Fig. 3.

Fig. 5 is a flow chart of a packet processing method according to an embodiment of the present invention.

Description of main reference numerals

10: graphics processing system

100, 100a: packet processing system

102: display device

103: PCIE connection

104, DIU: display interface unit

106: local memory

110, MIU: memory interface unit

114, GPU: graphics processing unit

118, PCIE BIU: PCIE bus interface unit

122: chipset

124: system memory

126, CPU: central processing unit

150: driver software

220, BCI: buffer control initialization unit

222, VS: vertex shader

224, TSU: triangle setup unit

226, STG: span and tile generation unit

228: ZL1 unit

230: ZL1 cache

232: ZL2 unit

234: Z cache

236, 238: P unit

240, PS: pixel shader

242: texture cache

244: ZL3 unit

246: destination unit

248: D cache

260: segregator

262: mask logic unit

264: swap logic unit

266: write logic unit

268: receiver

300, 400, A, B: packet

302, 304: byte mask

303: s-data

305: z-data

402: z-data group

404: s-data group

Detailed description

To make the above and other objects, features, and advantages of the present invention more readily apparent, preferred embodiments are described in detail below with reference to the accompanying drawings:

Embodiment:

Various embodiments of packet processing systems and methods are disclosed. Such systems and methods apply a byte mask to the entire packet content (or entire packet) to enable selective write and/or read operations to a circuit component. Applying the byte mask to the entire packet content can improve processing speed and performance compared with conventional systems. As noted above, a conventional PCI system can apply a byte mask to the end or the beginning of a packet, but not to the entire content of the packet. Such conventional systems must break the packet into manageable segments and attach a header to each segment. The additional headers increase processing time and storage requirements and therefore reduce processing efficiency. The packet processing systems and methods disclosed herein need neither split the packet content and attach a header to each segment nor perform the read-then-write operation of conventional systems. They can therefore perform contiguous write operations to a component.

Packet processing systems and methods according to embodiments of the invention are described in the context of a graphics processing environment that comprises a PCIE bus and a graphics processing unit, which processes depth (z) data and stencil (s) data for generated triangles (or other primitives). However, those skilled in the art will understand that other bus protocols and standards also fall within the scope of the invention. Moreover, although only write operations are illustrated, the principles disclosed in the embodiments also apply to read operations. In addition, although only stencil and depth data are described in this specification, the same operations can be applied to other types of data, for example segregating or swapping alpha data and color (e.g., RGB) data.

Fig. 1 is a block diagram of a graphics processing system 10 according to an embodiment of the invention, which serves as an environment in which packet processing system 100 (and method) is implemented. In some embodiments, graphics processing system 10 may be configured as a computer system. Graphics processing system 10 may comprise a display device 102 driven by a display interface unit (DIU) 104 and a local memory 106 (comprising, for example, a display buffer, texture buffer, command buffer, and frame buffer). Local memory 106 may also be referred to herein as a frame buffer, storage unit, or memory. Local memory 106 is coupled to a graphics processing unit (GPU) 114 through a memory interface unit (MIU) 110. In one embodiment, MIU 110, GPU 114, and DIU 104 are coupled to a PCIE-compatible bus interface unit (BIU) 118. For example, PCIE BIU 118 may be implemented using a graphics address remapping table (GART) or another memory mapping mechanism. BIU 118 and GPU 114 may be communicatively coupled by PCIE connection 103, over which data and/or commands may be provided. In an embodiment, BIU 118 and MIU 110 are configured to send or receive data according to the PCIE protocol and a double data rate (DDR) memory protocol, respectively.

BIU 118 is coupled to a chipset 122 (e.g., a north bridge chipset) or a switch. Chipset 122 comprises interface electronics that strengthen signals from central processing unit (CPU) 126 and separate signals to and from system memory 124 from signals to and from input/output devices (not shown). Although in this embodiment the connection and/or communication between the host processor and GPU 114 uses the PCIE bus protocol, other embodiments may use other means (e.g., PCI or a proprietary high-speed bus). System memory 124 also comprises a graphics application (not shown) and driver software 150, which passes instructions or commands through CPU 126 to registers in GPU 114 and DIU 104. Driver software 150, or a unit with equivalent function, may be stored in system memory 124 and executed by CPU 126. In an embodiment, driver software 150 provides code (e.g., shader code) to GPU 114 for execution in GPU 114.

In some embodiments, additional graphics processing units are coupled to the components shown in Fig. 1 through chipset 122 using the PCIE bus protocol. Graphics processing system 10 may comprise all of the components in Fig. 1, or fewer and/or different components than those shown in Fig. 1. In addition, some embodiments may use additional components, for example a south bridge chip coupled to chipset 122.

Packet processing system 100 may be implemented in hardware, software, and/or firmware. When implemented in hardware (e.g., the packet (P) units in Fig. 2A), the hardware may be realized with any one or a combination of the following known technologies: discrete logic circuits having logic gates for implementing logic functions on data signals, an application specific integrated circuit (ASIC) having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.

When packet processing system 100 is implemented in software or firmware (e.g., driver software 150 controlling the hardware processing), driver software 150, which comprises an ordered listing of executable instructions for implementing logic functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system or device (e.g., a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system or device and execute them). In this specification, a computer-readable medium can be any means that can hold, store, or transmit a program for use by or in connection with an instruction execution system or device. The computer-readable medium can be, for example and without limiting the scope of the invention, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or propagation medium. More specific examples include an electrical connection having at least one wire (electronic), a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM or flash memory) (electronic), an optical fiber (optical), and a compact disc read-only memory (CD-ROM) (optical). Note that the computer-readable medium could even be paper or another suitable medium on which the program is printed, since the program can be captured electronically by optically scanning the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and stored in a computer memory.

In addition, the scope of some embodiments of the invention includes embodying the functionality of the preferred embodiments in logic embodied in hardware- or software-configured media.

Fig. 2A is a functional block diagram of part of GPU 114 according to an embodiment of the invention, including the packet processing system 100, labeled 100a. GPU 114 may comprise a buffer control initialization (BCI) unit 220, vertex shader (VS) 222, triangle setup unit (TSU) 224, span and tile generation (STG) unit 226, ZL1 unit 228, ZL1 cache 230, ZL2 unit 232, Z cache 234, P units 236 and 238, pixel shader (PS) 240, texture (T) cache 242, ZL3 unit 244, destination (D) unit 246, and D cache 248. The function of one or more components may be implemented as a fixed-function unit or by code executed on a programmable processing unit. BCI unit 220 receives data or commands from a bus interface unit (e.g., BIU 118 in Fig. 1) and begins processing vertex data. P units 236 and 238, ZL1 cache 230, and D cache 248 each interface with the memory interface unit and the bus interface unit (e.g., MIU 110 and BIU 118). Note that in some embodiments P units 236 and 238 may be included in Z cache 234 and T cache 242, respectively. In one embodiment, P units 236 and 238 (referred to herein, individually or collectively, as the packet logic unit) comprise packet processing system 100a (shown in dashed lines), although in some embodiments packet processing system 100 comprises fewer or more components. For example, packet processing system 100a may also comprise driver software 150, which is configured to control the execution of P units 236 and 238 and/or a core processor (e.g., an engine), or in some embodiments may encompass the entire graphics processing unit 114 or graphics processing system 10.

Fig. 2B is a functional block diagram of packet processing system 100 according to an embodiment of the invention. As shown, packet processing system 100 comprises a segregator 260, a receiver 268, a write logic unit 266, and driver software 150. Segregator 260 further comprises a mask logic unit 262 and a swap logic unit 264. Segregator 260 is configured to segregate an entire packet into two contiguous groups, a first group comprising first-type data and a second group comprising second-type data. Receiver 268 is configured to receive data over the PCIE connection (e.g., from BIU 118). Write logic unit 266 is configured to write data to a cache (e.g., Z cache 234 or T cache 242). Driver software 150 is configured to coordinate and control the functions of receiver 268 and segregator 260. Those skilled in the art will understand that one or more logic units of packet processing system 100 (e.g., 260, 268, 266) may be replicated for each of packet units 236 and 238, or, in some embodiments, shared by packet units 236 and 238.

Referring to Figs. 2A and 2B, P units 236 and 238 in an embodiment comprise logic gates, including registers, configured to enable masking (mask logic unit 262) and byte swapping (swap logic unit 264), among other functions (e.g., edge calculations). ZL2 unit 232 and ZL3 unit 244 access Z cache 234. D unit 246 is coupled to PS 240 and ZL3 unit 244, performs color functions, and accesses D cache 248. PS 240 accesses T cache 242 for texture processing according to known mechanisms. Note that in some embodiments one or more of the components shown in Figs. 2A and 2B may be combined into a single component, and conversely the function of a single component may be distributed among two or more components.

In operation, BCI 220 receives a command from driver software 150 or other software to draw a triangle or other primitive. BCI 220 also receives vertex information for the triangle to be drawn. The vertex information is passed to VS 222, which performs vertex transformations. VS 222 may comprise shader programming or code executed on a programmable unit (e.g., a core processor or engine in GPU 114). In some embodiments, VS 222 may be implemented as a fixed-function unit. In particular, objects are transformed from object space to world space and screen space as triangles. The triangles are passed to TSU 224, which assembles primitives and performs known tasks such as bounding box generation, culling, edge function generation, and triangle-level rejection, among other known functions. TSU 224 passes data to STG unit 226, which provides tile generation, whereby the data objects are split into tiles (e.g., 8x8 or 16x16) and passed to ZL1 unit 228. ZL1 unit 228, like ZL2 unit 232 and ZL3 unit 244, performs z-value processing, for example high-level rejection of z-values (high-level rejection consumes fewer bits than low-level rejection). ZL units 228, 232, and 244 operate in conjunction with ZL1 cache 230, Z cache 234, and Z cache 234, respectively. PS 240 may comprise a shader executed on a programmable unit (e.g., a core processor or engine in GPU 114) that receives texture and pipelined data and provides outputs to D unit 246 and ZL3 unit 244. In some embodiments, PS 240 may comprise a fixed-function unit. D unit 246 and ZL3 unit 244 are configured to perform alpha testing and stencil testing before values in Z cache 234 or D cache 248 are updated.

P units 236 and 238 process packets (e.g., performing the segregation and swap functions described below) corresponding to z-data and s-data stored in Z cache 234 and T cache 242, respectively. For example, a host application may request processing of a surface using only z-data (excluding s-data). The host application's request is conveyed by driver software 150 to GPU 114 through BIU 118. Driver software 150 programs registers in GPU 114 and instructs the core processor (e.g., engine) in GPU 114 to enable this z-only format. According to the instructions passed by driver software 150 on behalf of the host application, the core processor generates a mask and stores it in one or more registers accessible to P units 236 and 238, so that P unit 236 and/or 238 performs the segregation or swap function before outputting the required packet format (i.e., the z-only format) through BIU 118 or MIU 110. For example, P unit 238 receives data in a pre-processing packet format (see Fig. 3, where the packet is labeled 300) from BIU 118 in response to a read request sent to BIU 118. The packet address associated with a read (or write) operation may be generated by the core processing unit (e.g., engine) in GPU 114. Referring to Fig. 3, packet 300 contains two different types of data: stencil (s) data 303 and depth, or z, data 305. In this embodiment, three consecutive z-data 305 bytes (e.g., z0, z0, z0) are paired with a single stencil (s) data 303 byte (e.g., s0); each block containing z or s data in Fig. 3 and Fig. 4 represents one byte. P unit 238 masks the entire packet content using byte mask 302 and swaps the data to form a pixel packet 400 in a post-processing packet format, which comprises two separate contiguous groups 402 (z-data) and 404 (s-data), as shown in Fig. 4. P unit 238 writes at least one of groups 402 and 404 to T cache 242. Note that P unit 238 may write the s-data along with the z-data to T cache 242, but in this embodiment such a write occurs only in the mixed format shown in Fig. 3 (e.g., packet 300).

Regarding P unit 236, the data in Z cache 234 is in the pre-processing packet format shown as packet 300 in Fig. 3. For example, in response to a write request to BIU 118, P unit 236 masks the packet 300 stored in Z cache 234 and swaps the packet data. After the mask and swap operations, the data is in the post-processing packet format of pixel packet 400 shown in Fig. 4. The stages labeled A and B in Fig. 2A are illustrated in Fig. 3 and Fig. 4, respectively.

Referring to Fig. 3, packet 300 represents the packet labeled A in Fig. 2A (the pre-processing packet format). As described above, the pattern repeated in packet 300 combines at least two different types of data, for example three consecutive z-data 305 bytes (e.g., z0, z0, z0) paired with a single stencil data 303 byte (e.g., s0). In operation, when a write operation is needed only for z-data 305 (excluding s-data 303), P unit 236 (or P unit 238; P unit 236 is used for illustration in this embodiment) can perform a byte-enable operation on the entire content of packet 300. That is, P unit 236 applies byte mask 302 to the entire packet 300, with the bit pattern of byte mask 302 disabling s-data 303 and enabling z-data 305. P unit 236 therefore uses a byte mask 302 with the pattern 11101110...1110, placing a 0 in every fourth position. A bit value of 0 represents the disable function (i.e., the masked position keeps its original value). A bit value of 1 represents the enable function, which allows the corresponding position to pass. Those skilled in the art will understand that the mask bit values disclosed here may have the opposite meaning in some embodiments (i.e., 1 for disable and 0 for enable).

Note that when a write operation is desired for s-data 303 (excluding z-data 305), the mask can use the inverse bit pattern, 00010001...0001, as shown by byte mask 304. In addition, when all positions are to pass, the mask bit pattern can be all 1s (not shown). P unit 236 (and 238) can therefore use the byte mask to selectively write packet content 300, including combinations of contiguous bytes.
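The byte-mask behavior described for masks 302 and 304 can be sketched in software as follows. This is a minimal model, assuming a 64-byte packet in which each 4-byte pixel holds three z bytes followed by one s byte; the function names `build_mask` and `masked_write` are hypothetical, and the actual P units are described as hardware logic rather than code.

```python
# Illustrative byte-enable masks applied to an entire 64-byte packet.
# Assumed layout: each 4-byte pixel is three z bytes followed by one s byte.
PACKET_BYTES = 64


def build_mask(pattern: str, packet_bytes: int = PACKET_BYTES) -> list:
    """Repeat a per-pixel bit pattern so it covers every byte of the packet."""
    return [int(bit) for bit in pattern * (packet_bytes // len(pattern))]


def masked_write(dest: bytearray, packet: bytes, mask: list) -> None:
    """Write only the bytes whose mask bit is 1; other bytes keep their value."""
    for i, enabled in enumerate(mask):
        if enabled:
            dest[i] = packet[i]


Z_ONLY = build_mask("1110")   # like byte mask 302: enable z, disable s
S_ONLY = build_mask("0001")   # like byte mask 304: enable s, disable z
ALL = build_mask("1111")      # pass every byte

packet = b"zzzs" * 16
dest = bytearray(b"-" * PACKET_BYTES)
masked_write(dest, packet, Z_ONLY)
```

With `Z_ONLY`, the destination receives the 48 z bytes while the 16 positions that correspond to s bytes keep their previous contents; passing `S_ONLY` instead would do the reverse.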

Fig. 4 shows pixel packet 400, labeled B in Fig. 2A, in the post-processing packet format, which comprises two contiguous groups: z-data group 402 and s-data group 404. Pixel packet 400 can be written to local memory 106 through MIU 110 or BIU 118 (alternatively, P unit 238 can write one of groups 402 and 404 to T cache 242 in response to a write request to BIU 118, or both the s-data and the z-data can be written to T cache 242 in the mixed format described above). As shown, z-data group 402 and s-data group 404 are separated from each other, which enables selective writing of contiguous bits or bytes of data. In this embodiment, all of the z-data is moved (e.g., swapped) to the front of the packet as group 402 (e.g., the first 48 bytes), and all of the s-data is moved to the end of the packet as group 404 (e.g., the last 16 bytes). For example, because their mask bit value is 0, the 16 bytes used by s-data group 404 can be preserved. The 48 bytes whose mask bit value is 1 are enabled, so that the write operation is performed only on z-data group 402 (excluding s-data group 404). When a write operation is desired for s-data group 404, the mask bit values can likewise be inverted, with the 1s replaced by 0s and the 0s by 1s.
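The move from the interleaved format of packet 300 to the two contiguous groups of packet 400 can be sketched as follows. This is a software model under stated assumptions (a 64-byte packet, three z bytes then one s byte per pixel); `segregate` and `write_front_group` are hypothetical names, and the byte values are arbitrary test data.

```python
# Illustrative swap of an interleaved packet into two contiguous groups.
# Assumed layout on input: three z bytes then one s byte per 4-byte pixel.

def segregate(packet: bytes, mask: list) -> tuple:
    """Return (enabled_group, disabled_group) as contiguous byte strings."""
    enabled = bytes(b for b, m in zip(packet, mask) if m)
    disabled = bytes(b for b, m in zip(packet, mask) if not m)
    return enabled, disabled


def write_front_group(dest: bytearray, post_packet: bytes, group_len: int) -> None:
    """Write only the leading contiguous group; the tail of `dest` is preserved."""
    dest[:group_len] = post_packet[:group_len]


mask = [1, 1, 1, 0] * 16                      # z bytes enabled, s bytes disabled
pre = b"".join(bytes([i, i, i, 0x80 + i]) for i in range(16))
z_group, s_group = segregate(pre, mask)
post = z_group + s_group                      # z-data first, s-data last

cache = bytearray(b"\xff" * 64)
write_front_group(cache, post, len(z_group))
```

Because the z-data now occupies the first 48 bytes, one contiguous write covers it, and the last 16 bytes of the destination are left untouched.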

Fig. 5 shows a packet processing method 100b according to an embodiment of the invention, which is performed under the combined control of driver software 150 and P units 236 and/or 238. The method comprises receiving, over a PCIE connection, a packet having at least first-type data and second-type data (502), and segregating the entire packet into two contiguous groups, a first group comprising the first-type data and a second group comprising the second-type data (504).

Any process description or block in the flow chart of Fig. 5 should be understood as representing a module, segment, or portion of program code comprising at least one executable instruction for implementing the specific logical function of a step. Those skilled in the art will understand that other embodiments in which functions are executed in an order different from that described (including concurrently or in reverse order) also fall within the scope of the invention.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit its scope. Any person skilled in the art may make changes and modifications without departing from the spirit and scope of the invention, and the scope of protection is therefore defined by the appended claims.

Claims

1. A packet processing system in a peripheral component interconnect express (PCIE) system, comprising:

a receiver configured to receive a packet via a PCIE interface, the packet having at least a first type of data and a second type of data; and

a separator configured to separate the packet into two contiguous groups, a first group comprising the first type of data and a second group comprising the second type of data;

wherein the separator comprises:

a swap logic unit configured to perform swapping on the first type of data and the second type of data; and

a mask logic unit configured to perform masking on the first type of data and the second type of data.

2. The packet processing system as claimed in claim 1, further comprising a write logic unit configured to write one of the two contiguous groups to the exclusion of the other group.

3. The packet processing system as claimed in claim 2, wherein the write logic unit is further configured to write one of the two contiguous groups to a memory through a memory interface unit.

4. The packet processing system as claimed in claim 2, wherein the write logic unit is further configured to write one of the two contiguous groups to a bus interface unit, the bus interface unit being coupled to one of, or any combination of, a system memory, a host processor, and a chipset.

5. The packet processing system as claimed in claim 2, wherein the first type of data is contiguous stencil data and the second type of data is contiguous depth data, and the write logic unit is further configured to perform a write operation on at least one of the contiguous stencil data bytes and the contiguous depth data bytes.

6. The packet processing system as claimed in claim 2, wherein the first type of data is contiguous color data and the second type of data is contiguous alpha value data, and the write logic unit is further configured to perform a write operation on at least one of the contiguous color data bytes and the contiguous alpha value data bytes.

7. The packet processing system as claimed in claim 1, wherein the first type of data is contiguous stencil data and the second type of data is contiguous depth data, and the mask logic unit is further configured to generate an enable mask bit, a disable mask bit, or a combination of enable and disable mask bits, the mask bits being applied to the depth data and the stencil data.

8. The packet processing system as claimed in claim 1, further comprising driver software configured to regulate and control the functions of the receiver and the separator.

9. A packet processing method for processing a packet in a peripheral component interconnect express (PCIE) system, comprising:

receiving the packet via a PCIE interface, the packet having at least a first type of data and a second type of data; and

separating the packet into two contiguous groups, a first group comprising the first type of data and a second group comprising the second type of data;

wherein the separating comprises performing a swap operation on the first type of data and the second type of data, and performing a mask operation on the first type of data and the second type of data.

10. The packet processing method as claimed in claim 9, further comprising a write operation that writes one of the two contiguous groups to the exclusion of the other group.

11. The packet processing method as claimed in claim 10, wherein the write operation comprises writing one of the two contiguous groups to a memory through a memory interface unit.

12. The packet processing method as claimed in claim 10, wherein the write operation comprises writing one of the two contiguous groups to a bus interface unit, the bus interface unit being coupled to one of, or any combination of, a system memory, a host processor, and a chipset.

13. The packet processing method as claimed in claim 10, wherein the first type of data is contiguous stencil data and the second type of data is contiguous depth data, and the write operation comprises performing a write operation on at least one of the contiguous stencil data bytes and the contiguous depth data bytes.

14. The packet processing method as claimed in claim 10, wherein the first type of data is contiguous color data and the second type of data is contiguous alpha value data, and the write operation comprises performing a write operation on at least one of the contiguous color data bytes and the contiguous alpha value data bytes.

15. The packet processing method as claimed in claim 9, wherein the first type of data is contiguous stencil data and the second type of data is contiguous depth data, and the mask operation comprises generating an enable mask bit, a disable mask bit, or a combination of enable and disable mask bits, the mask bits being applied to the depth data and the stencil data.

16. The packet processing method as claimed in claim 9, further comprising regulating and controlling the receiving of the packet and the separating of the packet.

17. A graphics processing system, comprising:

a peripheral component interconnect express (PCIE) interface; and

a graphics processing unit coupled to the PCIE interface, the graphics processing unit comprising a packet logic unit configured to receive, through the PCIE interface, a packet having at least two types of data, to separate the packet into two contiguous groups, a first group comprising a first type of data and a second group comprising a second type of data, to perform a swap operation on the first type of data and the second type of data, and to perform a mask operation on the first type of data and the second type of data.

18. The graphics processing system as claimed in claim 17, wherein the first type of data is depth data and the second type of data is stencil data.

19. The graphics processing system as claimed in claim 17, wherein the first type of data is color data and the second type of data is alpha value data.

20. The graphics processing system as claimed in claim 17, further comprising driver software configured to provide the packet to the graphics processing unit through the PCIE interface.