US9129060B2 - QoS based dynamic execution engine selection - Google Patents

QoS based dynamic execution engine selection

Info

Publication number
US9129060B2
US9129060B2 (application US13/272,975, US201113272975A)
Authority
US
United States
Prior art keywords
instruction
store
core
execution
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US13/272,975
Other versions
US20130097350A1 (en)
Inventor
Najeeb I. Ansari
Michael Carns
Jeffrey Schroeder
Bryan Chin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cavium International
Marvell Asia Pte Ltd
Original Assignee
Cavium LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cavium LLC filed Critical Cavium LLC
Priority to US13/272,975 priority Critical patent/US9129060B2/en
Assigned to Cavium, Inc. reassignment Cavium, Inc. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ANSARI, NAJEEB I., CARNS, MICHAEL, SCHROEDER, JEFFREY, CHIN, BRYAN
Publication of US20130097350A1 publication Critical patent/US20130097350A1/en
Priority to US14/828,884 priority patent/US9495161B2/en
Application granted granted Critical
Publication of US9129060B2 publication Critical patent/US9129060B2/en
Assigned to JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT reassignment JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT SECURITY AGREEMENT Assignors: CAVIUM NETWORKS LLC, Cavium, Inc.
Assigned to CAVIUM NETWORKS LLC, QLOGIC CORPORATION, CAVIUM, INC reassignment CAVIUM NETWORKS LLC RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: JP MORGAN CHASE BANK, N.A., AS COLLATERAL AGENT
Assigned to CAVIUM, LLC reassignment CAVIUM, LLC CERTIFICATE OF CONVERSION AND CERTIFICATE OF FORMATION Assignors: Cavium, Inc.
Assigned to CAVIUM INTERNATIONAL reassignment CAVIUM INTERNATIONAL ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CAVIUM, LLC
Assigned to MARVELL ASIA PTE, LTD. reassignment MARVELL ASIA PTE, LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CAVIUM INTERNATIONAL
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/30181: Instruction operation extension or modification
    • G06F 13/362: Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30145: Instruction analysis, e.g. decoding, instruction word fields
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3836: Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F 9/3851: Instruction issuing from multiple instruction streams, e.g. multistreaming
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5044: Allocation of resources to service a request, considering hardware capabilities

Definitions

  • FIG. 1 is a block diagram of a processor with a core selection unit 110 .
  • the processor contains a plurality of cores 714 .
  • the core selection unit 110 is coupled with a plurality of instruction stores 102 A-C through an instruction store bus 108 .
  • Instruction store 102A is indexed with the number 0, instruction store 102B is indexed with the number 1, and instruction store 102C is indexed with the number N.
  • The index N of instruction store 102C can be any positive integer.
  • A corresponding instruction store indexed to every integer between 1 and N will be coupled to the core selection unit in a similar manner as instruction stores 102A-C. As such, there will be N+1 total instruction stores. In one embodiment, N can be 63, totaling 64 instruction stores.
  • the instruction stores 102 A-C can be any data structure that can store work for a processor.
  • the instruction stores 102A-C may be a content addressable memory.
  • the instruction stores 102 A-C may be a queue.
  • the instruction stores 102A-C store instructions for the cores of a processor in one embodiment; they may also store any other type of work for a processor, e.g. memory operations.
  • the instruction stores 102 A-C can store instructions for cryptography or for compression. Some embodiments can contain more than one set of instruction stores for different applications. Example embodiments of instruction stores are cryptography instruction stores, compression instruction stores, video processing instruction stores, image processing instruction stores, general instruction stores, or general processing instruction stores, or miscellaneous instruction stores.
  • the instruction store bus 108 transmits information from the instruction stores 102 A-C to the core selection unit 110 .
  • This information can include a group number 104 and a store state 106 .
  • the group number 104 is a property of the instruction stored in the instruction store 102 A-C. In one embodiment, the group number is not part of the instruction itself, but is associated and stored together with the instruction. As shown later in the specification, the group number is a property of the instruction that is a factor in selecting an eligible core of the processor to process that instruction.
  • the instruction store state 106 relates to the state of the instruction store 102 A-C.
  • the core selection unit 110 contains a plurality of arbitration units 112 and core selection logic 114 .
  • the core selection unit 110 operates in two different modes, a physical function mode and a virtual function mode. In the physical function mode, the core selection unit 110 groups all of the instruction stores 102 A-C into one physical function.
  • a single arbitration unit then uses a method of hardware arbitration to select an instruction of the physical function for processing by an available core of the processor.
  • the method of hardware arbitration can be any method of arbitration. Example methods of hardware arbitration include round robin arbitration, weighted round robin arbitration, fixed priority arbitration, and random arbitration.
  • the core selection unit 110 is configured to create a plurality of virtual functions.
  • the core selection unit 110 creates 8, 16, 32, or 64 virtual functions.
  • These four levels of virtual functions are modes of the core selection unit 110, which can be set by the processor, and are referred to as VF8, VF16, VF32, and VF64, respectively.
  • the core selection unit 110 groups each instruction store 102 A-C into one of a plurality of virtual functions. In one embodiment, the instruction stores 102 A-C are distributed evenly among the virtual functions.
  • Multiple arbitration units 112 are configured to select an instruction within each virtual function using a method of hardware arbitration. Then, a second level of arbitration selects an instruction among the virtual functions.
  • the method of hardware arbitration can be any method of arbitration. Example methods of hardware arbitration include round robin arbitration, weighted round robin arbitration, fixed priority arbitration, and random arbitration.
  • the core selection unit 110 also includes a group execution matrix 116 , a store execution matrix 118 , and a core availability vector 120 . Both the group execution matrix 116 and store execution matrix 118 are set by a host or software.
  • the group execution matrix 116 includes a plurality of group execution masks. Each group execution mask corresponds to a group number 104 and indicates which cores can process an instruction from that group number 104 .
  • the store execution matrix includes a plurality of store execution masks. Each store execution mask corresponds to an instruction store 102 A-C and indicates which cores can process an instruction from that instruction store 102 A-C.
  • the core availability vector 120 indicates which core or cores are idle and available to process an instruction.
  • the core selection logic 114 and arbitration units 112 of the core selection unit 110 determine which instruction store can send an instruction to a core.
  • the core selection unit outputs an eligible instruction store 122 and the eligible core ID 124 corresponding to the core that will process the instruction.
  • the core selection unit also outputs an instruction ID to identify the instruction within the instruction store (not shown).
  • when each instruction store transmits only one instruction to the core selection unit 110 at a time, such as when the instruction store is a queue with an instruction at its head, no such instruction ID is required.
  • FIG. 2 is a diagram of virtual function mapping.
  • Instruction arbitration system 200 includes a core selection unit 202 .
  • the core selection unit 202 generates an eligible instruction store vector 204 which indicates which instruction stores can be processed by an idle core or cores. All instruction stores that are eligible for processing by at least one core are then mapped to a corresponding virtual function by the virtual function mapper 206 .
  • Each virtual function then is processed by the intra-virtual function arbitrator 208 , which selects a winning instruction within each virtual function.
  • Each winning instruction is then transmitted to the inter-virtual function arbitrator 210 , which selects a winning instruction among the winning instructions of each virtual function.
  • the inter-virtual function arbitrator 210 then transmits a winning instruction store ID 212 to instruction dispatch logic 214 .
  • the dispatch logic 214 transmits a core ID 218 of the eligible idle core to the eligible instruction store 216 associated with the winning instruction store ID 212 .
  • the dispatch logic transmits an instruction dispatch signal 220 to the eligible instruction store 216 with the core ID 218 , and the eligible instruction store 216 then issues the eligible instruction to a core corresponding with the core ID 218 .
  • FIG. 3 is a diagram of core selection logic 300 .
  • the group execution matrix 116 includes a plurality of group execution masks 302 .
  • the group execution matrix 116 is coupled with group execution multiplexers 306 A-C.
  • the group execution multiplexers 306 A-C are configured to select one of the plurality of group execution masks 302 .
  • the quantity of group execution multiplexers 306 A-C corresponds with the number of instruction stores in the processor. When the instruction store is configured to output multiple instructions at once, more group execution multiplexers 306 A-C may be necessary to select additional group execution masks 302 .
  • Each group execution multiplexer 306 A-C is coupled with a group execution multiplexer selector 308 A-C associated with a group number of an instruction of the instruction store.
  • the group execution multiplexers 306 A-C each output a corresponding group execution mask 310 A-C.
  • the store execution matrix 118 includes a plurality of store execution masks 304 .
  • the store execution matrix is coupled with store execution multiplexers 312 A-C.
  • the store execution multiplexers 312 A-C are configured to select one of the plurality of store execution masks 304 .
  • the quantity of store execution multiplexers 312 A-C corresponds with the number of instruction stores in the processor.
  • Each store execution multiplexer 312 A-C is coupled with a store execution multiplexer selector 314 A-C associated with an index number of an instruction store.
  • the store execution multiplexer 312 A-C each output a corresponding store execution mask 316 A-C.
  • the core availability vector 120 indicates which cores are available for processing.
  • the eligible instruction store vector 322 indicates which instruction stores contain an instruction that is eligible for processing by a core.
  • the bitwise AND-gates 318A-C are coupled with corresponding group execution masks 310A-C, store execution masks 316A-C, the core availability vector 120, and the eligible instruction store vector 322.
  • the quantity of bitwise AND-gates 318 A-C corresponds to the number of instruction stores. However, in an embodiment where the instruction stores are configured to output more than one instruction, more bitwise AND-gates 318 A-C may be required to represent additional eligible instructions.
  • the bitwise AND-gates 318A-C perform a bitwise AND operation on the corresponding group execution masks 310A-C, the corresponding store execution masks 316A-C, and the core availability vector 120.
  • the bitwise AND-gates 318 A-C also input a bit of the eligible instruction store vector 322 corresponding with the appropriate instruction store.
  • the bitwise AND-gates 318 A-C then output corresponding instruction store candidate cores 320 A-C.
  • one candidate core is used as an index to select one entry from each instruction store candidate cores 320 A-C, and only non-zero bits are considered for arbitration.
  • FIG. 4 is an embodiment of a virtual function arbitration circuit 400 .
  • the virtual function mapper 404 is coupled with eligible instruction stores 402 A-D and a virtual function mode register 406 .
  • In one embodiment, each of the eligible instruction stores 402A-D is one bit representing whether the corresponding instruction store is eligible for one core.
  • In another embodiment, each of the eligible instruction stores 402A-D is a bit-vector indicating for which cores that instruction store is eligible.
  • the virtual function mode register 406 is configured as a selector to the virtual function mapper 404 .
  • the virtual function mode register 406 is set by a host or software.
  • the virtual function mode register 406 indicates whether the processor should run in physical function mode or, if not, which virtual function mode it should run in.
  • the virtual function mapper 404 then outputs virtual functions 408 A-C.
  • the number of virtual functions 408 A-C corresponds to the virtual function mode represented by the virtual function mode register 406 .
  • the quantity of virtual functions 408A-C can be 8, 16, 32, or 64.
  • Virtual functions 408 A-C include instructions of the virtual function 408 AA-CC.
  • Intra-virtual function arbitration units 410 A-C contain hardware arbitration module 412 A-C and intra-virtual function multiplexers 414 A-C.
  • the intra-virtual function arbitration units 410 A-C are coupled with the virtual functions 408 A-C.
  • the virtual functions 408 A-C and instructions of the virtual function 408 AA-CC are coupled with the intra-virtual function multiplexer 414 A-C.
  • Hardware arbitration units 412 A-C are coupled with the intra-virtual function multiplexer 414 A-C as a selector.
  • the virtual functions 408 A-C and instructions of the virtual function 408 AA-CC are coupled with hardware arbitration units 412 A-C.
  • the intra-virtual function multiplexers 414 A-C output virtual function candidate instructions 416 A-C based on the hardware arbitration units 412 A-C.
  • the intra-virtual function arbitrators 410A-C each output the virtual function candidate instruction 416A-C selected by their corresponding intra-virtual function multiplexer 414A-C.
  • the inter-virtual function arbitrator 420 contains a hardware arbitration module 422 and an inter-virtual function multiplexer 424 .
  • the inter-virtual function arbitrator 420 is coupled with the virtual function candidate instructions 416 A-C.
  • the hardware arbitration module 422 is coupled with the inter-virtual function multiplexer 424 as a selector. In some embodiments, the hardware arbitration module 422 is also coupled with the virtual function candidate instructions 416 A-C.
  • the inter-virtual function multiplexer 424 selects and outputs one of the virtual function candidate instructions 416 A-C, and the inter-virtual function arbitrator 420 outputs the same as a winning instruction store ID 426 .
  • the method of hardware arbitration used by hardware arbitration modules 412 A-C and 422 can be any method of arbitration. Examples methods of hardware arbitration include round robin arbitration, weighted round robin arbitration, fixed priority arbitration, and random arbitration.
  • FIG. 5 is an embodiment of a physical function arbitration circuit 500 .
  • a physical function arbitrator 510 is coupled with eligible instruction stores 502 A-C.
  • the physical function arbitrator includes a hardware arbitration module 512 and a physical function arbitration multiplexer 514 .
  • the hardware arbitration module 512 is coupled with the physical function arbitration multiplexer 514 and is configured as a selector.
  • the physical function arbitration multiplexer 514 is coupled with the eligible instruction stores 502 A-C.
  • the hardware arbitration module 512 is coupled with the eligible instruction stores 502 A-C.
  • the physical function arbitration multiplexer 514 selects and outputs a winning instruction store ID 516, which the physical function arbitrator 510 also outputs.
  • FIG. 6A is an embodiment of a group execution matrix 600 .
  • Group execution matrix 600 can correspond to group execution matrix 116 in some embodiments.
  • Group execution matrix 600 includes a plurality of group execution masks 612 .
  • Each group execution mask 612 is one row of the group execution matrix 600 and corresponds to a group number associated with an instruction.
  • the matrix is indexed by the group number index 606 which indicates there are j+1 groups and a core number index 608 which indicates there are m+1 cores.
  • each group execution mask 612 includes typical group execution mask values 604 corresponding to each core of the processor.
  • the typical group execution mask value 604 represents whether an instruction from the group indicated by the group number index 606 can be dispatched to the core indicated by the core number index 608 .
  • FIG. 6B is an embodiment of an instruction store execution matrix 620 .
  • Store execution matrix 620 can correspond to store execution matrix 118 in some embodiments.
  • Store execution matrix 620 includes a plurality of store execution masks 622 .
  • Each store execution mask 622 is one row of the store execution matrix 620 and corresponds to a store number index 626 associated with an instruction store. The matrix is indexed by store number index 626 which indicates there are n+1 instruction stores and a core number index 628 which indicates there are m+1 cores.
  • each store execution mask 622 includes typical store execution mask values 624 corresponding to each core of the processor. The typical store execution mask value 624 represents whether an instruction from the instruction store indicated by the store number index 626 can be dispatched to the core indicated by the core number index 628 .
  • FIG. 6C is an embodiment of a core availability vector 640 .
  • Core availability vector 640 can correspond to core availability vector 120 in some embodiments.
  • the core availability vector is indexed by a core number index 648 .
  • the core availability vector includes a plurality of typical core availability vector values 644 corresponding to the availability of the core of the processor indicated by the core number index 648 (a data-structure sketch of these matrices and the availability vector follows at the end of this list).
  • FIG. 7 is an example embodiment of the interaction between a host system with software and a chip including virtual functions and core selection and arbitration units.
  • An integrated host and chip system 700 includes a host and software 702 coupled with memory 704 and also a chip 710 through host and chip connection 706 .
  • Chip 710 includes a host interface 712 , a plurality of cores 714 , and an instruction store manager 720 .
  • the host interface 712 is coupled with the instruction store manager 720 and the cores 714 .
  • the cores 714 and instruction store manager 720 are also coupled to each other.
  • the instruction store manager 720 includes a group execution matrix 722 , a store execution matrix 724 , and instruction stores 726 A-C.
  • the host and software 702 are configured to communicate bidirectionally with the chip 710 .
  • the host and software 702 can signal an instruction store 726 A-C that there is an available instruction. If the instruction store has available space, it can fetch instructions from the host and software's 702 memory 704 through the host interface 712 .
  • the host and software can also set the group execution matrix 722 and the store execution matrix 724 .
  • the chip 710 can communicate the results of instructions processed by the cores 714 back to the host and software 702 through the host and chip connection 706 to be recorded in memory 704 .
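As a rough illustration of the structures in FIGS. 6A-6C, the C declarations below model the group execution matrix, store execution matrix, and core availability vector as 64-bit rows. The type names, sizes, and accessor are assumptions made for this sketch, not taken from the patent.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_GROUPS 8    /* j+1 groups, illustrative        */
#define NUM_STORES 64   /* n+1 instruction stores, illustrative */
#define NUM_CORES  64   /* m+1 cores, illustrative         */

/* Group execution matrix (FIG. 6A): one 64-bit row per group number;
 * bit c of row g set means group g may dispatch to core c. */
typedef struct { uint64_t mask[NUM_GROUPS]; } group_exec_matrix_t;

/* Store execution matrix (FIG. 6B): one 64-bit row per instruction store;
 * bit c of row s set means store s may dispatch to core c. */
typedef struct { uint64_t mask[NUM_STORES]; } store_exec_matrix_t;

/* Core availability vector (FIG. 6C): bit c set means core c is idle. */
typedef uint64_t core_availability_vector_t;

/* Example accessor: may instructions of `group` be dispatched to `core`? */
static bool group_may_use_core(const group_exec_matrix_t *g, int group, int core)
{
    return (g->mask[group] >> core) & 1;
}
```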

Abstract

In one embodiment, a processor includes plural processing cores, and plural instruction stores, each instruction store storing at least one instruction, each instruction having a corresponding group number, each instruction store having a unique identifier. The processor also includes a group execution matrix having a plurality of group execution masks and a store execution matrix comprising a plurality of store execution masks. The processor further includes a core selection unit that, for each instruction within each instruction store, selects a store execution mask from the store execution matrix. The core selection unit for each instruction within each instruction store selects at least one group execution mask from the group execution matrix. The core selection unit performs logic operations to create a core request mask. The processor includes an arbitration unit that determines instruction priority among each instruction, assigns an instruction for each available core, and signals the instruction store.

Description

BACKGROUND
In the field of computer networking and other packet-switched telecommunication networks, quality of service (QoS) refers to an ability to provide different priority to different applications, users, or data flows, or to guarantee a certain level of performance to a data flow. For example, a QoS scheme may guarantee a required bit rate, delay, jitter, packet dropping probability and/or bit error rate. QoS guarantees are important for real-time streaming multimedia applications that are delay sensitive and have fixed bit rates, such as voice over IP, online games and video.
In processors with multiple cores, a host or software will often view the processor as one machine despite the processor having multiple cores. When the host or software runs several simultaneous processes, it will treat the processor as one machine, when it could be advantageous to treat it as multiple machines for the multiple processes. Few hardware mechanisms currently exist that regulate QoS of instructions from a host or software.
SUMMARY
In one embodiment, a processor comprises a plurality of processing cores, and a plurality of instruction stores, each instruction store storing at least one instruction, each instruction having a corresponding group number, each instruction store having a unique identifier. The processor also comprises a group execution matrix comprising a plurality of group execution masks and a store execution matrix comprising a plurality of store execution masks.
The processor also comprises a core selection unit configured to, for each instruction within each instruction store, select a store execution mask from the store execution matrix using the unique identifier of a selected instruction store as an index. The core selection unit is further configured to, for each instruction within each instruction store, select at least one group execution mask from the group execution matrix using the group number of at least one selected instruction from the selected instruction store as an index. The core selection unit is configured to, for each instruction within the instruction store and for each group execution mask of the at least one group execution masks, perform logic operations on the selected group execution mask and the store execution mask to create a core request mask, the core request mask corresponding to the selected instruction store and indicating zero, one, or more candidate cores. The core selection unit is further configured to perform a bitwise AND operation on the selected group execution mask and the selected store execution mask to create the core request mask corresponding to the selected instruction store.
The processor also comprises an arbitration unit configured to determine instruction priority among each instruction, each instruction store having at least one corresponding core request mask, accordingly assign an instruction for each available core, where the core request mask corresponding to the instruction store of the instruction indicates candidate cores that intersect with the available cores, and signal the instruction store corresponding to the assigned instruction to send the assigned instruction to the available core.
In one embodiment, a method comprises, on the clock cycle of a processor with a plurality of cores and plurality of instruction stores, and for each instruction within the instruction stores, selecting a store execution mask from a store execution matrix using a unique identifier of a selected instruction store as an index and selecting at least one group execution mask from a group execution matrix using a group number corresponding to an instruction of the selected instruction store as an index.
For each selected group execution mask of the group execution masks, logic operations are performed on at least the selected group execution mask and the selected store execution mask to create a core request mask, the core request mask corresponding to the selected instruction store and indicating zero, one, or more candidate cores, each core request mask added to a core request matrix indexed by the unique identifier of each instruction store. Then, on the clock cycle of the processor, arbitrating to determine instruction priority among the individual instructions corresponding to the plurality of core request masks, assigning an instruction to each available core, where a core request mask corresponding to the instruction store of the instruction indicates candidate cores that intersect with the available cores, signaling the instruction store corresponding to the assigned instruction to send the assigned instruction to the available core.
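A minimal C sketch of one such clock-cycle pass, assuming 64 cores, 64 instruction stores, 8 groups, and 64-bit masks as in the embodiments described below; the function and array names are illustrative, and the arbitration step is only indicated in a comment.

```c
#include <stdint.h>

#define NUM_STORES 64
#define NUM_GROUPS 8

/* One scheduling pass: build a core request mask per instruction store,
 * then let an arbiter pick at most one winning store per available core. */
void schedule_cycle(const uint64_t group_exec_matrix[NUM_GROUPS],
                    const uint64_t store_exec_matrix[NUM_STORES],
                    uint64_t core_available,          /* bit i set = core i idle */
                    const int head_group[NUM_STORES], /* group number of each store's head instruction */
                    uint64_t core_request_matrix[NUM_STORES])
{
    for (int s = 0; s < NUM_STORES; s++) {
        uint64_t group_mask = group_exec_matrix[head_group[s]]; /* indexed by group number */
        uint64_t store_mask = store_exec_matrix[s];             /* indexed by store identifier */
        /* Candidate cores for this store's head instruction. */
        core_request_matrix[s] = group_mask & store_mask & core_available;
    }
    /* An arbitration unit (round robin, fixed priority, ...) would now pick a
     * winning store per available core and signal it to dispatch its instruction. */
}
```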
The instruction store can also include a queue, and the core selection unit can be configured to select one group number corresponding to the instruction at the front of this queue. The instruction store can also be configured to dispatch an instruction to any of the plurality of cores. Each instruction store can be assigned to one of a plurality of virtual functions.
The arbitration unit can determine instruction priority among the virtual functions by a method of hardware arbitration.
Virtual function arbitration units can determine instruction priority within the virtual function by a method of hardware arbitration. The virtual function arbitration units can be configured to determine instruction priority among the instruction stores. The virtual functions can interface with a host, receive instructions and distribute instructions to its corresponding instruction stores.
The core selection unit can perform a bitwise AND operation on the core availability vector, the selected group execution mask, and the selected store execution mask to create the core request mask corresponding to the selected instruction store.
The processor can also comprise a dispatch unit that receives a unique identifier of the one instruction store and an identification number of an available core and produces a signal to the selected instruction store to issue an instruction to the available core indicated by the identification number.
The group execution matrix and store execution matrix are set to affect the quality of service of a physical function or a virtual function among the plurality of cores.
Instruction stores can include compression instruction stores, cryptography instruction stores, video processing instruction stores, image processing instruction stores, or general instruction stores. Each instruction store is assigned to a physical function, and the arbitration unit is configured to determine instruction priority within the physical function by a method of hardware arbitration.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the present invention.
FIG. 1 is a block diagram of a processor with a core selection unit.
FIG. 2 is a diagram of an embodiment of virtual function mapping.
FIG. 3 is a diagram of an embodiment of core selection logic.
FIG. 4 is a diagram of an embodiment of a virtual function arbitration circuit.
FIG. 5 is a diagram of an embodiment of a physical function arbitration circuit.
FIG. 6A is a diagram of an embodiment of a group execution matrix.
FIG. 6B is a diagram of an embodiment of a store execution matrix.
FIG. 6C is a diagram of an embodiment of a core availability vector.
FIG. 7 is an example embodiment of the interaction between a chip with virtual functions and a core selection unit with an arbitration unit and a host system with software.
DETAILED DESCRIPTION
A description of example embodiments of the invention follows.
The teachings of all patents, published applications and references cited herein are incorporated by reference in their entirety.
Treating the cores as one machine makes regulation of the QoS difficult among multiple processes in a host or software. Creating a QoS scheme within a processor allows software to prioritize different processes or groups of processes without using additional software resources or memory.
A processor contains two instruction store managers that fetch and dispatch instructions. In one embodiment, the processor is coupled to a host processor with software and memory. An instruction store manager (ISM) contains cryptography related instructions. A zip store manager (ZSM) contains compression/decompression related instructions. This specification refers primarily to ISMs, a term that covers both ISMs as defined above and ZSMs, as a person of ordinary skill in the art should be able to interchange the two.
The ISM fetches instructions from host memory and dispatches instructions to execution engines based on Quality of Service (QoS) parameters. In one embodiment, the ISM has 64 stores, and each store within the ISM, or ISM store (ISMS), can belong to a physical function (PF) or a particular virtual function (VF) based on the programmed mode. In one embodiment, the instruction stores are any data structure capable of storing an instruction. In another embodiment, the instruction stores within the ISM are queues. Once instructions have populated a work store in the host memory, software signals a corresponding store in the ISM, and that ISMS fetches the instruction if that ISMS has available space.
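For illustration only, the following is a minimal C model of one such store as a small FIFO that is filled from host memory when software signals it; the struct layout, queue depth, and helper names are assumptions made for this sketch, not the hardware described here.

```c
#include <stdint.h>
#include <stdbool.h>

#define STORE_DEPTH 8   /* illustrative queue depth */

/* Hypothetical model of one ISM store: a small FIFO of instructions,
 * each kept together with its group number. */
struct ism_store {
    uint64_t instr[STORE_DEPTH];
    int      group[STORE_DEPTH];  /* group number stored with each instruction */
    int      head, count;
};

static bool ism_store_has_space(const struct ism_store *s)
{
    return s->count < STORE_DEPTH;
}

/* Called when software signals that a new instruction waits in host memory. */
static bool ism_store_fetch(struct ism_store *s, uint64_t instr, int group)
{
    if (!ism_store_has_space(s))
        return false;                     /* fetch deferred until space frees up */
    int tail = (s->head + s->count) % STORE_DEPTH;
    s->instr[tail] = instr;
    s->group[tail] = group;
    s->count++;
    return true;
}
```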
As an example of the similarity of the ISM and ZSM, the ZSM also fetches instructions from host memory and dispatches instructions to execution engines based on QoS parameters. There are 64 stores in ZSM and each ZSM store (ZSMS) can belong to PF or a particular VF based on the programmed mode. Once instructions have populated a work store in the host memory, software signals a corresponding store in the ZSM and that ZSMS fetches the instruction if that ZSMS has available space.
In an embodiment, the processor has four VF modes in addition to the PF mode. The four VF modes are named VF8, VF16, VF32 and VF64. VF8 uses 8 virtual functions, VF16 uses 16 virtual functions, VF32 uses 32 virtual functions, and VF64 uses 64 virtual functions. In addition, in VF8 each VF contains 8 instruction stores, in VF16 each VF contains 4 instruction stores, in VF32 each VF contains 2 instruction stores, and in VF64 each VF contains 1 instruction store. Likewise, in any of the VF modes, stores within the VF are always numbered from 0 to N-1, where N is the number of instruction stores per VF. N is 8 for VF8, 4 for VF16, 2 for VF32 and 1 for VF64. Other embodiments can have a different number of VFs or divide resources among the VFs differently.
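These mode parameters reduce to two small lookups. The C sketch below captures the relationship, assuming the 64-store configuration described above; the enum and function names are ours, not the patent's.

```c
/* Illustrative mapping of VF mode to the number of virtual functions
 * and instruction stores per VF, per the embodiment described above. */
enum vf_mode { VF8, VF16, VF32, VF64 };

static int num_virtual_functions(enum vf_mode m)
{
    switch (m) {
    case VF8:  return 8;
    case VF16: return 16;
    case VF32: return 32;
    case VF64: return 64;
    }
    return 0;
}

static int stores_per_vf(enum vf_mode m)
{
    return 64 / num_virtual_functions(m);   /* 8, 4, 2 or 1 */
}
```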
In PF mode, the instruction stores are numbered from 0 to 63 (64 for ISM and 64 for ZSM) and are grouped into one physical function.
The ISM is responsible for dispatching instructions from the instruction stores to execution engines, or cores. To dispatch an instruction, the ISM selects execution engines from a list of available engines. A software selectable Round Robin or Fixed Priority arbitration algorithm may be employed for core selection. The host or software sets a 64-bit store execution mask for each instruction store, indicating to which cores that instruction store can dispatch an instruction. Each instruction store has its own store execution mask; the masks are stored together in a store execution matrix and are programmed by software to implement QoS policies.
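For example, software might program one row of the store execution matrix to confine a store to a subset of cores. The helper below is a hypothetical host-side convenience, assuming 64 cores and 64-bit mask rows as described above.

```c
#include <stdint.h>

/* Hypothetical host-side helper: build a mask allowing dispatch only to
 * cores in the inclusive range [first_core, last_core]. */
static inline uint64_t core_range_mask(int first_core, int last_core)
{
    uint64_t mask = 0;
    for (int c = first_core; c <= last_core; c++)
        mask |= (uint64_t)1 << c;
    return mask;
}

/* e.g. restrict store 0 to cores 0-7, while store 1 may use all 64 cores: */
/*   store_exec_matrix[0] = core_range_mask(0, 7); */
/*   store_exec_matrix[1] = ~(uint64_t)0;          */
```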
In addition, each instruction is associated and stored with a group number. In one embodiment, there are eight groups. Likewise, the ISM contains eight 64-bit group execution masks, each mask corresponding to one group number and indicating to which cores a particular group is allowed to dispatch. Likewise, for any particular instruction, core eligibility may be determined by the following criteria, where N is any core number from 0-63.
    • 1. Core N is available.
    • 2. An instruction store's execution mask indicates that it may dispatch an instruction to core N.
    • 3. The instruction of the instruction store is associated with group M (0 to 7).
    • 4. The group execution mask of group M indicates that it may dispatch the instruction to core N.
The eligibility is determined by performing a bitwise AND of the instruction's instruction store execution mask and the group execution mask for a particular core. If this result is non-zero, then the instruction is considered eligible for dispatch and participates in the instruction scheduling round.
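A minimal C sketch of that per-core eligibility test, assuming 64-bit masks in which bit N represents core N; the function name is illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

/* Core N is eligible for an instruction from a store in group M when
 * (1) core N is available, (2) the store's execution mask allows core N,
 * and (3) group M's execution mask allows core N. */
static bool core_eligible(int n,
                          uint64_t core_available,
                          uint64_t store_exec_mask,
                          uint64_t group_exec_mask)
{
    uint64_t bit = (uint64_t)1 << n;
    return (core_available & store_exec_mask & group_exec_mask & bit) != 0;
}
```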
In PF mode, the processor only has one global arbitration level. Global arbitration uses a method of hardware arbitration that is software selectable between different methods of instruction arbitration. Methods of hardware arbitration may include, e.g., round robin arbitration, weighted round robin arbitration, fixed priority arbitration, and random arbitration. In fixed priority arbitration, instruction store 0 has the highest priority and instruction store 63 has the lowest, with the priorities of all other instruction stores ordered accordingly. A person of ordinary skill in the art could include other implementations of fixed priority arbitration or fixed priority algorithms.
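A sketch of the fixed priority option in C, assuming each store's core request mask has already been computed as above; a round robin arbiter would instead start the scan from a rotating pointer. The function name and array shapes are illustrative.

```c
#include <stdint.h>

/* Fixed-priority global arbitration sketch for PF mode: among stores whose
 * core request mask intersects the available cores, store 0 wins over store 1,
 * and so on down to store 63. Returns the winning store, or -1 if none. */
static int pf_fixed_priority_arbitrate(const uint64_t core_request_matrix[64],
                                       uint64_t core_available)
{
    for (int s = 0; s < 64; s++)
        if (core_request_matrix[s] & core_available)
            return s;
    return -1;
}
```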
In VF mode (VF8, VF16, VF32, VF64), there are two levels of arbitration. First, local arbitration arbitrates between instruction stores within a virtual function using a method of hardware arbitration. Methods of hardware arbitration may include, e.g., round robin arbitration, weighted round robin arbitration, fixed priority arbitration, and random arbitration. In fixed priority mode, lower numbered instruction stores have a higher priority.
Within each VF, the local arbitration selects one instruction of the plurality of instruction stores to represent the VF. Global arbitration then arbitrates between the instructions chosen by the local arbitration within each VF using a method of hardware arbitration. Again, methods of hardware arbitration can include round robin arbitration, weighted round robin arbitration, fixed priority arbitration, and random arbitration. When global arbitration is in fixed priority mode, priority is assigned by VF number, where the lowest VF numbers have the highest priority. Arbitration decisions are made on a cycle by cycle basis. In VF mode, the global arbitration among the VFs has a higher precedence than local arbitration within a VF. For example, if global arbitration is round robin, then each VF will be considered for issuing one instruction before intra-VF arbitration is considered.
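The sketch below illustrates the two-level round robin case in C. It is deliberately simplified: it assumes a contiguous store-to-VF mapping (the embodiment actually interleaves the physical stores as shown in the table that follows), it picks a single winner per call, and the rotating-pointer state and names are ours, not the patent's.

```c
#include <stdint.h>

/* Two-level VF-mode arbitration sketch: global round robin over VFs, and a
 * local round robin inside the chosen VF picks its representative store. */
static int vf_two_level_arbitrate(const uint64_t core_request_matrix[64],
                                  int num_vfs, int stores_per_vf,
                                  int *local_rr,   /* per-VF rotating pointer  */
                                  int *global_rr)  /* rotating pointer over VFs */
{
    for (int i = 0; i < num_vfs; i++) {
        int vf = (*global_rr + i) % num_vfs;
        /* Local arbitration: first eligible store at or after the local pointer. */
        for (int j = 0; j < stores_per_vf; j++) {
            int local = (local_rr[vf] + j) % stores_per_vf;
            int store = vf * stores_per_vf + local;   /* contiguous mapping for brevity */
            if (core_request_matrix[store]) {
                local_rr[vf] = (local + 1) % stores_per_vf;
                *global_rr  = (vf + 1) % num_vfs;
                return store;
            }
        }
    }
    return -1;   /* no eligible instruction this cycle */
}
```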
When operating in any one of the VF modes, the physical instruction stores may be assigned to VFs in an interleaved manner as shown in the example table below.
VF Mode                  VF Instruction Store Number   PF Instruction Store Number
VF8                      X -> ISMS0 (ZSMS0)             0 + X
(X = VF# = 0 . . . 7)    X -> ISMS1 (ZSMS1)             8 + X
                         X -> ISMS2 (ZSMS2)            16 + X
                         X -> ISMS3 (ZSMS3)            24 + X
                         X -> ISMS4 (ZSMS4)            32 + X
                         X -> ISMS5 (ZSMS5)            40 + X
                         X -> ISMS6 (ZSMS6)            48 + X
                         X -> ISMS7 (ZSMS7)            56 + X
VF16                     X -> ISMS0 (ZSMS0)             0 + X
(X = VF# = 0 . . . 15)   X -> ISMS1 (ZSMS1)            16 + X
                         X -> ISMS2 (ZSMS2)            32 + X
                         X -> ISMS3 (ZSMS3)            48 + X
VF32                     X -> ISMS0 (ZSMS0)             0 + X
(X = VF# = 0 . . . 31)   X -> ISMS1 (ZSMS1)            32 + X
VF64                     X -> ISMS0 (ZSMS0)             X
(X = VF# = 0 . . . 63)
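The interleaving in the table above can be summarized by a simple formula; the sketch below assumes 64 physical instruction stores, and the function name is illustrative.

```c
/* Interleaved VF-to-PF instruction store mapping, assuming 64 physical
 * stores. In a mode with num_vfs virtual functions, local store k of VF x
 * maps to physical store k * num_vfs + x; e.g., in VF8 mode, ISMS1 of VF3
 * is physical instruction store 8 + 3 = 11. */
static int pf_store_number(int num_vfs, int vf_number, int local_store)
{
    return local_store * num_vfs + vf_number;
}
```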
Example PF QoS Configuration:
Group0_Mask: 0x0000_0000_0000_FFFF
Group1_Mask: 0xFFFF_FFFF_FFFF_0000
ISMS0_Mask:  0x0000_0000_0000_5555
ISMS1_Mask:  0x0000_0000_0000_FFFF
ISMS2_Mask:  0x5555_5555_5555_0000
ISMS3_Mask:  0xFFFF_FFFF_FFFF_0000
In the setup above, each _Mask value is a bit vector of eligible execution engines, represented in hexadecimal notation. One of skill in the art can appreciate that Group0_Mask activates cores 0-15 and Group1_Mask activates cores 16-63. Likewise, ISMS0_Mask activates the even-numbered cores between 0 and 15, ISMS1_Mask activates all cores between 0 and 15, ISMS2_Mask activates the even-numbered cores between 16 and 63, and ISMS3_Mask activates all cores between 16 and 63.
If ISMS0 and ISMS1 receive Group 0 instructions and ISMS2 and ISMS3 receive Group 1 instructions, the store execution masks remain as set above, since group execution mask 0 covers every core enabled for ISMS0 and ISMS1, and group execution mask 1 covers every core enabled for ISMS2 and ISMS3. As a result, ISMS1 and ISMS3 can dispatch instructions to twice as many engines as ISMS0 and ISMS2 and therefore have twice as much throughput. This example is simplified, as software can set up any ISMS to work with many instruction groups.
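To make the twofold difference concrete, the snippet below ANDs each store execution mask with its group execution mask and counts the eligible cores, assuming every core is idle; __builtin_popcountll is a GCC/Clang builtin used here only for brevity.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint64_t group0 = 0x000000000000FFFFULL;  /* cores 0-15  */
    const uint64_t group1 = 0xFFFFFFFFFFFF0000ULL;  /* cores 16-63 */
    const uint64_t isms0  = 0x0000000000005555ULL;  /* even cores 0-14  */
    const uint64_t isms1  = 0x000000000000FFFFULL;  /* cores 0-15       */
    const uint64_t isms2  = 0x5555555555550000ULL;  /* even cores 16-62 */
    const uint64_t isms3  = 0xFFFFFFFFFFFF0000ULL;  /* cores 16-63      */

    printf("ISMS0/Group0: %d cores\n", __builtin_popcountll(isms0 & group0)); /* 8  */
    printf("ISMS1/Group0: %d cores\n", __builtin_popcountll(isms1 & group0)); /* 16 */
    printf("ISMS2/Group1: %d cores\n", __builtin_popcountll(isms2 & group1)); /* 24 */
    printf("ISMS3/Group1: %d cores\n", __builtin_popcountll(isms3 & group1)); /* 48 */
    return 0;
}
```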
Example VF QoS Configuration:
Group0_Mask:     0x0000_0000_0000_FFFF
VF0_ISMS00_Mask: 0x0000_0000_0000_5555
VF0_ISMS32_Mask: 0x0000_0000_0000_5555
VF1_ISMS01_Mask: 0x0000_0000_0000_FFFF
VF1_ISMS33_Mask: 0x0000_0000_0000_FFFF
In the setup above, there are two VFs using group 0 (VF0 and VF1). In VF32 mode (for this example), each VF has two instruction stores. This example also shows the physical-to-virtual mapping, where VF0 includes physical instruction stores 0 and 32, and VF1 includes physical instruction stores 1 and 33. If VF1 needs more resources than VF0, software should set the masks appropriately to adjust the QoS. In this example, both VF0 and VF1 share the even-numbered cores, while only VF1 can also use the odd-numbered cores. A person of ordinary skill in the art can appreciate that software programming of the group execution masks and store execution masks can control the QoS intra-VF and inter-VF. The features described above allow the group execution masks and store execution masks to create different Quality of Service policies between virtual functions and within virtual functions of a device.
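A short sketch using the mask values from this VF example (all cores assumed idle) showing that VF1 ends up with twice as many eligible cores as VF0; the variable names and layout are illustrative, and __builtin_popcountll is a GCC/Clang builtin used for brevity.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint64_t group0 = 0x000000000000FFFFULL;  /* cores 0-15 */

    /* VF0 owns physical stores 0 and 32; VF1 owns physical stores 1 and 33. */
    const uint64_t vf0_store_masks[2] = { 0x0000000000005555ULL, 0x0000000000005555ULL };
    const uint64_t vf1_store_masks[2] = { 0x000000000000FFFFULL, 0x000000000000FFFFULL };

    uint64_t vf0_cores = 0, vf1_cores = 0;
    for (int i = 0; i < 2; i++) {
        vf0_cores |= vf0_store_masks[i] & group0;
        vf1_cores |= vf1_store_masks[i] & group0;
    }

    printf("VF0 eligible cores: %d\n", __builtin_popcountll(vf0_cores));  /* 8  */
    printf("VF1 eligible cores: %d\n", __builtin_popcountll(vf1_cores));  /* 16 */
    return 0;
}
```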
The instruction store manager can be reused to feed instructions to a cryptography unit and a compression unit. The design is agnostic to the instructions contained within the instruction stores: any type of processing instruction may be stored and dispatched to execution units by the logic of the instruction store. Two separate instruction store managers can fetch instructions from a host's memory and issue instructions independently to a cryptography unit, a compression unit, or another type of unit as explained above. Such a device incorporates both instructions for cryptography and instructions for data compression/decompression in separate store structures.
FIG. 1 is a block diagram of a processor with a core selection unit 110. As shown in FIG. 7, the processor contains a plurality of cores 714. Returning to FIG. 1, the core selection unit 110 is coupled with a plurality of instruction stores 102A-C through an instruction store bus 108. Instruction store 102A is indexed with the number 0, instruction store 102B is indexed with the number 1, and instruction store 102C is indexed with the number N. The index N of instruction store 102C can be any positive integer. A person of ordinary skill in the art should appreciate that an instruction store indexed with each integer between 1 and N is coupled to the core selection unit in the same manner as instruction stores 102A-C. As such, there are N+1 total instruction stores. In one embodiment, N can be 63, totaling 64 instruction stores.
The instruction stores 102A-C can be any data structure that can store work for a processor. In one embodiment, the instruction stores 102A-C may be a content addressable memory. In another embodiment, the instruction stores 102A-C may be a queue. In addition, while the instruction stores 102A-C store instructions for the core of a processor in one embodiment, they may also store any other type of work for a processor, e.g., memory operations.
In one embodiment, the instruction stores 102A-C can store instructions for cryptography or for compression. Some embodiments can contain more than one set of instruction stores for different applications. Example embodiments of instruction stores are cryptography instruction stores, compression instruction stores, video processing instruction stores, image processing instruction stores, general instruction stores, general processing instruction stores, or miscellaneous instruction stores.
The instruction store bus 108 transmits information from the instruction stores 102A-C to the core selection unit 110. This information can include a group number 104 and a store state 106. The group number 104 is a property of the instruction stored in the instruction store 102A-C. In one embodiment, the group number is not part of the instruction itself, but is associated and stored together with the instruction. As shown later in the specification, the group number is a property of the instruction that is a factor in selecting an eligible core of the processor to process that instruction. The instruction store state 106 relates to the state of the instruction store 102A-C.
The core selection unit 110 contains a plurality of arbitration units 112 and core selection logic 114. The core selection unit 110 operates in two different modes, a physical function mode and a virtual function mode. In the physical function mode, the core selection unit 110 groups all of the instruction stores 102A-C into one physical function. A single arbitration unit then uses a method of hardware arbitration to select an instruction of the physical function for processing by an available core of the processor. The method of hardware arbitration can be any method of arbitration. Example methods of hardware arbitration include round robin arbitration, weighted round robin arbitration, fixed priority arbitration, and random arbitration.
In the virtual function mode, the core selection unit 110 is configured to create a plurality of virtual functions. In some embodiments, the core selection unit 110 creates 8, 16, 32, or 64 virtual functions. These four levels of virtual functions are modes of the core selection unit 110, which can be set by the processor, and are referred to as VF8, VF16, VF32, and VF64, respectively. It should be appreciated by a person of skill in the art that a processor with a different number of instruction stores 102A-C correlates to a different number of virtual functions. The core selection unit 110 groups each instruction store 102A-C into one of a plurality of virtual functions. In one embodiment, the instruction stores 102A-C are distributed evenly among the virtual functions.
Multiple arbitration units 112 are configured to use a method of hardware arbitration to select an instruction within each virtual function. A second level of arbitration then selects an instruction from among the virtual functions. The method of hardware arbitration can be any method of arbitration. Example methods of hardware arbitration include round robin arbitration, weighted round robin arbitration, fixed priority arbitration, and random arbitration.
It should be appreciated by a person of ordinary skill in the art that when the instruction store is configured to output more than one instruction to the core selection unit, a level of arbitration among the instructions in each instruction store can be integrated into the processor.
The core selection unit 110 also includes a group execution matrix 116, a store execution matrix 118, and a core availability vector 120. Both the group execution matrix 116 and store execution matrix 118 are set by a host or software. The group execution matrix 116 includes a plurality of group execution masks. Each group execution mask corresponds to a group number 104 and indicates which cores can process an instruction from that group number 104. The store execution matrix includes a plurality of store execution masks. Each store execution mask corresponds to an instruction store 102A-C and indicates which cores can process an instruction from that instruction store 102A-C. The core availability vector 120 indicates which core or cores are idle and available to process an instruction.
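A hedged sketch of the state the core selection unit consults, assuming 64 cores, 64 instruction stores, and 8 groups as in the embodiments described here; the structure and field names are illustrative.

```c
#include <stdint.h>

#define NUM_CORES   64
#define NUM_STORES  64
#define NUM_GROUPS   8

/* State consulted by the core selection unit. Each mask is one 64-bit row:
 * bit N set means core N is allowed (or, for the availability vector, idle).
 * Both matrices are programmed by the host or software to implement QoS. */
struct core_selection_state {
    uint64_t group_execution_matrix[NUM_GROUPS];  /* indexed by group number */
    uint64_t store_execution_matrix[NUM_STORES];  /* indexed by store number */
    uint64_t core_availability_vector;            /* idle cores              */
};

/* Candidate cores for the instruction of store 's' tagged with group 'g'. */
static uint64_t candidate_cores_for(const struct core_selection_state *st,
                                    unsigned s, unsigned g)
{
    return st->store_execution_matrix[s]
         & st->group_execution_matrix[g]
         & st->core_availability_vector;
}
```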
The core selection logic 114 and arbitration units 112 of the core selection unit 110 determine which instruction store can send an instruction to a core. The core selection unit outputs an eligible instruction store 122 and the eligible core ID 124 corresponding to the core that will process the instruction. In an embodiment where each instruction store transmits multiple instructions to the core selection unit at a time, the core selection unit also outputs an instruction ID (not shown) to identify the instruction within the instruction store. In an embodiment where each instruction store 122 transmits only one instruction to the core selection unit 110 at a time, such as when the instruction store 122 is a queue with an instruction at its head, no such instruction ID is required.
FIG. 2 is a diagram of virtual function mapping. Instruction arbitration system 200 includes a core selection unit 202. The core selection unit 202 generates an eligible instruction store vector 204 which indicates which instruction stores can be processed by an idle core or cores. All instruction stores that are eligible for processing by at least one core are then mapped to a corresponding virtual function by the virtual function mapper 206. Each virtual function then is processed by the intra-virtual function arbitrator 208, which selects a winning instruction within each virtual function. Each winning instruction is then transmitted to the inter-virtual function arbitrator 210, which selects a winning instruction among the winning instructions of each virtual function. The inter-virtual function arbitrator 210 then transmits a winning instruction store ID 212 to instruction dispatch logic 214. The dispatch logic 214 transmits a core ID 218 of the eligible idle core to the eligible instruction store 216 associated with the winning instruction store ID 212. The dispatch logic transmits an instruction dispatch signal 220 to the eligible instruction store 216 with the core ID 218, and the eligible instruction store 216 then issues the eligible instruction to a core corresponding with the core ID 218.
FIG. 3 is a diagram of core selection logic 300. The group execution matrix 116 includes a plurality of group execution masks 302. The group execution matrix 116 is coupled with group execution multiplexers 306A-C. The group execution multiplexers 306A-C are configured to select one of the plurality of group execution masks 302. The quantity of group execution multiplexers 306A-C corresponds with the number of instruction stores in the processor. When the instruction store is configured to output multiple instructions at once, more group execution multiplexers 306A-C may be necessary to select additional group execution masks 302. Each group execution multiplexer 306A-C is coupled with a group execution multiplexer selector 308A-C associated with a group number of an instruction of the instruction store. The group execution multiplexers 306A-C each output a corresponding group execution mask 310A-C.
The store execution matrix 118 includes a plurality of store execution masks 304. The store execution matrix is coupled with store execution multiplexers 312A-C. The store execution multiplexers 312A-C are configured to select one of the plurality of store execution masks 304. The quantity of store execution multiplexers 312A-C corresponds with the number of instruction stores in the processor. Each store execution multiplexer 312A-C is coupled with a store execution multiplexer selector 314A-C associated with an index number of an instruction store. The store execution multiplexers 312A-C each output a corresponding store execution mask 316A-C.
The core availability vector 120 indicates which cores are available for processing. In one embodiment, the eligible instruction store vector 322 indicates which instruction stores contain an instruction that is eligible for processing by a core.
The bitwise AND-gates 318A-C are coupled with corresponding group execution masks 310A-C, store execution masks 316A-C, the core availability vector 120, and the eligible instruction store vector 322. In one embodiment, the quantity of bitwise AND-gates 318A-C corresponds to the number of instruction stores. However, in an embodiment where the instruction stores are configured to output more than one instruction, more bitwise AND-gates 318A-C may be required to represent additional eligible instructions. The bitwise AND-gates 318A-C perform a bitwise AND operation on the corresponding group execution masks 310A-C, corresponding store execution masks 316A-C, and the core availability vector 120. In some embodiments, the bitwise AND-gates 318A-C also input a bit of the eligible instruction store vector 322 corresponding with the appropriate instruction store. The bitwise AND-gates 318A-C then output corresponding instruction store candidate cores 320A-C. In one embodiment, one candidate core is used as an index to select one entry from each of the instruction store candidate cores 320A-C, and only non-zero bits are considered for arbitration.
FIG. 4 is an embodiment of a virtual function arbitration circuit 400. The virtual function mapper 404 is coupled with eligible instruction stores 402A-D and a virtual function mode register 406. In one embodiment, each eligible instruction store 402A-D is one bit representing whether the corresponding instruction store is eligible for one core. In another embodiment, each eligible instruction store 402A-D is a bit vector indicating for which cores the instruction store is eligible.
In one embodiment, the virtual function mode register 406 is configured as a selector to the virtual function mapper 404. The virtual function mode register 406 is set by a host or software. The virtual function mode register 406 indicates whether the processor should run in physical function mode or, otherwise, which virtual function mode it should run in. The virtual function mapper 404 then outputs virtual functions 408A-C. The number of virtual functions 408A-C corresponds to the virtual function mode represented by the virtual function mode register 406. In some embodiments, the quantity of virtual functions 408A-C can be 8, 16, 32, or 64. Virtual functions 408A-C include instructions of the virtual function 408AA-CC.
Intra-virtual function arbitration units 410A-C contain hardware arbitration modules 412A-C and intra-virtual function multiplexers 414A-C. The intra-virtual function arbitration units 410A-C are coupled with the virtual functions 408A-C. The virtual functions 408A-C and instructions of the virtual function 408AA-CC are coupled with the intra-virtual function multiplexers 414A-C. Hardware arbitration modules 412A-C are coupled with the intra-virtual function multiplexers 414A-C as selectors. In some embodiments, the virtual functions 408A-C and instructions of the virtual function 408AA-CC are also coupled with the hardware arbitration modules 412A-C. The intra-virtual function multiplexers 414A-C output virtual function candidate instructions 416A-C based on the hardware arbitration modules 412A-C. The intra-virtual function arbitration units 410A-C each output the virtual function candidate instruction 416A-C corresponding to its intra-virtual function multiplexer 414A-C.
The inter-virtual function arbitrator 420 contains a hardware arbitration module 422 and an inter-virtual function multiplexer 424. The inter-virtual function arbitrator 420 is coupled with the virtual function candidate instructions 416A-C. The hardware arbitration module 422 is coupled with the inter-virtual function multiplexer 424 as a selector. In some embodiments, the hardware arbitration module 422 is also coupled with the virtual function candidate instructions 416A-C. The inter-virtual function multiplexer 424 selects and outputs one of the virtual function candidate instructions 416A-C, and the inter-virtual function arbitrator 420 outputs the same as a winning instruction store ID 426.
The method of hardware arbitration used by hardware arbitration modules 412A-C and 422 can be any method of arbitration. Example methods of hardware arbitration include round robin arbitration, weighted round robin arbitration, fixed priority arbitration, and random arbitration.
FIG. 5 is an embodiment of a physical function arbitration circuit 500. A physical function arbitrator 510 is coupled with eligible instruction stores 502A-C. The physical function arbitrator includes a hardware arbitration module 512 and a physical function arbitration multiplexer 514. The hardware arbitration module 512 is coupled with the physical function arbitration multiplexer 514 and is configured as its selector. The physical function arbitration multiplexer 514 is coupled with the eligible instruction stores 502A-C. In some embodiments, the hardware arbitration module 512 is also coupled with the eligible instruction stores 502A-C. The physical function arbitration multiplexer 514 selects and outputs a winning instruction store ID 516, which the physical function arbitrator 510 also outputs.
FIG. 6A is an embodiment of a group execution matrix 600. Group execution matrix 600 can correspond to group execution matrix 116 in some embodiments. Group execution matrix 600 includes a plurality of group execution masks 612. Each group execution mask 612 is one row of the group execution matrix 600 and corresponds to a group number associated with an instruction. The matrix is indexed by the group number index 606 which indicates there are j+1 groups and a core number index 608 which indicates there are m+1 cores. In one embodiment, the group execution matrix uses values of j=7 and m=63, representing 8 groups and 64 cores. Further, each group execution mask 612 includes typical group execution mask values 604 corresponding to each core of the processor. The typical group execution mask value 604 represents whether an instruction from the group indicated by the group number index 606 can be dispatched to the core indicated by the core number index 608.
FIG. 6B is an embodiment of an instruction store execution matrix 620. Store execution matrix 620 can correspond to store execution matrix 118 in some embodiments. Store execution matrix 620 includes a plurality of store execution masks 622. Each store execution mask 622 is one row of the store execution matrix 620 and corresponds to a store number index 626 associated with an instruction store. The matrix is indexed by store number index 626 which indicates there are n+1 instruction stores and a core number index 628 which indicates there are m+1 cores. In one embodiment, the store execution matrix 620 uses values of n=63 and m=63, representing 64 instruction stores and 64 cores. Further, each store execution mask 622 includes typical store execution mask values 624 corresponding to each core of the processor. The typical store execution mask value 624 represents whether an instruction from the instruction store indicated by the store number index 626 can be dispatched to the core indicated by the core number index 628.
FIG. 6C is an embodiment of a core availability vector 640. Core availability vector 640 can correspond to core availability vector 120 in some embodiments. The core availability vector is indexed by a core number index 648. The core availability vector includes a plurality of typical core availability vector values 644 corresponding to the availability of the core of the processor indicated by the core number index 648.
FIG. 7 is an example embodiment of the interaction between a host system with software and a chip including virtual functions and core selection and arbitration units. An integrated host and chip system 700 includes a host and software 702 coupled with memory 704 and also a chip 710 through host and chip connection 706. Chip 710 includes a host interface 712, a plurality of cores 714, and an instruction store manager 720. The host interface 712 is coupled with the instruction store manager 720 and the cores 714. The cores 714 and instruction store manager 720 are also coupled to each other.
The instruction store manager 720 includes a group execution matrix 722, a store execution matrix 724, and instruction stores 726A-C. The host and software 702 are configured to communicate bidirectionally with the chip 710. The host and software 702 can signal an instruction store 726A-C that there is an available instruction. If the instruction store has available space, it can fetch instructions from the host and software's 702 memory 704 through the host interface 712. The host and software can also set the group execution matrix 722 and the store execution matrix 724. The chip 710 can communicate the results of instructions processed by the cores 714 back to the host and software 702 through the host and chip connection 706 to be recorded in memory 704.
While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Claims (29)

What is claimed is:
1. A processor comprising:
a plurality of processing cores;
a plurality of instruction stores, each instruction store storing at least one instruction, each instruction having a corresponding group number, each instruction store having a unique identifier;
a store component storing a group execution matrix and a store execution matrix, the group execution matrix comprising a plurality of group execution masks, each group execution mask corresponding to a given group number and indicating which cores can process an instruction from the given group number; the store execution matrix comprising a plurality of store execution masks;
a core selection unit configured to for each instruction within each instruction store:
select a store execution mask from the store execution matrix using the unique identifier of a selected instruction store as an index, select at least one group execution mask from the group execution matrix using the group number of at least one selected instruction from the selected instruction store as an index, and
for each selected group execution mask of the at least one group execution masks, perform logic operations on the selected group execution mask and the store execution mask to create a core request mask, the core request mask corresponding to the selected instruction store and indicating zero, one, or more candidate cores; and
an arbitration unit configured to determine instruction priority among each instruction, each instruction store having at least one corresponding core request mask, accordingly assign an instruction for each available core, where the core request mask corresponding to the instruction store of the instruction indicates candidate cores that intersect with the available cores, and signal the instruction store corresponding to the assigned instruction to send the assigned instruction to the available core.
2. The processor of claim 1, wherein each instruction store includes a queue, and the core selection unit is configured to select one group number corresponding to the instruction at the front of the queue.
3. The processor of claim 1, wherein each instruction store is further configured to dispatch an instruction to any of the plurality of cores.
4. The processor of claim 1, wherein each instruction store is assigned to one of a plurality of virtual functions.
5. The processor of claim 4, wherein the arbitration unit determines instruction priority among the virtual functions by a method of hardware arbitration.
6. The processor of claim 4, further comprising a plurality of virtual function arbitration units configured to determine instruction priority within the virtual functions by a method of hardware arbitration.
7. The processor of claim 6, wherein the plurality of virtual function arbitration units is configured to determine instruction priority among the instruction stores.
8. The processor of claim 4, wherein the plurality of virtual functions is configured to interface with a host, receive instructions and distribute the instructions to its corresponding instruction stores.
9. The processor of claim 1, wherein the arbitration unit is further configured to determine instruction priority by performing a method of hardware arbitration.
10. The processor of claim 1, wherein the core selection unit is further configured to perform a bitwise AND operation on the selected group execution mask and the selected store execution mask to create the core request mask corresponding to the selected instruction store.
11. The processor of claim 1, further comprising a core availability vector, wherein the core selection unit is further configured to perform a bitwise AND operation on the selected group execution mask, the selected store execution mask, and the core availability vector to create the core request mask corresponding to the selected instruction store.
12. The processor of claim 1, further comprising a dispatch unit configured to receive a unique identifier of the selected instruction store and an identification number of an available core and produce a signal to the selected instruction store to issue an instruction to the available core indicated by the identification number.
13. The processor of claim 1, wherein at least one of the group execution matrix and the store execution matrix is set to affect the quality of service of a physical function or a virtual function among the plurality of cores.
14. The processor of claim 1, wherein the instruction stores are at least one of compression instruction stores, cryptography instruction stores, video processing instruction stores, image processing instruction stores, or general instruction stores.
15. The processor of claim 1, wherein each instruction store is assigned to a physical function, and the arbitration unit is configured to determine instruction priority within the physical function by a method of hardware arbitration.
16. A method comprising:
on the clock cycle of a processor with a plurality of cores and plurality of instruction stores, for each instruction within the instruction stores:
selecting a store execution mask from a store execution matrix using a unique identifier of a selected instruction store as an index;
selecting at least one group execution mask from a group execution matrix using a group number corresponding to an instruction of the selected instruction store as an index, each group execution mask corresponding to a given group number and indicating which cores can process an instruction from the given group number;
for each selected group execution mask of the selected group execution masks, performing logic operations on at least the selected group execution mask and the selected store execution mask to create a core request mask, the core request mask corresponding to the selected instruction store and indicating zero, one, or more candidate cores, each core request mask added to a core request matrix indexed by the unique identifier of each instruction store; and
on the clock cycle of the processor:
arbitrating to determine instruction priority among the individual instructions corresponding to the plurality of core request masks;
assigning an instruction to each available core, where a core request mask corresponding to the instruction store of the instruction indicates candidate cores that intersect with the available cores;
signaling the instruction store corresponding to the assigned instruction to send the assigned instruction to the available core.
17. The method of claim 16, wherein each instruction store includes a queue, selecting at least one group execution mask from a group execution matrix selects only one group number corresponding to the instruction at the front of the queue.
18. The method of claim 16 wherein the plurality of instruction stores are configured to dispatch instructions to the plurality of cores.
19. The method of claim 16, wherein each instruction store is assigned to one of a plurality of virtual functions.
20. The method of claim 19, wherein an arbitration unit arbitrates instruction priority among the virtual functions by a method of hardware arbitration.
21. The method of claim 19, wherein a plurality of virtual function arbitration units arbitrate instruction priority within the virtual functions by a method of hardware arbitration.
22. The method of claim 16, wherein performing logic operations comprises performing a bitwise AND operation on the selected group execution mask and the selected store execution mask to create the core request mask corresponding to the selected instruction store.
23. The method of claim 16, wherein performing logic operations further comprises performing a bitwise AND operation on the selected group execution mask, the selected store execution mask, and a core availability vector, the core availability vector indicating which of the plurality of cores are available for processing, to create the core request mask corresponding to the selected instruction store.
24. The method of claim 16, wherein arbitrating further comprises determining instruction priority by performing a method of hardware arbitration.
25. The method of claim 16, further comprising dispatching the assigned instruction to the selected core using a dispatch unit configured to receive a unique identifier of the selected instruction store and an identification number of an available core, and produce a signal to the selected instruction store to issue an instruction to the available core.
26. The method of claim 16, further comprising assigning each instruction store to one of a plurality of virtual functions, the plurality of virtual functions configured to interface with a host, receive instructions, and distribute the instructions to its assigned instruction stores.
27. The method of claim 16, further comprising setting at least one of the group execution matrix and the store execution matrix to affect the quality of service of a physical function or a virtual function among the plurality of cores.
28. The method of claim 16, wherein the instruction stores are at least one of compression instruction stores, cryptography instruction stores, video processing instruction stores, image processing instruction stores, or general instruction stores.
29. The method of claim 16, further comprising assigning each instruction store to a physical function, wherein the arbitration unit is configured to determine instruction priority within the physical function by a method of hardware arbitration.
US13/272,975 2011-10-13 2011-10-13 QoS based dynamic execution engine selection Active 2033-03-22 US9129060B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US13/272,975 US9129060B2 (en) 2011-10-13 2011-10-13 QoS based dynamic execution engine selection
US14/828,884 US9495161B2 (en) 2011-10-13 2015-08-18 QoS based dynamic execution engine selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/272,975 US9129060B2 (en) 2011-10-13 2011-10-13 QoS based dynamic execution engine selection

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/828,884 Continuation US9495161B2 (en) 2011-10-13 2015-08-18 QoS based dynamic execution engine selection

Publications (2)

Publication Number Publication Date
US20130097350A1 US20130097350A1 (en) 2013-04-18
US9129060B2 true US9129060B2 (en) 2015-09-08

Family

ID=48086777

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/272,975 Active 2033-03-22 US9129060B2 (en) 2011-10-13 2011-10-13 QoS based dynamic execution engine selection
US14/828,884 Active US9495161B2 (en) 2011-10-13 2015-08-18 QoS based dynamic execution engine selection

Family Applications After (1)

Application Number Title Priority Date Filing Date
US14/828,884 Active US9495161B2 (en) 2011-10-13 2015-08-18 QoS based dynamic execution engine selection

Country Status (1)

Country Link
US (2) US9129060B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140129806A1 (en) * 2012-11-08 2014-05-08 Advanced Micro Devices, Inc. Load/store picker
US9495161B2 (en) 2011-10-13 2016-11-15 Cavium, Inc. QoS based dynamic execution engine selection
US20170286120A1 (en) * 2016-03-30 2017-10-05 Qualcomm Incorporated Apparatus and method to maximize execution lane utilization through a custom high throughput scheduler

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9128769B2 (en) 2011-10-13 2015-09-08 Cavium, Inc. Processor with dedicated virtual functions and dynamic assignment of functional resources
US8930422B2 (en) * 2012-06-04 2015-01-06 Northrop Grumman Systems Corporation Pipelined incremental clustering algorithm
US8867559B2 (en) * 2012-09-27 2014-10-21 Intel Corporation Managing starvation and congestion in a two-dimensional network having flow control
US20150033222A1 (en) 2013-07-25 2015-01-29 Cavium, Inc. Network Interface Card with Virtual Switch and Traffic Flow Policy Enforcement
US20160188510A1 (en) * 2014-12-26 2016-06-30 Samsung Electronics Co., Ltd. METHOD FETCHING/PROCESSING NVMe COMMANDS IN MULTI-PORT, SR-IOV OR MR-IOV SUPPORTED PCIe BASED STORAGE DEVICES
US10341259B1 (en) * 2016-05-31 2019-07-02 Amazon Technologies, Inc. Packet forwarding using programmable feature prioritization
US11010330B2 (en) * 2018-03-07 2021-05-18 Microsoft Technology Licensing, Llc Integrated circuit operation adjustment using redundant elements
US10721172B2 (en) 2018-07-06 2020-07-21 Marvell Asia Pte, Ltd. Limiting backpressure with bad actors

Citations (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5745778A (en) * 1994-01-26 1998-04-28 Data General Corporation Apparatus and method for improved CPU affinity in a multiprocessor system
US6189074B1 (en) 1997-03-19 2001-02-13 Advanced Micro Devices, Inc. Mechanism for storing system level attributes in a translation lookaside buffer
US6253262B1 (en) 1998-09-11 2001-06-26 Advanced Micro Devices, Inc. Arbitrating FIFO implementation which positions input request in a buffer according to its status
US6289369B1 (en) * 1998-08-25 2001-09-11 International Business Machines Corporation Affinity, locality, and load balancing in scheduling user program-level threads for execution by a computer system
US6356989B1 (en) 1992-12-21 2002-03-12 Intel Corporation Translation lookaside buffer (TLB) arrangement wherein the TLB contents retained for a task as swapped out and reloaded when a task is rescheduled
US6496847B1 (en) 1998-05-15 2002-12-17 Vmware, Inc. System and method for virtualizing computer systems
US6789147B1 (en) 2001-07-24 2004-09-07 Cavium Networks Interface for a security coprocessor
US20040216101A1 (en) 2003-04-24 2004-10-28 International Business Machines Corporation Method and logical apparatus for managing resource redistribution in a simultaneous multi-threaded (SMT) processor
US20040268105A1 (en) 2003-06-26 2004-12-30 Michaelis Scott L. Resetting multiple cells within a partition of a multiple partition computer system
US6862694B1 (en) 2001-10-05 2005-03-01 Hewlett-Packard Development Company, L.P. System and method for setting and executing breakpoints
US6861865B1 (en) 2003-03-11 2005-03-01 Cavium Networks Apparatus and method for repairing logic blocks
US6954770B1 (en) 2001-08-23 2005-10-11 Cavium Networks Random number generator
US20050235123A1 (en) 2004-04-19 2005-10-20 Zimmer Vincent J Method to manage memory in a platform with virtual machines
US7035889B1 (en) 2001-12-31 2006-04-25 Cavium Networks, Inc. Method and apparatus for montgomery multiplication
US7076059B1 (en) 2002-01-17 2006-07-11 Cavium Networks Method and apparatus to implement the data encryption standard algorithm
US20060288189A1 (en) 2005-06-15 2006-12-21 Rohit Seth Systems and methods to support partial physical addressing modes on a virtual machine
US7209531B1 (en) 2003-03-26 2007-04-24 Cavium Networks, Inc. Apparatus and method for data deskew
US7240203B2 (en) 2001-07-24 2007-07-03 Cavium Networks, Inc. Method and apparatus for establishing secure sessions
US7260217B1 (en) 2002-03-01 2007-08-21 Cavium Networks, Inc. Speculative execution for data ciphering operations
US20070220203A1 (en) 2006-03-15 2007-09-20 Hitachi, Ltd. Management method for virtualized storage view
US7275249B1 (en) * 2002-07-30 2007-09-25 Unisys Corporation Dynamically generating masks for thread scheduling in a multiprocessor system
US7305567B1 (en) 2002-03-01 2007-12-04 Cavium Networks, In. Decoupled architecture for data ciphering operations
US7310722B2 (en) * 2003-12-18 2007-12-18 Nvidia Corporation Across-thread out of order instruction dispatch in a multithreaded graphics processor
US20080013715A1 (en) * 2005-12-30 2008-01-17 Feghali Wajdi K Cryptography processing units and multiplier
US7337314B2 (en) * 2003-04-12 2008-02-26 Cavium Networks, Inc. Apparatus and method for allocating resources within a security processor
US20080077909A1 (en) 2006-09-27 2008-03-27 Jamison Collins Enabling multiple instruction stream/multiple data stream extensions on microprocessors
US20080074433A1 (en) * 2006-09-21 2008-03-27 Guofang Jiao Graphics Processors With Parallel Scheduling and Execution of Threads
US7372857B1 (en) 2003-05-28 2008-05-13 Cisco Technology, Inc. Methods and apparatus for scheduling tasks
US20080133709A1 (en) 2006-01-12 2008-06-05 Eliezer Aloni Method and System for Direct Device Access
US7398386B2 (en) 2003-04-12 2008-07-08 Cavium Networks, Inc. Transparent IPSec processing inline between a framer and a network component
US20080320016A1 (en) 2007-06-19 2008-12-25 Raza Microelectronics, Inc. Age matrix for queue dispatch order
US20090024804A1 (en) 1999-08-31 2009-01-22 Wheeler William R Memory controllers for processor having multiple programmable units
US20090070768A1 (en) * 2007-09-11 2009-03-12 Shubhodeep Roy Choudhury System and Method for Using Resource Pools and Instruction Pools for Processor Design Verification and Validation
US20090249094A1 (en) * 2008-03-28 2009-10-01 Microsoft Corporation Power-aware thread scheduling and dynamic use of processors
US20090300606A1 (en) 2008-05-28 2009-12-03 Troy Miller Virtual machine migration with direct physical access control
US7657933B2 (en) 2003-04-12 2010-02-02 Cavium Networks, Inc. Apparatus and method for allocating resources within a security processing architecture using multiple groups
US7661130B2 (en) 2003-04-12 2010-02-09 Cavium Networks, Inc. Apparatus and method for allocating resources within a security processing architecture using multiple queuing mechanisms
US20100082603A1 (en) 2008-07-05 2010-04-01 Stepan Krompass Managing Execution Of Database Queries
US20100138829A1 (en) 2008-12-01 2010-06-03 Vincent Hanquez Systems and Methods for Optimizing Configuration of a Virtual Machine Running At Least One Process
US20100205603A1 (en) 2009-02-09 2010-08-12 Unisys Corporation Scheduling and dispatching tasks in an emulated operating system
US7814310B2 (en) 2003-04-12 2010-10-12 Cavium Networks IPsec performance optimization
US20100275199A1 (en) 2009-04-28 2010-10-28 Cisco Technology, Inc. Traffic forwarding for virtual machines
US20100332212A1 (en) 2008-09-19 2010-12-30 Ori Finkelman Method and apparatus for sleep and wake of computer devices
US20110161943A1 (en) * 2009-12-30 2011-06-30 Ibm Corporation Method to dynamically distribute a multi-dimensional work set across a multi-core system
US20110314478A1 (en) * 2009-02-24 2011-12-22 Comissariat A L'Energie Atmoique et aux Energies Allocation and Control Unit
US8156495B2 (en) * 2008-01-17 2012-04-10 Oracle America, Inc. Scheduling threads on processors
US20120096192A1 (en) 2010-10-19 2012-04-19 Hitachi, Ltd. Storage apparatus and virtual port migration method for storage apparatus
US20120179844A1 (en) 2011-01-11 2012-07-12 International Business Machines Corporation Dynamically assigning virtual functions to client applications
US20120260257A1 (en) 2004-08-12 2012-10-11 International Business Machines Corporation Scheduling threads in multiprocessor computer
US20130055254A1 (en) 2011-08-31 2013-02-28 Nokia Corporation Methods and apparatuses for providing a virtual machine with dynamic assignment of a physical hardware resource
US8424014B2 (en) 2009-02-27 2013-04-16 International Business Machines Corporation Method for pushing work request-associated contexts into an IO device
US20130097598A1 (en) 2011-10-13 2013-04-18 Cavium, Inc. Processor with dedicated virtual functions and dynamic assignment of functional resources
US8504750B1 (en) 2009-06-23 2013-08-06 Qlogic, Corporation System and method to process event reporting in an adapter
US8881150B2 (en) 2011-10-11 2014-11-04 Hitachi, Ltd. Virtual machine, virtual machine system and method for controlling virtual machine
US8892962B2 (en) 2011-12-07 2014-11-18 Hitachi, Ltd. Virtual computer system having SR-IOV compliant device mounted thereon and failure detection method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7743389B2 (en) 2007-11-06 2010-06-22 Vmware, Inc. Selecting between pass-through and emulation in a virtual machine environment
US9129060B2 (en) 2011-10-13 2015-09-08 Cavium, Inc. QoS based dynamic execution engine selection

Patent Citations (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6356989B1 (en) 1992-12-21 2002-03-12 Intel Corporation Translation lookaside buffer (TLB) arrangement wherein the TLB contents retained for a task as swapped out and reloaded when a task is rescheduled
US5745778A (en) * 1994-01-26 1998-04-28 Data General Corporation Apparatus and method for improved CPU affinity in a multiprocessor system
US6189074B1 (en) 1997-03-19 2001-02-13 Advanced Micro Devices, Inc. Mechanism for storing system level attributes in a translation lookaside buffer
US6496847B1 (en) 1998-05-15 2002-12-17 Vmware, Inc. System and method for virtualizing computer systems
US6289369B1 (en) * 1998-08-25 2001-09-11 International Business Machines Corporation Affinity, locality, and load balancing in scheduling user program-level threads for execution by a computer system
US6253262B1 (en) 1998-09-11 2001-06-26 Advanced Micro Devices, Inc. Arbitrating FIFO implementation which positions input request in a buffer according to its status
US20090024804A1 (en) 1999-08-31 2009-01-22 Wheeler William R Memory controllers for processor having multiple programmable units
US7240203B2 (en) 2001-07-24 2007-07-03 Cavium Networks, Inc. Method and apparatus for establishing secure sessions
US6789147B1 (en) 2001-07-24 2004-09-07 Cavium Networks Interface for a security coprocessor
US6954770B1 (en) 2001-08-23 2005-10-11 Cavium Networks Random number generator
US6862694B1 (en) 2001-10-05 2005-03-01 Hewlett-Packard Development Company, L.P. System and method for setting and executing breakpoints
US7035889B1 (en) 2001-12-31 2006-04-25 Cavium Networks, Inc. Method and apparatus for montgomery multiplication
US7076059B1 (en) 2002-01-17 2006-07-11 Cavium Networks Method and apparatus to implement the data encryption standard algorithm
US7260217B1 (en) 2002-03-01 2007-08-21 Cavium Networks, Inc. Speculative execution for data ciphering operations
US7305567B1 (en) 2002-03-01 2007-12-04 Cavium Networks, In. Decoupled architecture for data ciphering operations
US7275249B1 (en) * 2002-07-30 2007-09-25 Unisys Corporation Dynamically generating masks for thread scheduling in a multiprocessor system
US7205785B1 (en) 2003-03-11 2007-04-17 Cavium Networks, Inc. Apparatus and method for repairing logic blocks
US6861865B1 (en) 2003-03-11 2005-03-01 Cavium Networks Apparatus and method for repairing logic blocks
US7209531B1 (en) 2003-03-26 2007-04-24 Cavium Networks, Inc. Apparatus and method for data deskew
US7814310B2 (en) 2003-04-12 2010-10-12 Cavium Networks IPsec performance optimization
US7398386B2 (en) 2003-04-12 2008-07-08 Cavium Networks, Inc. Transparent IPSec processing inline between a framer and a network component
US7657933B2 (en) 2003-04-12 2010-02-02 Cavium Networks, Inc. Apparatus and method for allocating resources within a security processing architecture using multiple groups
US7661130B2 (en) 2003-04-12 2010-02-09 Cavium Networks, Inc. Apparatus and method for allocating resources within a security processing architecture using multiple queuing mechanisms
US7337314B2 (en) * 2003-04-12 2008-02-26 Cavium Networks, Inc. Apparatus and method for allocating resources within a security processor
US20040216101A1 (en) 2003-04-24 2004-10-28 International Business Machines Corporation Method and logical apparatus for managing resource redistribution in a simultaneous multi-threaded (SMT) processor
US7372857B1 (en) 2003-05-28 2008-05-13 Cisco Technology, Inc. Methods and apparatus for scheduling tasks
US20040268105A1 (en) 2003-06-26 2004-12-30 Michaelis Scott L. Resetting multiple cells within a partition of a multiple partition computer system
US7310722B2 (en) * 2003-12-18 2007-12-18 Nvidia Corporation Across-thread out of order instruction dispatch in a multithreaded graphics processor
US7421533B2 (en) 2004-04-19 2008-09-02 Intel Corporation Method to manage memory in a platform with virtual machines
US20050235123A1 (en) 2004-04-19 2005-10-20 Zimmer Vincent J Method to manage memory in a platform with virtual machines
US20120260257A1 (en) 2004-08-12 2012-10-11 International Business Machines Corporation Scheduling threads in multiprocessor computer
US20060288189A1 (en) 2005-06-15 2006-12-21 Rohit Seth Systems and methods to support partial physical addressing modes on a virtual machine
US20080013715A1 (en) * 2005-12-30 2008-01-17 Feghali Wajdi K Cryptography processing units and multiplier
US20080133709A1 (en) 2006-01-12 2008-06-05 Eliezer Aloni Method and System for Direct Device Access
US20070220203A1 (en) 2006-03-15 2007-09-20 Hitachi, Ltd. Management method for virtualized storage view
US20080074433A1 (en) * 2006-09-21 2008-03-27 Guofang Jiao Graphics Processors With Parallel Scheduling and Execution of Threads
US20080077909A1 (en) 2006-09-27 2008-03-27 Jamison Collins Enabling multiple instruction stream/multiple data stream extensions on microprocessors
US20080320016A1 (en) 2007-06-19 2008-12-25 Raza Microelectronics, Inc. Age matrix for queue dispatch order
US20090070768A1 (en) * 2007-09-11 2009-03-12 Shubhodeep Roy Choudhury System and Method for Using Resource Pools and Instruction Pools for Processor Design Verification and Validation
US8156495B2 (en) * 2008-01-17 2012-04-10 Oracle America, Inc. Scheduling threads on processors
US20090249094A1 (en) * 2008-03-28 2009-10-01 Microsoft Corporation Power-aware thread scheduling and dynamic use of processors
US20090300606A1 (en) 2008-05-28 2009-12-03 Troy Miller Virtual machine migration with direct physical access control
US20100082603A1 (en) 2008-07-05 2010-04-01 Stepan Krompass Managing Execution Of Database Queries
US20100332212A1 (en) 2008-09-19 2010-12-30 Ori Finkelman Method and apparatus for sleep and wake of computer devices
US20100138829A1 (en) 2008-12-01 2010-06-03 Vincent Hanquez Systems and Methods for Optimizing Configuration of a Virtual Machine Running At Least One Process
US20100205603A1 (en) 2009-02-09 2010-08-12 Unisys Corporation Scheduling and dispatching tasks in an emulated operating system
US20110314478A1 (en) * 2009-02-24 2011-12-22 Comissariat A L'Energie Atmoique et aux Energies Allocation and Control Unit
US8424014B2 (en) 2009-02-27 2013-04-16 International Business Machines Corporation Method for pushing work request-associated contexts into an IO device
US20100275199A1 (en) 2009-04-28 2010-10-28 Cisco Technology, Inc. Traffic forwarding for virtual machines
US8504750B1 (en) 2009-06-23 2013-08-06 Qlogic, Corporation System and method to process event reporting in an adapter
US20110161943A1 (en) * 2009-12-30 2011-06-30 Ibm Corporation Method to dynamically distribute a multi-dimensional work set across a multi-core system
US20120096192A1 (en) 2010-10-19 2012-04-19 Hitachi, Ltd. Storage apparatus and virtual port migration method for storage apparatus
US20120179844A1 (en) 2011-01-11 2012-07-12 International Business Machines Corporation Dynamically assigning virtual functions to client applications
US20130055254A1 (en) 2011-08-31 2013-02-28 Nokia Corporation Methods and apparatuses for providing a virtual machine with dynamic assignment of a physical hardware resource
US8881150B2 (en) 2011-10-11 2014-11-04 Hitachi, Ltd. Virtual machine, virtual machine system and method for controlling virtual machine
US20130097598A1 (en) 2011-10-13 2013-04-18 Cavium, Inc. Processor with dedicated virtual functions and dynamic assignment of functional resources
US8892962B2 (en) 2011-12-07 2014-11-18 Hitachi, Ltd. Virtual computer system having SR-IOV compliant device mounted thereon and failure detection method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Single Root I/O Virtualization and Sharing Specification Revision 1.theta1," PCI-SIG®, pp. 1-100 (Jan. 20, 2010).
"Single Root I/O Virtualization and Sharing Specification Revision 1.θ1," PCI-SIG®, pp. 1-100 (Jan. 20, 2010).
Notice of Allowance mailed Feb. 13, 2015, issued in U.S. Appl. No. 13/272,937, entitled "Processor With Dedicated Virtual Functions and Dynamic Assignment of Functional Resources".

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9495161B2 (en) 2011-10-13 2016-11-15 Cavium, Inc. QoS based dynamic execution engine selection
US20140129806A1 (en) * 2012-11-08 2014-05-08 Advanced Micro Devices, Inc. Load/store picker
US20170286120A1 (en) * 2016-03-30 2017-10-05 Qualcomm Incorporated Apparatus and method to maximize execution lane utilization through a custom high throughput scheduler
US10089114B2 (en) * 2016-03-30 2018-10-02 Qualcomm Incorporated Multiple instruction issuance with parallel inter-group and intra-group picking

Also Published As

Publication number Publication date
US20150363200A1 (en) 2015-12-17
US20130097350A1 (en) 2013-04-18
US9495161B2 (en) 2016-11-15

Similar Documents

Publication Publication Date Title
US9495161B2 (en) QoS based dynamic execution engine selection
US11036556B1 (en) Concurrent program execution optimization
US9396154B2 (en) Multi-core processor for managing data packets in communication network
US7769936B2 (en) Data processing apparatus and method for arbitrating between messages routed over a communication channel
US7647444B2 (en) Method and apparatus for dynamic hardware arbitration
US20140223053A1 (en) Access controller, router, access controlling method, and computer program
US20160142341A1 (en) Packet scheduling using hierarchical scheduling process
US8706940B2 (en) High fairness variable priority arbitration method
US9128769B2 (en) Processor with dedicated virtual functions and dynamic assignment of functional resources
US9954771B1 (en) Packet distribution with prefetch in a parallel processing network device
GB2381412A (en) Determining transmission priority for data frames from a plurality of queues
CN111666139B (en) Scheduling method and device for MIMO multi-service-class data queue
US11113101B2 (en) Method and apparatus for scheduling arbitration among a plurality of service requestors
US6937133B2 (en) Apparatus and method for resource arbitration
US20200304424A1 (en) Request arbitration by age and traffic classes
US8782665B1 (en) Program execution optimization for multi-stage manycore processors
US7451258B1 (en) Rotating priority queue manager
US10713089B2 (en) Method and apparatus for load balancing of jobs scheduled for processing
US20140050221A1 (en) Interconnect arrangement
US9977751B1 (en) Method and apparatus for arbitrating access to shared resources
WO2021115326A1 (en) Data processing method and apparatus, electronic device, storage medium, and program product
Morikawa et al. Reentrant line scheduling using weighted fair queuing

Legal Events

Date Code Title Description
AS Assignment

Owner name: CAVIUM, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ANSARI, NAJEEB I.;CARNS, MICHAEL;SCHROEDER, JEFFREY;AND OTHERS;SIGNING DATES FROM 20120103 TO 20120105;REEL/FRAME:027604/0695

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT, ILLINOIS

Free format text: SECURITY AGREEMENT;ASSIGNORS:CAVIUM, INC.;CAVIUM NETWORKS LLC;REEL/FRAME:039715/0449

Effective date: 20160816

CC Certificate of correction
AS Assignment

Owner name: QLOGIC CORPORATION, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JP MORGAN CHASE BANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:046496/0001

Effective date: 20180706

Owner name: CAVIUM NETWORKS LLC, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JP MORGAN CHASE BANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:046496/0001

Effective date: 20180706

Owner name: CAVIUM, INC, CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JP MORGAN CHASE BANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:046496/0001

Effective date: 20180706

AS Assignment

Owner name: CAVIUM, LLC, CALIFORNIA

Free format text: CERTIFICATE OF CONVERSION AND CERTIFICATE OF FORMATION;ASSIGNOR:CAVIUM, INC.;REEL/FRAME:047185/0422

Effective date: 20180921

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

AS Assignment

Owner name: CAVIUM INTERNATIONAL, CAYMAN ISLANDS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CAVIUM, LLC;REEL/FRAME:051948/0807

Effective date: 20191231

AS Assignment

Owner name: MARVELL ASIA PTE, LTD., SINGAPORE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CAVIUM INTERNATIONAL;REEL/FRAME:053179/0320

Effective date: 20191231

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8