CN1842770A - Integrated mechanism for suspension and deallocation of computational threads of execution in a processor


Info

Publication number: CN1842770A
Authority: CN (China)
Prior art keywords: thread, parameter, instruction, relevant, parameters
Legal status: Pending
Application number: CN 200480024800
Other languages: Chinese (zh)
Inventor: Kevin D. Kissell (凯文·基塞尔)
Current Assignee: MIPS Tech LLC
Original Assignee: MIPS Technologies Inc
Filed by: MIPS Technologies Inc
Priority to: CN201210164802.7A (CN102880447B)
Publication of: CN1842770A


Abstract

A mechanism for processing in a processor enabled to support and execute multiple program threads includes a parameter for scheduling a program thread and an instruction disposed within the program thread and enabled to access the parameter. When the parameter equals a first value, the instruction, when issued by the program thread, reschedules the program thread in accordance with one or more conditions encoded within the parameter.

Description

Integrated mechanism for suspension and deallocation of computational threads of execution in a processor
Cross-reference to related applications
The present invention claims priority to the following applications:
(1) U.S. provisional application No. 60/499,180, filed August 28, 2003, entitled "Multithreading Application Specific Extension" (attorney docket P3865, inventor Kevin D. Kissell, Express Mail No. EV 315085819 US);
(2) U.S. provisional application No. 60/502,358, filed September 12, 2003, entitled "Multithreading Application Specific Extension to a Processor Architecture" (attorney docket 0188.02US, inventor Kevin D. Kissell, Express Mail No. ER456368993US); and
(3) U.S. provisional application No. 60/502,359, filed September 12, 2003, entitled "Multithreading Application Specific Extension to a Processor Architecture" (attorney docket 0188.03US, inventor Kevin D. Kissell, Express Mail No. ER456369013US); the entire contents of each of the applications mentioned above are incorporated herein by reference.
The present invention is also related to a co-pending U.S. non-provisional application (serial number not yet received), filed October 10, 2003, entitled "Mechanisms for Assuring Quality of Service for Programs Executing on a Multithreaded Processor" (attorney docket 3865.01, inventor Kevin D. Kissell, Express Mail No. EL988990749 US); the entire contents of that application are incorporated herein by reference.
Technical field
The present invention is in the field of digital processors (for example, microprocessors, digital signal processors, microcontrollers, and the like), and relates more particularly to apparatus and methods for managing the execution of multiple threads on a single processor.
Technical background
In the field of digital computing, the history of computing capability is one of continual advance on every front. Progress continues to be made, for example, in processor device density and interconnect technology, which translates into higher operating speeds, greater fault tolerance, the use of faster clock signals, and other improvements. Another field of research for improving overall computing capability is parallel processing, which involves more than simply having multiple separate processors perform operations in parallel.
The concept of parallel processing includes distributing tasks among multiple separate processors, but it also includes schemes in which multiple programs execute concurrently on a single processor. Such a scheme is commonly referred to as multithreading.
The notion of multithreading may be introduced as follows: as processor operating frequencies increase, it becomes harder and harder to hide the latencies inherent in the operation of a computer system. A high-end processor that misses in its data cache on one percent of the instructions of a given application can stall roughly fifty percent of the time if the latency to off-chip RAM is 50 cycles. If instructions belonging to a different application could be executed while the processor is stalled on such a cache miss, the performance of the processor would improve and some or all of the memory latency would effectively be hidden. For example, FIG. 1A shows a single instruction stream 101 stalled by a cache miss. A machine supporting only this mode of operation can execute just one thread or task at a time. By contrast, FIG. 1B shows an instruction stream 102 that can be executed while instruction stream 101 is stalled. In this case the machine supports two threads concurrently, and therefore makes more effective use of its resources.
More generally, each individual computer instruction has specific semantics, so that different classes of instructions require different resources to perform the desired operation. Integer loads do not exploit the logic or registers of a floating-point unit, and register-to-register computations make no use of the resources of a load/store unit. No single instruction consumes all of a processor's resources, and the more pipeline stages and parallel functional units designers add in pursuit of higher performance, the lower the average fraction of the total processor resources consumed by each instruction becomes.
Multithreading arises in large measure from the observation that, if a single sequential program is fundamentally unable to make efficient use of all of a processor's resources, the processor should be able to share some of those resources among multiple concurrent threads of program execution. The result does not necessarily make any particular program execute more quickly; indeed, some multithreading schemes actually degrade the performance of a single thread of execution. However, it allows a collection of concurrent instruction streams to run in less time and/or on a smaller number of processors. This concept is illustrated in FIGS. 2A and 2B, which show a single-threaded processor 210 and a dual-threaded processor 250, respectively. Processor 210 supports a single thread 212, shown using load/store unit 214. If a miss occurs when accessing cache 216, processor 210 stalls (in the manner described with reference to FIG. 1A) until the missing data is retrieved, and during this time multiplier/divider unit 218 sits idle and unused. Processor 250, however, supports two threads, 212 and 262, so if thread 212 stalls, processor 250 can still execute thread 262 using multiplier/divider unit 218, thereby making more effective use of its resources (in the manner described with reference to FIG. 1B).
Multithreading on a single processor can also provide benefits beyond improved multitasking throughput. Binding program threads to critical events can reduce event response time, and thread-level parallelism can, in principle, be exploited within a single application program.
Several varieties of multithreading have been proposed. Among them is interleaved multithreading, a time-division multiplexed (TDM) scheme that switches from one thread to another on each instruction issued. This scheme imposes some degree of "fairness" in scheduling, but implementations that statically allocate issue slots to threads generally limit the performance of a single program thread. Dynamic interleaving ameliorates this problem, but is more complex to implement.
Another multithreading scheme is blocked multithreading, which issues consecutive instructions from a single program thread until some designated blocking event, such as a cache miss or a reset, occurs, causing that thread to be suspended and another thread to be activated. Because blocked multithreading changes threads less frequently, its implementation can be simplified. On the other hand, blocking is less "fair" in scheduling threads: a single thread can monopolize the entire processor for a long time if it is lucky enough to find all of its data in the cache. Hybrid scheduling schemes combining elements of blocked and interleaved multithreading have also been built and studied.
Yet another form of multithreading is simultaneous multithreading, a scheme implemented on superscalar processors. In simultaneous multithreading, instructions from different threads can be issued concurrently. Assume, for example, a superscalar reduced instruction set computer (RISC) that issues up to two instructions per cycle, and a simultaneously multithreaded superscalar pipeline that issues up to two instructions per cycle from either of two threads. Cycles in which a single program thread has dependencies or stalls would leave the processor under-utilized, so in simultaneous multithreading those cycles can be filled with instructions issued from another thread.
Simultaneous multithreading is therefore a very powerful technique for recovering the efficiency that would otherwise be wasted in superscalar pipelines. It is also arguably the most complex multithreading method to implement, because having more than one thread active in a given cycle complicates, among other things, the implementation of memory access protection. It is also worth noting that, for a given workload, the more perfectly pipelined a central processing unit's (CPU's) operation becomes, the smaller the potential efficiency gain available to a multithreaded implementation.
Multithreading is closely related to multiprocessing. Indeed, it can be argued that the difference is only one of degree: whereas multiprocessors share only memory and/or interconnect, multithreaded processors also share instruction fetch and issue logic, and potentially other processor resources. Within a single multithreaded processor, the various threads compete for issue slots and other resources, which limits parallelism. Some multithreaded programming and architectural models assume that new threads are assigned to distinct processors, so that they can truly execute in parallel.
As of the filing of the present application, numerous multithreading schemes exist to address various problems in the field. One of these concerns improvements for real-time threads. In general, real-time multimedia algorithms are run on dedicated processors/digital signal processors (DSPs) to guarantee quality of service (QoS) and response time, and such threads are not mixed into and shared within a general multithreading scheme, because real-time software cannot easily obtain a guarantee that it will be executed in a timely manner.
In this respect, what is clearly needed is a scheme and mechanism allowing one or more real-time threads or virtual processors in a multithreaded processor to be guaranteed a specified proportion of the instruction issue slots, with a specified maximum interval between instructions, so that compute bandwidth and response time are well defined. If such a mechanism were available, threads with strict QoS requirements could be included in the multithreaded mix. Furthermore, the real-time threads in such a system (such as DSP-related threads) could be exempted from taking interrupts, avoiding the variability in execution time caused by diverting their resources. Under the right circumstances this technique would allow a processor core combining RISC and DSP enhancements to replace the separate general-purpose RISC core and DSP core used in consumer multimedia applications.
Another problem with multithreading schemes at the time of filing of the present application is the yielding and deallocation of active threads in a processor. To support relatively fine-grained multithreading, it is desirable to create and destroy parallel threads of program execution with the lowest possible overhead and, at least in the common case, without interfering with necessary operating system functions. In this respect, what is clearly needed are instructions such as FORK (thread create) and JOIN (thread terminate). A further problem in multithreaded processors is that a scheduling policy may let a thread run continuously until it is blocked by some resource; a thread that is never blocked by any resource nevertheless needs a way to make the processor switch to other threads. In this respect, therefore, a PAUSE or YIELD instruction is also clearly needed.
Summary of the invention
A primary object of the present invention is to provide a robust system suitable for fine-grained multithreading, in which threads can be created and destroyed with minimal system overhead. To this end, in a preferred embodiment of the present invention, in a processor enabled to support and execute multiple program threads, a processing mechanism is provided comprising: a parameter for scheduling a program thread; and an instruction disposed within the program thread and enabled to access the parameter. When the parameter equals a first value, the instruction reschedules the program thread in accordance with one or more conditions encoded within the parameter. In a related preferred embodiment of the mechanism, the parameter is stored in a data storage device. In yet another preferred embodiment, when the parameter equals a second value, the second value being unequal to the first value, the instruction deallocates the program thread. In some embodiments the second value is zero.
In some embodiments, when the parameter equals the second value, the second value being unequal to the first value, the instruction unconditionally reschedules the program thread. Further, in some embodiments, the second value is any odd value. In still other preferred embodiments, the second value is minus one.
In some preferred embodiments, one of the one or more conditions relates to surrendering execution to other threads until the condition is satisfied. Also, in some embodiments, the conditions are encoded in a bit vector or bit field within the parameter. Further, in some embodiments, where the program thread is rescheduled, execution of the program thread continues at the instruction following this instruction within the thread. And in some other preferred embodiments, when the parameter equals a third value, the third value being unequal to the first and second values, the instruction unconditionally reschedules the program thread.
In certain preferred embodiments of the mechanism, one of the one or more conditions is a hardware interrupt. Also, in some embodiments, one of the one or more conditions is a software interrupt. And in many embodiments, where the program thread is rescheduled, execution of the program thread continues at the position following the instruction within the thread.
According to a further aspect of the invention, in a processor enabled to support and execute multiple program threads, a method for a thread to reschedule or deallocate itself is provided, comprising the steps of: (a) issuing an instruction that accesses a portion of a record in a data storage device, the portion encoding one or more parameters related to one or more conditions that determine whether the thread may be rescheduled; and (b) rescheduling or deallocating the thread, subject to the conditions, according to the one or more parameters in the portion of the record. In a preferred embodiment, the record is held in a general-purpose register (GPR). Also in a preferred embodiment, one of the parameters is associated with the thread being deallocated rather than rescheduled. In some preferred embodiments, the value of the parameter associated with deallocation of the thread is zero.
In some embodiments implementing the method, one of the parameters is associated with requeueing the thread to await scheduling. In some embodiments, that parameter is any odd value. In some embodiments, the parameter is a two's complement value of minus one. In certain embodiments, one of the parameters is associated with surrendering execution to other threads until specific conditions are satisfied. Further, in other embodiments, the conditions are encoded in a bit vector or in one or more bit fields within the record.
In many embodiments implementing the method, where the thread issues the instruction and is rescheduled, execution of the thread continues, once the one or more conditions are satisfied, at the position following the instruction issued by the thread in the thread's instruction stream. In some embodiments, one of the parameters is associated with the thread being deallocated rather than rescheduled, and another parameter is associated with requeueing the thread to await scheduling. In other embodiments, one parameter is associated with the thread being deallocated rather than rescheduled, and another is associated with surrendering execution to other threads until specific conditions are satisfied. In still other embodiments, one parameter is associated with requeueing the thread to await rescheduling, and another is associated with surrendering execution to other threads until certain conditions are satisfied. Additionally, in other embodiments, one parameter is associated with the thread being deallocated rather than rescheduled, another with requeueing the thread to await scheduling, and yet another with surrendering execution to other threads until certain conditions are satisfied.
According to another aspect of the present invention, a digital processor enabled to support and execute multiple software entities is provided, comprising a portion of a record in a data storage device, the portion encoding one or more parameters related to one or more conditions, the one or more conditions determining whether a thread may be rescheduled when the thread surrenders execution to other threads.
In some embodiments of the processor, the portion of the record is held in a general-purpose register (GPR). In some other preferred embodiments, one of the parameters is associated with the thread being deallocated rather than rescheduled. In other preferred embodiments, the value of the parameter associated with deallocation of the thread is zero.
In some other embodiments of the processor, one of the parameters is associated with requeueing the thread to await scheduling. In some embodiments, that parameter is any odd value. In other embodiments, the parameter is a two's complement value of minus one. In still other embodiments, one of the parameters is associated with surrendering execution to other threads until specific conditions are satisfied. Further, in some cases, the parameters may be encoded in a bit vector or in one or more bit fields within the record.
In some other embodiments of the processor, one parameter is associated with the thread being deallocated rather than rescheduled, and another parameter is associated with requeueing the thread to await scheduling. In other embodiments, one parameter is associated with the thread being deallocated rather than rescheduled, and another is associated with surrendering execution to other threads until specific conditions are satisfied. In still other embodiments, one parameter is associated with requeueing the thread to await rescheduling, and another is associated with surrendering execution to other threads until specific conditions are satisfied.
In yet other embodiments, one parameter is associated with the thread being deallocated rather than rescheduled, another parameter is associated with requeueing the thread to await scheduling, and yet another parameter is associated with surrendering execution to other threads until specific conditions are satisfied.
According to a further aspect of the invention, a processing system enabled to support and execute multiple program threads is provided, comprising: a digital processor; a portion of a record in a data storage device, the portion encoding one or more parameters related to one or more conditions that determine whether a thread may be rescheduled; and an instruction set including an instruction for rescheduling and deallocating the thread. When the thread issues the instruction, the instruction accesses the one or more parameters in the record, and the system, subject to the one or more conditions, reschedules or deallocates the issuing thread according to the one or more parameters in the portion of the record.
In some preferred embodiments of the processing system, the record is held in a general-purpose register (GPR). In some other preferred embodiments, one of the parameters is associated with the thread being deallocated rather than rescheduled. In certain embodiments, the value of the parameter associated with deallocation of the thread is zero. In still other embodiments, one of the parameters is associated with requeueing the thread to await scheduling. In some embodiments, the parameter used for rescheduling is any odd value. In other embodiments, the parameter used for rescheduling is a two's complement value of minus one.
In some embodiments of the system, one of the parameters is associated with surrendering execution to other threads until specific conditions are satisfied. Further, in certain embodiments, the parameters are encoded in a bit vector or in one or more bit fields within the record. In many embodiments of the system, where a thread issues the instruction and is conditionally rescheduled, execution of the thread continues, once the one or more conditions are satisfied, at the position following the instruction in the thread's instruction stream.
In some embodiments of the processing system, one parameter is associated with the thread being deallocated rather than rescheduled, and another parameter is associated with requeueing the thread to await scheduling. In other embodiments, one parameter is associated with the thread being deallocated rather than rescheduled, and another is associated with surrendering execution to other threads until specific conditions are satisfied.
In still other embodiments, one parameter is associated with requeueing the thread to await rescheduling, and another is associated with surrendering execution to other threads until specific conditions are satisfied. Additionally, in other embodiments, one parameter is associated with the thread being deallocated rather than rescheduled, another parameter is associated with requeueing the thread to await scheduling, and yet another parameter is associated with surrendering execution to other threads until specific conditions are satisfied.
In addition, according to a further aspect of the invention, a digital storage medium is provided having written thereon instructions from an instruction set for executing individual software threads of a plurality of software threads on a digital processor, the instruction set including an instruction that causes the issuing thread to surrender execution and that accesses a parameter in a portion of a record in a data storage device, wherein conditions for deallocation or rescheduling are associated with the parameter, and, subject to those conditions, deallocation or rescheduling is performed according to the parameter in the portion of the record.
In some embodiments of the medium, the record is held in a general-purpose register (GPR). In some other embodiments of the medium, one of the parameters is associated with the thread being deallocated rather than rescheduled. In certain embodiments, the value of the parameter associated with deallocation of the thread is zero. In some other embodiments of the medium, one of the parameters is associated with requeueing the thread to await scheduling. In some embodiments, that parameter is any odd value. In other embodiments, the parameter is a two's complement value of minus one.
In other embodiments of the medium, one of the parameters is associated with surrendering execution to other threads until specific conditions are satisfied. In further embodiments, the parameters are encoded in a bit vector or in one or more bit fields within the record. In other embodiments, one parameter is associated with the thread being deallocated rather than rescheduled, and another parameter is associated with requeueing the thread to await scheduling. In still other embodiments, one parameter is associated with the thread being deallocated rather than rescheduled, and another is associated with surrendering execution to other threads until specific conditions are satisfied.
In some embodiments of the mechanism, one parameter is associated with requeueing the thread to await rescheduling, and another is associated with surrendering execution to other threads until specific conditions are satisfied. In addition, in some embodiments of the digital storage medium, one parameter is associated with the thread being deallocated rather than rescheduled, another parameter is associated with requeueing the thread to await scheduling, and yet another parameter is associated with surrendering execution to other threads until specific conditions are satisfied.
In some embodiments of the mechanism, the instruction is a YIELD instruction. Also, in some embodiments of the mechanism, the portion of the record comprises a bit vector. Further, in some other embodiments of the mechanism, the portion of the record comprises one or more multi-bit fields.
In some embodiments of the method, the instruction is a YIELD instruction. Also, in some embodiments of the processing system, the instruction is a YIELD instruction.
In some embodiments of the digital storage medium, the instruction is a YIELD instruction.
According to another aspect of the present invention, a computer data signal embodied in a transmission medium is provided, comprising computer-readable program code describing a processor enabled to support and execute multiple program threads and including a mechanism for deallocating and rescheduling a thread, the program code comprising: a first program code segment describing a portion of a record in a data storage device, the portion encoding one or more parameters related to one or more conditions that determine whether a thread may be rescheduled; and a second program code segment describing an instruction enabled to access the one or more parameters of the record, wherein, when the thread issues the instruction, the instruction accesses the one or more values in the record and, subject to the one or more conditions, the thread is rescheduled or deallocated according to the one or more values.
According to a further aspect of the invention, in a processor enabled to support multiple program threads, a method is provided comprising: executing an instruction that accesses a parameter related to thread scheduling, the instruction being contained within a program thread; and, when the parameter equals a first value, deallocating the program thread in response to the instruction. In some embodiments of the method, the first value is zero. In other embodiments, the method further comprises the step of suspending execution of the program thread in response to the instruction when the parameter equals a second value, the second value being unequal to the first value. In some embodiments of the method, the second value indicates that a condition required for execution of the program thread is not satisfied.
In some other embodiments of the method, the condition is encoded within the parameter in the form of a bit vector or a value field. In still other embodiments, when the parameter equals a third value, the program thread is rescheduled in response to the instruction, the third value being unequal to the first and second values. In some other embodiments, the third value is minus one. In other embodiments, the third value is any odd value.
According to another aspect of the present invention, in a processor enabled to support multiple program threads, a method is provided comprising: executing an instruction that accesses a parameter related to thread scheduling, the instruction being contained within a program thread; and, when the parameter equals a first value, suspending execution of the program thread in response to the instruction. In some other embodiments of the method, the method further comprises the step of rescheduling the program thread in response to the instruction when the parameter equals a second value, the second value being unequal to the first value.
According to a further aspect of the invention, in a processor enabled to support multiple program threads, a method is provided comprising: executing an instruction that accesses a parameter related to thread scheduling, the instruction being contained within a program thread; and, when the parameter equals a first value, rescheduling the program thread in response to the instruction. In some embodiments of the method, the method further comprises the step of deallocating the program thread in response to the instruction when the parameter equals a second value, the second value being unequal to the first value.
The embodiments of the invention, described in enabling detail below, provide for the first time a truly robust system for fine-grained multithreading, minimizing the system overhead used in creating and destroying threads.
Description of drawings
FIG. 1A is a diagram showing a single instruction stream stalled by a cache miss;
FIG. 1B is a diagram showing an instruction stream that can be executed while the instruction stream of FIG. 1A is stalled;
FIG. 2A is a diagram showing a single-threaded processor;
FIG. 2B is a diagram showing a dual-threaded processor 250;
FIG. 3 is a diagram illustrating a processor supporting a first and a second VPE, according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a processor supporting a single VPE which in turn supports three threads, according to an embodiment of the present invention;
FIG. 5 shows the format of a FORK instruction according to an embodiment of the present invention;
FIG. 6 shows the format of a YIELD instruction according to an embodiment of the present invention;
FIG. 7 is a table showing a sixteen-bit qualifier mask held in GPR rs;
FIG. 8 shows the format of an MFTR instruction according to an embodiment of the present invention;
FIG. 9 is a table describing the fields of an MFTR instruction according to an embodiment of the present invention;
FIG. 10 shows the format of an MTTR instruction according to an embodiment of the present invention;
FIG. 11 is a table describing the u and sel bits of an MTTR instruction according to an embodiment of the present invention;
FIG. 12 shows the format of an EMT instruction according to an embodiment of the present invention;
FIG. 13 shows the format of a DMT instruction according to an embodiment of the present invention;
FIG. 14 shows the format of an ECONF instruction according to an embodiment of the present invention;
FIG. 15 is a table describing privileged resources of the system coprocessor according to an embodiment of the present invention;
FIG. 16 shows the layout of a ThreadControl register according to an embodiment of the present invention;
FIG. 17 is a table describing the fields of the ThreadControl register layout according to an embodiment of the present invention;
FIG. 18 shows the layout of a ThreadStatus register according to an embodiment of the present invention;
FIG. 19 is a table describing the fields of the ThreadStatus register layout according to an embodiment of the present invention;
FIG. 20 shows the layout of a ThreadContext register according to an embodiment of the present invention;
FIG. 21 shows the layout of a ThreadConfig register according to an embodiment of the present invention;
FIG. 22 is a table describing the fields of the ThreadConfig register layout according to an embodiment of the present invention;
FIG. 23 shows the layout of a ThreadSchedule register according to an embodiment of the present invention;
FIG. 24 shows the layout of a VPESchedule register according to an embodiment of the present invention;
FIG. 25 shows the layout of a Config4 register according to an embodiment of the present invention;
FIG. 26 is a table describing the fields of the Config4 register layout according to an embodiment of the present invention;
FIG. 27 is a table defining the Cause register exception code value required for thread exceptions;
FIG. 28 is a table defining ITC indicators;
FIG. 29 is a table defining the fields of the Config3 register layout;
FIG. 30 is a table describing a per-VPE-context VPE inhibit bit;
FIG. 31 is a table describing the behavior of ITC storage;
FIG. 32 is a flow diagram describing the operation of the YIELD function according to an embodiment of the present invention;
FIG. 33 is a diagram depicting a computer system according to an embodiment of the present invention;
FIG. 34 is a diagram depicting scheduling by VPE and by thread within a VPE in a processor, according to an embodiment of the present invention.
Detailed description of the embodiments
According to a preferred embodiment of the present invention, a processor architecture includes an instruction set comprising features, functions, and instructions that enable multithreaded operation on a compatible processor. The invention is not limited to any particular processor architecture or instruction set, but is described by reference to the well-known MIPS architecture, instruction set, and processor technology (collectively, MIPS technology), and the embodiments of the invention described in detail below are consistent with MIPS technology. Further information about MIPS technology (including the documents referenced below) is available from MIPS Technologies, Inc. (located in Mountain View, California) and from its website, www.mips.com.
The terms "processor" and "digital processor" are intended to mean any programmable device (for instance, a microprocessor, microcontroller, digital signal processor, central processing unit, processor core, and the like) in hardware (for example, application-specific silicon, a field-programmable gate array (FPGA), and the like), in software (for example, a hardware description language, C, C++, and the like), or in any combination thereof.
The terms "thread" and "program thread" are used interchangeably herein.
Summary description
In an embodiment of the present invention, a "thread context" is a collection of processor state used to describe the state of execution of an instruction stream on the processor. This state is typically reflected in the contents of processor registers. For example, in a processor compatible with the industry-standard MIPS32 and/or MIPS64 instruction set architectures (a MIPS processor), a thread context comprises the general-purpose registers (GPRs), the Hi/Lo multiply result registers, some representation of a program counter (PC), and some associated privileged system control state. Most of the system control state is held in the portion of a MIPS processor known as Coprocessor Zero (CP0), largely in system control registers and, where one is used, a translation lookaside buffer (TLB). By contrast, a "processor context" is a larger collection of processor state that includes at least one thread context. Referring again to a MIPS processor as an example, a processor context includes at least one thread context (as described above), together with the CP0 and system state necessary to describe the well-known MIPS32 or MIPS64 privileged resource architecture (PRA). (Briefly, a PRA is a set of environments and capabilities on which an instruction set architecture operates. The PRA provides the mechanisms necessary for an operating system to manage processor resources, for example virtual memory, caches, exceptions, and user contexts.)
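As an informal illustration of the distinction just drawn, and not part of the claimed architecture, the per-thread versus per-processor state might be modeled in software roughly as follows; the structure and field names are assumptions chosen for readability, not the ASE's actual register layout.

```c
/* Illustrative sketch only: a software model of the state named in the text
 * for a MIPS-like processor. Field names are assumptions for illustration. */
#include <stdint.h>

typedef struct {
    uint32_t gpr[32];      /* general-purpose registers                      */
    uint32_t hi, lo;       /* multiply/divide result registers               */
    uint32_t pc;           /* program counter of the instruction stream      */
    uint32_t status_ksu;   /* per-thread kernel/supervisor/user state        */
} thread_context;

typedef struct {
    thread_context tc[4];  /* some number of thread contexts per VPE         */
    uint32_t cp0_status;   /* CP0 system control state shared by the threads */
    uint32_t cp0_cause;
    /* ... TLB and other PRA state shared by all thread contexts ...         */
} processor_context;
```

In this model, each thread_context corresponds to one instruction stream, while the surrounding processor_context holds the CP0/PRA state that all thread contexts on a VPE share.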
According to an embodiment of the present invention, a multithreading application-specific extension (Multithreading ASE) to an instruction set architecture and PRA allows two distinct, but not mutually exclusive, multithreading capabilities to be included within a single processor. First, a single processor may contain some number of processor contexts, each of which supports an instruction set architecture and operates as an independent processing unit by sharing certain processor resources. These independent processing units are referred to herein as virtual processing elements (VPEs). To software, a processor with N VPEs looks like an N-way symmetric multiprocessor (SMP). This allows existing SMP-capable operating systems to manage the set of VPEs, which transparently share the processor's execution units.
FIG. 3 illustrates this capability with a single processor 301 supporting a first VPE (VPE0), which comprises register state zero 302 and system coprocessor state zero 304. Processor 301 also supports a second VPE (VPE1), comprising register state one 306 and system coprocessor state one 308. VPE0 and VPE1 share the portions of processor 301 comprising instruction fetch, decode, pipeline execution, and caches 310. An SMP-compatible operating system 320 runs on processor 301 and supports both VPE0 and VPE1. As shown, software processes A 322 and C 326 run on VPE0 and VPE1, respectively, as if they were running on two different processors. Process B 324 is queued and may run on either VPE0 or VPE1.
The second capability allowed by the multithreading ASE is that each processor or VPE may contain some number of thread contexts beyond the single thread context required by the base architecture. Multithreaded VPEs require explicit operating system support, but, given that support, they provide a lightweight, fine-grained multithreaded programming model in which threads can be created and destroyed without operating system intervention in the common case, and in which system service threads can be scheduled in response to external conditions (for example, events and the like) without interrupt latency.
FIG. 4 illustrates this second capability with a processor 401 supporting a single VPE, comprising register state 402, 404, and 406 (supporting three threads 422) and system coprocessor state 408. Unlike FIG. 3, in this example the three threads are in a single application address space and share the CP0 resources (and hardware resources) of a single VPE. A dedicated multithreading operating system 420 is also shown. In this example, the multithreaded VPE is processing packets from a broadband network 450, where the incoming data is distributed across a group of first-in first-out (FIFO) buffers 452, each FIFO having a distinct address in the I/O memory space of the multithreaded VPE. The controlling application creates as many threads as there are FIFOs in use, and each thread executes a tight loop reading its FIFO.
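The packet-processing usage just described might look roughly like the sketch below, assuming a hypothetical C runtime that wraps the FORK and YIELD instructions in mt_fork() and mt_yield() helpers; those helpers, the FIFO addresses, and the choice of qualifier bit are illustrative assumptions, not part of the specification.

```c
/* Sketch of one thread per FIFO, as in FIG. 4, under assumed runtime helpers. */
#include <stdint.h>

#define NUM_FIFOS 3
#define YQ_EXTERNAL_SIG2 (1u << 2)   /* one of the non-interrupt qualifier bits (FIG. 7) */

extern void mt_fork(void (*entry)(void *), void *arg);      /* assumed wrapper for FORK  */
extern void mt_yield(uint32_t qualifier);                     /* assumed wrapper for YIELD */
extern volatile uint32_t *fifo_base[NUM_FIFOS];               /* I/O addresses of the FIFOs */
extern void handle_packet_word(uint32_t word);

static void fifo_reader(void *arg)
{
    volatile uint32_t *fifo = arg;
    for (;;) {
        mt_yield(YQ_EXTERNAL_SIG2);   /* sleep until the external signal indicates data */
        handle_packet_word(*fifo);    /* tight loop: read the FIFO and dispatch the word */
    }
}

void start_packet_threads(void)
{
    for (int i = 0; i < NUM_FIFOS; i++)
        mt_fork(fifo_reader, (void *)fifo_base[i]);   /* one reader thread per FIFO */
}
```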
A thread context may be in one of four states: free, activated, halted, or wired. A free thread context has no valid content and cannot be scheduled to issue instructions. An activated thread context is scheduled, according to the implemented policies, to fetch and issue instructions from its program counter. A halted thread context may have valid content but is inhibited from fetching and issuing instructions. A wired thread context has been designated for use as shadow register storage; that is to say, it is held in reserve for the exclusive use of exception handlers, to avoid the overhead of saving and restoring a memory-based context in the handler. A free thread context cannot be activated, halted, or wired. Only activated thread contexts may be scheduled. Only free thread contexts may be allocated to create new threads.
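A minimal sketch of the life-cycle rules just stated (illustrative names, not architected state encodings):

```c
/* Sketch of the thread-context life cycle described above. */
typedef enum { TC_FREE, TC_ACTIVATED, TC_HALTED, TC_WIRED } tc_state;

/* Only a free context may be allocated for a new thread (e.g. by FORK),
 * and only an activated context may be scheduled to fetch and issue.    */
static inline int tc_may_allocate(tc_state s) { return s == TC_FREE; }
static inline int tc_may_schedule(tc_state s) { return s == TC_ACTIVATED; }
```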
To allow fine-grained synchronization of cooperating threads, an inter-thread communication (ITC) memory space is created in virtual memory, with empty/full bit semantics that allow a thread to be blocked on a load or store until the datum has been produced or consumed by another thread.
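The empty/full gating described for ITC storage is performed by the hardware storage cell itself; purely as an illustration of the semantics, a software analogue built on POSIX threads might look like the following (the itc_cell type and helpers are assumptions for the sketch; the mutex and condition variable would be initialized with the usual PTHREAD_*_INITIALIZER macros).

```c
/* Software analogue of empty/full gating: a load blocks until the cell is
 * full, a store blocks until it is empty. Illustration only. */
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t m;
    pthread_cond_t  cv;
    int             full;    /* the empty/full bit */
    uint32_t        value;
} itc_cell;

uint32_t itc_load_ef(itc_cell *c)              /* consume: wait until full  */
{
    pthread_mutex_lock(&c->m);
    while (!c->full) pthread_cond_wait(&c->cv, &c->m);
    uint32_t v = c->value;
    c->full = 0;                               /* loading empties the cell  */
    pthread_cond_signal(&c->cv);
    pthread_mutex_unlock(&c->m);
    return v;
}

void itc_store_ef(itc_cell *c, uint32_t v)     /* produce: wait until empty */
{
    pthread_mutex_lock(&c->m);
    while (c->full) pthread_cond_wait(&c->cv, &c->m);
    c->value = v;
    c->full = 1;                               /* storing fills the cell    */
    pthread_cond_signal(&c->cv);
    pthread_mutex_unlock(&c->m);
}
```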
Thread creation/destruction and the synchronization capabilities operate, in the ordinary course of things, without operating system intervention, but the resources they manipulate can be virtualized by an operating system. This allows multithreaded programs to execute with more virtual threads than there are thread contexts on a VPE, and allows threads to be migrated to balance load in multiprocessor systems.
Looking more closely at implementation, at any particular point in time a running thread is bound to a particular thread context on a particular VPE. The index of that thread context within the VPE provides a unique identifier at that point in time. Context switching and migration, however, can cause a single sequential thread of execution to have a series of different thread indices, for example on a series of different VPEs.
The dynamic binding of thread contexts, TLB entries, and other resources to the multiple VPEs on the same processor is performed in a special processor reset configuration state. Each VPE enters its reset vector exactly as if it were a separate processor.
Multithreaded execution and exception model
The multithreading ASE does not impose any particular implementation or scheduling model on the execution of parallel threads and VPEs. Scheduling may be round-robin, time-sliced to an arbitrary granularity, or simultaneous. An implementation must not, however, allow a blocked thread to monopolize any shared processor resource in a way that would cause the hardware to deadlock.
In a MIPS processor, multiple threads executing on a single VPE all share the same system coprocessor (CP0), the same TLB, and the same virtual address space. Each thread has an independent kernel/supervisor/user state for the purposes of instruction decode and memory access. When an exception is taken, all threads other than the one executing the exception are stopped or suspended until the EXL and ERL bits of the Status word are cleared or, in the case of an EJTAG debug exception, the debug mode is exited. The Status word is held in the Status register of CP0. Further details regarding the EXL and ERL bits and EJTAG debug exceptions may be found in the following two publications, each available from MIPS Technologies, Inc. and incorporated herein by reference in its entirety: MIPS32™ Architecture for Programmers Volume III: The MIPS32™ Privileged Resource Architecture, Rev. 2.00, MIPS Technologies, Inc. (2003), and MIPS64™ Architecture for Programmers Volume III: The MIPS64™ Privileged Resource Architecture, Rev. 2.00, MIPS Technologies, Inc. (2003).
Synchronous exception handlers invoked by the execution of an instruction stream, for example for TLB miss and floating-point exceptions, are executed by the thread executing that instruction stream. When an unmasked asynchronous exception, such as an interrupt, is raised to a VPE, it is implementation-dependent which thread executes the exception handler.
Each exception is associated with a thread context, even when a shadow register set is used to execute the exception handler. This associated thread context is the target of RDPGPR and WRPGPR instructions executed by the exception handler. Detailed descriptions of the RDPGPR and WRPGPR instructions (used to access shadow registers) may be found in the following two publications, each available from MIPS Technologies, Inc. and incorporated herein by reference in its entirety: MIPS32™ Architecture for Programmers Volume II: The MIPS32™ Instruction Set, Rev. 2.00, MIPS Technologies, Inc. (2003), and MIPS64™ Architecture for Programmers Volume II: The MIPS64™ Instruction Set, Rev. 2.00, MIPS Technologies, Inc. (2003).
The multithreading ASE includes two exception conditions. The first is a thread-unavailable condition, in which a thread allocation request cannot be satisfied. The second is a thread-underflow condition, in which the termination and deallocation of a thread leaves no threads allocated on a VPE. These two exception conditions are both mapped to a single new thread exception. When the exception is raised, the two conditions can be distinguished by bits set in a CP0 register.
Instructions
In a preferred embodiment, the multithreading ASE includes seven instructions. The FORK and YIELD instructions control thread allocation, deallocation, and scheduling, and are available in all execution modes if implemented and enabled. The MFTR and MTTR instructions are system coprocessor (Cop0) instructions available to privileged system software for managing thread state. A new privileged Cop0 instruction, EMT, and a new privileged Cop0 instruction, DMT, enable and disable multithreaded operation of a VPE. Finally, a new privileged Cop0 instruction, ECONF, is used to exit a special processor configuration state and re-initialize the processor.
FORK - Allocate and schedule a new thread
The FORK instruction causes a free thread context to be allocated and activated. Its format 500 is shown in FIG. 5. The FORK instruction takes two operand values from the GPRs (general-purpose registers) identified by fields 502 (rs) and 504 (rt). The contents of GPR rs are used as the address at which the new thread begins fetching and executing. The contents of GPR rt are a value to be transferred into a GPR of the new thread. The destination GPR is determined by the value of the ForkTarget field of the ThreadConfig register of CP0, which is illustrated in FIG. 21 and described later. The new thread's kernel/supervisor/user state is set to that of the thread executing the FORK. If no free thread context is available for the FORK, a thread exception is raised for the FORK instruction.
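A behavioural sketch of the allocation step described above, under the assumption of a simple software model of thread contexts; the tc_model type and model_fork() function are illustrative, not the hardware implementation, and fork_target stands in for the ThreadConfig.ForkTarget value.

```c
/* Illustrative model of FORK: allocate a free context, seed its PC from rs
 * and one GPR from rt, inherit the parent's privilege level, or report that
 * a thread exception would be raised when no free context exists. */
#include <stdint.h>
#include <stddef.h>

#define NUM_TC 4

typedef struct {
    int      active;        /* 0 = free thread context */
    uint32_t gpr[32];
    uint32_t pc;
    uint32_t status_ksu;
} tc_model;

int model_fork(tc_model tc[NUM_TC], const tc_model *parent,
               uint32_t rs_value, uint32_t rt_value, unsigned fork_target)
{
    for (size_t i = 0; i < NUM_TC; i++) {
        if (!tc[i].active) {
            tc[i].pc               = rs_value;           /* start address       */
            tc[i].gpr[fork_target] = rt_value;           /* hand over argument  */
            tc[i].status_ksu       = parent->status_ksu; /* inherit KSU state   */
            tc[i].active           = 1;
            return 0;
        }
    }
    return -1;  /* no free context: the hardware raises a thread exception */
}
```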
YIELD - Conditionally reschedule and deallocate a thread
The YIELD instruction causes the current thread to be rescheduled. Its format 600 is shown in FIG. 6, and flow chart 3200 of FIG. 32 describes the operation of a system, according to an embodiment of the invention, carrying out the function of the YIELD instruction.
The YIELD instruction takes a single operand value from, for example, the GPR identified by field 602 (rs). A GPR is used in a preferred embodiment, but in other embodiments the operand value may be stored in and retrieved from essentially any data storage device accessible to the system (for example, a non-GPR register, memory, and the like). In one embodiment, the contents of GPR rs can be thought of as a descriptor of the circumstances under which the issuing thread should be rescheduled. If the contents of GPR rs are zero (that is, the operand value is zero), as shown in step 3202 of FIG. 32, the thread is not rescheduled at all but is instead deallocated (that is, terminated or otherwise permanently prevented from further execution), as shown in step 3204, and its associated thread context storage (that is, the registers identified above for holding state) is freed for allocation by a subsequent FORK instruction issued by some other thread. If the least-significant bit of GPR rs is set (that is, rs0 = 1), the thread is immediately re-schedulable, as shown in step 3206 of FIG. 32, and continues executing at once unless preempted by another runnable thread. In this embodiment the contents of GPR rs are otherwise treated as a 15-bit qualifier mask, that is, a bit vector encoding a variety of conditions, as described by table 700 of FIG. 7.
Referring to table 700, bits 15 to 10 of register rs indicate hardware interrupt signals presented to the processor, bits 9 and 8 indicate software interrupt signals generated within the processor, bits 7 and 6 indicate the operation of the Load Linked and Store Conditional synchronization primitives of the MIPS architecture, and bits 5 to 2 indicate non-interrupt external signals presented to the processor.
If the contents of GPR rs are even (that is, bit 0 is not set) and any other bit in the qualifier mask of GPR rs is set (step 3208), the thread is suspended until at least one corresponding condition is satisfied. When that occurs, the thread is rescheduled (step 3210) and resumes execution at the instruction following the YIELD. This re-enabling is affected by the CP0.Status.IMn interrupt mask bits, so a total of ten external conditions (for example, events and the like) encoded by bits 15 to 10 and 5 to 2 (as shown in FIG. 7), and four software conditions encoded by bits 9 to 6 (as shown in FIG. 7), can be used in this embodiment to re-enable independent threads in response to external signals without the processor having to take an exception. In this particular example there are six hardware interrupts and four non-interrupt signals, plus two software interrupts and two further non-interrupt conditions, plus one signal dedicated to the rescheduling function (namely rs0), corresponding to fifteen conditions in all. (The CP0.Status.IMn interrupt mask bits are a set of eight bits in the CP0 Status register that selectively mask the eight basic interrupt inputs of a MIPS processor. If an IM bit is set, the associated interrupt input cannot cause an exception in the processor.)
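The decision flow of FIG. 32 for this preferred embodiment can be summarized in a short sketch; the function and enum names are illustrative, and the asserted argument stands for whichever qualifier conditions of FIG. 7 are currently raised (with the interrupt-related bits further subject to the Status.IM masking noted above).

```c
/* Sketch of the YIELD decision flow of FIG. 32 for the preferred embodiment:
 * rs == 0 frees the issuing thread's context, an odd rs requeues it at once,
 * and an even non-zero rs suspends it until a requested condition is raised. */
#include <stdint.h>
#include <stdio.h>

typedef enum { YIELD_FREE, YIELD_REQUEUE, YIELD_WAIT } yield_action;

yield_action yield_decide(uint32_t rs, uint32_t asserted)
{
    if (rs == 0)
        return YIELD_FREE;      /* step 3204: deallocate the thread context */
    if (rs & 1u)
        return YIELD_REQUEUE;   /* step 3206: reschedule immediately        */
    /* steps 3208/3210: wait until a requested condition is asserted        */
    return (rs & asserted) ? YIELD_REQUEUE : YIELD_WAIT;
}

int main(void)
{
    /* thread asks to sleep on one hardware-interrupt qualifier (bit 12 of FIG. 7) */
    printf("%d\n", yield_decide(1u << 12, 0));        /* -> YIELD_WAIT    */
    printf("%d\n", yield_decide(1u << 12, 1u << 12)); /* -> YIELD_REQUEUE */
    return 0;
}
```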
In the interrupt mode of EIC, position IP2 to IP7 encodes to the interruption that highest priority is arranged, and has not just represented a vector that the quadrature indication is arranged.When processor used the interrupt mode of EIC, therefore the position of the GPR rs that is associated with position IP2 to IP7 in a YIELD instruction no longer can be used to remove to enable again a thread at a specific external event.In the interrupt mode of EIC, has only the qualifier that can be used as YIELD with the external event designator (in the present embodiment, being the position 5 to 2 of GPR rs for example) of system's associated.The interrupt mode of EIC further has been described in following publication with position IP2 to IP7, has above pointed out and quoted the whole contents of this publication: MIPS32 TMArchitecture for Programmers Volumn III:The MIPS32 TMPrivilegedResource Architecture is with MIPS64 TMArchitecture for ProgrammersVolumn III:The MIPS64 TMPrivileged Resource Architecture.
If execution of a YIELD results in the deallocation of the last allocated thread on a processor or VPE, a Thread exception is raised on the YIELD instruction, with a thread-underflow indication in the CP0 ThreadStatus register (shown in Fig. 18 and described later).
The embodiment described above uses the operand contained in GPR rs of the YIELD instruction as a thread-scheduling parameter. In this example the parameter is treated in the preferred embodiment as a vector of fifteen orthogonal indications (with reference to Fig. 7; reserved bit positions leave only fifteen conditions to be encoded). This embodiment also treats the parameter as a designated value (i.e., one used to determine whether a given thread should be deallocated, see step 3202 of Fig. 32). The characteristics of such a parameter may, however, be varied to suit different embodiments of the instruction. For example, rather than relying on the least-significant bit (i.e., rs0) to determine whether a thread should be rescheduled immediately, a specific value of the parameter itself (for example, negative one in two's-complement form, {-1}) could be used to indicate that a thread should be requeued for scheduling (i.e., rescheduled immediately).
In other embodiments of the instruction, such a thread-scheduling parameter may be treated as containing one or more multi-bit value fields, so that a thread can indicate that it should yield on a single event within a large event name space (e.g., 32 bits or more). In such embodiments, at least the bits associated with the target event are accessible to the current YIELD instruction. Of course, as a particular embodiment may require, wider fields (associated with more events) may be passed to the instruction.
Other embodiments of the YIELD instruction may combine the foregoing bit vector and value fields within the thread-scheduling parameter accessed by the instruction, or apply other refinements and enhancements, for example to suit the needs of a specific implementation. Alternative embodiments of the YIELD instruction may access such a thread-scheduling parameter in any known manner, for example from a GPR (as shown in Fig. 6), from any other data storage device (including memory), or as an immediate value within the instruction itself.
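The behavior just described can be summarized in a small behavioral model. The sketch below is illustrative only, written in C; the helper functions (thread_free, thread_requeue, thread_wait_on) are placeholders invented here and do not correspond to any real API, and only the operand cases of Fig. 32 are modeled.

    #include <stdint.h>
    #include <stdio.h>

    /* Placeholder actions standing in for the hardware behavior. */
    static void thread_free(void)          { puts("deallocate thread context"); }
    static void thread_requeue(void)       { puts("requeue thread immediately"); }
    static void thread_wait_on(uint32_t m) { printf("suspend until mask 0x%04x\n", m); }

    /* Model of the YIELD operand handling described above (Fig. 32). */
    static void yield_model(uint32_t rs)
    {
        if (rs == 0) {
            thread_free();                 /* step 3204: free the thread context      */
        } else if (rs & 0x1u) {
            thread_requeue();              /* step 3206: reschedule immediately       */
        } else {
            thread_wait_on(rs & 0xFFFCu);  /* steps 3208/3210: wait on qualifier bits 15..2 */
        }
    }

    int main(void)
    {
        yield_model(0);         /* deallocate                          */
        yield_model(1);         /* immediate requeue                   */
        yield_model(1u << 10);  /* wait for one hardware interrupt bit */
        return 0;
    }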
MFTR - Move From Thread Register
The MFTR instruction is a privileged (Cop0) instruction which allows an operating system executing on one thread to access a different thread context. Its format 800 is shown in Fig. 8.
The thread context to be accessed is determined by the value of the AlternateThread field of the CP0 ThreadControl register, which is shown in Fig. 16 and described later. The register to be read within the selected thread context is determined by the value of the rt operand register specified by field 802, in conjunction with the u and sel bits of the MFTR instruction located in fields 804 and 806 respectively, and is interpreted according to table 900 of Fig. 9. The resulting value is written into the destination register rd specified by field 808.
MTTR - Move To Thread Register
The MTTR instruction is the inverse of the MFTR instruction. It is a privileged Cop0 instruction which copies a register value from the thread context of the current thread into a register of another thread context. Its format 1000 is shown in Fig. 10.
The thread context to be accessed is determined by the value of the AlternateThread field of the CP0 ThreadControl register, which is shown in Fig. 16 and described later. The register to be written within the selected thread context is determined by the value in the rd operand register specified by field 1002, in conjunction with the u and sel bits provided in fields 1004 and 1006 of the MTTR instruction respectively, and is interpreted according to table 1100 of Fig. 11 (the encoding is similar to that of MFTR). The value in register rt, specified by field 1008, is copied to the selected register.
EMT - Enable Multithreading
The EMT instruction is a privileged Cop0 instruction which enables the concurrent execution of multiple threads by setting the TE bit of the CP0 ThreadControl register, which is shown in Fig. 16 and described later. The format 1200 of the instruction is shown in Fig. 12. The value of the ThreadControl register containing the TE (Thread Enable) bit value prior to execution of the EMT is returned in register rt.
DMT - Disable Multithreading
The DMT instruction is a privileged Cop0 instruction which inhibits the concurrent execution of multiple threads by clearing the TE bit of the CP0 ThreadControl register, which is shown in Fig. 16 and described later. The format 1300 of the instruction is shown in Fig. 13.
All threads other than the one issuing the DMT instruction are inhibited from further instruction fetch and execution. This is independent of any per-thread halted state. The value of the ThreadControl register containing the TE (Thread Enable) bit value prior to execution of the DMT is returned in register rt.
ECONF - End Processor Configuration
The ECONF instruction is a privileged Cop0 instruction which signals the end of VPE configuration and enables multi-VPE execution. The format 1400 of the instruction is shown in Fig. 14.
When an ECONF instruction is executed, the VPC bit of the Config3 register (described later) is cleared, the MVP bit of the same register becomes read-only at its current value, and all VPEs of the processor, including the one executing the ECONF, take a Reset exception.
Privileged Resources
Table 1500 of Fig. 15 lists the privileged resources of the System Coprocessor associated with the Multithreading ASE. Except where otherwise noted, the new or modified Coprocessor 0 (CP0) registers described below are accessible (i.e., readable and writable) in the same way as the conventional system control registers of Coprocessor 0 (i.e., of a MIPS processor).
New Privileged Resources
(A) ThreadControl register (CP0 register number 7, select 1)
The ThreadControl register is instantiated per VPE as part of the System Coprocessor. Its layout 1600 is shown in Fig. 16. The fields of the ThreadControl register are defined according to table 1700 of Fig. 17.
(B) ThreadStatus register (CP0 register number 12, select 4)
The ThreadStatus register is instantiated per thread context. Each thread has a copy of its own ThreadStatus, and privileged code can access the ThreadStatus of other threads via the MFTR and MTTR instructions. Its layout 1800 is shown in Fig. 18. The fields of the ThreadStatus register are defined according to table 1900 of Fig. 19.
Writing a one to the Halted bit of an activated thread causes that thread to cease fetching instructions and to set its internal restart program counter (PC) to the next instruction to be issued. Writing a zero to the Halted bit of an activated thread allows the thread to be scheduled, fetching and executing instructions from its internal restart program counter (PC) address. A thread that is not activated is prevented from being allocated and activated by a FORK instruction as long as either its Halted bit or its Activated bit is one.
(C) ThreadContext register (CP0 register number 4, select 1)
The ThreadContext register 2000 is instantiated per thread context and has the same width as the processor GPRs, as shown in Fig. 20. It is purely a software read/write register, usable by an operating system as a pointer to thread-specific storage, for example a thread context save area.
(D) ThreadConfig register (CP0 register number 6, select 1)
The ThreadConfig register is instantiated per processor or VPE. Its layout 2100 is shown in Fig. 21. The fields of the ThreadConfig register are defined in table 2200 of Fig. 22.
The WiredThread field of the ThreadConfig register allows the set of thread contexts available on a VPE to be partitioned between shadow register sets and parallel execution threads. Thread contexts whose index is less than the value of the WiredThread field are available as shadow register sets.
(E) ThreadSchedule register (CP0 register number 6, select 2)
The ThreadSchedule register is optional, but when implemented it is preferably instantiated per thread. Its layout 2300 is shown in Fig. 23.
The Schedule Vector (which, as shown, is 32 bits wide in a preferred embodiment) is a description of the issue bandwidth scheduling required for the associated thread. In this embodiment, each bit represents 1/32 of the issue bandwidth of the processor or VPE, and each bit position represents a distinct slot in a 32-slot scheduling cycle.
If a bit of a thread's ThreadSchedule register is set, that thread is guaranteed the corresponding issue slot in every consecutive group of 32 possible issues on the associated processor or VPE. Writing a one to a bit of a ThreadSchedule register when another thread of the same processor or VPE already has the same ThreadSchedule bit set results in a Thread exception. Although the preferred width of the ThreadSchedule register is 32 bits, it is contemplated that this width may be altered (i.e., increased or decreased) in other embodiments.
(F) VPESchedule register (CP0 register number 6, select 3)
The VPESchedule register is optional and is preferably instantiated per VPE. It is writable only if the MVP bit of the Config3 register is set (see Fig. 29). Its format 2400 is shown in Fig. 24.
The Schedule Vector (which, as shown, is 32 bits wide in a preferred embodiment) is a description of the issue bandwidth scheduling required for the associated VPE. In this embodiment, each bit represents 1/32 of the total issue bandwidth of a multi-VPE processor, and each bit position represents a distinct slot in a 32-slot scheduling cycle.
If a bit of a VPE's VPESchedule register is set, that VPE is guaranteed the corresponding issue slot in every consecutive group of 32 possible issues on the associated processor. Writing a one to a bit of a VPE's VPESchedule register when another VPE already has the same VPESchedule bit set results in a Thread exception.
Issue slots that are not explicitly scheduled to any thread remain free to be allocated to any runnable VPE or thread according to the processor's current default thread-scheduling policy (e.g., round-robin and the like).
The VPESchedule and ThreadSchedule registers together create a hierarchy of issue bandwidth allocation. The settings of the VPESchedule registers assign bandwidth to VPEs as a proportion of the total available on a processor or core, while the ThreadSchedule register assigns bandwidth to each thread as a proportion of that available to the VPE containing it.
Although the preferred width of the VPESchedule register is 32 bits, it is contemplated that this width may be altered (i.e., increased or decreased) in other embodiments.
(G) Config4 register (CP0 register number 16, select 4)
The Config4 register is instantiated per processor. It contains the configuration information necessary for dynamic multi-VPE processor configuration. If the processor is not in a VPE configuration state (i.e., the state in which the VMC bit of the Config3 register is set), the values of all fields other than the M (continuation) field are implementation-dependent and unpredictable. Its layout 2500 is described in Fig. 25. The fields of the Config4 register are defined in table 2600 of Fig. 26. In some implementations or embodiments, the VMC bit of the Config3 register may be a previously reserved, unspecified bit.
Modifications to the Existing Privileged Resource Architecture
The Multithreading ASE modifies certain elements of the current MIPS32 and MIPS64 PRAs.
(A) Status register
The CU bits of the Status register take on additional meaning in a multithreaded configuration. The act of setting a CU bit is a request that a coprocessor context be bound to the thread associated with that CU bit. If a coprocessor context is available, it is bound to the thread so that instructions issued by the thread can be dispatched to that coprocessor, and the CU bit retains the one written to it. If no coprocessor context is available, the CU bit reads back as zero. Writing a zero to a set CU bit causes any associated coprocessor to be released.
(B) Cause register
As shown in Fig. 27, a new Cause register exception code value is required for the Thread exception.
(C) EntryLo register
As shown in Fig. 28, a previously reserved cache attribute becomes the ITC indicator.
(D) Config3 register
As shown in table 2900 of Fig. 29, new Config3 register fields are defined to indicate whether the Multithreading ASE and multiple thread contexts are available.
(E) EBase
As shown in Fig. 30, the previously reserved bit 30 of the EBase register becomes a per-VPE VPE inhibit bit in each VPE context.
(F) SRSCtl
The formerly preset HSS field now becomes a function of the WiredThread field of the ThreadConfig register.
Thread Allocation and Initialization Without Use of the FORK Instruction
In a preferred embodiment, the procedure for an operating system to create a thread "by hand" is as follows:
1. Execute a DMT instruction to stop other threads from executing, and from possibly executing FORK instructions.
2. Identify an available thread context by setting successive values in the AlternateThread field of the ThreadControl register and reading the ThreadStatus registers with MFTR instructions. A free thread has neither the Halted bit nor the Activated bit of its ThreadStatus register set.
3. Set the Halted bit of the selected thread's ThreadStatus register to prevent it from being allocated by another thread.
4. Execute an EMT instruction to re-enable multithreading.
5. Copy any needed GPRs into the selected thread context, using MTTR instructions with the u field set to 1.
6. Write the desired starting execution address into the thread's internal restart address register, using an MTTR instruction with the u and sel fields set to 0 and the rt field set to 14 (EPC).
7. Write zero and one, respectively, to the Halted and Activated bits of the selected thread's ThreadStatus register, using MTTR instructions.
The newly allocated thread is then schedulable. The steps of executing DMT, setting the new thread's Halted bit, and executing EMT can be omitted if EXL or ERL is set during the procedure, since these implicitly inhibit multithreaded execution.
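A rough sketch of the seven steps above follows, written in C. The functions dmt, emt, set_alternate_thread, mftr_threadstatus, mttr_threadstatus, mttr_gpr and mttr_epc are hypothetical wrappers around the DMT, EMT, MFTR and MTTR instructions, and the Halted/Activated bit positions are placeholders; none of these names or encodings are taken from the figures.

    #include <stdint.h>

    /* Hypothetical wrappers around the privileged instructions; placeholders only. */
    extern void     dmt(void);                        /* DMT                                  */
    extern void     emt(void);                        /* EMT                                  */
    extern void     set_alternate_thread(unsigned t); /* write ThreadControl.AlternateThread  */
    extern uint32_t mftr_threadstatus(void);          /* MFTR of the selected ThreadStatus    */
    extern void     mttr_threadstatus(uint32_t v);    /* MTTR of the selected ThreadStatus    */
    extern void     mttr_gpr(unsigned r, uint32_t v); /* MTTR with u = 1 (GPR)                */
    extern void     mttr_epc(uint32_t pc);            /* MTTR with u = sel = 0, rt = 14 (EPC) */

    #define TS_HALTED    (1u << 4)   /* bit positions are assumptions for this sketch */
    #define TS_ACTIVATED (1u << 3)

    /* Sketch of "manual" thread creation following steps 1 through 7 above. */
    int create_thread(uint32_t entry_pc, const uint32_t *gprs, unsigned ngprs,
                      unsigned nthreads)
    {
        dmt();                                        /* 1. stop other threads and FORKs     */
        unsigned tc;
        for (tc = 0; tc < nthreads; tc++) {           /* 2. find a free thread context       */
            set_alternate_thread(tc);
            uint32_t ts = mftr_threadstatus();
            if ((ts & (TS_HALTED | TS_ACTIVATED)) == 0)
                break;
        }
        if (tc == nthreads) { emt(); return -1; }     /* no free context available           */
        mttr_threadstatus(TS_HALTED);                 /* 3. halt it so no other thread takes it */
        emt();                                        /* 4. re-enable multithreading         */
        for (unsigned i = 0; i < ngprs; i++)          /* 5. copy any needed GPRs             */
            mttr_gpr(i, gprs[i]);
        mttr_epc(entry_pc);                           /* 6. set the restart address          */
        mttr_threadstatus(TS_ACTIVATED);              /* 7. Halted = 0, Activated = 1        */
        return (int)tc;
    }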
Thread Termination and Deallocation Without Use of the YIELD Instruction
In a preferred embodiment of the invention, the procedure for an operating system to terminate the current thread is as follows:
1. If the operating system does not support Thread exceptions on thread-underflow conditions, scan the settings of the ThreadStatus registers using MFTR instructions to verify that another runnable thread exists on the processor; if none does, signal an error to the program.
2. Write the values of any important GPRs to memory.
3. Set Kernel mode in the Status/ThreadStatus register.
4. Clear EXL/ERL, allowing other threads to be scheduled while the current thread remains in a privileged mode.
5. Write zeros to the Halted and Activated bits of the ThreadStatus register, using a standard MTC0 instruction.
The normal case is for a thread to terminate itself in this manner. A thread in a privileged mode can also terminate another thread using MTTR instructions, but this raises additional problems: the operating system must determine which thread context to deallocate, and at what point the computational state of that thread is stable.
Inter-Thread Communication (ITC) Storage
Inter-Thread Communication (ITC) storage is an optional capability which provides an alternative to Load-Linked/Store-Conditional synchronization for fine-grained multithreading. Because it is manipulated through load and store operations, ITC storage is invisible to the instruction set architecture as such, but it is visible to the privileged resource architecture and requires substantial micro-architectural support.
References to virtual memory pages whose TLB entries are tagged as ITC storage are interpreted as references to storage with special attributes. Each page maps a set of 1 to 128 64-bit storage locations, each of which has an Empty/Full bit associated with it and can be accessed in one of four ways using standard load and store instructions. The access mode is encoded in the least-significant (and untranslated) bits of the generated virtual address, as shown in table 3100 of Fig. 31.
Each storage location can therefore be described by the following C structure:
struct {
    uint64 ef_sync_location;
    uint64 force_ef_location;
    uint64 bypass_location;
    uint64 ef_state;
} ITC_location;
where all four locations reference the same underlying 64 bits of storage. References to this storage may use access types of less than 64 bits (e.g., LW, LH, LB), with the same Empty/Full protocol enforced on each access.
The Empty and Full bits are distinct, so that multi-entry data buffers that are not tightly coupled, such as FIFOs, can be mapped onto ITC storage.
ITC storage can be saved and restored by copying the {bypass_location, ef_state} pairs to and from general storage. Strictly speaking, while all 64 bits of the bypass_location must be preserved, only the least-significant bits of ef_state need to be manipulated. In the case of multi-entry data buffers, each location must be read until the Empty bit is set, so that the contents of the buffer are drained by the copy.
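The save and restore just described can be sketched as follows, reusing the member names of the ITC_location structure above. The assumption that the four views of a cell are reachable at successive 8-byte offsets (matching the structure), the uint64 typedef, and the volatile qualifiers are illustrative choices for the sketch, not requirements stated by the text.

    typedef unsigned long long uint64;   /* matches the uint64 used in the structure above */

    struct ITC_location {
        volatile uint64 ef_sync_location;   /* blocking Empty/Full synchronized view */
        volatile uint64 force_ef_location;  /* forced Empty/Full view                */
        volatile uint64 bypass_location;    /* raw data view, no Empty/Full protocol */
        volatile uint64 ef_state;           /* Empty/Full state bits                 */
    };

    /* Context-switch save and restore of one ITC cell through its bypass
     * and state views, as described above.                                */
    void itc_save(const struct ITC_location *cell, uint64 save[2])
    {
        save[0] = cell->bypass_location;    /* all 64 data bits must be preserved       */
        save[1] = cell->ef_state;           /* only the low Empty/Full bits matter      */
    }

    void itc_restore(struct ITC_location *cell, const uint64 save[2])
    {
        cell->bypass_location = save[0];
        cell->ef_state        = save[1];
    }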
The number of locations per 4K page and the number of ITC pages per VPE are configurable parameters of the VPE or processor.
The "physical address space" of ITC storage can be made global across all VPEs and processors in a multiprocessor system, so that a thread can synchronize on a location belonging to a VPE other than the one on which it is executing. Global ITC storage addresses are derived from the CPUNum field of each VPE's EBase register, the 10 bits of CPUNum corresponding to 10 significant bits of the ITC storage address. Processors or cores designed for uniprocessor applications need not export a physical interface to the ITC storage, and can treat it as a processor-internal resource.
Multi-VPE Processors
A core or processor may implement multiple VPEs sharing resources such as functional units. Each VPE sees its own instantiation of the MIPS32 or MIPS64 instruction set and privileged resource architecture. Each sees its own register file or thread context array, and each sees its own CP0 system coprocessor and its own TLB state. Two VPEs on the same processor are indistinguishable to software from a 2-CPU cache-coherent SMP multiprocessor.
Each VPE on a processor sees a distinct value in the CPUNum field of the CP0 EBase register.
Processor architectural resources such as thread contexts, TLB storage, and coprocessors may be bound to VPEs in a hardwired configuration, or they may be configured dynamically in a processor supporting the necessary configuration capability.
Reset and Virtual Processor Configuration
For backward compatibility with the MIPS32 and MIPS64 PRAs, a configurably multithreaded/multi-VPE processor must come out of reset with a complete default thread/VPE configuration. Typically this would be a single VPE with a single thread context, but that is not strictly required. The MVP bit of the Config3 register can be sampled at reset to determine whether dynamic VPE configuration is possible. If this capability is ignored, for example by legacy software, the processor operates according to whatever the default configuration specifies.
If the MVP bit is set, the VPC (Virtual Processor Configuration) bit of the Config3 register can be set by software. This puts the processor into a configuration state in which the contents of the Config4 register can be read to determine the number of available VPE contexts, thread contexts, TLB entries, and coprocessors, and in which certain normally read-only "preset" fields of the Config registers become writable. Restrictions may be imposed on the instruction stream in the configuration state, for example forbidding the use of cached or TLB-mapped memory addresses.
In the configuration state, the total number of configurable VPEs is encoded in the PVPE field of the Config4 register. Each VPE can be selected by writing its index into the CPUNum field of the EBase register. For the selected VPE, the following register fields can potentially be set by writing to them:
·Config1.MMU_Size
·Config1.FP
·Config1.MX
·Config1.C2
·Config3.NThreads
·Config3.NITC_Pages
·Config3.NITC_PLocs
·Config3.MVP
·VPESchedule
Not all of the configuration parameters mentioned above need be configurable. For example, the number of ITC locations per page may be fixed even where the number of ITC pages per VPE is configurable, or both parameters may be fixed; FPUs may be pre-allocated and hardwired per VPE; and so on.
Coprocessors are allocated to VPEs as discrete units. The degree to which a coprocessor is multithreaded should be indicated and controlled via coprocessor-specific control and status registers.
A VPE is enabled for post-configuration execution by clearing the VPI inhibit bit of its EBase register.
The configuration state is exited by issuing an ECONF instruction. This instruction causes all uninhibited VPEs to take a Reset exception and begin executing concurrently. If the MVP bit of the Config3 register is cleared during configuration and remains zero at the time of the ECONF instruction, the VPC bit can no longer be set, and the processor configuration is effectively frozen until the next processor reset. If MVP remains set, an operating system can re-enter configuration mode by setting the VPC bit again. However, if a VPE of the processor re-enters configuration mode while the processor is operating, the results may be unpredictable.
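A rough sketch of this configuration sequence, in C, is given below. The CP0 access helpers and the MVP/VPC bit positions are hypothetical placeholders invented for the sketch; they are not real intrinsics, and the field values written are arbitrary examples rather than details taken from the figures.

    #include <stdint.h>

    /* Hypothetical CP0 access helpers; placeholders for illustration only. */
    extern uint32_t read_config3(void);
    extern void     write_config3(uint32_t v);
    extern uint32_t read_config4(void);
    extern void     write_ebase_cpunum(unsigned vpe_index);  /* select a VPE                */
    extern void     clear_ebase_vpi(void);                   /* enable the selected VPE     */
    extern void     write_config1_mmu_size(unsigned n);      /* one of the writable fields  */
    extern void     write_vpeschedule(uint32_t mask);
    extern void     econf(void);                             /* ECONF instruction           */

    #define CONFIG3_MVP (1u << 0)   /* bit positions are assumptions for this sketch */
    #define CONFIG3_VPC (1u << 1)

    /* Sketch of dynamic multi-VPE configuration after reset. */
    void configure_vpes(unsigned nvpes, const uint32_t *vpe_masks)
    {
        if (!(read_config3() & CONFIG3_MVP))
            return;                                    /* dynamic configuration not possible */
        write_config3(read_config3() | CONFIG3_VPC);   /* enter the configuration state      */
        (void)read_config4();                          /* discover available resources       */
        for (unsigned v = 0; v < nvpes; v++) {
            write_ebase_cpunum(v);                     /* select VPE v                       */
            write_config1_mmu_size(16);                /* example: assign 16 TLB entries     */
            write_vpeschedule(vpe_masks[v]);           /* assign its share of issue slots    */
            clear_ebase_vpi();                         /* allow it to run after ECONF        */
        }
        econf();                                       /* exit configuration; all enabled
                                                          VPEs take a Reset exception        */
    }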
Quality of Service Scheduling for Multithreaded Processors
The specification thus far has described a specific application extension of a MIPS-compatible system for implementing multithreading. As stated before, the MIPS implementation described is exemplary and not limiting as to the scope of the invention. The functions and mechanisms described can be applied in systems other than MIPS systems.
The problem of special service for real-time and near-real-time threads in multithreading was raised in the Background section, and it was touched on briefly above in connection with the ThreadSchedule register (Fig. 23) and the VPESchedule register (Fig. 24). The following portion of this specification deals with this issue in greater detail and more clearly describes specific extensions for thread-level quality of service (QoS).
Background
Networks designed to carry multimedia data generally involve the concept of quality of service (QoS), which expresses the fact that different data streams in the network need to be handled with different policies. Voice transmission, for example, requires relatively little bandwidth but cannot tolerate delays of more than a few tens of milliseconds. In broadband multimedia networks, QoS protocols ensure that time-critical transfers receive whatever special handling and priority are needed to guarantee timely delivery.
One of the main problems affecting combined RISC and DSP program execution on a single chip is that it is extremely difficult to guarantee strict real-time execution of DSP code in a mixed multitasking environment. The DSP application can therefore be regarded as requiring a QoS guarantee on processor bandwidth.
Multithreading and QoS
There are many possible ways to schedule the issuing of instructions from multiple threads. Interleaved schedulers switch threads every cycle, while blocked schedulers switch threads on a cache miss or other major stall. The Multithreading ASE described in detail above provides a framework for multithreaded processors that deliberately avoids any dependence on a particular thread-scheduling mechanism or policy. The scheduling policy may, however, have a significant effect on the QoS guarantees that can be made for the execution of the various threads.
A RISC with DSP extensions becomes considerably more useful if QoS guarantees can be made that real-time DSP code will be executed on time. Implementing multithreading on such a processor, so that the DSP code executes on an independent thread, or even on an independent virtual processor, allows the hardware scheduling of the DSP thread to be programmably determined so as to guarantee QoS, thereby removing a major obstacle to a RISC with powerful added DSP capability.
QoS Thread-Scheduling Algorithms
Quality-of-service thread scheduling can be loosely defined as a set of scheduling mechanisms and policies which allow a programmer or system designer to make confident and predictable statements about the execution time of a particular piece of code. In general, these statements take the form "this code will execute in no more than Nmax and no fewer than Nmin cycles." In many cases only the Nmax number matters in practice, but in some applications running ahead of schedule is also a problem, so Nmin must be considered as well. The smaller the gap between Nmax and Nmin, the more accurately the behavior of the overall system can be predicted.
Simple Priority Schemes
One simple model that has been proposed to provide a degree of QoS in multithreaded issue scheduling is simply to assign the highest priority to a designated real-time thread, so that whenever that thread is runnable it is always selected to issue instructions. This would seem to provide the minimum value of Nmin, and possibly the minimum value of Nmax, for the designated thread, but it has some less desirable consequences.
First, only one thread can be given any QoS guarantee in such a scheme. The algorithm implies that the Nmax of arbitrary code in any thread other than the designated real-time thread becomes effectively unbounded. Second, while the Nmin number for a code fragment in the designated thread is minimized, exceptions must be factored into the model. If exceptions are taken by the designated thread, the value of Nmax becomes much more complex, and in some cases impossible, to determine. If exceptions are taken by threads other than the designated thread, the Nmax of code in the designated thread remains strictly bounded, but the interrupt response time of the processor becomes unbounded.
Simple priority schemes may therefore be useful in some cases, and have real advantages in hardware implementation, but they do not provide a general QoS scheduling solution.
Reservation-Based Schemes
Another, more powerful and distinct, thread-scheduling model is based on reserving issue slots. In this scheme, the hardware scheduling mechanism allows one or more threads to be assigned M out of every N issue slots. In an interrupt-free environment, this scheme does not provide as low an Nmin for a real-time code fragment as a priority scheme, but it has other advantages:
More than one thread can be given a QoS guarantee.
Interrupt latency can be bounded even when interrupts are bound to a thread other than the one with the highest priority. This in turn allows a lower Nmax for real-time code fragments.
The simplest form of reservation-based scheduling assigns one issue slot out of every N to a real-time thread. Since there is no intermediate value of N between 1 and 2, this implies that a real-time thread in such a multithreaded environment can be given at most 50% of the processor's issue slots. Where a real-time task needs more than 50% of the bandwidth of an embedded processor, a scheme that allows a more flexible allocation of issue bandwidth is strongly desirable.
Hybrid Thread Scheduling with QoS
The multithreading system described above deliberately emphasizes scheduling-policy neutrality, but it can also be extended to allow a hybrid thread-scheduling model. In this model, real-time threads can be given fixed scheduling of some proportion of the thread issue slots, with the remaining slots allocated according to the implementation-dependent default scheduling scheme.
Binding Threads to Issue Slots
Instructions issue from a processor sequentially at high speed. In a multithreaded environment with a number of threads, the bandwidth consumed by a thread can be expressed as the proportion of issue slots it occupies within some given number of slots. Conversely, the present invention recognizes that one can declare some arbitrary fixed number of slots and constrain the processor to reserve a subset of that fixed number of slots for a particular thread. A fixed fraction of the issue bandwidth can thereby be specified and guaranteed to a real-time thread.
Clearly, issue slots can be allocated proportionally to more than one real-time thread, and the granularity with which this scheme operates is constrained by the fixed number of issue slots on which the proportions are based. For example, if 32 slots are chosen, any particular thread can be guaranteed from 1/32 to 32/32 of the issue bandwidth.
Perhaps the most general model for assigning fixed issue bandwidth to threads would be to associate each thread with a pair of integers {N, D} forming the numerator and denominator of the proportion of issue slots allocated to that thread, for example 1/2 or 4/5. If the integers allowed are large enough, this would permit almost arbitrarily fine-grained tuning, but it has some substantial disadvantages for thread priority allocation. One problem is that converting a large set of pairs, {{N0, D0}, {N1, D1} ... {Nn, Dn}}, into an issue schedule is not a simple matter for hardware logic, and the error case in which more than 100% of the slots are allocated cannot easily be detected. Another problem is that while such a scheme allows a thread to be allocated an N/D proportion of the issue slots over a sufficiently long run of time, it does not necessarily allow any statement to be made about which issue slots are assigned to the thread over a shorter subsequence of code.
Therefore, in a preferred embodiment of the invention, instead of integer pairs, each thread requiring real-time QoS bandwidth is associated with a bit vector which indicates the scheduling slots assigned to that thread. In this preferred embodiment, this vector is visible to system software as the contents of the previously described ThreadSchedule register (Fig. 23). While the ThreadSchedule register contains a 32-bit-wide scheduling "mask", the mask could have a longer or shorter bit width in other embodiments. A 32-bit thread-scheduling mask allows a thread to be assigned from 1/32 to 32/32 of the processor's issue bandwidth, and it furthermore allows a specific pattern of issue slots to be assigned to a given thread. For a 32-bit mask, the value 0xaaaaaaaa assigns every second slot to the thread. The value 0x0000ffff also assigns 50% of the issue bandwidth to the thread, but in blocks of 16 consecutive slots. Assigning the value 0xeeeeeeee to thread X and the value 0x01010101 to thread Y gives thread X three of every four cycles (24 out of 32) and thread Y one of every eight cycles (4 out of 32), leaving the remaining 4 cycles out of each group of 32 to be allocated to other threads by other, possibly less deterministic, hardware algorithms. Furthermore, thread X gets three of every four cycles, and thread Y never has more than eight cycles between two consecutive instructions.
Scheduling conflicts in this embodiment are very simple to detect, since no bit position may be set in the ThreadSchedule register of more than one thread. That is, if a particular bit is set for one thread, it must be zero for all other threads that are assigned issue masks, so any conflict can easily be detected.
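Since the masks of different threads must be disjoint, over-subscription can be checked with a bitwise AND, and the bandwidth fraction owned by a thread is simply the population count of its mask over 32. The following is a minimal C sketch, using the example masks from the preceding paragraph.

    #include <stdint.h>
    #include <stdio.h>

    /* Count of set bits in a 32-bit scheduling mask. */
    static unsigned popcount32(uint32_t x)
    {
        unsigned n = 0;
        while (x) { n += x & 1u; x >>= 1; }
        return n;
    }

    /* A new ThreadSchedule value conflicts if any of its bits is already owned
     * by another thread; in hardware this condition raises a Thread exception
     * when the register is written.                                            */
    static int schedule_conflicts(uint32_t new_mask,
                                  const uint32_t *other_masks, unsigned n)
    {
        for (unsigned i = 0; i < n; i++)
            if (new_mask & other_masks[i])
                return 1;
        return 0;
    }

    int main(void)
    {
        uint32_t x = 0xEEEEEEEEu, y = 0x01010101u;   /* example masks from the text */
        printf("thread X gets %u/32 of the issue bandwidth\n", popcount32(x));  /* 24 */
        printf("thread Y gets %u/32 of the issue bandwidth\n", popcount32(y));  /*  4 */
        printf("X and Y conflict? %d\n", schedule_conflicts(x, &y, 1));         /*  0 */
        return 0;
    }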
The issue logic for real-time threads is relatively straightforward: each issue opportunity is associated with a modulo-32 index, which is broadcast to all ready threads; at most one of these ready threads will have been assigned the associated issue slot. If it has been assigned the slot, the associated thread issues its next instruction. If no thread owns the slot, the processor selects a runnable non-real-time thread.
If a hardware implementation of the ThreadSchedule register uses fewer than 32 bits, the per-thread storage and logic are reduced, but so is scheduling flexibility. In principle the register could be extended to 64 bits, or even implemented (in the case of a MIPS processor) as a series of registers at incrementing select values in the MIPS32 CP0 register space, to provide longer scheduling vectors.
Exempting Threads from Interrupt Service
As noted previously, interrupt service can introduce considerable variability into the execution time of the thread running the exception handler. It is therefore desirable to exempt threads that need strict QoS guarantees from interrupt service. A preferred embodiment provides a single bit per thread, visible to the operating system, which causes any asynchronous exception to be deferred until a non-exempt thread is scheduled (i.e., the IXMT bit of the ThreadStatus register; see Figs. 18 and 19). This increases interrupt latency, but by appropriate selection of ThreadSchedule register values that latency can be kept bounded and controlled. If interrupt handlers are run only during issue slots not assigned to exempt real-time QoS threads, interrupt service naturally has no first-order effect on the execution time of the real-time code.
Issue Slot Allocation to Threads and Virtual Processing Elements
The Multithreading ASE described in detail above defines a hierarchical allocation of thread resources, in which some number of VPEs (Virtual Processing Elements) each contain some number of threads. Since each VPE has its own hardware implementation of CP0 and of the privileged resource architecture (when configured on a MIPS processor), operating system software (OS) running on one VPE has no direct knowledge of, or control over, the issue slots requested by the other VPEs. The issue-slot name space of each VPE is therefore local to that VPE, which produces a hierarchy of issue-slot allocation.
Fig. 34 is a block diagram of scheduling circuitry 3400 illustrating this hierarchical allocation of thread resources. A processor scheduler 3402 (i.e., the master scheduling logic of the host processor) communicates the overall issue-slot number to all of the VPESchedule registers in all of the VPEs of the host processor via a "slot select" signal 3403. Signal 3403 corresponds to a bit position within a VPESchedule register (one of 32 positions in the preferred embodiment). The scheduler 3402 cycles signal 3403 repeatedly by advancing the bit position by one at each issue slot and resetting it to the least-significant bit position (i.e., bit 0) after reaching the most-significant position (i.e., bit 31 in this preferred embodiment).
Referring again to Fig. 34 by way of example, bit position 1 (i.e., "slot 1") is communicated via signal 3403 to all of the VPESchedule registers of the host processor, namely registers 3414 and 3416. If the corresponding bit in any VPESchedule register is "set" (i.e., is a logic one), that register notifies the processor scheduler with a "VPE issue request" signal. In response, the scheduler grants the current issue slot to that VPE with a "VPE issue grant" signal. In Fig. 34, bit position 1 of VPESchedule register 3414 (in VPE 0) is set, so a VPE issue request signal 3415 is sent to the processor scheduler 3402, which in turn responds with a VPE issue grant signal 3405.
When a VPE is granted an issue slot, similar logic is applied at the VPE level. Referring again to Fig. 34, the VPE scheduler 3412 (i.e., the scheduling logic of VPE 0 3406), in response to signal 3405, communicates an issue-slot number via a "slot select" signal 3413 to all of the ThreadSchedule registers within that VPE. Each of these ThreadSchedule registers is associated with a thread supported by the VPE. Signal 3413 corresponds to a bit position within a ThreadSchedule register (one of 32 positions in the present embodiment). The scheduler 3412 cycles signal 3413 repeatedly by advancing the bit position by one at each issue slot granted to the VPE and resetting it to the least-significant bit position (i.e., bit 0) after reaching the most-significant position (i.e., bit 31 in this preferred embodiment). This slot number is independent of the slot number used at the VPESchedule level.
Referring to Fig. 34 as an example, bit position 0 (i.e., "slot 0") is communicated via signal 3413 to all of the ThreadSchedule registers in the target VPE, namely registers 3418 and 3420. Any thread whose ThreadSchedule register has the bit at the selected position set notifies the VPE scheduler and is granted the current issue slot. In Fig. 34, bit position 0 of ThreadSchedule register 3418 (thread 0) is set, so a thread issue request signal 3419 is sent to the VPE scheduler 3412, which responds with a thread issue grant signal 3417 (thereby allowing thread 0 to use the current issue slot). For a given slot, if no bit corresponding to the designated slot is set in any VPESchedule register, or no bit corresponding to the designated slot is set in any ThreadSchedule register, then the processor or VPE scheduler, respectively, allocates the issue slot according to some other default scheduling algorithm.
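The two-level selection of Fig. 34 can be modeled roughly as follows. This is a behavioral sketch only, written in C; the per-VPE thread count, the data structure, and the treatment of the fallback case (returning -1 where the implementation-dependent default scheduling policy would take over) are illustrative choices, not details taken from the figure.

    #include <stdint.h>

    /* Behavioral model of the hierarchical slot allocation of Fig. 34. */
    struct vpe {
        uint32_t vpeschedule;        /* VPESchedule register                 */
        uint32_t threadschedule[8];  /* ThreadSchedule registers, per thread */
        unsigned nthreads;
        unsigned thread_slot;        /* VPE-local slot counter (modulo 32)   */
    };

    /* Returns the index of the thread granted the current processor issue
     * slot, or -1 where neither a VPE nor a thread owns it and the default,
     * implementation-dependent policy would apply.                          */
    int pick_thread(struct vpe *vpes, unsigned nvpes,
                    unsigned processor_slot, unsigned *vpe_out)
    {
        for (unsigned v = 0; v < nvpes; v++) {
            if (!((vpes[v].vpeschedule >> (processor_slot % 32)) & 1u))
                continue;                       /* this VPE does not own the slot     */
            struct vpe *g = &vpes[v];
            unsigned s = g->thread_slot;
            g->thread_slot = (s + 1) % 32;      /* the VPE-local counter advances only
                                                   on slots granted to this VPE        */
            for (unsigned t = 0; t < g->nthreads; t++) {
                if ((g->threadschedule[t] >> s) & 1u) {
                    *vpe_out = v;
                    return (int)t;              /* at most one thread owns the bit     */
                }
            }
            *vpe_out = v;
            return -1;                          /* VPE-level default scheduling        */
        }
        return -1;                              /* processor-level default scheduling  */
    }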
As described above, in a preferred embodiment each VPE, for example VPE 0 (3406) and VPE 1 (3404) of Fig. 34, has a VPESchedule register (with the format shown in Fig. 24) which allows specific issue slots, modulo the length of the register contents, to be deterministically assigned to that VPE. The VPESchedule registers of Fig. 34 are register 3414 of VPE 0 and register 3416 of VPE 1. Issue slots that are not assigned to any VPE are allocated according to an implementation-specific allocation policy.
Also as described above, the slots assigned to threads within a VPE are allocated out of the slots granted to that VPE. As a concrete example, if a processor has two VPEs, as shown in Fig. 34, one with a VPESchedule register containing the value 0xaaaaaaaa and the other with a VPESchedule register containing the value 0x55555555, issue slots are granted to the two VPEs alternately. If the ThreadSchedule register of a thread within one of these VPEs contains the value 0x55555555, that thread obtains every second issue slot of the VPE containing it, which is every fourth issue slot of the overall processor.
The value of the VPESchedule register associated with each VPE thus determines which processing slots each VPE obtains. Particular threads are assigned to each VPE, for example thread 0 and thread 1 shown within VPE 0; other threads, not shown, are similarly assigned to VPE 1. Each thread has an associated ThreadSchedule register, for example register 3418 for thread 0 and register 3420 for thread 1. The values of the ThreadSchedule registers determine the allocation of processing slots among the threads of each VPE.
Schedulers 3402 and 3412 can be implemented with simple combinational logic to perform the functions described above, and given the present disclosure, constructing such schedulers requires no undue experimentation by those skilled in the art. The schedulers may, for example, be constructed by conventional means, such as combinational logic, programmable logic, software, and the like, so as to obtain the described functionality.
Fig. 33 illustrates a computer system 3300 in a general form, upon which various embodiments of the invention may be implemented. The system includes a processor 3302 incorporating the necessary decode and execution logic (as will be apparent to those skilled in the art) to support one or more of the instructions described above (i.e., FORK, YIELD, MFTR, MTTR, EMT, DMT, and ECONF). In a preferred embodiment, core 3302 also includes the scheduling circuitry 3400 of Fig. 34 and represents the "host processor" described above. System 3300 further includes: a system interface controller 3304 in two-way communication with the processor; RAM 3316 and ROM 3314 accessible by the system interface controller; and three I/O devices 3306, 3308 and 3310 communicating with the system interface controller over a bus 3312. Through application of the apparatus and code described herein, system 3300 can operate as a multithreaded system. Those skilled in the art will appreciate that many variations on the general form shown in Fig. 33 are possible. For example, bus 3312 can take many forms, and in some embodiments may be an on-chip bus. Likewise, the number of I/O devices is arbitrary and is shown merely for convenience. Furthermore, although only device 3306 is shown asserting an interrupt request, other devices may clearly assert interrupt requests as well.
Further Refinements
In the embodiments described so far, the 32-bit ThreadSchedule and VPESchedule registers do not allow exact allocation of odd fractions of the issue bandwidth. If a programmer wishes to allocate exactly one third of all issue slots to a given thread, the closest available approximations are 10/32 or 11/32. In one embodiment, a programmable mask or length register allows the programmer to specify that only a subset of the bits of the ThreadSchedule and/or VPESchedule registers is used by the issue logic before the sequence restarts. In the example given, the programmer would specify that only 30 bits are in effect and program the VPESchedule and/or ThreadSchedule registers appropriately with the value 0x24924924.
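A quick check of this example: 0x24924924 sets every third bit of the low 30 bit positions, so with a 30-bit effective mask the thread owns exactly one third of the issue slots. The following small C program verifies the count.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t mask = 0x24924924u;    /* value suggested in the text */
        unsigned effective_bits = 30;   /* programmable mask length    */
        unsigned owned = 0;
        for (unsigned i = 0; i < effective_bits; i++)
            owned += (mask >> i) & 1u;
        /* prints "10 of 30 slots", i.e. exactly 1/3 of the issue bandwidth */
        printf("%u of %u slots\n", owned, effective_bits);
        return 0;
    }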
The Multithreading ASE described herein may of course be embodied in hardware, for example within or coupled to a central processing unit (CPU), microprocessor, digital signal processor, processor core, System on Chip (SOC), or any other programmable device. The Multithreading ASE may also be embodied in software (e.g., computer-readable program code, program code, instructions and/or data in any form, such as source, object or machine language) disposed in a computer-usable (e.g., readable) medium configured to store the software. Such software enables the function, fabrication, modeling, simulation, description and/or testing of the apparatus and processes described herein. This can be accomplished, for example, through the use of general programming languages (such as C and C++), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, AHDL (Altera HDL) and so on, or other available programs, databases and/or circuit (i.e., schematic) capture tools. Such software can be disposed in any known computer-usable medium, including semiconductor, magnetic disk, and optical disc (e.g., CD-ROM, DVD-ROM, etc.), and in a computer data signal embodied in a computer-usable (e.g., readable) transmission medium (e.g., a carrier wave or any other medium including digital, optical, or analog-based media). As such, the software can be transmitted over communication networks including the Internet and intranets.
A Multithreading ASE embodied in software may be included in a semiconductor intellectual property core, such as a processor core (e.g., embodied in HDL), and transformed to hardware in the production of integrated circuits. Additionally, a Multithreading ASE as described herein may be embodied as a combination of hardware and software.
It will be apparent to those skilled in the art that the disclosed embodiments may be altered and amended without departing from the spirit and scope of the disclosure. For example, the embodiments described above mostly use MIPS processors, architectures and technology as specific examples. The invention in its various embodiments has broader applicability and is not limited to those examples. Further, those skilled in the art may find ways to make minor functional changes to what has been described, which also remain within the spirit and scope of the invention. In the QoS description, the contents of the ThreadSchedule and VPESchedule registers are not limited to the lengths described and may be modified within the spirit and scope of the invention.
Accordingly, the scope of the invention should in fact be limited only by the scope of the appended claims.

Claims (87)

1. In a processor enabled to support and execute multiple program threads, a mechanism for processing comprising:
a parameter for scheduling a program thread; and
an instruction disposed within the program thread and enabled to access the parameter;
wherein, when the parameter equals a first value, the instruction reschedules the program thread in accordance with one or more conditions encoded within the parameter.
2. The mechanism of claim 1, wherein the parameter is held in a data storage device.
3. The mechanism of claim 1, wherein, when the parameter equals a second value not equal to the first value, the instruction deallocates the program thread.
4. The mechanism of claim 3, wherein the second value is zero.
5. The mechanism of claim 1, wherein, when the parameter equals a second value not equal to the first value, the instruction unconditionally reschedules the program thread.
6. The mechanism of claim 5, wherein the second value is an odd number.
7. The mechanism of claim 5, wherein the second value is negative one.
8. The mechanism of claim 1, wherein a condition of the one or more conditions relates to surrendering execution opportunity of the program thread to other threads until the condition is satisfied.
9. The mechanism of claim 8, wherein the condition is encoded in one of a bit vector or a bit field within the parameter.
10. The mechanism of claim 5, wherein, in the event the program thread is rescheduled, execution of the program thread continues at a point in the thread following the instruction.
11. The mechanism of claim 3, wherein, when the parameter equals a third value not equal to the first value or the second value, the instruction unconditionally reschedules the program thread.
12. The mechanism of claim 1, wherein a condition of the one or more conditions is a hardware interrupt.
13. The mechanism of claim 1, wherein a condition of the one or more conditions is a software interrupt.
14. The mechanism of claim 1, wherein, in the event the program thread is rescheduled, execution of the program thread continues at a point in the thread following the instruction.
15. In a processor enabled to support and execute multiple program threads, a method for a thread to reschedule its own execution or to deallocate itself, comprising:
(a) issuing an instruction that accesses a portion of a record in a data storage device, the portion of the record encoding one or more parameters associated with one or more conditions that determine whether the thread is to be rescheduled; and
(b) rescheduling the thread according to the conditions, or deallocating the thread, according to the one or more parameters in the portion of the record.
16. The method of claim 15, wherein the record is placed in a general-purpose register (GPR).
17. The method of claim 15, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled.
18. The method of claim 17, wherein the parameter associated with the deallocated thread is a value of zero.
19. The method of claim 15, wherein one of the parameters is associated with the thread being requeued to await scheduling.
20. The method of claim 19, wherein the parameter is any odd value.
21. The method of claim 19, wherein the parameter is a two's-complement value of negative one.
22. The method of claim 15, wherein one of the parameters is associated with surrendering execution opportunity to other threads until specific conditions are satisfied.
23. The method of claim 22, wherein the parameter is encoded in one of a bit vector or one or more value fields within the record.
24. The method of claim 15, wherein, in the event the thread issues the instruction and is rescheduled, when the one or more conditions are satisfied, execution of the thread continues at a point in the thread's instruction stream following the instruction issued by the thread.
25. The method of claim 15, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled, and another of the parameters is associated with the thread being requeued to await scheduling.
26. The method of claim 15, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled, and another of the parameters is associated with surrendering execution opportunity to other threads until specific conditions are satisfied.
27. The method of claim 15, wherein one of the parameters is associated with the thread being requeued to await rescheduling, and another of the parameters is associated with surrendering execution opportunity to other threads until specific conditions are satisfied.
28. The method of claim 15, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled, another of the parameters is associated with the thread being requeued to await scheduling, and yet another of the parameters is associated with surrendering execution opportunity to other threads until specific conditions are satisfied.
29. A digital processor enabled to support and execute multiple software entities, comprising:
a portion of a record in a data storage device, the portion of the record encoding one or more parameters associated with one or more conditions, the one or more conditions determining whether a thread that surrenders execution opportunity to other threads is to be rescheduled.
30. The digital processor of claim 29, wherein the portion of the record is placed in a general-purpose register (GPR).
31. The digital processor of claim 29, wherein one of the parameters is associated with a thread being deallocated rather than rescheduled.
32. The digital processor of claim 31, wherein the parameter associated with the deallocated thread is a value of zero.
33. The digital processor of claim 29, wherein one of the parameters is associated with a thread being requeued to await scheduling.
34. The digital processor of claim 33, wherein the value of the parameter is any odd value.
35. The digital processor of claim 33, wherein the value of the parameter is a two's-complement value of negative one.
36. The digital processor of claim 29, wherein one of the parameters is associated with a thread surrendering execution opportunity to other threads until specific conditions are satisfied.
37. The digital processor of claim 36, wherein the parameter is encoded in one of a bit vector or one or more value fields within the record.
38. The digital processor of claim 29, wherein one of the parameters is associated with a thread being deallocated rather than rescheduled, and another of the parameters is associated with a thread being requeued to await scheduling.
39. The digital processor of claim 29, wherein one of the parameters is associated with a thread being deallocated rather than rescheduled, and another of the parameters is associated with surrendering execution opportunity to other threads until specific conditions are satisfied.
40. The digital processor of claim 29, wherein one of the parameters is associated with a thread being requeued to await rescheduling, and another of the parameters is associated with surrendering execution opportunity to other threads until specific conditions are satisfied.
41. The digital processor of claim 29, wherein one of the parameters is associated with a thread being deallocated rather than rescheduled, another of the parameters is associated with a thread being requeued to await scheduling, and yet another of the parameters is associated with surrendering execution opportunity to other threads until specific conditions are satisfied.
42. A processing system enabled to support and execute multiple program threads, comprising:
a digital processor;
a portion of a record in a data storage device, the portion of the record encoding one or more parameters associated with one or more conditions, the one or more conditions determining whether a thread is rescheduled; and
an instruction set comprising an instruction for rescheduling and deallocating the thread;
wherein, when the thread issues the instruction, the instruction accesses the one or more parameters in the record, and the system reschedules or deallocates the issuing thread according to the one or more conditions as determined by the one or more parameters in the portion of the record.
43. The processing system as claimed in claim 42, wherein the portion of the record resides in a general-purpose register (GPR).
44. The processing system as claimed in claim 41, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled.
45. The processing system as claimed in claim 44, wherein the parameter associated with the deallocated thread is a zero value.
46. The processing system as claimed in claim 44, wherein one of the parameters is associated with the thread being requeued to await scheduling.
47. The processing system as claimed in claim 46, wherein the value of the parameter is any odd value.
48. The processing system as claimed in claim 46, wherein the value of the parameter is the two's complement value of negative one.
49. The processing system as claimed in claim 41, wherein one of the parameters is associated with the thread yielding its execution opportunity to other threads until specified conditions are satisfied.
50. The processing system as claimed in claim 49, wherein the parameter is encoded in the record as a bit vector or as one or more value fields.
51. The processing system as claimed in claim 44, wherein, if a thread issues the instruction and is conditionally rescheduled, execution of the thread resumes, once the one or more conditions are satisfied, at the position in the thread's instruction stream following the instruction.
52. The processing system as claimed in claim 42, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled, and another of the parameters is associated with the thread being requeued to await scheduling.
53. The processing system as claimed in claim 42, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled, and another of the parameters is associated with yielding the execution opportunity to other threads until specified conditions are satisfied.
54. The processing system as claimed in claim 42, wherein one of the parameters is associated with the thread being requeued to await rescheduling, and another of the parameters is associated with yielding the execution opportunity to other threads until specified conditions are satisfied.
55. The processing system as claimed in claim 42, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled, another of the parameters is associated with the thread being requeued to await scheduling, and yet another of the parameters is associated with yielding the execution opportunity to other threads until specified conditions are satisfied.
56. A digital storage medium having written thereon instructions from an instruction set for executing each of a plurality of software threads on a digital processor, the instruction set including an instruction for causing the issuing thread to relinquish execution and to access a parameter in a portion of a record in a data storage device, wherein a condition for deallocation or rescheduling is associated with the parameter, and the thread is deallocated or rescheduled according to that condition as determined by the parameter in the portion of the record.
57. The digital storage medium as claimed in claim 56, wherein the record resides in a general-purpose register (GPR).
58. The digital storage medium as claimed in claim 57, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled.
59. The digital storage medium as claimed in claim 58, wherein the parameter associated with the deallocated thread is a zero value.
60. The digital storage medium as claimed in claim 56, wherein one of the parameters is associated with the thread being requeued to await scheduling.
61. The digital storage medium as claimed in claim 60, wherein the value of the parameter is any odd value.
62. The digital storage medium as claimed in claim 60, wherein the value of the parameter is the two's complement value of negative one.
63. The digital storage medium as claimed in claim 16, wherein one of the parameters is associated with the thread yielding its execution opportunity to other threads until specified conditions are satisfied.
64. The digital storage medium as claimed in claim 63, wherein the parameter is encoded in the record as a bit vector or as one or more bit fields.
65. The digital storage medium as claimed in claim 56, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled, and another of the parameters is associated with the thread being requeued to await scheduling.
66. The digital storage medium as claimed in claim 56, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled, and another of the parameters is associated with yielding the execution opportunity to other threads until specified conditions are satisfied.
67. The digital storage medium as claimed in claim 56, wherein one of the parameters is associated with the thread being requeued to await rescheduling, and another of the parameters is associated with yielding the execution opportunity to other threads until specified conditions are satisfied.
68. The digital storage medium as claimed in claim 56, wherein one of the parameters is associated with the thread being deallocated rather than rescheduled, another of the parameters is associated with the thread being requeued to await scheduling, and yet another of the parameters is associated with yielding the execution opportunity to other threads until specified conditions are satisfied.
69. The mechanism as claimed in claim 1, wherein the instruction is a YIELD instruction.
70. The mechanism as claimed in claim 1, wherein the portion of the record comprises a bit vector.
71. The mechanism as claimed in claim 1, wherein the portion of the record comprises one or more multi-bit fields.
72. The method as claimed in claim 15, wherein the instruction is a YIELD instruction.
73. The processing system as claimed in claim 42, wherein the instruction is a YIELD instruction.
74. The digital storage medium as claimed in claim 56, wherein the instruction is a YIELD instruction.
75. A computer data signal embodied in a transmission medium, comprising:
computer-readable program code for describing a processor enabled to support and execute multiple program threads, the processor including a mechanism for deallocating and rescheduling a thread, the program code comprising:
a first program code segment for describing a portion of a record in a data storage device, the portion of the record encoding one or more parameters associated with one or more conditions that determine whether a thread is rescheduled; and
a second program code segment for describing an instruction enabled to access the one or more parameters in the record, wherein, when the thread issues the instruction, the instruction accesses one or more values in the record and the thread is rescheduled or deallocated according to the one or more conditions as determined by those values.
76. In a processor enabled to support multiple program threads, a method comprising:
executing an instruction enabled to access a parameter associated with scheduling of a program thread, wherein the instruction is disposed within the program thread; and
when the parameter equals a first value, deallocating the program thread in response to the instruction.
77. The method as claimed in claim 76, wherein the first value is zero.
78. The method as claimed in claim 76, further comprising:
when the parameter equals a second value, suspending execution of the program thread in response to the instruction, wherein the second value is not equal to the first value.
79. The method as claimed in claim 78, wherein the second value indicates that a condition required for execution of the program thread is not satisfied.
80. The method as claimed in claim 79, wherein the condition is encoded in the parameter as a bit vector or a value field.
81. The method as claimed in claim 78, further comprising:
when the parameter equals a third value, rescheduling the program thread in response to the instruction, wherein the third value is equal to neither the first value nor the second value.
82. The method as claimed in claim 81, wherein the third value is negative one.
83. The method as claimed in claim 81, wherein the third value is an odd value.
84. In a processor enabled to support multiple program threads, a method comprising:
executing an instruction that accesses a parameter associated with scheduling of a program thread, wherein the instruction is disposed within the program thread; and
when the parameter equals a first value, suspending execution of the program thread in response to the instruction.
85. The method as claimed in claim 84, further comprising:
when the parameter equals a second value, rescheduling the program thread in response to the instruction, wherein the second value is not equal to the first value.
86. In a processor enabled to support multiple program threads, a method comprising:
executing an instruction that accesses a parameter associated with scheduling of a program thread, wherein the instruction is disposed within the program thread; and
when the parameter equals a first value, rescheduling the program thread in response to the instruction.
87. The method as claimed in claim 86, further comprising:
when the parameter equals a second value, deallocating the program thread in response to the instruction, wherein the second value is not equal to the first value.
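Claims 76 through 87 restate the same encodings from the point of view of the issuing thread. Purely as an illustration, the sketch below shows how a thread might use the three classes of parameter values, assuming a hypothetical intrinsic yield_param() that issues the claimed instruction and an invented condition bit COND_INPUT_READY; neither name comes from the patent, and real condition encodings are implementation-specific.

```c
#include <stdint.h>

/* Hypothetical intrinsic: issues the claimed yield-type instruction with
 * the given parameter value (not a real compiler builtin). */
extern void yield_param(int32_t param);

/* Invented condition bit, e.g. "input FIFO non-empty"; per claim 80 the
 * condition is encoded in the parameter as a bit vector or value field. */
#define COND_INPUT_READY  (1 << 2)

void worker_thread(void)
{
    for (;;) {
        /* Claims 78-80 and 84: an even, non-zero parameter suspends the
         * thread until the encoded condition holds; per claim 24 execution
         * resumes at the instruction following the yield. */
        yield_param(COND_INPUT_READY);

        /* ... consume the input that is now available ... */

        /* Claims 81-83 and 86: negative one (or any odd value) merely
         * requeues the thread, giving other threads a turn. */
        yield_param(-1);
    }
}

void finished_thread(void)
{
    /* Claims 76-77: a zero parameter deallocates the issuing thread;
     * control does not return here. */
    yield_param(0);
}
```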
CN 200480024800 2003-08-28 2004-08-26 Integrated mechanism for suspension and deallocation of computational threads of execution in a processor Pending CN1842770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210164802.7A CN102880447B (en) 2003-08-28 2004-08-26 A kind of integrated mechanism hung up within a processor and discharge computational threads in implementation procedure

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US49918003P 2003-08-28 2003-08-28
US60/499,180 2003-08-28
US60/502,358 2003-09-12
US60/502,359 2003-09-12
US10/684,348 2003-10-10

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201210164802.7A Division CN102880447B (en) 2003-08-28 2004-08-26 A kind of integrated mechanism hung up within a processor and discharge computational threads in implementation procedure

Publications (1)

Publication Number Publication Date
CN1842770A true CN1842770A (en) 2006-10-04

Family

ID=37031160

Family Applications (4)

Application Number Title Priority Date Filing Date
CN 200480024800 Pending CN1842770A (en) 2003-08-28 2004-08-26 Integrated mechanism for suspension and deallocation of computational threads of execution in a processor
CNB2004800247988A Expired - Fee Related CN100489784C (en) 2003-08-28 2004-08-27 Multithreading microprocessor and its novel threading establishment method and multithreading processing system
CN2004800248529A Expired - Fee Related CN1846194B (en) 2003-08-28 2004-08-27 Method and device for executing Parallel programs thread
CNB2004800248016A Expired - Fee Related CN100538640C (en) 2003-08-28 2004-08-27 The device of dynamic-configuration virtual processor resources

Family Applications After (3)

Application Number Title Priority Date Filing Date
CNB2004800247988A Expired - Fee Related CN100489784C (en) 2003-08-28 2004-08-27 Multithreading microprocessor and its novel threading establishment method and multithreading processing system
CN2004800248529A Expired - Fee Related CN1846194B (en) 2003-08-28 2004-08-27 Method and device for executing Parallel programs thread
CNB2004800248016A Expired - Fee Related CN100538640C (en) 2003-08-28 2004-08-27 The device of dynamic-configuration virtual processor resources

Country Status (1)

Country Link
CN (4) CN1842770A (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9417914B2 (en) 2008-06-02 2016-08-16 Microsoft Technology Licensing, Llc Regaining control of a processing resource that executes an external execution context
GB2474521B (en) * 2009-10-19 2014-10-15 Ublox Ag Program flow control
CN102183922A (en) * 2011-03-21 2011-09-14 浙江机电职业技术学院 Method for realization of real-time pause of affiliated computer services (ACS) motion controller
WO2011127862A2 (en) * 2011-05-20 2011-10-20 华为技术有限公司 Method and device for multithread to access multiple copies
US9507638B2 (en) * 2011-11-08 2016-11-29 Nvidia Corporation Compute work distribution reference counters
CN102750132B (en) * 2012-06-13 2015-02-11 深圳中微电科技有限公司 Thread control and call method for multithreading virtual assembly line processor, and processor
CN103973600B (en) * 2013-02-01 2018-10-09 德克萨斯仪器股份有限公司 Merge and deposit the method and device of field instruction for packet transaction rotation mask
JP6122749B2 (en) * 2013-09-30 2017-04-26 ルネサスエレクトロニクス株式会社 Computer system
CN108228321B (en) * 2014-12-16 2021-08-10 北京奇虎科技有限公司 Android system application closing method and device
US9747108B2 (en) * 2015-03-27 2017-08-29 Intel Corporation User-level fork and join processors, methods, systems, and instructions
US10346168B2 (en) 2015-06-26 2019-07-09 Microsoft Technology Licensing, Llc Decoupled processor instruction window and operand buffer
US9720693B2 (en) * 2015-06-26 2017-08-01 Microsoft Technology Licensing, Llc Bulk allocation of instruction blocks to a processor instruction window
US10169105B2 (en) * 2015-07-30 2019-01-01 Qualcomm Incorporated Method for simplified task-based runtime for efficient parallel computing
US9921838B2 (en) * 2015-10-02 2018-03-20 Mediatek Inc. System and method for managing static divergence in a SIMD computing architecture
GB2544994A (en) * 2015-12-02 2017-06-07 Swarm64 As Data processing
CN105700913B (en) * 2015-12-30 2018-10-12 广东工业大学 A kind of parallel operation method of lightweight bare die code
US10761849B2 (en) * 2016-09-22 2020-09-01 Intel Corporation Processors, methods, systems, and instruction conversion modules for instructions with compact instruction encodings due to use of context of a prior instruction
GB2569098B (en) * 2017-10-20 2020-01-08 Graphcore Ltd Combining states of multiple threads in a multi-threaded processor
GB201717303D0 (en) 2017-10-20 2017-12-06 Graphcore Ltd Scheduling tasks in a multi-threaded processor
GB2569275B (en) * 2017-10-20 2020-06-03 Graphcore Ltd Time deterministic exchange
CN109697084B (en) * 2017-10-22 2021-04-09 刘欣 Fast access memory architecture for time division multiplexed pipelined processor
CN108536613B (en) * 2018-03-08 2022-09-16 创新先进技术有限公司 Data cleaning method and device and server
CN110768807B (en) * 2018-07-25 2023-04-18 中兴通讯股份有限公司 Virtual resource method and device, virtual resource processing network element and storage medium
CN110955503B (en) * 2018-09-27 2023-06-27 深圳市创客工场科技有限公司 Task scheduling method and device
GB2580327B (en) * 2018-12-31 2021-04-28 Graphcore Ltd Register files in a multi-threaded processor
CN111414196B (en) * 2020-04-03 2022-07-19 中国人民解放军国防科技大学 Zero value register realization method and device
CN112395095A (en) * 2020-11-09 2021-02-23 王志平 Process synchronization method based on CPOC
CN112579278B (en) * 2020-12-24 2023-01-20 海光信息技术股份有限公司 Central processing unit, method, device and storage medium for simultaneous multithreading
TWI775259B (en) * 2020-12-29 2022-08-21 新唐科技股份有限公司 Direct memory access apparatus and electronic device using the same
CN116701085B (en) * 2023-06-02 2024-03-19 中国科学院软件研究所 Form verification method and device for consistency of instruction set design of RISC-V processor Chisel
CN116954950B (en) * 2023-09-04 2024-03-12 北京凯芯微科技有限公司 Inter-core communication method and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5812811A (en) * 1995-02-03 1998-09-22 International Business Machines Corporation Executing speculative parallel instructions threads with forking and inter-thread communication

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102317912A (en) * 2009-02-17 2012-01-11 松下电器产业株式会社 Multi-thread processor and digital TV system
CN102567090A (en) * 2010-12-02 2012-07-11 国际商业机器公司 Method and system for creating a thread of execution in a computer processor
US9009716B2 (en) 2010-12-02 2015-04-14 International Business Machines Corporation Creating a thread of execution in a computer processor
CN102567090B (en) * 2010-12-02 2016-03-16 国际商业机器公司 The method and system of execution thread is created in computer processor
CN102831053A (en) * 2011-06-17 2012-12-19 阿里巴巴集团控股有限公司 Scheduling method and device for test execution
CN102831053B (en) * 2011-06-17 2015-05-13 阿里巴巴集团控股有限公司 Scheduling method and device for test execution
CN104750607A (en) * 2011-06-17 2015-07-01 阿里巴巴集团控股有限公司 Method and device for selectively recovering test execution
CN104750607B (en) * 2011-06-17 2018-07-06 阿里巴巴集团控股有限公司 A kind of method and device of selective recovery test execution

Also Published As

Publication number Publication date
CN100538640C (en) 2009-09-09
CN1842769A (en) 2006-10-04
CN1846194A (en) 2006-10-11
CN1846194B (en) 2010-12-15
CN1842771A (en) 2006-10-04
CN100489784C (en) 2009-05-20

Similar Documents

Publication Publication Date Title
CN1842770A (en) Integrated mechanism for suspension and deallocation of computational threads of execution in a processor
CN1280714C (en) Independent processing multiple instruction flow, soft controlled every instruction flow processing function multiporogram flav simultaneous processor
CN1185592C (en) Parallel processor architecture
CN1287282C (en) Method and system for scheduling real-time periodic tasks
CN102880447B (en) A kind of integrated mechanism hung up within a processor and discharge computational threads in implementation procedure
CN1296818C (en) Instruction used in multithreaded parallel processor
CN1287283C (en) Method and system for performing real-time operation
CN100342349C (en) Out-of-pipeline trace buffer for instruction replay following misspeculation
CN100351788C (en) Drive method for embedded apparatus
CN101051301A (en) Method and apparatus for operating a computer processor array
CN1285064A (en) System for ordering load and store instructions that performs out-of-order multithread execution
CN1113289C (en) Processor capable of high effective actuating asynchronous event mission in multiple asynchronous missions
CN1273890C (en) Micro-kernel design method for ARM processor framework
CN1175341C (en) Interface system and method for asynchronous refresh sharing resource
CN1284095C (en) Task allocation method in multiprocessor system, and multiprocessor system
CN1601474A (en) Method and system for real-time scheduling
CN1609812A (en) System and method for enhancing performance of coprocessor
CN1577311A (en) Method and system for performing real-time operation using processors
CN1434938A (en) Restarting translated instructions
CN1264078A (en) Computer for executing multiple operation systems
CN1115190A (en) Object-oriented operating system
CN1233016A (en) Context controller having event-dependent vector selection and processor employing the same
CN1918546A (en) Program conversion device and program conversion method
CN1484787A (en) Hardware instruction translation within a processor pipeline
CN1095133C (en) Method for sharing result data in multiple-processor computer system, and system thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20061004