WO1992003784A1 - Scheduling method for a multiprocessing operating system - Google Patents


Info

Publication number
WO1992003784A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
priority
dispatcher
scheduling
run queue
Prior art date
Application number
PCT/US1991/004068
Other languages
French (fr)
Inventor
Gregory G. Gaertner
George A. Spix
Diane M. Wengelski
Keith J. Thompson
Original Assignee
Supercomputer Systems Limited Partnership
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Supercomputer Systems Limited Partnership filed Critical Supercomputer Systems Limited Partnership
Publication of WO1992003784A1 publication Critical patent/WO1992003784A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Abstract

An integrated dispatcher (1112) schedules non-homogeneous processes in a tightly-coupled multiprocessor system. Self-scheduling processes (20, 30, 40, 60) may be scheduled on any available processor; the only criterion for scheduling is priority (0-99). The integrated dispatcher (1112) has a mechanism (10) to efficiently multithread scheduling.

Description

SCHEDULING METHOD FOR A MULTIPROCESSING OPERATING SYSTEM
RELATED APPLICATIONS
This application is a continuation-in-part of an application filed in the United States Patent and Trademark Office on June 11, 1990, entitled INTEGRATED SOFTWARE ARCHITECTURE FOR A HIGHLY PARALLEL MULTIPROCESSOR SYSTEM, Serial No. 07/537,466, and assigned to the assignee of the present invention, the disclosure of which is hereby incorporated by reference in the present application. The application is also related to copending applications entitled DISTRIBUTED ARCHITECTURE FOR INPUT/OUTPUT FOR A MULTIPROCESSOR SYSTEM, Serial No. 07/536,182, METHOD AND APPARATUS FOR A LOAD AND FLAG INSTRUCTION, Serial No. 07/536,217, and SIGNALING MECHANISM FOR A MULTIPROCESSOR SYSTEM, Serial No. 07/536,192. The application is also related to copending applications filed concurrently herewith, entitled DUAL LEVEL SCHEDULING OF PROCESSES TO MULTIPLE PARALLEL REGIONS OF A MULTITHREADED PROGRAM ON A TIGHTLY COUPLED MULTIPROCESSOR COMPUTER SYSTEM, METHOD OF EFFICIENT COMMUNICATION BETWEEN COPROCESSORS OF UNEQUAL SPEEDS, and METHOD OF IMPLEMENTING KERNEL FUNCTIONS USING MINIMAL CONTEXT PROCESSES, all of which are assigned to the assignee of the present invention, the disclosures of which are hereby incorporated by reference in the present application.
TECHNICAL FIELD
The present invention relates generally to multiprocessor computer systems and specifically to allocating processors in a tightly-coupled configuration to execute the threads of one or more multithreaded programs that are running on the system simultaneously.
BACKGROUND ART
Processes are entities that are scheduled by the operating system to run on processors. In a multithreaded program, different threads may execute simultaneously on different processors. If the processes executing the different threads of a program are scheduled to execute simultaneously on different processors, then multiprocessing of the multithreaded program is achieved. In addition, if multiple system processes are scheduled to run simultaneously in multiple processors, the operating system has achieved multiprocessing.
Generally, in all process scheduling at least four types of contenders compete for processor access:
1) Processes waking up after waiting for an event.
2) Work needing to be done after an interrupt.
3) Multiple threads in the operating system.
4) Multiple threads in user processes.
One problem with existing implementations of multithreaded systems is that a bottleneck occurs when multiple threads must wait at a final, central point to be dispatched to a processor. Such a scheme may use a lock manager to schedule processes, with the result that every process must wait in line for a processor. Inefficient scheduling may occur if a lower priority process is in the queue ahead of a higher priority process. Thus, the effect of the lock manager is to reduce a multithreaded process to a single thread of execution at the point where processes are dispatched.
Another problem related to existing implementations is that efficiency is reduced because of overhead associated with processes. In a classical Unix implementation, only one kind of process entity can be created or processed. A process consists of system-side context and user-side context. Because a classical Unix implementation has only one creating entity, the fork, the system processes contain more context information than is actually required. The user context (i.e., user block and various process table fields) is ignored by the system and not used. However, this context has overhead associated with memory and switching which is not ignored and thus consumes unnecessary resources.
Another problem with existing implementations is that when an interrupt occurs, the processor which receives the interrupt stops processing to handle the interrupt, regardless of what the processor was doing. This can result in delaying a high priority task by making the processor service a lower priority interrupt.
Another problem can occur when an implementation has multiple schedulers in a tightly coupled multiprocessing environment. Each of the schedulers controls a type of process and as such all schedulers are in contention for access to processors. Decentralizing the run queue function has overhead penalties for the complexity in managing locally scheduled processes.
SUMMARY OF THE INVENTION
This invention relates to the scheduling of multiple computing resources between multiple categories of contenders and the efficient use of these computing resources. In particular, it relates to the scheduling of multiple, closely-coupled computing processors. (Unix is a trademark of AT&T Bell Laboratories.)
The scheduling method, hereinafter referred to as an integrated dispatcher, improves the efficiency of scheduling multiple processors by providing a single focal point for scheduling. That is, there are not independent schedulers for each category of processes to be scheduled. This ensures that the highest priority process will be scheduled regardless of its type. For example, if the schedulers were independent, a processor running the interrupt process scheduler would choose the highest priority process from the interrupt process run queue, even though there may be higher priority processes waiting on a different scheduler's run queue. The integrated dispatcher provides a mechanism which allows processors to be self-scheduling. The integrated dispatching process runs in a processor to choose which process will run next in that processor. That is, there is not a supreme, supervisory dispatcher which allocates processes to specific processors.
The entities scheduled by the integrated dispatcher need not be homogeneous. That is, the integrated dispatcher chooses what entity will run next in this processor based on the common criterion of priority, regardless of type. In the preferred embodiment, the types of entities which can be scheduled are the iprocs, mprocs, and procs as described hereinafter.
The integrated dispatcher provides the method to efficiently multithread scheduling. That is, mechanisms have been added which will allow all processors to be simultaneously running the integrated dispatcher with limited chance of conflict. If each processor has selected a process of a different priority, each passes through the scheduler unaware that other processors are also dispatching. If two processes have the same priority, they are processed in a pipeline fashion as described hereinafter.
No new hardware is needed to support this invention. The mechanism referred to above is preferably the software synchronization mechanisms employed to allow simultaneous execution of specific pieces of code without endangering the integrity of the system.
The present invention is best suited for scheduling tightly coupled processors and works best if the hardware provides convenient locking mechanisms on memory access as described in the copending patent application, DISTRIBUTED ARCHITECTURE FOR INPUT/OUTPUT FOR A MULTIPROCESSOR SYSTEM, Serial No. 07/536,182. However, the application of this scheduling method is not restricted to this situation, and the method can be used, in whole or in part, in any facility requiring efficient processor scheduling.
These and other objects of the present invention will become apparent with reference to the drawings, the detailed descriptions of the preferred embodiment and the appended claims.
Those having normal skill in the art will recognize the foregoing and other objects, features, advantages and applications of the present invention from the following more detailed description of the preferred embodiments as illustrated in the accompanying drawings.
DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic diagram showing the relationship of the integrated dispatcher to other parts of the operating system kernel.
Figure 2 is a diagram of the integrated dispatcher, run queue, wake queue, and process entity scheme.
Figure 3 is a diagram of the run queue with pipelined priority.
Figure 4 is a comparative diagram between a full process and several microprocesses.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The process control subsystem 1110 of the preferred embodiment is shown within the context of the Unix kernel in Figure 1. The integrated dispatcher 1112 integrates the scheduling of microprocesses, interrupts, processes, and standard Unix processes.
In the preferred embodiment as illustrated in Figure 2, the integrated dispatcher 1112 runs on a system of tightly coupled multiprocessors with hardware support for locking memory on a per word basis. This hardware mechanism is described in copending application METHOD AND APPARATUS FOR A LOAD AND FLAG INSTRUCTION. The integrated dispatcher is multithreaded. This means that all processors can be executing the integrated dispatcher simultaneously.
The integrated dispatcher is multithreaded to allow processors to simultaneously execute the dispatching code with minimal blocking. That is, all other processors should not have to block while one processor completes dispatching; such blocking is a bottleneck and extremely inefficient. Rather, the present invention uses software synchronization methods to protect common data structures while allowing all processors to continue. Until processors contend for a specific common data structure, they will continue unimpeded. If a processor does attempt to access a protected structure, it will block until the processor that locked it is finished with it. The data structures are set up to minimize these instances of blockage. For instance, if every processor simultaneously attempts to dispatch at a different priority, each will access a unique protectable structure, allowing each to continue without blocking.
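As a rough illustration of such per-structure locking, the following C sketch uses a C11 atomic test-and-set as a software stand-in for the load-and-flag hardware described in the copending application; the names word_lock and word_unlock are illustrative, not taken from the patent.

    #include <stdatomic.h>

    /* One lock word per protected structure (e.g., per run queue entry).
     * Initialize the flag with ATOMIC_FLAG_INIT. */
    typedef struct {
        atomic_flag flag;   /* set while some processor holds the lock */
    } word_lock_t;

    static inline void word_lock(word_lock_t *l)
    {
        /* test-and-set stands in for a hardware load-and-flag:
         * read the word and mark it in one indivisible step. */
        while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
            ;   /* spin: block until the locking processor is finished */
    }

    static inline void word_unlock(word_lock_t *l)
    {
        atomic_flag_clear_explicit(&l->flag, memory_order_release);
    }

A processor dispatching at a priority nobody else is touching takes its lock word without ever spinning, which is exactly the "continue unimpeded" case described above.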
If a processor has nothing to do, the processor runs the integrated dispatcher, which selects the highest priority process from the run queue 10 and starts the process running in that processor. Contention for run queue entries is reduced because the dispatcher locks only the entries it currently is looking at. Entries on the run queue are also locked when new processes are put on the run queue 10 by mproc create 20, by consume entry 30, by Zero-level interrupt 50, by inewproc 40, and by the usual Unix scheduling means (waking, switching, and creating routines) 60.
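A minimal C sketch of that self-scheduling loop follows; every name is hypothetical, since the patent gives no source. Each idle processor calls this routine itself; nothing assigns work to it from above.

    struct proc;                                   /* any schedulable entity      */
    extern struct proc *runq_pop_highest(void);    /* NULL if the queue is empty  */
    extern void run(struct proc *p);               /* restore context; returns    */
                                                   /* when the process yields     */

    /* Hypothetical entry point run by a processor with nothing to do. */
    void integrated_dispatch(void)
    {
        for (;;) {
            struct proc *next = runq_pop_highest();
            if (next)
                run(next);  /* start the highest priority process here */
            /* else: nothing runnable; poll the run queue again */
        }
    }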
An mproc is a kernel representation of a microprocess. Microprocesses are minimal context user processes designed to run very specific tasks very efficiently. That is, they are expected to be short lived. Microprocesses share a process image and user area making them very lean and efficient to create, but doing so also prevents them from switching because they do not have the unique context save areas that full processes do.
Each full process has a unique user area and shared image process entry. The shared image process table entry contains information relevant to maintaining the segments which comprise the image. When a full process switches out (yields the processor to another process), it must save the state of the processor so that the same state can be restored before this process is allowed to execute again. Microprocesses do not have unique context save areas (full process fields and user area structure) in which to store the processor state, so they are expected to run to completion rather than yield the processor on the assumption they will resume at a later time. Figure 4 illustrates an organizational comparison between a full process and microprocesses.
Refer to the copending application DUAL LEVEL SCHEDULING OF PROCESSES TO MULTIPLE PARALLEL REGIONS OF A MULTITHREADED PROGRAM ON A TIGHTLY COUPLED MULTIPROCESSOR COMPUTER SYSTEM for a description of the mproc method. The application METHOD OF EFFICIENT COMMUNICATION BETWEEN COPROCESSORS OF UNEQUAL SPEEDS describes the consume entry method.
An iproc is a minimal context process designed to efficiently execute kernel functions. The kernel procedure that creates these minimal context processes is called inewproc. It is a very simplified version of newproc, which creates full processes.
A full process has both system context and user context. Kernel functions do not need user context, so allocating a full-context process to execute these functions is inefficient. Instead, a subset of a full process is allocated which allows the kernel function to execute. It does not have a user area or shared image process entry since it will not be executing user code. These functions are then very efficiently switched in and out since they depend on very little context. What little context they do depend on is saved in the iproc table entry. The application METHOD OF IMPLEMENTING KERNEL FUNCTIONS USING MINIMAL CONTEXT PROCESSES describes the inewproc method.
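The contrast might be pictured with representative C structures; none of these field names come from the patent, and real tables would hold far more state.

    struct user;      /* user block, as in classical Unix  */
    struct shimage;   /* shared image process table entry  */
    struct cpu_state { unsigned long regs[32]; };   /* representative only */

    /* Full process: system context plus user context. */
    struct full_proc {
        struct cpu_state  sys_ctx;   /* kernel-side context save area */
        struct user      *u_area;    /* user block (swappable)        */
        struct shimage   *image;     /* shared image process entry    */
        /* ... many more scheduling and signal fields ...             */
    };

    /* iproc: the kernel-only subset. No user area, no shared image;
     * what little context a kernel function depends on is saved in
     * the iproc table entry itself. */
    struct iproc {
        struct cpu_state  sys_ctx;
        void (*func)(void *);        /* kernel function to run        */
        void *arg;
    };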
The run queue 10 resides in main memory and is equally accessible by all processors. Each processor runs the integrated dispatcher when the current process exits, when the current process waits for an event, or when the current process allows itself to be preempted. The integrated dispatcher can then cause a new process to be scheduled into the processor.
Figure 3 shows the run queue data structure 10. The integrated dispatcher schedules all entities through the run queue. Entities with the highest priority on the run queue 11 are dispatched first. A semaphore counter on each entry 12 prevents multiple processes from accessing the same run queue entry simultaneously. If more than one entity has the same priority 13, each remaining entity stays in the run queue until its turn to be processed. The result is a first-in-first-out queue for each priority. The semaphore is released after the processor pulls off the process, allowing another processor to pull an entry off that queue. This pipeline for each priority keeps processes flowing quickly to processors.
Counting semaphores are software mechanisms for synchronization. The semaphore consists of a count of the available resources to be managed and a list associated with entities waiting for that resource. To implement a lock, this count is set to one so that only one resource, the lock, exists. If the semaphore is going to govern multiple resources, it is set to the number of resources available. As a resource is taken, this count is decremented. When the semaphore count goes to zero, no more resources are available, so the requester is put to sleep to wait for one to become available. As a process frees a resource, it increments the semaphore counter and wakes up a waiting process.
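That behavior maps directly onto a classic counting semaphore. Here is a user-space C sketch with pthreads standing in for the kernel's sleep and wakeup primitives; a kernel implementation would use its own.

    #include <pthread.h>

    /* A counting semaphore: a count of available resources plus a
     * list of waiters. A count initialized to 1 implements a lock. */
    typedef struct {
        pthread_mutex_t mtx;
        pthread_cond_t  waiters;   /* entities sleeping for a resource */
        int             count;     /* resources currently available    */
    } csem_t;

    void csem_init(csem_t *s, int resources)
    {
        pthread_mutex_init(&s->mtx, NULL);
        pthread_cond_init(&s->waiters, NULL);
        s->count = resources;      /* 1 => the semaphore is a lock */
    }

    /* Take a resource; sleep while none are available. */
    void csem_wait(csem_t *s)
    {
        pthread_mutex_lock(&s->mtx);
        while (s->count == 0)
            pthread_cond_wait(&s->waiters, &s->mtx);
        s->count--;
        pthread_mutex_unlock(&s->mtx);
    }

    /* Free a resource; increment the count and wake one waiter. */
    void csem_post(csem_t *s)
    {
        pthread_mutex_lock(&s->mtx);
        s->count++;
        pthread_cond_signal(&s->waiters);
        pthread_mutex_unlock(&s->mtx);
    }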
In the case of the run queue, a process acquires the 'lock' on the entry associated with a specific priority. It is then free to access the processes queued at this priority. All other processes will block at the lock and are prevented from accessing the queued processes until the first process is finished with them and frees the lock. Once the lock is acquired, the dispatching process can pop a process off of the run queue for execution. If multiple processes are queued at this priority, the rest remain on the queue for an ensuing dispatching process to dequeue once it obtains the lock. Note that as long as dispatching processes are accessing the run queue at different priorities, they will not block each other. They can run concurrently and share the dispatcher.
There is only one 'priority vector': it is the run queue itself. That is, the priority is actually the index into the run queue array. Each entry on the run queue array consists of a semaphore and the head of the list of processes awaiting execution at this priority. The list of processes is actually a linked list of the actual [im]proc table entries. That is, the run queue threads through fields in the [im]proc table entries.
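Taken literally, that layout suggests a C structure along these lines (all names are invented for illustration; the lock is the csem_t sketched above, initialized to a count of 1):

    #define NPRI 100                 /* priorities 0-99, per the abstract */

    struct proc {
        int          pri;            /* index into run_queue[]            */
        struct proc *rq_next;        /* the run queue threads through the */
    };                               /* [im]proc table entries themselves */

    struct runq_entry {
        csem_t       lock;           /* counting semaphore, count = 1     */
        struct proc *head, *tail;    /* FIFO of entities at this priority */
    };

    struct runq_entry run_queue[NPRI];   /* the single 'priority vector'  */

Enqueuing and dequeuing at a given priority then bracket the FIFO manipulation with csem_wait(&run_queue[pri].lock) and csem_post(&run_queue[pri].lock), so dispatchers working at different priorities never touch the same lock.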
In the preferred embodiment, the integrated dispatcher schedules all activities which require a processor. These activities include iprocs, mprocs, processes, and events from external coprocessors via the wake queue 70. In the preferred embodiment, the integrated dispatcher provides a single focal point for scheduling multiple process types to multiple processors. The integrated dispatcher provides a means for all processors to have access to all processes. As each processor accesses the run queue 10, the dispatcher schedules the highest priority item on the queue independent of its category.
The integrated dispatcher uses a common run queue 10 designed as a vector of first-in-first-out-queues. This scheme facilitates implementing the integrated dispatcher as a pipeline to eliminate the bottleneck for dispatching processes and is thus designed for efficient multiprocessor scheduling. The integrated dispatcher is completely symmetric with no preferential processors or processes and is therefore completely multithreaded.
Completely symmetric refers to the equality of all processors and schedulable entities. That is, no processors are given preference, nor is any process type ([im]proc) given preference. Each processor has an equal chance of obtaining the entry lock and therefore being allowed to pop a process to schedule. Processes are queued onto a run queue entry based solely upon their priority, without biasing on account of their type ([im]proc), and are ordered on the queue in first-in, first-out order.
This makes the dispatcher completely multithreaded because each processor is 'self-scheduling'. That is, there is not a processor dedicated to supervising the dispatching of processes to processors. Each processor flows through the dispatching code when it needs to dispatch a process to run on this processor.
This organization also maintains symmetry by having a single focal point for dispatching rather than having multiple dispatchers per schedulable entity type ([im]proc). If there were multiple dispatchers per type, this symmetry could not be maintained since one dispatcher would have to be checked before another thereby giving that process type preference. That is, given an iproc dispatcher, an mproc dispatcher and a proc dispatcher and given entries of equal priority on each, the dispatcher that is checked first actually has an inflated priority value since its entity is always chosen to run prior to the others also at that same priority but on the queue of another dispatcher.
As the dispatching mechanism for the run queue, the integrated dispatcher handles the various system process entities (iproc, mproc, proc, and so on) that are implemented in the preferred embodiment using standard multithreading methods. The run queue is protected by means well known in the art of creating multithreaded code and described in the application INTEGRATED SOFTWARE ARCHITECTURE FOR A HIGHLY PARALLEL MULTIPROCESSOR SYSTEM.
In the preferred embodiment, the conditional switch capability of the invention enables lower priority processes to handle interrupts rather than delaying high priority processes.
Switching means 'yielding the processor'. The current context of the processor must be saved somewhere so that it can be restored before the process is allowed to execute again. After this processor context is tucked away (usually in the process user area), the context of the process that was chosen to run next in this processor is restored. When the switched-out process is chosen to run again, its processor context is restored so that the processor looks just as it did before the process switched out. Note that a process does not have to resume on the processor from which it was switched out. Any processor running the dispatching code and seeing this process as the highest priority process will restore this process's context and allow it to run.
Switching causes dispatcher code execution to choose the next process which should run on this processor. Therefore, the normal flow is: save the outgoing process's context; run the dispatcher to choose the next process to run in this processor; restore the context of the chosen process; allow the chosen process to resume execution.
Conditional switch is a more efficient version of this scheme that takes into account that the process may be experiencing preemption rather than voluntarily giving up the processor because it knows it is going to be waiting for a certain event to happen (sleep). Conditional switch delays saving the current process's context until it is sure that the process is going to be switched out. That is, if the process which was just switched out is the highest priority process on the run queue, it would simply be chosen as the 'next process' to run on this processor, resulting in an unnecessary save and restore of context. Therefore, the flow of conditional switch is: determine the highest priority process on the run queue; compare its priority with the priority of the current process; if the current process is at an equal or higher priority, do not continue with the switch but allow the current process to continue to execute; if the current process is at a lower priority, continue with the context switch, save the current process's context, restore the higher priority process's context, and allow that process to resume execution.
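Expressed as C, the difference between the two flows might look like this; all helpers are hypothetical, and larger numbers are assumed to mean higher priority, which the patent does not specify.

    struct proc { int pri; /* ... saved context ... */ };
    extern struct proc *curproc;               /* process on this processor */
    extern int  runq_highest_pri(void);        /* peek without dequeuing    */
    extern void save_context(struct proc *p);  /* tuck state into user area */
    extern void dispatch(void);                /* choose, restore, resume   */

    /* Ordinary switch: always save, then dispatch. */
    void do_switch(void)
    {
        save_context(curproc);
        dispatch();
    }

    /* Conditional switch, used on preemption: skip the save/restore
     * entirely when the current process is still the best candidate. */
    void cond_switch(void)
    {
        if (runq_highest_pri() <= curproc->pri)
            return;         /* keep executing; no context was saved */
        do_switch();        /* a higher priority process is waiting  */
    }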
A 'daemon' is a process that does not terminate. Once created, it performs a specific task and then goes to sleep waiting for that task to be needed again. These tasks are frequently needed, making it much more efficient to leave the process around rather than creating it each time its function is needed. An example of a daemon in UNIX is the buffer cache daemon, which is created at system boot time and woken whenever the buffer cache needs to be flushed out to disk.
There may exist a minimal-context system daemon which any processor can wake to handle the interrupt, or any processor may spawn a new minimal-context system process to handle it. Either way, the interrupted processor need not be delayed by handling the interrupt itself. When the integrated dispatcher runs, it will schedule the highest priority process. Therefore, if the newly created or awakened interrupt process is the highest priority process, it is chosen to run. Note that the concept of a lightweight system process is essential to this scheme because interrupt handling cannot be delayed while a full context process is created to handle the interrupt. This allows a processor doing work at a high priority to cause a processor running at a lower priority to handle the interrupt. A lightweight system process typically is an iproc, in contrast to a microprocess, which is an mproc; both were discussed hereinabove.
If it is imperative that this interrupt be handled immediately, the current processor can cause another processor to be preempted by sending that processor a signal (see application SIGNALING MECHANISM FOR A MULTIPROCESSOR SYSTEM). The signalled processor preempts when it receives the signal. The process running on the signalled processor can then decide whether or not it will handle the interrupt; if the iproc assigned to handle the interrupt is at a higher priority, the current process will be switched out and the iproc run.
A system variable holds the number of the processor that is currently executing at the lowest priority. Given this identification, any processor can signal any other processor as provided by the hardware outlined in the aforementioned copending patent application. Assuming this machine hardware exists, a processor receiving an interrupt can spawn an iproc to handle that interrupt and then signal the lowest priority processor. The interrupted processor then resumes execution of its work without servicing the interrupt. Receipt of the signal by the lowest priority processor forces it into the interrupt code and a conditional switch. Note that this 'lowest priority' process may not actually be at a lower priority than the newly spawned iproc which is servicing the interrupt. That is, if all the processors were executing very important tasks at the time of the interrupt, the 'lowest priority' process currently executing may still be at a higher priority than the iproc. In this case, the conditional switch allows this processor to continue to execute the current process rather than servicing the interrupt. The interrupt is serviced when a processor drops to a priority below that of the iproc.
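Under those assumptions, the hand-off might be sketched as follows; every name here is illustrative, and the signalling primitive itself is the subject of the copending application.

    extern int  lowest_pri_cpu;                 /* system variable from the text */
    extern void spawn_iproc(void (*fn)(void *), void *arg);  /* via inewproc     */
    extern void signal_processor(int cpu);      /* hardware-assisted signal      */
    extern void service_device(void *dev);      /* the actual interrupt work     */

    /* Run briefly by whichever processor happens to receive the interrupt. */
    void on_interrupt(void *dev)
    {
        spawn_iproc(service_device, dev);  /* iproc joins the run queue        */
        signal_processor(lowest_pri_cpu);  /* nudge the cheapest victim; its   */
                                           /* conditional switch runs the      */
                                           /* iproc only if the iproc outranks */
                                           /* the victim's current process     */
        /* this processor now resumes its own work untouched */
    }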
The problem of overhead on system processes has been eliminated, or at least substantially minimized, by the method provided in the preferred embodiment to create these special process entities which are nearly context free.
The dispatcher has a mechanism for generating minimal context system processes as described in the copending application METHOD OF IMPLEMENTING KERNEL FUNCTIONS USING MINIMAL CONTEXT PROCESSES.
A 'user block' or 'user area' is a data structure used by the kernel to save information related to a process that is not needed until the process is ready to run. That is, the kernel keeps data related to a process in process table entries and in user areas. All information needed regardless of the state of the process is in the process table. The user area can actually be swapped and must therefore not contain any information the system may need while it is not in memory. In the preferred embodiment, the user block is eliminated completely, and the information contained in the process table is minimized. The overall result is that overhead is minimized.
The integrated dispatcher addresses yet another limitation of prior implementations. The dispatcher runs in conjunction with a mechanism which allows slower coprocessors to schedule work in the multiprocessor, as described below as well as in the copending application METHOD OF EFFICIENT COMMUNICATION BETWEEN COPROCESSORS OF UNEQUAL SPEEDS.
The integrated dispatcher contains the intelligence for managing the wake queue 70. The wake queue is a method of communication between slow and fast processes which prevents bottlenecking the fast processes. Access to the wake queue occurs in such a way as to permit slower coprocessors to put entries on the wake queue without holding up the faster processors. Thus, coprocessors of unequal speed can efficiently schedule tasks for a fast processor to run without locking the run queue from which the fast processors are scheduled. For example, when a peripheral device which is slow requests action from a processor (which is fast), the request is held in the wake queue until a processor is available to handle the request.
Thus, faster processors are not held up by slower coprocessors.
The intelligence for servicing the wake queue has been added to the integrated dispatcher. Rather than having a slow coprocessor interrupt a fast processor to schedule a task or to indicate that some tasks are complete, the wake queue concept provides an intermediate queue upon which the coprocessor can queue information. When the integrated dispatcher is run, it checks to see if any entries are queued upon the wake queues and, if so, processes them. This processing varies widely depending upon the data queued. Any information can be queued and processed in this way as long as the coprocessor knows to queue it and the intelligence is added to the integrated dispatcher to process it. For example, if the information queued is simply the address of a process waiting for some coprocessor task to complete, the integrated dispatcher can now wake that process. This results in that process being queued on the run queue. Specific details of the wake queue are described in the copending application METHOD OF EFFICIENT COMMUNICATION BETWEEN COPROCESSORS OF UNEQUAL SPEEDS.
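For the process-address case in that example, the dispatcher-side check might look like this minimal sketch; the entry format and all names are assumptions, not the patent's.

    struct proc;
    struct wq_entry { struct proc *waiter; };        /* illustrative payload     */
    extern int  wakeq_consume(struct wq_entry *out); /* nonzero if one was taken */
    extern void wakeup(struct proc *p);              /* puts p on the run queue  */

    /* Called each time the integrated dispatcher runs, before dispatching. */
    static void service_wake_queue(void)
    {
        struct wq_entry e;
        while (wakeq_consume(&e))   /* slow coprocessors enqueue; we drain */
            wakeup(e.waiter);       /* woken process joins the run queue  */
    }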
Although the description of the preferred embodiment has been presented, it is contemplated that various changes could be made without deviating from the spirit of the present invention.
While the exemplary preferred embodiments of the present invention are described herein with particularity, those having normal skill in the art will recognize various changes, modifications, additions and applications other than those specifically mentioned herein without departing from the spirit of this invention.
What is claimed is:

Claims

1. In a multiprocessor environment wherein each processor is capable of accessing one or more storage areas, a method for self scheduling of service requests from a plurality of sources having potentially different priority levels for handling comprising the steps of:
establishing a run queue in a first storage area having storage positions therein arrayed in sequential order corresponding to priority levels;
monitoring service requests from the plurality of sources;
assigning a priority level to each service request for entry thereof into the corresponding position in said run queue;
creating an integrated dispatcher including a multiplicity of instructions for retention in a second storage area;
causing each processor upon becoming available to handle a service request to retrieve said integrated dispatcher;
responding to execution of said integrated dispatcher in said processor by inspecting said run queue to identify the service request having the highest priority; and
configuring said processor to handle the service request thus assigned to it.
2. The method in accordance with claim 1 wherein the service requests can include a demand for microprocess scheduling, a demand for interrupt process scheduling or a demand for process scheduling, and wherein the method includes the step of preventing any other processor from accessing said first storage area at the same priority level as that which said processor is handling pursuant to said configuring step until the processor having the preceding run queue entry has completed its loading step.
3. The method in accordance with claim 1 which includes the steps of associating all service requests of given priority level with a particular position in the run queue, and maintaining a given priority level in the run queue active until all service requests of that priority level are handled.
4. The method in accordance with claim 3 wherein said associating step includes the step of establishing a process identifying area and a semaphore counter function area corresponding to each said process identifying area.
PCT/US1991/004068 1990-08-23 1991-06-10 Scheduling method for a multiprocessing operating system WO1992003784A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US57195290A 1990-08-23 1990-08-23
US571,952 1990-08-23

Publications (1)

Publication Number Publication Date
WO1992003784A1 true WO1992003784A1 (en) 1992-03-05

Family

ID=24285734

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1991/004068 WO1992003784A1 (en) 1990-08-23 1991-06-10 Scheduling method for a multiprocessing operating system

Country Status (1)

Country Link
WO (1) WO1992003784A1 (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4316245A (en) * 1973-11-30 1982-02-16 Compagnie Honeywell Bull Apparatus and method for semaphore initialization in a multiprocessing computer system for process synchronization
US4286322A (en) * 1979-07-03 1981-08-25 International Business Machines Corporation Task handling apparatus
US4333144A (en) * 1980-02-05 1982-06-01 The Bendix Corporation Task communicator for multiple computer system
US4394727A (en) * 1981-05-04 1983-07-19 International Business Machines Corporation Multi-processor task dispatching apparatus
US4475156A (en) * 1982-09-21 1984-10-02 Xerox Corporation Virtual machine control
US4642758A (en) * 1984-07-16 1987-02-10 At&T Bell Laboratories File transfer scheduling arrangement
US4727487A (en) * 1984-07-31 1988-02-23 Hitachi, Ltd. Resource allocation method in a computer system
US4897780A (en) * 1984-10-09 1990-01-30 Wang Laboratories, Inc. Document manager system for allocating storage locations and generating corresponding control blocks for active documents in response to requests from active tasks
US4642756A (en) * 1985-03-15 1987-02-10 S & H Computer Systems, Inc. Method and apparatus for scheduling the execution of multiple processing tasks in a computer system
US4805107A (en) * 1987-04-15 1989-02-14 Allied-Signal Inc. Task scheduler for a fault tolerant multiple node processing system
US4985831A (en) * 1988-10-31 1991-01-15 Evans & Sutherland Computer Corp. Multiprocessor task scheduling system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010032205A1 (en) * 2008-09-17 2010-03-25 Nxp B.V. Electronic circuit comprising a plurality of processing devices
CN112749006A (en) * 2019-10-31 2021-05-04 爱思开海力士有限公司 Data storage device and operation method thereof
CN112749006B (en) * 2019-10-31 2024-04-16 爱思开海力士有限公司 Data storage device and method of operating the same

Similar Documents

Publication Publication Date Title
US5452452A (en) System having integrated dispatcher for self scheduling processors to execute multiple types of processes
US5390329A (en) Responding to service requests using minimal system-side context in a multiprocessor environment
EP0806730B1 (en) Real time dispatcher
US8612986B2 (en) Computer program product for scheduling ready threads in a multiprocessor computer based on an interrupt mask flag value associated with a thread and a current processor priority register value
US5274823A (en) Interrupt handling serialization for process level programming
US5202988A (en) System for communicating among processors having different speeds
US5469571A (en) Operating system architecture using multiple priority light weight kernel task based interrupt handling
US5630128A (en) Controlled scheduling of program threads in a multitasking operating system
US5247675A (en) Preemptive and non-preemptive scheduling and execution of program threads in a multitasking operating system
US6505229B1 (en) Method for allowing multiple processing threads and tasks to execute on one or more processor units for embedded real-time processor systems
US5745778A (en) Apparatus and method for improved CPU affinity in a multiprocessor system
JP2866241B2 (en) Computer system and scheduling method
US5333319A (en) Virtual storage data processor with enhanced dispatching priority allocation of CPU resources
US6633897B1 (en) Method and system for scheduling threads within a multiprocessor data processing system using an affinity scheduler
US5010482A (en) Multi-event mechanism for queuing happened events for a large data processing system
US20040172631A1 (en) Concurrent-multitasking processor
US5257375A (en) Method and apparatus for dispatching tasks requiring short-duration processor affinity
US20060130062A1 (en) Scheduling threads in a multi-threaded computer
EP0052713B1 (en) A process management system for scheduling work requests in a data processing system
Horowitz A run-time execution model for referential integrity maintenance
WO1992003784A1 (en) Scheduling method for a multiprocessing operating system
EP0544822B1 (en) Dual level scheduling of processes
WO2002046887A2 (en) Concurrent-multitasking processor
WO1992003779A1 (en) Method of efficient communication between coprocessors
Rothberg Interrupt handling in Linux

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE