US20090328055A1 - Systems and methods for thread assignment and core turn-off for integrated circuit energy efficiency and high-performance - Google Patents

Publication number
US20090328055A1
Authority
US
United States
Prior art keywords
threads
thread
cores
recited
performance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/164,775
Other versions
US8296773B2 (en
Inventor
Pradip Bose
Alper Buyuktosunoglu
Eren Kursun
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US12/164,775 (patent US8296773B2)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignment of assignors interest (see document for details). Assignors: BOSE, PRADIP, BUYUKTOSUNOGLU, ALPER, KURSUN, EREN
Publication of US20090328055A1
Application granted
Publication of US8296773B2
Expired - Fee Related
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3287 Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/329 Power saving characterised by the action undertaken by task scheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5094 Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to integrated circuit management systems and more particularly to systems and methods which reduce energy usage and improve performance of integrated circuit processing cores especially during low utilization periods.
  • Energy efficiency and performance can be optimized using techniques described herein.
  • the energy efficiency and relative performance within a core or in an SMT setting is relatively low during the low utilization periods.
  • a server chip still needs to stay active—even if the number of tasks is significantly lower than the full capacity.
  • systems and methods are provided that improve the energy efficiency and/or performance of servers at low utilization periods through task assignment and turning off cores.
  • the present principles can benefit the energy efficiency and performance during all periods of operation, e.g., even during moderate-high utilization periods.
  • a system and method for improving efficiency of a multi-core architecture includes, in accordance with a workload, determining a number of cores to shut down based upon a metric that combines parameters to represent operational efficiency. Threads of the workload are reassigned to cores remaining active by assigning threads based on priority constraints and thread execution history to improve the operational efficiency of the multi-core architecture.
  • a system and method for improving efficiency of a multi-core architecture includes in accordance with a workload, determining a number of cores to shut down based upon a metric that combines parameters to represent operational efficiency based on a run-time sensor and hardware counters where information is processed simultaneously to identify problem threads and reassign threads without a profiling phase, and reassigning threads of the workload to cores remaining active by assigning threads based on priority constraints and thread execution history to improve the operational efficiency of the multi-core architecture.
  • FIG. 1 is a block/flow diagram showing a core-shutdown and thread assignment system/method in accordance with one illustrative embodiment.
  • FIG. 2 is a block/flow diagram showing a core-shutdown system/method in accordance with one illustrative embodiment.
  • FIG. 3 is a block/flow diagram showing a thread assignment system/method in accordance with one illustrative embodiment.
  • FIG. 4 is a thread history table in accordance with one illustrative embodiment.
  • FIG. 5 is a block/flow diagram showing a thread assignment system/method in accordance with another illustrative embodiment.
  • FIG. 6 is a block/flow diagram showing a thread assignment selection system/method in accordance with one illustrative embodiment.
  • the present principles provide methods and systems for improving energy efficiency and performance of server architectures especially during low utilization periods by shutting down processing units and through hardware-aware thread reassignments.
  • the present principles may be used even at high or moderate utilization periods.
  • a core shut-down scheme includes one or more processing units/cores which are switched to an off state based on a metric (e.g., designated as “m” or “M”), which accounts for one or more of the following: core-level and chip-level utilization, number of threads in the task queue, memory accesses and power/performance constraints, ratio of dynamic power versus maximum power (P_dyn/P_max) per core and length of the low utilization period, etc. Other metrics may also be employed (e.g., temperature, etc.).
  • the core scheme preferably gets activated during low utilization periods longer than a threshold “t” with a utilization level lower than 1 (e.g., 100%).
  • core activation/deactivation may be provided based upon hardware or priority constraints.
  • a multi-core shared level 2 (L2) architecture may include one or more cores from each node which are shut-down, whereas shared L2 caches may be kept active due to unique characteristics of the architecture.
  • the threads are assigned to the remaining active cores in the node depending on priority and other constraints.
  • the number of active core constraints may be relaxed for performance constraints.
  • the active cores in the n-core node may provide increased L2 access (through interconnect configurations, partitioning priorities, etc.).
  • Simultaneous multi-threading (SMT) modes of the existing cores may be re-adjusted to compensate for the inactive cores.
  • a scheduler along with a resource manager assigns SMT modes to cores according to the SMT flags of the existing threads in the queue and the above power/performance constraints.
  • Each thread has a tag for SMT mode preference, depending on job priority, resource usage, memory access patterns, etc. for maximum efficiency.
  • thread assignment and migration includes a minimum overhead thread assignment history table which enables leakage power and performance aware thread assignment to the cores by storing power dissipation, performance, memory accesses for thread combinations dynamically at run time from the existing thread combinations. Note that since there is no special profiling phase and dedicated cores, there is no computational overhead.
  • the size of the thread assignment history table is minimized by storing the combinations with performance numbers below a pre-determined threshold. This unwanted list stays active as long as the thread combinations remain in the task queue. As the threads are executed to completion, they are replaced with new combinations as the system adaptively learns the new combinations. The most wanted list is stored as well as the unwanted (“bad”) list. The scheduler finds these threads in the queue and assigns them together.
  • This methodology employs an iterative scheme to isolate the threads which cause reduced throughput (by identifying the common threads in all unwanted list items as well as looking at available single thread performance, if possible). These threads are clustered and run on a minimum number of cores, and as a result, the overall system performance is improved.
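  • A minimal sketch of the isolation step described above, assuming the unwanted list is held as a collection of thread-ID combinations. The function names, the data layout and the fallback to the most frequent threads are illustrative assumptions, not details taken from the specification.

```python
from collections import Counter


def find_problem_threads(unwanted_combos):
    """Return thread IDs that appear in every unwanted combination.

    If no thread is common to all entries, fall back to the most frequent
    thread(s). The fallback rule is an assumption made for this sketch.
    """
    if not unwanted_combos:
        return set()
    common = set.intersection(*(set(c) for c in unwanted_combos))
    if common:
        return common
    counts = Counter(t for combo in unwanted_combos for t in set(combo))
    top = max(counts.values())
    return {t for t, n in counts.items() if n == top}


def cluster_on_min_cores(problem_threads, smt_ways):
    """Pack the problem threads onto as few cores as possible.

    smt_ways is the number of hardware threads per core (hypothetical input).
    """
    ordered = sorted(problem_threads)
    return [ordered[i:i + smt_ways] for i in range(0, len(ordered), smt_ways)]
```

  • In this sketch a thread that shows up in every low-throughput combination is treated as the likely cause, and the clustered result confines such threads to a small set of cores.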
  • an instruction per cycle (IPC)-Hybrid scheme employs threads which are assigned according to the estimated single thread performance as well as a desired SMT mode.
  • the thread assignment queue has corresponding bins for high to low performance threads.
  • the thread scheduler assigns threads from respective queues giving priority to the assignment of high performance threads assigned together within the SMT constraints of the threads.
  • a simultaneous multi-threading 2 (SMT2) based performance analysis for threads may be performed.
  • each thread is assigned a “thread friendliness” factor for SMT2.
  • These numbers are used for extrapolating other SMT modes.
  • the memory and CPU behavior are observed during these runs.
  • the operating system assesses the qualities of the thread for future assignment.
  • the threads are grouped into a memory log and specific resource contention classes (FP, FX, or other known instructions). The assignment is based on these simple thread qualities. High priority threads are assigned to high-performance cores due to process variability for maximum efficiency.
  • a thread is a sequence of tasks that can be performed by a processing core.
  • a thread is included inside a process, and different threads in the same process may share resources.
  • Multithreading includes parallel execution of multiple tasks or threads where the processor or processing core switches between different threads.
  • Operating systems support threading with a process scheduler.
  • the operating system kernel permits programmers to manipulate threads via a system call interface.
  • Embodiments of the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment including both hardware and software elements.
  • the present invention is implemented in hardware, but may include software elements.
  • Software elements may include but are not limited to firmware, resident software, microcode, etc.
  • a computer-usable or computer-readable medium can be any apparatus that may include, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
  • Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
  • a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus.
  • the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution.
  • I/O devices including but not limited to keyboards, displays, pointing devices, etc. may be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
  • the systems described herein may be part of the design for integrated circuits, chips or boards.
  • the design of the systems, chips or boards is created in a graphical computer programming language, and stored in a computer storage medium (such as a disk, tape, physical hard drive, or virtual hard drive such as in a storage access network). If the designer does not fabricate chips or the photolithographic masks used to fabricate chips, the designer transmits the resulting design by physical means (e.g., by providing a copy of the storage medium storing the design) or electronically (e.g., through the Internet) to such entities, directly or indirectly.
  • the stored design is then converted into the appropriate format (e.g., Graphic Data System II (GDSII)) for the fabrication of photolithographic masks, which typically include multiple copies of the chip design in question that are to be formed on a wafer.
  • the photolithographic masks are utilized to define areas of the wafer (and/or the layers thereon) to be etched or otherwise processed.
  • the resulting integrated circuit chips/boards can be distributed by the fabricator in raw wafer form (that is, as a single wafer that has multiple unpackaged chips), as a bare die, or in a packaged form.
  • the chip is mounted in a single chip package (such as a plastic carrier, with leads that are affixed to a motherboard or other higher level carrier) or in a multi-chip package (such as a ceramic carrier that has either or both surface interconnections or buried interconnections).
  • the chip is then integrated with other chips, discrete circuit elements, and/or other signal processing devices as part of either (a) an intermediate product, such as a motherboard, or (b) an end product.
  • the end product can be any product that includes integrated circuit chips, ranging from toys and other low-end applications to advanced computer products having a display, a keyboard or other input device, and a central processor.
  • a core-shutdown and thread assignment system 100 is applied to a system 102 of chips, boards or other units to improve the performance and energy efficiency of multi-core architectures, especially during phases of low utilization.
  • High-level execution of an illustrative system includes measuring current chip utilization of a plurality of chips 104 , 106 and 108 in terms of characteristics such as, for example, power, performance, temperature, thread profiles, job queue utilization, job deadlines, etc.
  • chip utilization characteristics include, e.g., a number of active threads in the job queue compared to the maximum capacity, dynamic power dissipation of cores compared to the maximum power (indicating high leakage power percentage) and length of low utilization mode.
  • Chips 104, 106 and 108 may be of the same type or different types, may be of any number and may each include cores implementing different SMT modes, e.g., ST, SMT, SMT2, SMT4, etc. It should be understood that the chips may be mounted on boards and may include systems on a chip, individual chips or any other circuit unit or components.
  • the chips 104 , 106 and 108 include processing cores 110 .
  • a processing core 110 may include a processor chip designed to handle computations and other tasks provided in a job queue of a scheduler 120 .
  • a thread history table 122 is employed to track previous assignments of tasks on the cores 110 to extract performance characteristics of threads executed or performed by the cores 110 .
  • Power/performance constraints 124 are employed to make a state selection 126 and thread assignment decisions for the chips 104 , 106 , 108 and/or cores 110 .
  • On-chip power/performance and thermal measurements 128 (or other information) are collected periodically to assist the decision making process. This information is provided to a control module 133 which determines state selections for cores 110 and otherwise allocates core activity.
  • the thread history table 122 may include other information used in assignment decisions. For example, temperature information may be stored in the thread history table 122 , such that threads can be assigned based on the resulting thermal profile of a combination of threads. For example, even if the thread combination yields high performance, the thread combination may not be desirable if the combination raises the temperature above a critical temperature threshold.
  • a control module 133 reassigns the workload onto an active subset of cores.
  • the activity of the cores 110 in FIG. 1 is indicated by different hatchings.
  • the CPU 111 assesses utilization of the cores 110 from program 130 and inputs this information to the module 133 .
  • those cores with different hatchings in FIG. 1 have varying degrees of activity while those cores with no fill are inactive.
  • the thread assignment is used to compensate for a potential performance drop.
  • the overall performance may improve with the task assignment scheme.
  • the number of active cores is calculated to match the overall chip utilization. The remaining cores are shut down or put into lower power states for leakage savings.
  • Assigning the threads to a fewer number of cores in simultaneous multi-threading may also cause performance degradation.
  • the present methods, as implemented by module 133, perform thread clustering onto as few cores as possible with minimum performance degradation. During the thread assignment, thread characteristics are taken into consideration for performance.
  • methods implemented by module 133 may include, but are not limited to, the following. It should be understood that additional methods may be employed or these methods may be combined or alternately applied as needed.
  • Friendliness-based thread clustering: This is employed based on SMT analysis of the threads. For example, the SMT4 version has higher overhead due to the increased number of combinations it handles.
  • Randomized-IPC hybrid: Assigns threads to a number of IPC bins, e.g., high/intermediate/low IPC. The assignment from within the bin is random, yet the scheme resembles the pure IPC scheme in terms of the higher level method steps.
  • IPC-Friendly hybrid: Unlike the pure IPC or pure friendly-threads based methods, this method only considers the symbiotic combinations with high IPC; the rest of the assignments are random.
  • Memory bound based scheme assigns threads based on memory characteristics.
  • two bins may be employed for the IPC-Friendliness determination. These bins (e.g., good and bad) may be employed to separate threads based on high performance and memory intensity. By reducing the number of combinations, this improves the efficiency of assignment—yet provides benefits comparable to tracking all possible combinations.
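  • The Randomized-IPC hybrid scheme listed above can be sketched as follows. The bin cut-points, the function names and the per-core grouping by `smt_ways` are hypothetical choices made for illustration; the specification only states that threads are placed in high/intermediate/low IPC bins and drawn at random from within a bin.

```python
import random


def bin_by_ipc(thread_ipc, low_cut, high_cut):
    """Split threads into high/intermediate/low IPC bins.

    thread_ipc maps a thread ID to its estimated single-thread IPC.
    The cut-points are assumed inputs, not values from the source.
    """
    bins = {"high": [], "intermediate": [], "low": []}
    for tid, ipc in thread_ipc.items():
        if ipc >= high_cut:
            bins["high"].append(tid)
        elif ipc >= low_cut:
            bins["intermediate"].append(tid)
        else:
            bins["low"].append(tid)
    return bins


def randomized_ipc_assign(thread_ipc, smt_ways, low_cut=0.5, high_cut=1.5, seed=None):
    """Group threads per core, high-IPC bin first, random order inside a bin."""
    rng = random.Random(seed)
    bins = bin_by_ipc(thread_ipc, low_cut, high_cut)
    cores = []
    for name in ("high", "intermediate", "low"):
        members = list(bins[name])
        rng.shuffle(members)  # random choice within the bin
        for i in range(0, len(members), smt_ways):
            cores.append(members[i:i + smt_ways])
    return cores
```

  • Because threads are only compared at bin granularity, the scheduler avoids tracking every pairwise combination, which is the efficiency argument made for the reduced number of bins.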
  • the number of active cores at any point in time depends on the workload demand and power saving mode. In general, the present principles optimize the energy efficiency by matching the overall chip utilization to the percentage of active cores. Discrete percentage settings for core activity, e.g., 25%, 50%, 75% and 100%, are also a possible implementation/embodiment. These setting percentages may be applied to the chip (104, 106 or 108) or applied to individual cores 110.
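  • As an illustration of matching the percentage of active cores to chip utilization with discrete settings, the sketch below picks the smallest setting that covers the measured utilization. The function name, the round-up rule and the minimum of one active core are assumptions introduced for this example.

```python
import math


def active_core_count(utilization, total_cores, levels=(0.25, 0.50, 0.75, 1.00)):
    """Return how many cores to keep active for a utilization in [0, 1].

    The smallest discrete level that is >= utilization is selected, then
    converted to a core count (rounded up, at least one core). Both rules
    are assumptions for illustration.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    level = next((l for l in sorted(levels) if l >= utilization), 1.0)
    return max(1, math.ceil(level * total_cores))
```

  • For example, an 8-core chip at 30% utilization would keep 4 cores active under the 25/50/75/100% settings, leaving the other 4 available for shut-down or a low-power state.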
  • Module 133 sets different cores 110 on a processor or system 102 to different operation modes depending on the power saving mode and the application demand. Some cores 110 are over-clocked to compensate for the performance degradation in the inactive cores, while others may be put into power saving modes. By employing the heterogeneity in the applications and the core performance (e.g., SMT mode and power saving mode) both performance and energy may be optimized simultaneously.
  • optimization may be performed using an objective function to compare different scenarios in module 133 .
  • the best case scenario may be selected or a scenario that meets particular constraints may be selected.
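  • A minimal sketch of scenario selection with an objective function, assuming each scenario is described by a small record of estimated performance and power. The record fields, the performance-per-watt objective and the function name are hypothetical; the specification does not fix a particular objective.

```python
def select_scenario(scenarios, objective, constraints=()):
    """Return the feasible scenario with the highest objective value.

    scenarios   -- iterable of candidate configurations (any objects)
    objective   -- function mapping a scenario to a score (higher is better)
    constraints -- functions returning True when a scenario is acceptable
    Returns None when no scenario satisfies every constraint.
    """
    feasible = [s for s in scenarios if all(check(s) for check in constraints)]
    if not feasible:
        return None
    return max(feasible, key=objective)


# Hypothetical candidates: estimated performance and power per configuration.
candidates = [
    {"name": "a", "perf": 10.0, "power": 5.0},
    {"name": "b", "perf": 12.0, "power": 8.0},
    {"name": "c", "perf": 6.0, "power": 2.0},
]
perf_per_watt = lambda s: s["perf"] / s["power"]
best_overall = select_scenario(candidates, perf_per_watt)
best_with_floor = select_scenario(candidates, perf_per_watt,
                                  constraints=[lambda s: s["perf"] >= 8.0])
```

  • Here `best_overall` is the unconstrained best case, while `best_with_floor` is the best scenario that still meets a performance floor, mirroring the two selection options described above.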
  • cores in single-thread (ST) mode may be used for threads with single-thread high performance requirements as well as deadline restrictions.
  • Higher levels of multi-threading may be used for threads which do benefit from SMT or do not have strict deadline restrictions—as well as deeper power saving modes.
  • module 133 and system 100 may be implemented in software with hardware assistance.
  • System 100 may be employed as an operating system scheduler or may be employed as a separate system for monitoring and controlling processing systems and multi-core processors.
  • the system/method includes selectively shutting down processing units/cores and hardware-aware thread reassignments.
  • the method/system is preferably implemented by module 133 with inputs as depicted in FIG. 1 , and may be employed during regular operation to monitor and optimize core usage.
  • the system/method may be triggered under present system operating conditions. For example, the method gets activated during low utilization periods longer than a threshold period with a utilization level lower than 1 (100%). Other criteria may also be employed.
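  • The activation condition can be sketched as a check over timestamped utilization samples. The sample format, parameter names and the strict "longer than" comparison are assumptions for this example.

```python
def should_activate(samples, util_threshold, min_duration):
    """Return True if utilization stayed below util_threshold for longer
    than min_duration at some point in the sample stream.

    samples is a time-ordered list of (timestamp, utilization) pairs,
    a layout assumed for this sketch.
    """
    low_since = None
    for timestamp, utilization in samples:
        if utilization < util_threshold:
            if low_since is None:
                low_since = timestamp  # start of a low-utilization period
            if timestamp - low_since > min_duration:
                return True
        else:
            low_since = None  # period interrupted, start over
    return False
```

  • Requiring the period to exceed a threshold duration keeps short dips in load from triggering a shut-down that would have to be reversed almost immediately.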
  • core shut-down operations are described.
  • core evaluation is performed by gathering information or taking measurements on utilization, power and performance of each core. This information may include the IPC of each core, core power, temperature and thread IPC measurements under given conditions. These values may be initialized to a set value and updated during the execution of the method as will be described.
  • the method has special modes for power and performance caps. While we are trying to improve the efficiency of the chip, we are still bound by the power and performance caps and similar restrictions—which affect the way the methods run in these modes. These may be checked in block 206 .
  • utilization is computed or updated as a function of core and/or chip utilization. This function may be based on the design of the chip or core or based on performance metrics or the like. The utilization is preferably expressed as a percentage.
  • a comparison between a threshold and the utilization is made.
  • the threshold may be user selected or dynamically computed based on the application or conditions. If the utilization is not less than the threshold, the program path returns to block 204.
  • the utilization parameters are periodically updated. This preferably includes a check of the metric M.
  • the program path goes to block 210, where a determination is made of whether power or performance constraints exist that can be applied. If yes, then in block 212, thread reassignment based upon runtime history is applied. If no constraints exist, then the program path returns to block 204.
  • one or more processing units/cores are switched to an off state based on the metric, M.
  • M may be computed as:
  • $$M = \alpha_{stat} \cdot \text{Power\_management\_mode} \cdot \left[ C_1 \cdot \frac{\sum \left( P_{dyn\_core} / P_{max\_core} \right)}{\text{Number\_of\_Cores}} + C_2 \cdot \left( \frac{\text{Number\_of\_High\_Priority\_Threads}}{\text{Threshold}_1} + \frac{\text{Total\_Number\_of\_Threads}}{\text{Full\_Capacity\_of\_Task\_Queue}} \right) + \frac{1}{C_3} \cdot \frac{\text{Number\_of\_Memory\_Accesses}}{\text{Preset\_Memory\_Access\_Threshold}} + C_4 \cdot \frac{\text{Target\_Throughput}}{\text{Preset\_Threshold}} + \frac{1}{C_5} \cdot \frac{\text{Average\_chip\_temperature} - \text{Temp\_Threshold}}{\text{Nominal\_target\_avg\_temp}} \right]$$ where alpha_stat is an experimental constant (static) based on number of cores, and hardware settings (to offset redundant shut-downs); C1-Cn are computational weights (set by the operating system/hypervisor/hardware settings and priority settings); power_management_mode is a dynamic value provided by
  • the metric M is compared to preset values (M1-MN). Depending on the outcome (which range within M1-MN the value of M falls in):
  • the scheme shuts down a number (0-N) of cores (e.g., based on the above).
  • Shut-down priority may be based on temperature, so that high-temperature cores are shut down first while low-temperature cores are kept active.
  • the metric M may be computed based upon an objective function that evaluates and weights different parameters that affect efficiency and performance. For example, M may account for one or more of the following criteria, in block 216: core-level and chip-level utilization, number of threads in a task queue, memory accesses and power/performance constraints, ratio of P_dyn/P_max per core, temperature, leakage profile, length of the low utilization period, etc.
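  • A hedged sketch of how the metric M and the range lookup might be coded. It follows the expression for M given earlier, treating the first term as a sum over cores and the temperature term as a difference, which is an assumption about the printed formula. The class, field and dictionary key names are hypothetical, and because the text does not say whether a larger M means more or fewer cores are shut down, the count for each range is supplied by the caller.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class ChipState:
    pdyn_over_pmax: list          # per-core ratio P_dyn / P_max
    high_priority_threads: int
    total_threads: int
    queue_capacity: int           # full capacity of the task queue
    memory_accesses: float
    target_throughput: float
    avg_temp: float               # average chip temperature


def metric_m(s, alpha_stat, pm_mode, c, thr):
    """Compute M; c maps 1..5 to weights C1..C5, thr holds preset thresholds."""
    n_cores = len(s.pdyn_over_pmax)
    terms = (
        c[1] * (sum(s.pdyn_over_pmax) / n_cores),
        c[2] * (s.high_priority_threads / thr["high_priority"]
                + s.total_threads / s.queue_capacity),
        (1 / c[3]) * (s.memory_accesses / thr["memory_access"]),
        c[4] * (s.target_throughput / thr["throughput"]),
        (1 / c[5]) * (s.avg_temp - thr["temp"]) / thr["nominal_temp"],
    )
    return alpha_stat * pm_mode * sum(terms)


def cores_to_shut_down(m, boundaries, counts):
    """Map M to a shut-down count.

    boundaries -- ascending preset values M1..MN
    counts     -- caller-chosen count for each of the N+1 ranges
    """
    return counts[bisect_right(boundaries, m)]
```

  • The lookup table keeps the policy separate from the metric, so an operating system or hypervisor could retune the ranges without changing how M is computed.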
  • a plurality of shut-down methods may be applied to conserve power and improve efficiency. These methods employ adjustment of parameters to improve performance of remaining active cores in block 218 .
  • Threads are assigned to remaining active cores in the node and, depending on priority constraints, to the neighboring nodes. The number of active core constraints may be relaxed for performance constraints.
  • a number of options may be provided when shutting down cores. For example, in a case of shared caches, the cache structures are not shut down every time the cores are off. Instead, tracking of how the threads use caches is employed to decide how many caches need to be shut down.
  • the active cores in the n-core node may be provided increased L2 access (through interconnect configurations, and/or partitioning priorities). This is to enable more efficient utilization of the active cores to permit the shut down of less active cores to improve efficiency. Other steps may also be taken to improve performance, for example, a clock frequency and supply voltage may be increased to provide a performance boost for the active cores and to compensate for the shut down of inactive cores. If all cores are inactive in the n-core node, the shared L2 cache can be brought to a leakage saving state with data retention to preserve the stored memory.
  • Criteria for shut down in block 216 may include that the cores with higher leakage power due to variability are given higher priority for shut-down to alleviate the variation characteristics of the current process technologies.
  • the cores with high temperatures with or without variability are given priority for shut-down due to the higher power savings.
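  • The shut-down priority criteria can be sketched as a simple ranking. How temperature and leakage are combined is not specified, so ordering by temperature first with leakage as a tie-breaker is an assumption, as are the field and function names.

```python
def shutdown_order(cores, n):
    """Return the IDs of the n cores that should be shut down first.

    cores is a list of dicts with 'id', 'temp' and 'leakage' keys
    (hypothetical layout). Hotter cores rank first; among equally hot
    cores, the one with higher leakage power ranks first.
    """
    ranked = sorted(cores, key=lambda c: (c["temp"], c["leakage"]), reverse=True)
    return [c["id"] for c in ranked[:n]]
```

  • A weighted combination of the two quantities would be an equally plausible reading; the tuple ordering is used here only because it is the simplest rule that honors both criteria.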
  • SMT modes of the existing cores are re-adjusted in block 218 to compensate for the inactive cores.
  • a scheduler along with a resource manager, which may be present in module 133, may be configured to assign SMT modes to cores according to SMT flags of existing threads in the queue and the above power/performance constraints. Each thread may have a tag for an SMT mode preference, depending on the job priority, resource usage, memory access patterns, etc., for maximum efficiency.
  • an efficiency mode may be initiated to improve system function. For example, only thread reassignment may be employed in accordance with the present principles.
  • Thread reassignment is employed as part of FIG. 2 , but may be employed separately as a method for performance enhancement of a multi-core system.
  • core evaluation is performed by gathering information or taking measurements on utilization, power, priority and performance of each core. This information may include the IPC of each core, core power, thread IPC measurements under given conditions, temperature, etc. These values may be initialized to a set value and updated during the execution of the method. These values may be inherited from block 204 of FIG. 1 .
  • power, performance, utilization, priority or other metrics may be compared to a minimum threshold (Thr_min) and compared to a maximum threshold (Thr_max).
  • These thresholds may be set based upon desired performance criteria, power criteria or other parameter criteria.
  • the threshold may be determined by a user or based on some function or other constraint.
  • thread priorities add another dimension to the thread-reassignment schemes. For example, the threads with high priorities and strict deadline restrictions are not assigned to cores running in SMT4 mode. This may also be employed as another item tracked for making a core shut down decision.
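  • The priority rule in the example above (high-priority threads with strict deadlines are kept off SMT4 cores) can be sketched as a filter over candidate cores. The dictionary layouts and the function name are hypothetical.

```python
def eligible_cores(thread, core_modes):
    """Return the core IDs a thread may be assigned to.

    thread     -- dict with boolean 'high_priority' and 'strict_deadline'
    core_modes -- dict mapping core ID to its SMT mode string, e.g. "ST",
                  "SMT2" or "SMT4"
    Threads that are both high priority and deadline-restricted are not
    placed on cores running in SMT4 mode.
    """
    restricted = thread["high_priority"] and thread["strict_deadline"]
    return [cid for cid, mode in core_modes.items()
            if not (restricted and mode == "SMT4")]
```

  • A scheduler could apply this filter before any of the combination-based assignment steps, so that the later steps only choose among cores the thread is allowed to use.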
  • the thread combination is checked against a known good combination list in block 224 .
  • the number of cores that are assigned threads from the good combination list is maximized. This type of thread combination from the thread queue may be referred to as friendly or compatible.
  • the thread or threads are reassigned in block 230 .
  • the reassignment may be based on thread IPC, a known bad combination (switch to a better combination), random reassignment, etc.
  • the bad listed threads or thread combinations are isolated to particular cores to improve performance in block 232 .
  • remaining threads are assigned. This may be performed randomly or based upon design criteria of the system or application.
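  • The reassignment steps above can be sketched as one pass over the thread queue: place known good combinations first, isolate bad-listed threads on their own cores, then assign whatever remains at random. The sketch assumes two threads per core (SMT2) and uses hypothetical names and data layouts throughout.

```python
import random


def reassign(queue, good_pairs, bad_threads, seed=None):
    """Return per-core thread groups (two threads per core assumed).

    queue       -- thread IDs waiting for assignment
    good_pairs  -- known good combinations as (a, b) tuples
    bad_threads -- set of bad-listed thread IDs to isolate together
    """
    rng = random.Random(seed)
    remaining = list(queue)
    cores = []

    # 1) Use as many known good combinations as the queue allows.
    for a, b in good_pairs:
        if a != b and a in remaining and b in remaining:
            cores.append([a, b])
            remaining.remove(a)
            remaining.remove(b)

    # 2) Isolate bad-listed threads onto their own cores.
    bad = [t for t in remaining if t in bad_threads]
    remaining = [t for t in remaining if t not in bad_threads]
    for i in range(0, len(bad), 2):
        cores.append(bad[i:i + 2])

    # 3) Assign the rest randomly.
    rng.shuffle(remaining)
    for i in range(0, len(remaining), 2):
        cores.append(remaining[i:i + 2])
    return cores
```

  • The random final step could be replaced by any system-specific rule; it is used here because the text names random assignment as one option.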
  • the thread assignment can also be used for increasing the efficiency of a multi-core architecture at high utilization periods, where no core shut-down is necessary.
  • the present principles may cover regular management of threads, not just situations in which threads are reassigned for core shut-down.
  • Thread assignment history table 300 (or 122 ) is shown.
  • Table 300 is preferably configured to have minimum overhead/costs (based on memory, power, etc.).
  • Thread assignment history table 300 enables leakage, power, temperature, priority and performance aware thread assignment to the cores by storing power dissipation, performance, memory accesses for thread combinations dynamically at run time from the existing thread combinations. Since there is no special profiling phase and dedicated cores, there is no computational overhead. Since no special profiling phase is employed, the thread table is dynamically filled in real-time when the system is running.
  • the size of the thread assignment history table is minimized by storing the combinations with performance numbers below a pre-determined threshold. This unwanted or bad list stays active as long as the thread combinations remain in the task queue. As the threads are executed to completion, they are replaced with new combinations—as the system adaptively learns the new combinations. The most wanted list or good combination list is stored as well as the unwanted list or bad combination list. The scheduler finds these thread types in the queue and assigns them in accordance with constraints for improving performance.
  • the thread history table 300 may include an area 302 which identifies thread pairs (e.g., Thr i -Thr j ).
  • area 304 power information for the given pair is provided.
  • performance information for the given pair is provided.
  • a combined parameter, e.g., performance/power is provided in area 308 .
  • temperature information may be stored in the thread history table such that threads can be assigned based on the resulting thermal profile of a combination of threads. For example, even if the thread combination yields high performance, the thread combination may not be desirable if the combination raises the temperature above a critical temperature threshold.
  • a number of memory accesses is provided for the given pair of threads.
  • priority information may be provided.
  • thread priorities may add another dimension to the thread-reassignment. For example, the threads with high priorities and strict deadline restrictions are not assigned to cores running in, say, SMT4 mode. This may also be employed as another item tracked for making a core shut down decision.
  • the table 300 ranks these parameters in accordance with a formula, which may include weighting factors for each parameter. So, for example, in one embodiment, performance may be employed to rank the thread pairs, in another embodiment, it may be a combination of parameters, e.g., performance/power. In addition, threads or thread pairs may be assigned a priority, which may be incorporated in the ranking methodology. In still another embodiment, all or a subset of the parameters may be combined in a formula to determine the rank of the thread pairs.
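  • The weighted ranking described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the field names, the default weights, and the particular formula (a weighted performance/power ratio plus a weighted priority term) are assumptions, since the description leaves the formula open. Power is assumed to be strictly positive.

```python
from dataclasses import dataclass

@dataclass
class PairEntry:
    """One row of a hypothetical thread history table."""
    pair: tuple          # (thread_i, thread_j)
    power: float         # measured power for the pair, assumed > 0
    performance: float   # e.g. combined IPC of the pair
    priority: int = 0    # optional priority assigned to the pair

def rank_score(entry, w_perf_per_power=1.0, w_priority=0.0):
    # Assumed formula: weighted performance/power plus weighted priority.
    return (w_perf_per_power * entry.performance / entry.power
            + w_priority * entry.priority)

def rank_pairs(entries, **weights):
    # Highest score first.
    return sorted(entries, key=lambda e: rank_score(e, **weights), reverse=True)

entries = [
    PairEntry(("t0", "t1"), power=10.0, performance=2.0),
    PairEntry(("t2", "t3"), power=5.0, performance=1.5),
    PairEntry(("t4", "t5"), power=10.0, performance=1.0, priority=5),
]
by_efficiency = [e.pair for e in rank_pairs(entries)]
by_priority = [e.pair for e in rank_pairs(entries, w_priority=0.1)]
```

  • Changing only the weights reorders the same measurements, which is one way the different ranking embodiments (performance only, performance/power, priority-aware) could share a single table.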
  • Data on thread history may be updated as the thread pair is assigned and executed to build the table 300 with thread history based on information on thread combinations, power, performance, number of occurrences and the like for the thread pairs and/or threads.
  • Combinations with unfavorable IPC, power, performance/power, etc. move to a bad or unwanted combination list 312 .
  • Known good combinations move to a good combination list 314 .
  • the two lists are separated by set criteria. In this example, a threshold 1 ( 316 ) and a threshold 2 ( 318 ) designate the lists.
  • the thresholds correspond to Thr min and Thr max as described above. In this example, the thresholds are for comparison with the performance/power parameter. However, any parameter or combination of parameters may be employed.
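  • A minimal sketch of how the two thresholds could separate the lists is given below. It assumes that a performance/power value below Thr min places a pair on the bad list 312 and a value above Thr max places it on the good list 314, with pairs in between not stored at all; this mapping, the dictionary layout, and the function name are illustrative assumptions.

```python
def update_lists(table, pair, perf_per_power, thr_min, thr_max):
    """Record a freshly measured pair on the bad or good list, or drop it.

    Assumes thr_min <= thr_max. Pairs between the thresholds are not
    stored, which keeps the table small.
    """
    # Remove any stale entry so a re-measured pair can change lists.
    table["bad"].pop(pair, None)
    table["good"].pop(pair, None)
    if perf_per_power < thr_min:
        table["bad"][pair] = perf_per_power
    elif perf_per_power > thr_max:
        table["good"][pair] = perf_per_power
    return table

table = {"bad": {}, "good": {}}
update_lists(table, ("t1", "t2"), 0.1, thr_min=0.2, thr_max=0.6)
update_lists(table, ("t3", "t4"), 0.9, thr_min=0.2, thr_max=0.6)
update_lists(table, ("t5", "t6"), 0.4, thr_min=0.2, thr_max=0.6)
```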
  • the ranking or positioning of the threads or thread pairs in the table 300 may be weighted or performed in accordance with a plurality of policies or constraints.
  • the following is a non-limiting description of a few illustrative schemes for reassigning threads in accordance with the present principles.
  • threads are assigned according to an estimated single thread performance as well as a desired SMT mode.
  • the thread assignment queue/table 300 has corresponding bins 320 for high to low performance threads.
  • a thread scheduler (not shown) assigns threads from respective queues giving priority to the assignment of high performance threads assigned together within the SMT constraints of the threads.
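  • The binning and assignment just described can be illustrated with the following sketch. The IPC cut-offs, the three bin names, and the rule of filling each core up to a fixed SMT width are assumptions chosen for illustration; per-thread SMT preferences are not modeled.

```python
def bin_threads(ipc_by_thread, high=1.5, low=0.8):
    """Sort threads into high/medium/low bins by estimated single-thread IPC."""
    bins = {"high": [], "medium": [], "low": []}
    for tid, ipc in sorted(ipc_by_thread.items()):
        if ipc >= high:
            bins["high"].append(tid)
        elif ipc >= low:
            bins["medium"].append(tid)
        else:
            bins["low"].append(tid)
    return bins

def assign_to_cores(bins, smt_width):
    """Group threads per core, drawing from the high bin first."""
    ordered = bins["high"] + bins["medium"] + bins["low"]
    return [ordered[i:i + smt_width] for i in range(0, len(ordered), smt_width)]

ipc = {"t0": 2.0, "t1": 0.5, "t2": 1.6, "t3": 1.0, "t4": 0.9}
bins = bin_threads(ipc)
cores = assign_to_cores(bins, smt_width=2)
```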
  • SMT2 based performance analysis for threads may be employed, and as a result each thread is assigned a thread friendliness factor for SMT2. These numbers are used for extrapolating other SMT modes.
  • the memory and CPU behavior can be observed during these runs; hence, the operating system assesses the qualities of the thread for future assignment.
  • the threads are grouped into a memory log and specific resource contention classes (FP, FX, etc). The assignment is based on these simple thread qualities. High priority threads are assigned to high-performance cores ( 110 ) due to process variability for maximum efficiency.
  • the reassignments may include specific cores or random cores to improve performance.
  • the core management embodiments described herein provide multiple degrees of freedom in which to more efficiently improve power and performance of a system. Such degrees include activation/deactivation of one or more cores, adjusting the utilization of one or more cores, reassigning threads, adjusting the type of threads run (adjusting multithreading modes), designating particular threads to run on particular cores, and any combinations of these.
  • a system/method in accordance with another illustrative embodiment is shown.
  • data from one or more of a sensor, a hardware counter and a power monitor are collected.
  • run-time characterization tables FIG. 4
  • the data may be collected for an entire chip, a processing core, a functional unit, a thread combination, or on an individual thread basis.
  • the data for the characterization table is not generated through dedicated runs, such as test runs or profiling runs. Instead, the data is collected through a normal operation period; hence, there is no additional profiling overhead associated with filling/updating the tables. Additional processing of the table data may be employed to extract the useful data out of the regular runs—without specialized profiling runs.
  • the size of the characterization table is minimized by storing only significant thread combinations in terms of power, performance or temperature (e.g., in both best and worst performing edges of the spectrum).
  • values stored in the characterization table are checked at run time. The method scans through the list of worst performing thread combinations (e.g., the marked list). The instructions per cycle (IPC), power and temperature as measured are compared with thresholds.
  • the method does not know which thread or threads (Ti) are causing the marked thread combination's unfavorable characteristics.
  • the list of marked combinations is traversed, and common threads that appear in more than one marked combination are found in block 508 .
  • the method starts reassigning the common elements to other dedicated cores to identify if that particular thread is causing the problem. This includes reassigning the thread to another core with known performance and/or collecting data to note changes to a marked combination. If the thread is not in a marked combination, random threads may be selected and assigned to cores running known good combinations of threads in block 510 .
  • a determination as to whether the thread or combination should be added to the marked list is made. If yes, the thread is assigned to the bad thread list in block 516 .
  • a number of iterations may be performed to finish all the common threads in the marked list. If there is no thread identified as unfriendly, the program moves one thread randomly to a different core in block 510 . If the resulting combination is out of the marked list, this indicates that the thread was using most of the resources ineffectively and hence disturbing the power efficiency of the system. If no thread exhibits the aforementioned qualities, the method quits after a predetermined number (k) of iterations and continues with random thread assignment schemes.
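  • The search for a problem thread can be sketched as follows. The callback `still_marked_after_move` is hypothetical: it stands in for moving a thread to another core and re-measuring whether the combination it left is still on the marked list. The iteration cap `k` mirrors the predetermined number of iterations mentioned above; everything else is an illustrative assumption.

```python
from collections import Counter

def common_threads(marked_combinations):
    """Threads that appear in more than one marked (bad) combination."""
    counts = Counter(t for combo in marked_combinations for t in set(combo))
    return sorted(t for t, n in counts.items() if n > 1)

def find_unfriendly(marked_combinations, still_marked_after_move, k=4):
    """Test at most k common threads.

    A thread is reported when moving it away clears every marked
    combination it belonged to.
    """
    unfriendly = []
    for tid in common_threads(marked_combinations)[:k]:
        affected = [c for c in marked_combinations if tid in c]
        if not any(still_marked_after_move(c, tid) for c in affected):
            unfriendly.append(tid)
    return unfriendly

marked = [("a", "b"), ("a", "c"), ("d", "e")]
suspects = common_threads(marked)
```

  • If `find_unfriendly` returns an empty list, the fallback described above (random reassignment) would apply.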
  • a system/method 600 with a global arbiter 602 and thread assignment scheme selection is depicted.
  • the global arbiter 602 makes a thread assignment method selection based on a number of parameters including, but not limited to: performance measured in throughput and IPC, power dissipation, functional unit level hotspot temperatures, priority constraints, power saving modes, etc.
  • the arbiter 602 selects from a number of base methods, each with different power/performance efficiency characteristics.
  • the arbiter 602 assigns different weights to the methods to achieve hybrid schemes 612 depending on the power/performance characteristics.
  • a first scheme 604 is based on thread friendliness, where the threads are assigned based on the compatibility of resource usage and requirements.
  • a second scheme 606 is based on the performance characteristics of the threads (estimated from run-time characterization table). The threads are assigned to high/medium/low performance bins.
  • a third scheme 608 isolates the threads with high resource requirements (which degrades the performance of the threads that they are assigned with).
  • a fourth scheme 610 assigns threads randomly.
  • the global arbiter 602 assigns weights (W 1 -W 4 ) to the four schemes to form hybrid schemes 612 to meet the power/performance/temperature constraints.
  • the process repeats and decisions are remade in periodic intervals, e.g. every N cycles.
  • the hybrid schemes 612 may be customized in accordance with the weights.
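  • One possible reading of the weighting is sketched below: each base scheme scores a candidate thread grouping, and the arbiter picks the candidate with the highest weighted sum. This interpretation, the scoring functions, and the sample numbers are assumptions; the description does not specify how the weights W 1 -W 4 are combined.

```python
def hybrid_score(candidate, schemes, weights):
    """Weighted sum of the per-scheme scores for one candidate grouping."""
    return sum(w * scheme(candidate) for scheme, w in zip(schemes, weights))

def pick_assignment(candidates, schemes, weights):
    """Return the candidate grouping with the highest hybrid score."""
    return max(candidates, key=lambda c: hybrid_score(c, schemes, weights))

# Hypothetical per-scheme scores for two candidate thread pairs.
friendliness = {("a", "b"): 0.9, ("a", "c"): 0.2}
combined_ipc = {("a", "b"): 1.0, ("a", "c"): 3.0}
schemes = [friendliness.get, combined_ipc.get]
candidates = [("a", "b"), ("a", "c")]

friendly_choice = pick_assignment(candidates, schemes, weights=(1.0, 0.0))
ipc_choice = pick_assignment(candidates, schemes, weights=(0.0, 1.0))
hybrid_choice = pick_assignment(candidates, schemes, weights=(0.5, 0.5))
```

  • Setting one weight to 1 and the rest to 0 recovers a pure base scheme, so the base schemes are special cases of the hybrid under this reading.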

Abstract

A system and method for improving efficiency of a multi-core architecture includes, in accordance with a workload, determining a number of cores to shut down based upon a metric that combines parameters to represent operational efficiency. Threads of the workload are reassigned to cores remaining active by assigning threads based on priority constraints and thread execution history to improve the operational efficiency of the multi-core architecture.

Description

    GOVERNMENT RIGHTS
  • This invention was made with Government support under Contract No.: HR0011-07-9-0002 awarded by the Defense Advanced Research Projects Agency (DARPA). The Government has certain rights in this invention.
  • BACKGROUND
  • 1. Technical Field
  • The present invention relates to integrated circuit management systems and more particularly to systems and methods which reduce energy usage and improve performance of integrated circuit processing cores especially during low utilization periods.
  • 2. Description of the Related Art
  • Modern microprocessor architecture trends indicate increased numbers of cores—each core running multiple threads on a same chip. The elevated complexity of the on-chip resources as well as fluctuations in the application demand generates a significant resource management challenge, especially in terms of energy efficiency and performance. Recent studies on data server workload characteristics indicate that long periods of low utilization are common in data centers, such as, web servers, video/news-on-demand applications, banking centers etc.
  • During these low utilization periods, the data centers still require active chips to perform needed tasks. This results in a large power draw even though the full functionality of the component integrated circuits is not needed.
  • SUMMARY
  • Resource optimization can benefit even the highly utilized systems in terms of performance and energy efficiency. Energy efficiency and performance can be optimized using techniques described herein. The energy efficiency and relative performance within a core or in an SMT setting is relatively low during the low utilization periods. To serve the tasks in a job queue, a server chip still needs to stay active—even if the number of tasks is significantly lower than the full capacity.
  • In accordance with the present principles, systems and methods are provided that improve the energy efficiency and/or performance of servers at low utilization periods through task assignment and turning off cores. The present principles can benefit the energy efficiency and performance during all periods of operation, e.g., even during moderate-high utilization periods.
  • A system and method for improving efficiency of a multi-core architecture includes, in accordance with a workload, determining a number of cores to shut down based upon a metric that combines parameters to represent operational efficiency. Threads of the workload are reassigned to cores remaining active by assigning threads based on priority constraints and thread execution history to improve the operational efficiency of the multi-core architecture.
  • A system and method for improving efficiency of a multi-core architecture includes in accordance with a workload, determining a number of cores to shut down based upon a metric that combines parameters to represent operational efficiency based on a run-time sensor and hardware counters where information is processed simultaneously to identify problem threads and reassign threads without a profiling phase, and reassigning threads of the workload to cores remaining active by assigning threads based on priority constraints and thread execution history to improve the operational efficiency of the multi-core architecture.
  • These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
  • FIG. 1 is a block/flow diagram showing a core-shutdown and thread assignment system/method in accordance with one illustrative embodiment;
  • FIG. 2 is a block/flow diagram showing a core-shutdown system/method in accordance with one illustrative embodiment;
  • FIG. 3 is a block/flow diagram showing a thread assignment system/method in accordance with one illustrative embodiment;
  • FIG. 4 is a thread history table in accordance with one illustrative embodiment;
  • FIG. 5 is a block/flow diagram showing a thread assignment system/method in accordance with another illustrative embodiment; and
  • FIG. 6 is a block/flow diagram showing a thread assignment selection system/method in accordance with one illustrative embodiment.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • The present principles provide methods and systems for improving energy efficiency and performance of server architectures especially during low utilization periods by shutting down processing units and through hardware-aware thread reassignments. The present principles may be used even at high or moderate utilization periods.
  • In one embodiment, a core shut-down scheme includes one or more processing units/cores which are switched to an off state based on a metric (e.g., designated as “m” or “M”), which accounts for one or more of the following: core-level and chip-level utilization, number of threads in the task queue, memory accesses and power/performance constraints, ratio of dynamic power versus maximum power (Pdyn/Pmax) per core and length of the low utilization period, etc. Other metrics may also be employed (e.g., temperature, etc.). The core scheme preferably gets activated during low utilization periods longer than a threshold “t” with a utilization level lower than 1 (e.g., 100%).
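  • The activation condition can be illustrated with a short sketch. It assumes utilization is sampled at regular intervals as a fraction between 0 and 1, and that the scheme becomes active when the most recent run of samples below a utilization threshold is at least as long as the period threshold "t"; the function and parameter names are hypothetical.

```python
def shutdown_scheme_active(utilization_samples, util_threshold, min_period):
    """True if the trailing run of low-utilization samples is long enough."""
    run = 0
    for u in utilization_samples:
        # Extend the current low-utilization run, or reset it.
        run = run + 1 if u < util_threshold else 0
    return run >= min_period

sustained = shutdown_scheme_active([0.9, 0.3, 0.2, 0.1], 0.5, min_period=3)
interrupted = shutdown_scheme_active([0.3, 0.2, 0.9, 0.1], 0.5, min_period=2)
```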
  • In another embodiment, core activation/deactivation may be provided based upon hardware or priority constraints. In one example, a multi-core shared level 2 (L2) architecture may include one or more cores from each node which are shut-down, whereas shared L2 caches may be kept active due to unique characteristics of the architecture. The threads are assigned to the remaining active cores in the node depending on priority and other constraints. The number of active core constraints may be relaxed for performance constraints. The active cores in the n-core node may provide increased L2 access (through interconnect configurations, partitioning priorities, etc.).
  • Simultaneous multi-threading (SMT) modes of the existing cores may be re-adjusted to compensate for the inactive cores. A scheduler along with a resource manager assigns SMT modes to cores according to the SMT flags of the existing threads in the queue and the above power/performance constraints. Each thread has a tag for SMT mode preference, depending on job priority, resource usage, memory access patterns, etc. for maximum efficiency.
  • In another embodiment, thread assignment and migration includes a minimum overhead thread assignment history table which enables leakage power and performance aware thread assignment to the cores by storing power dissipation, performance, memory accesses for thread combinations dynamically at run time from the existing thread combinations. Note that since there is no special profiling phase and dedicated cores, there is no computational overhead. The size of the thread assignment history table is minimized by storing the combinations with performance numbers below a pre-determined threshold. This unwanted list stays active as long as the thread combinations remain in the task queue. As the threads are executed to completion, they are replaced with new combinations as the system adaptively learns the new combinations. The most wanted list is stored as well as the unwanted (“bad”) list. The scheduler finds these threads in the queue and assigns them together.
  • This methodology employs an iterative scheme to isolate the threads which cause reduced throughput (by identifying the common threads in all unwanted list items as well as looking at available single thread performance, if possible). These threads are clustered and run on a minimum number of cores, and as a result, the overall system performance is improved.
  • In another embodiment, an instruction per cycle (IPC)-Hybrid scheme employs threads which are assigned according to the estimated single thread performance as well as a desired SMT mode. The thread assignment queue has corresponding bins for high to low performance threads. The thread scheduler assigns threads from respective queues giving priority to the assignment of high performance threads assigned together within the SMT constraints of the threads.
  • A simultaneous multi-threading 2 (SMT2) based performance analysis for threads may be performed. As a result, each thread is assigned a “thread friendliness” factor for SMT2. These numbers are used for extrapolating other SMT modes. The memory and CPU behavior are observed during these runs. Hence, the operating system assesses the qualities of the thread for future assignment. The threads are grouped into a memory log and specific resource contention classes (FP, FX, or other known instructions). The assignment is based on these simple thread qualities. High priority threads are assigned to high-performance cores due to process variability for maximum efficiency.
  • A thread is a sequence of tasks that can be performed by a processing core. In general, a thread is included inside a process, and different threads in the same process may share resources. Multithreading includes parallel execution of multiple tasks or threads where the processor or processing core switches between different threads. Operating systems support threading with a process scheduler. The operating system kernel permits programmers to manipulate threads via a system call interface.
  • Embodiments of the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment including both hardware and software elements. In a preferred embodiment, the present invention is implemented in hardware, but may include software elements. Software elements may include but are not limited to firmware, resident software, microcode, etc.
  • Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that may include, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
  • A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
  • The systems described herein may be part of the design for integrated circuits, chips or boards. The design of the systems, chips or boards is created in a graphical computer programming language, and stored in a computer storage medium (such as a disk, tape, physical hard drive, or virtual hard drive such as in a storage access network). If the designer does not fabricate chips or the photolithographic masks used to fabricate chips, the designer transmits the resulting design by physical means (e.g., by providing a copy of the storage medium storing the design) or electronically (e.g., through the Internet) to such entities, directly or indirectly. The stored design is then converted into the appropriate format (e.g., Graphic Data System II (GDSII)) for the fabrication of photolithographic masks, which typically include multiple copies of the chip design in question that are to be formed on a wafer. The photolithographic masks are utilized to define areas of the wafer (and/or the layers thereon) to be etched or otherwise processed.
  • The resulting integrated circuit chips/boards can be distributed by the fabricator in raw wafer form (that is, as a single wafer that has multiple unpackaged chips), as a bare die, or in a packaged form. In the latter case the chip is mounted in a single chip package (such as a plastic carrier, with leads that are affixed to a motherboard or other higher level carrier) or in a multi-chip package (such as a ceramic carrier that has either or both surface interconnections or buried interconnections). In any case the chip is then integrated with other chips, discrete circuit elements, and/or other signal processing devices as part of either (a) an intermediate product, such as a motherboard, or (b) an end product. The end product can be any product that includes integrated circuit chips, ranging from toys and other low-end applications to advanced computer products having a display, a keyboard or other input device, and a central processor.
  • Referring now to the drawings in which like numerals represent the same or similar elements and initially to FIG. 1, a core-shutdown and thread assignment system 100 is applied to a system 102 of chips, boards or other units to improve the performance and energy efficiency of multi-core architectures, especially during phases of low utilization. High-level execution of an illustrative system includes measuring current chip utilization of a plurality of chips 104, 106 and 108 in terms of characteristics such as, for example, power, performance, temperature, thread profiles, job queue utilization, job deadlines, etc. In a particularly useful embodiment, chip utilization characteristics include, e.g., a number of active threads in the job queue compared to the maximum capacity, dynamic power dissipation of cores compared to the maximum power (indicating high leakage power percentage) and length of low utilization mode.
  • Chips 104, 106 and 108 may be of the same type or different types, may be of any number and may each include cores implementing different SMT modes, e.g., ST, SMT, SMT2, SMT4, etc. It should be understood that the chips may be mounted on boards and may include systems on a chip, individual chips or any other circuit unit or components. In a particularly useful embodiment, the chips 104, 106 and 108 include processing cores 110. A processing core 110 may include a processor chip designed to handle computations and other tasks provided in a job queue of a scheduler 120.
  • A thread history table 122 is employed to track previous assignments of tasks on the cores 110 to extract performance characteristics of threads executed or performed by the cores 110. Power/performance constraints 124 are employed to make a state selection 126 and thread assignment decisions for the chips 104, 106, 108 and/or cores 110. On-chip power/performance and thermal measurements 128 (or other information) are collected periodically to assist the decision making process. This information is provided to a control module 133 which determines state selections for cores 110 and otherwise allocates core activity.
  • The thread history table 122 may include other information used in assignment decisions. For example, temperature information may be stored in the thread history table 122, such that threads can be assigned based on the resulting thermal profile of a combination of threads. For example, even if the thread combination yields high performance, the thread combination may not be desirable if the combination raises the temperature above a critical temperature threshold.
  • After a chip utilization program 130 is assessed in a computer processing unit (CPU) 111, a control module 133 reassigns the workload onto an active subset of cores. The activity of the cores 110 in FIG. 1 is indicated by different hatchings. The CPU 111 assesses utilization of the cores 110 from program 130 and inputs this information to the module 133. For illustrative purposes, those cores with different hatchings in FIG. 1 have varying degrees of activity while those cores with no fill are inactive. Note that purely reassigning threads to the fewest number of cores 110 may result in significant performance degradation. Hence, the thread assignment is used to compensate for a potential performance drop. In certain cases, the overall performance may improve with the task assignment scheme. The number of active cores is calculated to match the overall chip utilization. The remaining cores are shut down or put into lower power states for leakage savings.
  • Assigning the threads to a fewer number of cores in simultaneous multi-threading may also cause performance degradation. The present methods, as implemented by module 133, perform thread clustering with minimum performance degradation onto as few cores as possible. During the thread assignment, thread characteristics are taken into consideration for performance.
  • The methods employed by module 133 may include but are not limited to the following. It should be understood that additional methods may be employed or these methods may be combined or alternately applied as needed.
  • In one embodiment, 1) Friendliness-based thread clustering: This is employed based on SMT analysis of the threads. For example, the SMT4 version has higher overhead due to the increased number of combinations it handles.
  • 2) Pure IPC-based thread clustering: This may be employed where the high-IPC threads are clustered on a same core.
  • 3) Hostile thread isolation: In this scheme, the threads with low performance potential in general are isolated, and they are assigned to special cores so that the remaining cores can achieve higher IPC.
  • 4) Randomized-IPC hybrid: Assigns threads to a number of IPC bins, e.g., high/intermediate/low IPC. The assignment from within the bin is random, yet the scheme resembles the pure IPC scheme in terms of the higher level method steps.
  • 5) IPC-Friendly hybrid: Unlike the pure IPC or pure friendly-threads based methods, this method only considers the symbiotic combinations with high IPC; the rest of the assignments are random.
  • 6) Memory bound based scheme: assigns threads based on memory characteristics.
  • To reduce the analysis overhead in module 133, two bins may be employed for the IPC-Friendliness determination. These bins (e.g., good and bad) may be employed to separate threads based on high performance and memory intensity. By reducing the number of combinations, this improves the efficiency of assignment—yet provides benefits comparable to tracking all possible combinations.
  • The number of active cores at any point in time depends on the workload demand and power saving mode. In general, the present principles optimize the energy efficiency by matching the overall chip utilization to the percentage of active cores. Discrete settings of percentages, e.g., 25%, 50%, 75% and 100% for core activity, are also a possible implementation/embodiment. These setting percentages may be applied to the chip (104, 106 or 108) or applied to individual cores 110.
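  • Matching the active-core percentage to chip utilization with discrete settings can be sketched as follows. Rounding utilization up to the next setting (so that capacity is never below demand) and keeping at least one core active are assumptions of this sketch, as are the function and parameter names.

```python
import math

def active_core_count(utilization, total_cores,
                      settings=(0.25, 0.5, 0.75, 1.0)):
    """Number of cores to keep active for a utilization in [0, 1]."""
    # Pick the smallest discrete setting that covers the utilization.
    fraction = next((s for s in sorted(settings) if s >= utilization), 1.0)
    return max(1, math.ceil(fraction * total_cores))

light_load = active_core_count(0.30, total_cores=8)
heavy_load = active_core_count(0.80, total_cores=8)
```

  • The remaining `total_cores - active_core_count(...)` cores would be the candidates for shut-down or a lower power state.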
  • In accordance with the present principles, power/performance trade-offs in different SMT modes and multi-threading modes are employed as a “power knob” to control the core activity. The simultaneous multi-threading mode (single thread (ST), SMT2, SMT4, etc.) affects the energy efficiency according to our experimental analysis.
  • Module 133 sets different cores 110 on a processor or system 102 to different operation modes depending on the power saving mode and the application demand. Some cores 110 are over-clocked to compensate for the performance degradation in the inactive cores, while others may be put into power saving modes. By employing the heterogeneity in the applications and the core performance (e.g., SMT mode and power saving mode) both performance and energy may be optimized simultaneously.
  • Optimization may be performed using an objective function to compare different scenarios in module 133. The best case scenario may be selected or a scenario that meets particular constraints may be selected. For example, cores in single-thread (ST) mode may be used for threads with single-thread high performance requirements as well as deadline restrictions. Higher levels of multi-threading may be used for threads which do benefit from SMT or do not have strict deadline restrictions—as well as deeper power saving modes.
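  • Scenario selection with an objective function can be illustrated as below. The objective (performance per unit power), the dictionary keys, and the optional power cap and minimum performance constraints are assumptions for illustration; the description does not fix a particular objective function.

```python
def best_scenario(scenarios, power_cap=None, min_performance=None):
    """Pick the feasible scenario with the best performance/power ratio.

    Each scenario is a dict with 'name', 'performance' and 'power' (> 0).
    Returns None when no scenario satisfies the constraints.
    """
    feasible = [
        s for s in scenarios
        if (power_cap is None or s["power"] <= power_cap)
        and (min_performance is None or s["performance"] >= min_performance)
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda s: s["performance"] / s["power"])

scenarios = [
    {"name": "all cores, ST", "performance": 10.0, "power": 50.0},
    {"name": "half cores, SMT2", "performance": 8.0, "power": 20.0},
    {"name": "quarter cores, SMT4", "performance": 4.0, "power": 8.0},
]
unconstrained = best_scenario(scenarios)
with_deadline = best_scenario(scenarios, min_performance=6.0)
```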
  • In accordance with the present principles, module 133 and system 100 may be implemented in software with hardware assistance. System 100 may be employed as an operating system scheduler or may be employed as a separate system for monitoring and controlling processing systems and multi-core processors.
  • Referring to FIG. 2, a system/method for improving energy efficiency of server architectures or multi-core processors, especially during low utilization periods is illustratively shown. The system/method includes selectively shutting down processing units/cores and hardware-aware thread reassignments. The method/system is preferably implemented by module 133 with inputs as depicted in FIG. 1, and may be employed during regular operation to monitor and optimize core usage. The system/method may be triggered under present system operating conditions. For example, the method gets activated during low utilization periods longer than a threshold period with a utilization level lower than 1 (100%). Other criteria may also be employed.
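  • The activation criterion described above (a low utilization period longer than a threshold period) might be sketched as follows. Representing the period as a count of consecutive utilization samples is an assumption of this sketch, as are the function and parameter names.

```python
from typing import Sequence


def should_activate(utilization_samples: Sequence[float],
                    utilization_threshold: float,
                    min_low_samples: int) -> bool:
    """True when the latest `min_low_samples` samples are all below the threshold."""
    if min_low_samples <= 0:
        raise ValueError("min_low_samples must be positive")
    if len(utilization_samples) < min_low_samples:
        return False  # not enough history to call the period "long"
    recent = utilization_samples[-min_low_samples:]
    return all(u < utilization_threshold for u in recent)
```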
  • In block 202, core shut-down operations are described. In block 204, core evaluation is performed by gathering information or taking measurements on utilization, power and performance of each core. This information may include the IPC of each core, core power, temperature and thread IPC measurements under given conditions. These values may be initialized to a set value and updated during the execution of the method as will be described.
  • The method has special modes for power and performance caps. While we are trying to improve the efficiency of the chip, we are still bound by the power and performance caps and similar restrictions, which affect the way the methods run in these modes. These may be checked in block 206. In block 206, utilization is computed or updated as a function of core and/or chip utilization. This function may be based on the design of the chip or core, or based on performance metrics or the like. The utilization is preferably expressed as a percentage.
  • In block 208, a comparison between a threshold and the utilization is made. The threshold may be user selected or dynamically computed based on the application or conditions. If the utilization is not less than the threshold, the program path returns to block 204. The utilization parameters are periodically updated. This preferably includes a check of the metric M.
  • If the utilization is less than the threshold, the program path goes to block 210, where a determination is made as to whether power or performance constraints exist that can be applied. If yes, then in block 212, thread reassignment based upon runtime history is applied. If no constraints exist, then the program path returns to block 204.
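  • The branching among blocks 204, 208, 210 and 212 described above can be summarized in a short sketch. The function name and the use of integer block numbers as return values are illustrative assumptions only.

```python
def next_block(utilization: float, threshold: float, constraints_exist: bool) -> int:
    """Return the FIG. 2 block number that the program path goes to next."""
    # Block 208: utilization not less than the threshold returns to block 204.
    if utilization >= threshold:
        return 204
    # Block 210: without applicable power/performance constraints, return to 204.
    if not constraints_exist:
        return 204
    # Otherwise proceed to thread reassignment in block 212.
    return 212
```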
  • In block 214, one or more processing units/cores (or other devices, e.g., caches) are switched to an off state based on the metric, M. For example, M may be computed as:
  • M=alpha_stat*Power_management_mode*[C1*(Σ(Pdyn_core/Pmax_core)/Number_of_Cores)+C2*(Number_of_High_Priority_Threads/Threshold1+Total_Number_of_Threads/Full_Capacity_of_Task_Queue)+1/C3*(Number_of_Memory_Accesses/Preset_Memory_Access_Threshold)+C4*(Target_Throughput/Preset_Threshold)+1/C5*(Average_chip_temperature−Temp_Threshold)/Nominal_target_avg_temp]; where alpha_stat is an experimental constant (static) based on the number of cores and hardware settings (to offset redundant shut-downs); C1-Cn are computational weights (set by the operating system/hypervisor/hardware settings and priority settings); Power_management_mode is a dynamic value provided by the power management mode (if the chip is in high power management mode, the core shut down is more aggressive). The other variables in this example have self-explanatory labels.
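  • For illustration, the example metric M might be evaluated as in the following sketch. The parameter names are hypothetical, and the sketch assumes that the factors written as 1/C3 and 1/C5 are read as (1/C3) and (1/C5) multiplying the terms that follow them.

```python
from typing import Sequence, Tuple


def shutdown_metric(alpha_stat: float,
                    power_management_mode: float,
                    weights: Sequence[float],
                    core_power: Sequence[Tuple[float, float]],
                    high_priority_threads: int,
                    threshold1: float,
                    total_threads: int,
                    task_queue_capacity: int,
                    memory_accesses: float,
                    memory_access_threshold: float,
                    target_throughput: float,
                    throughput_threshold: float,
                    avg_chip_temp: float,
                    temp_threshold: float,
                    nominal_avg_temp: float) -> float:
    """Evaluate the example metric M; core_power holds (Pdyn, Pmax) per core."""
    c1, c2, c3, c4, c5 = weights
    # Average ratio of dynamic to maximum power over all cores.
    power_term = sum(pdyn / pmax for pdyn, pmax in core_power) / len(core_power)
    # Thread pressure: high priority threads plus task queue occupancy.
    thread_term = (high_priority_threads / threshold1
                   + total_threads / task_queue_capacity)
    memory_term = memory_accesses / memory_access_threshold
    throughput_term = target_throughput / throughput_threshold
    temp_term = (avg_chip_temp - temp_threshold) / nominal_avg_temp
    return alpha_stat * power_management_mode * (
        c1 * power_term
        + c2 * thread_term
        + (1.0 / c3) * memory_term
        + c4 * throughput_term
        + (1.0 / c5) * temp_term
    )
```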
  • The metric M is compared to preset values (M1-MN). Depending on the outcome (what range M falls in M1-MN):
      • (1) One or more cores are shut down depending on the utilization;
      • (2) Core SMT (simultaneous multithreading) mode is changed depending on the utilization;
      • (3) One or more cores change voltage and frequency settings: (a) to reduce power dissipation of the inactive cores, the voltage/frequency is reduced; (b) to increase the performance of the active cores, the supply voltage and clock frequency of the active cores are increased;
      • (4) A pre-determined number of cores is guaranteed to be active for each M and C2*(number of high priority threads/Thread Queue capacity).
        Other considerations may also be employed.
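  • Locating the range of preset values M1-MN in which M falls might be sketched as below. The text does not state which range leads to which outcome, so the one-to-one pairing of ranges with the four listed outcomes, the action labels and the function names are all hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical labels for the four listed outcomes, one per range of M.
ACTIONS = (
    "shut_down_cores",
    "change_smt_mode",
    "scale_voltage_frequency",
    "keep_guaranteed_cores_active",
)


def range_index(m: float, boundaries: Sequence[float]) -> int:
    """Return the index of the range that m falls in, given sorted boundaries."""
    ordered = list(boundaries)
    if ordered != sorted(ordered):
        raise ValueError("boundaries must be sorted in ascending order")
    return bisect_right(ordered, m)


def select_action(m: float, boundaries: Sequence[float]) -> str:
    """Map m to one of the outcome labels (assumed pairing, for illustration)."""
    if len(boundaries) != len(ACTIONS) - 1:
        raise ValueError("need exactly one boundary fewer than actions")
    return ACTIONS[range_index(m, boundaries)]
```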
  • Then the scheme shuts down a number of (0-N) cores (e.g., based on the above). Shut-down priority may be based on temperature, such that high temperature cores are shut down first while low temperature cores are kept active.
  • The metric M may be computed based upon an objective function that evaluates and weights different parameters that affect efficiency and performance. For example, M may account for one or more of the following criteria, in block 216: core-level and chip-level utilization, number of threads in a task queue, memory accesses and power/performance constraints, ratio of Pdyn/Pmax per core, temperature, leakage profile, length of the low utilization period, etc.
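  • The temperature-based shut-down priority described above might be sketched as follows. The dictionary layout (core identifier mapped to a temperature reading) and the function name are assumptions of this sketch.

```python
from typing import Dict, List


def cores_to_shut_down(core_temps: Dict[int, float], count: int) -> List[int]:
    """Return the `count` hottest core ids, hottest first."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # Highest temperature first, so the hottest cores are shut down first.
    ranked = sorted(core_temps, key=lambda core: core_temps[core], reverse=True)
    return ranked[:count]
```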
  • A plurality of shut-down methods may be applied to conserve power and improve efficiency. These methods employ adjustment of parameters to improve performance of remaining active cores in block 218. For example, in a multi-core shared level 2 (L2) architecture: one or more cores from each node are shut-down, whereas the shared L2 caches are kept active due to unique characteristics of the architecture. Threads are assigned to remaining active cores in the node and depending on priority constraints to the neighboring nodes. The number of active core constraints may be relaxed for performance constraints.
  • A number of options may be provided when shutting down cores. For example, in a case of shared caches, the cache structures are not shut down every time the cores are off. Instead, tracking of how the threads use caches is employed to decide how many caches need to be shutdown.
  • In accordance with the example, the active cores in the n-core node may be provided increased L2 access (through interconnect configurations, and/or partitioning priorities). This is to enable more efficient utilization of the active cores to permit the shut down of less active cores to improve efficiency. Other steps may also be taken to improve performance, for example, a clock frequency and supply voltage may be increased to provide a performance boost for the active cores and to compensate for the shut down of inactive cores. If all cores are inactive in the n-core node, the shared L2 cache can be brought to a leakage saving state with data retention to preserve the stored memory.
  • Criteria for shut down in block 216 may include that the cores with higher leakage power due to variability are given higher priority for shut-down to alleviate the variation characteristics of the current process technologies. The cores with high temperatures with or without variability are given priority for shut-down due to the higher power savings. SMT modes of the existing cores are re-adjusted in block 218 to compensate for the inactive cores. A scheduler along with a resource manager, which may be present in module 133, may be configured to assign SMT modes to cores according to SMT flags of existing threads in the queue, and the above power/performance constraints. Each thread may have a tag for an SMT mode preference, depending on the job priority, resource usage, memory access patterns etc for maximum efficiency. After deactivating cores, the program path returns to block 204 and repeats to update the core/chip utilization in accordance with the present principles.
  • In block 219, if no cores are shut down, an efficiency mode may be initiated to improve system function. For example, only thread reassignment may be employed in accordance with the present principles.
  • Referring to FIG. 3, a thread assignment and migration system/method 212 is illustratively shown. Thread reassignment is employed as part of FIG. 2, but may be employed separately as a method for performance enhancement of a multi-core system. In block 220 (similar to block 204), core evaluation is performed by gathering information or taking measurements on utilization, power, priority and performance of each core. This information may include the IPC of each core, core power, thread IPC measurements under given conditions, temperature, etc. These values may be initialized to a set value and updated during the execution of the method. These values may be inherited from block 204 of FIG. 2.
  • In block 222, for each thread combination, power, performance, utilization, priority or other metrics may be compared to a minimum threshold (Thrmin) and compared to a maximum threshold (Thrmax). These thresholds may be set based upon desired performance criteria, power criteria or other parameter criteria. The threshold may be determined by a user or based on some function or other constraint. In one example, thread priorities add another dimension to the thread-reassignment schemes. For example, the threads with high priorities and strict deadline restrictions are not assigned to cores running in SMT4 mode. This may also be employed as another item tracked for making a core shut down decision.
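  • The priority rule in the example above (high priority threads with strict deadlines are not assigned to cores running in SMT4 mode) might be sketched as a simple filter. The mode strings, the dictionary layout and the function name are hypothetical.

```python
from typing import Dict, List


def eligible_cores(high_priority: bool,
                   strict_deadline: bool,
                   core_smt_modes: Dict[int, str]) -> List[int]:
    """Return core ids a thread may be assigned to, in ascending id order."""
    # Only threads that are both high priority and deadline bound are restricted.
    restricted = high_priority and strict_deadline
    return [core for core, mode in sorted(core_smt_modes.items())
            if not (restricted and mode == "SMT4")]
```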
  • If the parameter is less than the Thrmax, the thread combination is checked against a known good combination list in block 224. In block 228, the number of cores that are assigned threads from the good combination list is maximized. This type of thread combination from the thread queue may be referred to as friendly or compatible.
  • If the parameter is less than the Thrmin, and the thread combination is on a known bad combination list in block 226, the thread or threads are reassigned in block 230. The reassignment may be based on thread IPC, a known bad combination (switch to a better combination), random reassignment, etc. In one embodiment, the bad listed threads or thread combinations are isolated to particular cores to improve performance in block 232. In block 234, remaining threads are assigned. This may be performed randomly or based upon design criteria of the system or application.
  • The thread assignment can also be used for increasing the efficiency of a multi-core architecture at high utilization periods, where no core shut-down is necessary. The present principles may cover regular management of threads; not just in situations when the threads get reassigned for core shut-down cases.
  • Referring to FIG. 4, an illustrative thread assignment history table 300 (or 122) is shown. Table 300 is preferably configured to have minimum overhead/costs (based on memory, power, etc.). Thread assignment history table 300 enables leakage, power, temperature, priority and performance aware thread assignment to the cores by storing power dissipation, performance, memory accesses for thread combinations dynamically at run time from the existing thread combinations. Since there is no special profiling phase and dedicated cores, there is no computational overhead. Since no special profiling phase is employed, the thread table is dynamically filled in real-time when the system is running.
  • The size of the thread assignment history table is minimized by storing the combinations with performance numbers below a pre-determined threshold. This unwanted or bad list stays active as long as the thread combinations remain in the task queue. As the threads are executed to completion, they are replaced with new combinations—as the system adaptively learns the new combinations. The most wanted list or good combination list is stored as well as the unwanted list or bad combination list. The scheduler finds these thread types in the queue and assigns them in accordance with constraints for improving performance.
  • This is performed by iteratively reviewing the thread combinations in the history table 300 and isolating the threads which cause reduced throughput (by identifying the common threads in all unwanted list items as well as looking at available single thread performance, if possible). These threads are clustered to run on a minimum number of cores, and as a result, the overall system performance is improved.
  • The thread history table 300 may include an area 302 which identifies thread pairs (e.g., Thri-Thrj). In area 304, power information for the given pair is provided. In area 306, performance information for the given pair is provided. In area 308, a combined parameter, e.g., performance/power is provided. In area 309, temperature information may be stored such that threads can be assigned based on the resulting thermal profile of a combination of threads. For example, even if the thread combination yields high performance, the thread combination may not be desirable if the combination raises the temperature above a critical temperature threshold.
  • In area 310, a number of memory accesses is provided for the given pair of threads. In area 311, priority information may be provided. In one example, thread priorities may add another dimension to the thread-reassignment. For example, the threads with high priorities and strict deadline restrictions are not assigned to cores running in, say, SMT4 mode. This may also be employed as another item tracked for making a core shut down decision.
  • The table 300 ranks these parameters in accordance with a formula, which may include weighting factors for each parameter. So, for example, in one embodiment, performance may be employed to rank the thread pairs, in another embodiment, it may be a combination of parameters, e.g., performance/power. In addition, threads or thread pairs may be assigned a priority, which may be incorporated in the ranking methodology. In still another embodiment, all or a subset of the parameters may be combined in a formula to determine the rank of the thread pairs.
  • Data on thread history may be updated as the thread pair is assigned and executed to build the table 300 with thread history based on information on thread combinations, power, performance, number of occurrences and the like for the thread pairs and/or threads. Combinations with unfavorable IPC, power, performance/power, etc. move to a bad or unwanted combination list 312. Known good combinations move to a good combination list 314. The two lists are separated by set criteria. In this example, a threshold 1 (316) and a threshold 2 (318) designate the lists. The thresholds correspond to Thrmin and Thrmax as described above. In this example, the thresholds are for comparison with the performance/power parameter. However, any parameter or combination of parameters may be employed.
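  • A minimal sketch of a run-time thread history table is given below. It assumes that the performance/power parameter is used for ranking, that higher values are better, that pairs below the lower threshold go to the bad list and pairs at or above the upper threshold go to the good list, and that measurements are combined as a running average. None of these choices is mandated by the text, and all class, field and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class PairRecord:
    power: float = 0.0        # running average power for the pair
    performance: float = 0.0  # running average performance for the pair
    occurrences: int = 0      # number of times the pair has been observed

    @property
    def perf_per_power(self) -> float:
        return self.performance / self.power if self.power > 0 else 0.0


class ThreadHistoryTable:
    def __init__(self, thr_min: float, thr_max: float) -> None:
        self.thr_min = thr_min
        self.thr_max = thr_max
        self.records: Dict[FrozenSet[str], PairRecord] = {}

    def update(self, a: str, b: str, power: float, performance: float) -> None:
        """Fold one run-time measurement into the pair's running averages."""
        rec = self.records.setdefault(frozenset((a, b)), PairRecord())
        n = rec.occurrences
        rec.power = (rec.power * n + power) / (n + 1)
        rec.performance = (rec.performance * n + performance) / (n + 1)
        rec.occurrences = n + 1

    def classify(self, a: str, b: str) -> str:
        """Return 'good', 'bad', 'neutral', or 'unknown' for a thread pair."""
        rec = self.records.get(frozenset((a, b)))
        if rec is None:
            return "unknown"
        value = rec.perf_per_power
        if value < self.thr_min:
            return "bad"
        if value >= self.thr_max:
            return "good"
        return "neutral"
```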
  • The ranking or positioning of the threads or thread pairs in the table 300 may be weighted or performed in accordance with a plurality of policies or constraints. The following is a non-limiting description of a few illustrative schemes for reassigning threads in accordance with the present principles.
  • In an IPC-Hybrid Scheme, threads are assigned according to an estimated single thread performance as well as a desired SMT mode. The thread assignment queue/table 300 has corresponding bins 320 for high to low performance threads. A thread scheduler (not shown) assigns threads from respective queues giving priority to the assignment of high performance threads assigned together within the SMT constraints of the threads.
  • SMT2 based performance analysis for threads may be employed, and as a result each thread is assigned a thread friendliness factor for SMT2. These numbers are used for extrapolating other SMT modes. The memory and CPU behavior can be observed during these runs; hence, the operating system assesses the qualities of the thread for future assignment. The threads are grouped into a memory log and specific resource contention classes (FP, FX, etc). The assignment is based on these simple thread qualities. High priority threads are assigned to high-performance cores (110) due to process variability for maximum efficiency.
  • Other schemes may be employed for thread reassignment policies. The reassignments may include specific cores or random cores to improve performance. Advantageously, the core management embodiments described herein provide multiple degrees of freedom in which to more efficiently improve power and performance of a system. Such degrees include activation/deactivation of one or more cores, adjusting the utilization of one or more cores, reassigning threads, adjusting the type of threads run (adjusting multithreading modes), designating particular threads to run on particular cores, and any combinations of these.
  • Referring to FIG. 5, a system/method in accordance with another illustrative embodiment is shown. In block 502, data from one or more of a sensor, a hardware counter and a power monitor (power dissipation data) are collected. In block 504, run-time characterization tables (FIG. 4) are updated according to the information collected in block 502. The data may be collected for an entire chip, a processing core, a functional unit, a thread combination, or on an individual thread basis. Note that the data for the characterization table is not generated through dedicated runs, such as test runs or profiling runs. Instead, the data is collected through a normal operation period; hence, there is no additional profiling overhead associated with filling/updating the tables. Additional processing of the table data may be employed to extract the useful data out of the regular runs—without specialized profiling runs.
  • The size of the characterization table is minimized by storing only significant thread combinations in terms of power, performance or temperature (e.g., in both best and worst performing edges of the spectrum). In block 506, values stored in the characterization table are checked at run time. The method scans through the list of worst performing thread combinations (e.g., the marked list). The instructions per cycle (IPC), power and temperature as measured are compared with thresholds.
  • Because there is no dedicated profiling phase, the method does not know which thread or threads (Ti) are causing the marked thread combination's unfavorable characteristics. The list of marked combinations is scanned, and common threads that appear in more than one marked combination are found in block 508. In block 512, the method starts reassigning the common elements to other dedicated cores to identify whether that particular thread is causing the problem. This includes reassigning the thread to another core with known performance and/or collecting data to note changes to a marked combination. If the thread is not in a marked combination, random threads may be selected and assigned to cores running known good combinations of threads in block 510.
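  • Finding the common threads of block 508 might be sketched as follows. Representing each marked combination as a tuple of thread identifiers, and ordering the result by how often a thread appears, are assumptions of this sketch.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def common_threads(marked_combinations: Iterable[Sequence[str]]) -> List[str]:
    """Return threads seen in more than one marked combination, most frequent first."""
    # Count each thread at most once per combination.
    counts = Counter(thread
                     for combo in marked_combinations
                     for thread in set(combo))
    suspects = [thread for thread, n in counts.items() if n > 1]
    # Sort by descending count, then by name for a deterministic order.
    return sorted(suspects, key=lambda thread: (-counts[thread], thread))
```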
  • In block 514, based on the collected data, a determination as to whether the thread or combination should be added to the marked list is made. If yes, the thread is assigned to the bad thread list in block 516.
  • In block 518, a number of iterations may be performed to finish all the common threads in the marked list. If there is no thread identified as unfriendly, the program moves one thread randomly to a different core in block 510. If the resulting combination is out of the marked list, this indicates that the thread was using most of the resources ineffectively and hence disturbing the power efficiency of the system. If no thread exhibits the aforementioned qualities, the method quits after a predetermined number (k) of iterations and continues with random thread assignment schemes.
  • Referring to FIG. 6, a system/method 600 with a global arbiter 602 and thread assignment scheme selection is depicted. The global arbiter 602 makes a thread assignment method selection based on a number of parameters including, but not limited to: performance measured in throughput and IPC, power dissipation, functional unit level hotspot temperatures, priority constraints, power saving modes, etc. The arbiter 602 then selects from a number of base methods, each with different power/performance efficiency characteristics. The arbiter 602 assigns different weights to the methods to achieve hybrid schemes 612 depending on the power/performance characteristics.
  • For example, a first scheme 604 is based on thread friendliness, where the threads are assigned based on the compatibility of resource usage and requirements. A second scheme 606 is based on the performance characteristics of the threads (estimated from run-time characterization table). The threads are assigned to high/medium/low performance bins. A third scheme 608 isolates the threads with high resource requirements (which degrades the performance of the threads that they are assigned with). A fourth scheme 610 assigns threads randomly.
  • The global arbiter 602 assigns weights (W1-W4) to the four schemes to form hybrid schemes 612 to meet the power/performance/temperature constraints. The process repeats and decisions are remade in periodic intervals, e.g. every N cycles. The hybrid schemes 612 may be customized in accordance with the weights.
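  • One possible reading of a weighted hybrid scheme is that the weights W1-W4 determine how many threads of a batch are handled by each base scheme. The sketch below follows that reading, which is an assumption and not stated in the text. It uses integer weights and a largest-remainder split, and the function name is hypothetical.

```python
from typing import List, Sequence


def split_by_weights(num_threads: int, weights: Sequence[int]) -> List[int]:
    """Split num_threads among schemes in proportion to integer weights."""
    total = sum(weights)
    if num_threads < 0 or total <= 0 or any(w < 0 for w in weights):
        raise ValueError("need non-negative inputs and a positive weight sum")
    # Integer share for each scheme, rounded down.
    shares = [num_threads * w // total for w in weights]
    leftover = num_threads - sum(shares)
    # Hand the leftover threads to the schemes with the largest remainders.
    by_remainder = sorted(range(len(weights)),
                          key=lambda i: (-(num_threads * weights[i] % total), i))
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares
```

    For example, with hypothetical weights of 4, 3, 2 and 1 for schemes 604, 606, 608 and 610, a batch of ten threads is split as four, three, two and one.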
  • Having described preferred embodiments for systems and methods for thread assignment and core turn-off for integrated circuit energy efficiency and high-performance (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed which are within the scope and spirit of the invention as outlined by the appended claims. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

Claims (25)

1. A method for improving efficiency of a multi-core architecture, comprising:
in accordance with a workload, determining a number of cores to shut down based upon a metric that combines parameters to represent operational efficiency based on a run-time sensor and hardware counters where information is processed simultaneously to identify problem threads without a profiling phase; and
reassigning threads of the workload to cores remaining active by assigning threads based on priority constraints and thread execution history to improve the operational efficiency of the multi-core architecture.
2. The method as recited in claim 1, wherein the operational efficiency is determined by measuring one or more of core-level utilization, chip-level utilization, a number of threads in a task queue, temperature, power constraints, performance constraints, a ratio of powers (Pdynamic/Pmaximum) per core and a length of a utilization period.
3. The method as recited in claim 1, further comprising activating the method during a utilization period longer than a time threshold with a utilization level lower than 100%.
4. The method as recited in claim 1, wherein assigning threads based on priority constraints includes assigning threads based upon knowledge of hardware such that devices that need to be kept active are given a higher priority.
5. The method as recited in claim 1, wherein determining a number of cores to shut down includes determining cores to be shut down based upon priority constraints.
6. The method as recited in claim 1, further comprising adjusting operational parameters to optimize performance.
7. The method as recited in claim 1, further comprising adjusting single thread (ST) and simultaneous multi-threading (SMT) modes of the cores to compensate for inactive cores.
8. The method as recited in claim 1, wherein reassigning threads includes assigning threads to the remaining cores based on a thread history table which stores measurement information for previous thread assignments.
9. The method as recited in claim 8, wherein the thread history table stores measurement information for previous thread assignments including one or more of power dissipation, performance, temperature and memory accesses for thread combinations dynamically at run time from existing thread combinations.
10. The method as recited in claim 8, wherein the thread history table includes at least one threshold to differentiate between threads with preferred characteristics and threads with unpreferred characteristics such that assignment priority is given to the threads with preferred characteristics.
11. A computer readable medium comprising a computer readable program for improving efficiency of a multi-core architecture, wherein the computer readable program when executed on a computer causes the computer to perform the steps of:
in accordance with a workload, determining a number of cores to shut down based upon a metric that combines parameters to represent operational efficiency based on a run-time sensor and hardware counters where information is processed simultaneously to identify problem threads without a profiling phase; and
reassigning threads of the workload to cores remaining active by assigning threads based on priority constraints and thread execution history to improve the operational efficiency of the multi-core architecture.
12. A method for improving efficiency of a multi-core architecture, comprising:
in accordance with a workload, determining a number of cores to shut down based upon a metric that combines parameters to represent operational efficiency, and based upon priority constraints of the cores; and
reassigning threads of the workload to cores remaining active by assigning threads based on priority constraints and thread execution history to improve the operational efficiency of the multi-core architecture, including:
adjusting single thread (ST) and simultaneous multi-threading (SMT) modes of the cores to compensate for inactive cores; and
assigning threads to the remaining cores based on a thread history table which stores measurement information for previous thread assignments.
13. The method as recited in claim 12, wherein the operational efficiency is determined by measuring one or more of core-level utilization, chip-level utilization, a number of threads in a task queue, memory accesses, temperature, power constraints, performance constraints, a ratio of powers (Pdynamic/Pmaximum) per core and a length of a utilization period.
14. The method as recited in claim 12, further comprising activating the method during a utilization period longer than a time threshold with a utilization level lower than 100%.
15. The method as recited in claim 12, wherein assigning threads based on priority constraints includes assigning threads based upon knowledge of hardware such that devices that need to be kept active are given a higher priority.
16. The method as recited in claim 12, further comprising adjusting operational parameters to optimize performance.
17. The method as recited in claim 12, wherein the thread history table stores measurement information for previous thread assignments including one or more of power dissipation, performance, and memory accesses for thread combinations dynamically at run time from existing thread combinations.
18. The method as recited in claim 12, wherein the thread history table includes at least one threshold to differentiate between threads with preferred characteristics and threads with unpreferred characteristics such that assignment priority is given to the threads with preferred characteristics.
19. A computer readable medium comprising a computer readable program for improving efficiency of a multi-core architecture, wherein the computer readable program when executed on a computer causes the computer to perform the steps of as recited in claim 12.
20. A system for improving efficiency of a multi-core architecture, comprising:
a processor including:
a scheduler and run-time data collection based hardware and thread characterization table such that the scheduler is configured to schedule a computational workload of tasks for the multi-core architecture, the multi-core architecture including at least one chip having a plurality of cores; and
a control module configured to allocate core activity and assign threads in accordance with the scheduler and one or more of constraints, thread history and measurements from the multi-core architecture such that in accordance with the workload, a number of cores are shut down based upon a metric that combines parameters to represent operational efficiency and threads of the workload are reassigned to cores remaining active by assigning threads based on priority constraints and thread execution history to improve the operational efficiency of the multi-core architecture.
21. The system as recited in claim 20, wherein the operational efficiency includes a measure of one or more of core-level utilization, chip-level utilization, performance measurement per thread per core and chip, a number of threads in a task queue, memory accesses, on-chip peak block temperatures, power constraints, performance constraints, a ratio of powers (Pdynamic/Pmaximum) per core and over the chip and the length of a utilization period.
22. The method as recited in claim 20, wherein the threads are assigned based upon at least one of knowledge of hardware such that devices that need to be kept active are given a higher priority.
23. The system as recited in claim 20, wherein the threads are assigned based upon adjusting single thread (ST) and simultaneous multi-threading (SMT) modes of the cores to compensate for inactive cores.
24. The system as recited in claim 20, further comprising a thread history table which stores measurement information for previous thread assignments, wherein the thread history table includes at least one threshold to differentiate between threads with preferred characteristics and threads with unpreferred characteristics such that assignment priority is given to the threads with preferred characteristics.
25. The system as recited in claim 24, wherein the thread history table stores measurement information for previous thread assignments including one or more of power dissipation, performance, and memory accesses for thread combinations dynamically at run time from existing thread combinations.
US12/164,775 2008-06-30 2008-06-30 Systems and methods for thread assignment and core turn-off for integrated circuit energy efficiency and high-performance Expired - Fee Related US8296773B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/164,775 US8296773B2 (en) 2008-06-30 2008-06-30 Systems and methods for thread assignment and core turn-off for integrated circuit energy efficiency and high-performance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/164,775 US8296773B2 (en) 2008-06-30 2008-06-30 Systems and methods for thread assignment and core turn-off for integrated circuit energy efficiency and high-performance

Publications (2)

Publication Number Publication Date
US20090328055A1 true US20090328055A1 (en) 2009-12-31
US8296773B2 US8296773B2 (en) 2012-10-23

Family

ID=41449242

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/164,775 Expired - Fee Related US8296773B2 (en) 2008-06-30 2008-06-30 Systems and methods for thread assignment and core turn-off for integrated circuit energy efficiency and high-performance

Country Status (1)

Country Link
US (1) US8296773B2 (en)

Cited By (86)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100058086A1 (en) * 2008-08-28 2010-03-04 Industry Academic Cooperation Foundation, Hallym University Energy-efficient multi-core processor
US20100153954A1 (en) * 2008-12-11 2010-06-17 Qualcomm Incorporated Apparatus and Methods for Adaptive Thread Scheduling on Asymmetric Multiprocessor
US20100299541A1 (en) * 2009-05-21 2010-11-25 Kabushiki Kaisha Toshiba Multi-core processor system
US20110023033A1 (en) * 2009-07-23 2011-01-27 Gokhan Memik Scheduling of threads by batch scheduling
US20110023038A1 (en) * 2009-07-22 2011-01-27 Gokhan Memik Batch scheduling with segregation
US20110023037A1 (en) * 2009-07-22 2011-01-27 Gokhan Memik Application selection of memory request scheduling
US20110022871A1 (en) * 2009-07-21 2011-01-27 Bouvier Daniel L System-On-Chip Queue Status Power Management
US20110035533A1 (en) * 2009-08-05 2011-02-10 In Hwan Doh System and method for data-processing
WO2012009252A3 (en) * 2010-07-13 2012-03-22 Advanced Micro Devices, Inc. Dynamic enabling and disabling of simd units in a graphics processor
US20120117403A1 (en) * 2010-11-09 2012-05-10 International Business Machines Corporation Power management for processing capacity upgrade on demand
US20120179938A1 (en) * 2011-01-10 2012-07-12 Dell Products L.P. Methods and Systems for Managing Performance and Power Utilization of a Processor Employing a Fully Multithreaded Load Threshold
US20120198207A1 (en) * 2011-12-22 2012-08-02 Varghese George Asymmetric performance multicore architecture with same instruction set architecture
US20120216064A1 (en) * 2011-02-21 2012-08-23 Samsung Electronics Co., Ltd. Hot-plugging of multi-core processor
US20120260252A1 (en) * 2011-04-08 2012-10-11 International Business Machines Corporation Scheduling software thread execution
US20120284729A1 (en) * 2011-05-03 2012-11-08 Microsoft Corporation Processor state-based thread scheduling
EP2528373A1 (en) * 2010-01-18 2012-11-28 Huawei Technologies Co., Ltd. Method, apparatus and system for reducing power consumption of service system
US20130060555A1 (en) * 2011-06-10 2013-03-07 Qualcomm Incorporated System and Apparatus Modeling Processor Workloads Using Virtual Pulse Chains
US20130103670A1 (en) * 2011-10-21 2013-04-25 International Business Machines Corporation Dynamic smt in parallel database systems
US20130124826A1 (en) * 2011-11-11 2013-05-16 International Business Machines Corporation Optimizing System Throughput By Automatically Altering Thread Co-Execution Based On Operating System Directives
US20130124890A1 (en) * 2010-07-27 2013-05-16 Michael Priel Multi-core processor and method of power management of a multi-core processor
US20130191613A1 (en) * 2012-01-23 2013-07-25 Canon Kabushiki Kaisha Processor control apparatus and method therefor
WO2013107694A1 (en) * 2012-01-19 2013-07-25 International Business Machines Corporation In situ processor re-characterization
US20130205169A1 (en) * 2012-02-03 2013-08-08 Blaine D. Gaither Multiple processing elements
US20130227326A1 (en) * 2012-02-24 2013-08-29 Samsung Electronics Co., Ltd. Apparatus and method for controlling power of electronic device having multi-core
US20130238912A1 (en) * 2010-11-25 2013-09-12 Michael Priel Method and apparatus for managing power in a multi-core processor
US20130275785A1 (en) * 2012-04-17 2013-10-17 Sony Corporation Memory control apparatus, memory control method, information processing apparatus and program
WO2013158116A1 (en) * 2012-04-20 2013-10-24 Hewlett-Packard Development Company, L.P. Voltage regulator control system
US20130283277A1 (en) * 2007-12-31 2013-10-24 Qiong Cai Thread migration to improve power efficiency in a parallel processing environment
US8589665B2 (en) 2010-05-27 2013-11-19 International Business Machines Corporation Instruction set architecture extensions for performing power versus performance tradeoffs
US20130339750A1 (en) * 2012-06-14 2013-12-19 International Business Machines Corporation Reducing decryption latency for encryption processing
US20130339635A1 (en) * 2012-06-14 2013-12-19 International Business Machines Corporation Reducing read latency using a pool of processing cores
US20130346991A1 (en) * 2012-06-22 2013-12-26 Fujitsu Limited Method of controlling information processing apparatus, and information processing apparatus
US20140006852A1 (en) * 2012-06-28 2014-01-02 International Business Machines Corporation 3-d stacked multiprocessor structures and methods to enable reliable operation of processors at speeds above specified limits
US8656408B2 (en) 2010-09-30 2014-02-18 International Business Machines Corporations Scheduling threads in a processor based on instruction type power consumption
US20140059550A1 (en) * 2012-08-24 2014-02-27 Canon Kabushiki Kaisha Information processing apparatus, method of controlling the same, and storage medium
US8677361B2 (en) 2010-09-30 2014-03-18 International Business Machines Corporation Scheduling threads based on an actual power consumption and a predicted new power consumption
US8695008B2 (en) 2011-04-05 2014-04-08 Qualcomm Incorporated Method and system for dynamically controlling power to multiple cores in a multicore processor of a portable computing device
CN103713950A (en) * 2012-10-05 2014-04-09 三星电子株式会社 Computing system including multi-core processor and load balancing method thereof
US20140122910A1 (en) * 2012-10-25 2014-05-01 Inventec Corporation Rack server system and operation method thereof
US8736619B2 (en) 2010-07-20 2014-05-27 Advanced Micro Devices, Inc. Method and system for load optimization for power
US20140157013A1 (en) * 2010-09-09 2014-06-05 International Business Machines Corporation Data center power conversion efficiency management
US20140359633A1 (en) * 2013-06-04 2014-12-04 Advanced Micro Devices, Inc. Thread assignment for power and performance efficiency using multiple power states
US20140359635A1 (en) * 2013-05-31 2014-12-04 International Business Machines Corporation Processing data by using simultaneous multithreading
US20150019805A1 (en) * 2012-10-02 2015-01-15 Canon Kabushiki Kaisha Information processing apparatus, control method for the same, program for the same, and storage medium
US8943252B2 (en) 2012-08-16 2015-01-27 Microsoft Corporation Latency sensitive software interrupt and thread scheduling
US20150033240A1 (en) * 2013-07-23 2015-01-29 Fujitsu Limited Measuring method, a non-transitory computer-readable storage medium, and information processing apparatus
US20150045912A1 (en) * 2012-03-29 2015-02-12 Nec Corporation State control device, control method, and program
EP2851796A1 (en) * 2013-09-24 2015-03-25 Intel Corporation Thread aware power management
US9086883B2 (en) 2011-06-10 2015-07-21 Qualcomm Incorporated System and apparatus for consolidated dynamic frequency/voltage control
US9141436B2 (en) 2011-09-26 2015-09-22 Samsung Electronics Co., Ltd. Apparatus and method for partition scheduling for a processor with cores
US20150278066A1 (en) * 2014-03-25 2015-10-01 Krystallize Technologies, Inc. Cloud computing benchmarking
US20150338902A1 (en) * 2014-05-20 2015-11-26 Qualcomm Incorporated Algorithm For Preferred Core Sequencing To Maximize Performance And Reduce Chip Temperature And Power
WO2015198286A1 (en) 2014-06-26 2015-12-30 Consiglio Nazionale Delle Ricerche Method and system for regulating in real time the clock frequencies of at least one cluster of electronic machines
US20160092363A1 (en) * 2014-09-25 2016-03-31 Intel Corporation Cache-Aware Adaptive Thread Scheduling And Migration
GB2530782A (en) * 2014-10-02 2016-04-06 Ibm Voltage droop reduction in a processor
US9311102B2 (en) 2010-07-13 2016-04-12 Advanced Micro Devices, Inc. Dynamic control of SIMDs
US20160170474A1 (en) * 2013-08-02 2016-06-16 Nec Corporation Power-saving control system, control device, control method, and control program for server equipped with non-volatile memory
WO2016115000A1 (en) * 2015-01-15 2016-07-21 Microsoft Technology Licensing, Llc Hybrid scheduler and power manager
US9400686B2 (en) * 2011-05-10 2016-07-26 International Business Machines Corporation Process grouping for improved cache and memory affinity
US20160224100A1 (en) * 2013-09-09 2016-08-04 Zte Corporation Method and device for processing core of processor , and terminal
US20160266941A1 (en) * 2011-12-15 2016-09-15 Intel Corporation Dynamically Modifying A Power/Performance Tradeoff Based On A Processor Utilization
US20160378471A1 (en) * 2015-06-25 2016-12-29 Intel IP Corporation Instruction and logic for execution context groups for parallel processing
US20160378164A1 (en) * 2015-06-29 2016-12-29 Kyocera Document Solutions Inc. Electronic apparatus and non-transitory computer readable recording medium
US20170017611A1 (en) * 2015-07-13 2017-01-19 Google Inc. Modulating processsor core operations
US20170060644A1 (en) * 2015-08-25 2017-03-02 Konica Minolta, Inc. Image processing apparatus, control task allocation method, and recording medium
WO2017052737A1 (en) * 2015-09-23 2017-03-30 Intel Corporation Task assignment in processor cores based on a statistical power and frequency model
US9880603B1 (en) * 2013-03-13 2018-01-30 Juniper Networks, Inc. Methods and apparatus for clock gating processing modules based on hierarchy and workload
US9904563B2 (en) * 2015-12-18 2018-02-27 Htc Corporation Processor management
US20180067892A1 (en) * 2015-02-27 2018-03-08 Intel Corporation Dynamically updating logical identifiers of cores of a processor
US20190004861A1 (en) * 2017-06-28 2019-01-03 Dell Products L.P. Method to Optimize Core Count for Concurrent Single and Multi-Thread Application Performance
US20190079806A1 (en) * 2018-11-14 2019-03-14 Intel Corporation System, Apparatus And Method For Configurable Control Of Asymmetric Multi-Threading (SMT) On A Per Core Basis
US10241834B2 (en) * 2016-11-29 2019-03-26 International Business Machines Corporation Bandwidth aware resource optimization
US10296067B2 (en) * 2016-04-08 2019-05-21 Qualcomm Incorporated Enhanced dynamic clock and voltage scaling (DCVS) scheme
US10503238B2 (en) * 2016-11-01 2019-12-10 Microsoft Technology Licensing, Llc Thread importance based processor core parking and frequency selection
CN111176419A (en) * 2014-08-22 2020-05-19 英特尔公司 Method and apparatus for estimating power performance of jobs running on multiple nodes of a distributed computer system
US20200225992A1 (en) * 2017-08-23 2020-07-16 Samsung Electronics Co., Ltd. Operating method of operating system and electronic device supporting same
US10719063B2 (en) * 2016-10-06 2020-07-21 Microsoft Technology Licensing, Llc Real-time equipment control
US10817341B1 (en) * 2019-04-10 2020-10-27 EMC IP Holding Company LLC Adaptive tuning of thread weight based on prior activity of a thread
US20210064426A1 (en) * 2019-08-29 2021-03-04 Praveen Kumar GUPTA System, Apparatus And Method For Providing Hardware State Feedback To An Operating System In A Heterogeneous Processor
US11023245B2 (en) * 2018-09-04 2021-06-01 Apple Inc. Serialization floors and deadline driven control for performance optimization of asymmetric multiprocessor systems
US11169845B2 (en) * 2017-12-21 2021-11-09 Ciena Corporation Flow and application based processor scheduling for network functions virtualization applications using flow identification based on logical calculations on frame based fields
US11169586B2 (en) * 2018-06-01 2021-11-09 Samsung Electronics Co., Ltd. Computing device and method of operating the same
US11354768B2 (en) * 2017-04-21 2022-06-07 Intel Corporation Intelligent graphics dispatching mechanism
US11551990B2 (en) * 2017-08-11 2023-01-10 Advanced Micro Devices, Inc. Method and apparatus for providing thermal wear leveling
US11742038B2 (en) * 2017-08-11 2023-08-29 Advanced Micro Devices, Inc. Method and apparatus for providing wear leveling
US11953962B2 (en) * 2022-12-22 2024-04-09 Intel Corporation System, apparatus and method for configurable control of asymmetric multi-threading (SMT) on a per core basis

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101653204B1 (en) * 2010-03-16 2016-09-01 삼성전자주식회사 System and method of dynamically task managing for data parallel processing on multi-core system
US8631253B2 (en) * 2010-08-17 2014-01-14 Red Hat Israel, Ltd. Manager and host-based integrated power saving policy in virtualization systems
US20120297216A1 (en) * 2011-05-19 2012-11-22 International Business Machines Corporation Dynamically selecting active polling or timed waits
CN106909444B (en) * 2011-12-22 2021-01-12 英特尔公司 Instruction processing apparatus for specifying instructions for application thread performance state and related methods
US9003218B2 (en) 2012-05-21 2015-04-07 International Business Machines Corporation Power shifting in multicore platforms by varying SMT levels
US9471395B2 (en) * 2012-08-23 2016-10-18 Nvidia Corporation Processor cluster migration techniques
US9377841B2 (en) * 2013-05-08 2016-06-28 Intel Corporation Adaptively limiting a maximum operating frequency in a multicore processor
US9696787B2 (en) * 2014-12-10 2017-07-04 Qualcomm Innovation Center, Inc. Dynamic control of processors to reduce thermal and power costs
US10073718B2 (en) 2016-01-15 2018-09-11 Intel Corporation Systems, methods and devices for determining work placement on processor cores
US20170300101A1 (en) * 2016-04-14 2017-10-19 Advanced Micro Devices, Inc. Redirecting messages from idle compute units of a processor
US10922137B2 (en) 2016-04-27 2021-02-16 Hewlett Packard Enterprise Development Lp Dynamic thread mapping
US10540300B2 (en) * 2017-02-16 2020-01-21 Qualcomm Incorporated Optimizing network driver performance and power consumption in multi-core processor-based systems
US10372495B2 (en) 2017-02-17 2019-08-06 Qualcomm Incorporated Circuits and methods providing thread assignment for a multi-core processor
US11010330B2 (en) 2018-03-07 2021-05-18 Microsoft Technology Licensing, Llc Integrated circuit operation adjustment using redundant elements

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4814982A (en) * 1984-12-24 1989-03-21 General Electric Company Reconfigurable, multiprocessor system with protected, multiple, memories
US20030115495A1 (en) * 2001-12-13 2003-06-19 International Business Machines Corporation Conserving energy in a data processing system by selectively powering down processors
US20050125556A1 (en) * 2003-12-08 2005-06-09 International Business Machines Corporation Data movement management system and method for a storage area network file system employing the data management application programming interface
US20050188372A1 (en) * 2004-02-20 2005-08-25 Sony Computer Entertainment Inc. Methods and apparatus for processor task migration in a multi-processor system
US20050210472A1 (en) * 2004-03-18 2005-09-22 International Business Machines Corporation Method and data processing system for per-chip thread queuing in a multi-processor system
US20050216222A1 (en) * 2004-03-29 2005-09-29 Sony Computer Entertainment Inc. Methods and apparatus for achieving thermal management using processing task scheduling
US20060095913A1 (en) * 2004-11-03 2006-05-04 Intel Corporation Temperature-based thread scheduling
US20060112391A1 (en) * 2004-11-24 2006-05-25 Hernandez Rafael M Method and apparatus for thread scheduling on multiple processors
US20060190942A1 (en) * 2004-02-20 2006-08-24 Sony Computer Entertainment Inc. Processor task migration over a network in a multi-processor system
US20060212677A1 (en) * 2005-03-15 2006-09-21 Intel Corporation Multicore processor having active and inactive execution cores
US20070074011A1 (en) * 2005-09-28 2007-03-29 Shekhar Borkar Reliable computing with a many-core processor
US20080091974A1 (en) * 2006-10-11 2008-04-17 Denso Corporation Device for controlling a multi-core CPU for mobile body, and operating system for the same
US20080307244A1 (en) * 2007-06-11 2008-12-11 Media Tek, Inc. Method of and Apparatus for Reducing Power Consumption within an Integrated Circuit
US20090164812A1 (en) * 2007-12-19 2009-06-25 Capps Jr Louis B Dynamic processor reconfiguration for low power without reducing performance based on workload execution characteristics
US20090249094A1 (en) * 2008-03-28 2009-10-01 Microsoft Corporation Power-aware thread scheduling and dynamic use of processors
US7730365B1 (en) * 2007-04-30 2010-06-01 Hewlett-Packard Development Company, L.P. Workload management for maintaining redundancy of non-data computer components

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08305761A (en) 1995-05-10 1996-11-22 Canon Inc Device for monitoring production plan
JP4606142B2 (en) 2004-12-01 2011-01-05 株式会社ソニー・コンピュータエンタテインメント Scheduling method, scheduling apparatus, and multiprocessor system

Cited By (168)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8806491B2 (en) * 2007-12-31 2014-08-12 Intel Corporation Thread migration to improve power efficiency in a parallel processing environment
US20130283277A1 (en) * 2007-12-31 2013-10-24 Qiong Cai Thread migration to improve power efficiency in a parallel processing environment
US20100058086A1 (en) * 2008-08-28 2010-03-04 Industry Academic Cooperation Foundation, Hallym University Energy-efficient multi-core processor
JP2012511788A (en) * 2008-12-11 2012-05-24 クアルコム,インコーポレイテッド Apparatus and method for adaptive thread scheduling for asymmetric multiprocessors
US20100153954A1 (en) * 2008-12-11 2010-06-17 Qualcomm Incorporated Apparatus and Methods for Adaptive Thread Scheduling on Asymmetric Multiprocessor
US9043795B2 (en) * 2008-12-11 2015-05-26 Qualcomm Incorporated Apparatus and methods for adaptive thread scheduling on asymmetric multiprocessor
JP2015158938A (en) * 2008-12-11 2015-09-03 クアルコム,インコーポレイテッド Apparatus and methods for adaptive thread scheduling on asymmetric multiprocessor
US20100299541A1 (en) * 2009-05-21 2010-11-25 Kabushiki Kaisha Toshiba Multi-core processor system
US8214679B2 (en) * 2009-05-21 2012-07-03 Kabushiki Kaisha Toshiba Multi-core processor system with thread queue based power management
US20110022871A1 (en) * 2009-07-21 2011-01-27 Bouvier Daniel L System-On-Chip Queue Status Power Management
US8639862B2 (en) * 2009-07-21 2014-01-28 Applied Micro Circuits Corporation System-on-chip queue status power management
US8607234B2 (en) * 2009-07-22 2013-12-10 Empire Technology Development, Llc Batch scheduling with thread segregation and per thread type marking caps
US8799912B2 (en) 2009-07-22 2014-08-05 Empire Technology Development Llc Application selection of memory request scheduling
US20110023037A1 (en) * 2009-07-22 2011-01-27 Gokhan Memik Application selection of memory request scheduling
US20110023038A1 (en) * 2009-07-22 2011-01-27 Gokhan Memik Batch scheduling with segregation
US8839255B2 (en) 2009-07-23 2014-09-16 Empire Technology Development Llc Scheduling of threads by batch scheduling
US20110023033A1 (en) * 2009-07-23 2011-01-27 Gokhan Memik Scheduling of threads by batch scheduling
US20110035533A1 (en) * 2009-08-05 2011-02-10 In Hwan Doh System and method for data-processing
EP2528373A1 (en) * 2010-01-18 2012-11-28 Huawei Technologies Co., Ltd. Method, apparatus and system for reducing power consumption of service system
EP2528373B1 (en) * 2010-01-18 2014-12-03 Huawei Technologies Co., Ltd. Method, apparatus and system for reducing power consumption of service system
US8589665B2 (en) 2010-05-27 2013-11-19 International Business Machines Corporation Instruction set architecture extensions for performing power versus performance tradeoffs
US9311102B2 (en) 2010-07-13 2016-04-12 Advanced Micro Devices, Inc. Dynamic control of SIMDs
CN103080899A (en) * 2010-07-13 2013-05-01 超威半导体公司 Dynamic enabling and disabling of SIMD units in a graphics processor
WO2012009252A3 (en) * 2010-07-13 2012-03-22 Advanced Micro Devices, Inc. Dynamic enabling and disabling of simd units in a graphics processor
US8736619B2 (en) 2010-07-20 2014-05-27 Advanced Micro Devices, Inc. Method and system for load optimization for power
US20130124890A1 (en) * 2010-07-27 2013-05-16 Michael Priel Multi-core processor and method of power management of a multi-core processor
US9823680B2 (en) * 2010-09-09 2017-11-21 International Business Machines Corporation Data center power conversion efficiency management
US10067523B2 (en) 2010-09-09 2018-09-04 International Business Machines Corporation Data center power conversion efficiency management
US20140157013A1 (en) * 2010-09-09 2014-06-05 International Business Machines Corporation Data center power conversion efficiency management
US9459918B2 (en) 2010-09-30 2016-10-04 International Business Machines Corporation Scheduling threads
US8656408B2 (en) 2010-09-30 2014-02-18 International Business Machines Corporations Scheduling threads in a processor based on instruction type power consumption
US8677361B2 (en) 2010-09-30 2014-03-18 International Business Machines Corporation Scheduling threads based on an actual power consumption and a predicted new power consumption
US20120117403A1 (en) * 2010-11-09 2012-05-10 International Business Machines Corporation Power management for processing capacity upgrade on demand
US8627128B2 (en) * 2010-11-09 2014-01-07 International Business Machines Corporation Power management for processing capacity upgrade on demand
US9146608B2 (en) 2010-11-09 2015-09-29 International Business Machines Corporation Power management for processing capacity upgrade on demand
US20130238912A1 (en) * 2010-11-25 2013-09-12 Michael Priel Method and apparatus for managing power in a multi-core processor
US9335805B2 (en) * 2010-11-25 2016-05-10 Freescale Semiconductor, Inc. Method and apparatus for managing power in a multi-core processor
US20120179938A1 (en) * 2011-01-10 2012-07-12 Dell Products L.P. Methods and Systems for Managing Performance and Power Utilization of a Processor Employing a Fully Multithreaded Load Threshold
US8812825B2 (en) * 2011-01-10 2014-08-19 Dell Products L.P. Methods and systems for managing performance and power utilization of a processor employing a fully multithreaded load threshold
US9207745B2 (en) 2011-01-10 2015-12-08 Dell Products L.P. Methods and systems for managing performance and power utilization of a processor employing a fully-multithreaded load threshold
US9760153B2 (en) 2011-01-10 2017-09-12 Dell Products L.P. Methods and systems for managing performance and power utilization of a processor employing a fully-multithreaded load threshold
US20120216064A1 (en) * 2011-02-21 2012-08-23 Samsung Electronics Co., Ltd. Hot-plugging of multi-core processor
US8695008B2 (en) 2011-04-05 2014-04-08 Qualcomm Incorporated Method and system for dynamically controlling power to multiple cores in a multicore processor of a portable computing device
US20120260252A1 (en) * 2011-04-08 2012-10-11 International Business Machines Corporation Scheduling software thread execution
US20120284729A1 (en) * 2011-05-03 2012-11-08 Microsoft Corporation Processor state-based thread scheduling
US20160328266A1 (en) * 2011-05-10 2016-11-10 International Business Machines Corporation Process grouping for improved cache and memory affinity
US9965324B2 (en) * 2011-05-10 2018-05-08 International Business Machines Corporation Process grouping for improved cache and memory affinity
US9400686B2 (en) * 2011-05-10 2016-07-26 International Business Machines Corporation Process grouping for improved cache and memory affinity
US20130060555A1 (en) * 2011-06-10 2013-03-07 Qualcomm Incorporated System and Apparatus Modeling Processor Workloads Using Virtual Pulse Chains
WO2012170214A3 (en) * 2011-06-10 2013-05-23 Qualcomm Incorporated System and apparatus for modeling processor workloads using virtual pulse chains
US9086883B2 (en) 2011-06-10 2015-07-21 Qualcomm Incorporated System and apparatus for consolidated dynamic frequency/voltage control
US9141436B2 (en) 2011-09-26 2015-09-22 Samsung Electronics Co., Ltd. Apparatus and method for partition scheduling for a processor with cores
US9208197B2 (en) * 2011-10-21 2015-12-08 International Business Machines Corporation Dynamic SMT in parallel database systems
US20130103670A1 (en) * 2011-10-21 2013-04-25 International Business Machines Corporation Dynamic smt in parallel database systems
US20130124826A1 (en) * 2011-11-11 2013-05-16 International Business Machines Corporation Optimizing System Throughput By Automatically Altering Thread Co-Execution Based On Operating System Directives
US8898435B2 (en) * 2011-11-11 2014-11-25 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Optimizing system throughput by automatically altering thread co-execution based on operating system directives
US8898434B2 (en) * 2011-11-11 2014-11-25 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Optimizing system throughput by automatically altering thread co-execution based on operating system directives
US9760409B2 (en) * 2011-12-15 2017-09-12 Intel Corporation Dynamically modifying a power/performance tradeoff based on a processor utilization
US20160266941A1 (en) * 2011-12-15 2016-09-15 Intel Corporation Dynamically Modifying A Power/Performance Tradeoff Based On A Processor Utilization
US10740281B2 (en) 2011-12-22 2020-08-11 Intel Corporation Asymmetric performance multicore architecture with same instruction set architecture
US20120198207A1 (en) * 2011-12-22 2012-08-02 Varghese George Asymmetric performance multicore architecture with same instruction set architecture
US9569278B2 (en) * 2011-12-22 2017-02-14 Intel Corporation Asymmetric performance multicore architecture with same instruction set architecture
US10049080B2 (en) 2011-12-22 2018-08-14 Intel Corporation Asymmetric performance multicore architecture with same instruction set architecture
GB2512529A (en) * 2012-01-19 2014-10-01 Ibm In situ processor re-characterization
GB2512529B (en) * 2012-01-19 2018-06-06 Ibm In situ processor re-characterization
US9152518B2 (en) 2012-01-19 2015-10-06 International Business Machines Corporation In situ processor re-characterization
US9176837B2 (en) 2012-01-19 2015-11-03 International Business Machines Corporation In situ processor re-characterization
WO2013107694A1 (en) * 2012-01-19 2013-07-25 International Business Machines Corporation In situ processor re-characterization
CN104067234A (en) * 2012-01-19 2014-09-24 国际商业机器公司 In situ processor re-characterization
US20130191613A1 (en) * 2012-01-23 2013-07-25 Canon Kabushiki Kaisha Processor control apparatus and method therefor
US20130205169A1 (en) * 2012-02-03 2013-08-08 Blaine D. Gaither Multiple processing elements
US8782466B2 (en) * 2012-02-03 2014-07-15 Hewlett-Packard Development Company, L.P. Multiple processing elements
US9477296B2 (en) * 2012-02-24 2016-10-25 Samsung Electronics Co., Ltd. Apparatus and method for controlling power of electronic device having multi-core
KR101930752B1 (en) * 2012-02-24 2018-12-19 삼성전자 주식회사 Apparatus and method for controlling power of electronic device having multi-core
US20130227326A1 (en) * 2012-02-24 2013-08-29 Samsung Electronics Co., Ltd. Apparatus and method for controlling power of electronic device having multi-core
US20150045912A1 (en) * 2012-03-29 2015-02-12 Nec Corporation State control device, control method, and program
US9910412B2 (en) * 2012-03-29 2018-03-06 Nec Corporation State control device, control method, and program
US20130275785A1 (en) * 2012-04-17 2013-10-17 Sony Corporation Memory control apparatus, memory control method, information processing apparatus and program
US9703361B2 (en) * 2012-04-17 2017-07-11 Sony Corporation Memory control apparatus, memory control method, information processing apparatus and program
US9851768B2 (en) 2012-04-20 2017-12-26 Hewlett Packard Enterprise Development Lp Voltage regulator control system
WO2013158116A1 (en) * 2012-04-20 2013-10-24 Hewlett-Packard Development Company, L.P. Voltage regulator control system
US9933951B2 (en) * 2012-06-14 2018-04-03 International Business Machines Corporation Reducing read latency using a pool of processing cores
US10210338B2 (en) 2012-06-14 2019-02-19 International Business Machines Corporation Reducing decryption latency for encryption processing
US8930633B2 (en) * 2012-06-14 2015-01-06 International Business Machines Corporation Reducing read latency using a pool of processing cores
US20130339635A1 (en) * 2012-06-14 2013-12-19 International Business Machines Corporation Reducing read latency using a pool of processing cores
US20130339750A1 (en) * 2012-06-14 2013-12-19 International Business Machines Corporation Reducing decryption latency for encryption processing
US20160139825A1 (en) * 2012-06-14 2016-05-19 International Business Machines Corporation Reducing read latency using a pool of processing cores
US9262080B2 (en) 2012-06-14 2016-02-16 International Business Machines Corporation Reducing read latency using a pool of processing cores
US20140250305A1 (en) * 2012-06-14 2014-09-04 International Business Machines Corporation Reducing decryption latency for encryption processing
US8726039B2 (en) * 2012-06-14 2014-05-13 International Business Machines Corporation Reducing decryption latency for encryption processing
US9864863B2 (en) * 2012-06-14 2018-01-09 International Business Machines Corporation Reducing decryption latency for encryption processing
US20130346991A1 (en) * 2012-06-22 2013-12-26 Fujitsu Limited Method of controlling information processing apparatus, and information processing apparatus
US8826073B2 (en) * 2012-06-28 2014-09-02 International Business Machines Corporation 3-D stacked multiprocessor structures and methods to enable reliable operation of processors at speeds above specified limits
US20140006750A1 (en) * 2012-06-28 2014-01-02 International Business Machines Corporation 3-d stacked multiprocessor structures and methods to enable reliable operation of processors at speeds above specified limits
US8799710B2 (en) * 2012-06-28 2014-08-05 International Business Machines Corporation 3-D stacked multiprocessor structures and methods to enable reliable operation of processors at speeds above specified limits
US20140006852A1 (en) * 2012-06-28 2014-01-02 International Business Machines Corporation 3-d stacked multiprocessor structures and methods to enable reliable operation of processors at speeds above specified limits
US8943252B2 (en) 2012-08-16 2015-01-27 Microsoft Corporation Latency sensitive software interrupt and thread scheduling
US20140059550A1 (en) * 2012-08-24 2014-02-27 Canon Kabushiki Kaisha Information processing apparatus, method of controlling the same, and storage medium
US9110707B2 (en) * 2012-08-24 2015-08-18 Canon Kabushiki Kaisha Assigning wideio memories to functions based on memory access and acquired temperature information
US9576638B2 (en) * 2012-10-02 2017-02-21 Canon Kabushiki Kaisha Information processing apparatus, control method for the same, program for the same, and storage medium
US20150019805A1 (en) * 2012-10-02 2015-01-15 Canon Kabushiki Kaisha Information processing apparatus, control method for the same, program for the same, and storage medium
US20140101670A1 (en) * 2012-10-05 2014-04-10 Samsung Electronics Co., Ltd. Computing system including multi-core processor and load balancing method thereof
CN103713950A (en) * 2012-10-05 2014-04-09 Samsung Electronics Co., Ltd. Computing system including multi-core processor and load balancing method thereof
US20140122910A1 (en) * 2012-10-25 2014-05-01 Inventec Corporation Rack server system and operation method thereof
US9880603B1 (en) * 2013-03-13 2018-01-30 Juniper Networks, Inc. Methods and apparatus for clock gating processing modules based on hierarchy and workload
US10571988B1 (en) 2013-03-13 2020-02-25 Juniper Networks, Inc. Methods and apparatus for clock gating processing modules based on hierarchy and workload
US10083066B2 (en) * 2013-05-31 2018-09-25 International Business Machines Corporation Processing data by using simultaneous multithreading
US20140359635A1 (en) * 2013-05-31 2014-12-04 International Business Machines Corporation Processing data by using simultaneous multithreading
US9170854B2 (en) * 2013-06-04 2015-10-27 Advanced Micro Devices, Inc. Thread assignment for power and performance efficiency using multiple power states
US20140359633A1 (en) * 2013-06-04 2014-12-04 Advanced Micro Devices, Inc. Thread assignment for power and performance efficiency using multiple power states
US9513685B2 (en) * 2013-07-23 2016-12-06 Fujitsu Limited Measuring method of a processing load of a processor including a plurality of cores
US20150033240A1 (en) * 2013-07-23 2015-01-29 Fujitsu Limited Measuring method, a non-transitory computer-readable storage medium, and information processing apparatus
US20160170474A1 (en) * 2013-08-02 2016-06-16 Nec Corporation Power-saving control system, control device, control method, and control program for server equipped with non-volatile memory
US10031572B2 (en) * 2013-09-09 2018-07-24 Zte Corporation Method and device for processing core of processor, and terminal
JP2016531371A (en) * 2013-09-09 2016-10-06 Zte Corporation Processor core processing method, apparatus, and terminal
US20160224100A1 (en) * 2013-09-09 2016-08-04 Zte Corporation Method and device for processing core of processor , and terminal
US10386900B2 (en) * 2013-09-24 2019-08-20 Intel Corporation Thread aware power management
US20150089249A1 (en) * 2013-09-24 2015-03-26 William R. Hannon Thread aware power management
EP2851796A1 (en) * 2013-09-24 2015-03-25 Intel Corporation Thread aware power management
US9996442B2 (en) * 2014-03-25 2018-06-12 Krystallize Technologies, Inc. Cloud computing benchmarking
US20150278066A1 (en) * 2014-03-25 2015-10-01 Krystallize Technologies, Inc. Cloud computing benchmarking
US9557797B2 (en) * 2014-05-20 2017-01-31 Qualcomm Incorporated Algorithm for preferred core sequencing to maximize performance and reduce chip temperature and power
US20150338902A1 (en) * 2014-05-20 2015-11-26 Qualcomm Incorporated Algorithm For Preferred Core Sequencing To Maximize Performance And Reduce Chip Temperature And Power
WO2015198286A1 (en) 2014-06-26 2015-12-30 Consiglio Nazionale Delle Ricerche Method and system for regulating in real time the clock frequencies of at least one cluster of electronic machines
CN111176419A (en) * 2014-08-22 2020-05-19 英特尔公司 Method and apparatus for estimating power performance of jobs running on multiple nodes of a distributed computer system
US20160092363A1 (en) * 2014-09-25 2016-03-31 Intel Corporation Cache-Aware Adaptive Thread Scheduling And Migration
US10339023B2 (en) * 2014-09-25 2019-07-02 Intel Corporation Cache-aware adaptive thread scheduling and migration
US9575529B2 (en) 2014-10-02 2017-02-21 International Business Machines Corporation Voltage droop reduction in a processor
GB2530782A (en) * 2014-10-02 2016-04-06 Ibm Voltage droop reduction in a processor
WO2016115000A1 (en) * 2015-01-15 2016-07-21 Microsoft Technology Licensing, Llc Hybrid scheduler and power manager
US20180067892A1 (en) * 2015-02-27 2018-03-08 Intel Corporation Dynamically updating logical identifiers of cores of a processor
US11567896B2 (en) * 2015-02-27 2023-01-31 Intel Corporation Dynamically updating logical identifiers of cores of a processor
US10706004B2 (en) * 2015-02-27 2020-07-07 Intel Corporation Dynamically updating logical identifiers of cores of a processor
US20160378471A1 (en) * 2015-06-25 2016-12-29 Intel IP Corporation Instruction and logic for execution context groups for parallel processing
US10108244B2 (en) * 2015-06-29 2018-10-23 Kyocera Document Solutions Inc. Electronic apparatus and non-transitory computer readable medium for power consumption control of processors
US20160378164A1 (en) * 2015-06-29 2016-12-29 Kyocera Document Solutions Inc. Electronic apparatus and non-transitory computer readable recording medium
US9779058B2 (en) * 2015-07-13 2017-10-03 Google Inc. Modulating processsor core operations
US20170017611A1 (en) * 2015-07-13 2017-01-19 Google Inc. Modulating processsor core operations
US20170060644A1 (en) * 2015-08-25 2017-03-02 Konica Minolta, Inc. Image processing apparatus, control task allocation method, and recording medium
WO2017052737A1 (en) * 2015-09-23 2017-03-30 Intel Corporation Task assignment in processor cores based on a statistical power and frequency model
US9811389B2 (en) 2015-09-23 2017-11-07 Intel Corporation Task assignment for processor cores based on a statistical power and frequency model
US9904563B2 (en) * 2015-12-18 2018-02-27 Htc Corporation Processor management
US10296067B2 (en) * 2016-04-08 2019-05-21 Qualcomm Incorporated Enhanced dynamic clock and voltage scaling (DCVS) scheme
US10719063B2 (en) * 2016-10-06 2020-07-21 Microsoft Technology Licensing, Llc Real-time equipment control
US10503238B2 (en) * 2016-11-01 2019-12-10 Microsoft Technology Licensing, Llc Thread importance based processor core parking and frequency selection
US10241834B2 (en) * 2016-11-29 2019-03-26 International Business Machines Corporation Bandwidth aware resource optimization
US10929184B2 (en) 2016-11-29 2021-02-23 International Business Machines Corporation Bandwidth aware resource optimization
US11354768B2 (en) * 2017-04-21 2022-06-07 Intel Corporation Intelligent graphics dispatching mechanism
US10564702B2 (en) * 2017-06-28 2020-02-18 Dell Products L.P. Method to optimize core count for concurrent single and multi-thread application performance
US20190004861A1 (en) * 2017-06-28 2019-01-03 Dell Products L.P. Method to Optimize Core Count for Concurrent Single and Multi-Thread Application Performance
US11742038B2 (en) * 2017-08-11 2023-08-29 Advanced Micro Devices, Inc. Method and apparatus for providing wear leveling
US11551990B2 (en) * 2017-08-11 2023-01-10 Advanced Micro Devices, Inc. Method and apparatus for providing thermal wear leveling
US20200225992A1 (en) * 2017-08-23 2020-07-16 Samsung Electronics Co., Ltd. Operating method of operating system and electronic device supporting same
US11169845B2 (en) * 2017-12-21 2021-11-09 Ciena Corporation Flow and application based processor scheduling for network functions virtualization applications using flow identification based on logical calculations on frame based fields
US11169586B2 (en) * 2018-06-01 2021-11-09 Samsung Electronics Co., Ltd. Computing device and method of operating the same
US11507381B2 (en) * 2018-09-04 2022-11-22 Apple Inc. Serialization floors and deadline driven control for performance optimization of asymmetric multiprocessor systems
US11023245B2 (en) * 2018-09-04 2021-06-01 Apple Inc. Serialization floors and deadline driven control for performance optimization of asymmetric multiprocessor systems
US11119788B2 (en) 2018-09-04 2021-09-14 Apple Inc. Serialization floors and deadline driven control for performance optimization of asymmetric multiprocessor systems
US11494193B2 (en) 2018-09-04 2022-11-08 Apple Inc. Serialization floors and deadline driven control for performance optimization of asymmetric multiprocessor systems
US20210247985A1 (en) * 2018-09-04 2021-08-12 Apple Inc. Serialization Floors and Deadline Driven Control for Performance Optimization of Asymmetric Multiprocessor Systems
US20190079806A1 (en) * 2018-11-14 2019-03-14 Intel Corporation System, Apparatus And Method For Configurable Control Of Asymmetric Multi-Threading (SMT) On A Per Core Basis
US11579944B2 (en) * 2018-11-14 2023-02-14 Intel Corporation System, apparatus and method for configurable control of asymmetric multi-threading (SMT) on a per core basis
US20230131521A1 (en) * 2018-11-14 2023-04-27 Intel Corporation System, Apparatus And Method For Configurable Control Of Asymmetric Multi-Threading (SMT) On A Per Core Basis
WO2020101836A1 (en) 2018-11-14 2020-05-22 Intel Corporation System, apparatus and method for configurable control of asymmetric multi-threading (smt) on a per core basis
US10817341B1 (en) * 2019-04-10 2020-10-27 EMC IP Holding Company LLC Adaptive tuning of thread weight based on prior activity of a thread
US20210064426A1 (en) * 2019-08-29 2021-03-04 Praveen Kumar GUPTA System, Apparatus And Method For Providing Hardware State Feedback To An Operating System In A Heterogeneous Processor
US11698812B2 (en) * 2019-08-29 2023-07-11 Intel Corporation System, apparatus and method for providing hardware state feedback to an operating system in a heterogeneous processor
US11953962B2 (en) * 2022-12-22 2024-04-09 Intel Corporation System, apparatus and method for configurable control of asymmetric multi-threading (SMT) on a per core basis

Also Published As

Publication number Publication date
US8296773B2 (en) 2012-10-23

Similar Documents

Publication Publication Date Title
US8296773B2 (en) Systems and methods for thread assignment and core turn-off for integrated circuit energy efficiency and high-performance
Zhu et al. Dirigent: Enforcing QoS for latency-critical tasks on shared multicore systems
US9977699B2 (en) Energy efficient multi-cluster system and its operations
Chaudhry et al. Thermal-aware scheduling in green data centers
US8447994B2 (en) Altering performance of computational units heterogeneously according to performance sensitivity
US7921313B2 (en) Scheduling processor voltages and frequencies based on performance prediction and power constraints
US8683476B2 (en) Method and system for event-based management of hardware resources using a power state of the hardware resources
Sridharan et al. Holistic run-time parallelism management for time and energy efficiency
US20110022857A1 (en) Throttling computational units according to performance sensitivity
EP3237998B1 (en) Systems and methods for dynamic temporal power steering
JP5345990B2 (en) Method and computer for processing a specific process in a short time
JP2009140157A (en) Virtual computer system and control method for virtual computer and program
WO2011011673A1 (en) Throttling computational units according to performance sensitivity
US20110022356A1 (en) Determining performance sensitivities of computational units
Terzopoulos et al. Power-aware bag-of-tasks scheduling on heterogeneous platforms
Zou et al. Clip: Cluster-level intelligent power coordination for power-bounded systems
Sharifi et al. Courteous cache sharing: Being nice to others in capacity management
CN107636563B (en) Method and system for power reduction by empting a subset of CPUs and memory
March et al. A new energy-aware dynamic task set partitioning algorithm for soft and hard embedded real-time systems
Kim et al. Using DVFS and task scheduling algorithms for a hard real-time heterogeneous multicore processor environment
Zou et al. Contention aware workload and resource co-scheduling on power-bounded systems
Li et al. System-level, thermal-aware, fully-loaded process scheduling
Alsbatin et al. Efficient virtual machine placement algorithms for consolidation in cloud data centers
Qouneh et al. Optimization of resource allocation and energy efficiency in heterogeneous cloud data centers
Cioara et al. A dynamic power management controller for optimizing servers’ energy consumption in service centers

Legal Events

Date Code Title Description
AS Assignment
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BOSE, PRADIP;BUYUKTOSUNOGLU, ALPER;KURSUN, EREN;REEL/FRAME:021171/0943
Effective date: 20080618

STCF Information on status: patent grant
Free format text: PATENTED CASE

FPAY Fee payment
Year of fee payment: 4

FEPP Fee payment procedure
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee
Effective date: 20201023