FUNCTION UNIT ALLOCATION IN
RELATED APPLICATION DATA
This patent application is related to the following co-pending U.S. Patent applications, commonly assigned and filed concurrently with this application:
U.S. patent application Ser. No. 09/378,298, entitled PROGRAMMATIC SYNTHESIS OF PROCESSOR ELEMENT ARRAYS, by Robert S. Schreiber, Bantwal Ramakrishna Rau, Shail Aditya Gupta, Vinod Kumar Kathail, and Sadun Anik; U.S. patent application Ser. No. 09/378,295, entitled INTERCONNECT MINIMIZATION IN PROCESSOR DESIGN, by Robert S. Schreiber. These patent applications are hereby incorporated by reference.
FIELD OF THE INVENTIONS
These inventions relate to processor design, and more specifically to allocation of function units in processor design, such as for systolic processors and application specific integrated processors (ASIPs).
Processor design is a very time intensive and expensive process. For new and unique processor designs, no automated design techniques exist for selecting and designing the mix of processor components that would be incorporated into the final processor design. While there exist algorithms incorporated into software packages that can help in designing new processors, such software packages do not give a result which is a final design, let alone an optimal design. Typically, those software packages provide approximate solutions to a design problem, leading to additional design effort and over-design to account for the lack of precision in those software packages. Additionally, the design process may start entirely from scratch, which would result in substantial time being consumed analyzing possible design configurations before designing the details of the processor. On the other hand, designing a new processor using preexisting designs necessarily incorporates the design benefits and flaws of the preexisting design, which may or may not be acceptable or optimal for the new design.
All conventional processor design software packages are heuristic in nature. In other words, they rely on design criteria and/or methods that in the past have proven more effective than other criteria or methods. However, in order to apply to more than one processor design or design methodology, such design criteria and methods must be sufficiently general to provide predictable results. Therefore, such heuristic software packages provide relatively high-level solutions without a complete contribution to details of the design. Additionally, heuristic software packages necessarily lead to significant trial and error in an attempt to optimize the processor design. Consequently, design of new processors is time intensive and expensive.
Processors are often designed to incorporate pipelined data paths to speed processing throughput, reduce initiation intervals and to optimize use of the various function units, such as adders, multipliers, comparators, dividers and the like. These data paths are formed from an interconnected assembly of function units and register files. The function units and register files may be interconnected by busses. Because these data paths may include a large number of function units, register files and bus segments, the job of selecting the function units, register files and bus segments is very difficult, to say the least. The task is made more difficult if one desires to find the optimal configuration of these units having the lowest or a low cost relative to other configurations. Because of the huge number of possible configurations, it is difficult if not impossible to find optimal configurations.
Pipelined data paths are particularly useful in processing iterative instructions, such as those found in instruction loops, and especially nested instruction loops. Even when considering only the subset of situations where the instruction loops are known, such as those used with embedded processors, the task of designing the optimal, low-cost processor remains difficult because of the large number of different function units, register configurations and bus configurations that are possible. Heuristic software design solutions used for designing processors are not suitable for finding solutions to such multi-dimensional problems. Because there are so many variables to consider, it is too difficult to optimize all of the variables to arrive at a suitable solution without great expenditure of time and effort.
The present inventions provide methods and apparatus for more easily and efficiently producing computing systems, for example those incorporating processor arrays having processor elements with function or execution units, register files, busses, and the like. These methods and apparatus reduce the time required for designing these processors, and reduce the amount of trial and error used in processor design. They also find a better combination of hardware components, as tested by one or more quantifiable parameters, than those developed using heuristic methods, leading to overall superior results. The methods and apparatus may provide solutions to function unit allocation problems, or at least a starting point for allocating function units in processor design. They also eliminate the costs associated with starting the design of a new processor from scratch, which often may be necessary in the design of embedded processors, and they allow the more time intensive design process to start later in the conventional processor design flow.
These and other aspects of the present inventions are provided by methods and apparatus for selecting operation devices or hardware components for a processor, such as for an embedded processor having pipelined data paths. The hardware components could be function units, registers, busses, or other items that could be incorporated into a processor. The apparatus can take any number of forms, including computers and other processors, such as mainframes, workstations, and the like, as well as apparatus containing instructions or data for use in controlling such processors, such as disk drives, removable storage media, and temporary storage. In one aspect of the present inventions, the process includes identifying a set of hardware components, such as function units, and a plurality of characteristics for those hardware components. A first set of characteristics may include a repertoire, such as the ability to add, subtract, multiply, and the like, and a second set of characteristics for the hardware components may include the number of cycles used for a given operation for the particular hardware component (data interval, i.e. the number of time slots or cycles in each unit of type i for an operation of type j), cost and the like. A plurality of these characteristics of the hardware components are incorporated into an algorithm, which is then solved for one or more desired parameters, such as type and number of hardware components.
In one preferred embodiment, the algorithm is an integer linear program which is loaded with a list of the set of function units available for incorporation into an embedded processor, the cost of each of the function units, the data intervals associated with the function units, as well as any other necessary data associated with the function units. The integer linear program is also preferably given one or more constraints, such as the requirement that the number of each of the function units be an integer. Other constraints include the requirement that the sum over all of the function units of the number of cycles carried out in a function unit for its operations divided by the data intervals is at least the number of operations of a given type. The integer linear program may also be given bounds such as a maximum cost for the result, or other constraints based on input from a user or designer. Integer linear programs can quickly and efficiently provide a desired solution or an approximation to a solution. Where the result is a desired solution, the result can be used to design a complete processor and provide sufficient information to produce a hardware description expressed in a hardware design language such as VHDL. Where the result is an approximation to a solution, the designer can use that result to more quickly design a processor according to the defined design criteria for the processor.
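One way to write down the integer linear program described above is sketched here. The symbols are assumptions introduced for illustration and do not appear in the text: n_i is the number of function units of type i, c_i the cost of one such unit, d_ij the data interval of a unit of type i for an operation of type j, k_j the number of operations of type j, and y_ij the number of cycles that units of type i spend on operations of type j. The capacity constraint, with T cycles available per unit, and the optional cost bound C_max are likewise assumed forms rather than the exact constraints of the preferred embodiment. Sums over i in the first constraint run only over unit types whose repertoire includes operation j.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_i c_i\, n_i \\
\text{subject to}\quad & \sum_i \frac{y_{ij}}{d_{ij}} \;\ge\; k_j
                         && \text{for every operation type } j, \\
                       & \sum_j y_{ij} \;\le\; T\, n_i
                         && \text{for every unit type } i \quad \text{(assumed capacity limit)}, \\
                       & \sum_i c_i\, n_i \;\le\; C_{\max}
                         && \text{(optional user-supplied cost bound)}, \\
                       & y_{ij} \ge 0, \qquad n_i \in \{0, 1, 2, \dots\}.
\end{aligned}
```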
Integer linear programs are particularly useful for providing solutions to multi-dimensional problems, and are particularly appropriate for allocating function units in processor elements of synchronous processor arrays. The possible combinations of function units are so numerous that optimal selection of function units for a particular processor design may be effectively impossible. This is especially the case with multi-function function units, where because of a particular combination of operations a multi-function function unit would serve better than a number of discrete function units carrying out the same operations. Algorithms used to solve an integer linear program are well known and produce reliable results.
In a further preferred form of the invention, the integer linear program is configured to minimize cost of the function units while still ensuring that all operations for a given set of instructions within a loop are included. Using an integer linear program to minimize cost is especially attractive for allocating function units in an embedded processor using data pipelining. Embedded processors will be carrying out predefined operations, many of which will be repetitive loop operations. Consequently, there is not as much flexibility necessary in the function units to accommodate a variety of different operations. Thus, the focus in solving the integer linear program may be on minimizing the cost of the allocation of function units knowing that a number of different function unit combinations are available for carrying out any given set of operations. While parameters other than cost can be optimized, minimizing cost for embedded processors is particularly attractive in view of the expected proliferation of embedded processors in equipment, appliances and other apparatus.
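The cost-minimizing allocation can be illustrated on a toy problem. The following Python sketch is not the integer linear program solver of the preferred embodiment; it simply enumerates every way of distributing the loop's operations over the unit types that can perform them and keeps the cheapest result. The function names, the unit catalog, the costs, and the parameter `cycles_available` (cycles each unit can contribute per loop iteration) are all hypothetical values chosen for illustration.

```python
from itertools import product


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def cheapest_allocation(units, op_counts, cycles_available):
    """Exhaustively find the least-cost mix of function units (toy sizes only).

    units:            {unit_name: (cost, {operation: data_interval})}
    op_counts:        {operation: number of such operations in the loop}
    cycles_available: assumed cycles one unit can supply per loop iteration
    Returns (total_cost, {unit_name: number_of_units}).
    """
    ops = list(op_counts)
    # Unit types whose repertoire includes each operation.
    capable = {op: [u for u, (_, rep) in units.items() if op in rep] for op in ops}
    for op, names in capable.items():
        if not names:
            raise ValueError(f"no unit type can perform {op!r}")

    # Every way to split each operation's count across its capable unit types.
    splits = [list(compositions(op_counts[op], len(capable[op]))) for op in ops]

    best_cost, best_counts = None, None
    for choice in product(*splits):
        load = dict.fromkeys(units, 0)  # cycles demanded from each unit type
        for op, split in zip(ops, choice):
            for name, n_ops in zip(capable[op], split):
                load[name] += n_ops * units[name][1][op]
        # Integer ceiling: fewest whole units that cover the load.
        counts = {u: -(-load[u] // cycles_available) for u in units}
        cost = sum(units[u][0] * counts[u] for u in units)
        if best_cost is None or cost < best_cost:
            best_cost, best_counts = cost, counts
    return best_cost, best_counts


# Hypothetical catalog: two single-function units and one multi-function unit.
units = {
    "adder": (2, {"add": 1}),
    "multiplier": (5, {"mul": 2}),
    "mac": (6, {"add": 1, "mul": 2}),
}
print(cheapest_allocation(units, {"add": 2, "mul": 1}, cycles_available=4))
```

With these assumed numbers, a single multi-function unit (cost 6) is cheaper than one adder plus one multiplier (cost 7). Exhaustive enumeration grows explosively with problem size, which is why the text turns to an integer linear program for realistic designs.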
In another form of the inventions, an upper bound on the cost may be input to the integer linear program. Such additional input or constraint effectively limits the number of possible solutions to the integer linear program. If a given constraint would be violated by a particular set of combinations, the integer linear program can easily eliminate such combinations without otherwise testing whether or not the particular set of combinations meet all other criteria. Other constraints can also be imposed on the integer linear program based on heuristics known to the designer.
BRIEF DESCRIPTION OF THE DRAWINGS
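The pruning effect of a cost bound can be sketched as follows. This is an illustrative enumeration, not the internal behavior of any particular integer linear program solver: it builds candidate unit-count vectors one unit type at a time and abandons a partial vector as soon as its cost exceeds the bound, so no other constraint is ever evaluated for the discarded combinations. The function name, the cost list, `max_count`, and `budget` are hypothetical, and costs are assumed to be non-negative.

```python
def candidates_within_budget(costs, max_count, budget):
    """Yield unit-count vectors whose total cost does not exceed `budget`.

    costs:     per-unit cost of each unit type (assumed non-negative)
    max_count: largest number of units of any one type to consider
    budget:    upper bound on total cost
    """
    def extend(prefix, spent):
        if len(prefix) == len(costs):
            yield tuple(prefix)
            return
        unit_cost = costs[len(prefix)]
        for n in range(max_count + 1):
            if spent + unit_cost * n > budget:
                # Larger n only costs more, so the whole subtree is discarded
                # without checking any other constraint.
                break
            yield from extend(prefix + [n], spent + unit_cost * n)

    yield from extend([], 0)


# Hypothetical costs for three unit types, at most 2 units of each, budget 6.
survivors = list(candidates_within_budget([2, 5, 6], max_count=2, budget=6))
print(len(survivors), "of", 3 ** 3, "combinations remain")
```

Only the surviving vectors would then need to be checked against the remaining constraints, such as covering every operation in the loop.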
FIG. 1 is a schematic and assembly drawing of an apparatus for selecting hardware components for a processor such as an embedded systolic processor, including apparatus for receiving input to and delivering results from the selection process.
FIG. 2 is a schematic diagram of a processor such as a pipelined processor that may be designed in accordance with the apparatus and methods of the present inventions.
FIG. 3 is a schematic of a sample software routine segment that may be executed in a processor designed using the designs and methods of the present inventions.
FIG. 4 is a schematic of a machine instruction level sample routine derived from the software routine of FIG. 3.
FIG. 5 is a flow chart depicting a process for selecting hardware components for a processor in accordance with one aspect of the present inventions.
FIG. 6 is a detailed flow chart depicting one process for selecting hardware components for a processor using an integer linear program.
FIG. 7 is a flow chart depicting a process for evaluating the results derived from an integer linear program to see if those results represent a complete solution to a function unit allocation problem.
The inventions, some of which are summarized above, and defined by the enumerated claims may be better understood by referring to the following detailed description, which should be read in conjunction with the accompanying drawings. This detailed description of a particular preferred embodiment, set out below to enable one to build and use one particular implementation of the invention, is not intended to limit the enumerated claims, but to serve as a particular example thereof. The particular examples set out below are the preferred specific implementations of the function unit allocation system that can be used for a number of applications and implemented in a number of ways. The inventions, however, may also be applied to other systems as well.
In accordance with several aspects of the present inventions, apparatus and methods are disclosed for selecting hardware components for a processor, such as an embedded processor, which decrease the time and effort required for processor design. The methods and apparatus also reduce the amount of trial and error used during processor design, and produce a more predictable and definitive result than generic heuristic methods. The methods and apparatus also provide a starting point for final allocation of function units in embedded processors, especially those using pipelined data flow, and may even produce acceptable designs without any need for additional design work for allocating function units. Even if further design work is desired, the result from the apparatus and methods described herein provides a desirable starting point for further design work. Consequently, these methods and apparatus significantly reduce the cost of design. These methods and apparatus also can be used to produce an embedded processor design that has the least cost for carrying out a defined set of operations.
The term "cost" is used herein to represent a measurable quantity corresponding to the hardware component. In the most basic sense, it means the cost of inserting the hardware component into the processor, including the number of switches and the layout for each component. In a more general sense, it may also mean the cost in power consumption during operation of the processor. Alternatively, the cost may be in terms of the amount of real estate or chip area occupied by the component. Similarly, the cost could be a