CN1295599C

CN1295599C - Instruction prefetch method combining dynamic and static prefetching for very long instruction word (VLIW) microprocessors

Info

Publication number: CN1295599C
Application number: CNB2004100467655A
Authority: CN
Inventors: 扈啸; 陈书明; 陈宝民; 张丹瑜; 胡定磊; 郭阳; 万江华; 刘祥远
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2004-09-17
Filing date: 2004-09-17
Publication date: 2007-01-17
Anticipated expiration: 2024-09-17
Also published as: CN1598761A

Abstract

The invention discloses an instruction prefetch method that combines dynamic and static prefetching for microprocessors with a very long instruction word (VLIW) structure. It addresses the problems that instruction prefetch efficiency cannot be improved and memory latency cannot be hidden. In the technical scheme, a prefetch controller is designed to take charge of instruction prefetching, an emulation debugging environment is established to control instruction prefetching, and a prefetch instruction nop_pref is designed to be compatible with NOP. At compile time, nop_pref replaces NOP. During debugging, the prefetch controller dynamically adjusts the prefetch timing, the changes in prefetch timing are fed back to the compiler to optimize its static prefetch policy, and the new program that the compiler generates after adjusting the policy is downloaded to RAM. After several rounds of feedback, a prefetch prediction mechanism tailored to the application system is formed. While keeping the hardware of the embedded microprocessor simple and low-power, the method hides memory latency and improves program execution efficiency. It overcomes both the longer code length of static prefetching and the low accuracy of dynamic prefetching.

Description

Instruction prefetch method combining dynamic and static prefetching for very long instruction word (VLIW) microprocessors

Technical field: The present invention relates to instruction prefetch methods in the design of very long instruction word (VLIW) microprocessors, and in particular to instruction prefetch methods in the design of embedded VLIW microprocessors.

Background: Many current high-performance microprocessors adopt the VLIW structure, which is characterized by multiple functional units working in parallel, with instruction parallelism and data transfers determined entirely by the compiler at compile time. The VLIW instruction mechanism is simple to implement, has low hardware complexity, and can exploit more instruction-level parallelism, so it is widely used.

Because the performance gap between processors and memory keeps widening, memory latency has become a system bottleneck. To hide memory latency, many processors adopt prefetching: using dedicated prefetch instructions or dedicated hardware mechanisms, data in external memory is moved ahead of time into the on-chip cache, under software scheduling by the compiler or a combination of hardware and software. The accuracy of hardware prefetching is often low, whereas with compiler support prefetches can be issued only when appropriate, which raises efficiency and reduces unnecessary prefetches. The IBM 370/168 and Amdahl 470V processors both used prefetching.

In his PhD dissertation, Steven Paul VanderWiel studied a compiler-assisted data prefetch controller, that is, a data prefetching method combining dynamic and static techniques, which can improve execution efficiency by about 50%. However, that work targets general-purpose CPUs, does not fully exploit the characteristics of embedded VLIW microprocessors, and increases code length by inserting prefetch instructions.

A typical embedded VLIW microprocessor system comprises four parts: a CPU core, a cache system, a memory controller, and memory, connected by an instruction bus, a data bus, and an address bus. The CPU core executes instructions and writes results back to memory. The cache system caches instructions and data and consists of an instruction cache and a data cache. The memory controller provides the interface between memory and the cache system: when an instruction or data item required by the CPU core is not in the cache, the memory controller reads it from memory into the cache. The memory stores instructions and data.

A program executed on an embedded VLIW microprocessor can be divided into basic blocks. The basic block is the basic unit of a program; it has exactly one entry (the first statement of the block) and one exit (the last statement of the block), and the exit of a basic block is a branch instruction. The program code of an embedded microprocessor is generally stored in nonvolatile memory, but to reduce memory latency it is moved into RAM to run.

Embedded VLIW microprocessors are mainly used in compute-intensive or data-intensive embedded application systems, which run an embedded real-time operating system or no operating system at all. The application code they execute is relatively fixed, is generally stored in nonvolatile memory, and seldom changes.

In the debugging phase of an embedded application system, the application system operates in an environment as realistic as possible, and the application code runs on the embedded VLIW microprocessor through an emulator. This mode of running the program is called emulation debugging; the debug host accesses the registers and memory of the VLIW microprocessor through the emulator.

Summary of the invention: The technical problem to be solved is to make full use of the characteristics of embedded VLIW microprocessors and, by combining dynamic and static prefetching, further improve the efficiency of instruction prefetching so as to hide memory latency. The technical scheme is as follows: a prefetch controller dedicated to instruction prefetching is designed into the VLIW microprocessor; an emulation debugging environment is set up for instruction prefetch control; and a prefetch instruction nop_pref is designed to be compatible with the NOP instruction. At compile time, no-operation (NOP) instructions in the VLIW microprocessor are replaced with prefetch instructions. During debugging, the prefetch controller dynamically adjusts the prefetch timing and feeds the changes in prefetch timing back to the compiler to optimize the static prefetch policy; the new program code generated after the compiler adjusts its static prefetch policy is downloaded to RAM. After repeated feedback, a prefetch prediction mechanism tailored to the application system is formed. While keeping the embedded microprocessor hardware simple to implement and low in power consumption, the method hides memory latency and improves program execution efficiency, overcoming both the code-length increase of static prefetching and the low accuracy of dynamic prefetching. No use of this prefetching method has so far been reported in China or abroad.

The specific scheme of the invention is as follows:

An emulation debugging environment is set up for instruction prefetch control. It consists of the compiler on the debug host, the embedded application system running in its real environment, and a hardware emulator that connects the debug host to the VLIW microprocessor in the embedded application system. Through the hardware emulator, the compiler reads back the prefetch adjustment information stored in the RAM of the embedded application system, and downloads the program code with adjusted prefetch instructions into the embedded application system for the VLIW microprocessor to execute.

The instruction set of an embedded VLIW microprocessor includes a no-operation (NOP) instruction, used to fill the delay slots of multi-cycle instructions and to align parallel instruction packets to boundaries. NOP instructions account for 5%-20% of all instructions in a typical application program.

The invention designs the nop_pref instruction to be compatible with the NOP instruction: to the CPU core, a nop_pref instruction is simply a NOP instruction.

The nop_pref instruction comprises the instruction valid fields V and E, the prefetch address fields intrc and cst, and the no-operation compatibility field spec. The V field indicates whether this NOP may be replaced by a nop_pref instruction: if V is valid, the NOP may be replaced and its address may be stored in the NOP buffer; if V is invalid, the NOP may not be replaced. The E field indicates whether this nop_pref instruction is active: if E is valid, the nop_pref instruction may be executed; if E is invalid, it may not. The intrc and cst fields together give the instruction address to prefetch. The spec field carries the no-operation semantics of the NOP instruction in the original instruction set, including the NOP opcode, the number of no-operation cycles, and whether the instruction is issued in parallel.
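To make the field layout concrete, the following Python sketch packs and unpacks a nop_pref word. It is an illustration only: the text names the fields V, E, intrc, cst, and spec but gives no bit widths or ordering, so the widths below (a 32-bit word), the field order, and the names `NopPref`, `encode`, and `decode` are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical field widths (32 bits in total); the source gives no bit layout.
V_BITS, E_BITS, INTRC_BITS, CST_BITS, SPEC_BITS = 1, 1, 5, 16, 9


@dataclass(frozen=True)
class NopPref:
    v: bool     # V field: this NOP slot may be replaced by a nop_pref
    e: bool     # E field: this nop_pref may be executed as a prefetch
    intrc: int  # together with cst, gives the address to prefetch
    cst: int
    spec: int   # original NOP semantics (opcode, idle cycles, parallel flag)


def _mask(bits: int) -> int:
    return (1 << bits) - 1


def encode(instr: NopPref) -> int:
    """Pack the fields into one word, V in the most significant position."""
    word = int(instr.v)
    word = (word << E_BITS) | int(instr.e)
    word = (word << INTRC_BITS) | (instr.intrc & _mask(INTRC_BITS))
    word = (word << CST_BITS) | (instr.cst & _mask(CST_BITS))
    word = (word << SPEC_BITS) | (instr.spec & _mask(SPEC_BITS))
    return word


def decode(word: int) -> NopPref:
    """Unpack a word produced by encode, reading fields from the low end."""
    spec = word & _mask(SPEC_BITS)
    word >>= SPEC_BITS
    cst = word & _mask(CST_BITS)
    word >>= CST_BITS
    intrc = word & _mask(INTRC_BITS)
    word >>= INTRC_BITS
    e = bool(word & 1)
    word >>= E_BITS
    v = bool(word & 1)
    return NopPref(v, e, intrc, cst, spec)
```

Because the spec field keeps the original NOP semantics, a core that ignores V, E, intrc, and cst would still treat the word as a NOP, which is the compatibility property the text describes.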

The invention designs a prefetch controller in the embedded VLIW microprocessor to execute prefetch instructions, analyze the prefetch effect, and adjust the prefetch timing. It consists of three parts: a bus monitor module, a NOP buffer, and a prefetch module. The bus monitor module is connected to the cache system and the CPU core through the instruction bus and address bus; the NOP buffer is connected to the bus monitor module and the prefetch module through the data bus and address bus.

The bus monitor module consists of a monitoring submodule, a decoding submodule, and a timing submodule. The monitoring submodule is a comparator that monitors the instruction bus and address bus between the CPU core and the instruction cache. The decoding submodule is connected to the monitoring submodule and the NOP buffer through the data bus and address bus and has two functions: it stores every nop_pref instruction whose V field is valid in the NOP buffer, and it sends every nop_pref instruction whose E field is valid to the prefetch module for execution. The timing submodule records the time t_0 at which the execution submodule issues the prefetch, the time t_p at which the prefetched instruction block enters the instruction cache, and the time t_r at which the CPU core requests that instruction block, and sends these three times over the data bus to the prefetch module for analysis of the prefetch effect.

The NOP buffer is a buffer implemented in RAM with a first-in, first-out structure. It caches the contents and addresses of recently executed nop_pref instructions. It is N entries deep and has one write port and one read port. The write port writes the content and address of the most recently executed nop_pref instruction whose V field is valid into the buffer entry designated by the write pointer, which advances cyclically in the order 1, 2, 3, ..., N, 1, 2, .... The read port is controlled by the prefetch module, which searches the NOP buffer through it.
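The cyclic write pointer can be sketched in Python as a small ring buffer. This is a software illustration of the behavior described, not the RAM design itself; the class name `NopBuffer`, its method names, and the 0-based pointer (the text counts entries 1 to N) are assumptions of the sketch.

```python
class NopBuffer:
    """Ring buffer of (address, instruction word) pairs, N entries deep."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = [None] * depth  # None marks an entry not yet written
        self.write_ptr = 0             # 0-based; wraps after depth - 1

    def write(self, address, word):
        """Write port: store one V-valid nop_pref, overwriting the oldest entry."""
        self.entries[self.write_ptr] = (address, word)
        self.write_ptr = (self.write_ptr + 1) % self.depth

    def newest_first(self):
        """Read port: yield stored entries from most to least recently written."""
        for k in range(1, self.depth + 1):
            entry = self.entries[(self.write_ptr - k) % self.depth]
            if entry is not None:
                yield entry
```

With a depth of 3, writing four entries overwrites the first one, so a search through `newest_first()` sees only the three most recent nop_pref instructions.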

The prefetch module consists of an execution submodule and an analysis submodule. The execution submodule is connected to the decoding submodule and the instruction cache through the data bus and is responsible for sending prefetch commands to the instruction cache. The analysis submodule is connected to the NOP buffer and the memory controller through the data bus and address bus. It analyzes the prefetch effect from t_0, t_p, and t_r and judges whether the prefetch timing is appropriate: if t_p - t_r < 0, the prefetch lags and the nop_pref position must be moved earlier; if t_p - t_r > Tm (Tm is an empirical value), the prefetch is too early, the prefetched content occupies cache space without being used immediately, wasting cache space, and the nop_pref position must be moved later; if Tm > t_p - t_r > 0, the prefetch position is appropriate and needs no adjustment. The value of Tm is determined by (t_p - t_0), the cache capacity, and the cache line capacity.
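The timing test can be sketched in Python as follows. This is an illustration only, not the hardware logic of the invention. Two points are assumptions of the sketch: it measures the slack as t_r - t_p (the number of cycles the block sits in the cache before the core asks for it), which is the sign convention under which a negative value means the block arrived late given the definitions of t_p and t_r above, whereas the text writes the difference as t_p - t_r; and it treats the boundary values 0 and Tm as acceptable, which the text leaves open. The names `Verdict` and `classify` are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    LATE = "move nop_pref earlier"
    EARLY = "move nop_pref later"
    OK = "keep position"


def classify(t_p, t_r, t_m):
    """Judge one prefetch from its arrival time t_p and request time t_r."""
    slack = t_r - t_p  # assumed convention: cycles the block waited in the cache
    if slack < 0:
        return Verdict.LATE   # the core asked before the block had arrived
    if slack > t_m:
        return Verdict.EARLY  # the block occupied a cache line for too long
    return Verdict.OK
```

For example, with a prefetch issued at t_0 = 100 that arrives at t_p = 110, and Tm chosen as 3 × (t_p - t_0) = 30 cycles as in the embodiment described later in this document, a request at cycle 105 is classified as late, one at cycle 120 as acceptable, and one at cycle 150 as too early.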

The nop_pref position is adjusted as follows. To move a nop_pref earlier, the analysis submodule searches the NOP buffer for the address of the nearest available nop_pref (one whose E field is invalid) preceding this nop_pref instruction, and writes this nop_pref instruction to that address in memory through the memory controller; it then sets the E field of this nop_pref instruction to invalid and writes it back to the instruction's original address. To move a nop_pref later, the prefetch module searches the NOP buffer for the address of the nearest available nop_pref following this nop_pref instruction and writes this nop_pref instruction to that address; the nop_pref instruction, set to invalid, is then written to the original address.
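A minimal Python sketch of this relocation step follows. It models memory as a dictionary from address to slot and assumes that a lower address is executed earlier (straight-line code), which the text does not state. An "available" slot is taken to be one with V valid and E invalid. The names `Slot` and `move_prefetch` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Slot:
    v: bool                       # slot may hold a nop_pref
    e: bool                       # prefetch in this slot is active
    target: Optional[int] = None  # prefetch address (intrc and cst in the text)


def move_prefetch(memory, buffered_addrs, addr, earlier):
    """Move the active prefetch at addr to the nearest available slot.

    buffered_addrs stands in for the addresses held in the NOP buffer.
    Returns the new address, or None if no available slot was found.
    """
    free = [a for a in buffered_addrs
            if memory[a].v and not memory[a].e
            and (a < addr if earlier else a > addr)]
    if not free:
        return None
    dest = max(free) if earlier else min(free)
    # Write the prefetch into the destination slot and activate it.
    memory[dest] = replace(memory[dest], e=True, target=memory[addr].target)
    # Write the original slot back with its E field set to invalid.
    memory[addr] = replace(memory[addr], e=False)
    return dest
```

After a move, the original slot remains a V-valid NOP with E invalid, so it stays available as a destination for later adjustments.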

When a program runs on the embedded VLIW microprocessor, the instruction prefetch method of the invention proceeds through the following steps:

1. Based on the CPU clock, memory bandwidth, and cache structure, the compiler estimates the transfer time of an instruction block requested by the CPU core and changes a suitable NOP (no-operation) instruction in the compiled assembly code into the prefetch instruction nop_pref. The program thus executes the prefetch instruction first, so that the instruction block is fetched from memory into the instruction cache ahead of time; when the block is due to execute, its instructions are already in the cache and the CPU core can execute them immediately without waiting. The compiler makes a prefetch prediction for every instruction block and replaces the corresponding NOP with nop_pref.

2. To improve prefetch performance, the prefetch controller dynamically adjusts the positions of nop_pref instructions during program execution according to the actual run-time results. The detailed process is as follows:

1) The monitoring submodule of the prefetch controller monitors the instruction bus and address bus between the CPU core and the instruction cache, and the decoding submodule continuously stores the nop_pref instructions with a valid V field that appear on the instruction bus between the CPU core and the instruction cache (hereinafter, the instruction bus), together with their addresses, in the NOP buffer.

2) As soon as the monitoring submodule detects a nop_pref instruction with a valid V field on the instruction bus, a prefetch request is sent to the instruction cache and that time is recorded as t_0. When the program block specified by the nop_pref has been read into the instruction cache, that time is recorded as t_p. When the CPU core requests the prefetched program block, that time is recorded as t_r; if t_r is infinite, the prefetch was useless, and t_r is recorded as T_MAX. The analysis submodule judges from the three times t_0, t_p, and t_r whether the prefetch timing is appropriate. If t_p - t_r < 0, the prefetch lags and the nop_pref position must be moved earlier; if t_p - t_r > Tm, the prefetch is early, and a prefetch that is too far ahead occupies cache lines prematurely and causes waste, so the nop_pref position must be moved later; if Tm > t_p - t_r > 0, the prefetch position is appropriate and needs no adjustment.

3) Using the records in the NOP buffer, the analysis submodule moves the nop_pref instruction to a nop_pref position with an invalid V field that satisfies Tm > t_p - t_r > 0, writing the nop_pref instruction to the address of that nop_pref with an invalid V field, thereby dynamically adjusting the prefetch.

3. After the program has run for a period of time in the emulation environment, the debug host stops the program through the emulator and reads the program code in RAM back to the debug host, extracting the prefetch controller's dynamic adjustments to nop_pref positions (that is, the addresses and contents of all nop_pref instructions) and feeding them to the compiler. The compiler uses this information to adjust its static prefetch policy, that is, the positions of nop_pref instructions in the program. The debug host downloads the new program code with the adjusted nop_pref positions into RAM, and the VLIW microprocessor re-executes the program. This process is repeated: the compiler keeps revising its static prefetch policy according to the prefetch positions adjusted dynamically by the hardware, and the revision stops after a certain number of repetitions, yielding a prefetch prediction mechanism tailored to the application system. When debugging ends, the compiler produces program code with fully optimized prefetching, which is stored in the embedded application system as the final program.
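The three steps above can be pictured with a toy Python simulation of one prefetch being tuned over repeated runs. It is a deliberately simplified sketch, not the method of the invention: it tracks a single instruction block, assumes a fixed memory latency, moves the prefetch by one NOP slot per round, and measures the slack as t_r - t_p (cycles the block waits in the cache before being requested), which is an assumed sign convention. The function name and all parameters are hypothetical.

```python
def tune_prefetch(nop_cycles, start_index, request_cycle, latency, t_m,
                  max_rounds=10):
    """Return the index into nop_cycles at which the prefetch settles.

    nop_cycles    -- cycles at which replaceable NOP slots execute, ascending
    start_index   -- slot the compiler chose from its static estimate (step 1)
    request_cycle -- cycle t_r at which the core requests the block
    latency       -- cycles from issuing the prefetch to arrival in the cache
    t_m           -- largest acceptable wait in the cache (Tm in the text)
    """
    idx = start_index
    for _ in range(max_rounds):        # each round: run, adjust, recompile
        t_0 = nop_cycles[idx]          # cycle at which the nop_pref issues
        t_p = t_0 + latency            # cycle at which the block is cached
        slack = request_cycle - t_p    # assumed convention, see lead-in
        if slack < 0 and idx > 0:
            idx -= 1                   # late: use the previous NOP slot
        elif slack > t_m and idx < len(nop_cycles) - 1:
            idx += 1                   # too early: use the next NOP slot
        else:
            break                      # acceptable, or no slot left to try
    return idx
```

With NOP slots at cycles 0, 10, 20, 30, and 40, a request at cycle 50, a latency of 25 cycles, and Tm = 10, the prefetch settles on the slot at cycle 20 whether the compiler's first guess was the earliest or the latest slot. The bound on rounds mirrors the text's statement that revision stops after a certain number of repetitions.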

Adopting the invention produces the following beneficial technical effects:

1. Instruction packets required by the VLIW microprocessor are moved ahead of time from off-chip memory into the instruction cache in step with program execution, hiding the latency of the processor's accesses to external memory.

2. The VLIW microprocessor is debugged under emulation in its real environment, and the prefetch controller dynamically adjusts the positions of nop_pref instructions according to actual run-time results, which greatly improves prefetch accuracy and achieves a good prefetch effect.

3. The prefetch timing information adjusted dynamically by the hardware is fed back to the compiler through the emulator; the compiler adjusts its static prefetch policy and regenerates the program code that performs the prefetches. Because the method uses NOP instructions that already exist in the program, instruction prefetching is achieved without any increase in code length.

Description of drawings:

Fig. 1 is the overall logic structure diagram of a VLIW microprocessor in the background art;

Fig. 2 is a schematic diagram of the hardware emulation debugging environment set up by the invention;

Fig. 3 is the instruction format of the prefetch instruction nop_pref of the invention;

Fig. 4 is the logic structure diagram of a VLIW microprocessor that adopts the invention;

Fig. 5 is a structural schematic of the NOP buffer in the prefetch controller of the invention;

Fig. 6 compares the effect of the invention with that of conventional prefetching methods.

Embodiment:

Fig. 1 is the overall logic structure diagram of the background art. The embedded VLIW microprocessor system as a whole comprises four parts: a CPU core, a cache system, a memory controller, and memory. The CPU core executes instructions and writes results back to memory; it is connected to the cache system through the instruction/data bus and the address bus. The cache system caches instructions and data and consists of an instruction cache and a data cache. The memory controller provides the interface between memory and the cache system: when an instruction or data item required by the CPU core is not in the cache, the memory controller reads it from memory into the cache. The memory stores instructions and data. The four modules are connected to one another by the instruction bus, data bus, and address bus.

Fig. 2 is a schematic diagram of the hardware emulation debugging environment set up by the invention. The environment comprises three parts: the debug host, the hardware emulator, and the embedded application system. The debug host debugs the program code running on the embedded VLIW microprocessor, and the compiler runs on the debug host. The embedded application system is the working environment of the embedded VLIW microprocessor. The hardware emulator provides the debug communication channel between the debug host and the embedded VLIW microprocessor, and also the prefetch communication channel between the compiler and the prefetch controller.

In the debugging phase of the embedded application system, the application system operates in an environment as realistic as possible, and the application code runs on the VLIW microprocessor of the application system through the hardware emulator. This mode of running the program is called emulation debugging; through the emulator, the debug host can access the registers and memory of the VLIW microprocessor.

Based on the VLIW microprocessor clock, memory bandwidth, and cache structure, the compiler estimates the transfer time of an instruction block requested by the CPU core, replaces a suitable NOP instruction with the prefetch instruction nop_pref, and downloads the program code through the hardware emulator into the RAM of the embedded application system for execution. During program execution, the prefetch controller dynamically adjusts the positions of prefetch instructions according to the actual run and writes the adjusted results back to RAM. After the program code has run for a period of time in the real application system, the debug host stops the VLIW microprocessor and reads the memory contents back to the debug host; the compiler revises its static prefetch policy according to the prefetch information adjusted dynamically by the hardware, and the program code with revised prefetches is downloaded into the application system and executed again. After several such rounds of feedback, the compiler forms a prefetch prediction mechanism tailored to the application system, achieving efficient prefetching at low cost.

Fig. 3 is the instruction format of the prefetch instruction nop_pref of the invention. The nop_pref instruction comprises the instruction valid fields (V and E), the prefetch address (the intrc and cst fields), and the no-operation compatibility part (the spec field). The V field indicates whether this NOP may be replaced by a nop_pref instruction: if V is valid, the NOP may be replaced and its address may be stored in the NOP buffer; if V is invalid, the NOP may not be replaced. The E field indicates whether this nop_pref instruction is active: if E is valid, the nop_pref instruction may be executed; if E is invalid, it may not. The intrc and cst fields together give the instruction address to prefetch. The spec field carries the no-operation semantics of the NOP instruction in the original instruction set, including the NOP opcode, the number of no-operation cycles, and whether the instruction is issued in parallel.

Fig. 4 is the logic structure diagram of the prefetch controller of the invention. The prefetch controller is connected by buses to the CPU core, the instruction cache, and the memory controller. It consists of three parts: the bus monitor module, the NOP buffer, and the prefetch module.

The bus monitor module consists of a monitoring submodule, a decoding submodule, and a timing submodule. The monitoring submodule is a comparator used to monitor the instruction bus and address bus between the CPU core and the instruction cache. The decoding submodule has two functions: it stores every nop_pref instruction whose V field is valid in the NOP buffer, and it sends every nop_pref instruction whose E field is valid to the prefetch module for execution. The timing submodule records the time t_0 at which the prefetch is issued, the time t_p at which the prefetched instruction block enters the instruction cache, and the time t_r at which the CPU core requests that instruction block, and sends these three times to the prefetch module for analysis of the prefetch effect.

The NOP buffer caches the contents and addresses of recently executed nop_pref instructions. Its depth is N, and it is written cyclically. Its contents are available for the prefetch module to search.

The prefetch module consists of an execution submodule and an analysis submodule. The execution submodule decodes valid nop_pref instructions and sends prefetch commands to the instruction cache. The analysis submodule analyzes the prefetch effect from t_0, t_p, and t_r and judges whether the prefetch timing is appropriate; adjusted prefetch instructions are written back to memory through the memory controller.

Fig. 5 is a structural schematic of the NOP buffer of the invention. The NOP buffer is a RAM that caches the contents and addresses of recently executed nop_pref instructions; it is M bits wide and N entries deep, with one write port and one read port. The write port writes the content and address of the most recently executed nop_pref instruction whose V field is 1 into the buffer entry designated by the write pointer, which advances cyclically in the order 1, 2, 3, ..., N, 1, 2, .... The read port is controlled by the prefetch module.

Fig. 6 compares the effect of the invention with that of conventional prefetching methods. Panel (a) shows the case without prefetching, panel (b) the prefetch effect of the conventional method, and panel (c) the prefetch effect of the invention. In the figure, r1, r2, and r3 denote three instruction fetch requests issued by the CPU core during program execution. The horizontal axis is time, and each small cell on it represents one clock cycle. Dark gray cells indicate that the CPU core is computing. Light gray cells indicate a memory access in progress. White cells indicate that the CPU core's instruction fetch hits in the cache and the required instruction is obtained immediately. Black cells indicate that the instruction fetch misses in the cache and the required instruction cannot be obtained immediately. Patterned cells indicate that the CPU core executes a prefetch instruction and the memory begins prefetching.

In panel (a), the CPU core issues instruction fetch requests at times r1, r2, and r3. Because there is no prefetch mechanism, the cache access misses, the CPU core does not get its instruction, and the pipeline stalls until the memory access completes. Once the instruction arrives, the pipeline resumes.

In panel (b), the CPU core issues a prefetch request one cycle before r1, but the request is too late: when the CPU core issues its fetch request at r1, the prefetched content has not yet been brought into the cache, and the pipeline stalls. The situation is similar when the CPU core issues its fetch request at r2, and the pipeline stalls again. The third prefetch instruction, by contrast, executes too early: by the time the fetch request is issued at r3, the instruction has already been held in the cache for three cycles, causing a degree of waste.

In panel (c), the time at which the CPU core executes each prefetch instruction has been adjusted repeatedly by the prefetch controller and the compiler in the emulation environment, so that instructions are executed immediately after they are brought into the cache. The pipeline does not stall and no cache space is wasted, so the overall performance of the processor improves considerably.

The invention has been applied in the YHFT series of DSPs developed independently by the National University of Defense Technology. In that design, the NOP buffer in the prefetch controller is 64 bits wide and 64 entries deep, and Tm is 3 × (t_p - t_0).

Claims

1. An instruction prefetch method combining dynamic and static prefetching for a very long instruction word (VLIW) microprocessor, characterized in that: a prefetch controller dedicated to instruction prefetching is designed into the VLIW microprocessor; an emulation debugging environment is set up for instruction prefetch control; a prefetch instruction nop_pref is designed to be compatible with the NOP instruction; at compile time, no-operation (NOP) instructions in the VLIW microprocessor are replaced with prefetch instructions; during debugging, the prefetch controller dynamically adjusts the prefetch timing and feeds the changes in prefetch timing back to the compiler to optimize the static prefetch policy; the new program code generated after the compiler adjusts its static prefetch policy is downloaded to RAM; and after repeated feedback a prefetch prediction mechanism tailored to the application system is formed.

2. The instruction prefetch method combining dynamic and static prefetching for a VLIW microprocessor according to claim 1, characterized in that the emulation debugging environment is set up as follows: it consists of the compiler on the debug host, the embedded application system running in its real environment, and a hardware emulator that connects the debug host to the VLIW microprocessor in the embedded application system; through the hardware emulator, the compiler reads back the prefetch adjustment information stored in the RAM of the embedded application system, and downloads the program code with adjusted prefetch instructions into the embedded application system for the VLIW microprocessor to execute.

3. The instruction prefetch method combining dynamic and static prefetching for a VLIW microprocessor according to claim 1, characterized in that the nop_pref instruction is designed as follows: it comprises the instruction valid fields V and E, the prefetch address fields intrc and cst, and the no-operation compatibility field spec; the V field indicates whether this NOP may be replaced by a nop_pref instruction: if V is valid, the NOP may be replaced and its address may be stored in the NOP buffer, and if V is invalid, the NOP may not be replaced; the E field indicates whether this nop_pref instruction is active: if E is valid, the nop_pref instruction may be executed, and if E is invalid, it may not; the intrc and cst fields together give the instruction address to prefetch; and the spec field carries the no-operation semantics of the NOP instruction in the original instruction set, including the NOP opcode, the number of no-operation cycles, and whether the instruction is issued in parallel.

4. The instruction prefetch method combining dynamic and static prefetching for a VLIW microprocessor according to claim 1, characterized in that the prefetch controller is a component designed into the VLIW microprocessor to execute prefetch instructions, analyze the prefetch effect, and dynamically adjust the prefetch timing, and is designed as follows: it consists of three parts, a bus monitor module, a NOP buffer, and a prefetch module; the bus monitor module is connected to the cache system and the CPU core through the instruction bus and address bus, and the NOP buffer is connected to the bus monitor module and the prefetch module through the data bus and address bus:

The bus monitor module consists of a monitoring submodule, a decoding submodule, and a timing submodule. The monitoring submodule is a comparator that monitors the instruction bus and the address bus between the CPU core and the instruction cache, and continuously stores into the NOP buffer the nop-pref instructions with a valid V field that appear on the instruction bus between the CPU core and the instruction cache, together with their addresses. The decoding submodule is connected to the monitoring submodule and the NOP buffer through the data bus and the address bus, and has two functions: storing all nop-pref instructions with a valid V field and their addresses into the NOP buffer, and sending nop-pref instructions with a valid E field to the prefetch module to execute the prefetch. The timing submodule records the time t_0 at which the execution submodule issues the prefetch, the time t_p at which the prefetched instruction block enters the program cache, and the time t_r at which the CPU core requests this instruction block, and sends these three times through the data bus to the prefetch module for analysis of the prefetch effect;

The NOP buffer is a buffer with a first-in first-out structure, implemented in RAM, that caches the contents and addresses of the most recently executed nop-pref instructions. Its depth is N entries, and it has one write port and one read port. The write port writes the content and address of the most recently executed nop-pref instruction with a valid V field into the buffer entry designated by the write pointer; the write pointer increments cyclically in the order 1, 2, 3, ..., N, 1, 2, .... The read port is controlled by the prefetch module, which searches the NOP buffer through the read port;

The prefetch module consists of an execution submodule and an analysis submodule. The execution submodule is connected to the decoding submodule and the instruction cache through the data bus, and is responsible for issuing prefetch commands to the instruction cache. The analysis submodule is connected to the NOP buffer and the memory controller through the data bus and the address bus; it analyzes the prefetch effect according to t_0, t_p, and t_r and judges whether the prefetch timing is suitable: if t_p - t_r < 0, the prefetch lags and the nop-pref position needs to be advanced; if t_p - t_r > Tm, the prefetch is too early, the prefetched content occupies cache space without being used promptly, which wastes cache space, and the nop-pref position needs to be moved later; if Tm > t_p - t_r > 0, the prefetch position is appropriate and needs no adjustment. The value of Tm is determined by t_p - t_0, the cache capacity, and the cache line capacity. The nop-pref position is adjusted as follows: if the nop-pref needs to be advanced, the analysis submodule searches the NOP buffer for the address of the nearest preceding available nop-pref whose E field is invalid, writes this nop-pref instruction to that address in memory through the memory controller, and, after setting the E field of this nop-pref instruction to invalid, writes it back to the original address of the instruction; if the nop-pref needs to be moved later, the prefetch module searches the NOP buffer for the address of the next available nop-pref following this nop-pref instruction, writes this nop-pref instruction to that address, and writes the invalidated nop-pref instruction back to the original address.
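The NOP buffer and the analysis step of claim 4 can be sketched in software as follows. This is a behavioural model under stated assumptions, not the hardware design: the inequalities are taken exactly as the claim writes them, the boundary cases (a difference equal to 0 or to Tm) are treated as "no adjustment" because the claim does not cover them, "preceding" and "next" are approximated by address order, "available" is read as "E field invalid", and all names (`NopBuffer`, `classify`, `relocate`, the `memory` dictionary) are hypothetical.

```python
from enum import Enum


class Timing(Enum):
    LATE = "advance nop-pref"
    EARLY = "move nop-pref later"
    OK = "no adjustment"


class NopBuffer:
    """FIFO of (address, e_valid) pairs for recently executed V-valid slots."""

    def __init__(self, depth):
        self.entries = [None] * depth
        self.write_ptr = 0  # 0-based here; the claim counts 1..N

    def write(self, address, e_valid):
        self.entries[self.write_ptr] = (address, e_valid)
        # Cyclic increment: after the last entry, wrap to the first.
        self.write_ptr = (self.write_ptr + 1) % len(self.entries)

    def free_slots(self):
        """Addresses of buffered slots whose E field is invalid."""
        return [e[0] for e in self.entries if e is not None and not e[1]]


def classify(t_p, t_r, t_m):
    """Apply the claim's three cases to the difference t_p - t_r."""
    delta = t_p - t_r
    if delta < 0:
        return Timing.LATE
    if delta > t_m:
        return Timing.EARLY
    return Timing.OK  # assumption: boundaries count as appropriate


def relocate(memory, buffer, current, timing):
    """Move the nop-pref at `current`; return its new address."""
    if timing is Timing.OK:
        return current
    free = buffer.free_slots()
    if timing is Timing.LATE:
        # Nearest earlier slot (assumption: address order is program order).
        target = max((a for a in free if a < current), default=None)
    else:
        target = min((a for a in free if a > current), default=None)
    if target is None:
        return current  # no available slot in the buffer; leave unchanged
    prefetch_address = memory[current]["prefetch_address"]
    memory[target] = {"e": True, "prefetch_address": prefetch_address}
    # Write the invalidated copy back to the original address.
    memory[current] = {"e": False, "prefetch_address": prefetch_address}
    return target
```

In this model the buffer is not updated by `relocate`; its entries would be refreshed the next time the relocated instructions are executed and observed on the bus.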

5. The instruction prefetch method combining dynamic and static prefetching for a very long instruction word (VLIW) structure microprocessor as claimed in claim 1, characterized in that, when a program runs on the embedded VLIW microprocessor, the specific execution steps are:

The compiler, according to the CPU clock, the memory bandwidth, and the cache structure, estimates the transfer time of an instruction block requested by the CPU core, and changes certain NOP instructions in the compiled assembly code into the prefetch instruction nop-pref, so that the program executes the prefetch instruction first and the instruction block is fetched from memory into the instruction cache in advance; when this instruction block is to be executed, its instructions are already in the cache, and the CPU core can execute them immediately without waiting; the compiler performs prefetch prediction for all instruction blocks and replaces the corresponding NOPs with nop-pref;

During program execution, the prefetch controller dynamically adjusts the positions of the nop-pref instructions according to the actual results of running the program;

After the program has run for a period of time in the simulation environment, the debug host halts the program through the emulator and reads the program code in the RAM memory back to the debug host; the position adjustment information that the prefetch controller has dynamically applied to the nop-pref instructions, namely the addresses and contents of all nop-pref instructions, is extracted and fed back to the compiler; the compiler adjusts its static prefetch policy, that is, the positions of the nop-pref instructions in the program, according to this information; the debug host downloads the new program code, with the nop-pref positions adjusted, into the RAM memory, and the VLIW microprocessor executes the program again. This process is repeated, with the compiler continually revising the static prefetch policy according to the prefetch positions adjusted dynamically by the hardware; the revision stops after a certain number of repetitions, yielding a personalized prefetch prediction mechanism for this application system. Finally, the debugging process ends and the compiler provides program code with fully optimized prefetching, which is fixed in the embedded application system as the final program.
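The feedback loop of claim 5 can be outlined as a short sketch. The real steps (download through the emulator, run on the target, halt, read back RAM) are replaced here by a single hypothetical callback `run_and_read_back`, and `toy_target` is an invented stand-in for the hardware that shifts each nop-pref one slot per run; neither name comes from the patent.

```python
def tune_prefetch(initial_positions, run_and_read_back, rounds):
    """Repeat compile -> run -> read back for a fixed number of rounds."""
    positions = dict(initial_positions)
    for _ in range(rounds):
        # Stand-in for: download the program, run it under emulation, halt,
        # and read back the nop-pref positions left by the prefetch controller.
        # The compiler then adopts those positions as its new static policy.
        positions = dict(run_and_read_back(positions))
    return positions


def toy_target(ideal):
    """Toy hardware model: each run moves every nop-pref one slot toward `ideal`."""
    def run(positions):
        adjusted = {}
        for block, pos in positions.items():
            goal = ideal[block]
            if pos < goal:
                adjusted[block] = pos + 1
            elif pos > goal:
                adjusted[block] = pos - 1
            else:
                adjusted[block] = pos
        return adjusted
    return run


if __name__ == "__main__":
    run = toy_target({"blockA": 4, "blockB": 1})
    print(tune_prefetch({"blockA": 0, "blockB": 3}, run, rounds=10))
```

The fixed `rounds` count mirrors the claim's "certain number of repetitions"; a practical tool might instead stop once a run produces no further adjustments, which would be a design choice beyond what the claim states.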