CN1295599C - Instruction prefetch method combining dynamic and static prefetching for very long instruction word (VLIW) microprocessors - Google Patents


Info

Publication number
CN1295599C
Authority
CN
China
Legal status
Active
Application number
CNB2004100467655A
Other languages
Chinese (zh)
Other versions
CN1598761A
Inventor
扈啸
陈书明
陈宝民
张丹瑜
胡定磊
郭阳
万江华
刘祥远
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CNB2004100467655A
Publication of CN1598761A
Application granted
Publication of CN1295599C
Legal status: active

Abstract

The invention discloses an instruction prefetching method for very long instruction word (VLIW) microprocessors that combines dynamic and static prefetching. It addresses the problem that existing approaches cannot sufficiently improve prefetch efficiency or hide memory latency. In the technical solution, a dedicated prefetch controller is designed to handle instruction prefetching, an emulation debugging environment is set up to control prefetching, and a prefetch instruction, nop_pref, is designed to be compatible with the NOP instruction. During compilation, nop_pref instructions replace NOPs. During debugging, the prefetch controller dynamically adjusts the prefetch timing, and the timing changes are fed back to the compiler to optimize its static prefetch policy; the new program generated by the compiler under the adjusted policy is downloaded into RAM. After several rounds of feedback, a prefetch prediction mechanism tailored to the specific application system is obtained. While preserving the hardware simplicity and low power consumption required of an embedded microprocessor, the method hides memory latency and improves program execution efficiency. It avoids both the code-size growth of static prefetching and the low accuracy of dynamic prefetching.

Description

Instruction prefetch method combining dynamic and static prefetching for VLIW microprocessors
Technical field: The invention relates to instruction prefetching in the design of very long instruction word (VLIW) microprocessors, and in particular to instruction prefetching in the design of embedded VLIW microprocessors.
Background: Many current high-performance microprocessors adopt the VLIW architecture, in which multiple functional units operate in parallel and instruction parallelism and data movement are determined entirely by the compiler at compile time. The VLIW mechanism is simple to implement, has low hardware complexity and can exploit more instruction-level parallelism, so it is widely used.
As the performance gap between processors and memory keeps widening, memory latency has become the system bottleneck. To hide this latency, many processors use prefetching: through dedicated prefetch instructions or dedicated hardware mechanisms, with software scheduling by the compiler or a combined hardware/software approach, data in external memory is moved into the on-chip cache in advance. Hardware prefetching alone is often imprecise; with compiler support, prefetching only when appropriate improves efficiency and reduces unnecessary prefetches. Both the IBM 370/168 and the Amdahl 470V processors used prefetching.
In his PhD dissertation, Steven Paul VanderWiel studied a compiler-assisted data prefetch controller, i.e. a data prefetching method combining dynamic and static techniques, which improved execution efficiency by about 50%. However, that work targeted general-purpose CPUs and did not exploit the characteristics of embedded VLIW microprocessors, and the inserted prefetch instructions increased code size.
A typical embedded VLIW microprocessor system consists of four parts: the CPU core, the cache system, the memory controller and the memory, connected by the instruction bus, data bus and address bus. The CPU core executes instructions and writes the results back to memory. The cache system is a high-speed cache for instructions and data, made up of an instruction cache and a data cache. The memory controller provides the interface between the memory and the cache system; when an instruction or data item required by the CPU core is not in the cache, the memory controller reads it from memory into the cache. The memory stores instructions and data.
The program executed by an embedded VLIW microprocessor can be divided into basic blocks. A basic block is the basic unit of a program: it has exactly one entry (its first statement) and one exit (its last statement), and its exit is a branch instruction. Program code for embedded microprocessors is generally stored in nonvolatile memory, but to reduce memory latency it is copied into RAM and executed from there.
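As an illustration of the basic-block definition above, the following Python sketch partitions a linear instruction list using the common leader rule: the first instruction, every branch target and every instruction following a branch start a new block. The `Instr` type and its fields are hypothetical and not taken from the patent. Note that under this rule a block that falls through into a branch target ends without a branch, which is slightly looser than the patent's wording.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    """Hypothetical instruction record: a mnemonic plus an optional branch target."""
    op: str
    target: Optional[int] = None  # index of the branch target; None for non-branches

    @property
    def is_branch(self) -> bool:
        return self.target is not None


def basic_blocks(prog: List[Instr]) -> List[range]:
    """Split a linear instruction list into basic blocks using leaders."""
    if not prog:
        return []
    leaders = {0}                      # the first instruction always starts a block
    for i, ins in enumerate(prog):
        if ins.is_branch:
            leaders.add(ins.target)    # a branch target starts a block
            if i + 1 < len(prog):
                leaders.add(i + 1)     # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(prog)]
    return [range(s, e) for s, e in zip(starts, ends)]
```

Each returned `range` covers the instruction indices of one block, which is the granularity at which the compiler later makes its per-block prefetch predictions.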
Embedded VLIW microprocessors are mainly used in compute-intensive or data-intensive embedded application systems, running an embedded real-time operating system or no operating system at all. The application code they execute is relatively fixed, generally stored in nonvolatile memory, and rarely changes.
During the debug phase of an embedded application system, the system operates in an environment as close to the real one as possible, and the application code runs on the embedded VLIW microprocessor under control of an emulator. This mode of program execution is called emulation debugging; the debug host accesses the registers and memory of the VLIW microprocessor through the emulator.
Summary of the invention: The technical problem addressed by the invention is to make full use of the characteristics of embedded VLIW microprocessors and, by combining dynamic and static techniques, to further improve the efficiency of instruction prefetching and thereby hide memory latency. In the technical solution, a prefetch controller dedicated to instruction prefetching is designed into the VLIW microprocessor; an emulation debugging environment is set up for instruction prefetch control; and a prefetch instruction, nop_pref, is designed to be compatible with the NOP instruction. At compile time, no-operation (NOP) instructions of the VLIW microprocessor are replaced with prefetch instructions. During debugging, the prefetch controller dynamically adjusts the prefetch timing and feeds the timing changes back to the compiler to optimize the static prefetch policy, and the new program code generated after the compiler adjusts that policy is downloaded into RAM. After several rounds of feedback, a prefetch prediction mechanism tailored to the specific application system is obtained. While keeping the embedded microprocessor hardware simple and low-power, the method hides memory latency and improves program execution efficiency, avoiding both the code-size growth of static prefetching and the low accuracy of dynamic prefetching. No report of instruction prefetching using this method has yet been found, either in China or abroad.
The specific scheme of the invention is as follows:
An emulation debugging environment is set up for instruction prefetch control. It consists of the compiler on the debug host, the embedded application system running in its real environment, and a hardware emulator connecting the debug host to the VLIW microprocessor in the embedded application system. Through the hardware emulator, the compiler reads back the prefetch adjustment information saved in the RAM of the embedded application system, and the program code with adjusted prefetch instructions is downloaded through the emulator into the embedded application system for execution by the VLIW microprocessor.
A no-operation instruction (NOP) is defined in the instruction set of the embedded VLIW microprocessor; it is used to fill the delay slots of multi-cycle instructions and to align the boundaries of parallel instruction packets. NOP instructions account for 5%-20% of all instructions in typical application programs.
The nop_pref instruction designed by the invention is compatible with the NOP instruction: to the CPU core, a nop_pref instruction is simply a NOP.
The nop_pref instruction consists of the instruction-valid fields V and E, the prefetch-address fields intrc and cst, and the NOP-compatible field spec. The V field indicates whether this NOP may be replaced by a nop_pref instruction: if V is valid, the NOP may be replaced and its address may be stored in the NOP buffer; if V is invalid, the NOP must not be replaced. The E field indicates whether this nop_pref instruction is enabled: if E is valid, the nop_pref instruction may be executed; if E is invalid, it must not be executed. Together, the intrc and cst fields give the address of the instructions to prefetch. The spec field preserves the meaning of the NOP in the original instruction set for compatibility, including the NOP opcode, the number of idle cycles and whether the instruction executes in parallel.
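The patent names the nop_pref fields but not their widths or bit positions. The sketch below assumes a hypothetical 32-bit layout purely to show how V, E, intrc, cst and spec could share one instruction word, and further assumes that the prefetch address is intrc followed by cst. None of these layout choices comes from the source.

```python
from dataclasses import dataclass

# Hypothetical 32-bit layout (field widths are NOT given in the patent):
#   bit 31      V      - this NOP slot may hold a nop_pref
#   bit 30      E      - this nop_pref is enabled (executable as a prefetch)
#   bits 29..16 intrc  - upper part of the prefetch address (14 bits)
#   bits 15..8  cst    - lower part of the prefetch address (8 bits)
#   bits 7..0   spec   - NOP-compatible part (opcode, idle cycles, parallel bit)
FIELDS = {"v": (31, 1), "e": (30, 1), "intrc": (16, 14), "cst": (8, 8), "spec": (0, 8)}


@dataclass
class NopPref:
    v: int
    e: int
    intrc: int
    cst: int
    spec: int

    def encode(self) -> int:
        """Pack the fields into one 32-bit word, rejecting values that do not fit."""
        word = 0
        for name, (shift, width) in FIELDS.items():
            value = getattr(self, name)
            if value >> width:
                raise ValueError(f"{name}={value} does not fit in {width} bits")
            word |= value << shift
        return word

    @classmethod
    def decode(cls, word: int) -> "NopPref":
        """Unpack a 32-bit word into its fields."""
        return cls(**{n: (word >> s) & ((1 << w) - 1) for n, (s, w) in FIELDS.items()})

    def prefetch_address(self) -> int:
        # Assumption: intrc and cst are concatenated to form the prefetch address.
        return (self.intrc << 8) | self.cst
```

A core that ignores V, E, intrc and cst and decodes only spec would see an ordinary NOP, which is the compatibility property the text relies on.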
The invention designs a prefetch controller in the embedded VLIW microprocessor that executes prefetch instructions, analyzes the prefetch effect and adjusts the prefetch timing. It consists of three parts: the bus monitor module, the NOP buffer and the prefetch module. The bus monitor module is connected to the cache system and the CPU core through the instruction bus and address bus; the NOP buffer is connected to the bus monitor module and the prefetch module through the data bus and address bus.
The bus monitor module consists of a monitoring submodule, a decoding submodule and a timing submodule. The monitoring submodule is a comparator that watches the instruction bus and address bus between the CPU core and the instruction cache. The decoding submodule is connected to the monitoring submodule and the NOP buffer through the data bus and address bus, and has two functions: it stores every nop_pref instruction whose V field is valid into the NOP buffer, and it sends every nop_pref instruction whose E field is valid to the prefetch module for execution. The timing submodule records the time $t_0$ at which the execution submodule issues the prefetch, the time $t_p$ at which the prefetched instruction block enters the instruction cache, and the time $t_r$ at which the CPU core requests this instruction block; these three times are sent over the data bus to the prefetch module, which uses them to analyze the prefetch effect.
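A behavioral sketch of the decoding and timing submodules follows. It assumes hypothetical bit positions for V and E (bits 31 and 30), assumes the comparator has already identified the word as a nop_pref, and keys the recorded times by the nop_pref's own address. A bounded `deque` stands in for the NOP buffer; the class and method names are illustrative only.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple

V_BIT, E_BIT = 31, 30  # hypothetical positions of the V and E fields


@dataclass
class PrefetchTimes:
    t0: int          # cycle the prefetch was issued
    tp: int = -1     # cycle the block entered the instruction cache (-1: not yet)
    tr: int = -1     # cycle the CPU core requested the block (-1: not yet)


@dataclass
class BusMonitor:
    nop_buffer: Deque[Tuple[int, int]] = field(default_factory=lambda: deque(maxlen=64))
    times: Dict[int, PrefetchTimes] = field(default_factory=dict)
    issued: List[int] = field(default_factory=list)  # addresses sent to the prefetch module

    def on_fetch(self, addr: int, word: int, cycle: int) -> None:
        """Decoding submodule: handle one nop_pref seen on the core-to-I-cache bus."""
        if (word >> V_BIT) & 1:            # V valid: record the slot in the NOP buffer
            self.nop_buffer.append((addr, word))
        if (word >> E_BIT) & 1:            # E valid: forward to the prefetch module
            self.issued.append(addr)
            self.times[addr] = PrefetchTimes(t0=cycle)   # timing submodule: t_0

    def on_fill(self, pref_addr: int, cycle: int) -> None:
        self.times[pref_addr].tp = cycle   # timing submodule: t_p

    def on_request(self, pref_addr: int, cycle: int) -> None:
        self.times[pref_addr].tr = cycle   # timing submodule: t_r
```

A slot with V valid but E invalid is remembered in the buffer as a possible relocation target without triggering a prefetch.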
The NOP buffer is a first-in first-out buffer implemented in RAM that holds the contents and addresses of the most recently executed nop_pref instructions. It has N entries, one write port and one read port. The write port writes the contents and address of each recently executed nop_pref instruction whose V field is valid into the entry selected by the write pointer, which advances cyclically through 1, 2, 3, ..., N, 1, 2, .... The read port is controlled by the prefetch module, which uses it to search the NOP buffer.
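The cyclic write pointer described above can be modeled as a ring buffer. This Python sketch (hypothetical class `NopBuffer`) counts the pointer from 0 rather than 1 and exposes the read port as a method returning entries oldest first; it models the behavior, not the RAM implementation.

```python
from typing import List, Optional, Tuple


class NopBuffer:
    """Ring buffer of recently executed nop_pref instructions as (address, word)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.entries: List[Optional[Tuple[int, int]]] = [None] * depth
        self.wp = 0        # write pointer (0-based here; the patent counts from 1)
        self.count = 0     # number of valid entries, saturates at depth

    def write(self, addr: int, word: int) -> None:
        """Write port: store at the write pointer, then advance it cyclically."""
        self.entries[self.wp] = (addr, word)
        self.wp = (self.wp + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def read_in_order(self) -> List[Tuple[int, int]]:
        """Read port: all valid entries, oldest first."""
        start = (self.wp - self.count) % self.depth
        return [self.entries[(start + i) % self.depth] for i in range(self.count)]
```

Once N entries have been written, each new write overwrites the oldest entry, giving the first-in first-out behavior the text describes.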
The prefetch module consists of an execution submodule and an analysis submodule. The execution submodule is connected to the decoding submodule and the instruction cache through the data bus and is responsible for sending prefetch commands to the instruction cache. The analysis submodule is connected to the NOP buffer and the memory controller through the data bus and address bus; it analyzes the prefetch effect from $t_0$, $t_p$ and $t_r$ and judges whether the prefetch timing is appropriate. If $t_p - t_r < 0$, the prefetch is late and the nop_pref must be moved earlier. If $t_p - t_r > T_m$, where $T_m$ is an empirical value, the prefetch is too early: the prefetched content occupies cache space without being used soon, wasting cache capacity, so the nop_pref must be moved later. If $T_m > t_p - t_r > 0$, the prefetch position is appropriate and no adjustment is needed. The value of $T_m$ is determined by $t_p - t_0$, the cache capacity and the cache line size.
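The timing test can be sketched as a small classifier. One caution: read against the definitions of the three times ($t_p$ is when the block reaches the cache, $t_r$ is when the core asks for it), a late prefetch is one where the block arrives after the request, as in the Fig. 6(b) discussion. This sketch therefore measures the slack as $t_r - t_p$ rather than the $t_p - t_r$ written in the text; that reading, and the treatment of the boundary values 0 and $T_m$ as appropriate, are assumptions. The helper `tm_from` uses the $T_m = 3 \times (t_p - t_0)$ setting reported for the YHFT DSP at the end of the description.

```python
from enum import Enum


class Verdict(Enum):
    LATE = "move nop_pref earlier"
    EARLY = "move nop_pref later"
    OK = "keep position"


def tm_from(t0: int, tp: int) -> int:
    """T_m = 3 * (t_p - t_0), the setting reported for the YHFT DSP."""
    return 3 * (tp - t0)


def judge(tp: int, tr: int, tm: int) -> Verdict:
    """Classify one prefetch from its cache-fill time tp and request time tr."""
    slack = tr - tp          # assumed reading: cycles the block waited in the cache
    if slack < 0:
        return Verdict.LATE  # block arrived after the core asked for it: stall
    if slack > tm:
        return Verdict.EARLY # block sat in the cache too long: wasted cache line
    return Verdict.OK        # 0 <= slack <= tm treated as appropriate (assumption)
```

A block that is never requested is recorded with $t_r = T_{MAX}$; under this reading its slack exceeds $T_m$ and the prefetch is classified as too early, so its nop_pref is moved later.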
The nop_pref position is adjusted as follows. If the nop_pref must be moved earlier, the analysis submodule searches the NOP buffer for the address of the nearest preceding available nop_pref (one whose E field is invalid) and writes this nop_pref instruction to that address in memory through the memory controller; it then sets the E field of this nop_pref instruction to invalid and writes it back to its original address. If the nop_pref must be moved later, the prefetch module searches the NOP buffer for the address of the next available nop_pref following this instruction and writes the nop_pref instruction to that address; the instruction, with its E field set to invalid, is then written back to its original address.
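The relocation step can be sketched over a dictionary standing in for program memory. The sketch assumes, hypothetically, that "preceding" and "next" mean neighbors in the NOP buffer's execution order and that a slot is available when its E field is 0; the `Slot` type and `move` function are illustrative names, not part of the patent.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Slot:
    e: int        # 1 = enabled nop_pref, 0 = available (behaves as a plain NOP)
    target: int   # prefetch address carried by intrc/cst (ignored when e == 0)


def move(memory: Dict[int, Slot], order: List[int], addr: int, earlier: bool) -> Optional[int]:
    """Relocate the nop_pref at `addr` to the nearest available slot before/after it.

    `order` lists NOP-buffer addresses in execution order (oldest first).
    Returns the new address, or None if no available slot exists in that direction.
    """
    i = order.index(addr)
    candidates = reversed(order[:i]) if earlier else order[i + 1:]
    for cand in candidates:
        if memory[cand].e == 0:                        # available slot found
            memory[cand] = replace(memory[addr], e=1)  # write the nop_pref there, enabled
            memory[addr] = replace(memory[addr], e=0)  # write back the original, disabled
            return cand
    return None
```

Returning `None` when no free slot exists leaves the original nop_pref untouched, which is one plausible policy; the source does not say what happens in that case.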
When a program runs on the embedded VLIW microprocessor, the instruction prefetch method of the invention proceeds in the following steps:
1. Based on the CPU clock, the memory bandwidth and the cache structure, the compiler estimates the time needed to transfer an instruction block requested by the CPU core, and converts selected NOP instructions in the compiled assembly code into nop_pref prefetch instructions. The program thus executes the prefetch instruction first, fetching the instruction block from memory into the instruction cache in advance, so that when the block is to be executed its instructions are already in the cache and the CPU core can execute them immediately without waiting. The compiler makes a prefetch prediction for every instruction block and replaces the corresponding NOPs with nop_pref.
2. To improve prefetch performance, the prefetch controller dynamically adjusts the positions of nop_pref instructions during program execution according to the actual runtime behavior, as follows:
1) The monitoring submodule of the prefetch controller watches the instruction bus and address bus between the CPU core and the instruction cache, and the decoding submodule continually stores each nop_pref instruction with a valid V field seen on this instruction bus, together with its address, into the NOP buffer.
2) As soon as the monitoring submodule detects a nop_pref instruction with a valid V field on the instruction bus, it sends a prefetch request to the instruction cache and records this moment as $t_0$. When the program block specified by the nop_pref is read into the instruction cache, that moment is recorded as $t_p$. When the CPU core requests the prefetched program block, that moment is recorded as $t_r$; if the block is never requested ($t_r = \infty$), the prefetch is useless and $t_r$ is recorded as $T_{MAX}$. From these three times the analysis submodule judges whether the prefetch timing is appropriate. If $t_p - t_r < 0$, the prefetch is late and the nop_pref must be moved earlier. If $t_p - t_r > T_m$, the prefetch is too early; a prefetch issued far in advance occupies cache lines and wastes them, so the nop_pref must be moved later. If $T_m > t_p - t_r > 0$, the position is appropriate and needs no adjustment.
3) Using the records in the NOP buffer, the analysis submodule moves this nop_pref instruction to a nop_pref position whose V field is invalid and that satisfies $T_m > t_p - t_r > 0$, writing the nop_pref instruction to the address of that nop_pref, and thereby adjusts the prefetch dynamically.
3. After the program has run for some time in the emulation environment, the debug host stops it through the emulator, reads the program code in RAM back to the debug host, and extracts the nop_pref position adjustments made dynamically by the prefetch controller, i.e. the addresses and contents of all nop_pref instructions, which it feeds to the compiler. Based on this information the compiler adjusts its static prefetch policy, i.e. the positions of nop_pref instructions in the program; the debug host downloads the new program code with the adjusted nop_pref positions into RAM, and the VLIW microprocessor re-executes the program. This process is repeated, with the compiler continually revising its static prefetch policy according to the prefetch positions adjusted dynamically by the hardware, and it stops after a certain number of iterations, yielding a prefetch prediction mechanism tailored to this application system. When debugging ends, the compiler produces program code with fully optimized prefetching, which is stored permanently in the embedded application system as the final program.
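The iterative loop in step 3 can be illustrated with a toy model in which each candidate NOP slot has a known issue cycle, the prefetch latency is fixed, and the hardware moves a nop_pref by one slot per run. Everything here (the `tune` function, the fixed latency, the one-slot step and the $t_r - t_p$ slack reading of the timing test) is an assumption made for illustration, not the patent's procedure.

```python
from typing import List


def tune(slot_cycles: List[int], start: int, need_cycle: int, latency: int,
         tm: int, max_rounds: int = 10) -> int:
    """Iterate 'run -> hardware adjusts -> compiler adopts' until the position is stable.

    slot_cycles: issue cycles of the candidate NOP slots, ascending (program order).
    start:       slot index chosen by the compiler's initial static estimate.
    need_cycle:  cycle at which the core requests the prefetched block (t_r).
    Returns the slot index the compiler finally keeps.
    """
    pos = start
    for _ in range(max_rounds):
        tp = slot_cycles[pos] + latency          # block lands in the cache
        slack = need_cycle - tp                  # assumed reading: t_r - t_p
        if slack < 0 and pos > 0:
            new = pos - 1                        # late: hardware moves nop_pref earlier
        elif slack > tm and pos < len(slot_cycles) - 1:
            new = pos + 1                        # early: hardware moves it later
        else:
            new = pos
        if new == pos:                           # no adjustment reported: stop
            return pos
        pos = new                                # compiler adopts feedback, recompiles
    return pos
```

The `max_rounds` cap plays the role of the patent's "certain number of iterations" and also stops the loop if no slot satisfies the timing window.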
The invention has the following beneficial technical effects:
1. According to the program's execution, the instruction packets needed by the VLIW microprocessor are moved in advance from off-chip memory into the instruction cache, hiding the latency of processor accesses to external memory.
2. The VLIW microprocessor is emulation-debugged in its real environment, and the prefetch controller dynamically adjusts nop_pref positions according to the actual runtime behavior, greatly improving prefetch accuracy and achieving good prefetch results.
3. The prefetch timing adjusted dynamically by the hardware is fed back to the compiler through the emulator; the compiler adjusts its static prefetch policy and regenerates the program code to perform prefetching. Because it uses NOP instructions already present in the program, instruction prefetching is achieved without any increase in code size.
Description of drawings:
Fig. 1 is the overall logical structure diagram of a prior-art VLIW microprocessor;
Fig. 2 is a schematic diagram of the hardware emulation debugging environment set up by the invention;
Fig. 3 shows the instruction format of the prefetch instruction nop_pref of the invention;
Fig. 4 is the logical block diagram of a VLIW microprocessor adopting the invention;
Fig. 5 is a schematic diagram of the NOP buffer in the prefetch controller of the invention;
Fig. 6 compares the effect of the invention with that of traditional prefetching methods.
Embodiment:
Fig. 1 is the overall logical structure diagram of the prior art. The embedded VLIW microprocessor system consists of four parts: the CPU core, the cache system, the memory controller and the memory. The CPU core executes instructions, writes the results back to memory, and is connected to the cache system through the instruction/data buses and the address bus. The cache system is a high-speed cache for instructions and data, made up of an instruction cache and a data cache. The memory controller provides the interface between the memory and the cache system; when an instruction or data item required by the CPU core is not in the cache, the memory controller reads it from memory into the cache. The memory stores instructions and data. The four modules (CPU core, cache system, memory controller and memory) are connected by the instruction bus, data bus and address bus.
Fig. 2 is a schematic diagram of the hardware emulation debugging environment set up by the invention. It consists of three parts: the debug host, the hardware emulator and the embedded application system. The debug host debugs the program code running on the embedded VLIW microprocessor, and the compiler runs on the debug host. The embedded application system is the working environment of the embedded VLIW microprocessor. The hardware emulator provides the debug communication channel between the debug host and the embedded VLIW microprocessor, and also the prefetch communication channel between the compiler and the prefetch controller.
During the debug phase of the embedded application system, the system operates in an environment as close to the real one as possible, and the application code runs on the VLIW microprocessor of the application system under control of the hardware emulator. This mode of program execution is called emulation debugging; through the emulator, the debug host can access the registers and memory of the VLIW microprocessor.
Based on the VLIW microprocessor clock, the memory bandwidth and the cache structure, the compiler estimates the time needed to transfer an instruction block requested by the CPU core, replaces selected NOP instructions with nop_pref prefetch instructions, and downloads the program code through the hardware emulator into the RAM of the embedded application system for execution. During execution, the prefetch controller dynamically adjusts the positions of the prefetch instructions according to the actual runtime behavior and writes the adjusted results back to RAM. After the program code has run in the real application system for some time, the debug host stops the VLIW microprocessor and reads the memory contents back; the compiler revises its static prefetch policy according to the prefetch information adjusted dynamically by the hardware, and the program code with corrected prefetching is downloaded again into the application system for execution. Through repeated feedback, the compiler arrives at a prefetch prediction mechanism tailored to this application system, achieving efficient prefetching at low cost.
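A rough sketch of the compiler's static estimate: convert memory bandwidth and CPU clock into bytes per cycle to obtain a block's transfer time, then pick the latest NOP that issues at least that many cycles before the block is needed. The function names, the optional fixed access latency and the "latest sufficiently early NOP" rule are assumptions; the patent only states which inputs the estimate uses.

```python
import math
from typing import List, Optional


def transfer_cycles(block_bytes: int, mem_bytes_per_s: float, cpu_hz: float,
                    access_latency_cycles: int = 0) -> int:
    """Rough CPU-cycle cost of fetching one instruction block from external memory."""
    bytes_per_cycle = mem_bytes_per_s / cpu_hz
    return access_latency_cycles + math.ceil(block_bytes / bytes_per_cycle)


def pick_nop(nop_cycles: List[int], need_cycle: int, lead: int) -> Optional[int]:
    """Index of the latest NOP issued at least `lead` cycles before `need_cycle`."""
    best = None
    for i, c in enumerate(nop_cycles):
        if c <= need_cycle - lead and (best is None or c > nop_cycles[best]):
            best = i
    return best
```

For example, a 32-byte block over a 400 MB/s memory with a 200 MHz core costs 16 cycles, so the compiler would choose a NOP at least 16 cycles ahead of the block; the dynamic feedback then corrects any error in this estimate.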
Fig. 3 shows the instruction format of the prefetch instruction nop_pref of the invention. The nop_pref instruction consists of the instruction-valid fields (V and E), the prefetch-address fields (intrc and cst) and the NOP-compatible field (spec). The V field indicates whether this NOP may be replaced by a nop_pref instruction: if V is valid, the NOP may be replaced and its address may be stored in the NOP buffer; if V is invalid, the NOP must not be replaced. The E field indicates whether this nop_pref instruction is enabled: if E is valid, the nop_pref instruction may be executed; if E is invalid, it must not be executed. Together, the intrc and cst fields give the address of the instructions to prefetch. The spec field preserves the meaning of the NOP in the original instruction set for compatibility, including the NOP opcode, the number of idle cycles and whether the instruction executes in parallel.
Fig. 4 is the logical block diagram of the prefetch controller of the invention. The prefetch controller is connected by buses to the CPU core, the instruction cache and the memory controller. It consists of three parts: the bus monitor module, the NOP buffer and the prefetch module.
The bus monitor module consists of a monitoring submodule, a decoding submodule and a timing submodule. The monitoring submodule is a comparator that watches the instruction bus and address bus between the CPU core and the instruction cache. The decoding submodule has two functions: it stores every nop_pref instruction whose V field is valid into the NOP buffer, and it sends every nop_pref instruction whose E field is valid to the prefetch module for execution. The timing submodule records the time $t_0$ at which the prefetch is issued, the time $t_p$ at which the prefetched instruction block enters the instruction cache and the time $t_r$ at which the CPU core requests this instruction block, and sends these three times to the prefetch module for analysis of the prefetch effect.
The NOP buffer holds the contents and addresses of the most recently executed nop_pref instructions. Its depth is N and it is written cyclically. Its contents are searched by the prefetch module.
The prefetch module consists of an execution submodule and an analysis submodule. The execution submodule decodes valid nop_pref instructions and sends prefetch commands to the instruction cache. The analysis submodule analyzes the prefetch effect from $t_0$, $t_p$ and $t_r$, judges whether the prefetch timing is appropriate, and writes the adjusted prefetch instructions back to memory through the memory controller.
Fig. 5 is a schematic diagram of the NOP buffer of the invention. The NOP buffer is a RAM, M bits wide and N entries deep, with one write port and one read port, that holds the contents and addresses of the most recently executed nop_pref instructions. The write port writes the contents and address of each recently executed nop_pref instruction whose V field is 1 into the entry selected by the write pointer, which advances cyclically through 1, 2, 3, ..., N, 1, 2, .... The read port is controlled by the prefetch module.
Fig. 6 compares the effect of the invention with that of traditional prefetching methods: panel (a) shows execution without prefetching, panel (b) with a traditional prefetching method, and panel (c) with the prefetching of the invention. r1, r2 and r3 denote three instruction-fetch requests issued by the CPU core while executing the program. The horizontal axis is time, and each small cell on it is one clock cycle. Dark gray cells mean the CPU core is computing. Light gray cells mean a memory access is in progress. White cells mean an instruction fetch by the CPU core hits in the cache and the required instruction is obtained immediately. Black cells mean the fetch misses in the cache and the instruction is not available immediately. Patterned cells mean the CPU core executes a prefetch instruction and the memory begins fetching the prefetched instructions.
In panel (a), the CPU core issues fetch requests at r1, r2 and r3. With no prefetch mechanism, each access misses in the cache, the CPU core cannot obtain the instructions, and the pipeline stalls until the memory access completes; once the instructions arrive, the pipeline resumes.
In panel (b), the CPU core issues a prefetch request one cycle before r1, but the request is too late: when the CPU core issues its fetch request at r1, the prefetched content has not yet reached the cache, so the pipeline stalls. The situation at r2 is similar, and the pipeline again stalls. The third prefetch instruction, by contrast, is executed too early: when the fetch request is issued at r3, the instructions have already been held in the cache for three cycles, wasting cache space to some extent.
In panel (c), after repeated adjustment by the prefetch controller and the compiler in the emulation environment, each prefetch instruction is executed at a moment such that its instructions are used as soon as they enter the cache. This causes neither pipeline stalls nor wasted cache space, so overall processor performance improves greatly.
The invention has been applied in the YHFT series of DSPs developed in-house by the National University of Defense Technology. In that design, the NOP buffer in the prefetch controller is 64 bits wide and 64 entries deep, and $T_m = 3 \times (t_p - t_0)$.
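As a quick check of the sizes reported above (the 10-cycle fill time in the second line is a hypothetical example, not a figure from the source):

$$
\text{NOP buffer size} = 64\ \text{bits} \times 64\ \text{entries} = 4096\ \text{bits} = 512\ \text{bytes}
$$

$$
t_p - t_0 = 10\ \text{cycles} \;\Rightarrow\; T_m = 3 \times 10 = 30\ \text{cycles}
$$

So in this configuration a prefetched block may wait up to three times its own fill latency in the cache before the prefetch is judged too early.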

Claims (5)

1. An instruction prefetch method combining dynamic and static prefetching for a very long instruction word (VLIW) microprocessor, characterized in that: a prefetch controller dedicated to instruction prefetching is designed into the VLIW microprocessor; an emulation debugging environment is set up for instruction prefetch control; a prefetch instruction nop_pref is designed to be compatible with the NOP instruction; at compile time, no-operation (NOP) instructions of the VLIW microprocessor are replaced with prefetch instructions; during debugging, the prefetch controller dynamically adjusts the prefetch timing and feeds the timing changes back to the compiler to optimize the static prefetch policy; the new program code generated after the compiler adjusts its static prefetch policy is downloaded into RAM; and after several rounds of feedback a prefetch prediction mechanism tailored to the application system is obtained.
2. The instruction prefetch method combining dynamic and static prefetching for a VLIW microprocessor according to claim 1, characterized in that the emulation debugging environment is set up as follows: it is formed by the compiler on the debug host, the embedded application system running in its real environment, and a hardware emulator connecting the debug host to the VLIW microprocessor in the embedded application system; through the hardware emulator, the compiler reads back the prefetch adjustment information saved in the RAM of the embedded application system, and the program code with adjusted prefetch instructions is downloaded through the emulator into the embedded application system for execution by the VLIW microprocessor.
3. The instruction prefetch method combining dynamic and static prefetching for a VLIW microprocessor according to claim 1, characterized in that the nop_pref instruction is designed as follows: it comprises instruction-valid fields V and E, prefetch-address fields intrc and cst, and a NOP-compatible field spec; the V field indicates whether this NOP may be replaced by a nop_pref instruction: if V is valid, the NOP may be replaced and its address may be stored in the NOP buffer, and if V is invalid, the NOP must not be replaced; the E field indicates whether this nop_pref instruction is enabled: if E is valid, the nop_pref instruction may be executed, and if E is invalid, it must not be executed; the intrc and cst fields together give the address of the instructions to prefetch; and the spec field preserves the meaning of the NOP in the original instruction set for compatibility, including the NOP opcode, the number of idle cycles and whether the instruction executes in parallel.
4. The dynamic-static combined instruction prefetch method for a VLIW microprocessor according to claim 1, characterized in that the prefetch controller is a component designed into the VLIW microprocessor that executes prefetch instructions, analyzes the prefetch effect, and dynamically adjusts the prefetch timing. It is designed as follows: it consists of three parts, a bus monitor module, a NOP buffer, and a prefetch module. The bus monitor module is connected to the cache system and the CPU core via the instruction bus and the address bus, and the NOP buffer is connected to the bus monitor module and the prefetch module via the data bus and the address bus:
The bus monitor module consists of a monitoring submodule, a decoding submodule, and a timing submodule. The monitoring submodule is a comparator that monitors the instruction bus and address bus between the CPU core and the instruction cache, and continuously captures every nop-pref instruction with a valid V field, together with its address, that the CPU core fetches from the instruction cache over the instruction bus, storing them in the NOP buffer. The decoding submodule is connected to the monitoring submodule and the NOP buffer via the data bus and address bus and has two functions: storing every nop-pref instruction with a valid V field, together with its address, in the NOP buffer; and sending every nop-pref instruction with a valid E field to the prefetch module for execution. The timing submodule records the moment t_0 at which the execution submodule issues the prefetch, the moment t_p at which the prefetched instruction block enters the program cache, and the moment t_r at which the CPU core requests this instruction block; these three times are sent over the data bus to the prefetch module, which uses them to analyze the prefetch effect.
The NOP buffer is a first-in, first-out (FIFO) buffer implemented in RAM that holds the contents and addresses of the most recently executed nop-pref instructions. Its depth is N entries, and it has one write port and one read port. The write port writes the contents and address of each recently executed nop-pref instruction with a valid V field into the entry selected by the write pointer, which increments cyclically (1, 2, 3, ..., N, 1, 2, ...). The read port is controlled by the prefetch module, which searches the NOP buffer through it.
The prefetch module consists of an execution submodule and an analysis submodule. The execution submodule is connected to the decoding submodule and the instruction cache via the data bus and is responsible for issuing prefetch commands to the instruction cache. The analysis submodule is connected to the NOP buffer and the memory controller via the data bus and address bus; it analyzes the prefetch effect from t_0, t_p and t_r and judges whether the prefetch timing is appropriate: if t_p - t_r < 0, the prefetch lags and the nop-pref must be moved earlier; if t_p - t_r > T_m, the prefetch is too early, so the prefetched content occupies cache space without being used immediately, wasting cache space, and the nop-pref must be moved later; if T_m > t_p - t_r > 0, the prefetch position is appropriate and needs no adjustment. The value of T_m is determined by t_p - t_0, the cache capacity, and the cache line size. The nop-pref position is adjusted as follows. If the nop-pref must be moved earlier, the analysis submodule searches the NOP buffer for the address of the nearest available nop-pref preceding this nop-pref instruction whose E field is invalid, and writes this nop-pref instruction to that address in memory through the memory controller; it then sets the E field of this nop-pref instruction to invalid and writes it back to the instruction's original address. If the nop-pref must be moved later, the prefetch module searches the NOP buffer for the address of the next available nop-pref following this nop-pref instruction and writes this nop-pref instruction to that address; the copy with its E field set to invalid is written back to the original address.
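As a rough software model of the bus monitor and NOP buffer in claim 4, the sketch below keeps a depth-N ring whose write pointer wraps around, stores every V-valid nop-pref seen on the instruction bus, and returns the E-valid ones that would be handed to the prefetch module. The bit positions, the names `NopBuffer` and `monitor`, and the `is_nop_pref` opcode test are assumptions for illustration only.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical bit positions of the V and E fields (not fixed by the claim).
V_BIT = 1 << 31
E_BIT = 1 << 30

Entry = Tuple[int, int]  # (instruction address, instruction word)


class NopBuffer:
    """Depth-N FIFO ring holding the most recent V-valid nop-pref entries."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.entries: List[Optional[Entry]] = [None] * depth
        self.wp = 0  # write pointer: 0-based here, counted 1..N in the claim

    def write(self, address: int, word: int) -> None:
        # Once all N slots are used, the oldest entry is overwritten.
        self.entries[self.wp] = (address, word)
        self.wp = (self.wp + 1) % self.depth

    def ordered(self) -> List[Entry]:
        """Read-port view: filled entries from oldest to newest."""
        ring = self.entries[self.wp:] + self.entries[:self.wp]
        return [e for e in ring if e is not None]


def monitor(fetches: List[Entry], buf: NopBuffer,
            is_nop_pref: Callable[[int], bool]) -> List[Entry]:
    """Bus-monitor model: record V-valid nop-prefs seen on the instruction
    bus and return the E-valid ones that would go to the prefetch module."""
    dispatched = []
    for address, word in fetches:
        if not is_nop_pref(word):
            continue  # ordinary instruction: ignored by the monitor
        if word & V_BIT:
            buf.write(address, word)
        if word & E_BIT:
            dispatched.append((address, word))
    return dispatched
```

Because the ring only remembers the last N nop-prefs, N bounds how far back the analysis submodule can look when it searches for a slot to move a nop-pref into.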
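The analysis submodule's decision and relocation rules from claim 4 can be sketched as follows. The classification applies the claim's inequalities to t_p - t_r exactly as written, with T_m passed in as a parameter because the claim does not give its formula; the relocation models memory as a dictionary of instruction words and copies the whole nop-pref word to the new slot, as the claim describes. The function names, the E-bit position, and the handling of the boundary cases are hypothetical.

```python
from enum import Enum
from typing import Dict, Iterable

E_BIT = 1 << 30  # hypothetical position of the E field


class Verdict(Enum):
    LATE = "move nop-pref earlier"
    EARLY = "move nop-pref later"
    OK = "keep position"


def classify(t_p: int, t_r: int, t_m: int) -> Verdict:
    """Apply the claim's three cases to d = t_p - t_r. The claim leaves
    d == 0 and d == t_m unspecified; treating them as OK is an assumption."""
    d = t_p - t_r
    if d < 0:
        return Verdict.LATE
    if d > t_m:
        return Verdict.EARLY
    return Verdict.OK


def move(memory: Dict[int, int], candidates: Iterable[int],
         current: int, earlier: bool) -> int:
    """Relocate the active nop-pref at `current` to the nearest slot in the
    requested direction whose E field is invalid. `memory` stands in for RAM
    written through the memory controller; `candidates` are slot addresses
    found in the NOP buffer. Returns the new address, or `current` if no
    slot is available."""
    if earlier:
        pool = sorted((a for a in candidates if a < current), reverse=True)
    else:
        pool = sorted(a for a in candidates if a > current)
    for addr in pool:
        if not memory[addr] & E_BIT:        # inactive slot: available
            word = memory[current]
            memory[addr] = word             # nop-pref now lives here
            memory[current] = word & ~E_BIT  # old copy kept, E invalid
            return addr
    return current


def adjust(memory: Dict[int, int], candidates: Iterable[int], current: int,
           t_p: int, t_r: int, t_m: int) -> int:
    verdict = classify(t_p, t_r, t_m)
    if verdict is Verdict.OK:
        return current
    return move(memory, candidates, current, earlier=verdict is Verdict.LATE)
```

Leaving the old copy in place with E invalid means the vacated slot itself becomes an available target for later moves, which is what lets repeated runs shift a nop-pref step by step.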
5. The dynamic-static combined instruction prefetch method for a VLIW microprocessor according to claim 1, characterized in that, when a program runs on the embedded VLIW microprocessor, the specific execution steps are:
Based on the CPU clock, the memory bandwidth, and the cache structure, the compiler estimates the time needed to transfer an instruction block requested by the CPU core, and changes a suitable NOP instruction in the compiled assembly code into the prefetch instruction nop-pref. The program thus executes the prefetch instruction first, fetching the instruction block from memory into the instruction cache in advance, so that when the block is executed its instructions are already in the cache and the CPU core can execute them immediately without waiting. The compiler makes a prefetch prediction for every instruction block and replaces the corresponding NOP with a nop-pref;
During program execution, the prefetch controller dynamically adjusts the positions of the nop-pref instructions according to the actual run results;
After the program has run in the simulation environment for some time, the debug host stops the program through the emulator and reads the program code in the RAM memory back to the debug host, extracting the nop-pref position adjustments made dynamically by the prefetch controller, i.e., the addresses and contents of all nop-pref instructions, and feeds them back to the compiler. Based on this information, the compiler adjusts its static prefetch policy, i.e., the positions of the nop-pref instructions in the program, and the debug host downloads the new program code with the adjusted nop-pref positions into the RAM memory, where the VLIW microprocessor executes it again. This process is repeated, with the compiler continually revising its static prefetch policy according to the prefetch positions adjusted dynamically by the hardware; after a certain number of iterations the revision stops, yielding a personalized prefetch prediction mechanism for this application system. The debugging process then ends, and the compiler provides fully prefetch-optimized program code, which is burned into the embedded application system as the final program.
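The compiler-side estimate in claim 5 could look like the following sketch. The transfer-time model (fixed memory latency plus block size divided by bandwidth, converted to CPU cycles) and the rule of picking the latest NOP that is still far enough ahead on a single straight-line path are assumptions; the claim only says the estimate depends on the CPU clock, memory bandwidth and cache structure. Here the block size would come from the cache line size times the number of lines per block.

```python
import math
from fractions import Fraction
from typing import List, Optional


def transfer_cycles(block_bytes: int, mem_latency_ns: int,
                    mem_bandwidth_bytes_per_s: int, cpu_hz: int) -> int:
    """Estimated CPU cycles to bring one instruction block into the I-cache.
    Simple model (assumption): fixed latency + size / bandwidth.
    Fractions keep the arithmetic exact before rounding up."""
    seconds = (Fraction(mem_latency_ns, 10**9)
               + Fraction(block_bytes, mem_bandwidth_bytes_per_s))
    return math.ceil(seconds * cpu_hz)


def choose_nop(nop_cycles: List[int], use_cycle: int,
               lead: int) -> Optional[int]:
    """Pick which NOP (by issue cycle) to rewrite as nop-pref."""
    before = [c for c in nop_cycles if c < use_cycle]
    if not before:
        return None                     # no NOP ahead of the block
    early_enough = [c for c in before if c <= use_cycle - lead]
    if early_enough:
        return max(early_enough)        # latest NOP that still hides the latency
    return min(before)                  # best effort: hide as much as possible
```

For example, a 128-byte block with 100 ns latency, 400 MB/s bandwidth and a 500 MHz clock gives `transfer_cycles(128, 100, 400_000_000, 500_000_000) == 210`. Choosing the latest sufficiently early NOP, rather than the earliest, is meant to reduce the too-early case that claim 4 flags as wasting cache space.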
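The iterative compile, run and read-back cycle of claim 5 reduces to a loop like the one below. `run_on_target` is a hypothetical placeholder for one emulator session (download, run, stop, read back RAM); stopping as soon as a run leaves every nop-pref in place is an assumed convergence test, since the claim only says the revision stops after a certain number of repetitions.

```python
from typing import Callable, Dict, Tuple

Placement = Dict[str, int]  # instruction block id -> nop-pref slot position


def feedback_loop(placement: Placement,
                  run_on_target: Callable[[Placement], Placement],
                  max_rounds: int) -> Tuple[Placement, int]:
    """Repeat download / run / read-back until the prefetch controller stops
    moving nop-prefs or the round limit is reached.

    Returns the final placement and the number of rounds used."""
    for round_no in range(1, max_rounds + 1):
        adjusted = run_on_target(dict(placement))  # hardware-adjusted positions
        if adjusted == placement:
            return placement, round_no              # stable static policy
        placement = adjusted                        # compiler adopts the moves
    return placement, max_rounds
```

A cap on rounds matters in practice because a nop-pref can oscillate between two slots if neither satisfies the timing window.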
CNB2004100467655A 2004-09-17 2004-09-17 Instruction prefetch method of association of dynamic and inactive of overlength instruction word structure microprocessor Active CN1295599C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100467655A CN1295599C (en) 2004-09-17 2004-09-17 Instruction prefetch method of association of dynamic and inactive of overlength instruction word structure microprocessor

Publications (2)

Publication Number Publication Date
CN1598761A CN1598761A (en) 2005-03-23
CN1295599C true CN1295599C (en) 2007-01-17

Family

ID=34665699

Country Status (1)

Country Link
CN (1) CN1295599C (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9043579B2 (en) 2012-01-10 2015-05-26 International Business Machines Corporation Prefetch optimizer measuring execution time of instruction sequence cycling through each selectable hardware prefetch depth and cycling through disabling each software prefetch instruction of an instruction sequence of interest
CN103377033B (en) * 2012-04-12 2016-01-13 无锡江南计算技术研究所 Arithmetic core and instruction management method thereof
US9330011B2 (en) * 2013-09-20 2016-05-03 Via Alliance Semiconductor Co., Ltd. Microprocessor with integrated NOP slide detector
CN113672555A (en) * 2021-07-13 2021-11-19 平头哥(杭州)半导体有限公司 Processor core, processor, system on chip and debugging system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6363475B1 (en) * 1997-08-01 2002-03-26 Micron Technology, Inc. Apparatus and method for program level parallelism in a VLIW processor

Also Published As

Publication number Publication date
CN1598761A (en) 2005-03-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant