CN1560731A - 32-bit media digital signal processor - Google Patents



Publication number
CN1560731A
Authority
CN
China
Prior art keywords
instruction
data
processor
register file
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2004100167538A
Other languages
Chinese (zh)
Other versions
CN1297888C (en
Inventor
刘鹏
姚庆栋
李东晓
王维东
史册
陈晓毅
周莉
蔡钟
吴皓
郑伟
赖莉雅
琚小明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CNB2004100167538A priority Critical patent/CN1297888C/en
Publication of CN1560731A publication Critical patent/CN1560731A/en
Application granted granted Critical
Publication of CN1297888C publication Critical patent/CN1297888C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current



Abstract

The invention relates to microprocessors and computer systems and aims to provide a 32-bit media digital signal processor. The processor comprises an integer processor core, a flow control unit, an instruction fetch unit, instruction, operand fetch, data, an instruction cache, a data cache, on-chip memory, a general-purpose register file, a media register file, a bus interface unit, a system bus, an integer execution unit, a signal execution unit, a media execution unit, an arithmetic logic unit, a barrel shifter, an integer multiplier-adder, a bypass unit, and a system control coprocessor. The instruction structure of the media digital signal processor is divided into register-register instructions oriented toward register operations and register-memory instructions oriented toward memory operations. The processor is good at executing both system programs and digital signal processing programs; it has the characteristics of both RISC and DSP processors and is an organic fusion of the RISC and DSP architectures.

Description

32-bit media digital signal processor
Technical field
The present invention relates to microprocessors and computer systems, and more particularly to a 32-bit media digital signal processor. It supports register-oriented RISC instructions, memory-oriented DSP instructions, and SIMD-class media instructions with split-mode operation, which together give the media digital signal processor an architecture with rich addressing modes and instruction operations.
Background art
Traditional reduced instruction set computer (RISC) instructions and digital signal processor (DSP) instructions are typical representatives of the register-register and register-memory instruction classes, respectively. RISC has always followed the principle of simplification: it uses register-oriented addressing, and its instructions are of equal length and simple format. A RISC instruction can perform only one operation per clock cycle and can access memory at most once. The most important characteristic of RISC is that only Load/Store instructions may access memory; all other instructions operate only on data already held in registers. The main benefit is that instruction formats and addressing modes are uniform, the hardware microarchitecture is comparatively simple, and software development tools are relatively easy to design. However, in applications that require large amounts of data computation, such as multimedia and communications, this single addressing mode and limited set of instruction operations, together with the sequential nature of register-register execution, often become the bottleneck that limits RISC data processing performance. RISC processors are therefore better suited to applications with lower data throughput, such as operating systems and word processing.
DSP instructions are designed for data processing and emphasize data-handling capability. They combine rich addressing modes and instruction operations and excel at large volumes of real-time data computation, so they are powerful and flexible when running data processing programs. In a DSP processor, instructions other than Load/Store can also access memory directly, and one instruction can access several memory locations at once, which speeds up data access. Besides memory-oriented addressing, the addressing modes and address generation schemes that DSP instructions support are also flexible (for example, window addressing and bit-reversed addressing), the instruction operations are richer (for example, multiply-accumulate and zero-overhead loops), and one instruction can perform several operations in a single clock cycle. However, this diversity of addressing modes and instruction operations makes instruction formats more complex, makes the DSP microarchitecture harder to implement than a RISC one, and places higher demands on the design of software tools.
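As an illustration of one of the DSP addressing modes named above, the following Python sketch models bit-reversed addressing, in which the address bits of a buffer index are taken in reverse order (the access pattern commonly needed by radix-2 FFT routines). The function names are hypothetical and are not taken from the patent; the sketch only models the index sequence, not any hardware address generator.

```python
from typing import List


def bit_reverse(index: int, bits: int) -> int:
    """Return `index` with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result


def bit_reversed_order(n: int) -> List[int]:
    """Index sequence visited by bit-reversed addressing over n entries.

    n must be a power of two (an assumption of this sketch).
    """
    bits = n.bit_length() - 1
    if n <= 0 or (1 << bits) != n:
        raise ValueError("n must be a positive power of two")
    return [bit_reverse(i, bits) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_order(8))
```

A DSP address generator produces this sequence in hardware as a side effect of the memory access, so no extra instructions are spent on reordering.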
In traditional system-on-chip (SoC) design, the instruction set processor (ISP) component is usually either a RISC or a DSP. A RISC is generally used for overall system control and for processing tasks of low computational complexity. A DSP is dedicated to digital signal processing tasks of higher computational complexity and is widely used in many fields, from signals at the low end of the spectrum, such as speech and audio, to signals at the high end, such as images and video. The optimal ISP architecture differs from one information processing task to another, and no single fixed ISP can meet every requirement of every application, including speed, power consumption, and area. An architecture that merges RISC and DSP can clearly provide a more flexible and more reasonable basis for the custom development of cost-effective ASIP components. Notable work on merging RISC and DSP includes the OMAP system and the E1-32 system. The OMAP system consists mainly of an ARM core and a DSP core. The ARM core is mainly responsible for control processing, while the DSP core concentrates on digital algorithms such as FFT and FIR. However, the complicated data exchange between the two cores makes the whole system very complex. Moreover, the system simply places two cores side by side and therefore needs duplicated resources, which is wasteful, and programming it is very inconvenient. The E1-32 architecture merges a general-purpose processor (GPP) with a DSP by adding a DSP unit to the GPP. The basic ALU executes RISC instructions, the DSP unit handles digital signal processing, and the two can run in parallel. The E1-32 architecture fuses RISC and DSP organically to some extent and avoids many of the drawbacks caused by simply combining two cores.
With the rapid growth of embedded systems, their applications are becoming ever broader, covering multimedia processing, data communication, consumer electronics, and other fields. An embedded system is no longer a single-purpose application system; it must combine the ability to perform multiple functions and tasks. Embedded applications therefore place ever higher demands on processor architecture, microarchitecture design, cost-effectiveness, data processing capability, task scheduling capability, software development tools, and other factors. The instruction structures of RISC and DSP each have their own strengths, and each favors different system functions and application fields. If the two can be merged into an architecture that handles large volumes of data, provides real-time control, is powerful, and still meets the needs of cost-sensitive embedded systems, it will undoubtedly be better adapted to market needs.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and to provide a media digital signal processor for applications such as multimedia.
To solve the above technical problem, the invention is realized by the following technical solution:
The invention proposes a 32-bit media digital signal processor formed by connecting, through circuitry, a flow control unit, an instruction fetch unit, instruction, operand fetch, data, an instruction cache, a data cache, on-chip memory, a general-purpose register file, a media register file, a bus interface unit, a system bus, an integer execution unit, a signal execution unit, a media execution unit, an arithmetic logic unit (ALU), a barrel shifter, and an integer multiplier-adder. The instruction cache, data cache, and data memory are connected to the bus interface unit, and the general-purpose register file and media register file are connected to the integer execution unit, signal execution unit, and media execution unit. The coprocessor is a system control coprocessor. Its exception control register group assists the processor core with control settings, state management, and exception handling; its memory management register group mainly assists memory management. The media digital signal processor supports a virtual memory system: a complete memory management unit is integrated in the coprocessor, and the translation lookaside buffer (TLB) performs static or dynamic virtual-to-physical translation of instruction and data addresses according to the corresponding control bits in the configuration register. The media digital signal processor uses a hierarchical memory organization to improve performance. The first level, closest to the integer processor core, consists of the internal registers: a general-purpose register file of 32 32-bit registers and a media register file of 8 64-bit registers. The second level consists of the on-chip caches and the on-chip data memory; the caches use a Harvard structure with separate instruction and data caches, and the on-chip data memory is provided specifically to strengthen digital signal processing performance and is suited to holding frequently accessed data such as coefficients. The third level is off-chip memory, which may be of different types, speed grades, and capacities. The bus interface unit provides the interface between the processor core and the system bus, to which external memory and on-chip peripherals are attached.
In the invention, the integer processor core uses a six-stage pipeline: instruction fetch (IF), decode (ID), address generation (DA), memory access (DM), execute (EX), and write-back (WB). The fetch stage mainly accesses the instruction memory. The decode stage performs instruction page comparison, data-dependency forwarding for the general-purpose register file, instruction decoding, and the user extension module. The address generation stage performs address generation 1 and address generation 2 and contains data-dependency detection and forwarding unit 2 for the media register file; its jump unit performs condition evaluation, jump control, and program counter (PC) selection. The memory access stage accesses the data memory. The execute stage comprises data page comparison, data alignment, the ALU, the multiply-accumulate unit, and user extensions. The write-back stage mainly writes back to the general-purpose register file and the media register file.
The invention includes basic instructions, signal processing instructions, and media instructions. The fetch stage simultaneously reads the instruction cache and translates the instruction virtual address through the TLB; the decode stage simultaneously performs instruction decoding, instruction page comparison, and reads of general-purpose register source operands. These two stages are the same for all instructions. Control transfer instructions compute the target address and decide the branch direction in the address generation stage. Instructions that access data memory all compute the memory address in the address generation stage, access memory (the cache system and the on-chip data memory) in the memory access stage, and perform data page comparison and partial-word alignment in the execute stage. All arithmetic instructions perform their operations in the execute stage, including arithmetic, logic, shift, conversion, multiply, and multiply-add operations. In the write-back stage, results are written back to the register files, both the general-purpose register file and the media register file.
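The six-stage flow described above can be visualized with a small Python sketch that lists which stage each instruction occupies in each clock cycle. It assumes an ideal pipeline with one instruction issued per cycle and no slips or stalls; the function name `pipeline_diagram` is hypothetical and only the stage abbreviations come from the text.

```python
from typing import List

# Stage abbreviations as given in the description.
STAGES = ["IF", "ID", "DA", "DM", "EX", "WB"]


def pipeline_diagram(num_instructions: int) -> List[List[str]]:
    """Return one row per instruction; each row holds the stage per cycle.

    Assumes an ideal pipeline: instruction i enters IF in cycle i and
    advances one stage per cycle. "--" marks cycles in which the
    instruction is not in the pipeline.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "--")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in pipeline_diagram(3):
        print(" ".join(row))
```

Reading a column of the output shows that, in steady state, six different instructions are in flight at once, one per stage.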
The invention adopts the concept of an instruction component model. Following the principle of orthogonality in the expressions from which instructions are composed, it defines the register-oriented MDF instructions with RISC characteristics, the memory-oriented MDD instructions with DSP characteristics, and the MDS instructions, which are SIMD-class instructions with split-mode operation.
The invention uses bypass logic. Each pipeline stage sends its bypass control and data signals to the bypass unit, which evaluates and handles them centrally. Based on the order in which instructions execute and the corresponding control signals, the bypass unit detects data dependencies between earlier and later instructions and selects the correct data from several data sources. The detection result is also sent to the flow control unit for pipeline state control. Bypass logic resolves the data conflict that arises when an instruction needs the result of an earlier instruction that has not yet written that result back to the register file; where a data conflict between earlier and later instructions remains, it generates control signals that help handle the pipeline stall. Bypassing lets the current instruction skip the wait for the register file read, obtain the required operand directly from the pipeline, and move on to the next pipeline beat, which improves execution efficiency.
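The selection performed by bypass logic can be sketched in Python as follows. This is a minimal behavioral model under stated assumptions, not the patent's circuit: the class `InFlightResult`, the function `read_operand`, and the youngest-first ordering of the in-flight list are all hypothetical names and conventions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InFlightResult:
    """A result still in the pipeline, not yet written to the register file."""
    dest_reg: int
    value: Optional[int]  # None while the result has not been computed yet


def read_operand(
    reg: int,
    register_file: List[int],
    in_flight: List[InFlightResult],
) -> Tuple[Optional[int], bool]:
    """Return (value, must_wait) for source register `reg`.

    `in_flight` is ordered youngest instruction first, so the first match
    is the most recent writer of `reg`. If that writer has not produced
    its value yet, the dependency cannot be bypassed and the reader must
    wait; otherwise the value is forwarded. With no match, the register
    file already holds the current value.
    """
    for result in in_flight:
        if result.dest_reg == reg:
            if result.value is None:
                return None, True
            return result.value, False
    return register_file[reg], False


if __name__ == "__main__":
    regs = [0] * 32
    regs[5] = 10
    print(read_operand(5, regs, [InFlightResult(5, 42)]))
```

The `must_wait` flag corresponds to the case in which the detection result is passed on to flow control because forwarding alone cannot supply the operand.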
The invention allows different arithmetic instructions to take different numbers of pipeline beats in the execute stage. Fully pipelining complex operations not only frees the pipeline clock frequency from being limited by operation time, but also lets a sequence of consecutive identical complex instructions reach a throughput of one operation per cycle through pipeline expansion, which improves processor performance.
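To make the throughput claim concrete, the following Python sketch compares the cycle count of a fully pipelined multi-beat operation with that of an unpipelined one. The function names are hypothetical. The example figures (256 consecutive multiply-accumulate operations, 4 beats for a 32-bit multiply-add) are the ones the embodiment gives later in the description.

```python
def pipelined_cycles(num_ops: int, latency_beats: int) -> int:
    """Cycles to finish num_ops back-to-back operations in a full pipeline.

    The first result appears after latency_beats cycles; every following
    cycle completes one more operation.
    """
    if num_ops <= 0:
        return 0
    return latency_beats + (num_ops - 1)


def unpipelined_cycles(num_ops: int, latency_beats: int) -> int:
    """Cycles needed if each operation must finish before the next starts."""
    return max(num_ops, 0) * latency_beats


if __name__ == "__main__":
    # 256 multiply-accumulates with a 4-beat execute pipeline.
    print(pipelined_cycles(256, 4), unpipelined_cycles(256, 4))
```

With these figures the pipelined case costs the initial 3-beat delay plus one cycle per operation, which matches the single-cycle throughput described in the text.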
In the invention, the general-purpose register file is organized as 32 × 32 (32 registers of 32 bits) and the media register file as 64 × 8 (8 registers of 64 bits). When executing basic instructions, the general-purpose register file acts as an ordinary two-read, one-write register group; when executing signal processing instructions, it acts as a four-read, two-write register group; when executing media instructions, it can exchange data with the media register file.
Building on the handling of pipeline hazards and processor exceptions, the invention proposes and implements a centralized pipeline control scheme based on a finite state machine (FSM). To speed up the response of the flow control unit, the current input status signals are not latched by the clock, so the action of the pipeline in the next beat is determined immediately. This avoids or reduces pipeline overrun and thereby improves operating efficiency.
In the invention, media register source operands are read in the address generation stage, whereas general-purpose register source operands are read in the decode stage. General-purpose register operands may be needed in the immediately following address generation stage, but media register source operands are not used for control or address computation, so their read can be deferred to the address generation stage. This reduces pipeline hazards caused by data dependencies between instructions that operate on media registers and so improves the operation of the pipeline stages.
Compared with the prior art, the beneficial effects of the invention are as follows:
The instruction structure of the media digital signal processor of the invention is divided into register-register instructions oriented toward register operations and register-memory instructions oriented toward memory operations. These two classes correspond to the MDF (MD32 Fundament) and MDD (MD32 DSP) instruction classes of the processor, respectively. The processor also defines the MDS class of single-instruction, multiple-data (SIMD) instructions, which includes both register-register and register-memory instruction types and can perform split-mode operations on data. The media digital signal processor fully merges the instruction operations, addressing modes, and other elements of RISC and DSP into a new instruction system with its own characteristics. As a result, the processor is good at executing both system programs and digital signal processing programs; it combines the design features of RISC and DSP processors and is an organic fusion of the RISC and DSP architectures.
Description of drawings
Fig. 1 is a system structure diagram of the invention.
Fig. 2 is a pipeline structure diagram of the invention.
Fig. 3 shows the execution of typical instructions on the pipeline of the invention.
Fig. 4 shows the super-pipelined expansion structure of the execute stage of the invention.
Fig. 5 shows the structure of the ALU of the invention.
Fig. 6 is a run-state control diagram of the invention.
Embodiment
The system structure of a specific embodiment of the invention is shown in Fig. 1. It comprises three instruction systems: the integer execution unit (MDF), the signal execution unit (MDD), and the media execution unit (MDS).
The instruction pipeline is shown in Fig. 2. It has six stages: fetch, decode, address, memory access, execute, and write-back.
Fetch: fetches instruction data from the instruction memory and looks up the TLB and Tag entries to translate the instruction virtual address into a physical address.
Decode: decodes the instruction and produces encoded control and data signals, which the BPU and PCU use for partial state control; accesses the register file; reports whether the instruction cache hit.
Address: uses two address calculation units that can work in parallel to compute operand addresses, supporting multiple instruction addressing modes.
Memory access: accesses the data memory and looks up the TLB and Tag entries (when accessing the data cache) to translate the data virtual address into a physical address.
Execute: performs arithmetic operations such as add, subtract, shift, logic, and multiply, and reports whether the data cache hit.
Write-back: writes the updated address register values obtained in the address stage and the results produced in the execute stage back to the register file.
The arrangement of some typical instructions on the processor pipeline is shown in Fig. 3. The processor places TLB translation and cache access in the same pipeline stage and performs them in parallel to simplify and optimize the pipeline. Accordingly, the caches in the processor are indexed by virtual address. The fetch stage simultaneously reads the instruction cache and translates the instruction virtual address through the TLB; the decode stage simultaneously performs instruction decoding, instruction page comparison, and reads of general-purpose register source operands. These two stages are the same for all instructions. Data operations are placed after the data memory access in order to support arithmetic instructions that operate directly on memory. The instruction set architecture supports such memory-oriented arithmetic instructions along with rich memory addressing modes. The address generation stage contains two address generation units, address generation unit 1 and address generation unit 2, to support multiple addressing modes for two memory operands. Correspondingly, the memory access stage can read two memory operands, one from the cache system and the other from the on-chip data memory. The data memory access occupies a single-clock-cycle pipeline stage; to limit the delay of that stage, data page comparison and read-data alignment are separated from it and placed in the following execute stage. Control transfer instructions compute the target address and decide the branch direction in the address generation stage. Instructions that access data memory all compute the memory address in the address generation stage, access memory (the cache system and the on-chip data memory) in the memory access stage, and perform data page comparison and partial-word alignment in the execute stage. All arithmetic instructions perform their operations in the execute stage, including arithmetic, logic, shift, conversion, multiply, and multiply-add operations. In the write-back stage, results are written back to the register files, both the general-purpose register file and the media register file.
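The parallel TLB translation and virtually indexed cache access described above, with the page comparison deferred to the following stage, can be modeled roughly as below. All sizes (`PAGE_BITS`, `LINE_BITS`, `INDEX_BITS`), the dictionary-based cache and TLB, the use of the physical page number as the stored tag, and the function names are assumptions made for this sketch; the patent does not give these parameters.

```python
from typing import Dict, Optional, Tuple

# Illustrative sizes only; none of these values come from the patent.
PAGE_BITS = 12   # assumed 4 KiB pages
LINE_BITS = 5    # assumed 32-byte cache lines
INDEX_BITS = 7   # assumed 128 cache sets


def access_stage(
    vaddr: int, cache_tags: Dict[int, int], tlb: Dict[int, int]
) -> Tuple[Optional[int], Optional[int]]:
    """First stage: read the cache tag and translate the page in parallel.

    The cache index is taken from the virtual address, so the tag read
    does not have to wait for the TLB result.
    """
    index = (vaddr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)
    stored_tag = cache_tags.get(index)           # cache read (virtual index)
    physical_page = tlb.get(vaddr >> PAGE_BITS)  # TLB lookup, same stage
    return stored_tag, physical_page


def page_compare_stage(stored_tag: Optional[int], physical_page: Optional[int]) -> bool:
    """Following stage: compare the stored tag with the translated page."""
    return physical_page is not None and stored_tag == physical_page


def cache_hit(vaddr: int, cache_tags: Dict[int, int], tlb: Dict[int, int]) -> bool:
    """Run both stages in sequence and report whether the access hits."""
    return page_compare_stage(*access_stage(vaddr, cache_tags, tlb))
```

Splitting the comparison into its own function mirrors the way the description moves page comparison out of the access stage to keep that stage short.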
The functional units of the execute stage are mainly the arithmetic logic unit (ALU) and the multiply-accumulate unit (MAC). The MAC handles multiply and multiply-add operations; the ALU handles all remaining operations, including the arithmetic, logic, and shift operations of all MDF, MDD, and MDS instructions as well as the data conversions specific to MDS instructions (pack, unpack, transpose, and so on). As the media ALU block diagram in Fig. 5 shows, the design divides the operations into four classes: arithmetic, logic, shift, and data conversion. The main features of media arithmetic instructions are support for SIMD sub-word parallel operations, special handling of results, and media-specific instructions. In the media ALU design, the PSADBD instruction is implemented as a fully pipelined two-beat operation, as shown in Fig. 5: the first beat performs eight parallel SIMD subtractions of 8-bit unsigned numbers and takes the absolute values, and the second beat sums the eight 8-bit unsigned numbers into the final result. As shown in Fig. 4, general ALU operations need only one clock tick; the sum of absolute differences is a two-beat pipeline; among the media instructions, the four-way split parallel 16-bit multiplication (accumulation) is a two-beat pipeline and the 16-bit multiply-add instruction is a three-beat pipeline; 32-bit multiply (add) operations need a four-beat pipeline. For example, in a DSP application that convolves 256 samples with constant coefficients, the core computation is 256 consecutive multiply-accumulate instructions; even with 32-bit word length, after an initial three-beat delay one multiply-accumulate completes on every clock tick. In the MDF basic instruction set and the MDD signal instruction set, multiply (accumulate) instructions multiply 32-bit integers; in the MDS media instruction set, the multiply (multiply-add) instructions PMULLSD (PMACLSD), PMULHSD (PMACHSD), PMULLUD (PMACLUD), and PMULHUD (PMACHUD) perform four-way SIMD parallel multiplication (multiply-add) on 16-bit integers. The MAC also supports sub-word parallel operation and a splittable data path.
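The two-beat sum of absolute differences described for PSADBD can be modeled in Python as follows. This is a behavioral sketch only: the text gives the two beats but not the operand encoding or where the result is placed, so treating the operands as 64-bit integers with eight unsigned byte lanes, and the function names used here, are assumptions.

```python
from typing import List


def split_bytes(value: int) -> List[int]:
    """Split a 64-bit value into eight unsigned 8-bit lanes, lowest first."""
    return [(value >> (8 * lane)) & 0xFF for lane in range(8)]


def sad_beat1(a: int, b: int) -> List[int]:
    """Beat 1: eight parallel unsigned subtractions, then absolute value."""
    return [abs(x - y) for x, y in zip(split_bytes(a), split_bytes(b))]


def sad_beat2(differences: List[int]) -> int:
    """Beat 2: sum the eight 8-bit absolute differences into one result."""
    return sum(differences)


def sum_of_absolute_differences(a: int, b: int) -> int:
    """Behavioral model of the full two-beat operation."""
    return sad_beat2(sad_beat1(a, b))


if __name__ == "__main__":
    print(sum_of_absolute_differences(0x0102030405060708, 0x0807060504030201))
```

Because each beat depends only on the output of the previous one, a new pair of operands can enter beat 1 while the previous pair is in beat 2, which is what makes the operation fully pipelined.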
The MDS media instruction set has 41 instructions. Modeled on the R-type encoding format of the MDF basic instructions, it uses one otherwise unused special major opcode (Instr[31:26]=111111) to represent the whole MDS instruction set and a 6-bit function code to distinguish the individual MDS instructions. Four machine-code encoding formats are defined according to the operand addressing mode, as shown in Table 5.
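The opcode test described above can be sketched in Python. The major opcode position Instr[31:26] and its value 111111 come from the text. The position of the 6-bit function code is not stated explicitly; placing it in bits [5:0], by analogy with R-type formats where the function field is the last field of the word, is an assumption of this sketch, as are the function names.

```python
MDS_MAJOR_OPCODE = 0b111111  # Instr[31:26] value reserved for the MDS set


def major_opcode(instr: int) -> int:
    """Extract Instr[31:26] from a 32-bit instruction word."""
    return (instr >> 26) & 0x3F


def is_mds_instruction(instr: int) -> bool:
    """True if the word uses the major opcode reserved for MDS instructions."""
    return major_opcode(instr) == MDS_MAJOR_OPCODE


def function_code(instr: int) -> int:
    """Extract the 6-bit function code.

    Assumption: the function code occupies Instr[5:0]; the text states
    only its width, not its position.
    """
    return instr & 0x3F


if __name__ == "__main__":
    word = (MDS_MAJOR_OPCODE << 26) | 0b101010
    print(is_mds_instruction(word), function_code(word))
```

A 6-bit function code allows up to 64 distinct encodings, which is enough for the 41 MDS instructions.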
The finite state machine (FSM) divides the run state of the system into Slip, Stall, Run, Restart, and Reset. Slip is a state in which instructions in the later pipeline stages continue to execute as usual while instructions in the earlier pipeline stages are suspended. Stall means that the whole pipeline stops and waits for the restart control signal to move it to another run state. The FSM moves the processor into the corresponding run state according to the request signals from the individual pipeline stages. The relationship between the run states is shown in Fig. 6:
When a data dependency exists between earlier and later instructions in execution and the bypass logic cannot resolve it, the system enters the Slip state; only after the system has resolved the dependency can the processor move to another run state.
When an instruction or data access by the processor misses, the system enters the Stall state until the FSM restarts pipeline operation. Because the processor must then access off-chip memory, the Stall state costs a large number of clock cycles. After the FSM issues the restart signal (Restart), the processor leaves the Stall state and, depending on the execution status of the current instruction, moves to the Slip or Run state.
After the system asserts the reset signal (Reset), the whole processor state returns to its default values and, after a certain time, the processor enters the Run state. Under normal conditions the system is in the Run state unless there is a request to enter the Slip or Stall state.
The FSM evaluates the request and status signals from the forwarding unit and the other parts of the processor together with the current run state and, following the transition relationships between the run states, generates the control signals and sends them back to the corresponding modules. In this way it controls the run state of the whole processor and keeps the media signal processor system working in coordination.
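The run-state control described above can be approximated by a small state machine in Python. This is a sketch under stated assumptions rather than the transition diagram of Fig. 6: the state names come from the text, but the input signal names, the priority of a miss over an unresolved dependency, and the choice to model Restart as a one-step state are all hypothetical.

```python
from enum import Enum


class State(Enum):
    RUN = "run"
    SLIP = "slip"
    STALL = "stall"
    RESTART = "restart"
    RESET = "reset"


def next_state(
    state: State,
    *,
    reset: bool = False,
    reset_done: bool = False,
    miss: bool = False,
    refill_done: bool = False,
    unresolved_dependency: bool = False,
) -> State:
    """Return the next run state for one clock beat.

    Signal names are illustrative. A reset request overrides everything;
    a stalled pipeline waits for the miss to be serviced and then passes
    through Restart; otherwise a miss requests Stall and a dependency the
    bypass logic cannot resolve requests Slip.
    """
    if reset:
        return State.RESET
    if state is State.RESET:
        return State.RUN if reset_done else State.RESET
    if state is State.STALL:
        return State.RESTART if refill_done else State.STALL
    # From RUN, SLIP, or RESTART: decide based on the pending requests.
    if miss:  # assumed to take priority over a dependency
        return State.STALL
    if unresolved_dependency:
        return State.SLIP
    return State.RUN
```

Because the function is purely combinational in its inputs, it reflects the idea that the next pipeline action is decided from the current status signals without first latching them.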
Table 1: RD_A class instruction component model in the RD-type
Bits: 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Opcode=000000 | 01 | ARm | Rt | Rd | Disp | SRA/SRL/SLL
10 | Modm
11 | Direct1 | Direct1

Table 2: RD_B to RD_G class instruction component models in the RD-type
Bits: 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Opcode=000000 | Disp1 | ARm | Disp1 | ARn | E1=00 | Dst | Disp2 | 1 | Function
E2=00 | ARm | Rt | E1=01 | Modm
E2=01 | Immediate | Disp
Rs | E2=00 | ARn | E1=10 | Modn
E2=01 | Disp
Modm | ARm | Modm | ARn | E1=11 | Modn

Table 3: ID-type instruction component model
Bits: 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Load/Store | 11111 | Rt | Modm | ARm | Disp
ALUI-type | 00 | Dst
01 | Immediate

Table 4: P-type instruction component model
Bits: 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Opcode | Modm | Src1 | Modm | Src2 | B1 | B2 | Dst1 | B3 | Modn | ARm | ARn
A | P | D | 0

Table 5: S-type instruction component model
Bits: 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Opcode=111111 | 00 | MRs | 00000 | Gg | MRd | Sa | Function
01 | 00 | MRt | 00000
10 | 000 | Rt
11 | MRs | Disp | ARn | Modn
Finally, it should be noted that the above are only specific embodiments of the invention. The invention is clearly not limited to these embodiments, and many variations are possible. All variations that a person of ordinary skill in the art can derive directly from, or conceive on the basis of, the disclosure of the invention shall be considered to fall within the scope of protection of the invention.

Claims (10)

1. A 32-bit media digital signal processor, characterized in that it comprises an integer processor core, a flow control unit, an instruction fetch unit, instruction, operand fetch, data, an instruction cache, a data cache, on-chip memory, a general-purpose register file, a media register file, a bus interface unit, a system bus, an integer execution unit, a signal execution unit, a media execution unit, an arithmetic logic unit, a barrel shifter, and an integer multiplier-adder, connected through circuitry, wherein the instruction cache, data cache, and data memory are connected to the bus interface unit, and the general-purpose register file and media register file are connected to the integer execution unit, signal execution unit, and media execution unit; it further comprises a bypass unit and a system control coprocessor, the system control coprocessor consisting of an exception control register group, a memory management register group, and a translation lookaside buffer; the exception control register group assists the processor core with control settings, state management, and exception handling; the memory management register group assists memory management; the media digital signal processor supports a virtual memory system, the system control coprocessor integrates a complete memory management unit, and the translation lookaside buffer performs static or dynamic virtual-to-physical translation of instruction and data addresses according to the corresponding control bits in the configuration register.
2. The processor as claimed in claim 1, characterized in that it adopts a hierarchical memory organization: the first level, closest to the integer processor core, is the internal registers, comprising a general-purpose register file of 32 32-bit registers and a media register file of 8 64-bit registers; the second level is the on-chip cache and the on-chip data memory, the cache adopting a Harvard structure in which the instruction cache and the data cache are separate, and the on-chip data memory being provided specifically to enhance digital signal processing performance and used to hold frequently accessed data such as coefficients; the third level is off-chip memory, which may be memory of different types, speed grades, and capacities; the bus interface unit provides the interface between the processor core and the system bus, and external memory and on-chip peripherals are attached to the system bus.
3. The processor as claimed in claim 1, characterized in that the integer processor core adopts a six-stage pipeline consisting of instruction fetch (IF), decode (ID), address generation (DA), memory access (DM), execute (EX), and write-back (WB) stages; the fetch stage mainly accesses the instruction memory; the decode stage performs instruction page comparison, the data dependence forwarding unit for the general-purpose registers, instruction decoding, and the user extension module; the address generation stage performs address generation 1 and address generation 2 and the data dependence detection and forwarding unit 2 for the media register file, while the jump unit performs condition evaluation, jump control, and program counter (PC) selection; the memory access stage accesses the data memory; the execute stage comprises data page comparison, data alignment, the ALU, the multiply-accumulate unit, and user extensions; the write-back stage mainly writes back to the general-purpose register file and the media register file.
4. The processor as claimed in claim 3, characterized in that the instructions of the processor comprise basic instructions, signal processing instructions, and media instructions; the fetch stage simultaneously performs the read access to the instruction cache and the translation lookaside buffer translation of the instruction virtual address; the decode stage simultaneously performs instruction decoding, instruction page comparison, and reading of general-purpose register source operands; these two stages are the same for all instructions; control transfer instructions compute the target address and decide the direction of the transfer in the address generation stage; instructions that access data memory all compute the memory address in the address generation stage, access the memory, the cache system, and the on-chip data memory in the memory access stage, and perform data page comparison and partial-word alignment in the execute stage; all computation instructions perform their operations in the execute stage, including arithmetic, logic, shift, conversion, multiply, and multiply-add operations; in the write-back stage, results are written back to the register files, including the general-purpose register file and the media register file.
5. The processor as claimed in claim 4, characterized in that the instructions are designed according to the principle of orthogonality of instruction-forming expressions, and comprise register-oriented MDF instructions with RISC characteristics, memory-oriented MDD instructions with DSP characteristics, and MDS instructions that operate in a SIMD-style split mode.
6. The processor as claimed in claim 3, characterized in that each pipeline stage passes its bypass control and data signals to the bypass unit for unified detection and handling; according to the order of instruction execution and the corresponding control signals, the bypass unit detects data dependences between earlier and later instructions and selects the correct data from among several data sources; the detection result is simultaneously sent to the flow control unit for pipeline state control, so that the current instruction skips waiting for the register file read, obtains the required operand directly from the pipeline, and enters the next pipeline cycle.
7. The processor as claimed in claim 3, characterized in that different operation instructions are allowed to take different numbers of pipeline cycles in the execute stage.
8. The processor as claimed in claim 4, characterized in that the general-purpose register file is organized as 32 × 32 (32 registers of 32 bits) and the media register file as 64 × 8 (8 registers of 64 bits); when executing basic instructions, the general-purpose register file acts as an ordinary two-read, one-write register group; when executing signal processing instructions, it acts as a four-read, two-write register group; when executing media instructions, it can exchange data with the media register file.
9. The processor as claimed in claim 4, characterized in that media register source operands are read in the address generation stage, and general-purpose register source operands are read in the decode stage.
10. The processor as claimed in claim 3, characterized in that, on the basis of handling pipeline hazards and processor exceptions, a centralized pipeline control scheme based on a finite state machine (FSM) is implemented, in which the current input status signals are not latched by the clock, so that the action of the pipeline in the next cycle is determined immediately.
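
As an informal illustration of the bypass selection recited in claim 6, the following Python sketch models how an operand might be chosen from several data sources. It is a simplified software model under stated assumptions, not the patented circuit: the names `InFlight`, `read_operand`, and the `age` field are hypothetical and do not appear in the patent, and the stall signal stands in for the result that claim 6 says is sent to the flow control unit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InFlight:
    """Hypothetical record of an older instruction still in the pipeline."""
    dest: int              # destination register number
    value: Optional[int]   # computed result, or None if not yet available
    age: int               # 1 = immediately preceding instruction, larger = older


def read_operand(reg: int, regfile: List[int],
                 in_flight: List[InFlight]) -> Tuple[Optional[int], bool]:
    """Return (value, stall) for source register `reg`.

    The youngest in-flight producer of `reg` wins, which mirrors selecting
    the correct data according to the order of instruction execution.
    If no in-flight instruction writes `reg`, the register file is read.
    """
    producers = [p for p in in_flight if p.dest == reg]
    if not producers:
        return regfile[reg], False
    youngest = min(producers, key=lambda p: p.age)
    if youngest.value is None:
        # Result not ready yet: the flow control unit would have to stall.
        return None, True
    return youngest.value, False


# Usage: 32 general-purpose registers, as in claim 2.
regfile = [0] * 32
regfile[5] = 10
pending = [InFlight(dest=5, value=99, age=2), InFlight(dest=5, value=42, age=1)]
forwarded, stall = read_operand(5, regfile, pending)
```

In this model the operand for register 5 comes from the most recent in-flight producer rather than from the stale register file entry, which is the effect the claim attributes to the bypass unit.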
CNB2004100167538A 2004-03-03 2004-03-03 32-bit media digital signal processor Expired - Fee Related CN1297888C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100167538A CN1297888C (en) 2004-03-03 2004-03-03 32-bit media digital signal processor

Publications (2)

Publication Number Publication Date
CN1560731A true CN1560731A (en) 2005-01-05
CN1297888C CN1297888C (en) 2007-01-31

Family

ID=34440633

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100167538A Expired - Fee Related CN1297888C (en) 2004-03-03 2004-03-03 32-bit media digital signal processor

Country Status (1)

Country Link
CN (1) CN1297888C (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6189090B1 (en) * 1997-09-17 2001-02-13 Sony Corporation Digital signal processor with variable width instructions
US6678765B1 (en) * 2000-02-07 2004-01-13 Motorola, Inc. Embedded modem
US20030196072A1 (en) * 2002-04-11 2003-10-16 Chinnakonda Murali S. Digital signal processor architecture for high computation speed

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052348A (en) * 2006-09-22 2018-05-18 英特尔公司 For handling the instruction of text string and logic
US11537398B2 (en) 2006-09-22 2022-12-27 Intel Corporation Instruction and logic for processing text strings
CN101290565B (en) * 2007-03-30 2012-11-28 英特尔公司 Method and apparatus of executing multiplication function
CN101739235A (en) * 2008-11-26 2010-06-16 中国科学院微电子研究所 Processor unit for seamless connection between 32-bit DSP and universal RISC CPU
CN102033737A (en) * 2010-06-13 2011-04-27 苏州和迈微电子技术有限公司 Embedded system oriented multi-stage flowing water digital signal processor system structure
CN101957743A (en) * 2010-10-12 2011-01-26 中国电子科技集团公司第三十八研究所 Parallel digital signal processor
CN101957743B (en) * 2010-10-12 2012-08-29 中国电子科技集团公司第三十八研究所 Parallel digital signal processor
CN103257884A (en) * 2013-05-20 2013-08-21 深圳市京华科讯科技有限公司 Virtualization processing method for equipment
CN103257885A (en) * 2013-05-20 2013-08-21 深圳市京华科讯科技有限公司 Media virtualization processing method
CN103309725A (en) * 2013-05-20 2013-09-18 深圳市京华科讯科技有限公司 Network virtualized processing method
CN104516718A (en) * 2013-10-07 2015-04-15 德克萨斯仪器德国股份有限公司 Pipeline finite state machine
CN106406812A (en) * 2015-10-02 2017-02-15 上海兆芯集成电路有限公司 Microprocessor, and method of executing fused composite arithmetical operation therein
CN105824696B (en) * 2016-03-18 2019-07-05 同济大学 A kind of processor device with Interruption function
CN105824696A (en) * 2016-03-18 2016-08-03 同济大学 Processor device with timed interruption function
CN108229668A (en) * 2017-09-29 2018-06-29 北京市商汤科技开发有限公司 Operation implementation method, device and electronic equipment based on deep learning
CN108229668B (en) * 2017-09-29 2020-07-07 北京市商汤科技开发有限公司 Operation implementation method and device based on deep learning and electronic equipment
CN108845829A (en) * 2018-07-03 2018-11-20 中国人民解放军国防科技大学 Method for executing system register access instruction
CN108845829B (en) * 2018-07-03 2021-06-25 中国人民解放军国防科技大学 Method for executing system register access instruction
CN110245096A (en) * 2019-06-24 2019-09-17 苏州硅岛信息科技有限公司 A method of realizing that processor is directly connected to extension computing module
CN111026445A (en) * 2019-12-17 2020-04-17 湖南长城银河科技有限公司 Intelligent identification method and chip
CN112463723A (en) * 2020-12-17 2021-03-09 王志平 Method for realizing microkernel array

Also Published As

Publication number Publication date
CN1297888C (en) 2007-01-31

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070131

Termination date: 20130303