CN102360281A - Multifunctional fixed-point multiply-accumulate (MAC) operation device for microprocessor

Publication number
CN102360281A
Authority
CN
China
Prior art keywords
simd
multiplying
totalizer
module
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011103369743A
Other languages
Chinese (zh)
Other versions
CN102360281B (en
Inventor
陈书明
李国强
万江华
李振涛
彭元喜
杨惠
陈胜刚
孙书为
陈海燕
王海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201110336974.3A priority Critical patent/CN102360281B/en
Publication of CN102360281A publication Critical patent/CN102360281A/en
Application granted granted Critical
Publication of CN102360281B publication Critical patent/CN102360281B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a multifunctional fixed-point multiply-accumulate (MAC) operation device for a microprocessor. The device comprises an instruction dispatch unit, an instruction decoding unit, a storage unit and an instruction operation unit. The instruction operation unit comprises a four-stage pipeline operation structure for MAC operations and a result selection module that obtains the output of the four-stage pipeline operation structure and writes it back to the storage unit. From input to output, the four-stage pipeline operation structure comprises, in order, a two-stage multiplier operation station, an adder operation station, and a compound operation station that performs complex-number, dot-product and 32-bit multiplication operations. The two-stage multiplier operation station comprises a plurality of single-instruction multiple-data (SIMD) multipliers arranged in parallel, and the adder operation station comprises a plurality of SIMD adders arranged in parallel. The device supports many kinds of fixed-point multiply-accumulate operations and has the advantages of low hardware resource usage, high hardware reuse, good scalability and small program code size.

Description

Multifunctional fixed-point MAC operation device for a microprocessor
Technical field
The present invention relates to operation units of microprocessors, and in particular to a multifunctional fixed-point multiply-add (Multiply Add Cell, MAC) operation device suitable for microprocessors, including single-instruction multiple-data (SIMD) DSPs.
Background art
In applications such as image processing, radar signal processing and modern communications, the volume of data is large and the demands on computational precision and real-time performance are high, so very high-performance microprocessors are usually required. These algorithms are both multiplication-intensive and addition-intensive and involve many fixed-point multiply-accumulate operations, including fixed-point multiply-add and multiply-subtract, dot-product and complex-number operations, so the fixed-point data-processing capability of a microprocessor is becoming increasingly important.
To address these application characteristics, existing research has proposed various operating mechanisms and hardware structures that implement such fixed-point multiply-accumulate operations and support large numbers of multiplications, such as the M unit of the TI C64 series. However, the prior art commonly has the following shortcomings: 1) it implements only some operations, such as fixed-point multiplication or fixed-point multiply-add, and cannot support operations such as addition and subtraction, so its functionality is limited; 2) it occupies considerable hardware resources, has a low hardware reuse rate and poor scalability, and requires a large amount of program code.
Summary of the invention
In view of the above problems in the prior art, the present invention provides a multifunctional fixed-point MAC operation device for a microprocessor that supports many kinds of fixed-point multiply-accumulate operations, occupies few hardware resources, has a high hardware reuse rate and good scalability, and requires little program code.
To solve the above technical problems, the present invention adopts the following technical solution:
A multifunctional fixed-point MAC operation device for a microprocessor comprises an instruction dispatch unit, an instruction decoding unit, a storage unit and an instruction operation unit. The instruction operation unit comprises a four-stage pipeline operation structure for MAC operations and a result selection module that obtains the output of the four-stage pipeline operation structure and writes it back to the storage unit. From input to output, the four-stage pipeline operation structure comprises, in order, a two-stage multiplier operation station, an adder operation station, and a compound operation station for complex-number, dot-product and 32-bit multiplication operations. The two-stage multiplier operation station comprises a plurality of SIMD multipliers arranged in parallel, the adder operation station comprises a plurality of SIMD adders arranged in parallel, and the two-stage multiplier operation station, the adder operation station and the compound operation station are each connected to the result selection module.
As further improvements to the above technical solution:
The SIMD multiplier comprises a first-stage multiplication module that implements conventional multiplication and SIMD multiplication, and a second-stage multiplication module that performs sign extension and concatenation; the first-stage multiplication module is connected in series with the second-stage multiplication module.
The SIMD multiplier comprises a SIMD signal input for controlling the SIMD multiplication mode. The SIMD signal input is connected to both the first-stage and the second-stage multiplication modules. When an inactive signal is applied to the SIMD signal input, the two modules perform ordinary multiplication on the input operands; when an active signal is applied, they perform SIMD multiplication on the input operands.
The SIMD adder is a 40-bit SIMD fixed-point adder comprising five 8-bit adder modules connected in series.
The compound operation station comprises a complex-number instruction processing module, a dot-product instruction processing module and a 32-bit multiply instruction processing module.
The instruction decoding unit comprises an instruction distinguishing module, a 32-bit instruction decoding module and a 16-bit instruction decoding module. The inputs of the 32-bit and 16-bit instruction decoding modules are each connected to the instruction dispatch unit through the instruction distinguishing module. The output of the 32-bit instruction decoding module is connected to the storage unit and to the instruction operation unit, and the output of the 16-bit instruction decoding module is likewise connected to the storage unit and to the instruction operation unit.
The storage unit comprises a local register and an accumulator, each of which is connected to the instruction operation unit.
The local register and the accumulator both have a dual-input, dual-output structure.
The present invention has the following advantages:
1. The instruction operation unit of the present invention comprises a four-stage pipeline operation structure that fuses SIMD multiplication, SIMD addition and subtraction, SIMD multiply-add and multiply-subtract, 8-bit and 16-bit dot products, and 8-bit and 16-bit complex-number operations into a single architecture. It efficiently supports fixed-point SIMD multiplication, SIMD addition, SIMD subtraction, SIMD multiply-add, SIMD multiply-subtract, dot-product and complex-number operations, and is suitable both for SIMD microprocessors and for DSP microprocessors. It offers a rich set of fixed-point multiplication-type operations, a sensible division into stations, low resource usage, a high code reuse rate, good computational performance, many functions, strong scalability and a wide range of applications.
2. The two-stage multiplier operation station of the present invention comprises a plurality of SIMD multipliers arranged in parallel; multiply instructions, multiply-add and multiply-subtract instructions, dot-product instructions and complex-number instructions are all implemented by reusing the SIMD multipliers. The adder operation station comprises a plurality of SIMD adders arranged in parallel; add instructions, subtract instructions, multiply-add and multiply-subtract instructions, dot-product instructions and complex-number instructions are all implemented by reusing the SIMD adders. Code reusability is therefore high.
3. The instruction decoding unit of the present invention further comprises an instruction distinguishing module, a 32-bit instruction decoding module and a 16-bit instruction decoding module, so that instructions of different bit widths are handled separately, which helps improve hardware utilization.
4. The storage unit of the present invention further comprises a local register and an accumulator, each connected to the instruction operation unit. It can therefore serve the various fixed-point multiplication-type operations of a DSP, schedules and selects operands efficiently, and allows a single unit to complete all multiplication-related instructions. When registers are insufficient, the accumulator can also be used as a register for scheduling and selecting operands.
5. The local register and the accumulator of the present invention further have a dual-input, dual-output structure, so they can store the operands both for multiple kinds of SIMD addition and for SIMD multiplication, giving a high hardware reuse rate, many functions and strong scalability.
Description of drawings
Fig. 1 is a schematic block diagram of an embodiment of the present invention.
Fig. 2 is a schematic block diagram of the four-stage pipeline operation structure of the embodiment.
Fig. 3 is a detailed schematic circuit block diagram of the embodiment.
Fig. 4 is a schematic block diagram of the SIMD multiplier of the embodiment.
Fig. 5 is a schematic block diagram of the SIMD adder of the embodiment.
Fig. 6 is a schematic block diagram of the instruction decoding unit of the embodiment.
Reference numerals: 1, instruction dispatch unit; 2, instruction decoding unit; 21, instruction distinguishing module; 22, 32-bit instruction decoding module; 23, 16-bit instruction decoding module; 3, storage unit; 31, local register; 32, accumulator; 4, instruction operation unit; 5, four-stage pipeline operation structure; 51, two-stage multiplier operation station; 511, SIMD multiplier; 512, first-stage multiplication module; 513, second-stage multiplication module; 52, adder operation station; 521, SIMD adder; 522, adder module; 53, compound operation station; 531, complex-number instruction processing module; 532, dot-product instruction processing module; 533, 32-bit multiply instruction processing module; 6, result selection module; 71, one-cycle instruction data and control signal bus; 72, one-cycle and three-cycle instruction write-back result and address signal bus; 73, two-cycle instruction write-back result and address signal bus; 74, write-back bus.
Embodiment
As shown in Fig. 1, Fig. 2 and Fig. 3, the multifunctional fixed-point MAC operation device for a microprocessor according to this embodiment comprises an instruction dispatch unit 1, an instruction decoding unit 2, a storage unit 3 and an instruction operation unit 4. The instruction operation unit 4 comprises a four-stage pipeline operation structure 5 for MAC operations and a result selection module 6 that obtains the output of the four-stage pipeline operation structure 5 and writes it back to the storage unit 3. From input to output, the four-stage pipeline operation structure 5 comprises, in order, a two-stage multiplier operation station 51, an adder operation station 52, and a compound operation station 53 for complex-number, dot-product and 32-bit multiplication operations. The two-stage multiplier operation station 51 comprises four SIMD multipliers 511 arranged in parallel, the adder operation station 52 comprises two SIMD adders 521 arranged in parallel, and the three stations are each connected to the result selection module 6.
As shown in Fig. 3, this embodiment adopts a four-stage pipelined design. From input to output, the four-stage pipeline operation structure 5 comprises, in order, the two-stage multiplier operation station 51, the adder operation station 52 and the compound operation station 53 for complex-number, dot-product and 32-bit multiplication operations; the two-stage multiplier operation station 51 comprises four SIMD multipliers 511 arranged in parallel, the adder operation station 52 comprises two SIMD adders 521 arranged in parallel, and all three stations are connected to the result selection module 6. The two-stage multiplier operation station 51 performs multiplication. The adder operation station 52 performs addition and subtraction and saturates the corresponding results. The compound operation station 53 performs shifting, concatenation, correction, saturation and rounding of results, and comprises a complex-number instruction processing module 531, a dot-product instruction processing module 532 and a 32-bit multiply instruction processing module 533. The output of the two-stage multiplier operation station 51 is connected to the result selection module 6 through the two-cycle instruction write-back result and address signal bus 73. The input of the adder operation station 52 is connected to the instruction decoding unit 2 and the storage unit 3 through the one-cycle instruction data and control signal bus 71, and its output is connected to the result selection module 6 through the one-cycle and three-cycle instruction write-back result and address signal bus 72. In addition, the two-stage multiplier operation station 51 may contain more SIMD multipliers 511, and the adder operation station 52 more SIMD adders 521, than in this embodiment; the principle is the same.
In this embodiment, the storage unit 3 comprises a local register 31 and an accumulator 32, each connected to the instruction operation unit 4. Both have a dual-input, dual-output structure: each is connected to the instruction decoding unit 2 through two input ports and to the instruction operation unit 4 through two output ports. Take 8-bit SIMD addition as an example: it adds n corresponding pairs of 8-bit elements of Src1 and Src2. Because the adder operation station 52 contains two SIMD adders 521, the dual-input, dual-output structure allows two forms of 8-bit SIMD addition: four corresponding 8-bit additions using two input ports, or eight corresponding 8-bit additions using four input ports. The dual-input, dual-output structure applies equally to SIMD multiplication and offers a high hardware reuse rate, varied functionality and strong scalability.
As shown in Fig. 4, the SIMD multiplier 511 comprises a first-stage multiplication module 512 that implements conventional multiplication and SIMD multiplication, and a second-stage multiplication module 513 that performs sign extension and concatenation. The first-stage multiplication module 512 is connected in series with the second-stage multiplication module 513; the input of the first-stage multiplication module 512 is connected to the instruction decoding unit 2 and to the storage unit 3, and the output of the second-stage multiplication module 513 is connected to the adder operation station 52.
The SIMD multiplier 511 comprises a SIMD signal input for controlling the SIMD multiplication mode. The SIMD signal input is connected to both the first-stage multiplication module 512 and the second-stage multiplication module 513. When an inactive signal is applied to the SIMD signal input, the two modules perform ordinary multiplication on the input operands; when an active signal is applied, they perform SIMD multiplication on the input operands. In this embodiment, once the SIMD control signal is applied at the SIMD signal input, it takes part in the operand sign-extension, sign pre-processing and latching steps in the first-stage multiplication module 512, and in the selection performed by the multiplexers (MUX) in the second-stage multiplication module 513. Besides the SIMD control signal (SIMD), the inputs of the SIMD multiplier 511 include two operands, Src1 and Src2, and two signedness signals, Sign1 and Sign2. The SIMD control signal selects whether the SIMD multiplier 511 performs a SIMD operation or an ordinary operation.
(1) In SIMD mode, the multiplier completes two 8-bit SIMD multiplications in parallel:
Dst[15:0] = Src1[7:0] × Src2[7:0]
Dst[31:16] = Src1[15:8] × Src2[15:8]
A single 32-bit product is still obtained, whose high and low 16 bits hold the results of the two 8-bit multiplications respectively. In this mode each of the two 16×8 multipliers performs one 8×8 operation.
To compute the low 16-bit result, the low 8 bits of Src1 are sign-extended to 16 bits and fed, together with the low 8 bits of Src2, into the first 16×8 multiplier; the low 16 bits of the resulting 24-bit product Dst_L are the required SIMD low result.
To compute the high 16-bit result, the low 8 bits of Src1 are set to zero and the result is fed, together with the high 8 bits of Src2, into the second 16×8 multiplier; the high 16 bits of the resulting 24-bit product Dst_H are the required SIMD high result.
(2) In ordinary mode, the multiplier acts as a conventional multiplier and completes one conventional 16-bit multiplication:
Dst[31:0] = Src1[15:0] × Src2[15:0]
The two 16×8 multipliers compute Src1[15:0] × Src2[15:8] and Src1[15:0] × Src2[7:0] respectively, giving Dst_H and Dst_L. Dst_L[23:8] is sign-extended to 24 bits and added to Dst_H, giving a 24-bit result. Concatenating this result with Dst_L[7:0] gives the required Dst[31:0].
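The two modes described above can be sketched in Python as a behavioural model of one SIMD multiplier built from two 16×8 multiplies. This is only an illustration under stated assumptions: operands are taken as signed, Src2[7:0] is treated as unsigned in ordinary mode, and the function names `sign_extend`, `simd_mul_8x8` and `mul_16x16` are hypothetical rather than taken from the patent.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def simd_mul_8x8(src1, src2):
    """SIMD mode: two signed 8x8 products packed into one 32-bit word."""
    # First 16x8 multiplier: Src1[7:0] sign-extended to 16 bits, times Src2[7:0].
    dst_l = sign_extend(src1, 8) * sign_extend(src2, 8)
    low16 = dst_l & 0xFFFF                      # low 16 bits of the 24-bit Dst_L
    # Second 16x8 multiplier: Src1 with its low 8 bits zeroed, times Src2[15:8].
    dst_h = sign_extend(src1 & 0xFF00, 16) * sign_extend(src2 >> 8, 8)
    high16 = (dst_h >> 8) & 0xFFFF              # high 16 bits of the 24-bit Dst_H
    return (high16 << 16) | low16


def mul_16x16(src1, src2):
    """Ordinary mode: one signed 16x16 product from two 16x8 partial products."""
    a = sign_extend(src1, 16)
    dst_h = a * sign_extend(src2 >> 8, 8)       # Src1[15:0] x Src2[15:8]
    dst_l = a * (src2 & 0xFF)                   # Src1[15:0] x Src2[7:0] (assumed unsigned byte)
    upper = dst_h + (dst_l >> 8)                # Dst_L[23:8] sign-extended, added to Dst_H
    return ((upper << 8) | (dst_l & 0xFF)) & 0xFFFFFFFF
```

For example, `simd_mul_8x8(0x03FE, 0x0405)` multiplies the lanes (3, -2) by (4, 5) and packs 12 and -10 into the high and low halves of the result.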
As shown in Fig. 5, the SIMD adder 521 is a 40-bit SIMD fixed-point adder comprising five 8-bit adder modules 522 connected in series. In one cycle the SIMD adder 521 can complete four 8-bit signed or unsigned additions or subtractions, or two 16-bit signed or unsigned additions or subtractions, or one 32-bit signed or unsigned addition or subtraction, or one 40-bit signed or unsigned addition or subtraction.
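One plausible way to picture the SIMD adder 521 is as five chained 8-bit adders whose carry chain is broken at lane boundaries. The Python sketch below models addition only and is an assumption about the mechanism, which the text does not spell out; the name `simd_add40` and the decision to treat bits above a 32-bit lane as a separate lane are illustrative choices.

```python
def simd_add40(a, b, lane_bits):
    """Add two 40-bit words as independent lanes using five chained 8-bit adders."""
    assert lane_bits in (8, 16, 32, 40)
    result, carry = 0, 0
    for i in range(5):                          # one iteration per 8-bit adder module
        if (i * 8) % lane_bits == 0:            # lane boundary: cut the carry chain
            carry = 0
        s = ((a >> (8 * i)) & 0xFF) + ((b >> (8 * i)) & 0xFF) + carry
        result |= (s & 0xFF) << (8 * i)         # keep the 8-bit sum of this module
        carry = s >> 8                          # pass the carry to the next module
    return result
```

With `lane_bits=40` the carry ripples through all five modules; with `lane_bits=8` every module works on its own lane and each carry-out is discarded.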
As shown in Fig. 6, the instruction decoding unit 2 comprises an instruction distinguishing module 21, a 32-bit instruction decoding module 22, a 16-bit instruction decoding module 23 and a latch. The inputs of the 32-bit instruction decoding module 22 and the 16-bit instruction decoding module 23 are each connected to the instruction dispatch unit 1 through the instruction distinguishing module 21; the output of the 32-bit instruction decoding module 22 is connected to the storage unit 3 and to the instruction operation unit 4, and the output of the 16-bit instruction decoding module 23 is likewise connected to the storage unit 3 and to the instruction operation unit 4. By handling instructions of different bit widths separately, the instruction decoding unit 2 helps improve hardware utilization. The instruction distinguishing module 21 distinguishes 16-bit instructions, 32-bit instructions and invalid instructions. The 32-bit instruction decoding module 22 decodes 32-bit instructions and the 16-bit instruction decoding module 23 decodes 16-bit instructions; the read signals and read addresses obtained by decoding are sent to the local register 31 or the accumulator 32. The latch holds the control signals for one cycle before outputting them to the instruction operation unit 4. In summary, the instruction decoding unit 2 decodes the instructions dispatched by the instruction dispatch unit 1, sends operand read requests and read addresses to the local register 31 and the accumulator 32, and sends the execution control signals to the instruction operation unit 4 after latching them for one cycle. The instruction operation unit 4 performs the various operations on the operands under these control signals and writes the results back to the local register 31 or the accumulator 32 of the storage unit 3.
The working process of the embodiment is illustrated below using specific operations as examples.
1. SIMD multiply instructions.
SIMD multiply instructions are 8-bit, 16-bit and 32-bit multiply instructions, each available as unsigned by unsigned, unsigned by signed, signed by unsigned and signed by signed.
Both 8-bit and 16-bit multiply instructions are completed in the first-stage multiplication module 512 and the second-stage multiplication module 513.
The idea behind 32-bit multiplication is: Dst = (HH << 32) + (HL << 16) + (LH << 16) + LL, where Dst is the 64-bit result, HH is the 32-bit product of the high 16 bits of operand 1 and the high 16 bits of operand 2, HL is the 32-bit product of the high 16 bits of operand 1 and the low 16 bits of operand 2, LH is the 32-bit product of the low 16 bits of operand 1 and the high 16 bits of operand 2, and LL is the 32-bit product of the low 16 bits of operand 1 and the low 16 bits of operand 2. A 32-bit multiply instruction first performs the multiplications in the four SIMD multipliers 511, then completes the first shift-and-add in the adder operation station 52, and finally completes the last shift-and-add and correction in the compound operation station 53. The procedure is as follows:
1) The following operations are performed in the four SIMD multipliers 511:
HH = Src1_H × Src2_H, HL = Src1_H × Src2_L, LH = Src1_L × Src2_H, LL = Src1_L × Src2_L, where Src1_H is the high 16 bits of operand 1, Src2_H is the high 16 bits of operand 2, Src1_L is the low 16 bits of operand 1, and Src2_L is the low 16 bits of operand 2.
2) In the adder operation station 52, the high 16 bits of HL are sign-extended to 32 bits and added to the 32-bit HH in the first SIMD adder, giving a 32-bit result; this result is concatenated with the low 16 bits of HL to give a 48-bit result. In the second SIMD adder, the same method is used to compute the upper 32 bits of (LH << 16) + LL, giving a 32-bit result.
3) In the compound operation station 53, the 32-bit result obtained by the second SIMD adder 521 of the adder operation station 52 is extended to 48 bits and added to the 48-bit result from the adder operation station 52. This is done in the multiply instruction processing module, for which a 48-bit adder is sufficient. Concatenating the sum with the low 16 bits of LL yields the 64-bit result of the 32-bit multiplication.
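The three steps above can be checked with a short Python sketch. It is an illustration rather than the patented circuit: it assumes signed 32-bit operands whose high halves are signed and whose low halves are unsigned, assumes the second adder forms LH plus the upper 16 bits of LL, and uses the hypothetical name `mul32_via_16x16`.

```python
def mul32_via_16x16(src1, src2):
    """Signed 32x32 -> 64-bit product built from four 16x16 partial products."""
    a_h, a_l = src1 >> 16, src1 & 0xFFFF        # signed high half, unsigned low half
    b_h, b_l = src2 >> 16, src2 & 0xFFFF

    # Step 1: the four SIMD multipliers produce the partial products.
    hh, hl, lh, ll = a_h * b_h, a_h * b_l, a_l * b_h, a_l * b_l

    # Step 2: adder station. First adder: HH + HL[31:16], then append HL[15:0].
    r48 = ((hh + (hl >> 16)) << 16) | (hl & 0xFFFF)   # equals (HH << 16) + HL
    # Second adder (assumed): LH + LL[31:16].
    r32 = lh + (ll >> 16)

    # Step 3: compound station. 48-bit addition, then append LL[15:0].
    return ((r48 + r32) << 16) | (ll & 0xFFFF)
```

Python integers are unbounded and `>>` is an arithmetic shift, so the sketch returns the exact signed product; a hardware model would additionally mask each intermediate value to its stated width.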
The idea behind 32-bit by 16-bit multiplication is: Dst = (HL << 16) + LL, where Dst is the 48-bit result, HL is the 32-bit product of the high 16 bits of operand 1 and the low 16 bits of operand 2, and LL is the 32-bit product of the low 16 bits of operand 1 and the low 16 bits of operand 2. HL and LL are computed in two SIMD multipliers. In the adder station, the high 16 bits of LL are extended to 32 bits and added to HL to give a 32-bit result, which is then concatenated with the low 16 bits of LL to give the final 48-bit result Dst. 16-bit by 32-bit multiplication works analogously.
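A minimal Python sketch of the 32-bit by 16-bit case follows, assuming a signed 32-bit first operand and a signed 16-bit second operand; the function name `mul32x16` is hypothetical.

```python
def mul32x16(src1, src2):
    """Signed 32x16 -> 48-bit product from two 16x16 partial products."""
    a_h, a_l = src1 >> 16, src1 & 0xFFFF        # signed high half, unsigned low half
    hl = a_h * src2                             # first SIMD multiplier
    ll = a_l * src2                             # second SIMD multiplier
    upper32 = hl + (ll >> 16)                   # adder station: HL + LL[31:16]
    return (upper32 << 16) | (ll & 0xFFFF)      # append LL[15:0]
```

Only one addition is needed here, which is consistent with this case finishing one station earlier than the full 32-bit multiplication.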
2. SIMD add instructions.
SIMD add instructions are divided into 8-bit, 16-bit, 32-bit and 40-bit add instructions, each supporting signed and unsigned addition as well as addition with an immediate operand. The instruction operation is issued by the instruction decoding unit 2 and the source operands by the storage unit 3; both are sent directly to the adder operation station 52, where the operation is completed in a SIMD adder 521.
3. SIMD subtract instructions.
SIMD subtract instructions are divided into 8-bit, 16-bit, 32-bit and 40-bit subtract instructions, each supporting signed and unsigned subtraction as well as subtraction with an immediate operand. The instruction operation is issued by the instruction decoding unit 2 and the source operands by the storage unit 3; both are sent directly to the adder operation station 52, where the operation is completed in a SIMD adder 521.
4. SIMD multiply-add instructions.
SIMD multiply-add instructions are divided into 8-bit and 16-bit multiply-add instructions. A SIMD multiply-add instruction has three operands: a multiplier, a multiplicand and an addend. The multiplier and multiplicand come from the local register 31 and the addend comes from the accumulator 32. The multiplication of the multiplier and multiplicand is completed in the SIMD multipliers 511, and the addition is completed on that product in a SIMD adder 521 of the adder operation station 52. The result is sent to the result selection module 6 over the three-cycle instruction write-back bus within the one-cycle and three-cycle instruction write-back result and address signal bus 72, and is then written back to the accumulator 32 over the write-back bus 74.
5. SIMD multiply-subtract instructions.
SIMD multiply-subtract instructions are divided into 8-bit and 16-bit multiply-subtract instructions. A SIMD multiply-subtract instruction has three operands: a multiplier, a multiplicand and a subtrahend. The multiplier and multiplicand come from the local register 31 and the subtrahend comes from the accumulator 32. The multiplication of the multiplier and multiplicand is completed in the SIMD multipliers 511, and the subtraction is completed on that product in a SIMD adder 521 of the adder operation station 52. The result is sent to the result selection module 6 over the three-cycle instruction write-back bus within the one-cycle and three-cycle instruction write-back result and address signal bus 72, and is then written back to the accumulator 32 over the write-back bus 74.
6. Dot-product instructions.
Dot-product instructions are divided into 8-bit and 16-bit dot-product instructions. The multiplication part of a dot-product instruction is completed in the SIMD multipliers 511 and the summation part in the adder operation station 52; operations such as optional rounding and correction of the result are completed in the compound operation station 53.
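The split between multiplication and summation in a dot-product instruction can be sketched in Python as below. The packing of signed lanes into a single source word, the lane counts, and the names `unpack_signed_lanes` and `dot_product` are assumptions made for illustration; the rounding and correction performed in the compound operation station are not modelled because the text does not specify them.

```python
def unpack_signed_lanes(word, lane_bits, count):
    """Split `word` into `count` signed lanes, lowest lane first."""
    mask = (1 << lane_bits) - 1
    lanes = []
    for i in range(count):
        lane = (word >> (i * lane_bits)) & mask
        if lane & (1 << (lane_bits - 1)):       # negative in two's complement
            lane -= 1 << lane_bits
        lanes.append(lane)
    return lanes


def dot_product(src1, src2, lane_bits=16, count=2):
    """Lane-wise products (multiplier station) followed by a sum (adder station)."""
    a = unpack_signed_lanes(src1, lane_bits, count)
    b = unpack_signed_lanes(src2, lane_bits, count)
    products = [x * y for x, y in zip(a, b)]    # one product per lane
    return sum(products)                        # accumulate the lane products
```

For instance, `dot_product(0x0003FFFE, 0x00040005)` combines the 16-bit lanes (-2, 3) and (5, 4).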
7. Complex-number instructions.
Complex-number instructions are divided into 8-bit and 16-bit complex-number instructions. The multiplication part of a complex-number instruction is completed in the SIMD multipliers 511 and the summation part in the adder operation station 52; operations such as optional rounding and correction of the result are completed in the compound operation station 53.
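Assuming the complex-number instruction is an ordinary complex multiplication, which the text does not state explicitly, its mapping onto four multiplications and two additions can be sketched as follows. The function name `complex_mul` and the use of separate real and imaginary arguments instead of packed registers are illustrative assumptions.

```python
def complex_mul(a_re, a_im, b_re, b_im):
    """(a_re + j*a_im) * (b_re + j*b_im) using four multiplies and two adds."""
    # Multiplier station: four products, one per SIMD multiplier (assumed mapping).
    p_rr, p_ii = a_re * b_re, a_im * b_im
    p_ri, p_ir = a_re * b_im, a_im * b_re
    # Adder station: one subtraction and one addition, one per SIMD adder.
    return p_rr - p_ii, p_ri + p_ir
```

The four products and two sums line up with the four SIMD multipliers 511 and two SIMD adders 521 of this embodiment, which may be why those unit counts were chosen.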
SIMD addition and SIMD subtraction complete in 1 cycle; 8-bit and 16-bit SIMD multiplications are 2-cycle instructions; 32-bit by 16-bit and 16-bit by 32-bit multiplications complete in 3 cycles; 32-bit multiply instructions complete in 4 cycles; SIMD multiply-add and SIMD multiply-subtract complete in 3 cycles; and dot-product and complex-number instructions complete in 4 cycles. The result selection module 6 selects among the instructions completed at each station, preventing instructions with different cycle counts from being output at the same time, and then sends the result to the write-back bus 74. The write-back bus 74 connects to the local register 31 and the accumulator 32, so that the result is written back to the local register 31 or the accumulator 32.
The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to this embodiment; all technical solutions falling within the concept of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and refinements made without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims (8)

1. A multifunctional fixed-point MAC operation device for a microprocessor, comprising an instruction dispatch unit (1), an instruction decoding unit (2), a storage unit (3) and an instruction operation unit (4), characterized in that: the instruction operation unit (4) comprises a four-stage pipeline operation structure (5) for MAC operations and a result selection module (6) that obtains the output of the four-stage pipeline operation structure (5) and writes it back to the storage unit (3); from input to output, the four-stage pipeline operation structure (5) comprises, in order, a two-stage multiplier operation station (51), an adder operation station (52), and a compound operation station (53) for complex-number, dot-product and 32-bit multiplication operations; the two-stage multiplier operation station (51) comprises a plurality of SIMD multipliers (511) arranged in parallel; the adder operation station (52) comprises a plurality of SIMD adders (521) arranged in parallel; and the two-stage multiplier operation station (51), the adder operation station (52) and the compound operation station (53) are each connected to the result selection module (6).
2. The multifunctional fixed-point MAC operation device for a microprocessor according to claim 1, characterized in that: the SIMD multiplier (511) comprises a first-stage multiplication module (512) for performing conventional multiplication and SIMD multiplication, and a second-stage multiplication module (513) for performing sign extension and splicing; the first-stage multiplication module (512) is connected in series with the second-stage multiplication module (513).
3. The multifunctional fixed-point MAC operation device for a microprocessor according to claim 2, characterized in that: the SIMD multiplier (511) comprises a SIMD signal input end for controlling the SIMD multiplication mode; the SIMD signal input end is connected to the first-stage multiplication module (512) and the second-stage multiplication module (513) respectively; when an invalid signal is applied to the SIMD signal input end, the first-stage multiplication module (512) and the second-stage multiplication module (513) perform ordinary multiplication on the input operands; when a valid signal is applied to the SIMD signal input end, the first-stage multiplication module (512) and the second-stage multiplication module (513) perform SIMD multiplication on the input operands.
4. The multifunctional fixed-point MAC operation device for a microprocessor according to claim 1, characterized in that: the SIMD adder (521) is a 40-bit SIMD fixed-point adder, and the SIMD adder (521) comprises five 8-bit addition modules (522) connected in series.
5. The multifunctional fixed-point MAC operation device for a microprocessor according to claim 1, characterized in that: the compound operation station (53) comprises a complex-number instruction processing module (531), a dot-product instruction processing module (532) and a 32-bit multiplication instruction processing module (533).
6. The multifunctional fixed-point MAC operation device for a microprocessor according to claim 1, characterized in that: the instruction decoding unit (2) comprises an instruction distinguishing module (21), a 32-bit instruction decoding module (22) and a 16-bit instruction decoding module (23); the input end of the 32-bit instruction decoding module (22) and the input end of the 16-bit instruction decoding module (23) are each connected to the instruction dispatch unit (1) through the instruction distinguishing module (21); the output end of the 32-bit instruction decoding module (22) is connected to the storage unit (3) and the instruction operation unit (4) respectively, and the output end of the 16-bit instruction decoding module (23) is connected to the storage unit (3) and the instruction operation unit (4) respectively.
7. The multifunctional fixed-point MAC operation device for a microprocessor according to any one of claims 1 to 6, characterized in that: the storage unit (3) comprises a local register (31) and an accumulator (32), and the local register (31) and the accumulator (32) are each connected to the instruction operation unit (4).
8. The multifunctional fixed-point MAC operation device for a microprocessor according to claim 7, characterized in that: the local register (31) and the accumulator (32) each have a dual-input, dual-output structure.
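As an illustration of the mode control in claim 3 and the segmented adder in claim 4, the following Python sketch models both behaviours in software. It is a minimal sketch under stated assumptions, not the claimed circuit: the 16-bit operand width, the split into two 8-bit lanes, the lane-wise behaviour of the adder in SIMD mode, and all function names are hypothetical and are not taken from the claims.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def simd_multiply(a, b, simd):
    """Multiply two 16-bit operands (assumed width) under a SIMD mode signal.

    simd=False: one ordinary signed 16x16 multiplication, 32-bit result.
    simd=True: two signed 8x8 lane multiplications (assumed lane split); each
    product is sign-extended to 16 bits and the two are spliced into 32 bits.
    """
    if not simd:
        return (to_signed(a, 16) * to_signed(b, 16)) & 0xFFFFFFFF
    result = 0
    for lane in range(2):
        lane_a = to_signed(a >> (8 * lane), 8)
        lane_b = to_signed(b >> (8 * lane), 8)
        product = (lane_a * lane_b) & 0xFFFF  # sign-extended 16-bit product
        result |= product << (16 * lane)      # splice into the wide result
    return result


def simd_add40(a, b, simd):
    """40-bit addition built from five 8-bit modules connected in series.

    simd=False: the carry ripples through all five modules.
    simd=True: the carry is cut between modules, so each 8-bit module acts as
    an independent lane (an assumption about the SIMD behaviour).
    """
    result, carry = 0, 0
    for module in range(5):
        slice_a = (a >> (8 * module)) & 0xFF
        slice_b = (b >> (8 * module)) & 0xFF
        total = slice_a + slice_b + carry
        result |= (total & 0xFF) << (8 * module)
        carry = 0 if simd else total >> 8
    return result
```

Under these assumptions, a single mode signal is enough to switch the same datapath between ordinary and lane-wise operation, which is one plausible reading of how hardware could be shared between the two modes.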
CN201110336974.3A 2011-10-31 2011-10-31 Multifunctional fixed-point media access control (MAC) operation device for microprocessor Active CN102360281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110336974.3A CN102360281B (en) 2011-10-31 2011-10-31 Multifunctional fixed-point media access control (MAC) operation device for microprocessor

Publications (2)

Publication Number Publication Date
CN102360281A true CN102360281A (en) 2012-02-22
CN102360281B CN102360281B (en) 2014-04-02

Family

ID=45585615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110336974.3A Active CN102360281B (en) 2011-10-31 2011-10-31 Multifunctional fixed-point media access control (MAC) operation device for microprocessor

Country Status (1)

Country Link
CN (1) CN102360281B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227508A (en) * 2016-07-25 2016-12-14 中国科学院计算技术研究所 A kind of without back edge data stream round-robin method, system, device, chip

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1147307A (en) * 1994-05-03 1997-04-09 先进Risc机器有限公司 Data processing with multiple instruction sets
CN1481526A (en) * 2000-12-13 2004-03-10 (assignee unreadable) Cryptographic processor
CN1366234A (en) * 2000-12-19 2002-08-28 国际商业机器公司 Operation circuit and operation method
US20050187997A1 (en) * 2004-02-20 2005-08-25 Leon Zheng Flexible accumulator in digital signal processing circuitry
CN1598757A (en) * 2004-09-02 2005-03-23 中国人民解放军国防科学技术大学 Design method of number mixed multipler for supporting single-instruction multiple-operated
CN1963745A (en) * 2006-12-01 2007-05-16 浙江大学 High speed split multiply accumulator apparatus

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104243131A (en) * 2014-09-29 2014-12-24 瑞斯康达科技发展股份有限公司 Clock synchronizing method and device
CN104901651A (en) * 2015-06-25 2015-09-09 福州瑞芯微电子有限公司 Realizing circuit and method of digital filter
CN106775579A (en) * 2016-11-29 2017-05-31 北京时代民芯科技有限公司 Floating-point operation accelerator module based on configurable technology
CN106775579B (en) * 2016-11-29 2019-06-04 北京时代民芯科技有限公司 Floating-point operation accelerator module based on configurable technology
CN108475188A (en) * 2017-07-31 2018-08-31 深圳市大疆创新科技有限公司 Data processing method and equipment
WO2020015075A1 (en) * 2018-07-18 2020-01-23 平安科技(深圳)有限公司 Facial image comparison method and apparatus, computer device, and storage medium
WO2020015076A1 (en) * 2018-07-18 2020-01-23 平安科技(深圳)有限公司 Facial image comparison method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
CN102360281B (en) 2014-04-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant