DE10137457B4

DE10137457B4 - Method for polynomial calculation with 1-bit coefficients

Info

Publication number: DE10137457B4
Application number: DE2001137457
Authority: DE
Inventors: Wolfram Dipl.-Ing. Drescher
Original assignee: Systemonic AG
Current assignee: NXP BV
Priority date: 2001-08-02
Filing date: 2001-08-02
Publication date: 2004-09-23
Anticipated expiration: 2021-08-03
Also published as: DE10137457A1

Abstract

Method for polynomial calculation with 1-bit coefficients, in which the calculation is performed in a multiplier-accumulator (MAC) of a processor whose arithmetic unit is partitioned into slices containing data paths with a switchable 1-bit shift function, characterized in that the polynomial calculation, together with the associated 1-bit coefficients, is carried out in parallel in the data paths across the cross-slice data word width. The calculation comprises a first processing stage, in which the entire data word is shifted by 1 bit toward its higher-order bits, and a second processing stage, which comprises a bitwise multiplication of the data word by the 1-bit coefficients present, followed by accumulation of the product onto the value previously stored in the accumulator.

Description

The invention relates to a method for polynomial calculation with 1-bit coefficients, in which the calculation is performed in a multiplier-accumulator (MAC) of a processor whose arithmetic unit is partitioned into slices containing data paths with a switchable 1-bit shift function.

Polynomial calculations with 1-bit coefficients are used primarily to implement finite impulse response (FIR) filter algorithms.

Such a filter algorithm is used most frequently in hardware encoders based on polynomial division. This is the best-known application in the prior art. The algorithm generates a systematic code by appending n-k parity check symbols to the sequence of data symbols.

The polynomial notation of the code word c(x) is therefore $c(x) = p(x) + x^{n-k} d(x)$, where the data symbols $d_0, d_1, \ldots, d_{k-1}$ are regarded as the coefficients of a polynomial. The parity symbols are offset from the data word by n-k positions.

p(x) is precomputed such that a generator polynomial g(x) divides the code word c(x) without remainder (residue class 0).
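The relationship between d(x), p(x) and g(x) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the symbols are taken to be binary (GF(2)), polynomials are stored as integers whose bit i is the coefficient of x^i, and the names `gf2_mod` and `encode_systematic` as well as the example polynomials are hypothetical, not taken from the patent.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2); bit i holds the coefficient of x^i."""
    if divisor == 0:
        raise ValueError("divisor polynomial must be non-zero")
    deg = divisor.bit_length() - 1
    # Cancel the leading term of the dividend until its degree drops below deg.
    while dividend.bit_length() - 1 >= deg:
        shift = dividend.bit_length() - 1 - deg
        dividend ^= divisor << shift
    return dividend


def encode_systematic(d: int, g: int) -> int:
    """Return c(x) = p(x) + x^(n-k) * d(x), with p(x) chosen so that g(x) divides c(x)."""
    r = g.bit_length() - 1          # n - k, the number of parity bits
    shifted = d << r                # x^(n-k) * d(x)
    p = gf2_mod(shifted, g)         # parity polynomial p(x)
    return shifted ^ p              # addition over GF(2) is XOR


# Assumed example values: g(x) = x^3 + x + 1, d(x) = x^3 + x^2 + 1
g = 0b1011
d = 0b1101
c = encode_systematic(d, g)
```

With these example values the data bits appear unchanged in the upper part of `c`, and `gf2_mod(c, g)` is zero, which is the "no remainder" condition stated above.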

In the prior art, polynomial calculations are performed in processors that provide data paths of a given word width for this purpose and that are configured in slices as a multiplier-accumulator (MAC). These slices, with the data paths implemented in them, contain logical shift registers (shifters), adders and multipliers, the latter operating either in integer mode or in a mode with separable carry.

In the prior art, the separable-carry mode is preferred for the multipliers contained in the data paths in order to speed up the calculations and to save hardware. This means that, when the separable-carry mode of the multipliers and adders is set in the data paths, the carry information arising in the individual data paths along the computation path does not have to be evaluated.

If data words with large bit widths are to be processed for fast polynomial calculation, then with slices whose data paths are configured in fixed processing widths the polynomial calculation has to be split into sequential computation steps. This becomes unavoidable when the bit width of the data word to be processed exceeds the processing width of the slices.

This sequential partial processing of the data words, organized in parallel in independent slices and carried out within each slice, demands considerable control effort from the processor for managing and assigning the word parts that are processed individually per slice and subsequently reassembled.

This proves to be the fundamental disadvantage prevailing in the prior art, and it weighs heavily when the polynomial calculation is executed in the processor with coefficients of small bit width, e.g. 1-bit coefficients. As a result, the processing speed is limited and the hardware and software effort is high.

The prior art does include, in US patent US 6 140 839, a solution for the technology field of (complex) field-programmable architectures (CFPA, FPA) in which the CLBs (configurable logic blocks) required in this technology embody special PASMs (partial add subtract multiply).

In this way, computation-intensive hardware areas can be realized relatively area-efficiently on circuits within this technology field.

However, this special solution cannot be generalized, and it does not meet the far higher demands on component density and processing speed that arise, for example, in the VLSI design of processor circuits.

Another solution, disclosed in patent document WO 01/39378 A1, can be realized independently of the circuit technology and in a way that saves hardware and software resources.

However, the circuit arrangement disclosed in that document concerns only decoders for block-based error-correction codes, specifically the Reed-Solomon error-correction technique. The special arrangement described cannot be used for general polynomial calculation.

The object is therefore to increase the processing speed in the processor for polynomial calculation with 1-bit coefficients on data words of large bit width, and to reduce the hardware and software effort required.

According to the method, this object is achieved in that the polynomial calculation, together with the associated 1-bit coefficients, is carried out in parallel in the data paths across the cross-slice data word width. The calculation begins with a first processing stage, in which the entire data word is shifted by 1 bit toward its higher-order bits. This is followed by a second processing stage, which comprises a bitwise multiplication of the data word by the 1-bit coefficient present, followed by accumulation of the product onto the value previously stored in the accumulator.
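At word level, the two processing stages can be sketched as follows. This is one possible reading rather than the patented circuit: it assumes that the shifted value is the accumulator content, that a 1-bit multiplication behaves as a bitwise AND, and that accumulation without carry propagation behaves as a bitwise XOR. The function name `mac_step` and the parameter `width` are hypothetical.

```python
def mac_step(acc: int, data: int, coeff: int, width: int) -> int:
    """One two-stage step on a data word of `width` bits (assumed carry-free model)."""
    mask = (1 << width) - 1
    # Stage 1: shift the whole word by one bit toward the higher-order bits.
    shifted = (acc << 1) & mask
    # Stage 2: multiply each data bit by its 1-bit coefficient (AND) ...
    product = data & coeff & mask
    # ... and accumulate the product onto the stored value (XOR, no carries).
    return shifted ^ product


result = mac_step(acc=0b0101, data=0b0011, coeff=0b0110, width=8)
```

Because both stages act on all bit positions at once, a single call models what the text describes as parallel processing across the full data word width.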

This solution makes clear that, in contrast to the prior art, the data words are processed across slices, and thus simultaneously and in parallel, in the data paths that make up the MAC.

An extension of the method provides that the data paths belonging to a slice can be configured either in a separable-carry mode, in which the carries of the multiplier/adder are not evaluated, or in an integer mode, in which the carries of the multiplier/adder are evaluated.

This extension makes the data paths programmable. The hardware can thus be used optimally and adapted to the calculation task at hand.

An advantageous variant of the method provides that the switchable 1-bit shift function present in the data paths is extended by additional switchable interconnections that span slices. These switchable additional connections run from the outputs of the slices, each realized by the slice's most significant data path, to the inputs of the designated least significant data paths of the at least indirectly adjacent higher-order slices.

The solution aims to extend the 1-bit shift function that exists within each slice between adjacent data paths by additional switchable connections between the slices, in the form of multiplexers, so that a cross-slice 1-bit shift from the lower-order slices to the at least indirectly adjacent higher-order slices is ensured.

Using the 1-bit shift function already present within the slices contributes to the intended minimization of hardware effort and also to minimizing cycles in the software.
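The effect of the switchable connections between slices can be illustrated with a small software model. It is a sketch under assumptions: each slice is taken to be 16 bits wide, slices are held in a list with the lowest-order slice first, and the flag `link_enabled` stands in for the multiplexer that switches the inter-slice connection. All names are hypothetical.

```python
SLICE_BITS = 16  # assumed slice width, chosen for illustration


def shift_left_across_slices(slices, link_enabled=True):
    """Shift every slice left by one bit; optionally pass each MSB to the next slice's LSB."""
    mask = (1 << SLICE_BITS) - 1
    shifted = []
    carry_in = 0
    for word in slices:  # slices[0] is the lowest-order slice
        msb = (word >> (SLICE_BITS - 1)) & 1
        shifted.append(((word << 1) & mask) | carry_in)
        # With the link switched off, each slice shifts on its own.
        carry_in = msb if link_enabled else 0
    return shifted


linked = shift_left_across_slices([0x8001, 0x0000])
unlinked = shift_left_across_slices([0x8001, 0x0000], link_enabled=False)
```

With the link enabled, the top bit of the lower slice arrives in the bottom bit of the higher slice, so the list behaves like one wide word. With the link disabled, that bit is lost at the slice boundary.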

One variant of the method provides that the two-stage polynomial calculation with 1-bit coefficients in the data paths of a slice begins with the first processing stage, which corresponds to a base position of the first and second multiplexers. In this position, a first Tor1MUX1 input of the first multiplexer and a first Tor1MUX2 input of the second multiplexer are switched through (Fig. 1). As a result, the output value of the accumulator is applied to the first input of the multiplier, and the multiplier output value is applied via the interconnection to the first input of the adder of an at least indirectly adjacent least significant data path of a higher-order slice m. In addition, a first Tor2MUX1 input of the first multiplexer and a first Tor2MUX2 input of the second multiplexer are likewise switched through, so that the arithmetic value ONE is present at the second input of the multiplier (Fig. 1).

Furthermore, with the first Tor2MUX2 input of the second multiplexer switched through, an arithmetic ZERO is applied to the second input of the adder. In addition, the second Tor1MUX1 and Tor1MUX2 inputs and the second Tor2MUX1 and Tor2MUX2 inputs of the first and second multiplexers are blocked complementarily (antivalently).

The first processing stage is followed by a second processing stage, in which the settings of the first and second multiplexers correspond to a follow-on position. The bitwise multiplication begins in that the 1-bit coefficient provided is applied via the second Tor1MUX1 input to the first input of the multiplier, while the data word bit present on the CBUS reaches the second input of the multiplier via the second Tor2MUX1 input. The multiplier multiplies the 1-bit coefficient by the data word bit that has been input.

The product generated at the output of the multiplier is then applied, through the switched-through second Tor2MUX2 input of the second multiplexer (which is likewise in the follow-on position), to the second input of the adder belonging to the MAC. In addition, through the second Tor1MUX2 input in the follow-on position, the result of a preceding polynomial calculation, provided at the output of the accumulator, is applied to the first input of the adder. After the addition in the adder, the new result is stored in the accumulator via the accumulator's input.

Furthermore, the first Tor1MUX1 and Tor1MUX2 inputs and the first Tor2MUX1 and Tor2MUX2 inputs are blocked in the follow-on position, complementary to their switching states in the base position.

A further variant of the method provides that the polynomial calculation is carried out with only some of the available slices.

A particular further variant provides that the polynomial calculation is carried out with only part of the processing width of a slice, using a specified number of data paths.

One embodiment of this further variant provides that a partial-processing-width logic detects overflows beyond the intended processing width that occur during the polynomial calculation and ensures that processing continues in the permitted slices within the intended processing ranges.
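The text does not say how the partial-processing-width logic works internally. The following sketch shows only one conceivable behaviour, as an assumption: a shift result is confined to a chosen number of active bits, and a flag reports whether a bit was pushed beyond that width. The function name `shift_within_width` and the parameter `active_bits` are hypothetical.

```python
def shift_within_width(acc: int, active_bits: int):
    """Shift left by one bit inside `active_bits`; report whether a bit left the range."""
    if active_bits <= 0:
        raise ValueError("active_bits must be positive")
    mask = (1 << active_bits) - 1
    shifted = acc << 1
    # Any bit at or above position `active_bits` counts as an overflow (assumed rule).
    overflow = (shifted >> active_bits) != 0
    return shifted & mask, overflow


value, overflowed = shift_within_width(0b1001, active_bits=4)
```

Here the result stays within four bits and the flag signals that the former top bit has left the active range.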

The invention is explained in more detail below with reference to an exemplary embodiment. In the accompanying drawings:

Fig. 1 shows a partial structure of the multiplier-accumulator in the processor;

Fig. 2 shows a block diagram of a region of the multiplier-accumulator with implemented partial-processing-width logic.

Fig. 1 shows a partial structure of the multiplier-accumulator (MAC) present in the processor; the partitioning into slices is illustrated by way of example by slice m 23 and slice m-1 24.

These slices are in turn organized into data paths, represented here by the least significant bit strip 4 of slice m and the designated most significant bit strip 5 of slice m-1.

The choice between the separable-carry mode and the integer mode of the multipliers/adders is preset by how their carry outputs are handled for input to the carry inputs of the multipliers/adders of the next-higher-order data path of the slice. In integer mode, the carry-output multiplexers are switched through for the carry output signals.

When the separable-carry mode is selected, the carry-output multiplexers block the carry output signals of the multipliers/adders instead of switching them through, and a zero signal is passed on in each case instead; that is, further processing of the carry outputs of the multipliers/adders is avoided.
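The difference between the two modes can be shown with a bit-serial model of an adder chain. This is a simplified sketch, not the circuit from the figure: each loop iteration stands for one data path, and the flag `integer_mode` plays the role of the carry-output multiplexer, which either passes the carry on or substitutes a zero. The function name and parameters are hypothetical.

```python
def add_bit_strips(a: int, b: int, width: int, integer_mode: bool) -> int:
    """Add two words one bit position at a time, with or without carry propagation."""
    result = 0
    carry = 0
    for i in range(width):
        total = ((a >> i) & 1) + ((b >> i) & 1) + carry
        result |= (total & 1) << i
        carry_out = total >> 1
        # Integer mode: carry is switched through. Separable carry: a zero is fed instead.
        carry = carry_out if integer_mode else 0
    return result


with_carry = add_bit_strips(0b0111, 0b0001, 4, integer_mode=True)
without_carry = add_bit_strips(0b0111, 0b0001, 4, integer_mode=False)
```

In this model the integer mode gives ordinary addition modulo 2^width, while the separable-carry mode reduces to a bitwise XOR of the two operands.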

The partial structure of the multiplier-accumulator in the processor shown in Fig. 1 is in the separable-carry mode. The multiplier carry-output MUX 25 and the adder carry-output MUX 26 are therefore each switched so that the applied ZERO is fed to the carry inputs of the multiplier 1 and the adder 2.

The individual bit positions of the 1-bit coefficient and of the data word to be processed are provided at the inputs of the first multiplexers of the bit strips. In slice m-1 24, for example, the 2^15 bit position of the 1-bit coefficient for slice m-1 19 is applied to the second Tor1MUX1 input 14, and the data value of the 2^15 bit position of the data word for slice m-1 21 is applied to the second Tor2MUX1 input 15. The two-stage polynomial calculation of the data word with the 1-bit coefficient in the data paths of a slice begins with a first processing stage, which corresponds to a base position of the first multiplexer 6 and the second multiplexer 7.

Here, a first Tor1MUX1 input 10 of the first multiplexer 6 and a first Tor1MUX2 input 12 of the second multiplexer 7 are switched through. As a result, the output value of the accumulator 3 is applied to the first input of the multiplier 1. In addition, the output value of the multiplier 1 is applied via the interconnection 27 to the first input of the adder of the at least indirectly adjacent least significant data path of a higher-order slice m 23.

Furthermore, a first Tor2MUX1 input 11 of the first multiplexer 6 and a first Tor2MUX2 input 13 of the second multiplexer 7 are likewise switched through, so that the arithmetic value ONE is present at the second input of the multiplier 1. With the first Tor2MUX2 input 13 of the second multiplexer 7 switched through, an arithmetic ZERO is also applied to the second input of the adder 2.

In the first processing state, the base state set for the first and second multiplexers 6, 7 also ensures that the second Tor1MUX1 and Tor1MUX2 inputs 14, 16 and the second Tor2MUX1 and Tor2MUX2 inputs 15, 17 are blocked complementarily.

In a second processing stage following the first, which corresponds to a follow-on position of the first and second multiplexers 6, 7 that is complementary to the base state, the bitwise multiplication of the 1-bit coefficient by the data word bit that has been input is carried out, here for slice m-1.

This is done, on the one hand, by feeding the provided 2^15 bit position of the 1-bit coefficient for slice m-1 19 through the now switched-through second Tor1MUX1 input 14 to the first input of the multiplier 1. On the other hand, the data value of the 2^15 bit position of the data word for slice m-1 21, provided by the CBUS 22, is fed through the switched-through second Tor2MUX1 input 15 to the second input of the multiplier 1.

The product generated at the output of the multiplier 1 is then applied, through the switched-through second Tor2MUX2 input 17 of the second multiplexer 7 (likewise in the follow-on position), to the second input of the adder 2 belonging to the MAC. Through the second Tor1MUX2 input 16 in the follow-on position, the result of a preceding polynomial calculation provided at the output of the accumulator 3 is applied to the first input of the adder 2.

After the addition in the adder 2, the new calculation value is stored in the accumulator 3 via the input of the accumulator 3.

Furthermore, it is ensured in the first and second multiplexers 6, 7 that the respective first Tor1MUX1 and Tor1MUX2 inputs 10, 12 and the first Tor2MUX1 and Tor2MUX2 inputs 11, 13 each assume the blocked switching state. In accordance with the follow-up position assumed, these switching states are antivalent to the switching states of the basic position.
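The two processing stages can be illustrated with a minimal word-level sketch. It is not taken from the patent: the function names, the 16-bit slice width (suggested only by the bit positions 2⁰ to 2¹⁵ named in the text), and the modelling of the accumulation as a carry-free bitwise XOR are assumptions made for illustration.

```python
# Hypothetical word-level model of the two processing stages.
# Assumptions (not stated in the patent text): slices are 16 bits wide and the
# adder works without carry propagation, so accumulation is a bitwise XOR.

SLICE_BITS = 16  # assumed number of data paths per slice (2^0 .. 2^15)


def stage_one_shift(acc, num_slices):
    """Basic position: each data path multiplies its accumulator bit by ONE,
    adds ZERO, and hands the result to the next higher data path."""
    mask = (1 << (SLICE_BITS * num_slices)) - 1
    return (acc << 1) & mask


def stage_two_mac(acc, coeff, data):
    """Follow-up position: bitwise product of coefficient and data bits,
    accumulated onto the value already held in the accumulator."""
    product = coeff & data   # 1-bit x 1-bit multiplication in every data path
    return acc ^ product     # carry-free addition (assumption)


def polynomial_step(acc, coeff, data, num_slices=2):
    """One full step: shift (stage one), then multiply-accumulate (stage two)."""
    return stage_two_mac(stage_one_shift(acc, num_slices), coeff, data)
```

With two slices, `polynomial_step(0x8000, 0xFFFF, 0x0001)` moves the top bit of slice m-1 into the lowest bit of slice m, which is the role the text assigns to the interconnection 27, and then accumulates the product bit.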

The block diagram in Fig. 2 of a region of the multiply-accumulator with implemented partial processing width logic 36 shows a substructure of a MAC consisting of slice k 30, slice k-1 31 and slice 0 32 together with the associated coefficient register k 33, coefficient register k-1 34 and coefficient register 0 35.

The values of the individual bit positions of the respective 1-bit coefficients are stored slice by slice in the associated coefficient registers and, together with the values of the bit positions of the data word provided by the CBUS 22, are applied to the respective slices for processing.
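As a rough illustration of this slice-wise arrangement, the sketch below splits a wide coefficient word and a wide data word into per-slice operand pairs. The helper names and the 16-bit slice width are assumptions for illustration and do not appear in the patent.

```python
SLICE_BITS = 16  # assumed slice width


def split_into_slices(word, num_slices):
    """Return the per-slice values of a word; index 0 is slice 0 (lowest bits)."""
    mask = (1 << SLICE_BITS) - 1
    return [(word >> (SLICE_BITS * k)) & mask for k in range(num_slices)]


def slice_operands(coeff_word, data_word, num_slices):
    """Pair each (hypothetical) coefficient register value with the data bits
    that the bus would present to the same slice."""
    return list(zip(split_into_slices(coeff_word, num_slices),
                    split_into_slices(data_word, num_slices)))
```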

The polynomial calculation is carried out in the slices with a processing width in which not all of the data paths belonging to the slices are used for the calculation.

Likewise, not all slices need to be used in the processing, in that the interconnection 27 bypasses adjacent slices. The partial processing width logic 36 detects any overflows that occur and sets up the preselected processing width on the CBUS 22 by means of the bus control signal 37.
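One possible reading of the partial processing width is sketched below: the shift is confined to a chosen number of bits, and a bit leaving that range is reported as an overflow. The function name and the exact meaning given to "overflow" are assumptions; the patent text does not specify how the logic 36 is implemented.

```python
def shift_with_width(acc, width_bits):
    """Shift left by one bit inside a chosen processing width.

    Returns the shifted value limited to width_bits and a flag that is True
    when a set bit was pushed beyond the chosen width (assumed overflow)."""
    mask = (1 << width_bits) - 1
    shifted = acc << 1
    overflow = bool((shifted >> width_bits) & 1)
    return shifted & mask, overflow
```

For example, with a processing width of 4 bits, shifting `0b1000` clears the word and raises the flag, while shifting `0b0100` stays inside the width.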

1: Multiplier
2: Adder
3: Accumulator
4: Least significant bit strip of slice m
5: Agreed most significant bit strip of slice m-1
6: First multiplexer
7: Second multiplexer
8: First MUX1 gate
9: Second MUX1 gate
10: First Tor1MUX1 input
11: First Tor2MUX1 input
12: First Tor1MUX2 input
13: First Tor2MUX2 input
14: Second Tor1MUX1 input
15: Second Tor2MUX1 input
16: Second Tor1MUX2 input
17: Second Tor2MUX2 input
18: 2⁰ bit position of the 1-bit coefficient for slice m
19: 2¹⁵ bit position of the 1-bit coefficient for slice m-1
20: Data value of the 2⁰ bit position of the data word for slice m
21: Data value of the 2¹⁵ bit position of the data word for slice m-1
22: CBUS
23: Slice m
24: Slice m-1
25: Multiplier carry output MUX
26: Adder carry output MUX
27: Interconnection
28: First MUX2 gate
29: Second MUX2 gate
30: Slice k
31: Slice k-1
32: Slice 0
33: Coefficient register k
34: Coefficient register k-1
35: Coefficient register 0
36: Partial processing width logic
37: Bus control signal

Claims

Method for polynomial calculation with 1-bit coefficients, the calculation being carried out in a multiply-accumulator (MAC) of a processor having an arithmetic unit which is divided into slices, with data paths implemented therein in which a switchable 1-bit shift function is present, characterized in that the polynomial calculation, together with the associated 1-bit coefficients, is carried out in parallel in the data paths over a cross-slice data word width, this calculation being performed within a first processing stage, in which a 1-bit shift of the entire data word toward the higher-order bits of the data word is executed, and a second processing stage, which comprises a bitwise multiplication of the data word by the present 1-bit coefficients and subsequently the accumulation of the product onto the value previously stored in the accumulator.

Method according to claim 1, characterized in that the data paths belonging to each slice can be configured optionally either in an operating mode with a separable carry, without evaluation of the carry of the multiplier, or in an integer mode with evaluation of the carry of the multiplier.

Method according to claims 1 and 2, characterized in that the switchable 1-bit shift function present in the data paths is extended by an additional cross-slice function of a switchable interconnection (27), the switchable interconnection (27) being routed from the output of each slice, which is realized by the output of the associated most significant data path, to the input of the least significant data path agreed for each slice, whereby the 1-bit shift function is ensured at least to indirectly adjacent higher-order slices.

Method according to claims 1 to 3, characterized in that the two-stage polynomial calculation of the data word with the 1-bit coefficient in the data paths of a slice is started by the first processing stage, which corresponds to a basic position of the first multiplexer (6) and second multiplexer (7), in that a first Tor1MUX1 input (10) and Tor1MUX2 input (12) of the first and second multiplexers (6, 7) are switched through, so that on the one hand the output value of the accumulator (3) is applied to the first input of the multiplier (1) and on the other hand the output value of the multiplier (1) is applied via the interconnection (27) to the first input of the adder of the at least indirectly adjacent least significant data path of a higher-order slice m (23); that a first Tor2MUX1 input (11) and Tor2MUX2 input (13) of the first and second multiplexers (6, 7) are likewise switched through, so that the arithmetic value ONE is present at the second input of the multiplier (1) and, with the first Tor2MUX2 input (13) of the second multiplexer (7) switched through, an arithmetic ZERO is applied to the second input of the adder (2); that the second Tor1MUX1 and Tor1MUX2 inputs (14, 16) and the second Tor2MUX1 and Tor2MUX2 inputs (15, 17) of the first and second multiplexers (6, 7) are blocked in an equivalent manner; that in a second processing stage following the first processing stage, which corresponds to a follow-up position of the respective first and second multiplexers (6, 7), the bitwise multiplication is performed by applying the provided 1-bit coefficient via the second Tor1MUX1 input (14) to the first input of the multiplier (1) and the data word bit present via the CBUS (22) via the second Tor2MUX1 input (15) to the second input of the multiplier (1); that a multiplication of the 1-bit coefficient by the input data word bit is carried out in the multiplier (1); that the product thereby generated at the output of the multiplier (1) is subsequently applied through the switched-through second Tor2MUX2 input (17) of the second multiplexer (7), which is likewise in the follow-up position, to the second input of the adder (2) belonging to the MAC; that the calculation value of a preceding polynomial calculation provided at the output of the accumulator (3) is applied through the second Tor1MUX2 input (16), which is in the follow-up position, to the first input of the adder (2); that subsequently, after addition in the adder (2), the new calculation value is stored in the accumulator (3) via the input of the accumulator (3); and that the respective first Tor1MUX1 and Tor1MUX2 inputs (10, 12) and first Tor2MUX1 and Tor2MUX2 inputs (11, 13) are blocked in the follow-up position, in a manner equivalent to the switching states of the basic position.

Method according to claims 1 to 4, characterized in that the polynomial calculation is executed with a part of the slices, the selected slices being variably grouped.

Method according to claims 1 to 5, characterized in that the polynomial calculation is executed in only a part of the processing width of the slice, with a certain number of data paths.

Method according to claim 6, characterized in that a partial processing width logic (36) detects overflows occurring during the polynomial calculation beyond the intended processing width, and that further processing during the polynomial calculation in the permitted slices, within their processing ranges, is ensured by a bus control signal (37).