US20110060892A1 - Speculative forwarding of non-architected data format floating point results - Google Patents

Publication number
US20110060892A1
Authority
US
United States
Prior art keywords
adf
result
instruction
floating
point unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/820,662
Inventor
G. Glenn Henry
Terry Parks
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Via Technologies Inc
Priority to US12/820,662
Assigned to VIA TECHNOLOGIES, INC. Assignment of assignors interest (see document for details). Assignors: HENRY, G. GLENN; PARKS, TERRY
Priority to TW099128795A (patent TWI450191B)
Priority to CN201010270067.9A (patent CN101916182B)
Publication of US20110060892A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/38 Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F2207/3804 Details
    • G06F2207/3808 Details concerning the type of numbers or the way they are handled
    • G06F2207/3812 Devices capable of handling different types of numbers
    • G06F2207/3824 Accepting both fixed-point and floating-point numbers

Definitions

  • the present invention relates in general to the field of pipelined microprocessor architectures, and particularly to the forwarding of floating-point results from one instruction to another.
  • the x86 architecture specifies multiple data formats for floating point operands, namely, single-precision, double-precision, and extended double-precision. This implies that the floating point units have a different multiplier, adder, etc. for each architected data format. This is an inefficient use of space and power. So, to reduce the number of multipliers, adders, etc., the floating point units include a single multiplier, adder, etc. each capable of operating on operands that are in a single non-architected data format.
  • the floating point units convert the received source operands from their architected data format to the non-architected data format, perform the operation on the non-architected data format operands to generate a result in the non-architected data format, and then convert the result back to the architected data format.
  • the architected data format results are then forwarded to the floating point units as source operands, as illustrated by the conventional floating point units 112 shown in FIG. 4.
  • the present invention provides a microprocessor having an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands.
  • the microprocessor includes first and second floating-point units.
  • the first floating-point unit is configured to speculatively forward a non-ADF result generated by the first floating-point unit to the second floating-point unit, wherein the non-ADF result is associated with a first instruction.
  • the second floating-point unit is configured to use the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction.
  • the second floating-point unit is further configured to convert the non-ADF result to an ADF result and to determine whether the non-ADF result creates an exception condition when converted to the ADF result.
  • the microprocessor is configured to cancel the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.
  • the present invention provides a method for processing floating-point instructions in a microprocessor having first and second floating-point units, wherein the microprocessor has an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands.
  • the method includes speculatively forwarding a non-ADF result generated by the first floating-point unit from the first floating-point unit to the second floating-point unit, wherein the non-ADF result is associated with a first instruction.
  • the method also includes the second floating-point unit using the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction.
  • the method also includes determining whether the non-ADF result creates an exception condition when converted to an ADF result.
  • the method also includes canceling the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.
  • the present invention provides a computer program product encoded in at least one computer readable medium for use with a computing device, the computer program product comprising computer readable program code embodied in said medium for specifying a microprocessor having an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands.
  • the computer readable program code includes first program code for specifying a first floating-point unit and second program code for specifying a second floating-point unit.
  • the first floating-point unit is configured to speculatively forward a non-ADF result generated by the first floating-point unit to the second floating-point unit, wherein the non-ADF result is associated with a first instruction.
  • the second floating-point unit is configured to use the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction.
  • the second floating-point unit is further configured to convert the non-ADF result to an ADF result and to determine whether the non-ADF result creates an exception condition when converted to the ADF result.
  • the microprocessor is configured to cancel the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.
  • FIG. 1 is a block diagram illustrating a microprocessor that incorporates latency-reducing non-architectural data format result forwarding.
  • FIG. 2 is a block diagram illustrating in more detail the floating point units of FIG. 1.
  • FIG. 3 is a flowchart illustrating an example of operation of the microprocessor of FIG. 1.
  • FIG. 4 is a block diagram illustrating related art floating point units that do not forward non-architectural data format results.
  • inventions described herein include modified floating point units that forward the non-architected data format (NADF) result without converting to the architected data format (ADF) and are capable of receiving and operating directly on the NADF operands without converting them from the ADF to the NADF. This reduces the latency by removing the conversion time in and out of the floating point units from the critical path.
  • the NADF includes additional exponent bits beyond the number of exponent bits specified by the largest ADF.
  • the largest ADF is the 80-bit extended double-precision format, which includes a 15-bit exponent field, and the NADF includes a 17-bit exponent field to accommodate overflows and underflows.
  • the microprocessor 100 includes a plurality of floating point units (FPU) 112.
  • the floating point units 112 include a first floating point unit 112A that includes a floating point multiplier 226 (see FIG. 2) that generates a first ADF result 162, and a second floating point unit 112B that includes a floating point adder 236 (see FIG. 2) that generates a second ADF result 164.
  • the floating point units 112 receive ADF source operands 152 from a multiplexer 116 that receives ADF source operands from general purpose registers (GPRs) 118, from temporary registers of a reorder buffer (ROB) 114, and the ADF results 162/164 from the floating point units 112 themselves. Additionally, the floating point units 112 generate respective exception signals 172/174 to the ROB 114 to indicate that an instruction created an exception condition, such as an overflow or underflow, as described in more detail below.
  • the microprocessor 100 is an x86 (also referred to as IA-32) architecture microprocessor 100; however, other microprocessor architectures may be employed.
  • a microprocessor is an x86 architecture processor if it can correctly execute a majority of the application programs that are designed to be executed on an x86 microprocessor. An application program is correctly executed if its expected results are obtained.
  • the microprocessor 100 executes instructions of the x86 instruction set and includes the x86 user-visible register set.
  • Floating point unit 112A includes a converter 222, coupled to a mux 224, coupled to a NADF multiplier 226, coupled to a second converter 228.
  • Floating point unit 112B includes a converter 232, coupled to a mux 234, coupled to a NADF adder 236, coupled to a second converter 238.
  • the converter 222 converts the ADF operands 152 into NADF operands 272 that are provided to the mux 224.
  • the mux 224 also receives a NADF result 252 forwarded from the NADF multiplier 226 and a NADF result 254 forwarded from the NADF adder 236. From its inputs, the mux 224 selects NADF operands 266 for provision to the NADF multiplier 226, which multiplies the operands 266 to generate the NADF result 252.
  • the converter 228 converts the NADF result 252 to the ADF result 162 of FIG. 1. Additionally, the converter 228 generates an exception indicator 172 of FIG. 1 if it detects that the ADF result 162 created an exception condition, such as an underflow or overflow.
  • the NADF may have accommodated the result 252 without creating an underflow or overflow; however, the smaller ADF may not sufficiently accommodate the NADF result 252 such that the conversion from the NADF to the ADF creates an exception condition.
  • the converter 232 converts the ADF operands 152 into NADF operands 274 that are provided to the mux 234.
  • the mux 234 also receives the NADF result 252 forwarded from the NADF multiplier 226 and the NADF result 254 forwarded from the NADF adder 236. From its inputs, the mux 234 selects NADF operands 268 for provision to the NADF adder 236, which adds the operands 268 to generate the NADF result 254.
  • the converter 238 converts the NADF result 254 to the ADF result 164 of FIG. 1. Additionally, the converter 238 generates an exception indicator 174 of FIG. 1 if it detects that the ADF result 164 created an exception condition, such as an underflow or overflow.
  • the floating point units 112 of FIG. 2 can reduce instruction execution latency by forwarding their NADF results 252/254 directly to one another.
  • This is in contrast to the conventional floating point units 112 of FIG. 4, which incur the latency of converting the NADF results to ADF results, forwarding the converted ADF results, and then reconverting to NADF operands.
  • Floating point operations may generate exception conditions, such as overflow or underflow.
  • a side-effect of the NADF is that some results that would overflow/underflow in the ADF would not do so in the NADF, e.g., because of the larger exponent, as discussed above. Consequently, the forwarding of the NADF results 252/254 is speculative because the programmer may not want the instruction that receives the forwarded NADF result 252/254 to execute with a value that would cause an exception when converted to ADF.
  • the converters 228/238 also perform the conversion to ADF, and if the conversion yields an overflow/underflow, then they generate an exception 172/174 on the forwarding instruction and the microprocessor 100 kills the instruction that executed using the speculatively forwarded NADF result, as described in more detail with respect to FIG. 3.
  • FIG. 3 is a flowchart illustrating an example of operation of the microprocessor 100 of FIG. 1. Flow begins at block 302.
  • floating point unit 112A receives an instruction-B for execution.
  • the mux 224 detects that one of the source operands is the NADF result 254 of a previous instruction-A that has been forwarded from the NADF adder 236 and accordingly selects the forwarded NADF result 254.
  • the mux 224 may also select as the other operand the forwarded NADF result 252 from the NADF multiplier 226 or the converted NADF operands 272.
  • Flow proceeds to block 304.
  • the NADF multiplier 226 multiplies the NADF operands 266 to generate the NADF result 252 for instruction-B. Flow proceeds concurrently from block 304 to blocks 306 and 322.
  • the forwarding buses forward the NADF result 252 of instruction-B to the NADF adder 236.
  • Flow proceeds to block 308.
  • floating point unit 112B receives an instruction-C for execution.
  • the mux 234 detects that one of the source operands is the NADF result 252 of instruction-B that has been forwarded at block 306 from the NADF multiplier 226 and accordingly selects the forwarded NADF result 252.
  • the mux 234 may also select as the other operand the forwarded NADF result 254 from the NADF adder 236 or the converted NADF operands 274.
  • Flow proceeds to block 312.
  • the NADF adder 236 adds the NADF operands 268 to generate the NADF result 254 for instruction-C.
  • Flow ends at block 312 although it is understood that the forwarding of NADF results 252 and/or 254 may advantageously continue for a long sequence of instructions, thereby reducing latency and speeding up the execution of the sequence of instructions relative to the conventional floating point units 112 of FIG. 4 that include the ADF-to-NADF conversion and NADF-to-ADF conversion in the forwarding paths.
  • the converter 228 converts the NADF result 252 of instruction-B to ADF result 162.
  • Flow proceeds to decision block 324.
  • the converter 228 determines whether the NADF result 252 of instruction-B creates an exception condition when converting to ADF. If so, flow proceeds to block 326; otherwise, flow proceeds to block 328.
  • the converter 228 asserts the exception indicator 172 to the ROB 114. Consequently, the microprocessor 100 will take an exception, and the ROB 114 will flush instruction-C since instruction-C is newer in program sequence than instruction-B that caused the exception. This is necessary since the NADF result 252 of instruction-B was speculatively forwarded to the NADF adder 236 without knowledge of whether the NADF result 252 was a good operand, i.e., without knowledge of whether the NADF result 252 was a non-underflowed/overflowed value from an ADF perspective. That is, the programmer may not have desired instruction-C to execute with a non-good operand. However, advantageously the NADF results 252/254 are speculatively forwarded to potentially reduce the latency of instruction execution and in most cases both the forwarding and the receiving instructions will complete successfully. Flow ends at block 326.
  • floating point unit 112A provides the ADF result 162 to the ROB 114 for storage in a temporary register therein. Flow proceeds to block 332.
  • the ROB 114 retires the ADF result 162 from the temporary register to the appropriate GPR 118. Flow ends at block 332.
  • software can enable, for example, the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs.
  • Such software can be disposed in any known computer usable medium such as magnetic tape, semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.), a network, wire line, wireless or other communications medium.
  • Embodiments of the apparatus and method described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits.
  • the apparatus and methods described herein may be embodied as a combination of hardware and software.
  • the present invention should not be limited by any of the exemplary embodiments described herein, but should be defined only in accordance with the following claims and their equivalents.
  • the present invention may be implemented within a microprocessor device which may be used in a general purpose computer.

Abstract

A microprocessor having an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands includes first and second floating-point units. The first floating-point unit is configured to speculatively forward a non-ADF result generated by the first floating-point unit to the second floating-point unit. The non-ADF result is associated with a first instruction. The second floating-point unit is configured to use the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction. The second floating-point unit is further configured to convert the non-ADF result to an ADF result and to determine whether the non-ADF result creates an exception condition when converted to the ADF result. The microprocessor is configured to cancel the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.

Description

    CROSS REFERENCE TO RELATED APPLICATION(S)
  • This application claims priority based on U.S. Provisional Application Ser. No. 61/240,753, filed Sep. 9, 2009, entitled FAST FLOATING POINT RESULT FORWARDING USING NON-ARCHITECTED DATA FORMAT, which is hereby incorporated by reference in its entirety.
  • This application is related to U.S. Non-Provisional Application TBD, filed concurrently herewith, entitled FAST FLOATING POINT RESULT FORWARDING USING NON-ARCHITECTED DATA FORMAT, which is incorporated by reference herein in its entirety, and which is subject to an obligation of assignment to common assignee VIA Technologies, Inc.
  • FIELD OF THE INVENTION
  • The present invention relates in general to the field of pipelined microprocessor architectures, and particularly to the forwarding of floating-point results from one instruction to another.
  • BACKGROUND OF THE INVENTION
  • The x86 architecture specifies multiple data formats for floating point operands, namely, single-precision, double-precision, and extended double-precision. This implies that the floating point units have a different multiplier, adder, etc. for each architected data format. This is an inefficient use of space and power. So, to reduce the number of multipliers, adders, etc., the floating point units include a single multiplier, adder, etc. each capable of operating on operands that are in a single non-architected data format. The floating point units convert the received source operands from their architected data format to the non-architected data format, perform the operation on the non-architected data format operands to generate a result in the non-architected data format, and then convert the result back to the architected data format. The architected data format results are then forwarded to the floating point units as source operands, as illustrated by the conventional floating point units 112 shown in FIG. 4.
  • BRIEF SUMMARY OF INVENTION
  • In one aspect the present invention provides a microprocessor having an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands. The microprocessor includes first and second floating-point units. The first floating-point unit is configured to speculatively forward a non-ADF result generated by the first floating-point unit to the second floating-point unit, wherein the non-ADF result is associated with a first instruction. The second floating-point unit is configured to use the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction. The second floating-point unit is further configured to convert the non-ADF result to an ADF result and to determine whether the non-ADF result creates an exception condition when converted to the ADF result. The microprocessor is configured to cancel the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.
  • In another aspect, the present invention provides a method for processing floating-point instructions in a microprocessor having first and second floating-point units, wherein the microprocessor has an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands. The method includes speculatively forwarding a non-ADF result generated by the first floating-point unit from the first floating-point unit to the second floating-point unit, wherein the non-ADF result is associated with a first instruction. The method also includes the second floating-point unit using the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction. The method also includes determining whether the non-ADF result creates an exception condition when converted to an ADF result. The method also includes canceling the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.
  • In yet another aspect, the present invention provides a computer program product encoded in at least one computer readable medium for use with a computing device, the computer program product comprising computer readable program code embodied in said medium for specifying a microprocessor having an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands. The computer readable program code includes first program code for specifying a first floating-point unit and second program code for specifying a second floating-point unit. The first floating-point unit is configured to speculatively forward a non-ADF result generated by the first floating-point unit to the second floating-point unit, wherein the non-ADF result is associated with a first instruction. The second floating-point unit is configured to use the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction. The second floating-point unit is further configured to convert the non-ADF result to an ADF result and to determine whether the non-ADF result creates an exception condition when converted to the ADF result. The microprocessor is configured to cancel the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating a microprocessor that incorporates latency-reducing non-architectural data format result forwarding.
  • FIG. 2 is a block diagram illustrating in more detail the floating point units of FIG. 1.
  • FIG. 3 is a flowchart illustrating an example of operation of the microprocessor of FIG. 1.
  • FIG. 4 is a block diagram illustrating related art floating point units that do not forward non-architectural data format results.
  • DETAILED DESCRIPTION OF THE INVENTION
  • The forwarding of architected data format results described above with respect to FIG. 4, or more specifically the data format conversions performed, wastes time: it adds latency in cases where the result-generating and result-consuming instructions are scheduled back-to-back for execution. To reduce latency, embodiments described herein include modified floating point units that forward the non-architected data format (NADF) result without converting it to the architected data format (ADF) and that are capable of receiving and operating directly on NADF operands without converting them from the ADF to the NADF. This reduces the latency by removing the conversion time in and out of the floating point units from the critical path. The latency saved may be particularly significant when there is a sequence of back-to-back result-generating and result-consuming instructions such that the modified floating point units are able to forward the NADF results. In one embodiment, the NADF includes additional exponent bits beyond the number of exponent bits specified by the largest ADF. For example, in one embodiment the largest ADF is the 80-bit extended double-precision format, which includes a 15-bit exponent field, and the NADF includes a 17-bit exponent field to accommodate overflows and underflows.
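  • As a rough illustration of why the two extra exponent bits matter, the following Python sketch compares the exponent ranges of a 15-bit and a 17-bit field. It assumes an IEEE-style bias of 2^(bits-1) - 1 with the all-zeros and all-ones encodings reserved; the text does not specify the NADF bias, and the names `exponent_range` and `fits` are hypothetical.

```python
def exponent_range(bits):
    """Return (min, max) unbiased exponent of normal numbers for a biased field.

    Assumption: IEEE-style bias of 2**(bits-1) - 1, with the all-zeros and
    all-ones encodings reserved. The text does not specify the NADF bias.
    """
    bias = (1 << (bits - 1)) - 1
    return 1 - bias, bias


def fits(exponent, bits):
    """True if the unbiased exponent is representable in a field of `bits` bits."""
    lo, hi = exponent_range(bits)
    return lo <= exponent <= hi


ADF_EXP_BITS = 15   # 80-bit extended format, per the text
NADF_EXP_BITS = 17  # per the text

# Multiplying two numbers adds their exponents (ignoring any significand carry).
product_exponent = 16000 + 16000

print(exponent_range(ADF_EXP_BITS))            # (-16382, 16383)
print(exponent_range(NADF_EXP_BITS))           # (-65534, 65535)
print(fits(product_exponent, ADF_EXP_BITS))    # False
print(fits(product_exponent, NADF_EXP_BITS))   # True
```

  • Under these assumptions, the product of two in-range ADF values can leave the 15-bit range while still fitting comfortably in the 17-bit field.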
  • Referring now to FIG. 1, a block diagram illustrating a microprocessor 100 that incorporates the latency-reducing NADF result forwarding described above is shown. The microprocessor 100 includes a plurality of floating point units (FPU) 112. In one embodiment, the floating point units 112 include a first floating point unit 112A that includes a floating point multiplier 226 (see FIG. 2) that generates a first ADF result 162, and a second floating point unit 112B that includes a floating point adder 236 (see FIG. 2) that generates a second ADF result 164. The floating point units 112 receive ADF source operands 152 from a multiplexer 116 that receives ADF source operands from general purpose registers (GPRs) 118, from temporary registers of a reorder buffer (ROB) 114, and the ADF results 162/164 from the floating point units 112 themselves. Additionally, the floating point units 112 generate respective exception signals 172/174 to the ROB 114 to indicate that an instruction created an exception condition, such as an overflow or underflow, as described in more detail below.
  • In one embodiment, the microprocessor 100 is an x86 (also referred to as IA-32) architecture microprocessor 100; however, other microprocessor architectures may be employed. A microprocessor is an x86 architecture processor if it can correctly execute a majority of the application programs that are designed to be executed on an x86 microprocessor. An application program is correctly executed if its expected results are obtained. In particular, the microprocessor 100 executes instructions of the x86 instruction set and includes the x86 user-visible register set.
  • Referring now to FIG. 2, a block diagram illustrating in more detail the floating point units 112 of FIG. 1 is shown. Floating point unit 112A includes a converter 222, coupled to a mux 224, coupled to a NADF multiplier 226, coupled to a second converter 228. Floating point unit 112B includes a converter 232, coupled to a mux 234, coupled to a NADF adder 236, coupled to a second converter 238.
  • The converter 222 converts the ADF operands 152 into NADF operands 272 that are provided to the mux 224. The mux 224 also receives a NADF result 252 forwarded from the NADF multiplier 226 and a NADF result 254 forwarded from the NADF adder 236. From its inputs, the mux 224 selects NADF operands 266 for provision to the NADF multiplier 226, which multiplies the operands 266 to generate the NADF result 252. The converter 228 converts the NADF result 252 to the ADF result 162 of FIG. 1. Additionally, the converter 228 generates an exception indicator 172 of FIG. 1 if it detects that the ADF result 162 created an exception condition, such as an underflow or overflow. That is, the NADF may have accommodated the result 252 without creating an underflow or overflow; however, the smaller ADF may not sufficiently accommodate the NADF result 252 such that the conversion from the NADF to the ADF creates an exception condition.
  • The converter 232 converts the ADF operands 152 into NADF operands 274 that are provided to the mux 234. The mux 234 also receives the NADF result 252 forwarded from the NADF multiplier 226 and the NADF result 254 forwarded from the NADF adder 236. From its inputs, the mux 234 selects NADF operands 268 for provision to the NADF adder 236, which adds the operands 268 to generate the NADF result 254. The converter 238 converts the NADF result 254 to the ADF result 164 of FIG. 1. Additionally, the converter 238 generates an exception indicator 174 of FIG. 1 if it detects that the ADF result 164 created an exception condition, such as an underflow or overflow.
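  • The converter, mux, NADF multiplier, and output converter of FIG. 2 can be sketched in software as follows. This is a minimal behavioral model under stated assumptions, not the patent's circuit: IEEE double (a Python float) stands in for the ADF, a (significand, wide exponent) pair stands in for the NADF, results outside the normal double range are simply treated as exceptions, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumption: IEEE double stands in for the ADF. In math.frexp's convention
# (significand magnitude in [0.5, 1)), normal doubles have exponents -1021..1024.
ADF_EXP_MIN, ADF_EXP_MAX = -1021, 1024


@dataclass(frozen=True)
class Nadf:
    """Stand-in NADF value: mant * 2**exp with an unbounded Python int exponent."""
    mant: float
    exp: int


def adf_to_nadf(x):
    """Input converter (222/232 in the text): ADF operand to NADF operand."""
    mant, exp = math.frexp(x)
    return Nadf(mant, exp)


def nadf_to_adf(value):
    """Output converter (228/238): returns (ADF result, exception indicator)."""
    if value.mant != 0.0 and not (ADF_EXP_MIN <= value.exp <= ADF_EXP_MAX):
        return None, True   # overflow or underflow from the ADF perspective
    return math.ldexp(value.mant, value.exp), False


def nadf_multiply(a, b):
    """NADF multiplier (226): multiply significands, add exponents, renormalize."""
    mant, adjust = math.frexp(a.mant * b.mant)
    return Nadf(mant, a.exp + b.exp + adjust)


def mux(source):
    """Mux (224/234): pass a forwarded NADF result through, else convert the ADF operand."""
    return source if isinstance(source, Nadf) else adf_to_nadf(source)


def fpu_multiply(src1, src2):
    """One pass through the multiplier unit: returns (NADF result, ADF result, exception)."""
    result = nadf_multiply(mux(src1), mux(src2))
    adf, exception = nadf_to_adf(result)
    return result, adf, exception


nadf, adf, exc = fpu_multiply(1e200, 1e200)
print(exc, adf)            # True None: the product does not fit the stand-in ADF
_, adf2, exc2 = fpu_multiply(nadf, 1e-200)
print(exc2)                # False: the forwarded NADF value was still usable
```

  • The second call only shows that the mux accepts a forwarded NADF value directly. In the design described here, an instruction that consumed a result whose conversion raised an exception would be cancelled rather than allowed to complete.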
  • As may be observed by comparing FIGS. 2 and 4, the floating point units 112 of FIG. 2 can reduce instruction execution latency by forwarding their NADF results 252/254 directly to one another. This is in contrast to the conventional floating point units 112 of FIG. 4, which incur the latency of converting the NADF results to ADF results, forwarding the converted ADF results, and then reconverting to NADF operands.
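  • The latency difference can be made concrete with a toy cycle-count model for a chain of dependent instructions. The cycle counts below (one cycle per conversion, three per operation) are purely hypothetical, since the text gives no timing figures, and `chain_latency` is an illustrative name.

```python
def chain_latency(n, forward_nadf, conv_in=1, op=3, conv_out=1):
    """Cycles until the last of n dependent instructions has its ADF result.

    All cycle counts are hypothetical defaults chosen for illustration.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if forward_nadf:
        # Convert once on the way in, chain the operations on NADF values,
        # and convert once on the way out for the final ADF result.
        return conv_in + n * op + conv_out
    # Conventional forwarding: every hop converts in, operates, and converts out.
    return n * (conv_in + op + conv_out)


print(chain_latency(4, forward_nadf=False))  # 20
print(chain_latency(4, forward_nadf=True))   # 14
```

  • In this model the saving grows linearly with the length of the dependent chain, because each forwarded hop skips one NADF-to-ADF and one ADF-to-NADF conversion.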
  • Floating point operations may generate exception conditions, such as overflow or underflow. A side-effect of the NADF is that some results that would overflow/underflow in the ADF would not do so in the NADF, e.g., because of the larger exponent, as discussed above. Consequently, the forwarding of the NADF results 252/254 is speculative because the programmer may not want the instruction that receives the forwarded NADF result 252/254 to execute with a value that would cause an exception when converted to ADF. Therefore, in parallel with the speculative forwarding of NADF results 252/254, the converters 228/238 also perform the conversion to ADF, and if the conversion yields an overflow/underflow, then they generate an exception 172/174 on the forwarding instruction and the microprocessor 100 kills the instruction that executed using the speculatively forwarded NADF result, as described in more detail with respect to FIG. 3.
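  • The speculate-then-check behavior can be sketched as follows: a multiply (instruction-B) forwards its NADF result to an add (instruction-C), and the consumer is discarded if the producer's result turns out not to fit the ADF. As before, IEEE double stands in for the ADF, a (significand, exponent) pair stands in for the NADF, the parallel conversion is modeled sequentially, and every name is hypothetical.

```python
import math
from dataclasses import dataclass

# Assumption: IEEE double stands in for the ADF (frexp-style exponents -1021..1024).
ADF_EXP_MIN, ADF_EXP_MAX = -1021, 1024


@dataclass(frozen=True)
class Nadf:
    mant: float  # significand, 0.0 or magnitude in [0.5, 1)
    exp: int     # wide exponent (unbounded Python int in this sketch)


def to_nadf(x):
    mant, exp = math.frexp(x)
    return Nadf(mant, exp)


def mul(a, b):
    mant, adjust = math.frexp(a.mant * b.mant)
    return Nadf(mant, a.exp + b.exp + adjust)


def add(a, b):
    if a.mant == 0.0:
        return b
    if b.mant == 0.0:
        return a
    top = max(a.exp, b.exp)
    # Align both significands to the larger exponent before adding.
    total = math.ldexp(a.mant, a.exp - top) + math.ldexp(b.mant, b.exp - top)
    mant, adjust = math.frexp(total)
    return Nadf(mant, top + adjust)


def adf_exception(value):
    """True if converting the NADF value to the stand-in ADF would overflow or underflow."""
    return value.mant != 0.0 and not (ADF_EXP_MIN <= value.exp <= ADF_EXP_MAX)


def run_pair(x, y, z):
    """instruction-B computes x * y; instruction-C computes (x * y) + z."""
    b_result = mul(to_nadf(x), to_nadf(y))
    # Speculative forward: C executes on B's NADF result before B's check resolves.
    c_result = add(b_result, to_nadf(z))
    # The conversion check on B happens in parallel in hardware; it is sequential here.
    if adf_exception(b_result):
        return "B raises exception; C flushed", None
    if adf_exception(c_result):
        return "C raises exception", None
    return "both complete", math.ldexp(c_result.mant, c_result.exp)


print(run_pair(2.0, 3.0, 1.0))      # ('both complete', 7.0)
print(run_pair(1e200, 1e200, 1.0))  # ('B raises exception; C flushed', None)
```

  • The common case pays no conversion latency between the two instructions; only when the producer's result fails the ADF check is the consumer's work thrown away.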
  • Referring now to FIG. 3, a flowchart illustrating an example of operation of the microprocessor 100 of FIG. 1 is shown. Flow begins at block 302.
  • At block 302, floating point unit 112A receives an instruction-B for execution. The mux 224 detects that one of the source operands is the NADF result 254 of a previous instruction-A that has been forwarded from the NADF adder 236 and accordingly selects the forwarded NADF result 254. The mux 224 may also select as the other operand the forwarded NADF result 252 from the NADF multiplier 226 or the converted NADF operands 272. Flow proceeds to block 304.
  • At block 304, the NADF multiplier 226 multiplies the NADF operands 266 to generate the NADF result 252 for instruction-B. Flow proceeds concurrently from block 304 to blocks 306 and 322.
  • At block 306, the forwarding buses forward the NADF result 252 of instruction-B to the NADF adder 236. Flow proceeds to block 308.
  • At block 308, floating point unit 112B receives an instruction-C for execution. The mux 234 detects that one of the source operands is the NADF result 252 of instruction-B that has been forwarded at block 306 from the NADF multiplier 226 and accordingly selects the forwarded NADF result 252. The mux 234 may also select as the other operand the forwarded NADF result 254 from the NADF adder 236 or the converted NADF operands 274. Flow proceeds to block 312.
  • At block 312, the NADF adder 236 adds the NADF operands 268 to generate the NADF result 254 for instruction-C. Flow ends at block 312, although it is understood that the forwarding of NADF results 252 and/or 254 may advantageously continue for a long sequence of instructions, thereby reducing latency and speeding up the execution of the sequence of instructions relative to the conventional floating point units 112 of FIG. 4 that include the ADF-to-NADF conversion and NADF-to-ADF conversion in the forwarding paths.
  • At block 322, the converter 228 converts the NADF result 252 of instruction-B to ADF result 162. Flow proceeds to decision block 324.
  • At decision block 324, the converter 228 determines whether the NADF result 252 of instruction-B creates an exception condition when converted to ADF. If so, flow proceeds to block 326; otherwise, flow proceeds to block 328.
  • At block 326, the converter 228 asserts the exception indicator 172 to the ROB 114. Consequently, the microprocessor 100 will take an exception, and the ROB 114 will flush instruction-C since instruction-C is newer in program sequence than instruction-B that caused the exception. This is necessary since the NADF result 252 of instruction-B was speculatively forwarded to the NADF adder 236 without knowledge of whether the NADF result 252 was a good operand, i.e., without knowledge of whether the NADF result 252 was a non-underflowed/overflowed value from an ADF perspective. That is, the programmer may not have desired instruction-C to execute with a non-good operand. However, advantageously the NADF results 252/254 are speculatively forwarded to potentially reduce the latency of instruction execution and in most cases both the forwarding and the receiving instructions will complete successfully. Flow ends at block 326.
  • At block 328, floating point unit 112A provides the ADF result 162 to the ROB 114 for storage in a temporary register therein. Flow proceeds to block 332.
  • At block 332, the ROB 114 retires the ADF result 162 from the temporary register to the appropriate GPR 118. Flow ends at block 332.
  • While various embodiments of the present invention have been described herein, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant computer arts that various changes in form and detail can be made therein without departing from the scope of the invention. For example, software can enable, for example, the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs. Such software can be disposed in any known computer usable medium such as magnetic tape, semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.), a network, wire line, wireless or other communications medium. Embodiments of the apparatus and method described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the exemplary embodiments described herein, but should be defined only in accordance with the following claims and their equivalents. Specifically, the present invention may be implemented within a microprocessor device which may be used in a general purpose computer. Finally, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the scope of the invention as defined by the appended claims.
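As an illustration of why the forwarding described above is speculative, the following Python sketch models the conversion check attributed to the converters 228/238. It is only a sketch under stated assumptions: IEEE-754 single precision stands in for the ADF, Python's double-precision float stands in for the wider-exponent NADF, and the names `convert_to_adf`, `Conversion`, `ADF_MAX`, and `ADF_MIN_NORMAL` are hypothetical and do not appear in the specification.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Assumption: single precision plays the role of the ADF; Python's double
# plays the role of the NADF with its larger exponent range.
ADF_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127   # largest finite single-precision value
ADF_MIN_NORMAL = 2.0 ** -126                # smallest normal single-precision value


@dataclass
class Conversion:
    adf_value: Optional[float]   # None when the conversion signals an exception
    exception: Optional[str]     # "overflow", "underflow", or None


def convert_to_adf(nadf_value: float) -> Conversion:
    """Hypothetical stand-in for converters 228/238.

    Simplifications: rounding at the range boundaries is ignored, and any
    nonzero value below the smallest normal is treated as an underflow.
    """
    magnitude = abs(nadf_value)
    if magnitude > ADF_MAX:
        return Conversion(None, "overflow")
    if 0.0 < magnitude < ADF_MIN_NORMAL:
        return Conversion(None, "underflow")
    # Round the in-range value to single precision.
    rounded = struct.unpack("f", struct.pack("f", nadf_value))[0]
    return Conversion(rounded, None)


# instruction-B: the product fits in the NADF stand-in but not in the ADF stand-in.
product = 1.0e30 * 1.0e20
# instruction-C: consumes the forwarded NADF value without waiting for conversion.
total = product + 1.0
# The conversion check on instruction-B's result decides whether C must be killed.
b_check = convert_to_adf(product)
```

In this sketch `total` is computed without trouble in the wider format, yet `b_check.exception` reports an overflow for instruction-B, which is the situation in which the microprocessor 100 is described as killing the consuming instruction.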

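The control flow of FIG. 3 can also be sketched in software. The following Python model is a simplified illustration, not the disclosed hardware: the classes `RobEntry` and `Rob`, the function `run_fig3`, and the threshold used in the example predicate are all hypothetical, and the model runs sequentially even though the specification describes the forwarding path (blocks 306 to 312) and the conversion path (blocks 322 to 332) as proceeding concurrently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RobEntry:
    name: str              # e.g. "instruction-B"
    age: int               # program order; a larger value means a newer instruction
    flushed: bool = False
    retired: bool = False


@dataclass
class Rob:
    """Hypothetical, minimal model of the ROB 114."""
    entries: List[RobEntry] = field(default_factory=list)

    def allocate(self, name: str) -> RobEntry:
        entry = RobEntry(name, age=len(self.entries))
        self.entries.append(entry)
        return entry

    def raise_exception(self, faulting: RobEntry) -> None:
        # Block 326: flush every instruction newer than the faulting one.
        for entry in self.entries:
            if entry.age > faulting.age:
                entry.flushed = True

    def retire(self, entry: RobEntry) -> None:
        # Blocks 328 and 332: the ADF result is stored and then retired.
        entry.retired = True


def run_fig3(forwarded_a: float, other_b: float, other_c: float,
             causes_adf_exception: Callable[[float], bool]) -> Tuple[Rob, float]:
    rob = Rob()
    instr_b = rob.allocate("instruction-B")
    instr_c = rob.allocate("instruction-C")

    nadf_b = forwarded_a * other_b      # blocks 302/304: multiply in the NADF
    nadf_c = nadf_b + other_c           # blocks 306-312: C uses the forwarded result

    if causes_adf_exception(nadf_b):    # blocks 322/324: conversion check on B
        rob.raise_exception(instr_b)    # block 326: C is flushed
    else:
        rob.retire(instr_b)             # blocks 328/332: B's ADF result retires
    return rob, nadf_c


# Example predicate (assumption): single-precision overflow threshold only.
def too_big(value: float) -> bool:
    return abs(value) > 3.4e38


bad_rob, _ = run_fig3(1.0e30, 1.0e20, 1.0, too_big)
good_rob, good_sum = run_fig3(2.0, 3.0, 1.0, too_big)
```

In the first run the result of instruction-B is out of range for the assumed ADF, so instruction-C is marked flushed; in the second run instruction-B retires normally and instruction-C keeps its speculatively computed sum.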
Claims (20)

We claim:
1. A microprocessor having an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands, the microprocessor comprising:
first and second floating-point units;
wherein the first floating-point unit is configured to speculatively forward a non-ADF result generated by the first floating-point unit to the second floating-point unit, wherein the non-ADF result is associated with a first instruction;
wherein the second floating-point unit is configured to use the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction;
wherein the second floating-point unit is further configured to convert the non-ADF result to an ADF result and to determine whether the non-ADF result creates an exception condition when converted to the ADF result;
wherein the microprocessor is configured to cancel the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.
2. The microprocessor of claim 1, wherein the exception condition comprises an underflow or overflow.
3. The microprocessor of claim 1, wherein the ISA is an x86 ISA.
4. The microprocessor of claim 1, wherein the non-ADF comprises a larger number of bits than the ADF for specifying a floating-point operand exponent.
5. The microprocessor of claim 1, wherein the first floating-point unit is configured to use a speculatively forwarded second non-ADF result as a source operand to generate the speculatively forwarded non-ADF result associated with the first instruction, wherein the second non-ADF result is associated with a third instruction, wherein the microprocessor is further configured to cancel the first instruction, in response to determining that the second non-ADF result creates an exception condition when converted to an ADF result.
6. The microprocessor of claim 1, wherein the non-ADF result associated with the first instruction is a product and the non-ADF result associated with the second instruction is a sum.
7. The microprocessor of claim 1, wherein the non-ADF result associated with the first instruction is a sum and the non-ADF result associated with the second instruction is a product.
8. The microprocessor of claim 1, wherein the microprocessor is further configured to retire the ADF result to an architected register of the microprocessor, in response to determining that the non-ADF result does not create an exception condition when converted to the ADF result.
9. The microprocessor of claim 1,
wherein the second floating-point unit is further configured to speculatively forward a second non-ADF result generated by the second floating-point unit from the second floating-point unit to the second floating-point unit, wherein the second non-ADF result is associated with a third instruction, wherein the second floating-point unit is configured to speculatively forward the second non-ADF result associated with the third instruction concurrently with the first floating-point unit speculatively forwarding the non-ADF result associated with the first instruction to the second floating-point unit;
wherein the second floating-point unit is further configured to use the speculatively forwarded non-ADF result associated with the first instruction and the speculatively forwarded second non-ADF result associated with the third instruction as source operands to generate the result of the second instruction;
wherein the second floating-point unit is further configured to determine whether the second non-ADF result associated with the third instruction creates an exception condition when converted to an ADF result; and
wherein the microprocessor is further configured to cancel the second instruction, in response to determining that either of the non-ADF results associated with the first and third instructions creates an exception condition when converted to an ADF result.
10. A method for processing floating-point instructions in a microprocessor having first and second floating-point units, wherein the microprocessor has an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands, the method comprising:
speculatively forwarding a non-ADF result generated by the first floating-point unit from the first floating-point unit to the second floating-point unit, wherein the non-ADF result is associated with a first instruction;
using, by the second floating-point unit, the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction;
determining whether the non-ADF result creates an exception condition when converted to an ADF result; and
canceling the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.
11. The method of claim 10, wherein the exception condition comprises an underflow or overflow.
12. The method of claim 10, wherein the ISA is an x86 ISA.
13. The method of claim 10, wherein the non-ADF comprises a larger number of bits than the ADF for specifying a floating-point operand exponent.
14. The method of claim 10, further comprising:
using, by the first floating-point unit, a speculatively forwarded second non-ADF result as a source operand to generate the speculatively forwarded non-ADF result associated with the first instruction, wherein the second non-ADF result is associated with a third instruction; and
canceling the first instruction, in response to determining that the second non-ADF result creates an exception condition when converted to an ADF result.
15. The method of claim 10, wherein the non-ADF result associated with the first instruction is a product and the non-ADF result associated with the second instruction is a sum.
16. The method of claim 10, wherein the non-ADF result associated with the first instruction is a sum and the non-ADF result associated with the second instruction is a product.
17. The method of claim 10, further comprising:
retiring the ADF result to an architected register of the microprocessor, in response to determining that the non-ADF result does not create an exception condition when converted to the ADF result.
18. The method of claim 10, further comprising:
speculatively forwarding a second non-ADF result generated by the second floating-point unit from the second floating-point unit to the second floating-point unit, wherein the second non-ADF result is associated with a third instruction, wherein said speculatively forwarding the second non-ADF result associated with the third instruction is performed concurrently with said speculatively forwarding the non-ADF result associated with the first instruction;
using, by the second floating-point unit, the speculatively forwarded non-ADF result associated with the first instruction and the speculatively forwarded second non-ADF result associated with the third instruction as source operands to generate the result of the second instruction;
determining whether the second non-ADF result associated with the third instruction creates an exception condition when converted to an ADF result; and
canceling the second instruction, in response to determining that either of the non-ADF results associated with the first and third instructions creates an exception condition when converted to an ADF result.
19. A computer program product encoded in at least one computer readable medium for use with a computing device, the computer program product comprising:
computer readable program code embodied in said medium, for specifying a microprocessor having an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands, the computer readable program code comprising:
first program code for specifying a first floating-point unit; and
second program code for specifying a second floating-point unit;
wherein the first floating-point unit is configured to speculatively forward a non-ADF result generated by the first floating-point unit to the second floating-point unit, wherein the non-ADF result is associated with a first instruction;
wherein the second floating-point unit is configured to use the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction;
wherein the second floating-point unit is further configured to convert the non-ADF result to an ADF result and to determine whether the non-ADF result creates an exception condition when converted to the ADF result;
wherein the microprocessor is configured to cancel the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result.
20. The computer program product of claim 19, wherein the at least one computer readable medium is selected from the set of a disk, tape, or other magnetic, optical, or electronic storage medium and a network, wire line, wireless or other communications medium.
US12/820,662 2009-09-09 2010-06-22 Speculative forwarding of non-architected data format floating point results Abandoned US20110060892A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US12/820,662 US20110060892A1 (en) 2009-09-09 2010-06-22 Speculative forwarding of non-architected data format floating point results
TW099128795A TWI450191B (en) 2009-09-09 2010-08-27 Microprocessor and methods for processing floating-point instructions
CN201010270067.9A CN101916182B (en) 2009-09-09 2010-08-30 Transmission of fast floating point result using non-architected data format

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US24075309P 2009-09-09 2009-09-09
US12/820,662 US20110060892A1 (en) 2009-09-09 2010-06-22 Speculative forwarding of non-architected data format floating point results

Publications (1)

Publication Number Publication Date
US20110060892A1 (en) 2011-03-10

Family

ID=43648501

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/820,662 Abandoned US20110060892A1 (en) 2009-09-09 2010-06-22 Speculative forwarding of non-architected data format floating point results
US12/820,578 Active 2031-09-20 US8375078B2 (en) 2009-09-09 2010-06-22 Fast floating point result forwarding using non-architected data format

Country Status (2)

Country Link
US (2) US20110060892A1 (en)
TW (1) TWI450191B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168682B (en) 2011-12-23 2021-01-26 英特尔公司 Instruction for determining whether a value is within a range

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5191335A (en) * 1990-11-13 1993-03-02 International Business Machines Corporation Method and apparatus for floating-point data conversion with anomaly handling facility
US7529912B2 (en) * 2002-02-12 2009-05-05 Via Technologies, Inc. Apparatus and method for instruction-level specification of floating point format
US8595279B2 (en) * 2006-02-27 2013-11-26 Qualcomm Incorporated Floating-point processor with reduced power requirements for selectable subprecision
GB2447968B (en) * 2007-03-30 2010-07-07 Transitive Ltd Improvements in and relating to floating point operations

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5619664A (en) * 1994-01-04 1997-04-08 Intel Corporation Processor with architecture for improved pipelining of arithmetic instructions by forwarding redundant intermediate data forms
US5687106A (en) * 1995-03-31 1997-11-11 International Business Machines Corporation Implementation of binary floating point using hexadecimal floating point unit
US5878266A (en) * 1995-09-26 1999-03-02 Advanced Micro Devices, Inc. Reservation station for a floating point processing unit
US5996065A (en) * 1997-03-31 1999-11-30 Intel Corporation Apparatus for bypassing intermediate results from a pipelined floating point unit to multiple successive instructions
US20020095451A1 (en) * 2001-01-18 2002-07-18 International Business Machines Corporation Floating point unit for multiple data architectures
US20060179100A1 (en) * 2005-02-09 2006-08-10 International Business Machines Corporation System and method for performing floating point store folding
US20070226288A1 (en) * 2006-03-23 2007-09-27 Fujitsu Limited Processing method and computer system for summation of floating point data
US20110060785A1 (en) * 2009-09-09 2011-03-10 Via Technologies, Inc. Fast floating point result forwarding using non-architected data format
US8375078B2 (en) * 2009-09-09 2013-02-12 Via Technologies, Inc. Fast floating point result forwarding using non-architected data format

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110060785A1 (en) * 2009-09-09 2011-03-10 Via Technologies, Inc. Fast floating point result forwarding using non-architected data format
US8375078B2 (en) 2009-09-09 2013-02-12 Via Technologies, Inc. Fast floating point result forwarding using non-architected data format
US20150309799A1 (en) * 2014-04-25 2015-10-29 Broadcom Corporation Stunt box
US10713049B2 (en) * 2014-04-25 2020-07-14 Avago Technologies International Sales Pte. Limited Stunt box to broadcast and store results until retirement for an out-of-order processor

Also Published As

Publication number Publication date
TWI450191B (en) 2014-08-21
TW201110019A (en) 2011-03-16
US20110060785A1 (en) 2011-03-10
US8375078B2 (en) 2013-02-12

Legal Events

Date Code Title Description
AS Assignment

Owner name: VIA TECHNOLOGIES, INC., TAIWAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HENRY, G. GLENN;PARKS, TERRY;REEL/FRAME:024796/0881

Effective date: 20100729

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION