Implementing Floating-Point Algorithms in FPGAs or ASICs

2025-04-02

Floating point is the most popular data type for modeling and simulating algorithms that require high computational accuracy. Traditionally, when you want to deploy these floating-point algorithms to FPGA or ASIC hardware, your only option has been to convert every data type in the algorithm to fixed point in order to save hardware resources and speed up computation. Converting to fixed point reduces mathematical precision, and it can be difficult to strike the right balance between word length and precision during the conversion process. For calculations requiring high dynamic range or high precision (for example, designs with feedback loops), fixed-point conversion can take weeks or months of engineering time. In addition, to achieve the required numerical accuracy, designers must often resort to large fixed-point word lengths.

In this article, we introduce a workflow for taking a native floating-point application, an IIR filter design, to an ASIC or FPGA. We review the challenges of using fixed point and compare the area and frequency tradeoffs of single-precision floating point versus fixed point. We also show how combining floating point and fixed point can give you higher accuracy while reducing conversion and implementation time. You will see why modeling directly in floating point matters and how, contrary to the common belief that fixed point is always more efficient, it can significantly reduce area and increase speed in real-world designs with high dynamic range requirements.

Native Floating-Point Implementation: Under the Hood

HDL Coder implements single-precision algorithms by emulating the underlying math on fixed-point FPGA or ASIC resources (Figure 1). The generated logic unpacks the input floating-point signal into its sign, exponent, and mantissa fields: separate integers of 1, 8, and 23 bits, respectively.

Figure 1. How HDL Coder maps a single-precision floating-point multiplication to fixed-point hardware resources

The generated VHDL or Verilog logic then performs the floating-point computation (a multiplication, in the case shown in Figure 1) by deriving the result sign from the input sign bits, multiplying the mantissas, and computing the result exponent together with the corresponding normalization. The final stage of the logic packs the sign, exponent, and mantissa back into a floating-point data type.
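The same unpack-compute-repack sequence can be sketched in software. The Python sketch below is illustrative only (it is not the generated HDL, and the function names are invented for this example): it decodes two IEEE-754 singles into their integer fields and multiplies them using only integer arithmetic, truncating rather than rounding for brevity and handling only normal numbers.

```python
import struct

def decode_float32(x: float):
    """Unpack an IEEE-754 single into (sign, exponent, mantissa) integer fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits (implicit leading 1 for normals)
    return sign, exponent, mantissa

def multiply_fields(a: float, b: float) -> float:
    """Multiply two normal floats using only integer field arithmetic,
    mirroring the hardware steps: XOR the signs, add the exponents,
    multiply the significands, then normalize and repack."""
    sa, ea, ma = decode_float32(a)
    sb, eb, mb = decode_float32(b)
    sign = sa ^ sb
    # Restore the implicit leading 1 to form 24-bit significands.
    sig = (ma | 1 << 23) * (mb | 1 << 23)   # 48-bit product
    exp = ea + eb - 127                     # re-bias the exponent sum
    # Normalize: the product of two [1, 2) significands lies in [1, 4).
    if sig & (1 << 47):
        sig >>= 1
        exp += 1
    mantissa = (sig >> 23) & 0x7FFFFF       # truncate (no rounding, for brevity)
    bits = (sign << 31) | (exp << 23) | mantissa
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(multiply_fields(1.5, 2.0))  # 3.0
```

A production datapath adds rounding, subnormal handling, and exception flags, which is where much of the real hardware cost lies.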

 

Solving Dynamic Range Problems with Fixed-Point Conversions

A simple expression such as (1-A)/(1+A) that must operate over a high dynamic range translates naturally into single-precision floating point (Figure 2).

Figure 2. Single-precision implementation of (1-A)/(1+A)

However, implementing the same equation in fixed point requires many steps and numerical considerations (Figure 3).

Figure 3. Fixed-point implementation of (1-A)/(1+A)

For example, you must break the division into a multiplication and a reciprocal, implement the nonlinear reciprocal with an approximation method such as Newton-Raphson or a lookup table (LUT), use different data types to carefully control bit growth, choose appropriate numerator and denominator types, and select specific output and accumulator types for the additions and subtractions.
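To make the reciprocal step concrete, here is a minimal Python sketch of the Newton-Raphson iteration x_{n+1} = x_n(2 - a·x_n). The linear seed, the range reduction, and the helper names (`reciprocal_nr`, `ratio`) are illustrative assumptions; a real fixed-point design would perform the same steps with integer types and typically seed from a small LUT.

```python
import math

def reciprocal_nr(a: float, iterations: int = 4) -> float:
    """Approximate 1/a for a in [0.5, 1) with Newton-Raphson:
    x_{n+1} = x_n * (2 - a * x_n). A hardware design would run this in
    fixed point, seeding from a LUT; here a crude linear seed is used."""
    x = 2.9142 - 2.0 * a            # classic linear initial guess on [0.5, 1)
    for _ in range(iterations):
        x = x * (2.0 - a * x)       # quadratic convergence: error squares each pass
    return x

def ratio(a: float) -> float:
    """(1 - a) / (1 + a) restructured as multiply-by-reciprocal, the way a
    fixed-point implementation would organize it."""
    denom = 1.0 + a
    e = math.frexp(denom)[1]        # denom = m * 2**e with m in [0.5, 1)
    m = denom / 2.0**e              # range reduction: a simple shift in hardware
    return (1.0 - a) * (reciprocal_nr(m) / 2.0**e)

print(ratio(0.5))  # approximately (1-0.5)/(1+0.5) = 1/3
```

Each of these steps (seed selection, iteration count, and the bit widths carried through the iteration) is a numerical decision the fixed-point designer must justify.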

 

Exploring IIR Implementation Options

Let's look at the example of an infinite impulse response (IIR) filter. An IIR filter requires high-dynamic-range computation within a feedback loop, which makes fixed-point quantization difficult to converge. Figure 4a shows a test environment that compares three versions of the same IIR filter driven by a noisy sine wave input. The amplitude of the sine wave is 1, and the added noise increases that amplitude slightly.

Figure 4a. Test environment with three realizations of the IIR filter and a noisy sine wave input

The first version of the filter is double precision (Figure 4b). The second version is single precision. The third is a fixed-point realization (Figure 4c). The conversion produced a data type 22 bits wide, with one bit assigned to the sign and 21 to the fraction. This particular data type leaves 0 bits for the integer part, which is reasonable because, given this stimulus, the value always stays in the range -1 to 1. If the design must handle different input values, that must be taken into account during fixed-point quantization.

Figure 4b. Iir_filter implementation, shown with double-precision data types
Figure 4c. Iir_filter_fixed implementation using fixed-point data types
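As a rough model of that 22-bit type (1 sign bit, 0 integer bits, 21 fraction bits), the following Python helper, hypothetical and for illustration only, quantizes a value with saturation:

```python
def quantize(x: float, word_length: int = 22, frac_bits: int = 21) -> float:
    """Quantize to a signed fixed-point type: for the 22-bit type described
    above, 1 sign bit, 0 integer bits, and 21 fraction bits. Values outside
    the representable range saturate rather than wrap."""
    scale = 1 << frac_bits
    q = round(x * scale)                                  # round to nearest step
    lo, hi = -(1 << (word_length - 1)), (1 << (word_length - 1)) - 1
    q = max(lo, min(hi, q))                               # saturate to the range
    return q / scale

print(quantize(0.3))   # nearest multiple of 2**-21
print(quantize(1.5))   # saturates: the type cannot represent values >= 1
```

The saturation branch is exactly why the stimulus assumption matters: an input outside -1 to 1 would be silently clipped by this type.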

A test environment was set up to compare the results of the single-precision and fixed-point filters against the double-precision filter, which is treated as the golden reference. In both cases, the loss of precision introduces some error. The question is whether that error falls within the acceptable tolerance of our application.

When we ran Fixed-Point Designer to perform the conversion, we specified an error tolerance of 1%. Figure 5 shows the results of the comparison. The single-precision version has an error on the order of 10^-8, while the fixed-point version has an error of about 10^-5. Both are within our specified tolerance. If your application requires higher precision, you can increase the fixed-point word length.

Figure 5. Simulation results comparing the double-precision IIR filter with the single-precision version (top) and the fixed-point version (bottom)
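The experiment can be approximated in a few lines of Python. The sketch below is a stand-in for the actual test environment, using a simple first-order IIR section with assumed coefficients (b = 0.1, a = 0.9, not taken from the article's filter): it runs the same filter in double precision, in single precision, and with 21-fraction-bit quantization of the state, then reports the peak error against the double-precision reference.

```python
import numpy as np

def iir(x, dtype=np.float64, quant=None):
    """First-order low-pass IIR: y[n] = b*x[n] + a*y[n-1], evaluated in the
    given precision. `quant` optionally quantizes every state update,
    mimicking a fixed-point realization."""
    b, a = dtype(0.1), dtype(0.9)        # assumed coefficients for illustration
    y = np.zeros(len(x))
    acc = dtype(0.0)
    for n, xn in enumerate(x.astype(dtype)):
        acc = b * xn + a * acc           # feedback loop: errors recirculate
        if quant is not None:
            acc = dtype(quant(float(acc)))
        y[n] = acc
    return y

rng = np.random.default_rng(0)
t = np.arange(2000)
x = np.sin(2 * np.pi * t / 100) + 0.01 * rng.standard_normal(len(t))

gold = iir(x, np.float64)                    # double precision: golden reference
err_single = np.max(np.abs(iir(x, np.float32) - gold))
q21 = lambda v: round(v * 2**21) / 2**21     # 22-bit type: 1 sign, 21 fraction bits
err_fixed = np.max(np.abs(iir(x, np.float64, q21) - gold))
print(f"single-precision peak error: {err_single:.2e}")
print(f"fixed-point peak error:      {err_fixed:.2e}")
```

Because the quantization error recirculates through the feedback path, the observed error is larger than one quantization step, which is exactly the convergence difficulty described above.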

This kind of quantization effort requires hardware design experience, a thorough understanding of the possible system inputs, explicit precision requirements, and some help from tools such as Fixed-Point Designer. If the effort gets the algorithm ready for production deployment, it is worthwhile. But what about situations where you simply need to deploy to prototype hardware, or where the precision requirements make it hard to shrink the physical footprint? In these cases, one solution is to use native single-precision floating point.

 

Simplifying the Process with Native Floating Point

There are two advantages to using native floating point:

- You don't need to spend time trying to analyze the minimum number of bits required to keep a wide variety of input data accurate enough.

- Single-precision floating point covers a very wide dynamic range at a fixed cost of 32 bits.

The design process is now much simpler, and you know that with the sign, exponent, and mantissa bits you can represent numbers across a wide dynamic range. The table in Figure 6 compares the resource utilization of the floating-point implementation, using the data type selection shown in Figure 5, with that of the fixed-point implementation of the IIR filter.

Figure 6. Comparison of resource utilization for fixed-point and floating-point implementations of the IIR filter
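The dynamic-range side of the tradeoff is easy to quantify. The short Python snippet below contrasts the range and resolution of IEEE-754 single precision with a hypothetical 32-bit fixed-point type chosen for illustration (1 sign, 10 integer, 21 fraction bits):

```python
import numpy as np

f32 = np.finfo(np.float32)
# Single precision: enormous range, with *relative* (scaled) resolution.
print(f"single precision: ~{f32.tiny:.1e} to ~{f32.max:.1e}, "
      f"relative step {f32.eps:.1e}")

# A 32-bit fixed-point type (1 sign, 10 integer, 21 fraction bits) spans a
# far narrower range, with uniform *absolute* resolution everywhere.
frac_bits = 21
fx_max = (2**31 - 1) / 2**frac_bits
print(f"fixed point (s10.21): +/-{fx_max:.1f}, absolute step {2**-frac_bits:.1e}")
```

For the same 32 bits, the floating-point type trades uniform resolution for roughly 76 decades of range, which is what eliminates the word-length analysis in high-dynamic-range designs.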

When you compare the results of the floating-point and fixed-point implementations, keep in mind that floating-point computations require more operations than equivalent fixed-point computations, so using single precision results in higher resource usage when deployed to an FPGA or ASIC. If circuit area is a concern, you will need to trade off precision against resource usage. You can also mix floating point and fixed point, reducing area while keeping single precision for the numerically intensive computations that need high dynamic range.
