Optimization of DSP implementation in FPGA

Maximizing performance in FPGA systems

With programmable hard intellectual property such as DSP building blocks, SerDes transceivers and embedded processors, FPGAs have become complex systems-on-chip. As a result, extracting higher performance involves far more than just cranking up the fabric clock rate. Typically, one must balance a complex set of requirements: I/O bandwidth, hardware logic and/or embedded-processing performance.

Harnessing built-in FPGA features for maximum performance also takes the right combination of design techniques and tool settings, so that the functional description written in RTL code is implemented optimally. Each phase (design development, synthesis and implementation) is critical.

System architecture must be considered for effective trade-offs between programmable hardware resources. With the architecture defined and RTL code ready, synthesis tools assign the design's basic conceptual building blocks to technology cells.

In addition, designers must consider such technology mapping alternatives as trade-offs between hard IP and logic fabric. They must also use tool settings that extract maximum performance from the RTL code and the FPGA in which the design is implemented. Final implementation offers a chance to further optimize for higher performance with more-efficient packing, placement and routing. Here are some additional tips.


Do:

  • Evaluate the trade-offs between processor and programmable hardware implementations. Today's applications blend data-processing and control functions. Programmable hardware, usually logic and dedicated DSP blocks, can implement hardware algorithms that achieve nanosecond cycle times. But there are applications in which software algorithms implemented with embedded processors can provide sufficient performance and are easier to implement.
  • Employ pipelining techniques that will increase system clock frequency. Most FPGAs have abundant registers, and using them generally has minimal impact on overall resource utilization because they are packed together with lookup tables. Pipelining reduces the number of combinatorial logic cells between flip-flops, thus reducing the maximum path delay. For example, some FPGAs offer fully cascadable multiply-and-accumulate blocks that, when configured as adder chains, can significantly improve performance for DSP filters.
  • Take advantage of the latest physical-synthesis tools. These tools reconcile front-end optimizations with the actual results derived from performing place and route. Physical synthesis tightly couples synthesis and place and route by making synthesis aware of actual timing bottlenecks early in the design. This ensures that synthesis optimizations are effectively applied to the appropriate places and that synthesis interacts with placement to deliver superior results.
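The effect of pipelining on clock frequency can be illustrated with a small timing model. This is only a sketch under stated assumptions: the cell delays, the 500 ps register overhead and the function names are hypothetical and do not come from any vendor tool or datasheet. It models a path as a chain of logic cells, inserts registers at the cut points that minimize the slowest stage, and reports the resulting maximum clock rate.

```python
from itertools import combinations

def worst_stage_delay_ps(cell_delays_ps, n_stages):
    """Smallest achievable worst-case stage delay when a chain of logic
    cells is cut into n_stages contiguous register-to-register stages."""
    n = len(cell_delays_ps)
    if not 1 <= n_stages <= n:
        raise ValueError("n_stages must be between 1 and the number of cells")
    best = None
    # Try every placement of the (n_stages - 1) pipeline registers.
    for cuts in combinations(range(1, n), n_stages - 1):
        bounds = (0, *cuts, n)
        worst = max(sum(cell_delays_ps[a:b]) for a, b in zip(bounds, bounds[1:]))
        if best is None or worst < best:
            best = worst
    return best

def fmax_mhz(stage_delay_ps, reg_overhead_ps=500):
    """Clock rate allowed by the slowest stage plus an assumed register
    overhead (clock-to-out plus setup), in MHz."""
    return 1e6 / (stage_delay_ps + reg_overhead_ps)

# Hypothetical combinatorial path: six cells, delays in picoseconds.
CELLS_PS = [1200, 800, 1500, 1000, 900, 1100]

if __name__ == "__main__":
    for stages in (1, 2, 3):
        delay = worst_stage_delay_ps(CELLS_PS, stages)
        print(f"{stages} stage(s): worst stage {delay} ps, "
              f"fmax about {fmax_mhz(delay):.1f} MHz")
```

In this toy model, going from one stage to three cuts the slowest register-to-register delay from 6500 ps to 2500 ps, at the cost of two extra cycles of latency. A real design would take its delays from the timing report rather than from assumed numbers.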


Don't:

  • Forget to take full advantage of the massive built-in parallelism that FPGAs offer through wider buses, bonding together several high-speed serial transceivers, or building fully parallel hardware-based algorithms using the fabric and the fully cascadable DSP blocks.
  • Forget to write your RTL description to direct the synthesis tool to use silicon-specific functions and capabilities that will enhance performance. Consider, for example, the integrated DSP block that some FPGAs offer. Properly implemented, these blocks provide ASIC-like performance, but performance can suffer severely if the RTL coding style implies the use of an asynchronous reset. The native reset of these blocks is synchronous, so registers described with an asynchronous reset cannot be merged into the block. A synchronous reset lets those registers be merged into the block, substantially improving performance and area.
  • Neglect to use retiming techniques, that is, the ability of the synthesis tool to move existing registers across technology cells in the design. Retiming preserves latency, so no sequential elements are added to the design. One caveat of retiming is that, because some sequential elements have been moved, the design can be more difficult to verify. But retiming is usually confined to only a few paths in the design, and synthesis reports notify the user about these changes.
  • Neglect to properly use the attributes that help determine how the design's conceptual blocks will be mapped onto technology cells. These attributes will direct the synthesis tool to use a certain type of cell vs. another. For example, synthesis tools would, by default, use the fabric to implement accumulators. Depending on the particular design, however, using a DSP block can be a better choice.
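The retiming point above can be sketched the same way. The following toy model is an illustration only, not how any synthesis tool is actually implemented: the cell delays, the register positions and the greedy one-cell-at-a-time search are all assumptions. It moves existing registers across neighbouring cells to shorten the slowest stage while keeping the number of registers, and therefore the latency, unchanged.

```python
def stage_delays_ps(cell_delays_ps, register_positions):
    """Delay of each register-to-register stage. A position p means a
    register sits between cell p-1 and cell p."""
    bounds = [0, *register_positions, len(cell_delays_ps)]
    return [sum(cell_delays_ps[a:b]) for a, b in zip(bounds, bounds[1:])]

def retime_once(cell_delays_ps, register_positions):
    """Move one register across one neighbouring cell if that lowers the
    worst stage delay; otherwise return the positions unchanged."""
    n = len(cell_delays_ps)
    best = list(register_positions)
    best_worst = max(stage_delays_ps(cell_delays_ps, best))
    for i, pos in enumerate(register_positions):
        for new_pos in (pos - 1, pos + 1):
            candidate = list(register_positions)
            candidate[i] = new_pos
            # Registers stay inside the path, in order, one per boundary.
            if not 0 < new_pos < n or candidate != sorted(set(candidate)):
                continue
            worst = max(stage_delays_ps(cell_delays_ps, candidate))
            if worst < best_worst:
                best, best_worst = candidate, worst
    return best

def retime(cell_delays_ps, register_positions):
    """Repeat single-register moves until none helps (greedy, so it may
    stop at a local optimum)."""
    current = list(register_positions)
    while True:
        improved = retime_once(cell_delays_ps, current)
        if improved == current:
            return current
        current = improved

# Hypothetical path: six cells (ps) with two badly placed registers.
CELLS_PS = [500, 700, 1500, 600, 900, 800]
BEFORE = [1, 2]
AFTER = retime(CELLS_PS, BEFORE)

if __name__ == "__main__":
    print("before:", stage_delays_ps(CELLS_PS, BEFORE))
    print("after: ", stage_delays_ps(CELLS_PS, AFTER))
```

The register count is the same before and after, which is the latency-preserving property described above. Because the search is greedy it can miss the best balance, which is acceptable for an illustration but not a claim about real tools.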

Adrian Cosoroaba, Virtex solutions manager at Xilinx Inc.

Posted by Mobile blogger at 15:33
