Oct 28, 2025 / 6 min read
In-depth technical articles, white papers, videos, webinars, product announcements and more.
Advanced Instruction Fusion in Synopsys ARC-V Processor introduces a novel mechanism for fusing common pairs of RISC-V instructions, aimed at improving processor pipeline efficiency, particularly for resource-constrained embedded processors. It extends a single-issue in-order processor to support dual instruction issue by fusing instructions from different functional units. Importantly, this does not introduce new instructions but maintains full RISC-V compatibility and is software agnostic, ensuring seamless integration with existing software and hardware environments. By reducing pipeline overhead and simplifying instruction handling, Advanced Instruction Fusion delivers significant efficiency improvements for embedded processors. The approach also provides adaptable design principles and can be extended from dual-instruction to multi-instruction fusion to benefit RISC-V processor implementations across the ecosystem.
As embedded systems continue to evolve, designers face the growing challenge of balancing tight power and cost constraints with the need for higher performance and increasingly heterogeneous processing architectures. This shift is driven in part by the rapid expansion of edge AI, where more workloads are being pushed closer to the data source, demanding smarter, more capable embedded solutions. At the same time, the open-standard RISC-V architecture is gaining momentum, particularly in microcontroller units (MCUs), which are leading in adoption and shipment volumes. These processors must meet extreme power efficiency, safety and reliability standards while supporting complex workloads at the edge.
The RISC-V ISA was designed to be simple and modular, using many simple instructions to minimize CPU power consumption and area. However, this instruction verbosity can limit performance: complex operations must be expressed as sequences of simple instructions, which take more cycles to execute.
While techniques such as dual-issue, multi-issue and out-of-order execution can boost instructions per cycle (IPC) and performance, they also increase area, which is a challenge for resource-constrained embedded processors.
Instruction fusion is a well-known technique that exploits available hardware resources to increase instruction-level parallelism (ILP) [1], [2]. Because it raises ILP and CPU performance with minimal area overhead, it is particularly well suited to improving performance density in small, in-order processors.
This article describes a novel Advanced Instruction Fusion technique for fusing pairs of RISC-V instructions at the micro-architectural level. This technique captures the main efficiency benefits of a dual-issue processor, while maintaining RISC-V compatibility and avoiding the need for a separate pipeline.
Architectural fusion vs. micro-architectural fusion
Some ISAs fuse instructions at the architectural level, while others leave fusion to the micro-architecture (implementation) level. Typical examples are load/store pair and load/store with auto-increment. Some ISAs (e.g., ARM and ARC) fuse these at the architectural level, i.e., each is performed by a single instruction. Other ISAs (e.g., RISC-V) take a different approach: they keep architectural instructions simple and delegate fusion to the implementation at the micro-architecture level.
The main advantages of microarchitectural fusion compared to architectural fusion are:
Implementing fusion at the micro-architecture level requires the processor to have sufficient instruction fetch bandwidth, because the front end must deliver both instructions of a pair together in order to fuse them. A simple RISC ISA such as RISC-V is very verbose and therefore consumes more instruction fetch bandwidth than ISAs that perform instruction fusion at the architectural level.
Simple in-order single-issue processors usually fetch no more than 4 bytes per cycle. This severely limits micro-architectural fusion: a 4-byte window holds either one 32-bit instruction or two 16-bit compressed instructions, so most fusion pairs would need to be 16-bit compressed instructions.
Therefore, the first step toward exploiting micro-architectural fusion in resource-constrained embedded processors is to increase their instruction fetch bandwidth.
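To make the 4-byte limit concrete, the following Python sketch enumerates which combinations of RISC-V instruction sizes fit in a single fetch. It assumes, for simplicity, that the fetch window begins at the first instruction of the pair; a real front end must also handle alignment and instructions that straddle fetch blocks. The 8-byte width is a hypothetical value chosen for comparison, not a documented ARC-V parameter.

```python
# Illustrative sketch: which RISC-V instruction-size pairs fit in one fetch?
# Assumption: the fetch window starts at the first instruction of the pair,
# so alignment and fetch-block boundaries are ignored.
from itertools import product

# Base RISC-V instructions are 4 bytes; C-extension (compressed) ones are 2.
INSTR_SIZES = {"16-bit": 2, "32-bit": 4}


def pair_fits(first_bytes: int, second_bytes: int, fetch_bytes: int) -> bool:
    """True if both instructions fit in a single fetch window."""
    return first_bytes + second_bytes <= fetch_bytes


def fetchable_pairs(fetch_bytes: int) -> list[tuple[str, str]]:
    """Return every (first, second) size combination that fits in one fetch."""
    return [
        (a, b)
        for a, b in product(INSTR_SIZES, repeat=2)
        if pair_fits(INSTR_SIZES[a], INSTR_SIZES[b], fetch_bytes)
    ]


if __name__ == "__main__":
    for width in (4, 8):  # 8 bytes is a hypothetical wider fetch
        print(f"{width}-byte fetch: {fetchable_pairs(width)}")
```

With a 4-byte window only a pair of 16-bit compressed instructions fits, which is the limitation described above; doubling the window to 8 bytes admits every combination, including two full 32-bit instructions.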
Advanced Instruction Fusion in resource-constrained RISC-V designs
A traditional fusion pair does not require additional register-file read or write bandwidth. Just like any other RISC-V instruction, a fusion pair reads at most two source operands from the register file and produces at most one result. There are, however, fusion pair candidates that break this rule: a load-double pair produces two results, while store-double and multiply-accumulate (MAC) pairs each need three source operands.
Taking advantage of these advanced fused pairs (load-double, store-double, and MAC) requires additional hardware resources: the register file must be able to supply three source operands, and it needs a second write port.
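The Python sketch below counts register-file reads and writes for one example of each advanced pair, using symbolic register names. It assumes that a value produced by the first instruction and consumed by the second is forwarded inside the pair rather than read from the register file. The assembly snippets and the `port_demand` and `fits` helpers are illustrative and do not come from the ARC-V documentation.

```python
# Sketch: register-file port demand of the advanced fused pairs named above.
# Register names are symbolic; the pair examples are illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    srcs: tuple[str, ...]  # architectural source registers
    dsts: tuple[str, ...]  # architectural destination registers


def port_demand(first: Instr, second: Instr) -> tuple[int, int]:
    """Return (register-file reads, register-file writes) for a fused pair.

    Assumption: a source of the second instruction that is produced by the
    first is forwarded inside the pair, so it needs no register-file read.
    """
    reads = set(first.srcs) | (set(second.srcs) - set(first.dsts))
    writes = set(first.dsts) | set(second.dsts)
    return len(reads), len(writes)


def fits(reads: int, writes: int, read_ports: int, write_ports: int) -> bool:
    """True if the demand fits the given register-file port configuration."""
    return reads <= read_ports and writes <= write_ports


PAIRS = {
    "load-double": (Instr("lw a0, 0(sp)", ("sp",), ("a0",)),
                    Instr("lw a1, 4(sp)", ("sp",), ("a1",))),
    "store-double": (Instr("sw a0, 0(sp)", ("a0", "sp"), ()),
                     Instr("sw a1, 4(sp)", ("a1", "sp"), ())),
    "MAC": (Instr("mul t0, a0, a1", ("a0", "a1"), ("t0",)),
            Instr("add a2, a2, t0", ("a2", "t0"), ("a2",))),
}

if __name__ == "__main__":
    for name, (i0, i1) in PAIRS.items():
        r, w = port_demand(i0, i1)
        print(f"{name:12} reads={r} writes={w} "
              f"2R1W={fits(r, w, 2, 1)} 3R2W={fits(r, w, 3, 2)}")
```

Under these assumptions the load-double needs two writes, the store-double three reads, and the MAC both, so none fits a conventional two-read, one-write register file, while all three fit three reads and two writes. This MAC example keeps the intermediate product in `t0` architecturally visible; a variant whose `add` overwrites the product register would need only one write.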
The Advanced Instruction Fusion technique adds these hardware resources and then increases their utilization. It does so by leveraging the micro-architectural fusion framework to enable limited dual-issue capability on an in-order processor. With this approach, any two independent instructions that map to different functional units, together require no more than three source operands, and produce no more than two destination registers are candidates for advanced fusion (dual issue).
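The candidate rule can be expressed as a small predicate. The Python sketch below is a hedged illustration, not the ARC-V implementation: the functional-unit names, the symbolic registers, and the choice to count a register read by both instructions only once are all assumptions made for the example.

```python
# Sketch of the advanced-fusion (dual-issue) candidate rule described above.
# Functional-unit names, register names and port limits are illustrative.
from dataclasses import dataclass

MAX_READS = 3   # source operands the register file can supply to one pair
MAX_WRITES = 2  # destination registers the register file can accept per pair


@dataclass(frozen=True)
class Instr:
    unit: str                 # functional unit, e.g. "LSU", "ALU", "MPY", "BR"
    srcs: tuple[str, ...]     # source registers
    dsts: tuple[str, ...]     # destination registers


def independent(first: Instr, second: Instr) -> bool:
    """No read-after-write or write-after-write hazard within the pair."""
    written = set(first.dsts)
    return not (written & set(second.srcs)) and not (written & set(second.dsts))


def can_dual_issue(first: Instr, second: Instr) -> bool:
    """Apply the candidate rule: different units, independent, port limits."""
    # Assumption: a register read by both instructions needs only one read.
    reads = set(first.srcs) | set(second.srcs)
    writes = set(first.dsts) | set(second.dsts)
    return (
        first.unit != second.unit
        and independent(first, second)
        and len(reads) <= MAX_READS
        and len(writes) <= MAX_WRITES
    )


if __name__ == "__main__":
    lw = Instr("LSU", ("sp",), ("a0",))      # lw   a0, 0(sp)
    addi = Instr("ALU", ("a1",), ("a1",))    # addi a1, a1, 1
    use = Instr("ALU", ("a0",), ("a2",))     # addi a2, a0, 1 (reads a0)
    print(can_dual_issue(lw, addi))  # LSU + ALU, no shared registers
    print(can_dual_issue(lw, use))   # read-after-write on a0
```

A store that reads two registers paired with an ALU instruction that reads two others would need four source operands and is rejected, which is why only some store pairs qualify.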
The instructions are fused in the front-end with pre-decoded information about opcode and register operand identifiers. The pre-decoded register operand identifiers are used to detect the absence of data dependencies between a pair of advanced fused instructions. The decoder is augmented to receive additional information about the fused instruction, but it is not duplicated. Each instruction of the fused pair is dispatched to its respective functional unit. The back-end of the processor is mostly agnostic to instruction fusion, except for the increment of the architectural PC and handling of exceptions triggered by fused instructions. There is no need to introduce a separate pipeline.
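As an illustration of front-end pre-decode, the Python sketch below extracts the opcode-derived functional unit and the register identifiers from 32-bit RISC-V instruction words, then checks for a data dependency using only those identifiers. The field positions and opcodes follow the RISC-V base encoding; the unit names, the `PreDecoded` record, and the restriction to a handful of 32-bit opcodes (no compressed instructions) are simplifying assumptions of this sketch.

```python
# Sketch of front-end pre-decode for 32-bit RISC-V instructions, plus a data-
# dependency check that uses only the pre-decoded register identifiers.
# Compressed (16-bit) instructions and most opcodes are omitted for brevity.
from typing import NamedTuple, Optional


class PreDecoded(NamedTuple):
    unit: str                # functional unit (names are hypothetical)
    rd: Optional[int]        # destination register, None if the format has none
    srcs: tuple[int, ...]    # source register identifiers


def predecode(word: int) -> PreDecoded:
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    funct7 = (word >> 25) & 0x7F
    if opcode == 0x03:                    # LOAD, I-type
        return PreDecoded("LSU", rd, (rs1,))
    if opcode == 0x23:                    # STORE, S-type: bits 11:7 are imm, not rd
        return PreDecoded("LSU", None, (rs1, rs2))
    if opcode == 0x13:                    # OP-IMM, I-type
        return PreDecoded("ALU", rd, (rs1,))
    if opcode == 0x33:                    # OP, R-type
        if funct7 == 0x01:                # M extension: funct3 0-3 mul, 4-7 div/rem
            return PreDecoded("MPY" if funct3 < 4 else "DIV", rd, (rs1, rs2))
        return PreDecoded("ALU", rd, (rs1, rs2))
    if opcode == 0x63:                    # BRANCH, B-type: no rd
        return PreDecoded("BR", None, (rs1, rs2))
    raise ValueError(f"opcode {opcode:#04x} is not handled in this sketch")


def no_data_dependency(older: PreDecoded, younger: PreDecoded) -> bool:
    """True if the younger instruction neither reads nor overwrites the older
    one's result. Writes to x0 are discarded, so they never create a hazard."""
    if older.rd is None or older.rd == 0:
        return True
    return older.rd not in younger.srcs and older.rd != younger.rd
```

For example, `lw a0, 0(sp)` encodes as `0x00012503` and pre-decodes to unit `LSU`, destination 10 (`a0`), and source 2 (`sp`). Decoding by format matters: in a store, bits 11:7 hold part of the immediate, so treating them as a destination register would report false dependencies.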
Figure 1 illustrates a typical implementation of a RISC-V processor front end with Advanced Instruction Fusion. Examples of fused instruction pairs include LOAD+ALU, LOAD+BR, LOAD+MPY, ST+BR, and ST+ALU.
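To show how these pair types reduce issue slots, the Python sketch below walks a short loop in program order and fuses adjacent instructions when their kinds match the example pairs listed above and they are independent and within the three-read, two-write limit. Treating the listed order as required (LOAD+ALU but not ALU+LOAD), the greedy pairing policy, and the loop itself are assumptions for illustration only.

```python
# Sketch: greedy in-order pairing of an instruction stream, using the example
# pair types listed above as an allow-list. Order sensitivity, the greedy
# policy and the example loop are assumptions of this sketch.
from dataclasses import dataclass

ALLOWED = {("LOAD", "ALU"), ("LOAD", "BR"), ("LOAD", "MPY"),
           ("ST", "BR"), ("ST", "ALU")}


@dataclass(frozen=True)
class Instr:
    text: str
    kind: str                 # "LOAD", "ST", "ALU", "MPY" or "BR"
    srcs: tuple[str, ...]
    dsts: tuple[str, ...]


def fusible(a: Instr, b: Instr) -> bool:
    """Allowed kinds, no RAW/WAW hazard, at most 3 reads and 2 writes.
    Write-after-read is ignored, assuming both read operands in one cycle."""
    hazard = set(a.dsts) & (set(b.srcs) | set(b.dsts))
    reads = set(a.srcs) | set(b.srcs)
    writes = set(a.dsts) | set(b.dsts)
    return ((a.kind, b.kind) in ALLOWED and not hazard
            and len(reads) <= 3 and len(writes) <= 2)


def issue_groups(stream: list[Instr]) -> list[tuple[Instr, ...]]:
    """Group instructions into single issues or fused pairs, in program order."""
    groups, i = [], 0
    while i < len(stream):
        if i + 1 < len(stream) and fusible(stream[i], stream[i + 1]):
            groups.append((stream[i], stream[i + 1]))
            i += 2
        else:
            groups.append((stream[i],))
            i += 1
    return groups


EXAMPLE_LOOP = [
    Instr("lw   a0, 0(a2)", "LOAD", ("a2",), ("a0",)),
    Instr("addi a2, a2, 4", "ALU", ("a2",), ("a2",)),
    Instr("mul  a0, a0, a6", "MPY", ("a0", "a6"), ("a0",)),
    Instr("sw   a0, 0(a4)", "ST", ("a0", "a4"), ()),
    Instr("addi a4, a4, 4", "ALU", ("a4",), ("a4",)),
    Instr("bne  a2, a5, loop", "BR", ("a2", "a5"), ()),
]

if __name__ == "__main__":
    for group in issue_groups(EXAMPLE_LOOP):
        print(" + ".join(ins.text for ins in group))
```

Under these assumptions the six-instruction loop issues in four groups instead of six: the load pairs with the pointer increment, the store pairs with the second increment, and the multiply and branch issue alone.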
Pollack's Rule states that performance improvements from microarchitectural enhancements generally scale with the square root of the increase in complexity. Figure 2 shows that for the Synopsys ARC-V RMX, a RISC-V embedded processor with Advanced Instruction Fusion, the measured CoreMark/MHz gains scale linearly with silicon area, a better return than Pollack's Rule predicts. Performance density improvements may be even more substantial, because Advanced Instruction Fusion incurs only a fixed area overhead.
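Pollack's Rule and linear scaling can be compared directly, writing performance as $P$, area as $A$, and performance density as $D = P/A$. The 20% area increase below is a hypothetical value chosen only to illustrate the two regimes; it is not an ARC-V measurement.

```latex
\begin{aligned}
\text{Pollack's Rule:}\quad & P \propto \sqrt{A}
  &&\Rightarrow\quad \frac{P_1}{P_0} = \sqrt{\frac{A_1}{A_0}} = \sqrt{1.2} \approx 1.095 \\
\text{Linear scaling:}\quad & P \propto A
  &&\Rightarrow\quad \frac{P_1}{P_0} = \frac{A_1}{A_0} = 1.2 \\
\text{Performance density:}\quad & D = \frac{P}{A}
  &&\Rightarrow\quad \frac{D_1}{D_0} = \frac{P_1/P_0}{A_1/A_0} =
  \begin{cases}
    1/\sqrt{1.2} \approx 0.913 & \text{(Pollack's Rule)} \\
    1 & \text{(linear scaling)}
  \end{cases}
\end{aligned}
```

Under Pollack's Rule a 20% area increase buys only about 9.5% more performance and lowers performance density, whereas linear scaling preserves it. With a fixed overhead $\Delta A$, the ratio $A_1/A_0 = 1 + \Delta A/A_0$ also shrinks as the base core grows.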
Advanced Instruction Fusion presents an effective approach for enhancing processor pipeline efficiency in resource-constrained embedded systems by fusing common pairs of RISC-V instructions. By enabling dual instruction issue on single-issue in-order processors through the fusion of instructions from different functional units, this technique achieves notable performance gains without introducing new instructions or requiring software modifications, thus maintaining full RISC-V compatibility. The reduction in pipeline overhead and simplification of instruction handling result in significant efficiency improvements, while the adaptable design allows for future extensions to multi-instruction fusion. Overall, Advanced Instruction Fusion offers a practical and scalable solution to efficiently improve performance in RISC-V processor implementations throughout the ecosystem.
To learn about Synopsys ARC-V Processor IP, please visit our ARC-V Processor IP webpage.
Ideal for Embedded Systems: The technique is tailored for small, in-order processors where area and power constraints limit traditional multi-issue or out-of-order designs.
Performance Boost Without ISA Changes: Advanced Instruction Fusion improves IPC and pipeline efficiency without introducing new instructions or breaking RISC-V compatibility.
Microarchitectural Fusion Advantage: Fusion at the microarchitecture level allows more flexible and aggressive optimizations compared to architectural fusion.
Hardware-Efficient Dual Issue: Enables dual instruction issue by fusing instructions from different functional units, requiring modest hardware enhancements like additional register file ports.
Scalable Design: The fusion framework is adaptable and can be extended to support multi-instruction fusion, paving the way for broader adoption across the RISC-V ecosystem.
[1] Exploring Instruction Fusion Opportunities in General Purpose Processors; Sawan Singh, Arthur Perais, Alexandra Jimborean, Alberto Ros
[2] The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V; Christopher Celio, Palmer Dabbelt, David A. Patterson, Krste Asanović