Analog Matrix Computing: A Breakthrough for High-Performance Edge AI on Battery Power
October 30, 2025
Blog
Often in electronics system design, the parameters of power and performance are traded off against each other: an improvement in one parameter weakens the other.
In implementing AI inference on battery-powered edge devices, however, developers want both more performance and lower power consumption: latency determines the quality of the user experience, but the high processor performance needed for a near real-time response must not come at the expense of battery life. It’s a problem that conventional microcontrollers and processors have had limited success in solving.
There’s a fundamental physical reason for this, and it is rooted in the architecture of the silicon which performs the compute function in MCUs and many types of processors. The classic general-purpose compute block in digital chips today operates by fetching data from memory, processing it through an arithmetic logic unit, and writing the result back to another memory. Performance and power consumption depend on the speed of the memory and its proximity to the logic unit, but the classic general-purpose compute silicon architecture constrains chip designers’ ability to optimize either parameter.
This architecture also prevents chip designers from matching the topology of the compute system to the structure of a neural network, resulting in grossly inefficient instruction set architectures (ISAs) for implementing neural network processing.
Now a new type of AI processor which draws on fundamental silicon innovations has provided a breakthrough in power and performance which makes on-device edge AI genuinely possible on battery power. This is the story of how a new AI-native compute architecture provides power/performance gains of more than two orders of magnitude over conventional microcontrollers and processors.
Fundamental new building block of an AI-native processor
There are two basic problems in the general-purpose compute function implemented in the main types of digital processor used for neural network processing today, whether it is a CPU, a neural processing unit (NPU), or a graphics processing unit (GPU):
- The processor operates by fetching data from memory – typically DRAM or SRAM – manipulating the data in an arithmetic logic unit (ALU), and writing the processed data back to memory. Both fetching data from memory and writing results back to it are highly wasteful of time and power.
- Massively parallel neural networking operations, which consist mostly of multiply-accumulate (MAC) functions, map poorly onto the instruction set architectures (ISAs) of conventional processors. Compiling a neural network’s MAC operations to a conventional ISA results in highly wasteful use of processor cycles.
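To illustrate the second point, here is a minimal sketch (an illustration only, not the vendor's implementation; the 32-input, 8-output layer dimensions follow the example used later in the article). A conventional scalar ISA must issue each multiply-accumulate as a separate instruction sequence, whereas a matrix compute block performs the whole layer as one operation:

```python
import numpy as np

# A fully connected layer with 32 inputs and 8 outputs needs
# 32 * 8 = 256 multiply-accumulates.
inputs = np.random.rand(32).astype(np.float32)
weights = np.random.rand(8, 32).astype(np.float32)

# Scalar-loop view: one MAC at a time, as a conventional core issues them,
# with operand loads and result stores around every step.
outputs = np.zeros(8, dtype=np.float32)
mac_count = 0
for o in range(8):
    acc = 0.0
    for i in range(32):
        acc += weights[o, i] * inputs[i]   # one multiply-accumulate
        mac_count += 1
    outputs[o] = acc

# Matrix view: the entire layer expressed as a single operation.
outputs_matrix = weights @ inputs

assert mac_count == 256
assert np.allclose(outputs, outputs_matrix)
```

The arithmetic is identical in both views; what differs is how many instruction issues, loads, and stores a scalar pipeline spends to get there.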
So optimizing the compute architecture for AI operations requires different approaches both to data access and to the topology of the compute function. This is how the innovations in the GPX family of processors have enabled Ambient Scientific to achieve improvements in power and performance of more than two orders of magnitude.
The first innovation is the fundamental compute building block, the analog MAC unit (see Figure 1). This block provides a viable arrangement for co-locating data processing with memory, enabling the creation of in-memory computing cells, which eliminate the need to fetch data from and send data to external DRAM or SRAM.
As well as reducing the size of the MAC circuitry compared to a conventional digital compute block, the in-memory arrangement also eliminates the long interconnects between memory and logic, greatly reducing latency and power consumption. Alongside this innovation, Ambient Scientific has implemented a 3D memory structure on-chip, increasing the rate at which the analog MAC unit can process the high number of operands in a neural network’s matrix computations.
Fig. 1: the analog MAC unit enables processing of MAC operations without needing to access external DRAM
Silicon mapped to the topology of neural networks
The second key element of the silicon innovations introduced by Ambient Scientific is the topology of the processing blocks, which are configured as a matrix computer. A typical neural network, for example, can be represented as a 1x32x8 matrix (see Figure 2).
Fig. 2: a neural network can typically be represented as a matrix of input values and weights
This is mirrored in the structure of Ambient Scientific’s DigAn™ matrix computer, which is assembled in silicon from multiple analog MAC blocks (see Figure 3).
Fig. 3: a single matrix compute block
Multiple matrix computers are then connected in layers to match the layering of a neural network (see Figure 4).
Fig. 4: a multi-layer DigAn matrix computer
This is the physical realization of an AI-native processor, and the results are astonishing: 32 layers of a typical 1x32x8 neural network matrix would require 1,235,200 cycles for a conventional compute architecture to perform. In a DigAn matrix computer, this requires just 32 cycles.
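The speedup implied by those figures can be sanity-checked with simple arithmetic. The cycle counts below are taken directly from the text above, not independently derived:

```python
# Figures quoted from the article: 32 layers of a 1x32x8 network take
# 1,235,200 cycles on a conventional compute architecture, versus one
# cycle per layer (32 cycles total) on the DigAn matrix computer.
conventional_cycles = 1_235_200
matrix_cycles = 32

speedup = conventional_cycles / matrix_cycles
print(f"speedup: {speedup:,.0f}x")  # prints: speedup: 38,600x
```

That is, on the article's own numbers, a reduction in cycle count of nearly five orders of magnitude for this workload.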
So the silicon innovations developed by Ambient Scientific transform AI processing in two different but complementary ways:
- They make compute operations faster and more efficient, by performing them in in-memory compute blocks
- And they dramatically reduce the number of compute operations required for any given neural networking task
Ultra-low power matrix computing SoC for edge AI applications
To implement this matrix computing architecture at the chip level, Ambient Scientific has developed AI processor cores, called MX8 units (see Figure 5). These cores provide a highly scalable system architecture, enabling Ambient Scientific to implement its matrix computing silicon in every type of device, from small 10-core edge AI systems-on-chip (SoCs) up to large processors containing as many as 2,000 cores for use in data center servers.
Fig. 5: the MX8 AI processor core implements the AI-centric DigAn instruction set
The first chips in production to feature the MX8 cores are the GPX10 and the new GPX10 Pro edge AI SoCs. These ultra-low power devices are fully integrated AI controllers, featuring 10 DigAn cores, a multi-channel ADC, sensor fusion capability to connect up to 10 analog and digital sensors simultaneously, and an Arm® Cortex®-M4F CPU core for executing non-AI workloads (see Figure 6).
Fig. 6: GPX10 AI processor block diagram
The difference that the DigAn matrix computing architecture makes in real-world applications is extraordinary: peak AI performance of the GPX10 is 512 GOPS, comparable to the performance of popular edge-oriented GPUs, and far higher than the performance available from conventional microcontrollers on the market today.
But the power consumption is orders of magnitude lower than that of GPUs intended for use at the edge: around 80 µW at peak AI performance, compared to around 6 W for edge GPUs. Essentially, what the GPX10 and GPX10 Pro offer is either more than 100x higher AI performance than a typical MCU with similar power consumption, or the same performance as a typical low-end GPU with more than 100x lower power consumption.
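A back-of-envelope comparison using the article's own figures (512 GOPS throughput, roughly 80 µW for the GPX10, about 6 W for an edge GPU) shows where the orders-of-magnitude claim comes from:

```python
# Illustrative arithmetic only, using the figures quoted in the article.
gops = 512e9       # peak throughput in operations/second (both devices)
p_gpx10 = 80e-6    # GPX10 power at peak AI performance, watts
p_gpu = 6.0        # typical edge GPU power, watts

# Energy cost of a single operation on each device at peak throughput.
j_per_op_gpx10 = p_gpx10 / gops   # ~0.16 femtojoules per operation
j_per_op_gpu = p_gpu / gops       # ~12 picojoules per operation

ratio = p_gpu / p_gpx10
print(f"power ratio at equal throughput: {ratio:,.0f}x")
# prints: power ratio at equal throughput: 75,000x
```

On these numbers the per-operation energy gap is far wider than the conservative "more than 100x" figure the article states for like-for-like comparisons.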
In other words, always-on AI inference at the edge, for functions such as keyword spotting, object recognition, and anomaly detection, is now possible at low latency and with a power profile suitable for operation on a small battery power supply. The GPX10 and GPX10 Pro chips are already being designed into numerous embedded edge products including:
- Smart rings
- Smart footwear
- Smart helmets
- Wearable health monitoring devices
- Smart watches
- Industrial machines
- Livestock monitoring and other agricultural equipment
Rich development ecosystem supports electronics system designers
Embedded system developers working on these and other products can take advantage of an ecosystem of tools and resources to enable them to realize new design ideas.
The GPX family of processors is compatible with the main machine learning frameworks, including TensorFlow, PyTorch, Keras and ONNX. The Ambient Scientific software development kit (SDK) for GPX devices includes a full model training toolchain.
Once the application’s model has been trained, the developer uses Nebula, Ambient Scientific’s AI-centric, Eclipse-based integrated development environment (IDE) for GPX SoCs. This includes a tool for compiling AI models to MX8 cores, as well as tools for configuring middleware – device drivers, real-time operating system and so on – to run on the device’s Arm Cortex-M4F core.
This means that developers who choose to base edge AI designs on the GPX10 or GPX10 Pro can use familiar platform software for model development and achieve the same design productivity with the Ambient Scientific IDE as they are used to with conventional microcontrollers.
Fundamental silicon innovation achieves performance breakthrough
The Ambient Scientific story, then, is one of radical innovation: once it is accepted that AI is a fundamentally different type of computing operation from the general-purpose compute performed historically by microprocessors, it follows that AI applications require a completely different kind of compute function.
By realizing in silicon a new kind of AI-centric compute function, Ambient Scientific has been able to provide the combination of power and performance improvements that are required to enable true AI applications to run at the edge on battery power. While other types of AI processor products, whether MCUs, NPUs or GPUs, will continue to be hamstrung by the inherent inefficiency of their compute architecture, Ambient Scientific will scale its AI-native MX8 cores to cover the needs of AI applications from the edge to the cloud.