
Analog Matrix Computing: A Breakthrough for High-Performance Edge AI on Battery Power

By GP Singh

CEO and Founder

Ambient Scientific

October 30, 2025


Often in electronics system design, the parameters of power and performance are traded off against each other: an improvement in one parameter weakens the other.

In implementing AI inference on battery-powered edge devices, however, developers want more performance and less power consumption at the same time: latency determines the quality of the user experience, but the processor performance required for a near real-time response must not come at the expense of the time between battery charges. It’s a problem that conventional microcontrollers and processors have had limited success in solving.

There’s a fundamental physical reason for this, and it is rooted in the architecture of the silicon which performs the compute function in MCUs and many types of processors. The classic general-purpose compute block in digital chips today operates by fetching data from memory, processing it through an arithmetic logic unit, and writing the result back to another memory. Performance and power consumption depend on the speed of the memory and its proximity to the logic unit, but the classic general-purpose compute silicon architecture constrains chip designers’ ability to optimize either parameter.

This architecture also prevents chip designers from matching the topology of the compute system to the structure of a neural network, leaving instruction set architectures (ISAs) that are grossly inefficient for neural network processing.

Now a new type of AI processor, drawing on fundamental silicon innovations, has provided a breakthrough in power and performance that makes on-device edge AI genuinely possible on battery power. This is the story of how a new AI-native compute architecture provides power/performance gains of more than two orders of magnitude over conventional microcontrollers and processors.

Fundamental new building block of an AI-native processor

There are two basic problems in the general-purpose compute function implemented in the main types of digital processor used for neural network processing today, whether it is a CPU, a neural processing unit (NPU), or a graphics processing unit (GPU):

  1. The processor operates by fetching data from memory – typically DRAM or SRAM – manipulating the data in an arithmetic logic unit (ALU), and writing the processed data back to memory. Both fetching data from memory and writing it back are highly wasteful of time and power.
  2. Massively parallel neural network operations, which consist mostly of multiply-accumulate (MAC) functions, map poorly onto the ISAs of conventional processors. Compiling a neural network’s MAC operations to a conventional ISA wastes a large share of processor cycles, as the sketch after this list illustrates.
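
To make the first problem concrete, here is a minimal cycle-accounting sketch in Python. The per-operation cycle costs are illustrative assumptions, not measurements of any real processor; the point is simply that when every operand travels through memory, most cycles go to data movement rather than arithmetic.

```python
# Illustrative cycle accounting for a single MAC operation.
# All cycle costs below are assumptions chosen for illustration,
# not measurements of any specific processor.

CYCLES = {
    "load_activation": 4,  # fetch activation from SRAM/DRAM (assumed)
    "load_weight": 4,      # fetch weight (assumed)
    "multiply": 1,         # ALU multiply
    "accumulate": 1,       # ALU add
    "store_result": 4,     # write partial sum back to memory (assumed)
}

def load_store_mac_cycles() -> int:
    """Cycles per MAC when every operand moves through memory."""
    return sum(CYCLES.values())

def in_memory_mac_cycles() -> int:
    """Cycles per MAC when weights and partial sums stay resident
    in the compute cell: only the useful arithmetic remains."""
    return CYCLES["multiply"] + CYCLES["accumulate"]

print(load_store_mac_cycles())  # 14 cycles, of which 12 are data movement
print(in_memory_mac_cycles())   # 2 cycles of actual arithmetic
```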

So optimizing the compute architecture for AI operations requires different approaches both to data access and to the topology of the compute function. This is how the innovations in the GPX family of processors have enabled Ambient Scientific to achieve improvements of two orders of magnitude in power and performance.

The first innovation is the fundamental compute building block, the analog MAC unit (see Figure 1). This block provides a viable arrangement for co-locating data processing with memory, enabling the creation of in-memory computing cells, which eliminate the need to fetch data from and send data to external DRAM or SRAM.

As well as reducing the size of the MAC circuitry compared to a conventional digital compute block, the in-memory arrangement also eliminates the long wires that shuttle data between memory and logic, greatly reducing latency and power consumption. Alongside this innovation, Ambient Scientific has implemented a 3D memory structure on-chip, increasing the rate at which the analog MAC unit can process the large number of operands in a neural network’s matrix computations.

Fig. 1: the analog MAC unit enables processing of MAC operations without needing to access external DRAM memory
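
As a behavioral illustration of the in-memory idea (a sketch of the dataflow, not of the analog circuit itself), the Python below models a weight-stationary MAC cell: the weight and the partial sum live inside the cell, so performing a MAC generates no external memory traffic.

```python
# Behavioral model of a weight-stationary in-memory MAC cell.
# This sketches the dataflow only; the real cell is analog circuitry.

class InMemoryMACCell:
    """Holds one weight and a local accumulator inside the cell."""

    def __init__(self, weight: float) -> None:
        self.weight = weight  # programmed once, then resident in the cell
        self.acc = 0.0        # partial sum also stays local

    def mac(self, activation: float) -> None:
        # Multiply the incoming activation by the resident weight and
        # accumulate locally: no fetch from or store to external memory.
        self.acc += activation * self.weight

# A row of cells computes a dot product as activations arrive.
weights = [0.5, -1.0, 2.0]
activations = [1.0, 2.0, 3.0]
cells = [InMemoryMACCell(w) for w in weights]
for cell, x in zip(cells, activations):
    cell.mac(x)
print(sum(c.acc for c in cells))  # 0.5*1 - 1*2 + 2*3 = 4.5
```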

Silicon mapped to the topology of neural networks

The second key element of the silicon innovations introduced by Ambient Scientific is the topology of the processing blocks, which are configured as a matrix computer. A typical neural network, for example, can be represented as a 1x32x8 matrix (see Figure 2).

Fig. 2: a neural network can typically be represented as a matrix of input values and weights
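
Reading the 1x32x8 example as a 1x32 row of input activations multiplied by a 32x8 weight matrix (an assumption about the figure’s notation), one layer reduces to a single matrix product:

```python
import numpy as np

# One layer of the 1x32x8 example as a matrix product. Random values
# stand in for trained inputs and weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 32))  # 1x32 input activations
W = rng.standard_normal((32, 8))  # 32x8 weight matrix
y = x @ W                         # 1x8 output: 32 * 8 = 256 MACs
print(y.shape)                    # (1, 8)
```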

This is mirrored in the structure of Ambient Scientific’s DigAn™ matrix computer, which is assembled in silicon from multiple analog MAC blocks (see Figure 3).

Fig. 3: a single matrix compute block

Multiple matrix computers are then connected in layers to match the layering of a neural network (see Figure 4).

Fig. 4: a multi-layer DigAn matrix computer

This is the physical realization of an AI-native processor, and the results are astonishing: 32 layers of a typical 1x32x8 neural network matrix would require 1,235,200 cycles on a conventional compute architecture. A DigAn matrix computer completes the same work in just 32 cycles – one cycle per layer.
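
A back-of-envelope check of those numbers, again assuming 1x32x8 means 32 inputs and 8 outputs per layer: the network performs 32 x 256 = 8,192 MACs in total, so the quoted conventional figure implies roughly 151 cycles per MAC – consistent with the fetch and store overheads described earlier.

```python
# Back-of-envelope check of the cycle counts quoted above,
# assuming "1x32x8" means 32 inputs x 8 outputs per layer.

layers = 32
macs_per_layer = 32 * 8               # 256 MACs per layer
total_macs = layers * macs_per_layer  # 8,192 MACs for the network

conventional_cycles = 1_235_200       # figure quoted in the text
print(conventional_cycles / total_macs)  # ~150.8 cycles implied per MAC

# Each layer's 256 MACs execute in parallel in the matrix computer,
# so the network takes one cycle per layer:
digan_cycles = layers                 # 32 cycles
print(digan_cycles)
```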

So the silicon innovations developed by Ambient Scientific transform AI processing in two different but complementary ways:

  • They make compute operations faster and more efficient by performing them in in-memory compute blocks
  • And they dramatically reduce the number of compute operations required for any given neural network task

Ultra-low power matrix computing SoC for edge AI applications

To implement this matrix computing architecture at the chip level, Ambient Scientific has developed AI processor cores, called MX8 units (see Figure 5). These cores provide a highly scalable system architecture, enabling Ambient Scientific to implement its matrix computing silicon in every type of device, from small 10-core edge AI systems-on-chip (SoCs) up to large processors containing as many as 2,000 cores for use in data center servers.

Fig. 5: the MX8 AI processor core implements the AI-centric DigAn instruction set
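
Assuming throughput scales linearly with core count (an idealization for illustration) and back-deriving per-core throughput from the GPX10 figure quoted below (512 GOPS across 10 cores), the quoted 10-to-2,000-core range spans roughly three decades of performance:

```python
# Naive linear-scaling sketch across the quoted MX8 core counts.
# Per-core throughput is back-derived from the GPX10 (512 GOPS / 10
# cores); perfectly linear scaling is an assumption for illustration.

GOPS_PER_CORE = 512 / 10  # ~51.2 GOPS per MX8 core (assumed)

for cores in (10, 100, 2_000):
    tops = cores * GOPS_PER_CORE / 1_000
    print(f"{cores:>5} cores -> ~{tops:.1f} TOPS")
# 10 cores -> ~0.5 TOPS ... 2000 cores -> ~102.4 TOPS
```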

The first chips in production to feature the MX8 cores are the GPX10 and the new GPX10 Pro edge AI SoCs. These ultra-low power devices are fully integrated AI controllers, featuring 10 DigAn cores, a multi-channel ADC, sensor fusion capability to connect up to 10 analog and digital sensors simultaneously, and an Arm® Cortex®-M4F CPU core for executing non-AI workloads (see Figure 6).

Fig. 6: GPX10 AI processor block diagram

The difference that the DigAn matrix computing architecture makes in real-world applications is extraordinary: peak AI performance of the GPX10 is 512 GOPS, comparable to the performance of popular edge-oriented GPUs, and far higher than the performance available from conventional microcontrollers on the market today.

But the power consumption is orders of magnitude lower than that of GPUs intended for use at the edge: around 80µW for peak AI performance, compared to 6W for edge GPUs. Essentially, what the GPX10 and GPX10 Pro offer is either more than 100x higher AI performance than a typical MCU with similar power consumption, or the same performance as a typical low-end GPU with more than 100x lower power consumption.
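
Dividing throughput by power turns those quoted figures into an efficiency comparison; the arithmetic below simply takes the numbers above at face value.

```python
# Throughput-per-watt arithmetic on the figures quoted above,
# taking the quoted peak numbers at face value.

gpx10_gops, gpx10_watts = 512, 80e-6  # 512 GOPS at ~80 uW peak
gpu_gops, gpu_watts = 512, 6.0        # edge GPU at comparable throughput

print(f"GPX10:    {gpx10_gops / gpx10_watts:.3g} GOPS/W")  # 6.4e+06
print(f"edge GPU: {gpu_gops / gpu_watts:.3g} GOPS/W")      # 85.3
```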

In other words, always-on AI inference at the edge, for functions such as keyword spotting, object recognition, and anomaly detection, is now possible at low latency and with a power profile suitable for operation on a small battery power supply. The GPX10 and GPX10 Pro chips are already being designed into numerous embedded edge products including:

  • Smart rings
  • Smart footwear
  • Smart helmets
  • Wearable health monitoring devices
  • Smart watches
  • Industrial machines
  • Livestock monitoring and other agricultural equipment

Rich development ecosystem supports electronics system designers

Embedded system developers working on these and other products can take advantage of an ecosystem of tools and resources to enable them to realize new design ideas.

The GPX family of processors is compatible with the main machine learning frameworks, including TensorFlow, PyTorch, Keras and ONNX. The Ambient Scientific software development kit (SDK) for GPX devices includes a full model training toolchain.
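
The GPX-specific compile step happens inside Nebula and is not shown here; as an illustration of the kind of artifact such a toolchain consumes, the sketch below defines a placeholder PyTorch model and exports it to ONNX with the standard torch.onnx.export call. The model, shapes, and file name are hypothetical.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Placeholder two-layer network standing in for a trained model."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyClassifier().eval()
dummy_input = torch.randn(1, 32)                     # example input shape
torch.onnx.export(model, dummy_input, "model.onnx")  # standard ONNX export
```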

Once the application’s model has been trained, the developer uses Nebula, Ambient Scientific’s AI-centric, Eclipse-based integrated development environment (IDE) for GPX SoCs. This includes a tool for compiling AI models to MX8 cores, as well as tools for configuring middleware – device drivers, real-time operating system and so on – to run on the device’s Arm Cortex-M4F core.

This means that developers who choose to base edge AI designs on the GPX10 or GPX10 Pro can use familiar platform software for model development and achieve the same design productivity with the Ambient Scientific IDE as they are used to with conventional microcontrollers.

Fundamental silicon innovation achieves performance breakthrough

The Ambient Scientific story, then, is one of radical innovation: once it is accepted that AI is a fundamentally different type of computing operation from the general-purpose compute performed historically by microprocessors, it follows that AI applications require a completely different kind of compute function.

By realizing in silicon a new kind of AI-centric compute function, Ambient Scientific has been able to provide the combination of power and performance improvements that are required to enable true AI applications to run at the edge on battery power. While other types of AI processor products, whether MCUs, NPUs or GPUs, will continue to be hamstrung by the inherent inefficiency of their compute architecture, Ambient Scientific will scale its AI-native MX8 cores to cover the needs of AI applications from the edge to the cloud.
