MCU AI

Embedded AI on Microcontrollers

Running machine learning inference on STM32, ESP32, and ARM Cortex-M microcontrollers — the same hardware that manages sensors, actuators, and communication. No external AI accelerator required.

The Engineering Problem

MCU-class inference is not a matter of shrinking a cloud model and flashing it to hardware. The constraints are severe and non-negotiable: a microcontroller with 256KB of RAM cannot run a model that requires 4MB of working memory, regardless of how the problem is framed.
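A back-of-envelope version of that memory check can be sketched as follows. This is a minimal estimate, not a profiler: it assumes a purely sequential network with weights kept in flash, and it counts only the input and output buffers of the layer that is currently running. The layer shapes and the 64 KB firmware reservation are hypothetical values chosen for illustration; a real runtime adds scratch buffers and bookkeeping on top of this figure.

```python
def peak_activation_bytes(tensor_elems, bytes_per_elem=1):
    """Estimate peak activation memory for a purely sequential network.

    tensor_elems[0] is the element count of the input tensor; each later
    entry is the output of the next layer. While a layer runs, its input
    and output buffers must both be resident, so the peak is the largest
    adjacent pair.
    """
    pairs = zip(tensor_elems, tensor_elems[1:])
    return max((a + b) * bytes_per_elem for a, b in pairs)


def fits_in_ram(tensor_elems, ram_bytes, firmware_ram_bytes, bytes_per_elem=1):
    """True if peak activations fit in the RAM left over after firmware use."""
    available = ram_bytes - firmware_ram_bytes
    return peak_activation_bytes(tensor_elems, bytes_per_elem) <= available


# Hypothetical INT8 CNN: 96x96x1 input, three downsampling stages, 10 classes.
HYPOTHETICAL_TENSORS = [96 * 96 * 1, 48 * 48 * 8, 24 * 24 * 16, 12 * 12 * 32, 10]

if __name__ == "__main__":
    peak = peak_activation_bytes(HYPOTHETICAL_TENSORS)
    print(f"peak activations: {peak} bytes")
    print("fits in 256 KB with 64 KB reserved for firmware:",
          fits_in_ram(HYPOTHETICAL_TENSORS, 256 * 1024, 64 * 1024))
```

Passing bytes_per_elem=4 gives the same estimate for a float32 model, which makes the cost of skipping quantization visible before any training is done.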

Effective embedded AI engineering starts from the hardware constraints and works backward to the model architecture. Target accuracy is balanced against memory footprint, inference latency, and power consumption from the beginning of the model development process — not at the end.

WIRL Engineering designs embedded AI systems where the MCU selection, firmware architecture, model design, and quantization strategy are co-optimized. Each component is chosen in the context of the others.

Engineering Challenges
01  Flash and RAM measured in kilobytes: model size must match the hardware reality
02  No floating-point unit on lower-tier MCUs: INT8 quantization is not optional
03  Single-core sequential execution: inference must coexist with firmware task scheduling
04  No operating system: model runtime must work in bare-metal or RTOS environments
05  Power consumption: inference duty cycles must fit within battery budgets
06  Temperature and environmental variation: model accuracy must hold across deployment conditions
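The power constraint in item 05 comes down to simple arithmetic on the duty cycle. The sketch below computes a time-weighted average current and an idealized battery life. All numbers in the example (40 mA active, 10 µA sleep, 50 ms per inference, one inference every 10 s, a 1000 mAh cell) are hypothetical, and the model ignores wake-up transients, regulator losses, and battery self-discharge.

```python
def average_current_ma(active_ma, sleep_ma, active_s, period_s):
    """Time-weighted average current for one wake/infer/sleep cycle."""
    if not 0 < active_s <= period_s:
        raise ValueError("active time must be positive and no longer than the period")
    duty = active_s / period_s
    return active_ma * duty + sleep_ma * (1.0 - duty)


def battery_life_hours(capacity_mah, avg_ma):
    """Idealized battery life: capacity divided by average draw."""
    return capacity_mah / avg_ma


if __name__ == "__main__":
    # Hypothetical figures; substitute measurements from the target board.
    avg = average_current_ma(active_ma=40.0, sleep_ma=0.01,
                             active_s=0.05, period_s=10.0)
    hours = battery_life_hours(1000.0, avg)
    print(f"average current: {avg:.5f} mA")
    print(f"estimated life: {hours / 24:.0f} days")
```

In this example the active phase dominates the average even at a 0.5% duty cycle, which is why inference latency and power budget are usually tuned together.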
Scope of Work
  • Model architecture selection for MCU resource constraints
  • TensorFlow Lite for Microcontrollers deployment
  • INT8 and INT4 post-training quantization
  • CMSIS-NN kernel optimization for Cortex-M targets
  • Edge Impulse project development and export
  • Bare-metal and RTOS inference integration
  • Flash footprint and RAM profiling
  • Inference latency benchmarking on target hardware
  • Production firmware integration and testing
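To make the INT8 post-training quantization item concrete, here is a minimal sketch of the affine mapping commonly used for 8-bit tensors, real ≈ scale × (q − zero_point) with q in [−128, 127]. It is plain Python for illustration and not the API of any particular toolchain; the function names and the example range of [−1.0, 3.0] are assumptions made for this sketch.

```python
QMIN, QMAX = -128, 127  # signed 8-bit range


def choose_qparams(rmin, rmax):
    """Pick scale and zero point so that [rmin, rmax] maps onto int8.

    The range is widened to include 0.0 so that zero is exactly
    representable (useful for padding and ReLU outputs).
    """
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmin == rmax:
        return 1.0, 0  # degenerate all-zero tensor
    scale = (rmax - rmin) / (QMAX - QMIN)
    zero_point = round(QMIN - rmin / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(x, scale, zero_point):
    """Map a real value to int8, saturating at the range limits."""
    q = round(x / scale) + zero_point
    return max(QMIN, min(QMAX, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from its int8 code."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    scale, zp = choose_qparams(-1.0, 3.0)
    for x in (-1.0, -0.3, 0.0, 1.7, 3.0):
        q = quantize(x, scale, zp)
        print(f"{x:+.3f} -> q={q:4d} -> {dequantize(q, scale, zp):+.4f}")
```

The round-trip error for in-range values is bounded by roughly one quantization step (the scale), so the calibration range chosen from representative data directly sets the precision the model has to work with.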
Supported Hardware Targets
ESP32-S3
Dual-core Xtensa LX7, 8MB PSRAM available, vector (SIMD) instructions that accelerate neural network workloads
STM32H7
Cortex-M7 @ 480MHz, DTCM RAM, hardware FPU, ART Accelerator
Nordic nRF9160
Cortex-M33, integrated LTE-M/NB-IoT modem, low-power design
STM32U5 / Cortex-M33
TrustZone, ultra-low-power run modes; Ethos-U55 NPU support on Cortex-M33 devices that integrate one (the STM32U5 itself has no NPU)
RP2040
Dual Cortex-M0+, PIO for sensor interfacing, cost-effective deployment
Technology Stack
  • TensorFlow Lite for MCUs
  • Edge Impulse SDK
  • CMSIS-NN
  • ONNX Runtime
  • C / C++
  • ARM Compiler
  • FreeRTOS
  • Zephyr
  • Python (training)
  • TVM (optional)

AI Integration for Your Hardware Platform

Discuss your MCU target and application requirements with our engineering team.