IET-4: Edge AI and Efficient Intelligence
Thu, 7 May, 09:00 - 11:00 (UTC +2)
Location: Auditorium
IET-4.1: Signal Processing Inspired AI for Sensing - an Industry Perspective
Sensing is key to creating new perception modalities in cyber-physical system (CPS) applications: it provides the right data, with relevant markers, for downstream AI inferencing applications. This is well understood in the context of IoT and CPS.
In this talk we introduce the concept of "AI for Sensing": the integration of advanced ML techniques at every stage of the sensing workflow, from transducer data acquisition/calibration to signal enhancement/denoising to signal representation/fusion, to meet the diverse sensitivity, specificity, resolution, and dynamic-range requirements of a given sensor type. This is a sensor-specific yet application-agnostic soft-sensing pipeline that can be computed on board the sensing device. It must be lightweight enough to be embedded in the sensing device and low-latency enough to enable closed-loop acquisition and calibration strategies such as adaptive sampling, auto-gain control, auto-filtering, beam steering, and auto-calibration.
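As a rough illustration of what such a closed-loop, on-device stage might look like, the sketch below (in Python/NumPy, with a hypothetical read_frame driver hook and a trivial placeholder denoiser, not the speaker's actual implementation) acquires a frame, denoises it, and adapts the acquisition gain:

    # Minimal sketch of one closed-loop acquisition/calibration step.
    # read_frame(gain) is a hypothetical transducer driver hook; the denoiser
    # is a trivial stand-in for an embedded ML/DNN model.
    import numpy as np

    def denoise(frame: np.ndarray) -> np.ndarray:
        """Stand-in for an embedded ML/DNN denoiser (here: a short moving average)."""
        kernel = np.ones(5) / 5.0
        return np.convolve(frame, kernel, mode="same")

    def snr_db(raw: np.ndarray, clean: np.ndarray) -> float:
        """Estimate SNR of the raw frame against the denoised reference."""
        noise = raw - clean
        return 10.0 * np.log10(np.mean(clean ** 2) / (np.mean(noise ** 2) + 1e-12))

    def closed_loop_step(read_frame, gain: float, target_snr_db: float = 20.0):
        """Acquire -> denoise -> adapt gain (auto-gain control) for the next frame."""
        raw = read_frame(gain)                    # transducer data acquisition
        clean = denoise(raw)                      # signal enhancement / denoising
        if snr_db(raw, clean) < target_snr_db:    # simple auto-gain-control rule
            gain = min(gain * 1.25, 64.0)         # boost gain, clamp to a hardware range
        return clean, gain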
The talk will cover ML/DNN-based signal enhancement/denoising; attention- and auto-encoder-based learning of embeddings for signal representation from signal features, followed by multi-modal early fusion; and RL-based closed-loop control for acquisition/calibration, and will give evidence that this pipeline is sensor-type specific rather than application dependent. It will present a novel idea of how such a pre-trained pipeline can be created in a lightweight, edge-deployable manner (low latency, low memory, low power), with examples from real-life industry applications, and how such a pipeline can be adapted to different makes/models/configurations of the same sensor type. The output signal representation is unsupervised or pre-trained and serves as a master sensor-specific feature representation, the equivalent of a token or vocabulary in the sensing context. It can be used by any downstream AI/ML model pipeline to learn application-specific features via application-focused supervised learning. The whole idea will be explained with real-life examples: ECG sensing for cardiac conditions; microwave-radar sensing for concealed/in-body imaging; acousto-optic sensing for heat susceptibility; quantum sensing for high-resolution, high-sensitivity electromagnetic fields; and nano-sensing-based physiological sensing for early disease screening.
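A toy sketch of the embedding and early-fusion idea, using PyTorch with arbitrary dimensions and randomly generated ECG/radar features as stand-ins (not the speaker's model), might look as follows:

    # Illustrative sketch: a tiny auto-encoder maps windowed sensor features to a
    # compact embedding (the sensing "token"), and embeddings from two modalities
    # are concatenated (early fusion) before a small supervised task head.
    import torch
    import torch.nn as nn

    class SensorAutoEncoder(nn.Module):
        def __init__(self, in_dim: int = 128, emb_dim: int = 16):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                         nn.Linear(64, emb_dim))
            self.decoder = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(),
                                         nn.Linear(64, in_dim))

        def forward(self, x):
            z = self.encoder(x)           # sensor-specific embedding
            return self.decoder(z), z     # reconstruction enables unsupervised training

    # Early fusion: concatenate modality embeddings, then learn application-specific
    # features with a small supervised head downstream.
    fusion_head = nn.Sequential(nn.Linear(16 + 16, 32), nn.ReLU(), nn.Linear(32, 4))

    ecg_ae, radar_ae = SensorAutoEncoder(), SensorAutoEncoder()
    ecg_feat = torch.randn(8, 128)        # batch of windowed ECG features (dummy data)
    radar_feat = torch.randn(8, 128)      # batch of windowed radar features (dummy data)
    _, z_ecg = ecg_ae(ecg_feat)
    _, z_radar = radar_ae(radar_feat)
    logits = fusion_head(torch.cat([z_ecg, z_radar], dim=-1))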
The talk will outline how such sensor-specific pipelines can lead to Sensor-Specific Intelligent Chips (SSICs) that embed both the transducer and an "AI for Sensing" pipeline spanning data acquisition/calibration, signal enhancement/denoising, and signal representation/fusion, and will explain why such systems can be of real value for deployable systems.
The talk will also briefly cover how AI can be used for the design of sensing systems, with an example of plasmonic sensor design for micro-plastics detection in water. It will present how a generative-AI-based system can be used for the structural design of complex plasmonic sensors.
AI for Sensing transforms traditional, handcrafted pipelines into intelligent, self-adapting, closed-loop systems, yielding software-defined, scalable, and explainable sensing representations that are application-agnostic and can be consumed by downstream application-specific systems. This intelligent sensing elevates the performance and reliability of sensing, and it uses signal-processing-based understanding of signal morphology to make the AI-based representation syntactically interpretable.
IET-4.2: Enabling End-to-End Ecosystem of Spatial-Temporal Gaussian Splatting
In the past few years, Gaussian Splatting has become the most promising volumetric video representation. For real-world scenarios, a great deal of research effort has gone into better spatial-temporal, multi-modal scene reconstruction and more effective deployment. In this talk, we will first introduce the fundamental representation of Gaussian Splatting, including its attributes, construction, rendering, and optimization. We will then overview recent developments in Gaussian Splatting along the end-to-end ecosystem, from content capture and content creation to content delivery and content consumption.
More specifically, at the content capture stage, we will address issues and solutions for multi-view camera setup in both the spatial and temporal domains to support high-quality model building. At the content creation stage, to enrich the multi-modal experience, audio and semantic attributes learned from foundation models such as CLIP and DINO are embedded into the Gaussian Splatting primitives to enable a joint audio-visual-semantic representation; volumetric video editing to enhance the perceived experience is also an important tool along the pipeline. At the content delivery stage, instead of explicitly coding dozens of attributes per Gaussian, we will discuss implicit methods, such as coarse geometry representations, 2D plane projection, and MLPs, that leverage conventional 2D video codecs to reach high photorealistic quality at streamable bit rates. At the content consumption stage, language- and semantics-guided methods are presented to enable interactive 3D scene navigation and efficient physics-aware multi-modal rendering.
At the end of the talk, we will present the latest international standardization efforts and highlight future research trends. In summary, the goal of this talk is to help ICASSP 2026 attendees understand the latest Gaussian Splatting technology in both theory and applications, and to spur more discussion of volumetric video research at ICASSP 2026. We hope the talk motivates attendees to identify potential research topics along these directions and inspires more innovative solutions and technical papers for ICASSP 2027.
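For orientation, the sketch below lists the per-primitive attributes of a 3D Gaussian Splatting representation following the commonly used formulation (position, rotation, scale, opacity, spherical-harmonic color), with an optional learned semantic embedding slot as discussed above. The field shapes are illustrative assumptions, not a standardized layout:

    # Minimal sketch of a single Gaussian Splatting primitive and its attributes.
    # Shapes are assumptions for illustration; the semantic field is an optional
    # CLIP/DINO-style embedding as described in the abstract.
    from dataclasses import dataclass
    from typing import Optional
    import numpy as np

    @dataclass
    class GaussianPrimitive:
        mean: np.ndarray        # (3,)  center position in world space
        rotation: np.ndarray    # (4,)  unit quaternion; with scale defines covariance
        scale: np.ndarray       # (3,)  per-axis standard deviations
        opacity: float          # alpha used in front-to-back compositing
        sh_coeffs: np.ndarray   # (16, 3) spherical-harmonic color (degree 3, RGB)
        semantic: Optional[np.ndarray] = None  # optional learned semantic embedding

        def covariance(self) -> np.ndarray:
            """Sigma = R diag(scale)^2 R^T, used when splatting to the image plane."""
            w, x, y, z = self.rotation
            R = np.array([
                [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
                [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
                [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
            ])
            S = np.diag(self.scale)
            return R @ S @ S @ R.T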
IET-4.3: Language Models on Microcontrollers: Achieving Cloud-Class AI in <32MB
Thirty billion microcontrollers ship annually, powering everything from industrial sensors to medical devices. Yet AI remains trapped in the cloud, inaccessible to the vast majority of embedded systems. We've broken this barrier.
Infineon's Nexus model family demonstrates that model efficiency begins with data, not just architecture. Our data curation pipeline, combining quality filtering, synthetic data generation, and strategic dataset composition, enables 8M-25M parameter models to achieve capabilities that typically require 10-100x more parameters. Leveraging Infineon's unique position in embedded silicon for hardware-software co-design, we developed novel quantization techniques and hardware-optimized attention mechanisms that achieved 3rd place globally on HuggingFace's LLM Edge leaderboard. Our 25M parameter model outperforms 1.5B parameter models and ranks behind only 2B parameter solutions, a 60-80x parameter efficiency advantage. Running entirely on microcontrollers with under 32MB of memory, this fundamentally changes what's possible at the edge.
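To see why the memory budget works out, the back-of-the-envelope sketch below estimates on-device model size under assumed weight precisions; the constants (int8/int4 precision, a 2 MB activation/KV allowance) are illustrative assumptions, not Infineon's published figures:

    # Rough memory estimate for a quantized language model on a microcontroller.
    # Assumed, not official: bits-per-weight choices and a flat 2 MB allowance for
    # activations/KV state.
    def model_memory_mb(params: float, bits_per_weight: int,
                        activations_mb: float = 2.0) -> float:
        weight_bytes = params * bits_per_weight / 8
        return weight_bytes / (1024 ** 2) + activations_mb

    print(model_memory_mb(25e6, 8))   # ~25.8 MB at int8  -> fits under 32 MB
    print(model_memory_mb(25e6, 4))   # ~13.9 MB at int4  -> leaves headroom
    print(model_memory_mb(25e6, 32))  # ~97.4 MB at fp32  -> would not fit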
This efficiency unlocks entirely new markets. Battery-powered industrial sensors now perform intelligent audio analysis for predictive maintenance, enabling multi-year operation without network connectivity. Medical wearables process patient speech on-device with sub-50ms latency, maintaining HIPAA compliance while enabling real-time health monitoring. Smart home devices achieve always-on voice activation at minimal power consumption, impossible with cloud-dependent solutions. Consumer devices with $1-5 BOMs gain AI capabilities previously reserved for premium products. These aren't theoretical: pilot deployments are currently validating these capabilities across industrial, consumer, and medical applications.
Our architecture extends beyond text to simultaneously enable speech-to-text, text-to-speech, and audio classification, all on the same PSOC™ Edge platform. A single microcontroller can understand voice commands, generate speech responses, and classify environmental sounds concurrently, transforming passive sensors into intelligent systems capable of rich environmental understanding through multiple modalities.
We present the complete pipeline: curated training datasets specifically designed for parameter efficiency, distributed training frameworks optimized for small-scale models, aggressive quantization and optimization techniques, and seamless deployment tooling for embedded conversion. This end-to-end workflow enables rapid iteration from research to production-ready firmware, addressing the critical gap that has historically prevented edge AI adoption at scale.
The implications extend beyond technical achievement. When AI inference costs approach zero and inference runs entirely offline, new business models emerge. Privacy-critical applications become viable. Battery-powered devices gain intelligence without infrastructure dependencies. This presentation demonstrates that the future of AI isn't solely about frontier models; it's about making sophisticated intelligence accessible everywhere, enabling billions of existing devices to gain capabilities previously impossible within their price points, power budgets, and connectivity constraints.