IET-2: Audio Technology and Consumer Audio Innovation
Wed, 6 May, 09:00 - 11:00 (UTC +2)
Location: Auditorium
IET-2.1: Immersive Audio via Headphones: Status and New Solutions
Currently, nearly all audio productions for movies and music are mastered both to traditional two-channel stereo and to newer multichannel formats such as Dolby Atmos, MPEG-H/Sony-360 and Eclipsa Audio. Until now, listening to these formats has required many loudspeakers (e.g., 12 in a 7.1.4 arrangement). There has been work on headphone reproduction of immersive audio, but current software-based systems pale in comparison with a loudspeaker-based solution.
The talk will introduce the basics of spatial audio, including standards such as MPEG-H, and review the advancements in headphone-based reproduction made over the last 50 years.
The presenter's company introduced a state-of-the-art headphone-based system in early 2025, which excels in the plausibility of sound reproduction via headphones. It is currently targeted at professional users in mixing studios and post-production facilities, as well as at schools educating the next generation of audio engineers. A second generation of these products is already planned for listeners who want the best possible audio fidelity from a headphone-based solution.
A headphone system with the best plausibility should reproduce multichannel sound in a room such that the listener does not have the feeling of wearing headphones at all. This requires rendering virtual sound sources so that they are nearly indistinguishable from real sources, such as loudspeakers in the room.
Technically, this process relies on measuring the room’s acoustics so that rendering can be performed using a simplified model of the room. Additionally, fast and accurate 6DoF (six degrees of freedom) head tracking is required, allowing the algorithm to estimate the room’s impulse responses from a virtual sound source to the listener’s headphone position. Thus, the main cues necessary to trick the human brain into perceiving virtual sources as real are reconstructed as needed for a plausible reproduction of sounds via headphones.
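As a rough illustration of the rendering step, the sketch below (Python, not the presenter's implementation) convolves a mono virtual-loudspeaker signal with binaural room impulse responses (BRIRs) chosen from the head-tracked angle; the `brirs` dictionary indexed by azimuth is a hypothetical stand-in for the measured or modeled room data.

```python
# Minimal sketch (not the presenter's system): dynamic binaural rendering that
# convolves a mono virtual-loudspeaker signal with BRIRs (binaural room impulse
# responses) selected from the head-tracked angle. The `brirs` dictionary,
# keyed by azimuth in degrees and holding (h_left, h_right) pairs, is a
# hypothetical stand-in for measured or modeled room data.

import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, source_az_deg, head_yaw_deg, brirs):
    """Render one virtual loudspeaker for the current head orientation."""
    # Angle of the virtual loudspeaker relative to the listener's nose.
    rel_az = (source_az_deg - head_yaw_deg) % 360
    # Pick the nearest measured BRIR; real systems interpolate and use the
    # full 6DoF pose, not just yaw.
    nearest = min(brirs, key=lambda az: min(abs(az - rel_az), 360 - abs(az - rel_az)))
    h_l, h_r = brirs[nearest]
    return np.stack([fftconvolve(mono, h_l), fftconvolve(mono, h_r)])
```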
IET-2.2: Building Dolby Atmos FlexConnect: From Research Project to Product
Dolby Atmos FlexConnect (DAFC) is an adaptive home-audio solution designed to operate with an arbitrary number of loudspeakers placed freely throughout a room, without requiring predefined layouts. The system begins by performing acoustic mapping to localize each speaker, estimating its position, distance, and orientation. These estimates feed into a flexible rendering framework that renders the Dolby Atmos or multichannel soundtrack in real time to maintain a faithful and stable soundstage across highly asymmetric speaker configurations.
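To make the mapping-plus-compensation idea concrete, here is a generic illustration (not the FlexConnect algorithm itself): estimating each speaker's distance from the direct-path peak of a measured impulse response and deriving simple delay and gain trims; the measurement method and constants are assumptions.

```python
# Illustrative sketch only, not the FlexConnect algorithm: estimate a speaker's
# distance from the direct-path peak of a measured room impulse response, then
# derive per-speaker delay and gain trims so all arrivals are time- and
# level-aligned at the listening position. Constants and the measurement
# method are assumptions.

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def distance_from_ir(impulse_response, fs, system_latency_s=0.0):
    """Direct-path time of flight converted to distance in metres."""
    direct_idx = int(np.argmax(np.abs(impulse_response)))
    tof = direct_idx / fs - system_latency_s
    return max(tof, 0.0) * SPEED_OF_SOUND

def alignment_trims(distances_m, fs):
    """Delay (samples) and gain (dB) trims relative to the farthest speaker."""
    d = np.asarray(distances_m, dtype=float)
    delays = np.round((d.max() - d) / SPEED_OF_SOUND * fs).astype(int)
    gains_db = 20.0 * np.log10(d / d.max())  # simplified 1/r level compensation
    return delays, gains_db
```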
A significant portion of the DAFC effort centered on developing algorithms that remain reliable under practical, uncontrolled conditions. The mapping pipeline had to handle uncertainty, ambiguities, reflections, and occasionally contradictory information arising from real acoustic environments. The rendering stage required preserving the spatial and timbral characteristics of the original mix despite irregular geometries and variable device capabilities. In both areas, we employed a combination of signal‑processing methods, optimization‑based approaches, and data‑driven techniques to find the most appropriate solution for each problem. Much of the algorithmic insight arose from exploring how to generalize solutions beyond ideal test settings and how to design behaviors that degrade gracefully when assumptions inevitably break.
Equally important were the engineering and product‑focused challenges that shaped the final system. Bringing DAFC from a research prototype to a deployable technology required adapting algorithms to embedded hardware with strict constraints on compute, memory, and power, all influenced by device Bill of Materials (BOM) costs. Integration with partner devices introduced variability in acoustics, transducer performance, and wireless synchronization characteristics. This demanded extensive stress testing, data collection, and automated evaluation across diverse room configurations. These constraints guided algorithmic simplifications, robustness strategies, and the overall system architecture.
By presenting both the algorithmic foundations and the practical steps required to transform them into a shipping product, the talk aims to provide ICASSP attendees with insights relevant to real‑world algorithmic and engineering work: how theoretical approaches evolve under practical pressures, how robustness becomes a central design principle, and how interdisciplinary iteration enables the deployment of complex audio technology in everyday environments. The goal is to convey lessons that are technically grounded, broadly applicable, and motivating to researchers developing systems intended for real use.
IET-2.3: From ANC to Blood Pressure: How Earbuds Are Becoming Multimodal Health Sensors
True wireless earbuds have become one of the most ubiquitous computing platforms we wear—yet we still mostly use them for audio. This talk provides a general overview of the emerging technology area of in-ear sensing and explains why earables are poised to follow the smartwatch trajectory: from convenience features to sensor-first, health-oriented systems. The ear is a particularly attractive measurement site because it combines stable skin contact, rich local vasculature, natural vibration damping from the musculoskeletal system, and a built-in acoustic interface for privacy-preserving feedback and just-in-time interventions.
I will survey the field from early earable computing platforms (IMU + microphone) that enabled head-gesture, activity, diet (chewing/drinking), and facial-expression inference using lightweight time-frequency features and compact classifiers, to modern multimodal earables that add optical biosensing (PPG), temperature, multiple microphones, storage, and on-device machine learning. For the signal processing community, the key point is that earables are constrained, real-time, multi-rate sensing systems where algorithm design must co-optimize accuracy, latency, memory, and energy. I will discuss architecture: low-power scheduling, on-device fusion, and privacy-preserving processing that avoids cloud dependence.
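As a flavour of how lightweight such pipelines can be, the sketch below shows short-time features from a 3-axis accelerometer window feeding a compact classifier; the window length, feature set, and model are illustrative choices rather than any specific published system.

```python
# Hedged sketch of an early-earable style pipeline: short-time features from a
# 3-axis accelerometer window feeding a compact classifier. Window length,
# feature set, and model choice are illustrative, not a specific system.

import numpy as np
from sklearn.linear_model import LogisticRegression

def imu_features(window):
    """window: (n_samples, 3) accelerometer block -> small feature vector."""
    mag = np.linalg.norm(window, axis=1)
    spec = np.abs(np.fft.rfft(mag - mag.mean()))
    return np.concatenate([
        window.mean(axis=0), window.std(axis=0),              # per-axis time stats
        [mag.mean(), mag.std(), spec.argmax(), spec.max()],   # coarse spectral cues
    ])

# Usage (with labelled windows): X = np.array([imu_features(w) for w in windows])
clf = LogisticRegression(max_iter=1000)  # small enough to run on-device
# clf.fit(X, labels); clf.predict(imu_features(new_window)[None, :])
```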
The core technical deep dive is in-ear photoplethysmography (PPG). PPG can yield heart rate (HR), heart-rate variability (HRV), oxygen saturation (SpO₂), and respiration rate (RR) from a small number of LEDs and a photodiode, but in-ear deployment introduces unique constraints: anatomical variability, comfort, seal quality, ambient-light leakage, and motion artifacts. I will outline an end-to-end vital-sign extraction pipeline (bandpass filtering, normalization, peak detection, AC/DC component estimation, and windowed estimation), then zoom into the most consequential design choice: placement behind-the-ear (BTE), in-the-ear (ITE), or in-the-canal (ITC). Using controlled recordings across rest and motion (speaking, walking, running), I will show why ITC placement typically reduces error variability for HR/HRV/SpO₂ via stronger skin–sensor adhesion and improved ambient-light shielding, while also emphasizing the remaining Achilles’ heel: motion artifacts that can drive errors from about 15% (speaking) up to about 30% (running). This motivates co-design of ear-tip mechanics, seal-quality estimation, and artifact suppression.
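The following sketch illustrates the filtering, peak-detection, and AC/DC steps listed above with generic parameters; the actual pipeline, band edges, and SpO₂ calibration constants used in the talk are not reproduced here.

```python
# Minimal sketch of the steps listed above (bandpass filtering, peak detection,
# AC/DC estimation) with generic parameters; the talk's actual pipeline and the
# SpO2 calibration constants A, B are placeholders that must be fit per device.

import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def heart_rate_bpm(ppg, fs):
    """Heart rate from one PPG channel via bandpass filtering + peak detection."""
    b, a = butter(2, [0.5, 4.0], btype="band", fs=fs)        # roughly 30-240 bpm
    filtered = filtfilt(b, a, ppg)
    peaks, _ = find_peaks(filtered, distance=int(0.3 * fs))  # ~0.3 s refractory
    ibi = np.diff(peaks) / fs                                # inter-beat intervals
    return 60.0 / ibi.mean()

def spo2_ratio_of_ratios(red, ir):
    """Classic AC/DC ratio-of-ratios estimate from red and infrared channels."""
    r = (np.std(red) / np.mean(red)) / (np.std(ir) / np.mean(ir))
    A, B = 110.0, 25.0  # example calibration constants, device-specific
    return A - B * r
```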
Finally, I will highlight a novel multimodal route to cuffless blood pressure (BP) sensing from a single earbud: combining in-ear PPG with an in-ear microphone that can capture attenuated heart sounds (S1/S2) when the ear canal is well sealed (occlusion effect). Because acoustic propagation through tissue is far faster than blood flow, timing features such as vascular transit time (S1→PPG peak) and ejection time (S1→S2) become measurable from one location. I will describe a low-compute, time-domain pipeline for marker extraction and a personalized calibration procedure; in a pilot with 10 healthy participants and induced BP changes (slow breathing and cold pressor), we observed SBP MAE = 2.50 ± 2.20 mmHg and DBP MAE = 2.42 ± 2.62 mmHg under controlled conditions.
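A simplified, hypothetical version of the marker extraction might look like the following: heart-sound bursts are detected from the in-ear microphone envelope, each S1 is paired with the next PPG peak and the next S2 to form per-beat transit and ejection times, and a per-user linear calibration maps those timings to blood pressure.

```python
# Hypothetical, simplified marker extraction: detect heart-sound bursts from the
# in-ear microphone envelope, pair each S1 with the next PPG peak (vascular
# transit time) and the next S2 (ejection time), then calibrate per user.
# Band edges, thresholds, and the linear calibration form are assumptions.

import numpy as np
from scipy.signal import butter, filtfilt, find_peaks, hilbert

def heart_sound_times(mic, fs):
    """Times (s) of heart-sound bursts; S1 and S2 appear as alternating bursts."""
    b, a = butter(2, [20.0, 150.0], btype="band", fs=fs)   # heart-sound band
    env = np.abs(hilbert(filtfilt(b, a, mic)))             # amplitude envelope
    bursts, _ = find_peaks(env, height=3 * np.median(env), distance=int(0.15 * fs))
    return bursts / fs

def beat_features(sound_times, ppg_peak_times):
    """Per beat: (vascular transit time, ejection time), assuming alternating S1/S2."""
    ppg_peak_times = np.asarray(ppg_peak_times)
    feats = []
    for s1, s2 in zip(sound_times[0::2], sound_times[1::2]):
        later = ppg_peak_times[ppg_peak_times > s1]
        if later.size:
            feats.append((later[0] - s1, s2 - s1))
    return np.array(feats)

# Personalized calibration (assumed form): BP ~ w0 + w1*VTT + w2*ET, fit per user
# against a few cuff reference readings, e.g. with numpy.linalg.lstsq.
```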
The session closes with a roadmap of open problems for ICASSP: robustness under ANC and music playback, in-the-wild validation at scale, and principled personalization without sacrificing generalization. The goal is to equip attendees with a practical taxonomy of in-ear signals—and a research agenda where signal processing can turn earables into “tiny but mighty” health platforms.