Satellite Workshops

Workshops Offered
WS-6: Timely and Private Machine Learning over Networks Sun, 14 Apr, 08:30 - 12:00 South Korea Time (UTC +9)
WS-3: Self-supervision in Audio, Speech and Beyond (SASB) Sun, 14 Apr, 08:30 - 17:30 South Korea Time (UTC +9)
WS-8: Revolutionizing Interaction: Embodied Intelligence and the New Era of Human-Robot Collaboration
WS-10: 1st Workshop on Integration of Sensing, Communication, and Computation (ISCC)
WS-11: Signal Processing and Machine Learning Advances in Automotive Radars Sun, 14 Apr, 13:00 - 17:30 South Korea Time (UTC +9)
WS-1: Deep Neural Network Model Compression
WS-9: SPID-CPS: Signal Processing for resilient Intrusion Detection in Cyber-Physical Systems
WS-12: Workshop on Radio Maps and Their Applications (RMA) Mon, 15 Apr, 08:30 - 12:00 South Korea Time (UTC +9)
WS-14: Fearless Steps APOLLO: A Naturalistic Team based Speech Communications Community Resource (FS-APOLLO)
WS-4: XAI-SA: ICASSP 2024 Workshop on Explainable AI for Speech and Audio Mon, 15 Apr, 08:30 - 17:30 South Korea Time (UTC +9)
WS-7: Second Workshop on Signal Processing for Autonomous Systems (SPAS)
WS-13: Super-resolution integrated communications, localization, vision and radio mapping (SUPER-CLAM)
WS-15: Hands-free Speech Communication and Microphone Arrays (HSCMA 2024): Efficient and Personalized Speech Processing through Data Science
WS-2: Trustworthy Speech Processing (TSP) Mon, 15 Apr, 14:00 - 17:30 South Korea Time (UTC +9)
WS-5: Workshop on Computational Imaging Using Synthetic Apertures

Sun, 14 Apr, 08:30 - 12:00 South Korea Time (UTC +9)
Location: Room 209A

Organized by: Howard H. Yang, Nikolaos Pappas, and Harpreet S. Dhillon

Workshop Website

Machine learning over networked systems, e.g., distributed/federated learning, is envisioned as the bedrock of the future intelligent Internet of Things. By exploiting the computing power of end-user devices and inter-node communications, agents can exchange information with each other to collaboratively train a statistical model without centralizing their private data, which also contributes to the development of trustworthy intelligent systems. Despite its great potential, several new challenges must be addressed to make this paradigm possible. Specifically, in many applications, the parameters/states to be learned at different agents vary over time. Moreover, owing to data processing time, limited communication bandwidth, and transmission errors, the parameters delivered from one agent to the others may not be fresh. On the one hand, stale information impedes the performance of a distributed learning system, especially for real-time applications. On the other hand, corrupted and stale information improves end-users’ privacy, as instantaneous, accurate information is inaccessible. To that end, this workshop aims to foster discussion, discovery, and dissemination of novel ideas and approaches in the interplay between timeliness and privacy in machine learning over networks.

Topics of Interest

  • Robust distributed learning algorithms against staleness in information exchange
  • Privacy enhancing schemes (e.g., adopting differential privacy or creating synthetic data) for distributed learning systems
  • Timeliness-aware distributed algorithms for networked machine learning systems
  • Fundamental limits of network parameters on the performance of distributed learning systems
  • Networking protocols to improve timeliness and privacy in distributed learning
  • Over-the-air computation for private distributed machine learning systems
  • Impact of network topology on the timeliness of distributed machine learning algorithms
  • Robust and private distributed reinforcement/meta/deep learning and other novel learning paradigms
  • Novel methods for distributed machine learning with limited communication resources
  • Experimental implementations and testbeds on large-scale distributed learning systems

Howard H. Yang, ZJU-UIUC Institute, Zhejiang University, China
Nikolaos Pappas, Department of CIS, Linköping University, Sweden
Harpreet S. Dhillon, Bradley Department of Electrical and Computer Engineering, Virginia Tech

Sun, 14 Apr, 08:30 - 17:30 South Korea Time (UTC +9)
Location: Room 104

Organized by: Titouan Parcollet, Marcely Zanon Boito, Paola Garcia, Hung-Yi Lee, Yannick Esteve

Workshop Website

Self-Supervised Learning (SSL) of latent representations is transforming deep learning powered technologies. In the speech and audio domains, most state-of-the-art systems rely on large Transformer neural networks pretrained on thousands of hours of signal following various methods such as contrastive or multitask learning.

Recent top-tier conferences in the field have seen an exponential increase in the number of accepted articles mentioning self-supervised learning techniques, yet many challenges still prevent a wider adoption of these techniques in real-life speech and audio technologies.

In fact, SSL models currently suffer from critical complexity issues, the lack of a standardized and widely adopted evaluation protocol, dramatic biases and robustness concerns, as well as a disconnection from other closely related modalities (e.g., text or video).

Through a schedule that maximizes interactions within the audience via multiple panels and a poster session, the Self-supervision in Audio, Speech and Beyond (SASB) workshop aims at fostering interactions across the whole SSL community, including experts from different modalities.

SASB will act as a dedicated place for the SSL community to properly frame the building of a technology currently appearing as a groundbreaking solution for the audio, speech and beyond communities.


Titouan Parcollet
Marcely Zanon Boito
Paola Garcia
Hung-Yi Lee
Yannick Esteve

Sun, 14 Apr, 08:30 - 17:30 South Korea Time (UTC +9)
Location: Room 206

Organized by: Wen Qi, Zhengjun Yue, Andrea Aliverti, and Stavros Ntalampiras

Workshop Website

With the rapid development of AI technology, its application has become a prominent topic in contemporary society. Integrating AI technology into various aspects of life to enhance the quality of life (for example, human-robot collaboration and remote healthcare) requires a profound understanding of human embodied sensing and environmental awareness. Despite significant advancements in research across various domains, building a comprehensive intelligent system still faces numerous challenges, including achieving a comprehensive embodied perception of the human body and environmental context, processing and integrating multimodal data/signals, developing robust algorithms, establishing high-capacity and secure real-time communication systems, and integrating the system with different application scenarios. Therefore, our workshop will focus on the construction of intelligent environments, including embodiment and environmental perception, algorithm optimization, signal analysis, and processing, as well as various applications of the system, to promote the practical implementation of AI technology in daily life.

The development and application of various artificial intelligence technologies should be guided by addressing the various problems faced by humans. Therefore, it is worth extensively researching the application of comprehensive perception of the human body and the surrounding environment, combined with communication technology and AI algorithms, in various intelligent scenarios [1]. It has the potential to generate new momentum in the field by fostering collaboration, innovative research, and addressing complex challenges [2].

Embodied Intelligence is an advanced research field that describes how intelligent agents achieve intelligent behaviours and learning through interacting with and perceiving the environment. This paradigm emphasizes the significance of the interaction between the human body, robot, and environment in the development of cognition and behaviour. Embodied AI emphasizes the interaction between intelligent agents and the environment, where the agent's perceptual capabilities towards the external world impact its learning and decision-making processes. Achieving comprehensive and stable perception for intelligent agents requires the integration of various IoT sensing devices, such as auditory sensors (sound, heartbeat), visual sensors (RGBD, imaging), physical sensors (IMU, contact force), physiological sensors (ECG, EMG, and EOG), and environmental sensors (temperature and light). Once sensory data is collected, it can be transmitted, processed, and analyzed using various technologies and AI algorithms. This enables many applications: in robotics, human-robot interaction and teleoperation; in healthcare, remote health monitoring; and in transportation, traffic management and autonomous driving, among others.

Our workshop aims to explore the main directions of multi-modal perception, advanced AI algorithms, and communication systems in the context of human body and environmental sensing across different application scenarios. By combining various sensing modalities, including vision, audio, and physiological signals, a comprehensive understanding of the human body and its interaction with the environment can be achieved. Advanced AI and signal processing algorithms play a critical role in processing and analyzing multi-modal data for tasks such as robotics interaction, telemedicine, and smart transportation [3,4]. Furthermore, the design of efficient communication systems and infrastructure facilitates real-time data transmission and collaboration among intelligent devices [5].

This topic is novel and emerging, as it merges expertise from multi-modal sensing, signal processing, algorithm optimization, and diverse application domains such as robotics and healthcare. The interdisciplinary nature of the workshop promotes knowledge exchange, encourages collaboration across different disciplines, and fosters synergies with other societies and communities working in the areas of AI, sensing technologies, and communication systems, facilitating cross-pollination of ideas and advancements.

In this satellite workshop, we invite submissions that explore the challenges and solutions associated with embodied intelligence and human-robot collaboration.

Topics of Interest

  • Control algorithms for physical collaboration
  • Sensory perception models in Robotics
  • Interpretation of human cues and their implications
  • Cognitive processing and decision-making in Robotics
  • Designing intuitive human-robot interfaces
  • Real-World applications and case studies of embodied intelligence
  • Advanced motor control techniques for collaborative tasks
  • Collaborative task planning and execution in human-robot teams
  • Immersive technologies for human-robot collaboration
  • Real-time robot learning from human collaboration

Wen Qi is an Associate Professor at the South China University of Technology, where she teaches artificial intelligence and digital image processing. Dr. Qi is an Associate Editor for ICARM, Frontiers in Neurorobotics, and Frontiers in Neuroscience. She is also a permanent reviewer for ICRA/IROS/ICARM and a Guest Associate Editor for several journals, such as JBHI/TII/RAL/T-ASE/NN/EAAI. She has won five best paper (finalist) awards, including the 2021 Andrew P. Sage Best Transactions Paper Award of the IEEE Transactions on Human-Machine Systems and the Best Paper Award in Advanced Robotics at the IEEE International Conference on Advanced Robotics and Mechatronics in 2020. Her research interests include artificial intelligence, multimodal data fusion, deep learning, wearable medical devices, multiple sensor fusion, human-machine interaction, and cyber-twin networks.

Zhengjun Yue is an Assistant Professor in the Multi-media Computing Group at Delft University of Technology, the Netherlands. She works on speech technology for healthcare. Before that, she held a post-doc position at King's College London, UK. She completed her Ph.D. in dysarthric speech recognition in the Speech and Hearing Group (SPandH) at the University of Sheffield, UK. She has participated in and published work at the Interspeech and ICASSP conferences in speech technology and has published papers in IEEE/ACM Transactions on Audio, Speech, and Language Processing. She co-organized the special session at Interspeech 2023 on Connecting Speech Science and Speech Technology for Children’s Speech. Her research interests are atypical speech processing and recognition. Specifically, she will work on pathological speech recognition, children's speech analysis, and cognitive decline detection. She is also interested in building conversational AI medical systems for healthcare, and in using heart sound and brain signals for medical care and treatment.

Andrea Aliverti is a Full Professor at the Department of Electronics, Information, and Bioengineering (DEIB) at the Politecnico di Milano. He is the chairman of the Ph.D. Program in Bioengineering. He is responsible for Lares (Respiratory Analysis Lab) at the Biomedical Technology Laboratory (TBM-Lab) of DEIB. His current main research interests include the bioengineering of the respiratory system, physiological measurements, functional lung imaging, and biomedical instrumentation and sensors, in particular the development of new methods for continuous monitoring of physiological variables utilizing wearable sensors and artificial intelligence. He has been the technical coordinator of the EU-funded Projects BREATH, BREATH-PGC, and CARED, local PI of the EU Projects PLANHAB and HOLOZCAN, two NIH-funded Projects, and several industry-funded projects. He is the author or co-author of more than 250 papers in peer-reviewed scientific journals and 15 book chapters, editor of 4 books, and inventor of 15 patents. He is a member of the editorial board of the Journal of Applied Physiology (American Physiological Society), Respiratory Physiology and Neurobiology, Breathe, and Sensors. He is an active European Respiratory Society member and a former Secretary and Head of the Assembly “Clinical Physiology and Sleep”. He has been awarded the ERS (European Respiratory Society) COPD Award and the Vertex Innovation Award (VIA). Since 2020, he has been an honorary Fellow of the European Respiratory Society.

Stavros Ntalampiras is an Associate Professor with the Department of Computer Science, University of Milan, Milan, Italy. He received engineering and Ph.D. degrees from the Department of Electrical and Computer Engineering, University of Patras, Patras, Greece, in 2006 and 2010, respectively. He has carried out research and/or didactic activities with Politecnico di Milano, Milan, the Joint Research Center of the European Commission, the National Research Council of Italy, Rome, Italy, and Bocconi University, Milan. His research interests include content-based signal processing, machine learning, audio pattern recognition, medical acoustics, bioacoustics, and cyber–physical systems. He is currently an Associate Editor of IEEE Access, PLOS One, IET Signal Processing, and CAAI Transactions on Intelligence Technology, as well as a member of the IEEE Computational Intelligence Society Task Force on Computational Audio Processing.

Sun, 14 Apr, 08:30 - 17:30 South Korea Time (UTC +9)
Location: Room 105

Organized by: Prof. Yonina C. Eldar, Prof. Zhi-Quan (Tom) Luo, Prof. Octavia A. Dobre, Prof. Qingjiang Shi, Prof. Marwa Chafii, Dr. Guangxu Zhu, Prof. Seung-Woo Ko, Dr. Xiaoyang Li, and Dr. Yuanhao Cui

Workshop Website

The upcoming 6G era is expected to see the widespread deployment of complex intelligent applications (e.g., autonomous driving, digital twins, and the metaverse) over wireless networks. The smooth execution of these applications relies on the tight cooperation of three basic functionalities provided by 6G networks, namely sensing, communication, and computation. However, in traditional wireless networks, these three functionalities are designed separately for different goals: sensing for obtaining high-quality environmental data, communication for data delivery, and computation for executing the downstream task within a certain deadline. Such a separate design principle has difficulty accommodating the stringent demands of ultra-low latency, ultra-high reliability, and high capacity in emerging 6G applications such as autonomous driving.

Previous studies focus on the integration of two of the above three functionalities. By designing dual-functional signals for both data communication and radar sensing, integrated sensing and communication (ISAC) has been proposed to improve the efficiency of data sensing and delivery. Another line of research focuses on joint communication and computation resource management, which includes mobile edge computing (MEC) and over-the-air computation (AirComp). With the rapid increase in data volumes in MEC, the communication and computation capabilities at the network edge become the bottleneck. In particular, the limited wireless resources make it challenging for the edge server to receive significant amounts of data from edge devices swiftly through wireless links. Hence, many studies have focused on joint communication and computation resource management to tackle this issue in MEC. AirComp, as opposed to “communication before computing”, integrates computing into communication, resulting in a new scheme of “communication while computing”. In contrast to traditional wireless communication over a multi-access channel, which requires separate transmission and decoding of information, AirComp allows edge devices to simultaneously transmit their respective signals on the same frequency band with proper processing, such that the functional computation of the distributed data is accomplished directly over the air. This significantly improves the communication and computing efficiency, and considerably reduces the latency required for multiple access and data fusion.
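
The “communication while computing” idea can be illustrated with a toy simulation. This is only a minimal sketch under simplifying assumptions that are not part of the workshop description: channel gains are real, nonzero, and known to each device; devices pre-scale by channel inversion; and the target function is the average. The name `aircomp_average` and its parameters are hypothetical.

```python
import random


def aircomp_average(values, channels, noise_std=0.0, rng=None):
    """Estimate the mean of `values` from one simultaneous transmission.

    Assumes real, nonzero channel gains known at each device (illustrative only).
    """
    rng = rng or random.Random(0)
    # Each device pre-scales its value by the inverse of its channel gain.
    transmitted = [v / h for v, h in zip(values, channels)]
    # The multi-access channel superimposes all signals: y = sum(h_k * x_k) + noise.
    received = sum(h * x for h, x in zip(channels, transmitted))
    received += rng.gauss(0.0, noise_std)
    # The receiver only rescales the superimposed signal; it never decodes
    # the individual values.
    return received / len(values)


readings = [1.0, 2.0, 3.0, 6.0]   # local data at four edge devices
gains = [0.5, 1.2, 0.8, 2.0]      # assumed channel gains
estimate = aircomp_average(readings, gains)
```

In this sketch a single channel use yields the average of all readings, whereas a conventional scheme would need one transmission per device followed by a separate computation step.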

Although techniques such as ISAC, MEC, and AirComp have been regarded as hot topics in the main conference, the entire ISCC process needs further attention. In particular, the sensing and communication processes compete for radio resources, and the allowed communication resource further determines the required quantization level such that the quantized features can be transmitted reliably to the edge server under a delay constraint. This prompts a new wireless design paradigm of integrated sensing, communication, and computation (ISCC), which comprehensively accounts for the data use in the downstream tasks in 6G.

Topics of Interest

  • Fundamental information theoretical limits for ISCC
  • Edge computing/learning for ISCC
  • AirComp for ISCC
  • Network architectures/transmission protocols/frame designs for ISCC
  • Spectrum analysis and management of ISCC
  • Full duplex/interference management techniques of ISCC
  • Modulation/waveform/precoding/receiver design for ISCC
  • Security and privacy issues for ISCC
  • Machine learning methods in ISCC
  • MIMO/Massive MIMO/IRS/Holographic MIMO surface for ISCC
  • Millimeter wave/THz technologies for ISCC
  • UAV enabled ISCC
  • ISCC for IoT/IIoT/IoE
  • ISCC for V2X network
  • Standardization progress of ISCC
  • Wi-Fi sensing/positioning/detection for ISCC
  • Experimental demonstrations, testbeds and prototypes of ISCC

Prof. Yonina C. Eldar, Professor, Weizmann Institute of Science, Rehovot, Israel.
Prof. Zhi-Quan (Tom) Luo, Professor, The Chinese University of Hong Kong, Shenzhen, China.
Prof. Octavia A. Dobre, Professor, Memorial University, Canada.
Prof. Qingjiang Shi, Professor, Tongji University, Shanghai, China.
Prof. Marwa Chafii, Associate Professor, New York University (NYU) Abu Dhabi, UAE.
Dr. Guangxu Zhu, Research Scientist, Shenzhen Research Institute of Big Data, China.
Prof. Seung-Woo Ko, Associate Professor, Inha University, Korea.
Dr. Xiaoyang Li, Research Scientist, Shenzhen Research Institute of Big Data, China.
Dr. Yuanhao Cui, Research Assistant Professor, Southern University of Science and Technology, China.

Sun, 14 Apr, 13:00 - 17:30 South Korea Time (UTC +9)
Location: Room 205

Organized by: Ashish Pandharipande, Pu (Perry) Wang, Gor Hakobyan, and Avik Santra

Workshop Website

The ongoing automation of driving functions in cars is driving the evolution of advanced driver assistance systems (ADAS) into systems capable of highly or fully automated driving in the future. Radar has emerged as a core technology for sensing and perception of a vehicle’s surroundings to facilitate diverse driver assistance functions. With growing levels of automation, there is a need for high-performance radar sensing, especially with respect to angular resolution. To achieve this, advanced waveform designs and signal processing algorithms such as digital waveforms, MIMO radar, high-resolution DoA and sparse radar signal processing, and cognitive, adaptive radar approaches are needed. Furthermore, machine learning (ML)-powered radar systems are still in an early stage of development. Advances are required to design reliable, robust and explainable radar perception for real-time, safety-critical driving applications. The proposed workshop is aimed at providing a spotlight on the diverse signal processing and ML challenges in advancing automotive radar technology.

The workshop will be a mix of peer-reviewed technical papers and invited speakers from academia/industry covering a diversity of topics in automotive radar systems.

Topics of Interest

  • Automotive Radar Waveform Design
  • DoA Estimation in Automotive Radars
  • Automotive Imaging Radars
  • Automotive Radar Interference-handling Methods
  • Radar Point Cloud Processing Techniques
  • Automotive Radar Fusion
  • Cognitive Automotive Radar Methods
  • Automotive Joint Radar-Communication Systems
  • ML Methods for Automotive Radar Applications
  • ML Hardware for Automotive Radars

Ashish Pandharipande, NXP Semiconductors, Eindhoven, Netherlands
Pu (Perry) Wang, MERL, Cambridge, USA
Gor Hakobyan, Waveye Inc, California, USA
Avik Santra, Infineon Technologies, California, USA

Sun, 14 Apr, 14:00 - 17:30 South Korea Time (UTC +9)
Location: Room 209A

Organized by: João Gama, Arijit Ukil, Angshul Majumdar, and Antonio J. Jara

Workshop Website

Deep neural networks (DNNs) used for analyzing images, videos, signals, and texts demand large amounts of memory and intensive computing power. The largely successful GPT-4 model contains more than a few trillion parameters. Such models, although extremely powerful, have very limited usage in real-life applications such as Industrial IoT, self-driving automobiles, and algorithmic screening for health condition detection, which are intended to be deployed on constrained mobile or edge devices. The requirement of running large models on resource-constrained edge devices has led to significant research interest in the topic of DNN model compression. Traditionally, data compression (image/video/audio) has been championed by signal processing researchers. Incidentally, many of these techniques are being leveraged for compressing DNNs. Unfortunately, most of these papers are primarily being published in machine learning conference venues. Given the contribution of the signal processing community to the compression domain, a premier signal processing venue like IEEE ICASSP is a more appropriate venue for such research. We solicit research papers on topics including, but not restricted to, model optimization, quantization, pruning, low rank factorization, lottery ticket hypothesis, knowledge distillation, fundamental performance bounds/limits of learned compression, construction of signal processing and information theoretic models, on-device learning, edge analytics using efficient models for deploying industrial applications, sustainable AI, etc.


João Gama, Full Professor, Laboratory of Artificial Intelligence and Decision Support and Faculty of Economics
Arijit Ukil, Senior Scientist, TCS Research, Tata Consultancy Services, India
Angshul Majumdar, Professor, Department of Electronics and Communication Engineering and Department of Computer Science, Indraprastha Institute of Information Technology, Delhi, India
Antonio J. Jara, Chief Scientific Officer, Libelium, Spain

Sun, 14 Apr, 14:00 - 17:30 South Korea Time (UTC +9)
Location: Room 209B

Organized by: Massimiliano Albanese, Antonio Galli, and Vincenzo Moscato

Workshop Website

Rapid changes in the digital technology landscape have significantly transformed industrial processes, driven by the deep integration of physical and digital components in production environments, resulting in the emergence of Cyber-Physical Systems (CPS). The remarkable potential of data analytics techniques applied to CPSs spans various domains, including cost reduction in maintenance, mitigating machine faults, minimizing repair downtime, optimizing spare parts inventory, extending spare part lifespan, increasing overall production, improving operator safety, verifying repairs, and enhancing profitability. However, the practical implementation of machine learning and deep learning approaches in real-world applications is constrained by their data-heavy requirements.

Anomaly detection methods have been proposed in different domains in recent years, but traditional approaches are not directly applicable to ensure CPS security due to the increasing complexity and sophistication of attacks. The growing volume of data and the need for domain-specific knowledge challenge these methods, necessitating innovative solutions that integrate advanced artificial intelligence models with diverse sources of information, such as IoT sensor measurements and network data.

Moreover, the rise of cyber-physical attacks, exemplified by incidents like Triton and Stuxnet, introduces novel and challenging issues. These attacks can deceive monitoring platforms, highlighting the need for advanced predictive maintenance techniques that leverage AI and deep learning to analyze specific industrial equipment features. Such techniques can uncover symptoms of potential failures, including those caused by malicious activities.

To address these challenges, this workshop aims to investigate and analyze emerging trends in AI-based Anomaly Detection for Cyber-Physical Systems. We invite contributions focusing on advanced modeling and mining techniques, harnessing the power of AI and deep learning to detect anomalies in CPS. We welcome both theoretical advancements and application-oriented studies that explore the development of novel approaches based on advanced optimization techniques and learning paradigms, such as online learning, reinforcement learning, and deep learning. By advancing our understanding of complex phenomena in Cyber-Physical Systems, these contributions will support the development of explainable AI models for Anomaly Detection in CPS.

Topics of Interest

  • Supervised, semi-supervised, and unsupervised techniques for intrusion detection in CPS
  • Multi-dataset time series for intrusion detection in CPS
  • Game theory and adversarial learning approaches for anomaly detection in CPS
  • Explainable Artificial Intelligence techniques for intrusion detection in CPS
  • Representation learning, transfer learning, sequence learning, and reinforcement learning based methods for anomaly detection in CPS
  • Machine Learning Explainability and Interpretability in CPS Security
  • Data Fusion and Information Fusion in CPS Security

Massimiliano Albanese, George Mason University, USA
Antonio Galli, University of Naples Federico II, Italy
Vincenzo Moscato, University of Naples Federico II, Italy

Mon, 15 Apr, 08:30 - 12:00 South Korea Time (UTC +9)
Location: Room 209A

Organized by: Çağkan Yapar, Fabian Jaensch, Ron Levie, Gitta Kutyniok, and Giuseppe Caire

Workshop Website

In wireless communications, propagation modeling is a critical task that allows wireless engineers to predict the behavior of emitted radio waves in a propagation environment of interest.

Deterministic simulation methods such as ray tracing are known to provide highly accurate estimates of real-world electromagnetic propagation. However, such methods suffer from high computational complexity and their accuracy depends on the availability of detailed and reliable knowledge about the propagation environment, i.e. the shape and material of the objects in the environment. The generation of pathloss radio maps by such models has been of particular interest due to its ubiquitous appearance in wireless system design problem formulations.

Recently, many research groups have shown that appropriately designed deep neural networks can provide very accurate approximations of highly complex propagation models in much shorter run times.

The workshop aims to provide an overview of recent developments in radio propagation modeling, especially learning-based methods. The applications of propagation models (or the generated radio maps) will also be of interest.

We aim to attract novel contributions on the above points, and also hope to foster the identification of next research questions through panel discussions and Q&A sessions of the invited/plenary talks and the regular papers.


Çağkan Yapar, TU Berlin, HFT 6, Einsteinufer 25, 10587 Berlin, Germany
Fabian Jaensch, TU Berlin, HFT 6, Einsteinufer 25, 10587 Berlin, Germany
Ron Levie, Technion, Mathematics Department, Technion, 32000 Haifa, Israel
Gitta Kutyniok, LMU Munich, Department of Math., LMU, Akademiestraße 7, 80799 Munich, Germany
Giuseppe Caire, TU Berlin, HFT 6, Einsteinufer 25, 10587 Berlin, Germany

Mon, 15 Apr, 08:30 - 12:30 South Korea Time (UTC +9)
Location: Room 209B

Organized by: John H.L. Hansen, Iván López-Espejo, Aditya Joglekar, Meena Chandra Shekar, SzuJui Chen, Xi Liu, Midia Yousefi

Workshop Website

The field of speech communications has seen extensive advancements in speech technology as well as communication sciences for human interaction. While many speech and language resources have been established, few exist which focus on team-based problem solving in naturalistic settings. The main challenge in establishing such a resource is the ability to capture such audio in a consistent manner and to ensure that proper associated metadata and content knowledge are provided. Privacy constraints, as well as releasing the subsequent audio and metadata, represent the primary obstacles to overcome. To address this, the Fearless Steps APOLLO Community Resource, supported by NSF, is an ongoing massive naturalistic communications resource for researchers, scientists, historians, and technology innovators to advance their fields. Historically, the NASA Apollo program represents one of mankind’s most significant technological challenges: to place a human on the moon. This series of manned space missions was made possible by the dedication of an extensive group of committed scientists, engineers, and specialists who collaborated harmoniously, showcasing an extraordinary level of teamwork through voice communications. The primary objective of this workshop is to bring interested researchers together to explore urgent needs within the speech/language community, and to openly share more than 100,000 hours of carefully curated audio and SLT-created metadata to support the community based on the massive naturalistic Fearless Steps APOLLO corpus.

Main topics covered in this workshop are:

  • Big Data Recovery and Deployment; Current progress in the Fearless Steps APOLLO initiative.
  • Applications to Education, History, & Archival efforts.
  • Applications to Communication Science, Psychology (Group Dynamics/Team Cohesion).
  • Applications to SLT development, including but not limited to automatic speech recognition (ASR), speech activity detection (SAD), speaker recognition, and conversational topic detection.
  • Sharing all AUDIO and META-DATA with those interested for their research/history/archiving/etc.

  • John H.L. Hansen; Univ. of Texas at Dallas; CRSS – Center for Robust Speech Systems, USA
  • Iván López-Espejo; Aalborg Univ., Denmark; Univ. of Texas at Dallas; CRSS – Center for Robust Speech Systems, USA
  • Aditya Joglekar; Univ. of Texas at Dallas; CRSS – Center for Robust Speech Systems, USA
  • Meena Chandra Shekar; Univ. of Texas at Dallas; CRSS – Center for Robust Speech Systems, USA
  • SzuJui Chen; Univ. of Texas at Dallas; CRSS – Center for Robust Speech Systems, USA
  • Xi Liu; Univ. of Texas at Dallas; CRSS – Center for Robust Speech Systems, USA
  • Midia Yousefi; Univ. of Texas at Dallas; CRSS – Center for Robust Speech Systems, USA; Microsoft, Seattle, WA

Mon, 15 Apr, 08:30 - 17:30 South Korea Time (UTC +9)
Location: Room 105

Organized by: Cem Subakan, Francesco Paissan, Mirco Ravanelli, Shubham Gupta, Pascal Germain, Paris Smaragdis

Workshop Website

Deep learning algorithms have made significant strides in various applications in the speech and audio domains in the last few years. However, for the most part, the lack of transparency in deep learning methods makes their deployment difficult in critical areas such as healthcare and forensics. In this workshop, our goal is to have presentations on state-of-the-art explanation techniques and interpretable models for speech and audio. We aim to facilitate insightful discussions on the use cases, evaluations, and reliability of explanations, and on the design of interpretable models within the speech and audio domains. We also invite paper submissions that propose explainable machine learning methods for application domains other than speech and audio.


Cem Subakan
Francesco Paissan
Mirco Ravanelli
Shubham Gupta
Pascal Germain
Paris Smaragdis

Mon, 15 Apr, 08:30 - 17:30 South Korea Time (UTC +9)
Location: Room 104

Organized by: Shunqiao Sun, Lucio Marcenaro, and Bo Li

Workshop Website

The 2nd Workshop on Signal Processing for Autonomous Systems (SPAS) is a one-day satellite workshop associated with ICASSP 2024, organized by the Autonomous Systems Initiative (ASI) of the IEEE Signal Processing Society (SPS). The workshop will bring together researchers working on various aspects of autonomous systems to present the latest advances in the area and discuss future research opportunities and needs from industry. Autonomous systems are gaining major traction in various sectors of industry, including autonomous vehicles, warehouse settings, smart production systems, industrial and infrastructure monitoring, medical systems, etc. There is certainly a great deal of signal processing technology that will be utilized to realize these various systems, but many challenges also exist. Understanding the precise needs in these various domains will be critical in propelling future signal processing research forward. The workshop aims to bring together researchers, practitioners, and students from the signal processing, machine learning, and artificial intelligence fields to share knowledge on methodologies, features, and results related to the evaluation, modeling, and understanding of autonomous systems.

Topics of Interest

The workshop will aim to attract speakers and solicit high-quality papers in many key areas of autonomous systems, including those outside the main specialties of signal processing. Specific topics of interest include, but are not limited to:

  • Perception: scene understanding based on multiple sensors such as camera, radar, and lidar; sensor fusion techniques; mapping and localization.
  • Networked autonomous systems: cooperative positioning; edge-based and cloud-based processing systems; distributed learning; privacy-preserving data analysis.
  • Planning and Control: motion planning, distributed and decentralized planning, optimization, robust and optimal control in the presence of uncertainty.
  • Human-Autonomous Systems Interaction: interface design, cooperative systems, human factors, robot learning.
  • Applications: autonomous cars/trucks, service robots, drones, warehouse and logistics, medical systems, infrastructure.
  • General artificial intelligence (AI) based frameworks for signal representation and inference in autonomous systems: theories and techniques.

Shunqiao Sun, Assistant Professor, The University of Alabama, USA
Lucio Marcenaro, Associate Professor, University of Genoa, Italy
Bo Li, Associate Professor, University of Chicago, USA

Mon, 15 Apr, 08:30 - 17:30 South Korea Time (UTC +9)
Location: Room 206

Organized by: Dirk Slock, EURECOM; George C. Alexandropoulos, National and Kapodistrian University of Athens (NKUA); Luis M. Pessoa, INESC TEC; and Filipe B. Teixeira, INESC TEC

Workshop Website

Telecommunications and computer vision have evolved as separate scientific areas. This is expected to change with the advent of wireless communications using radios that operate at mmWave frequencies, up to the sub-THz range. Such links are characterized by line-of-sight operating ranges and could benefit from visual data to accurately predict wireless channel dynamics, for example by anticipating future received power and blockages, and to construct high-definition 3D maps for positioning. Conversely, computer vision applications can become more robust against occlusion and low luminosity when aided by radio-based imaging, for example using the high-frequency radio signals generated by 6G large reconfigurable intelligent surfaces, which can also provide high-resolution sensing. This new and emerging joint research field relies on a range of technologies from wireless communications, computer vision, sensing, computing, and machine learning. It is aligned with the recent research trend called Integrated Sensing and Communications (ISAC) but expands it substantially. The field has high innovation potential across a wide range of applications. However, the full potential of this convergent research area can only be reached through significant interaction between its separate constituent areas, for which this workshop will provide a collaborative forum.

Topics of Interest

  • Vision-aided wireless communications
  • Vision-aided localisation and sensing
  • Computer vision empowered by RF sensing/imaging
  • ISAC at Millimeter-wave and sub-THz
  • Radio and vision data fusion
  • Machine learning for ISAC
  • RIS-aided localisation and sensing
  • RIS-aided device-free environment mapping
  • RIS-aided super-resolution ISAC, waveform design and channel estimation
  • ISAC with automated ground/aerial vehicles

Dirk Slock, EURECOM
George C. Alexandropoulos, National and Kapodistrian University of Athens (NKUA)
Luis M. Pessoa, INESC TEC
Filipe B. Teixeira, INESC TEC

Mon, 15 Apr, 08:30 - 17:30 South Korea Time (UTC +9)
Location: Room 205

Organized by: Minje Kim and Paola Garcia

Workshop Website

The workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA) is a key assembly for academia and industry, focusing on challenges in hands-free communication and microphone arrays. Established in 2005, HSCMA promotes interdisciplinary collaboration between the Audio and Acoustic Signal Processing Technical Committee and the Speech and Language Processing Technical Committee. At ICASSP 2024, the workshop emphasizes “Efficiency and Personalization.” The spotlight is on optimizing computational methods for complex models, so that their on-device implementations can robustly address individual users’ diverse and personal needs. Key topics range from front-end methods (e.g., microphone array processing) to the back-end speech recognition system. To this end, HSCMA fosters collaborations in signal processing, machine learning, and human-computer interaction. This edition of HSCMA places particular emphasis on using data science to develop solutions to the unique issues involved in personalization, such as data privacy. This IEEE Data Science and Learning Workshop is co-sponsored by the Data Science Initiative, AASP, and SLTC. The workshop welcomes demonstrations and prototypes, providing an interactive atmosphere for participants to engage with tangible implementations, spurring discussions on real-world applications and steering the future of hands-free speech communication technologies.

Topics of Interest

  • Hands-free speech communication
  • Microphone arrays
  • Speech recognition
  • Speech synthesis
  • Voice conversion
  • Noisy environments
  • Voice-controlled assistants
  • Teleconferencing systems
  • Beamforming techniques
  • Acoustic echo cancellation
  • Source separation
  • Personalization
  • Privacy preservation
  • Multi-modal speech processing
  • Speech enhancement

Minje Kim (Indiana University)
Paola Garcia (Johns Hopkins University)
Jonah Casebeer (Adobe Research)

Mon, 15 Apr, 14:00 - 17:30 South Korea Time (UTC +9)
Location: Room 209A

Organized by: Anil Ramakrishna, Rahul Gupta, Amazon Inc.; Shrikanth Narayanan, University of Southern California; Isabel Trancoso, University of Lisbon; Bhiksha Raj, Carnegie Mellon University; Theodora Chaspari, Texas A&M University

Workshop Website

Title: Trustworthy Speech Processing (TSP)

Given the ubiquity of Machine Learning (ML) systems and their relevance in daily life, it is important to ensure private and safe handling of data alongside equity in human experience. These considerations have gained considerable interest in recent times under the umbrella of Trustworthy ML. Speech processing in particular presents a unique set of challenges, given the rich information carried in linguistic and paralinguistic content, including speaker traits, interaction characteristics, and state characteristics such as health status. In this workshop on Trustworthy Speech Processing (TSP), we aim to bring together new and experienced researchers working on trustworthy ML and speech processing. We invite novel and relevant submissions from both academic and industrial research groups showcasing theoretical and empirical advancements in TSP.

Topics of Interest

  • Differential privacy
  • Bias and Fairness
  • Federated learning
  • Ethics in speech processing
  • Model interpretability
  • Quantifying & mitigating bias in speech processing
  • New datasets, frameworks and benchmarks for TSP
  • Discovery and defense against emerging privacy attacks
  • Trustworthy ML in applications of speech processing like ASR

Anil Ramakrishna, Amazon Inc.
Shrikanth Narayanan, University of Southern California
Rahul Gupta, Amazon Inc.
Isabel Trancoso, University of Lisbon
Bhiksha Raj, Carnegie Mellon University
Theodora Chaspari, Texas A&M University

Mon, 15 Apr, 14:00 - 17:30 South Korea Time (UTC +9)
Location: Room 209B

Organized by: Peter Vouras, Dr. Kumar Vijay Mishra, Prof. Guoan Zheng, and Dr. Raghu Raj

Workshop Website

The ICASSP-2024 workshop on “Computational Sensing Using Synthetic Apertures” invites submissions describing advances in sensing and imaging using synthetic apertures (SAs). The term SA refers generically to a collection of spatially diverse, phase-coherent or phaseless signal samples taken before an algorithm is applied to extract information or create an image. The samples may be measured in the signal domain from a propagating wavefield (channel sounding) or target backscatter (radar, sonar). Alternatively, an SA may sample in the k-space domain via different look angles around an object or scene, as in X-ray tomography, spotlight SAR, or Fourier ptychography. An SA may also refer to a sparse collection of spatially distributed signal samples that yield the sensing performance of a filled aperture after processing steps are applied (radiometry).

Also of interest for the workshop are papers that describe the use of SAs to send information. In this case, the elements of the SA are spatially diverse, phase-coherent signal sources. Examples include multiple-input multiple-output (MIMO) antennas in 5G, wireless power transfer, distributed coherent radars or sensors, unmanned aerial vehicle (UAV) swarms, and 6G communications using intelligent reflecting surfaces. In these applications, a user combines the energy received from all emitters to yield an enhanced output, such as a higher data throughput or a higher SNR.


Peter Vouras, U.S. Department of Defense
Dr. Kumar Vijay Mishra, U.S. Army Research Laboratory
Prof. Guoan Zheng, University of Connecticut
Dr. Raghu Raj, U.S. Naval Research Laboratory