Tutorials
Presented by: Fei Chen, Yu Tsao
Part I: Fundamentals of objective speech intelligibility and quality assessment
 Background of speech assessment and objective speech assessment metrics
 Methodology of existing objective speech assessment metrics
 Approaches to improve the power of objective speech assessment metrics
 Progress in developing non-intrusive speech assessment metrics
 Design of speech assessment metrics for special listening conditions (noise-suppression processing with nonlinear distortions) and specific listeners (hearing-impaired listeners)
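As a minimal, concrete instance of the intrusive methodology covered in Part I, a segmental SNR measure can be sketched as follows (this is a generic textbook metric, not one of the specific metrics from the tutorial; the frame length and dB clamping range are conventional choices):

```python
import numpy as np

def segmental_snr(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Frame-wise SNR between a clean reference and a processed signal,
    clamped to [lo, hi] dB and averaged over frames (an intrusive metric)."""
    n_frames = len(clean) // frame_len
    snrs = []
    for k in range(n_frames):
        s = clean[k * frame_len:(k + 1) * frame_len]
        e = s - processed[k * frame_len:(k + 1) * frame_len]
        snr = 10.0 * np.log10(np.sum(s ** 2) / (np.sum(e ** 2) + 1e-12))
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 4096))
noisy = clean + 0.1 * rng.standard_normal(4096)
# A distortion-free signal saturates at the upper clamp; added noise lowers the score.
```

The per-frame clamping is what distinguishes segmental SNR from global SNR: it prevents silent or near-perfect frames from dominating the average.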
Part II: Deep learning-based assessment metrics and their applications
 Traditional assessment metrics and their limitations
 The advantages of deep learning-based assessment metrics
 Deep learning-based assessment metrics with advanced inputs, model architectures, and training criteria
 Well-known datasets and deep learning-based assessment metrics
 Applying deep learning-based assessment metrics to speech signal processing systems
Presented by: Laura Balzano, Qing Qu, Peng Wang, Zhihui Zhu
The Neural Collapse phenomenon has garnered significant attention in both practical and theoretical fields of deep learning, as evident from the extensive research on the topic. The presenters' own works have made key contributions to this body of research. Below is a summary of the tutorial outline. The first half focuses on the structures of representations appearing in the last layer, and we generalize the study into intermediate layers in the second half of this tutorial.
1. Prevalence of Neural Collapse & Global Optimality
The tutorial starts with an introduction to the Neural Collapse phenomenon in the last layer and its universality in deep network training, and lays out the mathematical foundations for understanding its cause based upon the simplified unconstrained feature model (UFM). We then generalize and explain this phenomenon and its implications under data imbalance.
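As a rough illustration of what "collapse" means quantitatively, the within-class variability metric (often called NC1) can be computed directly from last-layer features; the synthetic features below are purely illustrative:

```python
import numpy as np

def nc1(features, labels):
    """Within-class variability collapse: tr(Sigma_W @ pinv(Sigma_B)) / K,
    which approaches 0 as last-layer features collapse to their class means."""
    classes = np.unique(labels)
    K = len(classes)
    global_mean = features.mean(axis=0)
    d = features.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Zc = features[labels == c]
        mu = Zc.mean(axis=0)
        Sw += (Zc - mu).T @ (Zc - mu) / len(features)
        Sb += len(Zc) * np.outer(mu - global_mean, mu - global_mean) / len(features)
    return np.trace(Sw @ np.linalg.pinv(Sb)) / K

rng = np.random.default_rng(0)
means = np.eye(4)                      # 4 well-separated class means in R^4
labels = np.repeat(np.arange(4), 50)
collapsed = means[labels]              # every feature equals its class mean
noisy = collapsed + 0.3 * rng.standard_normal(collapsed.shape)
# nc1(collapsed, labels) is ~0; nc1(noisy, labels) is strictly larger.
```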
2. Optimization Theory of Neural Collapse
We provide a rigorous explanation of the emergence of Neural Collapse from an optimization perspective and demonstrate its impacts on algorithmic choices, drawing on recent works. Specifically, we conduct a global landscape analysis under the UFM to show that benign landscapes are prevalent across various loss functions and problem formulations. Furthermore, we demonstrate the practical algorithmic implications of Neural Collapse on training deep neural networks.
3. Progressive Data Compression & Separation Across Intermediate Layers
We open the black box of deep representation learning by introducing a law that governs how real-world deep neural networks separate data according to class membership from the bottom layers to the top layers. We show that each layer improves a certain measure of data separation by roughly an equal multiplicative factor. We demonstrate its universality by showing its prevalence across different network architectures, datasets, and training losses.
4. Theory & Applications of Progressive Data Separation
Finally, we delve into theoretical understanding of the structures in the intermediate layers by studying the learning dynamics of gradient descent. In particular, we reveal parsimonious structures in the gradient dynamics such that a certain measure of data separation exhibits layer-wise linear decay from shallow to deep layers. We then demonstrate the practical implications of this phenomenon in transfer learning and the study of foundation models, leading to efficient fine-tuning methods with reduced overfitting.
Presented by: Jun Qi, YingJer Kao, Samuel YenChi Chen, Mohammadreza Noormandipour
 Introduction to Quantum Computing and Tensor Networks
 Principles of Quantum Computing
 Fundamentals of Tensor Networks
 The intersection of Quantum Computing and Tensor Networks
 Foundations of Quantum Machine Learning
 Parametrized Quantum Circuits
 Data Encoding
 Quantum Neural Networks and Quantum Kernels
 Quantum Neural Networks
 Variational Quantum Circuits
 Quantum Convolutional Neural Network
 Quantum Long Short-Term Memory
 Quantum Reinforcement Learning
 Hybrid Quantum-Classical Machine Learning Architecture
 Training Quantum Machine Learning with Gradient Descent
 Variational Quantum Circuits
 Tensor Networks for Quantum Machine Learning
 Fundamentals of Tensor Networks
 Tensor Networks as Quantum Machine Learning Models
 Learnability of Quantum Tensor Networks
 Hybrid Quantum Tensor Networks and Quantum Neural Networks
 Conclusion and Open Questions
 Prospects of Quantum Tensor Networks
 Concluding Remarks
Presented by: Dirk Slock, Christo K. Thomas
Part I: Approximate Bayesian Techniques
 Variational Bayes
 Variational Free Energy
 Variational Bayes (VB)
 Mean field and EM algorithms
 Expectation Propagation
 Factor Graph models
 Bethe Free Energy
 Belief Propagation
 Expectation Propagation
 Convergent Alternating Constrained Minimization
 ADMM
 Algorithm unfolding
 Relation to Deep NNs
 LMMSE case: multistage Wiener Filter
 Compressed Sensing
 LASSO, OMP etc.
 Sparse Bayesian Learning (SBL)
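As one concrete instance of the Part I toolbox, the EM algorithm for sparse Bayesian learning can be sketched in a few lines (a minimal sketch assuming the standard model y = Φw + n with independent per-coefficient prior variances; the dimensions and noise level are illustrative):

```python
import numpy as np

def sbl_em(Phi, y, sigma2=1e-4, n_iter=50):
    """EM for sparse Bayesian learning: the E-step computes the Gaussian
    posterior of w; the M-step updates the per-coefficient prior variances."""
    n, d = Phi.shape
    gamma = np.ones(d)
    for _ in range(n_iter):
        # E-step: posterior covariance and mean of w given current gamma
        Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(1.0 / gamma))
        mu = Sigma @ Phi.T @ y / sigma2
        # M-step: gamma_i = E[w_i^2] under the posterior (floored for stability)
        gamma = np.maximum(mu ** 2 + np.diag(Sigma), 1e-10)
    return mu, gamma

rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[[3, 12]] = [1.0, -2.0]
y = Phi @ w_true + 0.01 * rng.standard_normal(50)
mu, gamma = sbl_em(Phi, y)
# mu recovers w_true, and gamma is driven toward zero off the support.
```

The automatic shrinkage of the off-support variances is the mechanism behind SBL's sparsity, without any explicit l1 penalty.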
Part II: Generalized Linear Models
 xAMP (AMP, GAMP, VAMP, GUAMP,…)
 convergent GAMP
 Large System Analysis (iid, Haar)
 Bayes optimality
 Stein’s Unbiased Risk Estimation
 SBL example: EM, VB, SURE
Part III: Bilinear Models
 Cell-Free Massive MIMO setting
 MAP and MMSE estimates
 CRB variations
 EP variations: Factor Level, Variable Level
Part IV: Adaptive Kalman Filtering
 Dynamic SBL setting
 Bayesian CRB
 EM, VB, …, applied to Kalman filtering
Presented by: Kush R. Varshney
 Overview of traditional (non-LLM) trustworthy machine learning based on the book “Trustworthy Machine Learning” by the presenter
 Definitions of trustworthiness and safety in terms of aleatoric and epistemic uncertainty
 AI fairness
 Human-centered explainability
 Adversarial robustness
 Control-theoretic view of transparency and governance
 What are the new risks?
 Information-related risks
 Hallucination, lack of factuality, lack of faithfulness
 Lack of source attribution
 Leakage of private information
 Copyright infringement and plagiarism
 Interaction-related risks
 Hateful, abusive, and profane language
 Bullying and gaslighting
 Inciting violence
 Prompt injection attacks
 Brief discussion of moral philosophy
 How to change the behavior of LLMs
 Data curation and filtering
 Supervised fine-tuning
 Parameter-efficient fine-tuning, including low-rank adaptation (LoRA)
 Reinforcement learning with human feedback
 Model reprogramming and editing
 Prompt engineering and prompt tuning
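Among the techniques above, low-rank adaptation is particularly easy to sketch: the frozen weight is augmented by a product of two thin matrices (the dimensions, rank, and α/r scaling below follow common convention but are otherwise arbitrary illustration choices):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """Forward pass of a linear layer with a LoRA update: the frozen base
    weight W is augmented by the low-rank product B @ A, scaled by alpha/r."""
    return x @ (W + (B @ A) * (alpha / r)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 32, 16, 4
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in))       # trainable, random init
B = np.zeros((d_out, r))                 # trainable, zero init
x = rng.standard_normal((8, d_in))
# With B initialized to zero, the adapted layer exactly matches the base layer,
# so fine-tuning starts from the pretrained behavior.
```

Only A and B (r·(d_in + d_out) parameters) are trained, instead of the full d_out·d_in matrix.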
 How to mitigate risks in LLMs and make them safer
 Methods for training data source attribution based on influence functions
 Methods for incontext source attribution based on post hoc explainability methods
 Equi-tuning, fair infinitesimal jackknife, and fairness reprogramming
 Aligning LLMs to unique user-specified values and constraints stemming from use-case requirements, social norms, laws, industry standards, etc., via policy elicitation, parameter-efficient fine-tuning, and red-team audits
 Orchestrating multiple possibly conflicting values and constraints
Presented by: Yao Xie, Xiuyuan Cheng
Introduction
 Generative models in estimation and inference problems
 The problem of generation and conditional generation
 Overview of neural network methods
Mathematical background
 Partial differential equation (PDE)
 Stochastic differential equation (SDE)
 Langevin dynamics
 Continuity equation
 Samplers
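Of the items above, Langevin dynamics yields the shortest possible sampler; a sketch for a standard Gaussian target, whose score ∇ log p(x) = −x is available in closed form (the step size and chain count are arbitrary choices):

```python
import numpy as np

def langevin_sample(score, n_chains=10000, n_steps=500, eps=0.05, seed=0):
    """Unadjusted Langevin algorithm:
    x <- x + (eps/2) * score(x) + sqrt(eps) * noise.
    Runs many independent chains in parallel and returns their final states."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_chains)          # arbitrary initialization
    for _ in range(n_steps):
        x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(n_chains)
    return x

# Target p = N(0, 1), so the score is simply -x.
samples = langevin_sample(lambda x: -x)
# The empirical mean is near 0 and the variance near 1 (up to discretization bias).
```

The same recipe with a learned score network in place of `lambda x: -x` is the backbone of score-based generative sampling discussed later in the tutorial.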
Diffusion model and ODE flow
 SDE and ODE approaches for normalizing flow
 Score matching
 Fokker-Planck Equation and Transport Equation
 Deterministic and random backward process sampler
 Score-based and flow-based forward process
 Wasserstein gradient flow by flow network
Neural ODE and continuous normalizing flow (CNF)
 From ResNet to CNF
 Computation of the exact likelihood
 The computational challenges in high dimensions
Learning of interpolating distributions
 The problem of distribution interpolation from data
 Learning of dynamic Optimal Transport flow
 Density ratio estimation based on flow network
Evaluation of generative models
 Differential comparison of distributions in high dimensions
 Twosample test
 Goodnessoffit test
 Theoretical guarantee for kernel and neural network tests
Applications
 Image generation: MNIST, CIFAR
 Generative model for sequence data and adversarial samplers
 Uncertainty quantification for graph prediction using invertible graph neural networks (GNN)
Open problems and discussion
Presented by: Tianyi Chen, Xiaodong Cui, Lisha Chen
Part I - Introduction and Background
 New challenges of learning under multiple objectives
 Two optimization toolboxes to address those challenges
 History of bilevel and multi-objective optimization
Part II - Bilevel Optimization for Learning with Ordered Objectives
 Solution concepts and metrics of optimality
 Implicit gradient-based methods for bilevel optimization
 Value function-based methods for bilevel optimization
Part III - Multi-objective Optimization for Learning with Competing Objectives
 Solution concepts and metrics of optimality
 Dynamic weighting-based methods for multi-objective optimization
 Generalization bounds on multi-objective optimization algorithms
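For the special case of two objectives, the dynamic-weighting idea admits a closed form: the minimum-norm element of the convex hull of the two gradients (the standard two-task MGDA-style formula, shown here only as an illustration):

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Closed-form min-norm element of the convex hull of two gradients:
    gamma* = clip(<g2 - g1, g2> / ||g1 - g2||^2, 0, 1),
    d = gamma* g1 + (1 - gamma*) g2."""
    diff = g1 - g2
    denom = diff @ diff
    if denom < 1e-12:              # gradients (nearly) identical
        return g1
    gamma = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return gamma * g1 + (1.0 - gamma) * g2

g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
d = min_norm_direction(g1, g2)            # balanced direction [0.5, 0.5]
d_conflict = min_norm_direction(g1, -g1)  # exactly opposed objectives -> zero vector
```

A zero direction signals a Pareto-stationary point: no single step can decrease both objectives.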
Part IV - Applications to Automatic Speech Recognition
 Automatic Speech Recognition: Opportunities and Challenges
 Recursive pre-training and fine-tuning with limited labels
 Multilingual training for low-resource speech recognition
Part V - Open Research Directions
Presented by: Keshab K. Parhi
Engineering practical and reliable quantum computers and communication systems requires: (a) protection of quantum states from decoherence, and (b) overcoming the reliability issues due to faulty gates. This half-day tutorial will provide a detailed overview of new developments related to quantum ECCs and fault-tolerant computing. Specific topics include: (a) Introduction to quantum gates and circuits, (b) Shor’s 9-qubit ECC and stabilizer formalism for quantum ECCs, (c) Systematic method for construction of quantum ECC circuits, (d) Optimization of quantum ECC circuits in terms of the number of multiple-qubit gates, and (e) Nearest-neighbor compliant (NNC) quantum ECC circuits. Descriptions of the topics are listed below.
 Introduction to quantum gates and circuits.
 Shor’s 9-qubit code and stabilizer formalism – Bit-flip codes, phase-flip codes, Shor’s 9-qubit code, stabilizer formalism.
 Systematic method for construction of quantum ECC circuits – Encoder circuit, syndrome measurement circuit, 5-qubit code encoder and decoder circuit, Steane code encoder and decoder circuit.
 Optimization of quantum ECC circuits in terms of the number of multiple-qubit gates – Circuit equivalence rules, optimization of circuits using circuit equivalence rules, optimization using group-theoretic matrix equivalence.
 Nearest-neighbor compliant quantum circuits – Various IBM architectures, nearest-neighbor compliance, swap gates, minimization of swap gates for NNC circuits.
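The bit-flip building block behind Shor's code is small enough to simulate directly with state vectors; a minimal sketch in plain NumPy (rather than any particular quantum SDK), using the textbook encoding |ψ⟩ → α|000⟩ + β|111⟩ and the Z₁Z₂, Z₂Z₃ stabilizer syndrome:

```python
import numpy as np

def encode(alpha, beta):
    """3-qubit bit-flip code: a|0> + b|1>  ->  a|000> + b|111>."""
    state = np.zeros(8, dtype=complex)
    state[0b000], state[0b111] = alpha, beta
    return state

def flip(state, qubit):
    """Apply an X (bit-flip) error on the given qubit (0 = leftmost bit)."""
    out = np.empty_like(state)
    for b in range(8):
        out[b ^ (1 << (2 - qubit))] = state[b]
    return out

def correct(state):
    """Read the Z1Z2 and Z2Z3 stabilizer syndrome (deterministic for a code
    state with at most one X error) and flip the implicated qubit back."""
    b = int(np.flatnonzero(np.abs(state) > 1e-12)[0])  # any populated basis state
    bits = [(b >> 2) & 1, (b >> 1) & 1, b & 1]
    s12, s23 = bits[0] ^ bits[1], bits[1] ^ bits[2]    # syndrome bits
    qubit = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}[(s12, s23)]
    return state if qubit is None else flip(state, qubit)

psi = encode(0.6, 0.8)
corrupted = flip(psi, 1)        # single bit-flip error on the middle qubit
recovered = correct(corrupted)  # syndrome (1, 1) identifies the middle qubit
```

Note that the syndrome reveals only the parities, never α or β, which is why correction does not disturb the encoded superposition.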
Presented by: Sam Buchanan, Yi Ma, Druv Pai, Yaodong Yu
During the past decade, machine learning and high-dimensional data analysis have experienced explosive growth, due in major part to the extensive successes of deep neural networks. Despite their numerous achievements in disparate fields such as computer vision and natural language processing, which have led to their involvement in safety-critical data processing tasks (such as autonomous driving and security applications), such deep networks have remained mostly mysterious to their end users and even their designers. For this reason, the machine learning community continually places higher emphasis on explainable and interpretable models, those whose outputs and mechanisms are understandable by their designers and even end users. The research community has recently responded to this task with vigor, having developed various methods to add interpretability to deep learning. One such approach is to design deep networks which are fully white-box ab initio, namely designed through mechanisms which give each operator in the deep network a clear purpose and function towards learning and/or transforming the data distribution. This tutorial will discuss classical and recent advances in constructing white-box deep networks from this perspective. The tutorial outline is as follows:
 [Yi Ma] Introduction to high-dimensional data analysis (45 min): In the first part of the tutorial, we will discuss the overall objective of high-dimensional data analysis, that is, learning and transforming the data distribution towards template distributions with relevant semantic content for downstream tasks (such as linear discriminative representations (LDRs), i.e., expressive mixtures of semantically meaningful incoherent subspaces). We will discuss classical methods such as sparse coding through dictionary learning as particular instantiations of this learning paradigm when the underlying signal model is linear or sparsely generated. This part of the presentation involves an interactive Colab on sparse coding.
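The sparse coding step referenced above, min_z ½‖x − Dz‖² + λ‖z‖₁, can be solved with ISTA in a few lines (a minimal sketch, separate from the tutorial's own Colab; the dictionary size and λ are arbitrary illustration choices):

```python
import numpy as np

def ista(D, x, lam=0.01, n_iter=500):
    """ISTA for sparse coding: a gradient step on the reconstruction loss
    followed by soft-thresholding (the proximal map of the l1 penalty)."""
    eta = 1.0 / np.linalg.norm(D, 2) ** 2   # step size 1/L, L = ||D||_2^2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        z = z - eta * grad
        z = np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)  # soft threshold
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)            # unit-norm dictionary atoms
z_true = np.zeros(50)
z_true[[5, 17]] = [1.5, -1.0]
x = D @ z_true                            # 2-sparse synthetic signal
z_hat = ista(D, x)
# z_hat is sparse and reconstructs x almost exactly for small lambda.
```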
 [Sam Buchanan] Layer-wise construction of deep neural networks (45 min): In the second part of the tutorial, we will introduce unrolled optimization as a design principle for interpretable deep networks. As a simple special case, we will examine several unrolled optimization algorithms for sparse coding (especially LISTA and “sparseland”), and show that they exhibit striking similarities to current deep network architectures. These unrolled networks are white-box and interpretable ab initio. This part of the presentation involves an interactive Colab on simple unrolled networks.
 [Druv Pai] White-box representation learning via unrolled gradient descent (45 min): In the third part of the tutorial, we will focus on the special yet highly useful case of learning the data distribution and transforming it to an LDR. We will discuss the information-theoretic and statistical principles behind such a representation, and design a loss function, called the coding rate reduction, which is optimized at such a representation. By unrolling gradient ascent on the coding rate reduction, we will construct a deep network architecture, called the ReduNet, where each operator in the network has a mathematically precise (hence white-box and interpretable) function in the transformation of the data distribution towards an LDR. Moreover, the ReduNet may be constructed layer-wise in a forward-propagation manner, that is, without any back-propagation required. This part of the presentation involves an interactive Colab on the coding rate reduction.
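The coding rate reduction mentioned here has a short closed form, ΔR(Z, Π) = R(Z) − Σ_k (n_k/n) R(Z_k) with R(Z) = ½ log det(I + d/(nε²) ZZᵀ), which can be sketched directly (ε and the toy data are arbitrary choices):

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 logdet(I + d/(n eps^2) Z Z^T) for features Z of shape (d, n)."""
    d, n = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (n * eps ** 2)) * Z @ Z.T)[1]

def rate_reduction(Z, labels, eps=0.5):
    """Delta R = R(Z) - sum_k (n_k/n) R(Z_k): expand the whole, compress each class."""
    n = Z.shape[1]
    return coding_rate(Z, eps) - sum(
        (np.sum(labels == c) / n) * coding_rate(Z[:, labels == c], eps)
        for c in np.unique(labels))

rng = np.random.default_rng(0)
# Two classes supported on orthogonal coordinate subspaces of R^4 ...
Z1 = np.vstack([rng.standard_normal((2, 50)), np.zeros((2, 50))])
Z2 = np.vstack([np.zeros((2, 50)), rng.standard_normal((2, 50))])
labels = np.repeat([0, 1], 50)
delta_orth = rate_reduction(np.hstack([Z1, Z2]), labels)
# ... versus two classes with identical features:
delta_same = rate_reduction(np.hstack([Z1, Z1]), labels)
# delta_orth is strictly positive; identical class distributions give ~0,
# so maximizing Delta R pushes classes toward incoherent subspaces.
```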
 [Yaodong Yu] White-box transformers (45 min): In the fourth part of the tutorial, we will show that by melding the perspectives of sparse coding and rate reduction together, we can obtain sparse linear discriminative representations, encouraged by an objective which we call sparse rate reduction. By unrolling the optimization of the sparse rate reduction, and parameterizing the feature distribution at each layer, we will construct a deep network architecture, called CRATE, where each operator is again fully mathematically interpretable: we can understand each layer as realizing a step of an optimization algorithm, and the whole network is a white box. The design of CRATE is very different from that of ReduNet, despite optimizing a similar objective, demonstrating the flexibility and pragmatism of the unrolled optimization paradigm. Moreover, the CRATE architecture is extremely similar to the transformer, and many of the layer-wise interpretations of CRATE can be used to interpret the transformer, showing that the interpretability benefits of such derived networks may carry over to understanding current deep architectures used in practice. We will highlight in particular the powerful and interpretable representation learning capability of these models for visual data by showing how segmentation maps for visual data emerge in their learned representations with no explicit additional regularization or complex training recipes.
Presented by: Baihan Lin
In recent years, reinforcement learning and bandits have transformed a wide range of real-world applications including healthcare, finance, recommendation systems, robotics and computer vision, and, last but not least, speech and language processing. While most speech and language applications of reinforcement learning algorithms center on improving deep network training with its flexible optimization properties, there is still much ground to explore in utilizing the benefits of reinforcement learning, such as its reward-driven adaptability, state representations, temporal structures and generalizability. In this one-session tutorial, we will overview the recent advancements of reinforcement learning and bandits and discuss how they can be employed to solve various speech and natural language processing problems with models that are interpretable and scalable, especially in emerging topics such as large language models.
First, we briefly introduce the basic concepts of reinforcement learning and bandits, as well as the major variant problem settings in this machine learning domain. Second, we translate various speech and language tasks into reinforcement learning problems and show the key challenges. Third, we introduce reinforcement learning and bandit techniques and their varieties for speech and language tasks, along with their machine learning formulations. Fourth, we present several state-of-the-art applications of reinforcement learning in different fields of speech and language. Lastly, we discuss some open problems in reinforcement learning and bandits to show how to further develop more advanced algorithms for speech and language research in the future.
As the second iteration of this tutorial, this edition will emphasize new developments in large language models and deep reinforcement learning. The audience can refer to two resources after the tutorial: (1) a review paper by the author on arXiv covering many topics in this tutorial, and (2) an upcoming Springer book by the author on the same topic, to be released this December, which includes more case studies, hands-on examples and additional coverage of recent advancements in large language models.
The outline of the tutorial, along with the topics and subtopics covered, is as follows:
1. Introduction (5 min)
 The significance of RL and Bandits in speech and language processing (real-world use cases, potential advantages)
 Challenges and opportunities in integrating these techniques
2. A Concise Tutorial of Reinforcement Learning and Bandits (85 min)
 Preliminaries
 Multi-Armed Bandits (MAB)
 Contextual Bandits
 Reinforcement Learning (RL)
 Inverse Reinforcement Learning (IRL)
 Imitation Learning and Behavioral Cloning
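To ground the MAB material above, an ε-greedy agent on a two-armed Bernoulli bandit can be sketched as follows (the arm means and ε are arbitrary illustration values):

```python
import numpy as np

def epsilon_greedy(arm_probs, n_steps=5000, eps=0.1, seed=0):
    """epsilon-greedy bandit: explore a uniformly random arm with
    probability eps, otherwise exploit the arm with the highest
    running mean reward."""
    rng = np.random.default_rng(seed)
    k = len(arm_probs)
    counts = np.zeros(k, dtype=int)
    values = np.zeros(k)                         # running mean reward per arm
    for _ in range(n_steps):
        if rng.random() < eps:
            a = int(rng.integers(k))             # explore
        else:
            a = int(np.argmax(values))           # exploit
        r = float(rng.random() < arm_probs[a])   # Bernoulli reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a] # incremental mean update
    return counts, values

counts, values = epsilon_greedy([0.2, 0.8])
# The better arm (index 1) dominates the pull counts, and its value
# estimate converges near its true mean of 0.8.
```

The same explore/exploit template reappears, with richer state, throughout the RL formulations in the next section.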
BREAK (20 min)
3. Reinforcement Learning Formulation for Speech and Language Applications (45 min)
 Automatic Speech Recognition (ASR)
 Speaker Recognition and Diarization
 Spoken Language Understanding (SLU)
 Natural Language Understanding (NLU)
 Sequence Generation and Text-to-Speech (TTS) Synthesis
 Natural Language Generation (NLG)
 Large Language Models (LLM)
 Conversational Recommendation Systems (CRS)
4. Emerging Reinforcement Learning Strategies (15 min)
 Deep Reinforcement Learning and Bandits
 Batched and Offline Reinforcement Learning
 Optimization with Customized Rewards
 Graph Structures and Attention Mechanisms
 Transfer Learning in Reinforcement Learning
5. Conclusions, Open Questions and Challenges (10 min)
 Multi-agent Settings in Speech and Language
 Multi-objective Training and Human Priors
 Societal Implications of Using RL for Responsible AI
 Summary and Additional Resources
Overall takeaways for our attendees:
 Comprehensive understanding of RL and Bandits and their applications in speech and language processing.
 Practical knowledge and handson experience with implementing these techniques
 Insight into emerging strategies and challenges in the field
 Open discussions and problem-solving sessions for practical application
Presented by: Petros Maragos
Tropical geometry is a relatively recent field in mathematics and computer science combining elements of algebraic geometry and polyhedral geometry. The scalar arithmetic of its analytic part pre-existed (since the 1980s) in the form of max-plus and min-plus semiring arithmetic used in finite automata, nonlinear image processing, convex analysis, nonlinear control, and idempotent mathematics.
Tropical geometry recently emerged successfully in the analysis and extension of several classes of problems and systems in both classical machine learning and deep learning. Such areas include (1) Deep Neural Networks (DNNs) with piecewise-linear (PWL) activation functions, (2) Morphological Neural Networks, (3) Neural Network Minimization, (4) Optimization (e.g. dynamic programming) and Probabilistic Dynamical Systems, and (5) Nonlinear regression with PWL functions. Areas (1), (2) and (3) have many novel elements and have recently been applied to image classification problems. Area (4) offers new perspectives on several areas of optimization. Area (5) is also novel and has many applications.
The proposed tutorial will cover the following topics:
Elements from Tropical Geometry and Max-Plus Algebra (Brief). We will first summarize introductory ideas and objects of tropical geometry, including tropical curves and surfaces and Newton polytopes. We will also provide a brief introduction to the max-plus algebra that underlies tropical geometry. This will involve scalar and vector/signal operations defined over a class of nonlinear spaces, and optimal solutions of systems of max-plus equations. Tropical polynomials will be defined and related to classical polynomials through Maslov dequantization. Then, the above introductory concepts and tools will be applied to analyzing and/or providing solutions for problems in the following broad areas of machine learning.
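As a running example of the objects just introduced, a tropical (max-plus) polynomial replaces sum and product by max and +, so p(x) = max_i(a_i + b_i·x) is a convex piecewise-linear function; a small sketch with arbitrary coefficients:

```python
def tropical_poly(coeffs, x):
    """Evaluate the max-plus polynomial p(x) = max_i (a_i + b_i * x),
    where each (a_i, b_i) pair holds the tropical coefficient a_i and
    the monomial degree b_i."""
    return max(a + b * x for a, b in coeffs)

# p(x) = max(0, -1 + x, -4 + 2x): a convex PWL function whose breakpoints
# (x = 1 and x = 3) are where consecutive monomials exchange the maximum.
p = [(0.0, 0), (-1.0, 1), (-4.0, 2)]
# p(0) = 0 (constant term wins), p(2) = 1 (linear term), p(5) = 6 (quadratic).
```

The breakpoint structure of such polynomials is precisely what tropical curves and Newton polytopes encode.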
Neural Networks with Piecewise-Linear (PWL) Activations. Tropical geometry recently emerged in the study of deep neural networks (DNNs) and variations of the perceptron operating in the max-plus semiring. Standard activation functions employed in DNNs, including the ReLU activation and its “leaky” variants, induce neural network layers which are PWL convex functions of their inputs and create a partition of space well-described by concepts from tropical geometry. We will illustrate a purely geometric approach for studying the representation power of DNNs, measured via the concept of a network's “linear regions”, under the lens of tropical geometry.
Morphological Neural Networks. Recently there has been a resurgence of networks whose layers operate with max-plus arithmetic (inspired by the fundamental operators of morphological image processing). Such networks enjoy several promising properties, including faster training and the capability of being pruned to a large degree without severe degradation of their performance. We will present several aspects of this emerging class of neural networks from modern perspectives, using ideas from tropical geometry and mathematical morphology. Subtopics include methods for their training and pruning, resulting in sparse representations.
Neural Network Minimization. The field of tropical algebra is closely linked with the domain of neural networks with PWL activations, since their output can be described via tropical polynomials in the max-plus semiring. In this tutorial, we will briefly present methods based on approximation of the NN tropical polynomials and their Newton polytopes via either (i) a form of approximate division of such polynomials, or (ii) the Hausdorff distance of tropical zonotopes, in order to minimize networks trained for multi-class classification problems. We will also present experimental evaluations on well-known datasets, which demonstrate a significant reduction in network size while retaining adequate performance.
Approximation Using Tropical Mappings. Tropical mappings, defined as vectors of tropical polynomials, can be used to express several interesting approximation problems in ML. We will focus on three closely related optimization problems: (a) the tropical inversion problem, where we know the tropical mapping and the output, and search for the input; (b) the tropical regression problem, where we know the input-output pairs and search for the tropical mapping; and (c) the tropical compression problem, where we know the output, and search for an input and a tropical mapping that represent the data in reduced dimensions. There are several potential applications including data compression, data visualization, recommendation systems, and reinforcement learning. We will present a unified theoretical framework, in which tropical matrix factorization has a central role, a complexity analysis, and solution algorithms for this class of problems. Problem (b) will be further detailed under PWL regression (see next).
Piecewise-Linear (PWL) Regression. Fitting PWL functions to data is a fundamental regression problem in multidimensional signal modeling and machine learning, since approximations with PWL functions have proven analytically and computationally very useful in many fields of science and engineering. We focus on functions that admit a convex representation as the maximum of affine functions (e.g. lines, planes), represented with max-plus tropical polynomials. This allows us to use concepts and tools from tropical geometry and max-plus algebra to optimally approximate the shape of curves and surfaces by fitting tropical polynomials to data, possibly in the presence of noise; this yields polygonal or polyhedral shape approximations. For this convex PWL regression problem, we present optimal solutions w.r.t. $\ell_p$ error norms and efficient algorithms.
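For fixed slopes b_j, this convex PWL regression has a clean max-plus answer: the greatest intercepts keeping f(x) = max_j(a_j + b_j·x) below the data are a_j = min_i(y_i − b_j·x_i), the so-called greatest sub-solution. A minimal sketch fitting y = |x| with slopes ±1 (the target function and slope set are illustrative choices):

```python
import numpy as np

def maxplus_fit_intercepts(x, y, slopes):
    """Greatest sub-solution of max-plus regression: for each fixed slope
    b_j, the largest intercept a_j with max_j(a_j + b_j x_i) <= y_i for all i."""
    return np.array([np.min(y - b * x) for b in slopes])

def max_affine(x, slopes, intercepts):
    """Evaluate f(x) = max_j (a_j + b_j * x) at each point of x."""
    return np.max(intercepts[:, None] + slopes[:, None] * x[None, :], axis=0)

x = np.linspace(-2.0, 2.0, 41)
y = np.abs(x)                       # target: y = |x| = max(x, -x)
slopes = np.array([-1.0, 1.0])
intercepts = maxplus_fit_intercepts(x, y, slopes)   # both come out as 0
f = max_affine(x, slopes, intercepts)
# The fit is exact here, since |x| is itself a max of the two affine pieces;
# for noisy data this construction gives the tightest under-approximation.
```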
Presented by: Danilo Mandic, Harry Davies
The Hearables paradigm, that is, in-ear sensing of neural function and vital signs, is an emerging solution for 24/7 discreet health monitoring. The tutorial starts by introducing our own Hearables device, which is based on an earplug with embedded electrodes and optical, acoustic, mechanical and temperature sensors. We show how such a miniaturised embedded system can be used to reliably measure the Electroencephalogram (EEG), Electrocardiogram (ECG), Photoplethysmogram (PPG), respiration, temperature, blood oxygen levels, and behavioural cues. Unlike standard wearables, such an inconspicuous Hearables earpiece benefits from the relatively stable position of the ear canal with respect to vital organs, allowing it to operate robustly during daily activities. However, this comes at the cost of weaker signal levels and exposure to noise. This opens novel avenues of research in Machine Intelligence for eHealth, with numerous challenges and opportunities for algorithmic solutions. We describe how our Hearables sensor can be used, inter alia, for the following applications:
 Automatic sleep scoring based on in-ear EEG, as sleep disorders are a major factor underlying general health problems, from endocrinology through to depression and dementia.
 Screening for chronic obstructive pulmonary disease based on in-ear PPG, in the battle against the third leading cause of death worldwide, with an emphasis on developing countries that often lack access to hospital-based examinations.
 Continuous 24/7 ECG from a headphone with the ear-ECG, as cardiac diseases are the most common cause of death but often remain undetected, since until the emergence of Hearables it was only possible to record ECG in a clinic and not in the community.
For Hearables to provide a paradigm shift in eHealth, they require domain-aware Machine Intelligence to detect, estimate, and classify the notoriously weak physiological signals from the ear canal. To this end, the second part of our tutorial is focused on interpretable AI. This is achieved through a first-principles matched-filtering explanation of convolutional neural networks (CNNs), introduced by us. We next revisit the operation of CNNs and show that their key component – the convolutional layer – effectively performs matched filtering of its inputs with a set of templates (filters, kernels) of interest. This serves as a vehicle to establish a compact matched-filtering perspective of the whole convolution-activation-pooling chain, which allows for theoretically well-founded and physically meaningful insight into the overall operation of CNNs. This is shown to help mitigate their interpretability and explainability issues, together with providing intuition for further developments and novel, physically meaningful ways of their initialisation. Interpretable networks are pivotal in the integration of AI into medicine, by dispelling the black-box nature of deep learning and allowing clinicians to make informed decisions based on network outputs. We demonstrate this in the context of Hearables by expanding on the following key findings:
 We argue from first principles that convolutional neural networks operate as matched filters.
 Through this lens, we further examine network weights, activation functions and pooling operations.
 We detail the construction of a fully interpretable convolutional neural network designed for R-peak detection, demonstrating its operation as a matched filter and analysing the convergence of its filter weights to an ECG pattern.
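The matched-filtering view in these findings is easy to demonstrate: cross-correlating a noisy recording with a known template peaks at the template's location, which is exactly the sliding inner product a convolutional layer computes (the synthetic pulse below merely stands in for an R-peak; all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Template: a smooth pulse standing in for the R-peak morphology.
t = np.linspace(-1.0, 1.0, 21)
template = np.exp(-t ** 2 / 0.08)

# Embed the template in noise at a known position.
signal = 0.1 * rng.standard_normal(500)
pos = 237
signal[pos:pos + len(template)] += template

# A convolutional layer with this kernel computes exactly this sliding
# inner product; its activation peaks where the input matches the template.
response = np.correlate(signal, template, mode="valid")
detected = int(np.argmax(response))
# detected lands within a sample of pos, despite the additive noise.
```

In a trained CNN the kernel is learned rather than specified, which is precisely the sense in which its filters converge towards the waveform of interest.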
Owing to their unique Collocated Sensing nature, Hearables record a rich admixture of information from several physiological variables, motion and muscle artefacts, and noise. For example, even a standard Electroencephalogram (EEG) measurement contains a weak ECG and muscle artefacts, which are typically treated as bad data and subsequently discarded. In the quest to exploit all the available information (no data is bad data), the final section of the tutorial focuses on a novel class of encoder-decoder networks which, taking advantage of the collocation of information, maximise data utility. We introduce the novel concept of a Correncoder and demonstrate its ability to learn a shared latent space between the model input and output, making it a deep-NN generalisation of partial least squares (PLS). The key topics of the final section of this tutorial are as follows:
 A thorough explanation of Partial Least Squares (Projection on Latent Spaces) regression, and the lens of interpreting deep learning models as an extension of PLS.
 An introduction to the Correncoder and Deep Correncoder, a powerful yet efficient deep learning framework to extract correlated information between input and references.
 Real-world applications of the Correncoder to Hearables data are presented, ranging from transforming Photoplethysmography (PPG) into respiratory signals, through to making sense of artefacts and decoding implanted brain electrical signals into movement.
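The PLS regression that the Correncoder generalises can be sketched in its simplest one-component form (a single latent direction; the synthetic data is illustrative):

```python
import numpy as np

def pls_one_component(X, y):
    """One-component PLS: project X onto the weight direction w ~ X^T y
    (the direction of maximal covariance with y), then regress y on the
    latent score t = X w."""
    w = X.T @ y
    w /= np.linalg.norm(w)
    t = X @ w                       # latent scores: the shared 1-D subspace
    q = (t @ y) / (t @ t)           # regression of y on the latent scores
    return w, q

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
beta = np.array([2.0, 0.0, 0.0, -1.0, 0.0])
y = X @ beta + 0.1 * rng.standard_normal(2000)
w, q = pls_one_component(X, y)
y_hat = q * (X @ w)
corr = np.corrcoef(y_hat, y)[0, 1]
# corr is close to 1: a single latent direction captures this rank-one relation.
```

A Correncoder replaces the linear projection with a learned nonlinear encoder while keeping the same shared-latent-space idea.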
In summary, this tutorial details how the marriage of the emerging but crucial sensing modality of Hearables and customised interpretable deep learning models can maximise the utility of wearables data for healthcare applications, with a focus on the long-term monitoring of chronic diseases. Wearable in-ear sensing for automatic screening and monitoring of disease has the potential for immense global societal impact, and for personalised healthcare out-of-clinic and in the community – the main aims of future eHealth.
The presenters are a perfect match for the topic of this tutorial: Prof Mandic’s team are pioneers of Hearables, and the two presenters have been working together over the last several years on the links between Signal Processing, Embedded Systems and Connected Health; the presenters also hold three international patents in this area.
Tutorial Outline
The tutorial will involve both the components of the Hearables paradigm and the Interpretable AI solutions for 24/7 wearable sensing in the real world. The duration will be over 3 hours, with the following topics covered:
- The Hearables paradigm. Here, we will cover the biophysics supporting in-ear sensing of neural function and vital signs, together with the corresponding COMSOL Multiphysics simulations and real-world recordings of the Electroencephalogram (EEG), Electrocardiogram (ECG), Photoplethysmogram (PPG), respiration, blood oxygen level (SpO2), temperature, movement and sound – all from an earplug with embedded sensors. (40 minutes)
- Automatic Sleep Staging and Cognitive Load Estimation from Hearables. Here we demonstrate two real-world applications of Hearables, with in-ear polysomnography enabling unobtrusive in-home sleep monitoring, and robust tracking of cognitive workload during memory tasks and gaming, together with their links with dementia. (30 minutes)
- Interpretable Convolutional Neural Networks (CNNs). This section explains CNNs through the lens of the matched filter (MF), a seven-decade-old core concept in signal processing theory. It finishes with the example of a deep matched filter designed for robust R-peak detection in noisy ear-ECG. (40 minutes)
- Physiologically informed data augmentation. Here we build upon our pioneering work on screening for chronic obstructive pulmonary disease (COPD) with in-ear PPG, by detailing an apparatus designed to simulate COPD in healthy individuals. We demonstrate the advantages of using domain knowledge within such an apparatus when producing surrogate data for deep learning models. (20 minutes)
- An introduction to the Correncoder. Here we introduce a rethinking of the classic encoder-decoder structure, with the aim of extracting correlated information between two signals. At each stage, we mirror this model with the method of Projection on Latent Spaces (PLS), showing that this deep learning framework can be interpreted as a deep, generalisable PLS. We show multiple real-world applications of such a framework in the context of wearable eHealth. (40 minutes)
- No data is bad data. In this final section of the tutorial, we reject the null hypothesis that data containing artefacts should be discarded, with examples from ear-EEG signal processing. We demonstrate that in many cases rich information can be extracted from artefacts, and that with the Correncoder framework we can achieve artefact removal in real time. (20 minutes)
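The matched-filter lens on CNNs from the outline above can be illustrated in a few lines: a convolutional layer with a fixed kernel is exactly a matched filter, whose output peaks where a template (e.g., an R-peak-like shape) occurs in the signal. A minimal sketch, with an illustrative template and signal:

```python
def matched_filter(signal, template):
    """Slide the template over the signal and return the correlation at
    each offset -- the core operation of a single-channel conv layer."""
    n, m = len(signal), len(template)
    return [sum(signal[i + j] * template[j] for j in range(m))
            for i in range(n - m + 1)]

# An R-peak-like template buried in a weak background at offset 5.
template = [1.0, 2.0, 1.0]
signal = [0.1, -0.1, 0.0, 0.05, 0.0, 1.0, 2.0, 1.0, 0.0, -0.05, 0.1]
response = matched_filter(signal, template)
peak = max(range(len(response)), key=lambda i: response[i])  # location of best match
```

A deep matched filter, as discussed in the outline, stacks learned versions of this operation to remain robust under realistic ear-ECG noise.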
Presented by: Christos Thrampoulidis, Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi
Part I: Motivation and Overview
I.1 The Transformer Revolution:
Our tutorial begins by providing an in-depth account of the Transformer architecture and its extensive array of applications. We place special emphasis on examples most relevant to the signal-processing audience, including speech analysis, time-series forecasting, image processing, and, most recently, wireless communication systems. Additionally, we introduce and review essential concepts associated with Transformer training, such as pre-training, fine-tuning, and prompt-tuning, while also discussing Transformers' emerging abilities, such as in-context learning and reasoning.
I.2 A Signal-Processing-Friendly Introduction to the Attention Mechanism:
We then dive into a comprehensive explanation of the Transformer block's structure. Our primary focus is on the Attention mechanism, which serves as the fundamental distinguishing feature from conventional architectures like fully connected, convolutional, and residual neural networks. To facilitate the signalprocessing community's understanding, we introduce a simplified attention model that establishes an intimate connection with problems related to sparse signal recovery and matrix factorization. Using this model as a basis, we introduce critical questions regarding its capabilities in memorizing lengthy sequences, modeling longrange dependencies, and training effectively.
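For concreteness, the attention mechanism discussed above reduces to a short computation. The sketch below implements standard scaled dot-product attention, softmax(QK^T/sqrt(d))V, on small Python lists (toy inputs are illustrative):

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: each query row produces a convex
    combination of the value rows, weighted by query-key similarity."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# A sharply matching key: the query essentially selects the first value row.
out = attention(Q=[[10.0, 0.0]], K=[[10.0, 0.0], [0.0, 10.0]],
                V=[[1.0, 0.0], [0.0, 1.0]])
```

The near one-hot softmax weights here illustrate the "selection" behaviour of attention that connects it to sparse signal recovery.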
Part II: Efficient Inference and Adaptation: Quadratic Attention Bottleneck and Parameter-Efficient Tuning (PET)
II.1 Kernel viewpoint, low-rank/sparse approximation, FlashAttention (system-level implementation):
Transformers struggle with long sequences due to the quadratic complexity of self-attention. We review recently proposed efficient implementations that aim to tackle this challenge, while often achieving superior or comparable performance to vanilla Transformers. First, we delve into approaches that approximate quadratic-time attention using data-adaptive, sparse, or low-rank approximation schemes. Secondly, we overview the importance of system-level improvements, such as FlashAttention, where I/O awareness can greatly accelerate inference. Finally, we highlight alternatives that replace self-attention with more efficient problem-aware blocks while retaining performance.
II.2 PET: Prompt-tuning and the LoRA adapter (low-rank projection):
In traditional Transformer pipelines, models undergo general pre-training followed by task-specific fine-tuning, resulting in multiple model copies for each task and increasing computational and memory demands. Recent research focuses on parameter-efficient tuning (PET), which updates a small set of task-specific parameters, reducing memory usage and enabling mixed-batch inference. We highlight the attention mechanism's key role in PET, discuss prompt-tuning, and explore LoRA, a PET method linked to low-rank factorization, widely studied in signal processing.
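A minimal sketch of the LoRA idea mentioned above: a frozen weight W is adapted by a trainable low-rank update (alpha/r)*B*A, so only r*(d_in + d_out) parameters are tuned per layer. With B initialized to zero (the standard LoRA initialization), the adapted model starts out identical to the pretrained one. All names and numbers below are illustrative:

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16.0):
    """y = W x + (alpha / r) * B (A x), with A: r x d_in and B: d_out x r.
    W stays frozen; only the small factors A and B are trained."""
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

W = [[1.0, 2.0], [3.0, 4.0]]       # frozen pretrained weight (2 x 2)
A = [[0.5, -0.5]]                  # rank r = 1 factor, trainable
B_init = [[0.0], [0.0]]            # B = 0: adapter starts as a no-op
x = [1.0, 0.0]
y0 = lora_forward(W, A, B_init, x)     # equals W x at initialization
B_tuned = [[0.2], [-0.1]]              # after (hypothetical) fine-tuning
y1 = lora_forward(W, A, B_tuned, x)
```

Only A and B would receive gradients; for a d x d layer the trainable parameter count drops from d^2 to 2rd.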
II.3 Communication and Robustness gains in Federated Learning:
We discuss the use of large pretrained Transformers in mobile ML settings, with emphasis on federated learning. Our discussion emphasizes the ability of Transformers to adapt in a communication-efficient fashion via PET methods: (1) the use of large models shrinks the accuracy gaps between alternative approaches and improves robustness to heterogeneity, and scaling allows clients to run more local SGD epochs, which can significantly reduce the number of communication rounds; (2) PET methods, by design, enable >100× less communication in bits while potentially boosting robustness to client heterogeneity and small sample sizes.
BREAK I
Part III: Approximation, Optimization, and Generalization Fundamentals
III.1 Approximation and Memorization Abilities:
We discuss Transformers as sequence-to-sequence models with a fixed number of parameters, independent of sequence length. Despite parameter sharing, Transformers exhibit universal approximation capabilities for sequence-to-sequence tasks. We delve into key results regarding Transformer models' approximation abilities, examining the impact of depth versus width. We also address their memorization capacity, emphasizing the trade-off between model size and the number of memorized sequence-to-sequence patterns. Additionally, we discuss the link between Transformers and associative memories, a topic of interest within the signal processing community.
III.2 Optimization dynamics: Transformers as support vector machines:
In this section, we present a fascinating emerging theory that elucidates how the attention layer learns, during training, to distinguish 'good' sequence elements (those most relevant to the prediction task) while suppressing 'bad' ones. This separation is formally framed as a convex optimization program, similar to classical support-vector machines (SVMs), but with a distinct operational interpretation that relates to the problems of low-rank and sparse signal recovery. This unique formulation allows us to engage the audience with a background in signal processing, as it highlights an implicit preference within the Transformer to promote sparsity in the selection of sequence elements – a characteristic reminiscent of traditional sparsity-inducing mechanisms such as the LASSO.
III.3 Generalization dynamics:
Our discussion encompasses generalization aspects related to both the foundational pre-training phase and subsequent task performance improvements achieved through prompt-tuning. To enhance our exploration, we will introduce statistical data models that extend traditional Gaussian mixture models, specifically tailored to match the operational characteristics of the Transformer. Our discussion includes an overview and a comprehensive list of references to tools drawn from high-dimensional statistics and recently developed learning theories concerning the neural tangent kernel (NTK) and deep neural networks' feature-learning abilities.
BREAK II
Part IV: Emerging abilities, in-context learning, and reasoning
IV.1 Scaling laws and emerging abilities:
We begin the last part of the tutorial by exploring the intriguing world of scaling laws and their direct implications on the emerging abilities of Transformers. Specifically, we will delve into how these scaling laws quantitatively impact the performance, generalization, and computational characteristics of Transformers as they increase in size and complexity. Additionally, we draw connections between the scaling laws and phase transitions, a concept familiar to the signal processing audience, elucidating via examples in the literature how Transformers' behavior undergoes critical shifts as they traverse different scales.
IV.2 In-context learning (ICL): Transformers as optimization algorithms
We delve into the remarkable capability of ICL, which empowers Transformers to engage in reasoning, adaptation, and problem-solving across a wide array of machine learning tasks through the use of straightforward language prompts, closely resembling human interactions. To illustrate this intriguing phenomenon, we will provide concrete examples spanning both language-based tasks and mathematically structured, analytically tractable tasks. Furthermore, we present findings that shed light on an intriguing perspective of in-context learning: the Transformer's capacity to autonomously learn and implement gradient descent steps at each layer of its architectural hierarchy. In doing so, we establish connections to deep-unfolding techniques, which have garnered popularity in applications such as wireless communications and solving inverse problems.
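The "Transformers implement gradient descent" perspective described above can be grounded by the reference computation itself: gradient descent on a least-squares objective over the in-context examples, followed by prediction on the query. Per the analyses this part surveys, a trained linear-attention Transformer can emulate one such step per layer; the sketch below (with hypothetical toy data) simply runs the steps explicitly:

```python
def icl_as_gd(examples, query, steps=200, lr=0.1):
    """Fit w by gradient descent on the in-context (x, y) pairs, then
    predict on the query -- the update a linear-attention layer can mimic."""
    d, n = len(query), len(examples)
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in examples:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(d):
                grad[j] += 2.0 * err * x[j] / n   # gradient of mean squared error
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return sum(wi * qi for wi, qi in zip(w, query))

# In-context examples generated by the linear rule y = x1 - 2*x2.
examples = [([1.0, 0.0], 1.0), ([0.0, 1.0], -2.0), ([1.0, 1.0], -1.0)]
pred = icl_as_gd(examples, query=[2.0, 1.0])   # true answer: 2 - 2 = 0
```

Each pass of the loop corresponds, in that theory, to one layer of the unrolled architecture, which is what links ICL to deep unfolding.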
IV.3 Primer on Reasoning:
The compositional nature of human language allows us to express fine-grained tasks and concepts. Recent innovations such as prompt-tuning, instruction-tuning, and various prompting algorithms are enabling the same for language models, catalyzing their ability to accomplish complex multi-step tasks such as mathematical reasoning or code generation. Here, we first introduce important prompting strategies that catalyze reasoning, such as chain-of-thought, tree-of-thought, and self-evaluation. We then demonstrate how these methods boost reasoning performance as well as the model's ability to evaluate its own output, contributing to trustworthiness. Finally, building on the ICL discussion, we introduce mathematical formalisms that shed light on how reasoning can be framed as "acquiring useful problem-solving skills" and "composing these skills to solve new problems".
Conclusions, outlook, and open problems
We conclude the tutorial by going over a list of important and exciting open problems related to the fundamental understanding of Transformer models, while emphasizing how this research creates opportunities for enhancing architectures and improving algorithms and techniques. This will bring the audience to the very forefront of fast-paced research in this area.
Presented by: Shiwei Liu, Olga Saukh, Zhangyang (Atlas) Wang, Arijit Ukil, and Angshul Majumdar
This tutorial will provide a comprehensive overview of recent breakthroughs in sparsity within the emerging area of large language models (LLMs), showcasing progress, posing challenges, and offering insights into improving the affordability and understanding of LLMs through sparsity. The outline of this tutorial is fourfold: (1) a thorough overview and categorization of sparse neural networks; (2) the latest progress in LLM compression via sparsity; (3) the caveats of sparsity in LLMs; and finally (4) the benefits of sparsity beyond model efficiency.
The detailed outline is given below:
Tutorial Introduction. Presenter: Zhangyang (Atlas) Wang.
Part 1: Overview of sparse neural networks. Presenter: Shiwei Liu.
We will first provide a brief overview and categorization of existing works on sparse neural networks. As one of the most classical concepts in machine learning, the original goal of sparsity in neural networks was to reduce inference costs. However, the research focus on sparsity has undergone a significant shift from post-training sparsity to prior-training sparsity over the past few years, due to the latter's promise of end-to-end resource savings from training to inference. Researchers have tackled many interlinked concepts such as pruning [13], the Lottery Ticket Hypothesis [14], Sparse Training [15,16], Pruning at Initialization [17], and Mixture of Experts [18]. However, the shift of interest only occurred in the last few years, and the relationships among different sparse algorithms in terms of their scopes, assumptions, and approaches are highly intricate and sometimes ambiguous. Providing a comprehensive and precise categorization of these approaches is timely for this newly shaped research community.
Part 2: Scaling up sparsity to LLMs: latest progress. Presenter: Shiwei Liu.
In the context of gigantic LLMs, sparsity is becoming even more appealing as a way to accelerate both training and inference. We will showcase existing attempts to address sparse LLMs, encompassing weight sparsity, activation sparsity, and memory sparsity. For example, SparseGPT [8] and Essential Sparsity [9] shed light on prominent weight sparsity in LLMs, while the unveiling of the "Lazy Neuron" phenomenon [13] and the "Heavy Hitter Oracle" [10] exemplifies activation sparsity and token sparsity. Specifically, Essential Sparsity reveals a consistent pattern across various settings: 30%-50% of the weights of an LLM can be removed for free by naive one-shot magnitude pruning, without any significant drop in performance. Ultimately, these observations suggest that sparsity is also an emergent property of LLMs, with great potential to improve their affordability.
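For concreteness, the naive one-shot magnitude pruning referenced above simply zeroes the smallest-magnitude fraction of weights in a single pass, with no retraining. A minimal sketch (toy weights are illustrative):

```python
def magnitude_prune(weights, sparsity):
    """One-shot global magnitude pruning: zero the `sparsity` fraction of
    entries with the smallest absolute value (ties prune together)."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row]
            for row in weights]

# Prune 50% of a toy weight matrix: the two smallest magnitudes vanish.
pruned = magnitude_prune([[0.1, -0.5], [2.0, -0.05]], sparsity=0.5)
```

The finding quoted above is that applying exactly this operation at 30%-50% sparsity leaves LLM performance largely intact.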
Coffee Break.
Part 3: The caveat of sparsity in LLMs: What tasks are we talking about? Presenter: Zhangyang (Atlas) Wang.
While sparsity has demonstrated its success in LLMs, the evaluations commonly used in the sparse-LLM literature are often restricted to simple datasets such as GLUE, SQuAD, WikiText-2, and PTB, and/or to simple one-turn questions and instructions. Such (over)simplified evaluations may camouflage some unexpected predicaments of sparse LLMs. To depict the full picture of sparse LLMs, we highlight two recent works, SMC-Bench [11] and the "Junk DNA Hypothesis", that unveil the failures of (magnitude-based) pruned LLMs on harder language tasks, indicating a strong correlation between a model's "prunability" and the difficulty of its target downstream task.
Part 4: Sparsity beyond efficiency. Presenter: Olga Saukh.
In addition to efficiency, sparsity has been found to boost many other performance aspects, such as robustness, uncertainty quantification, data efficiency, multitasking and task transferability, and interpretability [19]. We will mainly focus on recent progress in understanding the relation between sparsity and robustness. The research literature spans multiple subfields, including empirical and theoretical analysis of adversarial robustness [20], regularization against overfitting, and noisy-label resilience for sparse neural networks. By outlining these different aspects, we aim to offer a deep dive into how network sparsity affects the multifaceted utility of neural networks in different scenarios.
Part 5: Demonstration and Handson Experience. Presenter: Shiwei Liu.
The Expo consists of three main components. Firstly, an implementation tutorial will be presented on a typical laptop, offering step-by-step guidance in building and training sparse neural networks from scratch. Secondly, a demo will showcase how to prune LLaMA-7B on a single A6000 GPU. Thirdly, we will create and maintain a user-friendly open-source implementation for sparse LLMs, ensuring participants have ongoing resources at their disposal. To encourage ongoing engagement and learning, we will make all content and materials readily accessible through the tutorial websites.
Presented by: Moe Z. Win, Andrea Conti
The availability of real-time high-accuracy location awareness is essential for current and future wireless applications, particularly those involving the Internet-of-Things and the beyond-5G ecosystem. Reliable localization and navigation of people, objects, and vehicles – Localization-of-Things (LoT) – is a critical component for a diverse set of applications including connected communities, smart environments, vehicle autonomy, asset tracking, medical services, military systems, and crowd sensing. The coming years will see the emergence of network localization and navigation in challenging environments with sub-meter accuracy and minimal infrastructure requirements.
We will discuss the limitations of traditional positioning, and move on to the key enablers for high-accuracy location awareness. Topics covered will include: fundamental bounds, cooperative algorithms for 5G and B5G standardized scenarios, and network experimentation. Fundamental bounds serve as performance benchmarks, and as a tool for network design. Cooperative algorithms are a way to achieve dramatic performance improvements compared to traditional non-cooperative positioning. To harness these benefits, system designers must consider realistic operational settings; thus, we present the performance of B5G localization in 3GPP-compliant settings. We will also present LoT enablers, including reconfigurable intelligent surfaces, which promise a dramatic gain in localization accuracy and system robustness in next-generation networks.
Presented by: Huck Yang, Pin-Yu Chen, Hung-yi Lee, Kai-Wei Chang, Cheng-Han Chiang
- Introduction and Motivation for Studying Parameter-Efficient Learning
To be presented by Dr. Huck Yang
  - Background: Large-scale Pre-trained and Foundation Models
  - Definition and Theory of Parameter-Efficient Learning
  - Basics of Pre-trained Model Representation Error Analysis
  - Editing Models with Task Arithmetic
  - Advanced Settings of Task Vectors
  - Multimodal Weights Merging
    - BERT + HuBERT for ASR
    - ViT + AST for Acoustic Modeling
  - In-Context Learning
  - Frozen Model Adaptation through Long Context Windows
- New Approaches to Neural Model Reprogramming
To be presented by Dr. Pin-Yu Chen, IBM Research AI
  - Reprogramming for Medical Images and DNA with 1B+ LLMs (ICML 23)
- Prompting Large Language Models
To be presented by Cheng-Han Chiang and Prof. Hung-yi Lee
  - Connection between Prompting and Parameter-Efficient Learning
  - Prompting Large Language Models for Reasoning
    - ReAct, Plan-and-Solve, Tree-of-Thought Prompting
    - Faithfulness and Robustness of LLM Reasoning
  - Using LLMs for Tool Use
  - Automatic Evaluation Using Large Language Models by Prompting
    - LLM Evaluation and G-Eval
- Parameter-Efficient Learning for Speech Processing
To be presented by Kai-Wei Chang and Prof. Hung-yi Lee
  - Adapting Text Large Language Models for Speech Processing
    - Adapting Text LLMs (e.g., LLaMA) for Spoken Language Modeling
  - Prompting and Instruction Tuning on Speech Pre-trained Models
    - Semantic and Acoustic Tokens for Speech Language Models
    - Prompting and Instruction Tuning for Various Speech Processing Tasks
- Conclusion and Open Questions
To be presented by Prof. Hung-yi Lee
  - Lessons Learned: A Signal Processor Wandering in the Land of Large-scale Models
  - Available Resources and Code for Research in Parameter-Efficient Learning
Presented by: Nir Shlezinger, Sangwoo Park, Tomer Raviv, and Osvaldo Simeone
Wireless communication technologies are subject to escalating demands on connectivity, latency, and throughput. To facilitate meeting these performance requirements, emerging technologies such as mmWave and THz communication, holographic MIMO, spectrum sharing, and reconfigurable intelligent surfaces (RISs) are currently being investigated. While these technologies may support the desired performance levels, they also introduce substantial design and operating complexity. For instance, holographic MIMO hardware is likely to introduce nonlinearities on transmission and reception; the presence of RISs complicates channel estimation; and classical communication models may no longer apply in novel settings such as the mmWave and THz spectrum, due to violations of far-field assumptions and lossy propagation. These considerations notably affect transceiver design.
Traditional transceiver processing design is model-based, relying on simplified channel models, which may no longer be adequate to meet the requirements of next-generation wireless systems. The rise of deep learning as an enabling technology for AI has revolutionized various disciplines, including computer vision and natural language processing (NLP). The ability of deep neural networks (DNNs) to learn mappings from data has spurred growing interest in their use for transceiver design. DNN-aided transceivers have the ability to succeed where classical algorithms may fail. They can learn a detection function in scenarios with no well-established physics-based mathematical model, a situation known as model deficit, or when the model is too complex to give rise to tractable and efficient model-based algorithms, a situation known as algorithm deficit.
Despite their promise, several core challenges arise from the fundamental differences between wireless communications and traditional AI domains such as computer vision and NLP. The first challenge is attributed to the nature of the devices employed in communication systems. Wireless communication transceivers are highly constrained in terms of their compute and power resources, while deep learning inherently relies on the availability of powerful devices, e.g., high-performance computing servers. A second challenge stems from the nature of the wireless communication domain. Communication channels are dynamic, implying that the task, dictated by the data distribution, changes over time. This makes the standard pipeline of data collection, annotation, and training highly challenging. Specifically, DNNs rely on (typically labeled) data sets to learn from underlying unknown, but stationary, data distributions. This is not the case for wireless transceivers, whose processing task depends on the time-varying channel, restricting the size of the training data set representing the task. These challenges imply that successfully applying AI to transceiver design requires deviating from conventional deep learning approaches. To this end, there is a need to develop communication-oriented AI techniques that are not only high-performing for a given channel, but also lightweight, interpretable, flexible, and adaptive.
In the proposed tutorial we shall present, in a pedagogic fashion, the leading approaches for designing practical and effective deep transceivers that address the specific limitations imposed by the use of data- and resource-constrained wireless devices and by the dynamic nature of the communication channel. We advocate that AI-based wireless transceiver design requires revisiting the three main pillars of AI, namely: (i) the architecture of AI models; (ii) the data used to train AI models; and (iii) the training algorithm that optimizes the AI model for generalization, i.e., to maximize performance outside the training set (either on the same distribution or on a completely new one). For each of these AI pillars, we survey candidate approaches from the recent literature. We first discuss how to design lightweight trainable architectures via model-based deep learning. This methodology hinges on the principled incorporation of model-based processing, obtained from domain knowledge on optimized communication algorithms, within AI architectures. Next, we investigate how labeled data can be obtained without impairing spectral efficiency, i.e., without increasing the pilot overhead. We show how transceivers can generate labeled data by self-supervision, aided by existing communication algorithms, and how they may further enrich data sets via data augmentation techniques tailored to such data. We then cover training algorithms designed to meet the requirements of efficiency, reliability, and robust adaptation of wireless communication systems, avoiding overfitting to limited training data while limiting training time. These methods include communication-specific meta-learning as well as generalized Bayesian learning and modular learning.
Tutorial outline:
- Introduction and motivation
  - Dramatic success of deep learning
  - Gains of deep learning for wireless communications and sensing
    - Overcoming model deficiency
    - Overcoming algorithm deficiency
    - Applications
  - The fundamental differences between wireless technologies and conventional AI domains and their associated challenges
    - Nature of the devices
    - Nature of the domain
  - The need for AI that is lightweight, flexible, adaptive, and interpretable
  - Tutorial goal and outline
- Deep-learning-aided systems in dynamic environments
  - System model and main running example of deep-learning-aided receivers
  - Overview of existing approaches for handling dynamic tasks
    - Joint learning
    - Estimated environment parameters as input
    - Online learning
  - Pros and cons of each approach
  - When and why should AI-aided systems be trained on device?
  - Paradigm shift in AI needed to enable such operation:
    - Go beyond the design of parametric models
    - Holistic treatment of machine learning algorithms:
      - Architecture
      - Data
      - Training
- Architecture:
  - The family of mappings one can learn
  - From black-box highly parameterized architectures to lightweight interpretable machine learning systems via domain knowledge
  - Model-based deep learning methodologies
    - Deep unfolding and its forms:
      - Learned hyperparameters
      - Learned objective
      - DNN conversion
    - DNN-aided inference
  - Issues for future research
- Data:
  - Data for learning the task under the current environment
  - From few pilots to large labeled data sets
  - Self-supervision:
    - Codeword level
    - Decision level
  - Active learning
  - Data augmentation
  - Complete data enrichment pipeline
  - Issues for future research
- Training:
  - Tuning a parametric architecture from data
  - Training rapidly with limited data, possibly exploiting model-based architectures
  - Deciding when to train using concept drift
  - Meta-learning:
    - Gradient-based meta-learning
    - Hypernetwork-based meta-learning
  - Bayesian learning:
    - End-to-end Bayesian learning
    - Model-based Bayesian learning
    - Continual Bayesian learning
  - Modular learning for model-based deep architectures
  - Issues for future research
- Summary:
  - Additional aspects of federated learning not discussed in this tutorial
  - Hardware-aware and power-aware AI
  - Collaborative flexible AI for mobile wireless devices
  - Conclusions
Presented by: Sijia Liu, Zhangyang Wang, Tianlong Chen, Pin-Yu Chen, Mingyi Hong, Wotao Yin
Part 1: Introduction to ZOML (Zeroth-Order Machine Learning)
- Preliminary Concepts and Mathematical Foundations
  - Basic mathematical tools and formulations
- Why ZO over FO: Limitations of Traditional Gradient-Based Optimization
  - Emerging challenges and drawbacks of relying solely on FO gradient-based methods
- Survey of Practical Applications and Use Cases
  - Overview of applications that benefit from ZOML
Part 2: Foundations of ZOML
- Algorithmic Landscape of ZOML
  - A rundown of primary algorithms and methods in ZOML
- Convergence and Query Complexity
  - Understanding the provable properties of ZOML
- Scaling ZOML: Practical Techniques and Implementations
  - Tips and tricks for ZOML algorithms at scale
- Extending ZOML across Learning Paradigms
  - How does ZOML adapt to various ML paradigms?
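As a minimal sketch of the core ZO primitive covered in Part 2: a two-point gradient estimator queries only function values f(x ± μu) along random directions, and can drive ordinary gradient descent when first-order gradients are unavailable (the step size, smoothing radius, and query budget below are illustrative):

```python
import random

def zo_gradient(f, x, mu=1e-3, q=20):
    """Two-point zeroth-order gradient estimate averaged over q random
    Gaussian directions; uses only evaluations of f, never its gradient."""
    d = len(x)
    g = [0.0] * d
    for _ in range(q):
        u = [random.gauss(0.0, 1.0) for _ in range(d)]
        f_plus = f([xi + mu * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - mu * ui for xi, ui in zip(x, u)])
        coeff = (f_plus - f_minus) / (2.0 * mu * q)
        g = [gi + coeff * ui for gi, ui in zip(g, u)]
    return g

# ZO gradient descent on a simple quadratic with minimum at (1, 1).
random.seed(0)
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2
x = [3.0, -2.0]
for _ in range(200):
    g = zo_gradient(f, x)
    x = [xi - 0.05 * gi for xi, gi in zip(x, g)]
```

Each iteration costs 2q function queries, which is exactly the query-complexity trade-off analyzed in the convergence part of the tutorial.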
Break
Part 3: Applications of ZOML
- Prompt Learning in FMs
- Fine-tuning and Personalization in FMs via ZOML
- ZOML in the Context of AI Robustness, Efficiency, and Automation
Part 4: Demo Expo
- Introducing the ZOML Toolbox
  - A guided tour of our specialized toolbox for ZOML
- Benchmarking with ZO Algorithms
  - An introduction to ZO performance metrics and benchmark applications
- Practical Demos: Utilizing ZOT for Parameter-Efficient Fine-Tuning (PEFT) and Adversarial Defense
  - Live demonstrations showcasing the utility of ZOML
Part 5: Conclusion and Q&A
- Wrap-Up: Key Takeaways from the Tutorial
- Future Horizons: SP and ML Opportunities and Challenges
- Resources for Deeper Exploration
  - A curated list of essential ZOML resources
Presented by: Ehsan Variani, Georg Heigold, Ke Wu, Michael Riley
The first part of this talk focuses on the mathematical modeling of existing neural ASR criteria. We introduce a modular framework that can explain all the existing criteria, such as Cross Entropy (CE), Connectionist Temporal Classification (CTC), the Recurrent Neural Network Transducer (RNN-T), the Hybrid Autoregressive Transducer (HAT), and Listen, Attend and Spell (LAS). We also introduce the LAttice-based Speech Transducer library (LAST), which provides efficient implementations of these criteria and allows the user to mix and match different components to create new training criteria. A simple Colab notebook is presented to engage the audience in using LAST to implement a simple ASR model on a digit recognition task.
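As one concrete instance of the criteria in this framework, the CTC objective marginalizes over all frame-level alignments via a forward (dynamic-programming) recursion over the blank-extended label sequence. A minimal pure-Python sketch (for exposition only; this is not the LAST implementation, which is far more efficient):

```python
import math

NEG_INF = -1e30

def logsumexp(xs):
    m = max(xs)
    if m <= NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_loss(log_probs, target, blank=0):
    """CTC negative log-likelihood. log_probs is a T x V table of per-frame
    label log-probabilities; alignments are summed by the forward recursion."""
    ext = [blank]                       # target interleaved with blanks
    for lab in target:
        ext += [lab, blank]
    S = len(ext)
    alpha = [NEG_INF] * S               # forward scores after frame 0
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]                              # stay
            if s > 0:
                cands.append(alpha[s - 1])                  # advance one
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])                  # skip a blank
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new
    ends = [alpha[S - 1]] + ([alpha[S - 2]] if S > 1 else [])
    return -logsumexp(ends)

# Two frames, vocabulary {blank, 'a'}; all mass on 'a', so target 'a' is certain.
log_probs = [[-1e9, 0.0], [-1e9, 0.0]]
loss = ctc_loss(log_probs, target=[1])
```

Note the local (per-frame) normalization of log_probs; this is the assumption whose role in label bias the second part of the talk examines.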
The second half of the talk focuses on some practical problems in ASR modeling and some principled solutions. The problems are:
- Language model integration: this section focuses on principled ways of adding language models within the noisy-channel formulation of ASR. We introduce ways to estimate the internal language models of different ASR models, and approaches to integrate external language models during first-pass decoding or second-pass rescoring.
- Streaming ASR: we explain the main theoretical reason why streaming ASR models perform much worse than their non-streaming counterparts and present two solutions. The main focus will be on the problem of label bias, and on how the local normalization assumption in existing ASR training criteria has amplified it. Finally, we also present a way to measure modeling latency and how to optimize models in this respect.
- Time alignment: how to improve the time alignment of ASR models is the main question this section tries to answer, along with how the solution can lead to simpler ASR decoding methods.
- Speech representation: what features can be extracted from an ASR system for downstream tasks while preserving the following properties: A) backpropagation: the downstream model can fine-tune the upstream ASR model if paired data exist; B) robustness: changing the upstream ASR system does not require retraining the downstream model. We will present several speech representations with these properties.
- Semi-supervised training: how to extend supervised training criteria to take advantage of unlabeled speech and text data. We show a detailed formulation of the semi-supervised criteria and present several experimental results.
For all the problems above, the audience will have a chance to use the LAST library and the Colab notebook to evaluate the effectiveness of the solutions themselves during the tutorial.
Presented by: Xing Liu, Tarig Ballal, Jose A. LopezSalcedo, Gonzalo SecoGranados, Tareq AlNaffouri
- Background Material
  - Introduction to PNT
  - Introduction to satellite constellations (LEO, MEO, and GEO)
  - Legacy PNT using GNSS
  - GNSS PNT shortcomings
- LEO Constellation Basics
A closer look at LEO constellations and their main characteristics: orbits, geometry, velocity, coverage, etc. We will focus on signaling aspects such as modulation schemes, coding techniques, channel characteristics, and receiver design, and will contrast LEO attributes with those of GNSS, highlighting potential strengths and weaknesses.
 PNT using LEO constellations
In this section, we will cover the main techniques for PNT based on LEO satellite signals. We will distinguish between the following groups of methods:
 PNT based on signals of opportunity (SoP) from LEO satellites designed for other (non-PNT) purposes.
 PNT based on dedicated LEO satellite signals.
 PNT based on 5G non-terrestrial networks (NTNs).
We will discuss the pros and cons of each category. For each category, we will discuss the following topics:
 The signal models.
 The main signal parameters (known as observations) that are useful for navigation.
 The methods that can be applied to acquire the signal parameters.
We will conclude this section by presenting a general model for each observation type. The latter observation models will be used in the following section to develop specific techniques and algorithms for LEO-based PNT.
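As a minimal numeric sketch of one such observation type, the Doppler observation can be modeled as the satellite's closing speed projected on the line of sight, scaled by the carrier frequency. The code below is our own illustrative model with made-up geometry (a static receiver and toy coordinates), not the tutorial's formulation.

```python
import numpy as np

C = 299_792_458.0  # speed of light [m/s]

def doppler_shift(sat_pos, sat_vel, rx_pos, carrier_hz):
    """Doppler observation model for a static receiver: positive shift
    when the satellite is closing on the receiver along the line of sight."""
    u = rx_pos - sat_pos
    u = u / np.linalg.norm(u)        # unit vector, satellite -> receiver
    closing_speed = sat_vel @ u      # > 0 when the range is decreasing
    return closing_speed * carrier_hz / C

# Illustrative LEO pass: ~7.5 km/s along-track velocity, 1.6 GHz carrier.
sat_pos = np.array([0.0, 0.0, 7_000_000.0])
sat_vel = np.array([7500.0, 0.0, 0.0])
rx_pos = np.array([500_000.0, 0.0, 6_371_000.0])
fd = doppler_shift(sat_pos, sat_vel, rx_pos, 1.6e9)
```

The resulting shift is on the order of tens of kilohertz, well beyond typical GNSS Doppler, which is part of what makes Doppler such an informative observation for LEO PNT.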
 LEO-based PNT Techniques
Here we will provide detailed descriptions of algorithms that can be used, or that have been proposed, for LEO PNT. We will establish a connection with GNSS-based techniques. We will cover the following topics:
 Doppler-based techniques.
 Pseudorange-based techniques.
 Carrier-phase-based techniques.
 A variety of PNT filtering techniques.
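To illustrate the flavor of the pseudorange-based techniques (our own generic sketch, not an algorithm from the tutorial), a position fix can be computed by Gauss-Newton least squares over position and receiver clock bias; the geometry below is a toy scenario in plain meters, not real orbital coordinates.

```python
import numpy as np

def pseudorange_fix(sat_pos, pseudoranges, iters=10):
    """Iterative least-squares fix: linearize the pseudorange equations
    around the current estimate and solve for [x, y, z, c*dt]."""
    x = np.zeros(4)                                    # position + clock bias
    for _ in range(iters):
        d = np.linalg.norm(sat_pos - x[:3], axis=1)    # geometric ranges
        residual = pseudoranges - (d + x[3])
        # Geometry matrix: minus unit vectors to satellites, plus clock column.
        G = np.hstack([-(sat_pos - x[:3]) / d[:, None], np.ones((len(d), 1))])
        dx, *_ = np.linalg.lstsq(G, residual, rcond=None)
        x = x + dx
    return x

# Toy scenario with a known truth used to synthesize exact pseudoranges.
truth = np.array([1000.0, 2000.0, 500.0])
clock_bias = 30.0                                      # meters (c * dt)
sats = np.array([[20_000.0, 0.0, 15_000.0],
                 [0.0, 20_000.0, 5_000.0],
                 [-15_000.0, -10_000.0, 25_000.0],
                 [10_000.0, -20_000.0, 10_000.0]])
rho = np.linalg.norm(sats - truth, axis=1) + clock_bias
est = pseudorange_fix(sats, rho)
```

Carrier-phase techniques follow the same least-squares structure with an additional integer ambiguity term per satellite, which is why they achieve higher precision at the cost of harder estimation.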
 Simulations and demonstrations
In this section of the tutorial, we will present results from extensive simulations to highlight various aspects of LEO PNT. We will make our simulation codes freely accessible in the public domain.
 Opportunities and Challenges
The final part of the tutorial will highlight the most prominent research directions and challenges that might be of interest to the community.
 Summary
Tutorial Summary highlighting the takeaway messages.
 References
We will provide an extensive list of references.
Presented by: Byung-Jun Yoon, Youngjoon Hong
Generative AI models have emerged as a groundbreaking paradigm that can generate, modify, and interpret complex data patterns, ranging from images and sounds to structured datasets. In the realm of signal processing, these models have the potential to revolutionize how we understand, process, and leverage signals. Their capabilities span from the generation of synthetic datasets to the enhancement and restoration of signals, often achieving results that traditional methods cannot match. Thus, understanding and harnessing the power of generative AI is not just an academic endeavor; it is becoming an imperative for professionals and researchers who aim to stay at the forefront of the signal processing domain.
The last few years have witnessed explosive growth in the development and adoption of generative AI models. With the introduction of architectures like GANs, VAEs, and newer transformer-based models, the AI research community is regularly setting new performance benchmarks. The signal processing community has also begun to exploit these advancements. The year 2024 presents a crucial juncture where the convergence of AI and signal processing is no longer a future possibility but an ongoing reality. Thus, a tutorial on this topic is not just timely but urgently needed.
While there have been numerous tutorials and courses on generative AI in the context of computer vision or natural language processing, its application in the pure signal and data processing domain is less explored. This tutorial is unique in its comprehensive approach, combining theory, practical methods, and a range of applications specifically tailored for the signal processing community. Attendees will not only learn the core concepts but will also gain hands-on experience with the theory and application of generative AI techniques.
Generative AI provides a fresh lens through which to approach longstanding challenges in signal processing. This tutorial will introduce:
 New Ideas: Concepts like latent space exploration, variational inference, and diffusion models, which can provide new insights into signal representation and transformation.
 New Topics: Areas where generative AI has found success, such as data augmentation, signal enhancement, and anomaly detection in signals.
 New Tools: Practical demonstrations and hands-on sessions using state-of-the-art software libraries and tools tailored for generative AI in signal processing.
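As one concrete taste of the diffusion-model idea listed above (our own numpy sketch with an illustrative noise schedule, not tied to any particular library), the DDPM-style forward process admits a closed-form marginal: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

```python
import numpy as np

def forward_diffusion(x0, t, betas, seed=0):
    """Sample x_t ~ q(x_t | x_0) for a DDPM-style forward (noising)
    process using the closed-form marginal, instead of iterating
    through t individual noising steps."""
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    eps = np.random.default_rng(seed).normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, alpha_bar

# Linear beta schedule, a common illustrative choice.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
x0 = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))  # a clean toy "signal"

x_early, abar_early = forward_diffusion(x0, 10, betas)    # barely noised
x_late, abar_late = forward_diffusion(x0, T - 1, betas)   # nearly pure noise
```

A generative model is then trained to invert this corruption step by step, which is what makes diffusion models natural tools for the signal enhancement and restoration tasks mentioned above.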
In conclusion, by bridging the gap between the advancements in generative AI and the vast potential applications in signal processing, this tutorial promises to equip attendees with knowledge and tools that can redefine the boundaries of what's possible in the field.