Presented by: Reinhard Heckel, Mahdi Soltanolkotabi
There is a long history of algorithmic development for solving inverse problems arising in sensing and imaging systems and beyond. Examples include medical imaging such as accelerated MRI and CT, super-resolution imaging, and denoising. Until recently, algorithms for solving such inverse problems were based on static signal models derived from physics or intuition, such as wavelets or sparse representations. Today, however, the best performing approaches are based on deep learning. There are two components to this success: Deep learning models, typically designed by taking physical models into account, and the data that is used to train them.
In this tutorial, we will discuss the best performing deep learning models for imaging and the data and methods for training them. In particular, we will argue that the major challenge impeding further improvements in this area is a lack of robustness and reliability. For example, in practice these methods can be brittle to (1) distribution shifts between training and test data and (2) adversarial perturbations, and (3) they are prone to hallucinating and/or missing diagnostically significant details.
Therefore, we focus in this tutorial on how to build reliable and robust deep-learning-based reconstruction methods. We will start by discussing neural networks trained end-to-end for signal reconstruction; this includes modern un-rolled architectures and transformer-based architectures, as well as supervised and self-supervised training. We will then discuss how scaling the dataset size and model size can improve performance.
Then, we focus on robustness aspects of deep networks for imaging on two fronts. First, we will discuss robustness to worst-case perturbations. Second, we will argue that in practice a major performance-limiting factor of deep-learning-based methods is a discrepancy between the training data and the data the models are applied to (e.g., a method is trained on data from one hospital and applied to data from another). This is called a distribution shift. We will then discuss how algorithmic interventions and more diverse data can improve performance under distribution shifts. Finally, we will discuss methods that improve the reliability of these algorithms to avoid hallucinations or missing diagnostically significant detail.
Outline:
- Neural Net architectures for imaging: Convolutional networks, transformers, and un-rolled networks.
- Training neural networks for imaging: Empirical risk minimization, self-supervised training, and scaling laws to study the dependence on the amount of training data.
- Designing robust models for imaging: Worst-case robustness, distribution shifts, and avoiding hallucinations.
We will provide lecture notes (parts of a course taught at TUM), corresponding exercises (also from the TUM course), and slides that we will prepare for this tutorial.
Reinhard Heckel is a Rudolf Moessbauer assistant professor in the Department of Computer Engineering at the Technical University of Munich, and an adjunct assistant professor at Rice University, where he was an assistant professor in Electrical and Computer Engineering from 2017 to 2019. Before that, he spent one and a half years as a postdoctoral researcher in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley, and a year in the Cognitive Computing & Computational Sciences Department at IBM Research Zurich. He completed his PhD in electrical engineering in 2014 at ETH Zurich and was a visiting PhD student at the Statistics Department at Stanford University. Reinhard works at the intersection of machine learning and signal/information processing, with a current focus on deep networks for solving inverse problems, learning from few and noisy samples, and DNA data storage.
Mahdi Soltanolkotabi is the director of the center on AI Foundations for the Sciences (AIF4S) at the University of Southern California. He is also an associate professor in the Departments of Electrical and Computer Engineering, Computer Science, and Industrial and Systems Engineering, where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. Mahdi is the recipient of the Information Theory Society Best Paper Award, a Packard Fellowship in Science and Engineering, a Sloan Research Fellowship, an NSF CAREER award, an Air Force Office of Scientific Research Young Investigator award (AFOSR-YIP), the Viterbi School of Engineering junior faculty research award, and faculty awards from Google and Amazon. His research focuses on developing the mathematical foundations of modern data science by characterizing the behavior and pitfalls of contemporary nonconvex learning and optimization algorithms, with applications in deep learning, large-scale distributed training, federated learning, computational imaging, and AI for scientific applications.
Presented by: Nicola Conci, Niccolò Bisagno
The tutorial will take a holistic view of the ongoing research, the relevant issues, and the potential applications of synthetic data for multimedia data processing, as a standalone resource or in combination with real data. In particular, attention will be focused on the domain of images and videos, where the lack of representative data for specific problem categories has opened up the possibility of relying on machine-generated content.
Image and video processing has seen rapid growth in the last decade, with remarkable improvements made possible by the availability of ever-increasing computing power as well as deep learning-based frameworks that now allow human-level performance and beyond in many applications, including detection, classification, and segmentation, to name a few.
However, it is to be noted that the development of novel algorithms and solutions is strictly bound to the availability of a substantial amount of data, which must be representative of the task that needs to be addressed.
In this respect, the literature has shown a rapid proliferation of datasets, tackling a multitude of problems, from the simplest to the most complex ones. Some of them are widely adopted and are currently recognized as the reference benchmarks against which all newly proposed methods need to compete. As far as images are concerned, the most famous ones are (in order of complexity) MNIST, CIFAR-10, CIFAR-100, and ImageNet.
When dealing with videos, instead, action/event recognition is among the earliest tasks addressed by the research community, and the most widespread and well-known datasets include the Weizmann dataset (for simple action recognition), UCF-101, UCF-Sports, and EgoKitchen, to name a few. In the domain of surveillance, the CAVIAR and PETS datasets have been widely adopted and, more recently, the MoT Challenge has attracted the attention of many researchers because of the variety and diversity of contexts and situations in which detection and tracking solutions can be validated.
Still, there is an ever-growing demand for data, to which researchers respond with larger and larger datasets, at a huge cost in terms of acquisition, storage, and annotation of images and clips. Moreover, when dealing with complex problems, it is common to validate the developed algorithms across different datasets, which means facing inconsistencies in annotations (e.g., segmentation maps vs. bounding boxes) and the use of different standards (e.g., the number of joints of human skeletons in OpenPose and SMPL).
The use of synthetically generated data can overcome such limitations, as the generation engine can be designed to fulfill an arbitrary number of requirements, all at the same time. For example, the same bounding box can hold for multiple viewpoints of the same object/scene; the 3D position of the object is always known, as well as its volume, appearance, and motion features. These considerations have motivated the adoption of computer-generated content to satisfy mainly two requirements: (a) visual fidelity and (b) behavioral fidelity.
To this end, researchers have investigated efficient solutions to cope with these problems, including fine tuning and domain adaptation.
The tutorial will cover a number of topics dealing with the current use of datasets in a topic-wise fashion, together with the corresponding methodologies in the state of the art. A tentative list of the topics is reported hereafter:
- image-based datasets and complexity
- video-based datasets and complexity
- limitations and need for adaptation
- synthetic datasets, pros and cons
- complementing real data with synthetic datasets
- fine tuning, domain adaptation, unsupervised learning
The focus of the tutorial will be technical: we aim to give participants a broad view of research and important topics for developing efficient algorithms and solutions that are capable of combining real and synthetic data to solve complex problems.
The attendees will be provided with the presentation slides, together with a comprehensive list of papers and reports of interest, aimed, on the one hand, at acquainting attendees with the topic and, on the other, at promoting research in this interesting interdisciplinary area.
Nicola Conci is Associate Professor at the Department of Information Engineering and Computer Science, University of Trento, where he teaches Computer Vision and Signal Processing. He received his Ph.D. in 2007 from the same University. In 2007 he was a visiting student at the Image Processing Lab at the University of California, Santa Barbara. In 2008 and 2009 he was a post-doc researcher in the Multimedia and Vision research group at Queen Mary University of London.
Prof. Conci has authored and co-authored more than 130 papers in peer-reviewed journals and conferences. His current research interests are related to video analysis and computer vision applications for behavioral understanding and monitoring. He coordinates a team of 6 Ph.D. students, 1 post-doc, and 2 junior researchers.
At the University of Trento he coordinates the M.Sc. Degree in Information and Communications Engineering, is a member of the executive committee of the IECS Doctoral School, and is the department's delegate for the research activities related to the Winter Olympic Games Milano-Cortina 2026.
He has served as Co-chair of several conferences, including the 1st and 2nd International Workshop on Computer Vision for Winter Sports, hosted at IEEE WACV 2022 and 2023, General Co-Chair of the International Conference on Distributed Smart Cameras 2019, General Co-Chair of the Symposium Signal Processing for Understanding Crowd Dynamics, held at IEEE AVSS 2017, and Technical Program Co-Chair of the Symposium Signal Processing for Understanding Crowd Dynamics, IEEE GlobalSIP 2016.
Niccolò Bisagno received his Ph.D. in 2020 from the ICT International Doctoral School of the University of Trento, Italy, for the thesis “On simulating and predicting pedestrian trajectories in a crowd”. In 2019, he was a visiting Ph.D. student at the University of Central Florida, Orlando, USA. In 2018, he was a visiting Ph.D. student at the Alpen-Adria-Universität, Klagenfurt, Austria.
His research focuses on crowd analysis, in particular pedestrian trajectory prediction and crowd simulation in virtual environments. He is also interested in machine learning and computer vision, with a special focus on biologically-inspired deep learning architectures and sports analysis applications.
Presented by: Ali C. Begen
HTTP adaptive streaming is a complex technology with dynamics that need to be studied thoroughly. The experience from deployments over the last 10+ years suggests that streaming clients typically operate in an unfettered greedy mode and are not necessarily designed to behave well in environments where other clients exist or network conditions can change dramatically. This largely stems from the fact that clients make only indirect observations at the application (HTTP) layer (and, to a limited extent if at all, at the transport layer).
Typically, there are three primary camps when it comes to scaling and improving streaming systems: (i) servers control clients' behavior/actions and the network uses appropriate QoS, (ii) servers and clients cooperate with each other and/or the network, or (iii) clients stay in control and no cooperation with the servers or network is needed as long as there is enough capacity in the network (said differently, use dumb servers and network and throw more bandwidth at the problem). Intuitively, using hints should improve streaming since it helps the clients and servers take more appropriate actions. The improvement could be in terms of better viewer experience and supporting more viewers for the given amount of network resources, or the added capability to explicitly support controlled unfairness (as opposed to bitrate fairness) based on features such as content type, viewer profile and display characteristics.
In this tutorial, we will examine the progress made in this area over the last several years, primarily focusing on MPEG's Server and Network Assisted DASH (SAND) and CTA's Common Media Client/Server Data standards. We will also describe possible application scenarios and present an open-source sample implementation for the attendees to explore this topic further in their own practical environments.
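To make the idea of client-side hints concrete, here is a minimal Python sketch of how a player might attach Common Media Client Data to a segment request as a `CMCD` query argument. It is an illustration under stated assumptions, not the tutorial's sample implementation: the helper name `build_cmcd_query` is hypothetical, the keys used (`br` for encoded bitrate, `bl` for buffer length, `bs` for buffer starvation, `sid` for session ID) are taken from the CMCD specification as we understand it, and the serialisation is simplified (for instance, token-typed values are not handled).

```python
from urllib.parse import quote


def build_cmcd_query(fields: dict) -> str:
    """Serialise CMCD-style key/value pairs into a `CMCD=` query argument.

    Simplified sketch: keys are sorted alphabetically, strings are quoted,
    a boolean True is sent as the bare key, and False is omitted.
    """
    parts = []
    for key in sorted(fields):
        value = fields[key]
        # bool must be checked before other types because bool is an int subclass.
        if isinstance(value, bool):
            if value:
                parts.append(key)
            continue
        if isinstance(value, str):
            parts.append(f'{key}="{value}"')
        else:
            parts.append(f"{key}={value}")
    # The comma-joined payload is URL-encoded as a whole.
    return "CMCD=" + quote(",".join(parts), safe="")


# Hypothetical values: 3200 kbps rendition, 21.3 s of buffer, a rebuffering event.
query = build_cmcd_query({"br": 3200, "bl": 21300, "bs": True, "sid": "abc"})
print(query)
```

A server or CDN that parses this argument could, for example, prioritise requests from clients reporting buffer starvation, which is one form of the controlled unfairness mentioned above.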
Upon attending this tutorial, the participants will have an overview and understanding of the following topics:
- Brief review of history of streaming, key problems and innovations
- Current standards, interoperability guidelines and deployment workflows
- Focus topics
  - End-to-end system modeling and analysis
  - Improvements in player algorithms
  - Low-latency and omnidirectional streaming extensions
  - Server-client collaboration
- Open problems and research directions
The slides will be distributed electronically to the participants.
Ali C. Begen is currently a computer science professor at Ozyegin University and a technical consultant in Comcast's Advanced Technology and Standards Group. Previously, he was a research and development engineer at Cisco. Begen received his PhD in electrical and computer engineering from Georgia Tech in 2006. To date, he has received several academic and industry awards (including an Emmy® Award for Technology and Engineering) and has been granted 30+ US patents. In 2020 and 2021, he was listed among the world's most influential scientists in the subfield of networking and telecommunications. More details are at https://ali.begen.net.
Presented by: Wei Gao
The technologies and applications of 3D point clouds have attracted much attention from both academia and industry. Point clouds can effectively model 3D scenes and objects with a high-precision representation of geometry and associated attributes, such as colors and reflectances, and they can improve both the immersive visual experience and machine vision analysis performance. As with big image and video data, the huge amount of point cloud data requires more efficient compression algorithms to obtain a desirable rate-distortion tradeoff. Deep learning-based end-to-end compression methods have been successfully used for image and video compression, and attempts have also been made at deep learning-based point cloud compression. Because density characteristics and application scenarios differ, different data structures and organization approaches, as well as different neural network architectures, have been devised to optimize for different utilities. Both human and machine perception can be effectively optimized in deep learning-based frameworks. Moreover, large-scale datasets are being constructed for point clouds in different application scenarios, and quality assessment methods are being comprehensively studied by designing subjective experiments and objective models. Additionally, deep learning-based enhancement and restoration methods for degraded point clouds have been extensively explored; these can effectively deal with samples that have compression artifacts as well as low-resolution, noisy and incomplete samples. Such quality improvements play a critical role in boosting the application utility of point clouds in the wild. In this tutorial, we will provide an overview of these technologies and the recent progress during the past few years.
We will also discuss the recent efforts in MPEG standardization for deep learning-based point cloud compression (AI-3DGC), the first open-source project for deep learning-based point cloud compression and processing, OpenPointCloud, which we established, and the advances in point cloud applications. This tutorial will introduce the basic knowledge of 3D point cloud technologies, including data acquisition and assessment, compression and processing algorithms, standardization progress, open source efforts, and diverse practical applications.
- Point Cloud Concept and Applications
- Point Cloud Data Acquisition
- Point Cloud Perception Assessment
  - Subjective Experiments
  - Objective Quality Assessment Methods
  - Saliency Detection Modeling
  - Machine Perception Modeling
- Learning-based Point Cloud Compression Technologies
- Related Standardization Activities
  - MPEG Standard (V-PCC and G-PCC)
  - AVS Point Cloud Compression Standard
  - Deep Learning-based Coding Standard
- Optimization Techniques for Non-learning-based Coding (V-PCC and G-PCC)
  - Low Complexity Optimization
  - Rate Control Optimization
  - Content-Adaptive Transform Coding Optimization
  - Divide-and-Conquer Entropy Coding Optimization
- Learning-based Point Cloud Coding
  - Pixel-based Coding
  - Voxel-based Coding
  - Octree-based Coding
  - Sparse Tensor-based Coding
  - Human Perception-oriented Coding
  - Machine Perception-oriented Coding
  - Related Standardization Activities
- Learning-based Point Cloud Enhancement
  - Upsampling Techniques
  - Completion Techniques
  - Compression Artifacts Removal Techniques
  - Denoising Techniques
  - Frame Interpolation Techniques
  - Enhancement Optimization for Human and Machine Perception
- Open Source Projects for Point Cloud Learning
  - Overview of Existing Point Cloud Open Source Projects
- Conclusion and Discussions for Future Research Directions
Dr. Wei GAO is currently an Assistant Professor at the School of Electronic and Computer Engineering, Peking University, China, and also the Director of Laboratory for Open Source Algorithms/Synergy Algorithms at Peng Cheng Laboratory, China. He received the Ph.D. degree in Computer Science from City University of Hong Kong, Hong Kong, in 2016. From 2012 to 2013, he was a Camera ISP Engineer at OmniVision Technologies, Shanghai, China, where several camera VLSI chips were successfully taped out. In 2016, he was a Visiting Scholar at University of California, Los Angeles, CA, USA. From 2017 to 2019, he worked at City University of Hong Kong, Hong Kong, and Nanyang Technological University, Singapore. His research interests include Multimedia Coding and Processing, and the related topics of Deep Learning and Artificial Intelligence. He has published over 80 journal and conference papers, and applied for over 60 patents. He has led the establishment of several high-impact open-source projects, including OpenPointCloud (First Open Source Project for Point Cloud Coding and Processing). In 2021, he won the IEEE Multimedia Rising Star Runner Up Award for Outstanding Early-stage Career Achievements in the Area of 3D Immersive Media Research. He serves as an Associate Editor for Signal Processing (Elsevier), Neural Processing Letters (Springer), etc., and is an Elected Member of the Image, Video, and Multimedia Technical Committee, Asia-Pacific Signal and Information Processing Association. He has organized workshops and special sessions at IEEE ICME 2021, IEEE VCIP 2022 and ACM MM 2022 on the topics of visual experience assessment of interactive media, 3D point cloud compression and processing, etc. He is a Senior Member of IEEE.
Presented by: Habib Zaidi
This tutorial presents a complete and balanced review of the quantitative analysis of multimodality medical images using conventional and deep learning techniques, a subject of broad scope that is growing in importance for both clinical and research applications. The seminar begins with an introduction to various medical imaging modalities, followed by a detailed examination of the fundamental concepts of quantitative image analysis techniques as they are applied in diagnostic and therapeutic molecular imaging using conventional single-modality instrumentation and dual-modality imaging devices. It covers the entire range of molecular imaging, from basic principles to the various steps required for obtaining quantitatively accurate data from nuclear medicine images, including data collection methods and the algorithms used to correct the data for physical degrading factors, image reconstruction algorithms (analytic, iterative), and image processing and analysis techniques together with their clinical and research applications. The impact of physical degrading factors, including collimator response (in SPECT), attenuation of photons, contribution from photons scattered in the patient, and the partial volume effect, on the diagnostic quality and quantitative accuracy of medical images will be discussed. Computer implementations of dedicated software packages and their clinical and research applications are described and illustrated with some useful features and examples. Various subjective and objective quantitative assessments of image quality will be presented, including well-known figures of merit. A detailed description of analytical and Monte Carlo modelling of imaging systems, the functionality of widely used computer codes, and the development of anthropomorphic mathematical and voxel-based phantoms will be provided, together with their potential in qualitative and quantitative assessment of image quality.
Prospective future applications of quantitative molecular imaging are also addressed, especially its use prior to therapy for dose distribution modelling and optimisation of treatment volumes in external radiation therapy, and patient-specific 3D dosimetry in targeted therapy, towards the concept of image-guided radiation therapy.
Habib Zaidi is Chief physicist and head of the PET Instrumentation & Neuroimaging Laboratory at Geneva University Hospital and full Professor at the medical school of Geneva University. He is also a Professor of Medical Physics at the University of Groningen (Netherlands), Adjunct Professor of Medical Physics and Molecular Imaging at the University of Southern Denmark, Adjunct Professor of Medical Physics at Shahid Beheshti University, visiting Professor at Tehran University of Medical Sciences and Distinguished Adjunct Professor at King Abdulaziz University, KSA. He is actively involved in developing imaging solutions for cutting-edge interdisciplinary biomedical research and clinical diagnosis in addition to lecturing undergraduate and postgraduate courses on medical physics and medical imaging. His research is supported by the EEC, Swiss National Foundation, private foundations and industry (total 8.8 M US$) and centres on hybrid imaging instrumentation (PET/CT and PET/MRI), deep learning for various imaging applications, modelling medical imaging systems using the Monte Carlo method, development of computational anatomical models and radiation dosimetry, image reconstruction, quantification and kinetic modelling techniques in emission tomography as well as statistical image analysis, and more recently on novel design of dedicated PET and PET/MRI scanners.
He was guest editor for 13 special issues of peer-reviewed journals dedicated to Medical Image Segmentation, PET Instrumentation and Novel Quantitative Techniques, Computational Anthropomorphic Anatomical Models, Respiratory and Cardiac Gating in PET Imaging, Evolving medical imaging techniques, Trends in PET quantification (2 parts), PET/MRI Instrumentation and Quantitative Procedures and Clinical Applications, Nuclear Medicine Physics & Instrumentation, and Artificial Intelligence, and serves as founding Editor-in-Chief (scientific) of the British Journal of Radiology (BJR)|Open, Deputy Editor for Medical Physics, and member of the editorial board of the Journal of Nuclear Cardiology, Physica Medica, International Journal of Imaging Systems and Technology, Clinical and Translational Imaging, American Journal of Nuclear Medicine and Molecular Imaging, Brain Imaging Methods (Frontiers in Neuroscience & Neurology), Cancer Translational Medicine and the IAEA AMPLE Platform in Medical Physics. He has been elevated to the grade of fellow of the IEEE, AIMBE, AAPM, IOMP, AAIA and the BIR and was elected liaison representative of the International Organization for Medical Physics (IOMP) to the World Health Organization (WHO) and Chair of the Subcommittee on Part 1 Examination of the International Medical Physics Certification Board (IMPCB) and the Imaging Physics Committee of the AAPM, in addition to being affiliated to several international medical physics and nuclear medicine organisations. He is the developer of physics web-based instructional modules for the RSNA and Editor of IPEM’s Nuclear Medicine web-based instructional modules. He is involved in the evaluation of research proposals for European and international granting organisations and participates in the organisation of international symposia and conferences.
His academic accomplishments in the area of quantitative PET imaging have been well recognized by his peers and by the medical imaging community at large. He is a recipient of many awards and distinctions, among them the prestigious 2003 Bruce Hasegawa Young Investigator Medical Imaging Science Award given by the Nuclear Medical and Imaging Sciences Technical Committee of the IEEE, the 2004 Mark Tetalman Memorial Award given by the Society of Nuclear Medicine, the 2007 Young Scientist Prize in Biological Physics given by the International Union of Pure and Applied Physics (IUPAP), the prestigious (100’000$) 2010 Kuwait Prize of Applied sciences (known as the Middle Eastern Nobel Prize) given by the Kuwait Foundation for the Advancement of Sciences (KFAS) for "outstanding accomplishments in Biomedical technology", the 2013 John S. Laughlin Young Scientist Award given by the AAPM, the 2013 Vikram Sarabhai Oration Award given by the Society of Nuclear Medicine, India (SNMI), the 2015 Sir Godfrey Hounsfield Award given by the British Institute of Radiology (BIR), the 2017 IBA-Europhysics Prize given by the European Physical Society (EPS) and the 2019 Khwarizmi International Award given by the Iranian Research Organization for Science and Technology (IROST). Prof. Zaidi has given over 160 invited keynote lectures and talks at an international level and has authored over 830 publications (he is the senior or first author in a majority of these publications), including 365 peer-reviewed journal articles in prominent journals (ISI-h index=55|71 Web of Science™|Google scholar, >18’450 citations), 425 conference proceedings and 42 book chapters. He is the editor of four textbooks: Therapeutic Applications of Monte Carlo Calculations in Nuclear Medicine (2 Editions), Quantitative Analysis in Nuclear Medicine Imaging, Molecular Imaging of Small Animals, and Computational anatomical animal models.
Presented by: Xiaohong (Sharon) W. Gao
Colour plays a key role in detecting objects in an image. Yet it is also a leading contributor to the poor generalisation of deep learning systems when test images come from collections different from the training sets.
This tutorial covers the fusion of human colour vision models with a deep learning system to detect colour-sensitive objects, e.g., skin cancer, with the aim of developing more robust, performant and generalisable systems. This is in line with current advances in explainable AI, which draw attention to the development of human-centred systems, for example, Transformers with embedded attention models.
A change of colour, in many cases, is an indicator of something significant, especially in the medical domain. While a deep learning system can learn from coloured images, its ability to generalise suffers considerably when it is evaluated on images captured in a different viewing environment or collected from different research centres. These colour changes are usually non-linear and hence hardly captured by traditional data augmentation techniques that alter the RGB values of an image, which are linear and in some cases worsen the trained systems. For example, skin colour rarely turns green, and many such unrelated augmented samples can exacerbate data imbalance between the trained classes. On the other hand, a human being can still see the objects concerned regardless of viewing conditions. Hence, this talk introduces the fusion of the standardised human colour appearance model, CIECAM, with a deep learning system to perform early detection of cancers from endoscopic videos, which shows a significant improvement in object detection, generalisation, and data size requirement.
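As a rough illustration of why linear RGB augmentation can produce implausible samples, the Python sketch below applies a per-channel gain to a single pixel. The function name `scale_rgb`, the pixel value, and the gains are all hypothetical choices made for this example; it is not the CIECAM-based method discussed in the tutorial.

```python
def scale_rgb(pixel, gains):
    """Apply a linear per-channel gain to an 8-bit RGB pixel, clamped to 0..255."""
    return tuple(min(255, max(0, round(c * g))) for c, g in zip(pixel, gains))


# Hypothetical skin-like pixel: red is the dominant channel.
skin = (200, 150, 130)

# Hypothetical gains that a random colour-jitter augmentation might draw.
gains = (0.6, 1.2, 0.8)

augmented = scale_rgb(skin, gains)
print(augmented)
```

With these gains the green channel ends up larger than the red one, so the augmented pixel no longer looks like skin, which is the kind of unrelated sample the paragraph above warns about.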
This 3-hour talk will begin with an introduction to the basic theory of human colour vision, insights into colour adaptation, and the establishment of colour appearance models. Then, in the second part of this tutorial, state-of-the-art deep learning architectures and their applications in the computer vision field are presented, including classification, segmentation and benchmark datasets. Lastly, the fusion of human colour models into a deep learning system is elaborated and demonstrated on the detection of early-stage gastrointestinal cancers. The challenges addressed include artefacts, real-time processing and colour variations in videos collected from different centres, with the goal of a more robust, transparent and performant system.
Prof. Gao obtained her PhD on modelling colour appearance from Loughborough University, UK, in 1994. This model was later standardised by the International Commission on Illumination (CIE) as CIECAM97, CIECAM02 and recently CAM16 for predicting human perception of colours under different viewing conditions. She has been working on image processing for the last 25 years. Currently, her focus is on explainable AI with the fusion of human vision and/or cognitive models, in particular with application to the medical domain.
Presented by: Francesco Banterle, Alessandro Artusi
In this tutorial, we introduce how the High Dynamic Range (HDR) imaging field has evolved in this new era where machine learning approaches have become dominant. The main reason for this success is that the use of machine learning and deep learning has automatized many tedious tasks achieving high-quality results overperforming classic methods.
After an introduction to classic HDR imaging and its open problem, we will summarize the main approaches for merging of multiple exposures, single image reconstructions or inverse tone mapping, tone mapping, and display visualization.
Francesco Banterle is a Researcher at the Visual Computing Laboratory at ISTI-CNR, Italy. He received a Ph.D. in Engineering from Warwick University in 2009. During his Ph.D. he developed Inverse Tone Mapping that bridges the gap between Low Dynamic Range Imaging and High Dynamic Range (HDR) Imaging. He holds two patents, one sold to Dolby, and the other one of these was transferred to goHDR and then sold. His main research fields are high dynamic range (HDR) imaging (acquisition, tone mapping, HDR video compression, and HDR monitors), augmented reality on mobile, and image-based lighting. Recently, he has been working on applying Deep Learning to imaging and HDR imaging proposing the first Deep-Learning based metrics with and without reference. He is co-author of two books on imaging. The first one is "Advanced High Dynamic Range Imaging" (first edition 2011, second edition 2017), which is extensively used as a reference book in the field together with its MATLAB toolbox called the HDR Toolbox The second book "Image Content Retargeting", which shows how to re-target content to different displays in terms of colors, dynamic range, and spatial resolution.
Alessandro Artusi received a Ph.D. in Computer Science from the Vienna University of Technology in 2004. He is currently the Managing Director of the DeepCamera Lab at CYENS (Cyprus), which recently joined the Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), a not-for-profit standards organization established in Geneva, as a funding member. He is currently the Cyprus representative in the ISO/IEC/SC 29 imaging/video compression standardization committee and also represents Cyprus in two of its main working groups, WG 4 and WG 5. Previously, he was a committee member of IST37 of the British Standards Institution (BSI) and represented the UK in the JPEG and MPEG committees. For his work on the JPEG-XT standard, an image compression system for HDR content, he received the prestigious BSI Award. His research interests include visual perception, image/video processing, HDR technology, objective/subjective imaging/video evaluation, deep learning, computer vision and color science, with a particular focus on deploying the next generation of imaging/video pipelines. He is also the co-author of the "Advanced High Dynamic Range Imaging" book (first edition 2011, second edition 2017), which is a reference book in the HDR field, and author of the "Image Content Retargeting" book, which shows how to re-target content to different displays in terms of colors, dynamic range, and spatial resolution.
Presented by: Iole Moccagatta, Yan Ye
The state of video compression standards is strong and dynamic, and more compression is coming in their future. This tutorial will start with an introduction explaining why that is, followed by two parts. In the first part, we will review the two most recent video compression standards: AV1 and VVC. Before diving deep into AV1 and VVC tools and performance, we will present a high-level overview of block-based video coding concepts and terminology. Deployment and market adoption of these two video codec standards will be presented as well. In the second part, we will present the status of exploratory activities carried out in MPEG/ITU-T and in the Alliance for Open Media (AOM) that look into new technologies, including NN-based ones, to improve compression and enable new applications. We will close the tutorial with conclusions and takeaways. A list of references will be provided for those who are interested in diving deeper into the exploratory activities.
Dr. Iole Moccagatta is a Principal Engineer at Intel working on HW Multimedia IPs that are integrated on Intel platforms. Prior to Intel she held the position of Senior Video Architect at NVIDIA, and that of Science Director at IMEC, Belgium.
Dr. Moccagatta has been a very active member of MPEG, ITU-T, and JPEG, where she has represented US interests and companies and made many technical contributions. A number of those have been included in MPEG and JPEG standards. She is currently Co-chair of the MPEG/ITU-T Joint Video Experts Team (JVET) Ad-Hoc Group on H.266/VVC Conformance and Co-editor of the H.266/VVC Conformance Testing document.
Dr. Moccagatta has also been an active participant in the Alliance for Open Media (AOM) AV1 Codec WG, where she has co-authored two adopted proposals. She currently represents Intel on the AOM Board.
Dr. Moccagatta is also serving as IEEE Signal Processing Society (SPS) Regional Director-at-Large Regions 1-6, supporting and advising Chapters and their officers, providing input on how to serve and engage the SPS community in general, and the SPS industry members in particular, and using her professional network to attract new volunteers to serve in SPS subcommittees and task forces.
Dr. Moccagatta is the author or co-author of more than 30 publications, 2 book chapters, and more than 10 talks and tutorials in the field of image and video coding. She holds more than 10 patents in the same field. For more details, see Dr. Moccagatta's professional site at http://alfiole.users.sonic.net/iole/.
Dr. Moccagatta received a Diploma of Electronic Engineering from the University of Pavia, Italy, and a PhD from the Swiss Federal Institute of Technology in Lausanne, Switzerland.
Yan Ye is currently a Senior Director at Alibaba Group U.S. and the Head of the Video Technology Lab of Alibaba’s Damo Academy in Sunnyvale, California. Prior to Alibaba, she held various management and technical positions at InterDigital, Dolby Laboratories, and Qualcomm.
Throughout her career, Dr. Ye has been actively involved in developing international video coding and video streaming standards in ITU-T SG16/Q.6 Video Coding Experts Group (VCEG) and ISO/IEC JTC 1/SC 29 Moving Picture Experts Group (MPEG). She holds various chairperson positions in international and U.S. national standards development organizations: she is currently an Associate Rapporteur of the ITU-T SG16/Q.6 (since 2020), the Group Chair of the INCITS/MPEG task group (since 2020), and a focus group chair of the ISO/IEC SC 29/AG 5 MPEG Visual Quality Assessment (since 2020). She has made many technical contributions to well-known video coding and streaming standards such as H.264/AVC, H.265/HEVC, H.266/VVC, MPEG DASH and MPEG OMAF. She is an Editor of the VVC test model, the 360Lib algorithm description, and the scalable extensions and the screen content coding extensions of the HEVC standard. She is a prolific inventor with hundreds of granted U.S. patents and patent applications, many of which are highly cited by other researchers and inventors in the field of video coding. She is the co-author of more than 60 conference and journal papers.
Dr. Ye is currently a Distinguished Industrial Speaker of the IEEE Signal Processing Society (since 2022). She was a guest editor of the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) special section on “the joint Call for Proposals on video compression with capability beyond HEVC” in 2020 and the TCSVT special section on “Versatile Video Coding” in 2021. She has been a program committee member of the IEEE Data Compression Conference (DCC) since 2014, and has organized the special session on “advances in video coding” at DCC for more than five years. She is a conference subcommittee co-chair of the IEEE Visual Signal Processing and Communication Technical Committee (VSPC-TC) (since 2022) and was an area chair of “multimedia standards and related research” of the IEEE International Conference on Multimedia and Expo (ICME) in 2021, the publicity chair of the IEEE Visual Communications and Image Processing (VCIP) conference in 2021, an industry chair of the IEEE Picture Coding Symposium (PCS) in 2019, an organizing committee member of ICME in 2018, and a technical program committee member of PCS in 2013 and 2019.
Dr. Ye is devoted to multimedia standards development, hardware and software video codec implementations, as well as deep learning-based video research. Her research interests include advanced video coding, processing and streaming algorithms, real-time and immersive video communications, AR/VR/MR, and deep learning-based video coding, processing, and quality assessment algorithms.
Dr. Ye received her Ph.D. degree from the University of California, San Diego, in 2002, and her B.S. and M.S. degrees from the University of Science and Technology of China in 1994 and 1997, respectively.
Presented by: Mihai Datcu
At present, quantum computing and AI are key technologies of the digital era. The development of quantum resources and their transfer to practical applications are steadily accelerating. Quantum computing, quantum annealing, quantum circuits, and simulators for quantum computing are now easily accessible. The exploitation of quantum physics effects such as superposition and entanglement opens new, still unexplored perspectives. Even with very limited capacities, on the order of hundreds of qubits, these resources are attracting attention and stimulating the new area of quantum machine learning.
In this context, the presentation will focus on relevant aspects of quantum technologies for image understanding. To identify whether a quantum algorithm may bring any advantage over classical methods, it will first analyse the data complexity (i.e., data as a prediction advantage). Second, it will present the complexity classes of the algorithms. Third, it will identify major challenges in Earth observation (EO) that cannot yet be solved by classical methods, for instance causality analysis.
Data embedding is of key importance. Non-quantum data are often “artificially” encoded at the input of quantum computers, so quantum algorithms may not be efficient. Polarimetric images are a counterexample: they are represented on the Poincaré sphere, which maps in a natural way to the qubit Bloch sphere. Thus, polarimetric images will no longer be processed as a “signal” but directly as a physical signature. The presentation will further discuss the advantages of quantum annealing (D-Wave) for solving local optimization for non-convex problems, as well as the potential and advantages of the recent TensorFlow Quantum and the implementation of parametrized quantum circuits (PQC). It will address the entire image analysis cycle, encompassing the particular features of data acquisition, understanding and modelling of the image sensor, and information extraction. The quantum ML techniques are practically implemented using open access to various quantum computers, such as those of D-Wave, IBM, or Google. Hybrid methods will be discussed for satellite observations, i.e., managing the I/O of the data and making maximal use of the resources of quantum computers and quantum algorithms.
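As a rough illustration of the Poincaré-to-Bloch correspondence mentioned above, the following Python sketch maps a fully polarized Stokes vector to single-qubit amplitudes. It is not taken from the presentation: the function names `stokes_to_qubit` and `bloch_vector` are hypothetical, and the axis convention (S1, S2, S3 mapped to the Bloch x, y, z axes) is an assumption, since other conventions are in use.

```python
import numpy as np

def stokes_to_qubit(s0, s1, s2, s3):
    """Map a fully polarized Stokes vector to qubit amplitudes [alpha, beta]."""
    # Normalize onto the unit Poincare sphere (assumes full polarization).
    x, y, z = s1 / s0, s2 / s0, s3 / s0
    if not np.isclose(np.sqrt(x**2 + y**2 + z**2), 1.0, atol=1e-6):
        raise ValueError("expected a fully polarized Stokes vector")
    # Assumed convention: Poincare axes (S1, S2, S3) -> Bloch axes (x, y, z).
    theta = np.arccos(np.clip(z, -1.0, 1.0))  # polar angle on the sphere
    phi = np.arctan2(y, x)                    # azimuthal angle
    alpha = np.cos(theta / 2)
    beta = np.exp(1j * phi) * np.sin(theta / 2)
    return np.array([alpha, beta], dtype=complex)

def bloch_vector(state):
    """Recover (x, y, z) from qubit amplitudes via Pauli expectation values."""
    a, b = state
    x = 2 * np.real(np.conj(a) * b)
    y = 2 * np.imag(np.conj(a) * b)
    z = np.abs(a) ** 2 - np.abs(b) ** 2
    return np.array([x, y, z])

# Round trip: the qubit's Bloch vector equals the normalized Stokes vector.
state = stokes_to_qubit(1.0, 0.6, 0.0, 0.8)
recovered = bloch_vector(state)
```

Under these assumptions, the encoding needs no learned or hand-designed feature map: the polarization state of a pixel is itself a point on the sphere that parametrizes the qubit. Partially polarized pixels lie inside the sphere and are rejected by this sketch.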
Mihai Datcu received the M.S. and Ph.D. degrees in electronics and telecommunications from the University POLITEHNICA of Bucharest (UPB), Romania, in 1978 and 1986, respectively, and the Habilitation à Diriger des Recherches degree in computer science from the University Louis Pasteur, Strasbourg, France, in 1999. Since 1981, he has been with the Department of Applied Electronics and Information Engineering, Faculty of Electronics, Telecommunications and Information Technology, UPB, where he is Full Professor and Director of the Research Center for Spatial Information (CEOSpaceTech). Since 1993, he has been with the German Aerospace Center (DLR), where he is a Senior Scientist with the Remote Sensing Technology Institute (IMF). From 1992 to 2002, he had a longer Invited Professor Assignment with the Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland.
Since 2001, he has initiated and led the Competence Center on Information Extraction and Image Understanding for Earth Observation, ParisTech, Paris Institute of Technology, France, a collaboration of DLR with the French Space Agency (CNES). From 2005 to 2013, he was Professor and holder of the DLR-CNES Chair at ParisTech, Paris Institute of Technology. From 2011 to 2018, he led the Immersive Visual Information Mining Research Laboratory, Munich Aerospace Faculty. Between 2018 and 2020, he was the holder of the Blaise Pascal international chair of excellence at Conservatoire national des arts et métiers (CNAM), Paris. Between 2020 and 2022, he was involved in the DLR-French Aerospace Lab (ONERA) Joint Virtual Center for AI in Aerospace.
He was a Visiting Professor with the University of Oviedo, Spain; University Louis Pasteur and International Space University, Strasbourg, France; University of Siegen, Germany; University of Innsbruck, Austria; University of Alcala, Spain; University Tor Vergata, Rome, Italy; Universidad Pontificia de Salamanca, Madrid, Spain; University of Camerino, Italy; the University of Trento, Italy; China Academy of Sciences, Shenyang; Universidade Estadual de Campinas (UNICAMP), Brazil; University of Wuhan, China; and the Swiss Center for Scientific Computing, Manno, Switzerland. He has initiated and implemented the European frame of projects for Earth Observation image information mining (IIM) and is involved in research programs for information extraction, data mining, Big EO Data knowledge discovery, and data understanding with the European Space Agency (ESA), NASA, and in a series of national and European projects. He and his team have developed the operational IIM processor in the Payload Ground Segment systems for the German mission TerraSAR-X, and data mining tools and systems for the Copernicus missions Sentinel-1 and Sentinel-2. He is developing algorithms for model-based information retrieval from high-complexity signals and methods for scene understanding from very-high-resolution synthetic aperture radar (SAR) and interferometric SAR data. His research interests include information theory, signal processing, explainable and physics-aware Artificial Intelligence, computational imaging, and quantum machine learning with applications in EO. Dr. Datcu is a member of the ESA Working Group Big Data from Space and Visiting Professor with ESA’s Φ-Lab.
He was the recipient of the Romanian Academy Prize Traian Vuia for the development of the SAADI image analysis system and his activity in image processing in 1987, of the Best Paper Award and the IEEE Geoscience and Remote Sensing Society Prize in 2006, and of the National Order of Merit with the rank of Knight, for outstanding international research results, awarded by the President of Romania in 2008. He was also the recipient of the Chaire d'excellence internationale Blaise Pascal 2017 for international recognition in the field of data science in EO and the 2018 Ad Astra Award for Excellence in Science. He has served as a Co-organizer for international conferences and workshops and as Guest and Associate Editor for IEEE and other journals. In 2022, he received the IEEE GRSS David Landgrebe Award in recognition of outstanding contributions to Earth Observation analysis using innovative concepts for big data analysis, image mining, machine learning, smart sensors, and quantum resources. He is an IEEE Fellow.
Presented by: Ghassan AlRegib, Mohit Prabhushankar
In this tutorial, we motivate, analyze and apply gradients of neural networks as features to understand image data. Traditionally, gradients are utilized as a computationally efficient way to learn billions of parameters in large-scale neural networks. Recently, gradients in neural networks have also proven useful for understanding and evaluating trained networks. For example, while gradients with respect to network parameters are used for learning image semantics, gradients with respect to input images are used to break the network by creating adversarial data. Similarly, gradients with respect to logits provide predictive explanations, while gradients with respect to the loss function provide contrastive explanations. We hypothesize that once a neural network is trained, it acts as a knowledge base through which different types of gradients can be used to traverse adversarial, contrastive, explanatory, and counterfactual representation spaces. Several image understanding and robustness applications, including anomaly, novelty, adversarial, and out-of-distribution image detection and noise recognition experiments, among others, use multiple types of gradients as features. In this tutorial, we examine the types, visual meanings, and interpretations of gradients along with their applicability in multiple applications.
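To make the distinction between these gradient types concrete, here is a minimal NumPy sketch for a toy linear softmax classifier. It is an illustrative assumption, not the presenters' code: the model `z = W @ x` and the function names `softmax`, `cross_entropy`, and `gradient_types` are hypothetical, and the gradients are written out analytically rather than obtained from a deep learning framework.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(W, x, label):
    """Cross-entropy loss of the toy classifier z = W @ x."""
    return -np.log(softmax(W @ x)[label])

def gradient_types(W, x, label):
    """Return three gradient types for the toy classifier z = W @ x."""
    p = softmax(W @ x)
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    dloss_dz = p - onehot                      # d loss / d logits
    grad_params = np.outer(dloss_dz, x)        # d loss / d W: drives training
    grad_input = W.T @ dloss_dz                # d loss / d x: adversarial direction
    grad_logit_input = W[label].copy()         # d z[label] / d x: predictive explanation
    return grad_params, grad_input, grad_logit_input

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 classes, 4 input features
x = rng.normal(size=4)

# Gradients for the predicted class P.
predicted = int(np.argmax(W @ x))
g_params, g_input, g_logit = gradient_types(W, x, predicted)

# Backpropagating the loss for a different class Q gives a contrastive
# signal ("why P rather than Q?").
contrast = (predicted + 1) % 3
g_params_q, g_input_q, _ = gradient_types(W, x, contrast)
```

The same forward pass yields all of these quantities; only the quantity being differentiated and the variable it is differentiated with respect to change.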
The tutorial is composed of four major parts. Part 1 discusses the different interpretations of gradients extracted from trained neural networks with respect to input data, loss, and logits. Part 2 covers in detail a theoretical analysis of gradients. Part 3 describes the utility of gradient types in robustness applications of detection, recognition and explanations. Newer and emerging fields like machine teaching and active learning will be discussed with methodologies that use gradients. Part 4 connects the human visual perception with machine perception. Specifically, we discuss the expectancy-mismatch principle in neuroscience and empirically discuss this principle with respect to gradients. Results from Image Quality Assessment and Human Visual Saliency will be discussed to demonstrate the value of gradient-based methods. The outline as well as the expected time for each part is presented below.
- Part 1: Types of gradient information in neural networks (1.5 hrs)
- In the numerator: Gradients by backpropagating logits, activations, and empirical loss.
- In the denominator: Gradients with respect to inputs, activations, and network parameters
- Confounding labels: Backpropagating the wrong classes and their effect on contrastive and counterfactual representations
- Gradients as information in neural networks
- Gradients for epistemic (network based) and aleatoric (image based) uncertainty estimation
- Gradients as distance measures in representation spaces
- Detection: Adversarial, novelty, anomaly, and out-of-distribution detection
- Recognition: Recognition under noise, domain shift, Calibration, open set recognition
- Explanations: Predictive, contrastive, and counterfactual explanations
- Emerging applications: Active Learning, Machine Teaching
- Expectancy mismatch principle and gradients based on confounding loss functions
- Human visual saliency
- Image Quality Assessment
Ghassan AlRegib is currently the John and Marilu McCarty Chair Professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. He received the ECE Outstanding Junior Faculty Member Award in 2008 and the 2017 Denning Faculty Award for Global Engagement. His research group, the Omni Lab for Intelligent Visual Engineering and Science (OLIVES), works on research projects related to machine learning, image and video processing, image and video understanding, seismic interpretation, machine learning for ophthalmology, and video analytics. He has participated in several service activities within the IEEE. He served as the TP co-Chair for ICIP 2020. He is an IEEE Fellow.
Mohit Prabhushankar received his Ph.D. degree in electrical engineering from the Georgia Institute of Technology (Georgia Tech), Atlanta, Georgia, 30332, USA, in 2021. He is currently a Postdoctoral Research Fellow in the School of Electrical and Computer Engineering at the Georgia Institute of Technology in the Omni Lab for Intelligent Visual Engineering and Science (OLIVES). He is working in the fields of image processing, machine learning, active learning, healthcare, and robust and explainable AI. He is the recipient of the Best Paper award at ICIP 2019 and Top Viewed Special Session Paper Award at ICIP 2020. He is the recipient of the ECE Outstanding Graduate Teaching Award, the CSIP Research award, and of the Roger P Webb ECE Graduate Research Excellence award, all in 2022.
Presented by: Muhammad Haroon Yousaf, Muhammad Saad Saeed, Muhammad Naeem Mumtaz Awan
This tutorial has been planned to acquaint the audience with the latest tools in edge computing for robot vision. Initially, the attendees will be introduced to the basics of robot vision and the challenges it has to solve. Diving deeper, application-agnostic models will be discussed, and finally model optimization and deployment on Jetson devices will be presented.
This tutorial will enable participants to prepare custom datasets, train their own custom models, and deploy custom or pre-trained machine vision models on edge devices. It will also show participants how to build applications that leverage the strengths of edge computing devices.
The learning outcomes of this course are as follows:
- Enhanced knowledge about robot vision
- Problems that can be solved in combination of robots and vision
- Challenges in aerial vision
- Object detection from aerial-view
- Real-world application of robotics
- Optimization of Vision Models
Introduction to Robot Vision
- Robot vision
- Need of robot vision
- Robots with vision sensors
- Autonomous cars
- Underwater ROV
- Challenges and Application Areas
Robot Vision Models
- Introduction to real-time vision Models
- Vision in Aerial Robotics
- Object Detection from Aerial-view using Edge Computing
Model Optimization and Deployment on Edge Devices
- Optimization tools
- TensorRT
- Implementation of Optimized Model
- Jetson Nano
- Jetson Xavier
- Computer Vision Application in Robotics
- Introduction to Object Detection:
- Understanding of Object Detection Based on CNN Family and YOLO
Muhammad Haroon Yousaf is a Professor of Computer Engineering at the University of Engineering and Technology Taxila, Pakistan. He has more than 17 years of teaching and research experience. His research interests are Image Processing, Computer Vision and Robotics. He is also the Director of the Swarm Robotics Lab under the National Centre for Robotics and Automation Pakistan. He has secured many funded (Govt. & Industry) research projects in his areas of interest. He has published more than 70 research papers in peer-reviewed international conferences and journals. He has supervised three PhDs and more than 30 MS theses in the domain of image processing and computer vision. He is an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology. He has reviewed for many prestigious peer-reviewed journals, served on the technical program committees of many international conferences, and been a PhD examiner/evaluator for different international universities. Prof. Haroon received the Best University Teacher Award from the Higher Education Commission, Pakistan, in 2014. He is a mentor to a couple of tech startups in the domain of Robotics and Computer Vision. He served on the national-level curriculum development committees in 2014, 2019 and 2021. He has also served on national and international expert panels and boards that review research grants. He is a Senior Member of IEEE and a member of IEEE SPS. He was the General Chair of the IEEE SPS Seasonal School on Computer Vision Applications in Robotics (CVAR).
Muhammad Naeem Mumtaz is a Research Associate in the Swarm Robotics Lab. He has more than two years of experience in Computer Vision. He received his MS degree in Electrical Engineering from NUST in 2019 and his BS degree (Gold Medal) in Electrical Engineering from Riphah International University in 2016. His research areas are Object Detection, Semantic Segmentation, Artificial Intelligence on edge and Computer Vision.
Muhammad Saad Saeed is working as a Research Associate in Swarm Robotics Lab. He is also working as Chief Technology Officer (CTO) in a Computer Vision based startup “BeeMantis.” Saad has more than three years of R&D experience in Deep Learning with applications in Computer Vision, Multimodal Learning, AI (Artificial Intelligence) on Edge, Speech, and Audio Processing. He is a Professional Member of IEEE (Institute of Electrical and Electronics Engineers) and member of IEEE SPS.