VASBSD 2021 - Vision Applications & Solutions to Biased or Scarce Data

Speaker Details

Andrew D. Bagdanov

University of Florence

Bio:
Andrew D. Bagdanov received a PhD in Computer Science in 2004 from the University of Amsterdam. He held postdoctoral positions at the University of Florence and the Universidad Autonoma de Barcelona, and a senior development position at the FAO of the United Nations. He is currently an Associate Professor of Information Engineering at the University of Florence, Italy. His research spans a broad gamut of image processing, computer vision, and machine learning.

Keynote Title:
Self-supervised Learning: Getting More for Less out of Your CNNs

Keynote Abstract:
The advent of viable, deep neural network architectures for visual recognition continues to radically reshape the landscape of the state of the art in computer vision. These developments have brought amazing new possibilities and formidable new challenges to the table. In this talk we will see how self-supervised learning approaches can be designed and implemented in order to mitigate the data-hungry nature of Convolutional Neural Networks and dramatically reduce the burden of supervision. We will see how to design proxy tasks for supervision and how they can be exploited to learn semantically meaningful visual representations for recognition problems. Self-supervised learning, with carefully crafted proxy tasks, can address critical issues in training and deployment of state-of-the-art CNNs and enable one to get more from CNNs for less -- that is, to obtain state-of-the-art results on multiple visual recognition tasks with fewer labeled training examples.

[Talk Video]

Ehsan Elhamifar

Northeastern University

Bio:
Ehsan Elhamifar is currently an Assistant Professor in the Khoury College of Computer Sciences and is the director of the Mathematical Data Science (MCADS) Lab at Northeastern University. Dr. Elhamifar is a recipient of the DARPA Young Faculty Award and the NSF CISE Career Research Initiation Initiative Award. Previously, he was a postdoctoral scholar in the Electrical Engineering and Computer Science department at UC Berkeley. He obtained his PhD from the Electrical and Computer Engineering department at the Johns Hopkins University. Dr. Elhamifar's research areas are machine learning, computer vision and optimization. He develops scalable, robust and interpretable methods for learning from complex and massive high-dimensional data and works on applications of these tools to structured visual data summarization, learning instructions from videos and large-scale recognition with little or no labeled data.

Keynote Title:
Recognizing More Labels with No Samples: Fine-Grained and Multi-Label Zero-Shot Learning

Keynote Abstract:
The success of deep neural networks (DNNs) on a variety of tasks, such as recognition, detection, semantic segmentation and captioning, relies on the availability of large amounts of annotated data. However, to successfully support real-world applications, DNNs must learn tens of thousands of labels, handle a priori unseen labels and localize them in images. In this talk, I discuss learning high-performance, robust DNNs for recognition of a large number of labels, many of which have few or no annotated images. I will focus on two challenging scenarios: i) fine-grained zero-shot recognition and localization, where unseen and seen labels are visually similar, requiring extremely expensive annotations by experts; ii) multi-label zero-shot recognition and localization, where all existing labels in an image must be found, as opposed to the conventional problem of finding the dominant class. I discuss methods based on an ensemble of label-agnostic and attribute-based attention models that capture both common and discriminative information about labels, and hence not only perform well for prediction of seen labels but also transfer attention to unseen labels. I show that these new methods significantly improve the state of the art on large-scale image datasets and standard benchmarks.

[Talk Video]

Mehrtash Harandi

Monash University

Bio:
Dr. Mehrtash Harandi is a senior lecturer in the Department of Electrical and Computer Systems Engineering (ECSE) at Monash University. He is also a contributing research scientist at the Machine Learning Research Group (MLRG) at Data61-CSIRO and an associate investigator at the Australian Centre for Robotic Vision (ACRV).

Before joining Monash University, he spent 5 years at Canberra Research Laboratory-NICTA, working with Prof. Richard Hartley and Prof. Fatih Porikli. Prior to that, he worked at Queensland Research Laboratory-NICTA with Prof. Brian Lovell.

He is interested in various aspects of learning, especially with a flavor of visual data. He is an associate editor of the IET-CV and Journal of Imaging, and he regularly reviews papers for top conferences and journals in ML/CV including CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, IEEE TPAMI, IEEE TNNLS, IEEE TIP.

Keynote Title:
Poincaré Kernels for Hyperbolic Representations

Keynote Abstract:
Embedding data in hyperbolic spaces has proven beneficial for many advanced machine learning applications such as image classification and word embeddings. However, working in hyperbolic spaces is not without difficulties as a result of their curved geometry (e.g., computing the mean of a set of points requires an iterative algorithm). In Euclidean spaces, one can resort to kernel machines that not only enjoy rich theoretical properties but also can lead to superior representational power (e.g., infinite-width neural networks). In this talk, we introduce positive definite kernel functions for hyperbolic spaces. This brings two major advantages: (1) kernelization paves the way to seamlessly benefit from kernel machines in conjunction with hyperbolic embeddings, and (2) the rich structure of the Hilbert spaces associated with kernel machines enables us to simplify various operations involving hyperbolic data. That said, identifying valid kernel functions is not straightforward and is indeed considered an open problem in the learning community. Our work here addresses this gap and develops several valid positive definite Poincaré kernels for hyperbolic representations, including the universal ones (e.g., RBF). We study the effectiveness of the proposed hyperbolic kernels on a variety of challenging tasks including few-shot learning and zero-shot learning.

Gim Hee Lee

National University of Singapore

Bio:
Gim Hee Lee is currently an Assistant Professor at the Department of Computer Science at the National University of Singapore (NUS), where he heads the Computer Vision and Robotic Perception (CVRP) Laboratory. He is also affiliated with the NUS Institute of Data Science. He was a researcher at Mitsubishi Electric Research Laboratories (MERL), USA. Prior to MERL, he did his PhD in Computer Science at ETH Zurich, and B.Eng with first class honors and M.Eng degrees in Mechanical Engineering at NUS. His research interests include computer vision, robotic perception and machine learning. He serves as an Area Chair for BMVC 2020, 3DV 2020 and CVPR 2021, and will be part of the organizing committee as the Exhibition/Demo Chair for CVPR 2023.

Keynote Title:
Semi-supervised 3D Object Detection

Keynote Abstract:
Despite the promising results, many existing deep learning-based 3D object detection algorithms are highly dependent on strongly supervised learning with large amounts of ground truth training data. However, the 3D object bounding box ground truth labels are often laborious and costly to obtain due to the sparsity and amodality of the 3D point clouds. In this talk, I will present two of our recent works on semi-supervised learning to alleviate the need for large amounts of ground truth labels for 3D object detection. The first work is on cross-category semi-supervised learning, where knowledge is transferred from strong classes with ground truth labels to weak classes without ground truth labels. The second work is on in-category semi-supervised learning, where knowledge is propagated within each class with few ground truth labels.

[Talk Video]

Xiaodan Liang

Sun Yat-sen University

Bio:
Dr. Xiaodan Liang is currently an Associate Professor at the School of Intelligent Systems Engineering, Sun Yat-sen University. She is co-supervising the HCP-I2 Lab SYSU. Before that, she was a Project Scientist in the Machine Learning Department at Carnegie Mellon University, working with Prof. Eric P. Xing. She obtained her Ph.D. degree from the School of Data and Computer Science at Sun Yat-sen University, advised by Prof. Liang Lin. She was a visiting scholar in the Department of EECS of the National University of Singapore, working with Prof. Shuicheng Yan. She has closely collaborated with Dr. Xiaohui Shen at Adobe Research and Dr. Jianchao Yang at SnapChat Research.

Keynote Title:
Towards Efficient and Transferrable Network Architecture Search for Visual Recognition

Cees G. M. Snoek

University of Amsterdam

Bio:
Dr. Cees G.M. Snoek is a full professor in computer science at the University of Amsterdam, where he heads the Intelligent Sensory Information Systems Lab. He is also a director of three public-private AI research labs: QUVA Lab with Qualcomm, Atlas Lab with TomTom and AIM Lab with the Inception Institute of Artificial Intelligence. At the university spin-off Kepler Vision Technologies he acts as Chief Scientific Officer. Professor Snoek is also the director of the master's program in Artificial Intelligence and co-founder of the Innovation Center for Artificial Intelligence.

He received the M.Sc. degree in business information systems (2000) and the Ph.D. degree in computer science (2005) both from the University of Amsterdam, The Netherlands. His research interests focus on video and image recognition. He has published over 200 refereed book chapters, journal and conference papers, and frequently serves as an area chair of the major conferences in computer vision and multimedia.

Professor Snoek is the lead researcher of the award-winning MediaMill Semantic Video Search Engine, which is the most consistent top performer in the yearly NIST TRECVID evaluations. See his website for a detailed record of his awards and past experience.

Keynote Title:
De-biasing Algorithms for Images Seen and Unseen

Keynote Abstract:
It is well known that datasets in computer vision have a strong built-in bias as they can represent only a narrow view of the visual world. Even though addressing biases from the start of the dataset creation is highly recommended, models learned from such data can still be affected by spurious correlations and produce unfair decisions. In this talk I present two algorithms that try to mitigate this bias. For seen images we identify a bias direction in the feature space that corresponds to the main direction of maximum variance of class-specific prototypes. In light of this, we propose to learn to map inputs to domain-specific embeddings, where each value of a protected attribute has its own domain. For unseen images, in a generalized zero-shot learning setting, we propose a bias-aware learner to map inputs to a semantic embedding space. During training, the model learns to regress to real-valued class prototypes in the embedding space with temperature scaling, while a margin-based bidirectional entropy term regularizes seen and unseen probabilities. Experiments demonstrate the benefits of the proposed de-biased classifiers in multi-label and zero-label settings, as well as their ability to improve fairness of the predictions.

[Talk Video]

Zhangyang (Atlas) Wang

University of Texas at Austin

Bio:
Zhangyang (Atlas) Wang is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Texas at Austin. He graduated with a Ph.D. from UIUC, advised by Prof. Thomas S. Huang. Before joining the University of Texas at Austin, he was a Research Scientist in the Department of Industrial and Systems Engineering at the University of Washington. His research interests are mainly in machine learning and computer vision. His recent works focus on the following areas: (a) Enhancing deep learning robustness, efficiency, and privacy/fairness. (b) Deep learning for optimization, and optimization for deep learning. (c) Applications: Computer vision and interdisciplinary problems.

Keynote Title:
Robustness from Unlabeled Data

Keynote Abstract:
Pretrained models from self-supervision are widely used to fine-tune downstream tasks faster or to better accuracy. However, gaining robustness from pretraining has been left unexplored. We first introduce adversarial training into self-supervision, to provide general-purpose robust pretrained models for the first time. We find these robust pretrained models can benefit the subsequent fine-tuning in two ways: i) boosting final model robustness; ii) saving computation cost when proceeding towards adversarial fine-tuning. Further, we improve robustness-aware self-supervised pre-training by learning representations that are consistent under both data augmentations and adversarial perturbations. The new approach leverages a recent contrastive learning framework, which learns representations by maximizing feature consistency under differently augmented views. This fits particularly well with the goal of adversarial robustness, as one cause of adversarial fragility is the lack of feature invariance, i.e., small input perturbations can result in undesirably large changes in features or even predicted labels. Our results exemplify the important role that unlabeled data could play in advancing machine learning robustness.

[Talk Video]

Junsong Yuan

University at Buffalo, State University of New York

Bio:
Dr. Junsong Yuan is currently an Associate Professor and Director of the Visual Computing Lab at the Department of Computer Science and Engineering (CSE), State University of New York at Buffalo, USA. Before that he was an Associate Professor at Nanyang Technological University (NTU), Singapore. He obtained his Ph.D. from Northwestern University, M.Eng. from the National University of Singapore and B.Eng. from the Special Program for the Gifted Young of Huazhong University of Science and Technology (HUST), China. His research interests include computer vision, pattern recognition, video analytics, gesture and action analysis, and large-scale visual search and mining. He received the Best Paper Award from IEEE Trans. on Multimedia, the Nanyang Assistant Professorship from NTU, and the Outstanding EECS Ph.D. Thesis award from Northwestern University. He is currently Senior Area Editor of Journal of Visual Communications and Image Representation (JVCI), Associate Editor of IEEE Trans. on Image Processing (T-IP) and IEEE Trans. on Circuits and Systems for Video Technology (T-CSVT), and served as Guest Editor of International Journal of Computer Vision (IJCV). He is Program Co-Chair of IEEE Conf. on Multimedia and Expo (ICME'18) and Steering Committee Member of ICME (2018-2019). He also served as Area Chair for CVPR, ICIP, ICPR, ACCV, ACM MM, WACV, etc. He is a Fellow of the International Association for Pattern Recognition (IAPR).

Keynote Title:
Visual Learning with Less Labeled Data

[Talk Video]