This paper addresses a significant limitation that prevents Contrastive Language-Image Pretrained (CLIP) models from achieving optimal performance on downstream image classification tasks. The key problem with CLIP-style zero-shot classification is that it requires domain-specific context, in the form of prompts, to better align the class descriptions to the downstream data distribution. In particular, prompts for vision-language models are domain-level texts (e.g., “a centered satellite image of ...”) which, together with the class names, are fed into the text encoder to provide more context for the downstream dataset. These prompts are typically manually tuned, which is time-consuming and often sub-optimal. To overcome this bottleneck, this paper proposes uCAP, a method to automatically learn domain-specific prompts/contexts using only unlabeled in-domain images. We achieve this by modeling the generation of images given the class names and a domain-specific prompt with an unsupervised likelihood distribution, and then performing inference over the prompts. We validate the proposed method across various models and datasets, showing that uCAP consistently outperforms manually tuned prompts and related baselines on the evaluated datasets: ImageNet, CIFAR-10, CIFAR-100, OxfordPets (up to 2%), SUN397 (up to 5%), and Caltech101 (up to 3%).
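As a rough illustration of the idea (a minimal sketch, not the paper's actual model), the snippet below scores a set of candidate prompts by the likelihood they assign to unlabeled in-domain images under a CLIP-style similarity model. The marginalization over a uniform class prior, the temperature, and all tensor names are my assumptions.

```python
import torch

# Assumed inputs (not from the paper):
#   image_feats: (N, D) L2-normalized embeddings of N unlabeled in-domain images
#   text_feats:  (P, K, D) L2-normalized embeddings of "prompt_p + class_k" texts
def prompt_log_likelihood(image_feats, text_feats, temperature=0.01):
    """Score each candidate prompt by the total log-likelihood it assigns to
    the unlabeled images, marginalizing class-conditional alignment scores
    over a uniform class prior:
        log p(x | prompt) ~ logsumexp_k(sim(x, prompt+class_k)/T) - log K."""
    logits = torch.einsum("nd,pkd->pnk", image_feats, text_feats) / temperature
    k = logits.shape[-1]
    log_px = torch.logsumexp(logits, dim=-1) - torch.log(torch.tensor(float(k)))
    return log_px.sum(dim=1)  # (P,) one score per candidate prompt

# Pick the prompt under which the unlabeled images are most likely:
# best_prompt_idx = prompt_log_likelihood(image_feats, text_feats).argmax()
```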
2023
CVPR
TIPI: Test Time Adaptation with Transformation Invariance
When deploying a machine learning model to a new environment, we often encounter the distribution shift problem, meaning the target data distribution is different from the model’s training distribution. In this paper, we assume that labels are not provided for this new domain and that we do not store the source data (e.g., for privacy reasons). It has been shown that even small shifts in the data distribution can affect the model’s performance severely. Test-time adaptation offers a means to combat this problem, as it allows the model to adapt during test time to the new data distribution, using only unlabeled test data batches. To achieve this, the predominant approach is to optimize a surrogate loss on the test-time unlabeled target data. In particular, minimizing the prediction entropy on target samples (Wang et al., 2020) has received much interest, as it is task-agnostic and does not require altering the model’s training phase (e.g., it does not require adding a self-supervised task during training on the source domain). However, as the target batch size is often small in real-world scenarios (e.g., autonomous driving models must process frames in real time), we argue that this surrogate loss is not optimal, since it often collapses with small batch sizes. To tackle this problem, in this paper we propose to use an invariance regularizer as the surrogate loss during test-time adaptation, motivated by our theoretical results regarding the model’s performance under input transformations. The resulting method (TIPI – Test tIme adaPtation with transformation Invariance) is validated with extensive experiments on various benchmarks (CIFAR-10-C, CIFAR-100-C, ImageNet-C, DIGITS, and VisDA-17). Remarkably, TIPI is robust against small batch sizes (as small as 2 in our experiments) and consistently outperforms TENT (Wang et al., 2020) in all settings.
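For concreteness, here is a minimal PyTorch sketch contrasting the entropy surrogate with a transformation-invariance surrogate. The specific transformation and the KL form are illustrative assumptions, not TIPI's exact construction.

```python
import torch
import torch.nn.functional as F

def entropy_loss(logits):
    """TENT-style surrogate: mean prediction entropy over the test batch."""
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1).mean()

def invariance_loss(model, x, transform):
    """Invariance surrogate in the spirit of TIPI: penalize prediction changes
    under a label-preserving input transformation. Here `transform` is any
    callable (e.g., a small random perturbation); TIPI derives its choice of
    transformations from a theoretical analysis."""
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=-1)
    log_p_aug = F.log_softmax(model(transform(x)), dim=-1)
    return F.kl_div(log_p_aug, p_clean, reduction="batchmean")

# Test-time adaptation step (typically only normalization-layer parameters
# are updated, as in TENT):
# loss = invariance_loss(model, x_batch, transform)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```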
2022
NeurIPS
FedSR: A Simple and Effective Domain Generalization Method for Federated Learning
Federated Learning (FL) refers to the decentralized and privacy-preserving machine learning framework in which multiple clients collaborate (with the help of a central server) to train a global model without sharing their data. However, most existing FL methods only focus on maximizing the model’s performance on the source clients’ data (e.g., mobile users) without considering its generalization ability to unknown target data (e.g., a new user). In this paper, we incorporate the problem of Domain Generalization (DG) into Federated Learning to tackle this issue. However, virtually all existing DG methods require a centralized setting where data is shared across domains, which violates the principles of decentralized FL and hence makes them inapplicable. To this end, we propose a simple yet novel representation learning framework, namely FedSR, which enables domain generalization while still respecting the decentralized and privacy-preserving nature of the FL setting. Motivated by classical machine learning algorithms, we aim to learn a simple representation of the data for better generalization. In particular, we enforce an L2-norm regularizer on the representation and a conditional mutual information regularizer (between the representation and the data given the label) to encourage the model to only learn essential information (while ignoring spurious correlations such as the background). Furthermore, we provide theoretical connections between the above two objectives and representation alignment in domain generalization. Extensive experimental results suggest that our method significantly outperforms relevant baselines in this setting.
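A condensed sketch of the two regularizers as I read them from the abstract (the paper's exact estimators and parameterization may differ): an L2 penalty on a sampled probabilistic representation, and a variational upper bound on the conditional mutual information via a learnable per-class reference distribution r(z|y).

```python
import torch
from torch.distributions import Normal, Independent, kl_divergence

def fedsr_regularizers(mu, log_std, y, ref_mu, ref_log_std):
    """mu, log_std: (B, D) outputs of a probabilistic encoder q(z|x).
    y: (B,) labels; ref_mu, ref_log_std: (num_classes, D) learnable
    parameters of per-class reference distributions r(z|y)."""
    q = Independent(Normal(mu, log_std.exp()), 1)
    z = q.rsample()                                   # reparameterized sample
    l2_reg = z.pow(2).sum(dim=1).mean()               # encourage a "simple" z
    r = Independent(Normal(ref_mu[y], ref_log_std[y].exp()), 1)
    cmi_reg = kl_divergence(q, r).mean()              # variational bound on I(z; x | y)
    return z, l2_reg, cmi_reg

# Per-client objective (alpha, beta are hypothetical weights):
# z, l2, cmi = fedsr_regularizers(mu, log_std, y, ref_mu, ref_log_std)
# loss = F.cross_entropy(classifier(z), y) + alpha * l2 + beta * cmi
```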
ICLR
KL Guided Domain Adaptation
Domain adaptation is an important problem and is often needed for real-world applications. In this problem, instead of i.i.d. data points, we assume that the source (training) data and the target (testing) data have different distributions. In that setting, the empirical risk minimization training procedure often does not perform well, since it does not account for the change in the distribution. A common approach in the domain adaptation literature is to learn a representation of the input that has the same distribution over the source and the target domain. However, these approaches often require additional networks and/or optimizing an adversarial (minimax) objective, which can be very expensive or unstable in practice. To tackle this problem, we first derive a generalization bound for the target loss based on the training loss and the reverse Kullback–Leibler (KL) divergence between the source and the target representation distributions. Based on this bound, we derive an algorithm that minimizes the KL term to obtain better generalization to the target domain. We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples without any additional network or a minimax objective. This leads to a theoretically sound alignment method which is also very efficient and stable in practice. Experimental results also suggest that our method outperforms other representation-alignment approaches.
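The minibatch estimator described in the abstract might look as follows with a Gaussian representation network. Treating each domain's marginal p(z) as a mixture over the minibatch posteriors is my reading of the construction, not the paper's code.

```python
import torch
from torch.distributions import Normal, Independent

def log_marginal(z, mu, log_std):
    """log p(z) under a mixture of the minibatch's Gaussian posteriors.
    z: (B, D) samples; mu, log_std: (B, D) encoder outputs for the batch."""
    comp = Independent(Normal(mu.unsqueeze(0), log_std.exp().unsqueeze(0)), 1)
    log_probs = comp.log_prob(z.unsqueeze(1))          # (B, B) pairwise densities
    b = z.shape[0]
    return torch.logsumexp(log_probs, dim=1) - torch.log(torch.tensor(float(b)))

def kl_alignment_loss(z_t, mu_t, log_std_t, mu_s, log_std_s):
    """Monte Carlo estimate of KL(p_target(z) || p_source(z)) from one
    minibatch per domain: E_{z ~ p_t}[log p_t(z) - log p_s(z)]."""
    return (log_marginal(z_t, mu_t, log_std_t)
            - log_marginal(z_t, mu_s, log_std_s)).mean()
```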
ICLR
Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization
Thanh Nguyen-Tang, Sunil Gupta,
A. Tuan Nguyen, and Svetha Venkatesh
International Conference on Learning Representations, 2022
Offline policy learning (OPL) leverages existing data collected a priori for policy optimization without any active exploration. Despite the prevalence of and recent interest in this problem, its theoretical and algorithmic foundations in function approximation settings remain under-developed. In this paper, we consider this problem on the axes of distributional shift, optimization, and generalization in offline contextual bandits with neural networks. In particular, we propose a provably efficient offline contextual bandit with neural network function approximation that does not require any functional assumption on the reward. We show that our method provably generalizes over unseen contexts under a milder condition for distributional shift than existing OPL works. Notably, unlike any other OPL method, our method learns from the offline data in an online manner using stochastic gradient descent, allowing us to bring the benefits of online learning into the offline setting. Moreover, we show that our method is more computationally efficient and has a better dependence on the effective dimension of the neural network than an online counterpart. Finally, we demonstrate the empirical effectiveness of our method on a range of synthetic and real-world OPL problems.
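A generic pessimism sketch (not the paper's exact algorithm): score each action with the network's reward estimate minus an uncertainty width computed from network-gradient features, as in neural bandit analyses. The diagonal covariance approximation and the confidence parameter `beta` are assumptions.

```python
import torch

def pessimistic_action(reward_net, context, action_feats, lambda_diag, beta=1.0):
    """Pick the action maximizing a lower confidence bound r_hat - width.
    Uncertainty uses gradient features g = d r_hat / d theta; `lambda_diag`
    is a running diagonal of the Gram matrix sum g g^T (a cheap stand-in
    for the full inverse covariance)."""
    scores = []
    for a in action_feats:
        r = reward_net(torch.cat([context, a])).squeeze()
        grads = torch.autograd.grad(r, list(reward_net.parameters()))
        g = torch.cat([p.reshape(-1) for p in grads])
        width = beta * torch.sqrt((g * g / lambda_diag).sum())
        scores.append(r.item() - width.item())        # pessimistic value estimate
    return max(range(len(scores)), key=scores.__getitem__)
```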
ICML
Set Based Stochastic Subsampling
Bruno Andreis, Seanie Lee,
A. Tuan Nguyen, Juho Lee, Eunho Yang, and Sung Ju Hwang
International Conference on Machine Learning, 2022
Deep models are designed to operate on huge volumes of high-dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g., a classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables, capturing coarse-grained global information with set encoding functions; in the second stage, we perform conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables, modeling pairwise interactions with set attention networks. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction, and few-shot classification. Additionally, for nonparametric models such as Neural Processes that need to leverage the whole training data at inference time, we show that our method enhances the scalability of these models.
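A heavily simplified sketch of the two-stage design as I read it from the abstract: Bernoulli candidate selection conditioned on pooled global context, then autoregressive Categorical selection among the candidates. The paper uses set-attention blocks and relaxed distributions for end-to-end training; this inference-only version with mean pooling and hard sampling is my simplification.

```python
import torch
import torch.nn as nn

class TwoStageSubsampler(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))
        self.stage2 = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, x, k):
        """x: (N, D) set elements; k: number of elements to keep."""
        ctx = x.mean(dim=0, keepdim=True).expand_as(x)   # coarse global context
        probs = torch.sigmoid(self.stage1(torch.cat([x, ctx], -1)).squeeze(-1))
        keep = torch.bernoulli(probs)                    # stage 1: i.i.d. Bernoulli
        cand = x[keep.bool()] if keep.any() else x
        chosen = []
        for _ in range(min(k, len(cand))):               # stage 2: autoregressive
            sel = (torch.stack(chosen).mean(0, keepdim=True)
                   if chosen else cand.mean(0, keepdim=True))
            logits = self.stage2(
                torch.cat([cand, sel.expand_as(cand)], -1)).squeeze(-1)
            idx = torch.distributions.Categorical(logits=logits).sample()
            chosen.append(cand[idx])
        return torch.stack(chosen)                       # (k, D) subsampled set
```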
It has been reported that deep learning models are extremely vulnerable to small but intentionally chosen perturbations of their input. In particular, a deep network, despite its near-optimal accuracy on clean images, often misclassifies an image with a worst-case but humanly imperceptible perturbation (a so-called adversarial example). To tackle this problem, a great amount of research has been done to study the training procedure of a network to improve its robustness. However, most of the research so far has focused on the case of supervised learning. With the increasing popularity of self-supervised learning methods, it is also important to study and improve the robustness of their resulting representations on downstream tasks. In this paper, we study the problem of robust representation learning with unlabeled data in a task-agnostic manner. Specifically, we first derive an upper bound on the adversarial loss of a prediction model (which is based on the learned representation) on any downstream task, using its loss on the clean data and a robustness regularizer. Moreover, the regularizer is task-independent, so we propose to minimize it directly during the representation learning phase to make the downstream prediction model more robust. Extensive experiments show that our method achieves favorable adversarial performance compared to relevant baselines.
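Sketched from the abstract, a task-agnostic robustness regularizer might penalize how far a worst-case input perturbation can move the representation, with no labels or task head involved. The L2 distance and PGD settings below are illustrative assumptions.

```python
import torch

def robustness_regularizer(encoder, x, eps=8/255, step=2/255, iters=5):
    """Penalize the worst-case representation shift under a small
    L-infinity input perturbation (PGD-style inner maximization)."""
    z_clean = encoder(x).detach()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):                          # inner maximization over delta
        dist = (encoder(x + delta) - z_clean).pow(2).sum(dim=1).mean()
        grad, = torch.autograd.grad(dist, delta)
        delta = ((delta + step * grad.sign())
                 .clamp(-eps, eps).detach().requires_grad_(True))
    return (encoder(x + delta) - z_clean).pow(2).sum(dim=1).mean()

# Representation learning objective (the self-supervised loss is hypothetical):
# loss = ssl_loss(encoder, x) + lam * robustness_regularizer(encoder, x)
```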
2021
NeurIPS
Domain Invariant Representation Learning with Domain Density Transformations
Domain generalization refers to the problem where we aim to train a model on data from a set of source domains so that the model can generalize to unseen target domains. Naively training a model on the aggregate set of data (pooled from all source domains) has been shown to perform suboptimally, since the information learned by that model might be domain-specific and generalize imperfectly to target domains. To tackle this problem, a predominant domain generalization approach is to learn some domain-invariant information for the prediction task, aiming at good generalization across domains. In this paper, we propose a theoretically grounded method to learn a domain-invariant representation by enforcing the representation network to be invariant under all transformation functions among domains. We then introduce the use of generative adversarial networks to learn such domain transformations as a practical implementation of our method. We demonstrate the effectiveness of our method on several widely used datasets for the domain generalization problem, on all of which we achieve competitive results with state-of-the-art models.
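The invariance constraint might be enforced as follows; the GAN-based translator `g_ab` and the squared-distance penalty are assumptions on my part, sketching the abstract's idea rather than the paper's exact losses.

```python
import torch

def transformation_invariance_loss(rep_net, g_ab, x_a):
    """The representation of a domain-A image should match the representation
    of its translated version in domain B. `g_ab` is assumed to be a
    pre-trained GAN-based domain translator (e.g., StarGAN/CycleGAN-style)
    mapping domain-A images to domain B."""
    with torch.no_grad():
        x_b = g_ab(x_a)                 # transform x into another domain
    z_a, z_b = rep_net(x_a), rep_net(x_b)
    return (z_a - z_b).pow(2).sum(dim=1).mean()
```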
AAAI
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Learning
A. Tuan Nguyen, Hyewon Jeong, Eunho Yang, and Sung Ju Hwang
Proceedings of the AAAI Conference on Artificial Intelligence, 2021
Although recent multi-task learning methods have been shown to be effective in improving the generalization of deep neural networks, they should be used with caution for safety-critical applications, such as clinical risk prediction. This is because even if they achieve improved task-average performance, they may still yield degraded performance on individual tasks, which may be critical (e.g., prediction of mortality risk). Existing asymmetric multi-task learning methods tackle this negative transfer problem by performing knowledge transfer from tasks with low loss to tasks with high loss. However, using loss as a measure of reliability is risky, since a low loss could result from overfitting. In the case of time-series prediction tasks, knowledge learned for one task (e.g., predicting the onset of sepsis) at a specific timestep may be useful for learning another task (e.g., prediction of mortality) at a later timestep, but the lack of a per-timestep loss makes it difficult to measure reliability at each timestep. To capture such dynamically changing asymmetric relationships between tasks in time-series data, we propose a novel temporal asymmetric multi-task learning model that performs knowledge transfer from certain tasks/timesteps to relevant uncertain tasks, based on feature-level uncertainty. We validate our model on multiple clinical risk prediction tasks against various deep learning models for time-series prediction, all of which our model significantly outperforms without any sign of negative transfer. Further qualitative analysis of the learned knowledge graphs by clinicians shows that they are helpful in analyzing the model's predictions.
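One plausible (hypothetical) form of uncertainty-gated asymmetric transfer, sketched only from the abstract: each task's feature at a timestep borrows from the features of more certain tasks, with weights that grow with the source's certainty and the target's uncertainty. The sigmoid gating and normalization are my assumptions.

```python
import torch

def uncertainty_weighted_transfer(feats, log_var):
    """feats: (T, D) per-task latent features at one timestep.
    log_var: (T,) feature-level uncertainty (log variance) per task."""
    certainty = torch.sigmoid(-log_var)             # low variance -> high certainty
    # w[i, j]: how much uncertain target task i borrows from certain source j
    w = torch.outer(1.0 - certainty, certainty)
    w = w * (1.0 - torch.eye(feats.shape[0]))       # no self-transfer
    w = w / w.sum(dim=1, keepdim=True).clamp(min=1e-8)
    return feats + w @ feats                        # features after transfer
```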
IEEE TMC
Detection of Microsleep Events with a Behind-the-ear Wearable System
Nhat Pham, Tuan Dinh, Taeho Kim, Zohreh Raghebi, Nam Bui, Hoang Truong,
A. Tuan Nguyen, Farnoush Banaei-Kashani, Ann Halbower, Thang N. Dinh, Vp Nguyen, and Tam Vu
Every year, the U.S. economy loses more than $411 billion because of work performance reduction, injuries, and traffic accidents caused by microsleep. To mitigate microsleep's consequences, a microsleep detection solution that is unobtrusive, reliable, and socially acceptable for use throughout the day, every day, is required. Unfortunately, existing solutions do not meet these requirements. In this paper, we propose WAKE, a novel behind-the-ear wearable device for microsleep detection. By monitoring biosignals from the brain, eye movements, facial muscle contractions, and sweat gland activities from behind the user's ears, WAKE can detect microsleep with a high temporal resolution. We introduce a Three-fold Cascaded Amplifying (3CA) technique to tame motion artifacts and environmental noise and capture high-fidelity signals. Through our prototyping, we show that WAKE can suppress motion and environmental noise in real time by 9.74-19.47 dB while the user is walking, driving, or staying in different environments, ensuring that the biosignals are captured reliably. We evaluated WAKE against gold-standard devices on 19 sleep-deprived and narcoleptic subjects. The leave-one-subject-out cross-validation results show the feasibility of WAKE for microsleep detection on an unseen subject, with an average precision and recall of 76% and 85%, respectively.