A. Tuan Nguyen | Publications

For the most up-to-date list, please refer to my Google Scholar.

2024

ECCV
uCAP: An Unsupervised Prompting Method for Vision-Language Models

A. Tuan Nguyen, Kai Sheng Tai, Sirius Chen, Satya Narayan Shukla, Hanchao Yu, Philip Torr, Taipeng Tian, and Ser-Nam Lim

European Conference on Computer Vision (Oral), 2024

This paper addresses a significant limitation that prevents Contrastive Language-Image Pretrained Models (CLIP) from achieving optimal performance on downstream image classification tasks. The key problem with CLIP-style zero-shot classification is that it requires domain-specific context in the form of prompts to better align the class descriptions to the downstream data distribution. In particular, prompts for vision-language models are domain-level texts (e.g., “a centered satellite image of ...”) which, together with the class names, are fed into the text encoder to provide more context for the downstream dataset. These prompts are typically manually tuned, which is time-consuming and often sub-optimal. To overcome this bottleneck, this paper proposes uCAP, a method to automatically learn domain-specific prompts/contexts using only unlabeled in-domain images. We achieve this by modeling the generation of images given the class names and a domain-specific prompt with an unsupervised likelihood distribution, and then performing inference of the prompts. We validate the proposed method across various models and datasets, showing that uCAP consistently outperforms manually tuned prompts and related baselines on the evaluated datasets: ImageNet, CIFAR-10, CIFAR-100, OxfordPets (up to 2%), SUN397 (up to 5%), and Caltech101 (up to 3%).
@article{nguyen2024ucap, abbr = {ECCV}, selected = {true}, html = {https://eccv.ecva.net/virtual/2024/poster/2005}, bibtex_show = {true}, pdf = {nguyen2024ucap.pdf}, title = {uCAP: An Unsupervised Prompting Method for Vision-Language Models}, author = {Nguyen, A. Tuan and Tai, Kai Sheng and Chen, Sirius and Shukla, Satya Narayan and Yu, Hanchao and Torr, Philip and Tian, Taipeng and Lim, Ser-Nam}, journal = {European Conference on Computer Vision (Oral)}, year = {2024} }

2023

CVPR
TIPI: Test Time Adaptation with Transformation Invariance

A. Tuan Nguyen, Thanh Nguyen-Tang, Ser-Nam Lim, and Philip Torr

IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

When deploying a machine learning model to a new environment, we often encounter the distribution shift problem, meaning the target data distribution is different from the model’s training distribution. In this paper, we assume that labels are not provided for this new domain, and that we do not store the source data (e.g., for privacy reasons). It has been shown that even small shifts in the data distribution can affect the model’s performance severely. Test Time Adaptation offers a means to combat this problem, as it allows the model to adapt during test time to the new data distribution, using only unlabeled test data batches. To achieve this, the predominant approach is to optimize a surrogate loss on the test-time unlabeled target data. In particular, minimizing the prediction’s entropy on target samples (Wang et al., 2020) has received much interest as it is task-agnostic and does not require altering the model’s training phase (e.g., does not require adding a self-supervised task during training on the source domain). However, as the target data’s batch size is often small in real-world scenarios (e.g., autonomous driving models process only a few frames at a time in real time), we argue that this surrogate loss is not optimal since it often collapses with small batch sizes. To tackle this problem, we propose to use an invariance regularizer as the surrogate loss during test-time adaptation, motivated by our theoretical results regarding the model’s performance under input transformations. The resulting method (TIPI: Test tIme adaPtation with transformation Invariance) is validated with extensive experiments on various benchmarks (CIFAR-10-C, CIFAR-100-C, ImageNet-C, DIGITS, and VisDA17). Remarkably, TIPI is robust against small batch sizes (as small as 2 in our experiments), and consistently outperforms TENT (Wang et al., 2020) in all settings.
@article{nguyen2023tipi, abbr = {CVPR}, selected = {true}, html = {https://openaccess.thecvf.com/content/CVPR2023/html/Nguyen_TIPI_Test_Time_Adaptation_With_Transformation_Invariance_CVPR_2023_paper.html}, bibtex_show = {true}, pdf = {CVPR_nguyen2023tipi.pdf}, title = {TIPI: Test Time Adaptation with Transformation Invariance}, author = {Nguyen, A. Tuan and Nguyen-Tang, Thanh and Lim, Ser-Nam and Torr, Philip}, journal = {IEEE/CVF Conference on Computer Vision and Pattern Recognition}, year = {2023} }
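
The TIPI abstract above refers to minimizing the prediction's entropy on unlabeled target samples as the predominant surrogate loss. As a rough illustration of that baseline quantity only (not of TIPI's invariance regularizer, and not the authors' code), the following minimal sketch computes the mean softmax-prediction entropy over a batch of logits. The function names are hypothetical and chosen for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits):
    """Shannon entropy (in nats) of the softmax prediction for one sample."""
    probs = softmax(logits)
    # Skip zero-probability classes, whose contribution is 0 in the limit.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def batch_entropy_loss(batch_logits):
    """Mean prediction entropy over a batch of unlabeled samples.

    This is the kind of surrogate loss an entropy-minimization method
    would reduce at test time; no labels are needed to compute it.
    """
    return sum(prediction_entropy(l) for l in batch_logits) / len(batch_logits)
```

A uniform prediction over four classes gives the maximum entropy `log(4)`, while a confident prediction gives a value close to zero, which is why driving this loss down pushes the model toward confident outputs.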

2022

NeurIPS
FedSR: A Simple and Effective Domain Generalization Method for Federated Learning

A. Tuan Nguyen, Philip Torr, and Ser-Nam Lim

Advances in Neural Information Processing Systems, 2022

Federated Learning (FL) refers to the decentralized and privacy-preserving machine learning framework in which multiple clients collaborate (with the help of a central server) to train a global model without sharing their data. However, most existing FL methods only focus on maximizing the model’s performance on the source clients’ data (e.g., mobile users) without considering its generalization ability to unknown target data (e.g., a new user). In this paper, we incorporate the problem of Domain Generalization (DG) into Federated Learning to tackle the aforementioned issue. However, virtually all existing DG methods require a centralized setting where data is shared across the domains, which violates the principles of decentralized FL, and are hence not applicable. To this end, we propose a simple yet novel representation learning framework, namely FedSR, which enables domain generalization while still respecting the decentralized and privacy-preserving nature of this FL setting. Motivated by classical machine learning algorithms, we aim to learn a simple representation of the data for better generalization. In particular, we enforce an L2-norm regularizer on the representation and a conditional mutual information (between the representation and the data given the label) regularizer to encourage the model to only learn essential information (while ignoring spurious correlations such as the background). Furthermore, we provide theoretical connections between the above two objectives and representation alignment in domain generalization. Extensive experimental results suggest that our method significantly outperforms relevant baselines in this particular problem.
@article{nguyen2022fedsr, abbr = {NeurIPS}, pdf = {NeurIPS_nguyen2022fedsr.pdf}, selected = {true}, html = {https://openreview.net/forum?id=mrt90D00aQX}, bibtex_show = {true}, title = {FedSR: A Simple and Effective Domain Generalization Method for Federated Learning}, author = {Nguyen, A. Tuan and Torr, Philip and Lim, Ser-Nam}, journal = {Advances in Neural Information Processing Systems}, year = {2022} }
ICLR
KL Guided Domain Adaptation

A. Tuan Nguyen, Toan Tran, Yarin Gal, Philip H. S. Torr, and Atılım Güneş Baydin

International Conference on Learning Representations, 2022

Domain adaptation is an important problem and is often needed for real-world applications. In this problem, instead of i.i.d. datapoints, we assume that the source (training) data and the target (testing) data have different distributions. With that setting, the empirical risk minimization training procedure often does not perform well, since it does not account for the change in the distribution. A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain. However, these approaches often require additional networks and/or optimizing an adversarial (minimax) objective, which can be very expensive or unstable in practice. To tackle this problem, we first derive a generalization bound for the target loss based on the training loss and the reverse Kullback–Leibler (KL) divergence between the source and the target representation distributions. Based on this bound, we derive an algorithm that minimizes the KL term to obtain a better generalization to the target domain. We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples without any additional network or a minimax objective. This leads to a theoretically sound alignment method which is also very efficient and stable in practice. Experimental results also suggest that our method outperforms other representation-alignment approaches.
@article{nguyen2022kl, abbr = {ICLR}, bibtex_show = {true}, selected = {true}, pdf = {ICLR_nguyen2022kl.pdf}, html = {https://openreview.net/forum?id=0JzqUlIVVDd}, title = {KL Guided Domain Adaptation}, author = {Nguyen, A. Tuan and Tran, Toan and Gal, Yarin and Torr, Philip H. S. and Baydin, At{\i}l{\i}m G{\"u}ne{\c{s}}}, journal = {International Conference on Learning Representations}, year = {2022} }
ICLR
Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization

Thanh Nguyen-Tang, Sunil Gupta, A. Tuan Nguyen, and Svetha Venkatesh

International Conference on Learning Representations, 2022

Offline policy learning (OPL) leverages existing data collected a priori for policy optimization without any active exploration. Despite the prevalence of and recent interest in this problem, its theoretical and algorithmic foundations in function approximation settings remain under-developed. In this paper, we consider this problem on the axes of distributional shift, optimization, and generalization in offline contextual bandits with neural networks. In particular, we propose a provably efficient offline contextual bandit with neural network function approximation that does not require any functional assumption on the reward. We show that our method provably generalizes over unseen contexts under a milder condition for distributional shift than existing OPL works. Notably, unlike any other OPL method, our method learns from the offline data in an online manner using stochastic gradient descent, allowing us to bring the benefits of online learning to an offline setting. Moreover, we show that our method is more computationally efficient and has a better dependence on the effective dimension of the neural network than an online counterpart. Finally, we demonstrate the empirical effectiveness of our method in a range of synthetic and real-world OPL problems.
@article{nguyentang2022offline, abbr = {ICLR}, pdf = {ICLR_nguyentang2022offline.pdf}, bibtex_show = {true}, html = {https://openreview.net/forum?id=sPIFuucA3F}, title = {Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization}, author = {Nguyen-Tang, Thanh and Gupta, Sunil and Nguyen, A. Tuan and Venkatesh, Svetha}, journal = {International Conference on Learning Representations}, year = {2022} }
ICML
Set Based Stochastic Subsampling

Bruno Andreis, Seanie Lee, A. Tuan Nguyen, Juho Lee, Eunho Yang, and Sung Ju Hwang

International Conference on Machine Learning, 2022

Deep models are designed to operate on huge volumes of high-dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g., a classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables by capturing coarse-grained global information using set encoding functions. In the second stage, this is followed by conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables by modeling pair-wise interactions using set attention networks. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification. Additionally, for nonparametric models such as Neural Processes that need to leverage the whole training data at inference time, we show that our method enhances the scalability of these models.
@article{andreis2022set, abbr = {ICML}, bibtex_show = {true}, pdf = {ICML_andreis2022set.pdf}, html = {https://proceedings.mlr.press/v162/andreis22a}, title = {Set Based Stochastic Subsampling}, author = {Andreis, Bruno and Lee, Seanie and Nguyen, A. Tuan and Lee, Juho and Yang, Eunho and Hwang, Sung Ju}, journal = {International Conference on Machine Learning}, year = {2022} }
preprint
Task-Agnostic Robust Representation Learning

A. Tuan Nguyen, Ser Nam Lim, and Philip Torr

Under Review, 2022

It has been reported that deep learning models are extremely vulnerable to small but intentionally chosen perturbations of their inputs. In particular, a deep network, despite its near-optimal accuracy on clean images, often misclassifies an image with a worst-case but humanly imperceptible perturbation (a so-called adversarial example). To tackle this problem, a great amount of research has been done to study the training procedure of a network to improve its robustness. However, most of the research so far has focused on the case of supervised learning. With the increasing popularity of self-supervised learning methods, it is also important to study and improve the robustness of their resulting representation on the downstream tasks. In this paper, we study the problem of robust representation learning with unlabeled data in a task-agnostic manner. Specifically, we first derive an upper bound on the adversarial loss of a prediction model (which is based on the learned representation) on any downstream task, using its loss on the clean data and a robustness regularizer. Moreover, the regularizer is task-independent, thus we propose to minimize it directly during the representation learning phase to make the downstream prediction model more robust. Extensive experiments show that our method achieves better adversarial performance than relevant baselines.
@article{nguyen2022task, abbr = {preprint}, pdf = {preprint_nguyen2022task.pdf}, html = {https://arxiv.org/abs/2203.07596}, bibtex_show = {true}, title = {Task-Agnostic Robust Representation Learning}, author = {Nguyen, A. Tuan and Lim, Ser Nam and Torr, Philip}, journal = {Under Review}, year = {2022} }

2021

NeurIPS
Domain Invariant Representation Learning with Domain Density Transformations

A. Tuan Nguyen, Toan Tran, Yarin Gal, and Atılım Güneş Baydin

Advances in Neural Information Processing Systems, 2021

Domain generalization refers to the problem where we aim to train a model on data from a set of source domains so that the model can generalize to unseen target domains. Naively training a model on the aggregate set of data (pooled from all source domains) has been shown to perform suboptimally, since the information learned by that model might be domain-specific and generalize imperfectly to target domains. To tackle this problem, a predominant domain generalization approach is to learn some domain-invariant information for the prediction task, aiming at a good generalization across domains. In this paper, we propose a theoretically grounded method to learn a domain-invariant representation by enforcing the representation network to be invariant under all transformation functions among domains. We next introduce the use of generative adversarial networks to learn such domain transformations in a possible implementation of our method in practice. We demonstrate the effectiveness of our method on several widely used datasets for the domain generalization problem, on all of which we achieve competitive results with state-of-the-art models.
@article{nguyen2021domain, abbr = {NeurIPS}, bibtex_show = {true}, selected = {true}, pdf = {NeurIPS_nguyen2021domain.pdf}, html = {https://papers.nips.cc/paper/2021/hash/2a2717956118b4d223ceca17ce3865e2-Abstract.html}, title = {Domain Invariant Representation Learning with Domain Density Transformations}, author = {Nguyen, A. Tuan and Tran, Toan and Gal, Yarin and Baydin, At{\i}l{\i}m G{\"u}ne{\c{s}}}, journal = {Advances in Neural Information Processing Systems}, year = {2021} }
AAAI
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Learning

A. Tuan Nguyen, Hyewon Jeong, Eunho Yang, and Sung Ju Hwang

Proceedings of the AAAI Conference on Artificial Intelligence, 2021

Although recent multi-task learning methods have been shown to be effective in improving the generalization of deep neural networks, they should be used with caution for safety-critical applications, such as clinical risk prediction. This is because even if they achieve improved task-average performance, they may still yield degraded performance on individual tasks, which may be critical (e.g., prediction of mortality risk). Existing asymmetric multi-task learning methods tackle this negative transfer problem by performing knowledge transfer from tasks with low loss to tasks with high loss. However, using loss as a measure of reliability is risky since low loss could result from overfitting. In the case of time-series prediction tasks, knowledge learned for one task (e.g., predicting the sepsis onset) at a specific timestep may be useful for learning another task (e.g., prediction of mortality) at a later timestep, but the lack of a loss at each timestep makes it difficult to measure the reliability at each timestep. To capture such dynamically changing asymmetric relationships between tasks in time-series data, we propose a novel temporal asymmetric multi-task learning model that performs knowledge transfer from certain tasks/timesteps to relevant uncertain tasks, based on the feature-level uncertainty. We validate our model on multiple clinical risk prediction tasks against various deep learning models for time-series prediction, which our model significantly outperforms without any sign of negative transfer. Further qualitative analysis of learned knowledge graphs by clinicians shows that they are helpful in analyzing the predictions of the model.
@article{nguyen2021clinical, abbr = {AAAI}, bibtex_show = {true}, selected = {true}, pdf = {AAAI_nguyen2021clinical.pdf}, html = {https://ojs.aaai.org/index.php/AAAI/article/view/17097}, title = {Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Learning}, volume = {35}, url = {https://ojs.aaai.org/index.php/AAAI/article/view/17097}, number = {10}, journal = {Proceedings of the AAAI Conference on Artificial Intelligence}, author = {Nguyen, A. Tuan and Jeong, Hyewon and Yang, Eunho and Hwang, Sung Ju}, year = {2021}, pages = {9081--9091} }
IEEE TMC
Detection of Microsleep Events with a Behind-the-ear Wearable System

Nhat Pham, Tuan Dinh, Taeho Kim, Zohreh Raghebi, Nam Bui, Hoang Truong, A. Tuan Nguyen, Farnoush Banaei-Kashani, Ann Halbower, Thang N. Dinh, Vp Nguyen, and Tam Vu

IEEE Transactions on Mobile Computing, 2021

Every year, the U.S. economy loses more than $411 billion because of work performance reduction, injuries, and traffic accidents caused by microsleep. To mitigate the consequences of microsleep, an unobtrusive, reliable, and socially acceptable microsleep detection solution that works throughout the day, every day, is required. Unfortunately, existing solutions do not meet these requirements. In this paper, we propose WAKE, a novel behind-the-ear wearable device for microsleep detection. By monitoring biosignals from the brain, eye movements, facial muscle contractions, and sweat gland activities from behind the user’s ears, WAKE can detect microsleep with a high temporal resolution. We introduce a Three-fold Cascaded Amplifying (3CA) technique to tame motion artifacts and environmental noise in order to capture high-fidelity signals. Through our prototyping, we show that WAKE can suppress motion and environmental noise in real time by 9.74-19.47 dB while walking, driving, or staying in different environments, ensuring that the biosignals are captured reliably. We evaluated WAKE against gold-standard devices on 19 sleep-deprived and narcoleptic subjects. The Leave-One-Subject-Out Cross-Validation results show the feasibility of WAKE in microsleep detection on an unseen subject with average precision and recall of 76% and 85%, respectively.
@article{9462324, abbr = {IEEE TMC}, pdf = {IEEETMC_pham2021detection.pdf}, bibtex_show = {true}, html = {https://ieeexplore.ieee.org/abstract/document/9462324}, author = {Pham, Nhat and Dinh, Tuan and Kim, Taeho and Raghebi, Zohreh and Bui, Nam and Truong, Hoang and Nguyen, A. Tuan and Banaei-Kashani, Farnoush and Halbower, Ann and Dinh, Thang N. and Nguyen, Vp and Vu, Tam}, journal = {IEEE Transactions on Mobile Computing}, title = {Detection of Microsleep Events with a Behind-the-ear Wearable System}, year = {2021}, volume = {}, number = {}, pages = {1-1}, doi = {10.1109/TMC.2021.3090829} }
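
The WAKE abstract above reports precision and recall under Leave-One-Subject-Out Cross-Validation. As a generic illustration of that evaluation protocol (not the authors' pipeline), the minimal sketch below builds one fold per subject, holding out all of that subject's samples for testing, and computes precision and recall for binary labels. The function names and the toy subject IDs are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids[i] is the subject that sample i belongs to. Every sample
    from the held-out subject goes to the test split, so the model is
    always evaluated on a subject it has never seen.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy usage: five samples from three hypothetical subjects.
folds = list(leave_one_subject_out(["a", "a", "b", "c", "c"]))
```

In practice the per-fold precision and recall would be averaged across subjects to give summary figures of the kind quoted in the abstract.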