Dang Nguyen


Hi, I’m a CS Ph.D. student at UCLA under the supervision of Professor Baharan Mirzasoleiman. My research focuses on improving data quality to enhance the performance and efficiency of large (vision-)language models. Specifically, I work on synthetic data generation and data selection to optimize training, making these models more effective and accessible.

Before joining UCLA, I was an AI Resident at VinAI. Prior to that, I received my BS degree, summa cum laude, from Toyo University. Going further back in time, I graduated from the High School for Gifted Students (Hanoi University of Science) and was a Maths Olympian (silver medal, IMO 2015).

news

May 01, 2025 Our paper Synthetic Text Generation for Training Large Language Models via Gradient Matching was accepted to ICML 2025.
Jan 22, 2025 Our paper Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures was accepted to ICLR 2025.
Sep 25, 2024 Our paper Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization was accepted to NeurIPS 2024.
Sep 23, 2024 I joined Google Research as a Student Researcher.
Jun 17, 2024 I joined Cisco as a PhD research intern.