Dang Nguyen


Hi, I’m a CS Ph.D. student at UCLA under the supervision of Professor Baharan Mirzasoleiman. My research focuses on improving data quality to enhance the performance and efficiency of large (vision-)language models. Specifically, I work on synthetic data generation and data selection to optimize training, making these models more effective and accessible.

Before joining UCLA, I was an AI Resident at VinAI. Prior to that, I received my BS degree, summa cum laude, from Toyo University. Going further back in time, I graduated from the High School for Gifted Students (Hanoi University of Science) and was a Maths Olympian (silver medal, IMO 2015).

news

May 01, 2025 Our paper Synthetic Text Generation for Training Large Language Models via Gradient Matching was accepted to ICML 2025.
Jan 22, 2025 Our paper Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures was accepted to ICLR 2025.
Sep 25, 2024 Our paper Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization was accepted to NeurIPS 2024.
Sep 23, 2024 I joined Google Research as a Student Researcher.
Jun 17, 2024 I joined Cisco as a PhD research intern.