Research Interests: Foundation Models | Efficient Pre-training | Efficient Inference | Knowledge Distillation
I am a PhD student at the University of Texas at Austin, advised by Prof. Sujay Sanghavi in the Department of Electrical and Computer Engineering. For my research, I work on simple things, and simple things work for me. I am currently working on efficient training strategies for large models (mostly LLMs). Some of my recent work has been featured in Ahead of AI magazine, Marktechpost, and the Interconnects newsletter.
Before moving to Austin, I graduated with an M.Eng. degree in Information and Communication Engineering from Chongqing University of Posts and Telecommunications, Chongqing, China, in 2019, and received a B.Tech. degree in Electronics and Communication Engineering from Maulana Abul Kalam Azad University of Technology (formerly West Bengal University of Technology), Kolkata, India. During my undergrad, I gloriously failed to scale up my startup, Tronix India, and later worked at TechMahindra, an Indian multinational IT firm.
I am a person who stutters; you can find some info about stuttering here.
I am currently on the job market, seeking both industry and postdoctoral positions.
Student Researcher on the Gemini team at Google DeepMind (May-Aug 2025) | Topic: Recurrence in Language Models.
Research Intern at Lightning AI (May-Aug 2024) | Topic: Efficient Fine-tuning and Continual Training of LLMs.
Applied Science Intern at Amazon Science, Alexa (May-Aug 2022) | Topic: Vision-Language Pre-training and Fine-tuning.
[ICML'25 Spotlight🏆] Sunny Sanyal, Hayden Prairie, Rudrajit Das, Ali Kavis, and Sujay Sanghavi, “Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting”. [paper] [code] [tweet]
This work has been selected as a Spotlight Poster at ICML 2025, placing it among the top 2.6% of all 12,107 submissions.
[COLM'24] Sunny Sanyal, Atula Tejaswi, Jean Kaddour, Abhishek Kumar, and Sujay Sanghavi, “Early Weight Averaging Meets High Learning Rates for LLM Pre-training”. [paper] [code]
Also presented at the NeurIPS 2023 WANT workshop.
This paper was also featured in three popular newsletters. [media] [media] [media]
[NeurIPS'24 Datasets and Benchmarks Track] Jeffrey Li, Alex Fang, ... Sunny Sanyal, et al., “DataComp-LM: In search of the next generation of language model training sets”. [paper] [tweet] [code] Our work inspired Apple's DCLM-7B model (here).
I have also co-authored some papers in the field of wireless networks. You can find them on my Google Scholar.
Gave a talk at Prof. Tom Goldstein's Group at UMD on Inheritune: Training Smaller Yet More Attentive Language Models. [slides]
Gave a talk at Lightning AI's NYC office on Training Smaller and More Efficient Language Models. [slides]
Gave a talk at ML Collective on Pre-training with a Little Less Data and Compute. [slides] [recordings]
Demo at the CVPR 2023 Art Gallery on Generative Masking and In-painting for Videos. [demo]
Poster at the 6G@UT Symposium 2023 on Understanding the Effectiveness of Early Weight Averaging for Training LLMs.
Gave an in-person talk at the Austin Deep Learning community's main event on Do Neural Networks Overthink? [link]
Organized the Broadening Research Collaborations in ML workshop at NeurIPS 2022, New Orleans, USA.
Member of the Diversity, Equity, and Inclusivity Student Board, Cockrell School of Engineering, UT Austin.
Reviewer: NeurIPS 2024, 2025 || ICML 2025 || COLM 2024
Technical Program Committee: EAI Fabulous 2019, Bulgaria.
Program Committee: ACM/SIGAPP SAC 2019, Cyprus.
Publicity Co-Chair: EAI UBICNET 2019, India.
Reviewer: IEEE Access || IEEE VTC Fall 2019, USA || IEEE ICC 2019, China || IEEE ICCCN 2019, Spain || EAI INTERSOL 2019, Egypt.
Moving to SF in summer 2025 to join the Gemini team at Google DeepMind.
LAWA accepted at COLM 2024.
Moving to NYC this summer.
May 2024: Gave a talk at ML Collective's Deep Learning: Classics and Trends.
Our work was featured in two newsletters.
Watch my CVPR 2023 demo at the Art Gallery booth in the demo hall area.