Title: How to Train Your Vicuna
– Finetuning and Serving LLMs in the Wild
Abstract: While deep learning achieves great success in many applications, there is still a lack of theoretical understanding. In this talk, I will present our recent work on the theory of the representation power, optimization, and generalization of deep learning. I will first show that deep neural networks with bounded width are universal approximators. Then I will turn to the training of deep neural networks. Conventional wisdom holds that training deep nets is a highly nonconvex optimization problem; empirically, however, one can often find global minima simply by running gradient descent. I will show that if the network is sufficiently wide, then starting from a random initialization, gradient descent provably finds a global optimum at a linear convergence rate. Finally, I will discuss why overparameterized deep neural networks can still generalize well.
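Here, "linear convergence rate" means the training loss contracts geometrically with each gradient-descent step. A minimal sketch of the standard form such a guarantee takes is given below; the step size \eta and rate constant \lambda_0 are illustrative assumptions, not the exact quantities from the talk:

\[ L(\theta_{t+1}) \;\le\; \Bigl(1 - \tfrac{\eta\lambda_0}{2}\Bigr) L(\theta_t) \quad\Longrightarrow\quad L(\theta_t) \;\le\; \Bigl(1 - \tfrac{\eta\lambda_0}{2}\Bigr)^{t} L(\theta_0). \]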
Bio: Hao Zhang is an Assistant Professor at the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering at UCSD. His research interests lie at the intersection of machine learning and systems, focusing on improving the performance and ease of use of today's distributed ML systems. Recently, Hao has been working actively on democratizing access to large language models (LLMs). Hao has created several popular open-source LLM projects, such as Alpa, Vicuna, and FastChat. Hao's research has been recognized with an NVIDIA Pioneer Research Award at NeurIPS'17 and the Jay Lepreau Best Paper Award at OSDI'21. Hao's previous open-source artifacts in ML systems have been used by organizations such as AI2, Meta, and Google. Parts of Hao's research have been commercialized at multiple start-ups, including Petuum and Anyscale.