Biological Science Center, 2 Cummington Mall, Room 107

Training large language models (LLMs) requires a large neural network, a large dataset, and a large amount of compute. We will discuss the difficulties each of these requirements poses. We’ll then look at the Transformer architecture in detail to develop a quantitative understanding of how it works and, specifically, how tools like ChatGPT, DeepSeek, and Llama work. Finally, we will use a pre-trained SentenceTransformer model to perform a range of classification tasks on real-world data.
