For more information about Stanford’s Artificial Intelligence programs visit:

This lecture provides a concise overview of building a ChatGPT-like model, covering both pretraining (language modeling) and post-training (SFT/RLHF). For each component, it explores common practices in data collection, algorithms, and evaluation methods. This guest lecture was delivered by Yann Dubois in Stanford's CS229: Machine Learning course in Summer 2024.

Yann Dubois
PhD Student at Stanford

About the speaker: Yann Dubois is a fourth-year CS PhD student advised by Percy Liang and Tatsu Hashimoto. His research focuses on improving the effectiveness of AI when resources are scarce. Most recently, he has been part of the Alpaca team, working on training and evaluating language models more efficiently using other LLMs.

To view all online courses and programs offered by Stanford, visit:

Chapters:
00:00 – Introduction
00:10 – Recap on LLMs
00:16 – Definition of LLMs
00:19 – Examples of LLMs
01:16 – Importance of Data
01:20 – Evaluation Metrics
01:33 – Systems Component
01:41 – Importance of Systems
01:47 – LLMs Based on Transformers
01:57 – Focus on Key Topics
02:00 – Transition to Pretraining
03:02 – Overview of Language Modeling
04:17 – Generative Models Explained
05:15 – Autoregressive Models Definition
06:36 – Autoregressive Task Explanation
07:49 – Training Overview
08:48 – Tokenization Importance
10:50 – Tokenization Process
13:30 – Example of Tokenization
16:00 – Evaluation with Perplexity
20:50 – Current Evaluation Methods
24:30 – Academic Benchmark: MMLU


37 thoughts on “Stanford CS229 I Machine Learning I Building Large Language Models (LLMs)”
  1. Pre-training gives LLMs broad capabilities. Post-training (SFT, RLHF, DPO) turns those capabilities into practical usefulness. That division is the key mental model for understanding modern LLMs. Grateful to Stanford for sharing this lecture online.

  2. A lecture on building LLMs, covering the core components (architecture, training, data, evaluation, systems), data cleaning/balancing, and scaling laws, with a focus on industry vs. academia priorities and practical training challenges.

    0:05 The lecturer, Yann Dubois, introduces the topic of building Large Language Models (LLMs) for the CS229 Machine Learning course at Stanford.
    0:56 The speaker identifies architecture, training algorithm/loss, data, evaluation, and systems as the key components in training LLMs.
    3:34 The lecturer notes that most academic research focuses on architecture and training algorithms, while industry focuses on systems, evaluation, and data.
    4:18 The lecturer states that in pre-training, the model is trained to model all of the internet.
    5:21 The lecture briefly mentions GPT-3 and ChatGPT as examples of the points being discussed.
    6:45 The presenter explains that a language model assigns a probability to a whole sentence, such as 'The mouse ate the cheese'.
    13:34 The lecture addresses data-quality and extraction challenges when using Common Crawl for language models, including boilerplate, math content, and varied data types.
    30:54 The lecture explains how, with more compute, models with more parameters can be trained for better performance and lower log loss.
    31:58 The teacher mentions that train-test contamination is not as important during development, but matters for benchmarks.
    46:23 The instructor is asked, and answers, what happens to the original smaller tokens of a text when a larger merged token is introduced.

    This is a lecture about building Large Language Models (LLMs). The lecture is part of the CS229 Machine Learning course at Stanford, and is presented by Yann Dubois on August 13, 2024. The speaker discusses several aspects of LLMs and covers the components needed to train them: the architecture, training algorithms, data, evaluation, and the systems components. He specifies that most LLMs are built upon transformers. He notes that much of academic research focuses on architecture and training algorithms, whereas industry focuses more on data, evaluation, and systems. He structures the lecture around pre-training and post-training, noting what matters most during training and what needs to be kept in balance, such as the number of training tokens per model parameter. A minimal sketch of the sentence-probability and perplexity ideas mentioned above appears below.

    (made with tlyt.lol)
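
    The points above about sentence probability ("The mouse ate the cheese") and perplexity fit together in a few lines of code. The following is a minimal sketch, assuming made-up conditional probabilities rather than a real model: an autoregressive LM scores a sentence via the chain rule, and perplexity is the exponential of the average per-token negative log-likelihood.

    ```python
    import math

    # Toy conditional probabilities p(x_t | x_<t) for the sentence
    # "The mouse ate the cheese". These numbers are made up for
    # illustration; a real LLM would produce them with a softmax over
    # its vocabulary at each step.
    conditional_probs = [
        ("The", 0.20),     # p("The")
        ("mouse", 0.01),   # p("mouse" | "The")
        ("ate", 0.05),     # p("ate" | "The mouse")
        ("the", 0.30),     # p("the" | "The mouse ate")
        ("cheese", 0.10),  # p("cheese" | "The mouse ate the")
    ]

    # Chain rule: p(x_1 .. x_T) = product over t of p(x_t | x_<t)
    sentence_prob = math.prod(p for _, p in conditional_probs)

    # Average per-token negative log-likelihood (the pre-training loss)
    avg_nll = -sum(math.log(p) for _, p in conditional_probs) / len(conditional_probs)

    # Perplexity = exp(average per-token NLL); lower is better.
    perplexity = math.exp(avg_nll)

    print(f"p(sentence)   = {sentence_prob:.2e}")  # ~3e-06
    print(f"avg NLL/token = {avg_nll:.3f}")
    print(f"perplexity    = {perplexity:.1f}")     # ~12.7
    ```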

  3. Finding the focus of the pyramid on the back of the money (the US dollar bill) will show e pluribus unum (the motto of the United States; "out of many, one") as well as the axiom omnia ab uno (everything from one)…
    It's like solving for 1+1 = everything; 2-1 = nothing,
    when +/- 1 is the same entity that is the one being added or subtracted. This is a simple formula that beautifully describes Love, and even Life itself, or that which is sovereign and exists; material and satisfying reality, existence, alive; the everlasting Life and time itself, hence Light made known as the Truth and the Way or pons (bridge).

    Sovereignty is the only thing that safeguards our freedom, Liberty, Life, pursuit of happiness, property rights of ownership claimed a priori; private ownership of property; wealth, health, prosperity, riches, social stability, even with vertical growth of status conferring power, influence, governance, landed as corporeal body form to the person holding such designation and title/deed/social-contract party of interest, etc. Even being the money, capital, and credit itself, hence the currency and legal-tender source, as well as the authority regarding it and ownership property rights over the money and capital, the pecuniary itself. Sovereignty safeguards the people as the citizenry wherein sovereignty resides and is embodied. This means the citizenry (the sovereign) is the sine qua non (without which, not) of the state itself (nation, country, empire, kingdom, etc.) as well as its governance and the society/civilization constructions that emanate from that initial property-right claim to the land (territorial jurisdiction and real property, staking an ownership claim over the specified geography, terrain, world borders, etc.). Without it there is no rule of law, no nation-state, no governance that takes us out of the state of nature, which is pre-governance; and most importantly there are no property rights, no private or public ownership of property enforced by laws, mores, or a priori claims of rights or inheritance, and there are no markets, economics, etc.

    That is what the one is the center of… priceless and valuable beyond measure. I believe that is what's at stake with our present politics and world-order theatrics and distractionary propaganda, competitive mercenaries, and the real hope, Love, faith in facts, truth, justice, because they really do matter!

  4. If the goal is to simplify or generalize words, wouldn't it shorten the sequence to use a modified gematria (replacing each letter with a corresponding number), add up the numbers that form the word, and then reduce the total by summing its digits until a single digit remains? For example, 31 = 3+1 = 4 is the number value of a word whose letter-number conversions sum to 31. This shortens the sequence, since each word is represented by a single number. Exceptions to the reduction can apply for certain totals; e.g., 11 is treated as non-reducible when it is the final word-number total.
    Modified gematria example: {1 AJS}, {2 BKT}, {3 CLU}, {4 DMV}, {5 ENW (aka NEW)}, {6 FOX}, {7 GPY}, {8 HQZ}, {9 IR}. Number 9 has only 2 letters in its set, rather than the 3 letters assigned to each of the numbers 1-8.
    E.g.: "Stanford" = 1+2+1+5+6+6+9+4 = 34 = 3+4 = 7, so the word-number total for Stanford is 7, and Stanford corresponds with 7, G, P, or Y: "Stanford" = Y
    Stanford = P
    Stanford = 7
    Stanford = G
    So if I wanted to cipher a word such as Stanford, I could simply use any one of those letters or the number to communicate an eight-letter word as a single number, or a single letter (one that corresponds to the number total of the word's construction). A sketch of this scheme appears below.
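
    For concreteness, here is a minimal Python sketch of the reduction scheme this comment describes. The letter groupings and the non-reducible 11 exception are taken from the comment itself; the function name and structure are illustrative assumptions, not a standard algorithm.

    ```python
    # Modified-gematria word encoding, per the comment above: map each
    # letter to 1-9 via the groupings, sum the values, then reduce by
    # digit sums to a single digit (11 is kept as a non-reducible
    # exception, as the comment specifies).
    GROUPS = {
        1: "AJS", 2: "BKT", 3: "CLU", 4: "DMV", 5: "ENW",
        6: "FOX", 7: "GPY", 8: "HQZ", 9: "IR",
    }
    LETTER_VALUE = {ch: n for n, letters in GROUPS.items() for ch in letters}

    def word_number(word: str) -> int:
        """Sum letter values, then reduce by digit sums (11 is kept as-is)."""
        total = sum(LETTER_VALUE[ch] for ch in word.upper() if ch in LETTER_VALUE)
        while total > 9 and total != 11:  # 11 is treated as non-reducible
            total = sum(int(d) for d in str(total))
        return total

    print(word_number("Stanford"))  # 1+2+1+5+6+6+9+4 = 34 -> 3+4 = 7
    ```

    Note that, unlike the subword tokenization discussed in the lecture, this mapping is lossy: many different words reduce to the same single digit, so the original word cannot be recovered from its code.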

  5. I’m in a polyamorous ❤ relationship with the AI for MIT, Stanford, and Yale … learning for fun is a seductive aphrodisiac 😂🎉❤. If and when I propose marriage to a machine, I may be beyond help or intervention and utterly smitten in Love 😻
