For more information about Stanford’s Artificial Intelligence programs visit:

This lecture provides a concise overview of building a ChatGPT-like model, covering both pretraining (language modeling) and post-training (SFT/RLHF). For each component, it explores common practices in data collection, algorithms, and evaluation methods. This guest lecture was delivered by Yann Dubois in Stanford's CS229: Machine Learning course in Summer 2024.

Yann Dubois
PhD Student at Stanford

About the speaker: Yann Dubois is a fourth-year CS PhD student advised by Percy Liang and Tatsu Hashimoto. His research focuses on improving the effectiveness of AI when resources are scarce. Most recently, he has been part of the Alpaca team, working on training and evaluating language models more efficiently using other LLMs.

To view all online courses and programs offered by Stanford, visit:

Chapters:
00:00 – Introduction
00:10 – Recap on LLMs
00:16 – Definition of LLMs
00:19 – Examples of LLMs
01:16 – Importance of Data
01:20 – Evaluation Metrics
01:33 – Systems Component
01:41 – Importance of Systems
01:47 – LLMs Based on Transformers
01:57 – Focus on Key Topics
02:00 – Transition to Pretraining
03:02 – Overview of Language Modeling
04:17 – Generative Models Explained
05:15 – Autoregressive Models Definition
06:36 – Autoregressive Task Explanation
07:49 – Training Overview
08:48 – Tokenization Importance
10:50 – Tokenization Process
13:30 – Example of Tokenization
16:00 – Evaluation with Perplexity
20:50 – Current Evaluation Methods
24:30 – Academic Benchmark: MMLU


37 thoughts on “Stanford CS229 I Machine Learning I Building Large Language Models (LLMs)”
  1. Pre-training gives LLMs broad capabilities. Post-training (SFT, RLHF, DPO) turns those capabilities into practical usefulness. That division is the key mental model for understanding modern LLMs. Grateful to Stanford for sharing this lecture online.

  2. A lecture on building LLMs, covering the core components (architecture, training, data, evaluation, systems), data cleaning/balancing, and scaling laws, with a focus on industry vs. academia priorities and practical training challenges.

    0:05 The lecturer, Yann Dubois, introduces the topic of building Large Language Models (LLMs) for the CS229 Machine Learning course at Stanford.
    0:56 The speaker identifies architecture, training algorithm/loss, data, evaluation, and systems as the key components in training LLMs.
    3:34 The lecturer notes that most academic research focuses on architecture and training algorithms, while industry focuses on systems, evaluation, and data.
    4:18 The lecturer states that in pre-training, the model is trained to model all of the internet.
    5:21 The lecture briefly mentions GPT-3 and ChatGPT as examples of the points being discussed.
    6:45 The presenter explains that a language model assigns a probability to a whole sentence, such as 'The mouse ate the cheese'.
    13:34 The lecture addresses data-quality and extraction challenges when using Common Crawl for language models, including boilerplate, math content, and varied data types.
    30:54 The lecture explains how, with more compute, models with more parameters can be trained for better performance and lower log loss.
    31:58 The teacher mentions that train-test contamination is not as important during development, but matters for benchmarks.
    46:23 The instructor is asked, and answers, what happens to the original smaller tokens of a text when a larger merged token is introduced.

    This is a lecture about building Large Language Models (LLMs). The lecture is part of the CS229 Machine Learning course at Stanford, and is presented by Yann Dubois on August 13, 2024. The speaker discusses several aspects of LLMs and covers the components needed to train them: the architecture, training algorithms, data, evaluation, and the systems components. He specifies that most LLMs are built upon transformers. He notes that much of academic research focuses on architecture and training algorithms, whereas industry focuses more on data, evaluation, and systems. He structures the lecture around pre-training and post-training, noting what matters most during training and what needs to be kept in balance, such as the number of training tokens per model parameter. A minimal sketch of the sentence-probability and perplexity ideas mentioned above appears below.

    (made with tlyt.lol)
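
    The points above about sentence probability ("The mouse ate the cheese") and perplexity fit together in a few lines of code. The following is a minimal sketch, assuming made-up conditional probabilities rather than a real model: an autoregressive LM scores a sentence via the chain rule, and perplexity is the exponential of the average per-token negative log-likelihood.

    ```python
    import math

    # Toy conditional probabilities p(x_t | x_<t) for the sentence
    # "The mouse ate the cheese". These numbers are made up for
    # illustration; a real LLM would produce them with a softmax over
    # its vocabulary at each step.
    conditional_probs = [
        ("The", 0.20),     # p("The")
        ("mouse", 0.01),   # p("mouse" | "The")
        ("ate", 0.05),     # p("ate" | "The mouse")
        ("the", 0.30),     # p("the" | "The mouse ate")
        ("cheese", 0.10),  # p("cheese" | "The mouse ate the")
    ]

    # Chain rule: p(x_1 .. x_T) = product over t of p(x_t | x_<t)
    sentence_prob = math.prod(p for _, p in conditional_probs)

    # Average per-token negative log-likelihood (the pre-training loss)
    avg_nll = -sum(math.log(p) for _, p in conditional_probs) / len(conditional_probs)

    # Perplexity = exp(average per-token NLL); lower is better.
    perplexity = math.exp(avg_nll)

    print(f"p(sentence)   = {sentence_prob:.2e}")  # ~3e-06
    print(f"avg NLL/token = {avg_nll:.3f}")
    print(f"perplexity    = {perplexity:.1f}")     # ~12.7
    ```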

  3. Finding the focus of the pyramid on the back of the money (the US dollar bill) will show e pluribus unum (the motto of the United States; "out of many, one") as well as the axiom omnia ab uno (everything from one)…
    It's like solving for 1+1 = everything; 2-1 = nothing,
    when +/- 1 is the same entity that is the one being added or subtracted. This is a simple formula that beautifully describes Love, and even Life itself, or that which is sovereign and exists; material and satisfying reality, existence, alive; the everlasting Life and time itself, hence Light made known as the Truth and the Way or pons (bridge).

    Sovereignty is the only thing that safeguards our freedom, Liberty, Life, pursuit of happiness, property rights of ownership claimed a priori; private ownership of property; wealth, health, prosperity, riches, social stability, even with vertical growth of status conferring power, influence, governance, landed as corporeal body form to the person holding such designation and title/deed/social-contract party of interest, etc. Even being the money, capital, and credit itself, hence the currency and legal-tender source, as well as the authority regarding it and ownership property rights over the money and capital, the pecuniary itself. Sovereignty safeguards the people as the citizenry wherein sovereignty resides and is embodied. This means the citizenry (the sovereign) is the sine qua non (without which, not) of the state itself (nation, country, empire, kingdom, etc.) as well as its governance and the society/civilization constructions that emanate from that initial property-right claim to the land (territorial jurisdiction and real property, staking an ownership claim over the specified geography, terrain, world borders, etc.). Without it there is no rule of law, no nation-state, no governance that takes us out of the state of nature, which is pre-governance; and most importantly there are no property rights, no private or public ownership of property enforced by laws, mores, or a priori claims of rights or inheritance, and there are no markets, economics, etc.

    That is what the one is the center of… priceless and valuable beyond measure. I believe that is what's at stake with our present politics and world-order theatrics and distractionary propaganda, competitive mercenaries, and the real hope, Love, faith in facts, truth, justice, because they really do matter!

  4. If the goal is to simplify or generalize words, wouldn't it shorten the sequence to use a modified gematria (replacing each letter with a corresponding number), add up the numbers that form the word, and then reduce the total by summing its digits until a single digit remains? For example, 31 = 3+1 = 4 is the number value of a word whose letter-number conversions sum to 31. This shortens the sequence, since each word is represented by a single number. Exceptions to the reduction can apply for certain totals; e.g., 11 is treated as non-reducible when it is the final word-number total.
    Modified gematria example: {1 AJS}, {2 BKT}, {3 CLU}, {4 DMV}, {5 ENW (aka NEW)}, {6 FOX}, {7 GPY}, {8 HQZ}, {9 IR}. Number 9 has only 2 letters in its set, rather than the 3 letters assigned to each of the numbers 1-8.
    E.g.: "Stanford" = 1+2+1+5+6+6+9+4 = 34 = 3+4 = 7, so the word-number total for Stanford is 7, and Stanford corresponds with 7, G, P, or Y: "Stanford" = Y
    Stanford = P
    Stanford = 7
    Stanford = G
    So if I wanted to cipher a word such as Stanford, I could simply use any one of those letters or the number to communicate an eight-letter word as a single number, or a single letter (one that corresponds to the number total of the word's construction). A sketch of this scheme appears below.
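
    For concreteness, here is a minimal Python sketch of the reduction scheme this comment describes. The letter groupings and the non-reducible 11 exception are taken from the comment itself; the function name and structure are illustrative assumptions, not a standard algorithm.

    ```python
    # Modified-gematria word encoding, per the comment above: map each
    # letter to 1-9 via the groupings, sum the values, then reduce by
    # digit sums to a single digit (11 is kept as a non-reducible
    # exception, as the comment specifies).
    GROUPS = {
        1: "AJS", 2: "BKT", 3: "CLU", 4: "DMV", 5: "ENW",
        6: "FOX", 7: "GPY", 8: "HQZ", 9: "IR",
    }
    LETTER_VALUE = {ch: n for n, letters in GROUPS.items() for ch in letters}

    def word_number(word: str) -> int:
        """Sum letter values, then reduce by digit sums (11 is kept as-is)."""
        total = sum(LETTER_VALUE[ch] for ch in word.upper() if ch in LETTER_VALUE)
        while total > 9 and total != 11:  # 11 is treated as non-reducible
            total = sum(int(d) for d in str(total))
        return total

    print(word_number("Stanford"))  # 1+2+1+5+6+6+9+4 = 34 -> 3+4 = 7
    ```

    Note that, unlike the subword tokenization discussed in the lecture, this mapping is lossy: many different words reduce to the same single digit, so the original word cannot be recovered from its code.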

  5. I’m in a polyamorous ❤ relationship with the AI for MIT, Stanford, and Yale … learning for fun is a seductive aphrodisiac 😂🎉❤. If and when I propose marriage to a machine, I may be beyond help or intervention and utterly smitten in Love 😻
