evolution of system design

topics 100-Data Analysis & AI 101 Machine Learning
types theory, scrap, study
source www.youtube.com/watch
speaker Yann Dubois (OpenAI researcher)
tags

This post summarizes a UC Berkeley lecture in English, to build AI knowledge and English skills at the same time.
My process: watch the lecture once > read a Korean translation > rewatch the English lecture while taking these notes.

general LLM training pipeline

3 main steps

The figures are rough estimates to get a sense of scale.

pretraining

  • general use : predicting the next word (eg : smartphone autocomplete)
  • Learning cost
    • data : > 10T tokens
    • time : months
    • compute cost : >$10M
  • eg model : LLaMA3
  • Bottleneck
    • needs a lot of data and compute
    • more data, better performance
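The pretraining objective above can be sketched at a tiny scale. This is an illustrative toy (not the lecture's code): a bigram "language model" trained by counting, then scored with the same next-token cross-entropy loss that real pretraining optimizes over >10T tokens.

```python
import math

def train_bigram(tokens):
    """Count next-token frequencies and normalize them into probabilities."""
    counts = {}
    for prev, nxt in zip(tokens, tokens[1:]):
        counts.setdefault(prev, {}).setdefault(nxt, 0)
        counts[prev][nxt] += 1
    probs = {}
    for prev, nxts in counts.items():
        total = sum(nxts.values())
        probs[prev] = {t: c / total for t, c in nxts.items()}
    return probs

def cross_entropy(probs, tokens, eps=1e-9):
    """Average negative log-likelihood of each next token (lower = better)."""
    nll = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = probs.get(prev, {}).get(nxt, eps)
        nll -= math.log(p)
    return nll / (len(tokens) - 1)

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
loss = cross_entropy(model, corpus)
```

Real models replace the count table with a transformer, but the loss being minimized is this same "predict the next token" objective.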

classic post-training (RLHF)

  • RLHF : Reinforcement Learning from Human Feedback
  • Learning cost
    • data : > 100k tokens
    • time : days
    • compute cost : >$100k
  • eg model : LLaMA instruct models
  • Bottleneck
    • high quality data
    • evaluation
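One way human feedback is turned into a training signal (a common RLHF recipe, not something the notes above spell out) is a Bradley-Terry pairwise loss for the reward model: given a human-preferred ("chosen") and a dispreferred ("rejected") response, the loss is low when the reward model already ranks the chosen one higher. The scalar rewards here are hypothetical stand-ins for a learned reward model's outputs.

```python
import math

def preference_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected): the Bradley-Terry pairwise loss.
    Near zero when the chosen response scores much higher than the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

agree = preference_loss(2.0, 0.0)     # reward model agrees with the human label
disagree = preference_loss(0.0, 2.0)  # reward model disagrees -> larger loss
```

This is also why high-quality comparison data is the bottleneck: the reward model can only be as good as the human labels it is fit to.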

Reasoning RL

aka reasoning reinforcement learning

  • trains the model to get the answer right on objective questions
    • what is an objective question?
      • has a definite answer
      • has a ground truth
  • Learning cost
    • data : ~ 1M
    • time : weeks
    • compute cost : > $1M
  • eg model : deepseek r1
  • Bottleneck
    • learning environment
    • hacks
      • ≒ tricks, shortcuts
      • eg : gaming the test

5 considerations

  1. architecture
  2. training algorithm/loss
  3. data & RL env
  4. evaluation
  5. systems and infra to scale

related documents