evolution of system design

topics	100-데이터분석 & AI 101 머신러닝
types	이론 스크랩 학습
source	www.youtube.com/watch
speaker	Yann Dubois (OpenAI researcher)
tags	#llm #rlhf #pretraining #reinforcement-learning #deepseek

AI 지식과 영어 실력 향상을 위해 UC Berkeley 강의를 영어로 정리하는 post다.
한번 보기 > 한국어 해석 보기 > 다시 영문 강의 보면서 정리 순으로 진행하고 있다.

general LLM training pipeline

The figures are rough estimates to get a sense of scale.

aka reasoning reinforcement learning

they try to get answer in objective question
- What is objective question
  - has answer
  - has ground truth
Learning cost
- data : ~ 1M
- time : weeks
- compute cost : > $1M
eg model : deepseek r1
Bottleneck
- learning environment
- hack(in kor : 편법)
  - ≒ trick
  - eg : Manipulating the test