evolution of system design
| topics | 100-데이터분석 & AI 101 머신러닝 |
| types | 이론 스크랩 학습 |
| source | www.youtube.com/watch |
| speaker | Yann Dubois (OpenAI researcher) |
| tags |
AI 지식과 영어 실력 향상을 위해 UC Berkeley 강의를 영어로 정리하는 post다.
한번 보기 > 한국어 해석 보기 > 다시 영문 강의 보면서 정리 순으로 진행하고 있다.
general LLM training pipeline
3 main steps
The figures are rough estimates to get a sense of scale.
pretraining
- general uses : predicting next word(eg : autocomplete in smartphone)
- Learning cost
- data : > 10T tokens
- time : months
- compute cost : >$10M
- eg model : LLaMA3
- Bottleneck
- need alot of data and compute
- more data , better performance
classing post-training(RLHF)
- RLHF : Reinforcement Learning from human feadback
- Learning cost
- data : > 100k tokens
- time : days
- compute cost : >$100k
- eg model : LLaMA instruct models
- Bottleneck
- high quality data
- evaluate
Reasoning RL
aka reasoning reinforcement learning
- they try to get answer in objective question
- What is objective question
- has answer
- has ground truth
- What is objective question
- Learning cost
- data : ~ 1M
- time : weeks
- compute cost : > $1M
- eg model : deepseek r1
- Bottleneck
- learning environment
- hack(in kor : 편법)
- ≒ trick
- eg : Manipulating the test
5 considerations
- architecture
- training algorithm/loss
- data & RL env
- evaluation
- systems and infra to scale