
Understanding large language models

Build a Large Language Model: Chapter 1

💡
This post is a personal summary of my reading of Build a Large Language Model.

This chapter covers

  • A high-level explanation of the fundamental concepts behind large language models (LLMs)

  • Insight into the transformer architecture used by LLMs such as ChatGPT

  • A plan for implementing an LLM from scratch

1.1. What is an LLM?

  • A neural network designed to understand, generate, and respond to text in a human-like way

1.2. Applications of LLMs

  • Useful for automating almost any task that involves text

1.3. Stages of building and using LLMs

  • PyTorch is the de facto standard for implementing most LLMs

  • LLMs designed for a specific domain can outperform general-purpose foundation models such as ChatGPT

  • The process of building an LLM

    • Pretraining

      • The initial stage, in which the model is first trained on a massive dataset

      • After pretraining, the result is a foundation model, which can then be fine-tuned further

      • Capable of text completion

        • Input: "The weather today is really"

        • Output: "nice and sunny."

      • Has few-shot capabilities

      •         English: Hello
                Korean: 안녕하세요

                English: Thank you
                Korean: 감사합니다

                English: Good morning
                Korean: [the model generates "좋은 아침입니다"]
        
      • Foundation model = base model (a model that has only been pretrained)

      • Foundation model + fine-tuning = fine-tuned model

        • e.g. GPT-4
      • That said, perhaps for marketing reasons, general-purpose models like GPT also seem to get called foundation models

    • Fine-tuning: further training the LLM on labeled data

    • After fine-tuning, RLHF (reinforcement learning from human feedback) is also possible

      • Human-assigned preference scores are used as the reward signal

      • Though I wonder whether this step can itself increase bias and even lower raw performance

      • GPT is tuned during this process to comply with ethical guidelines
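To make the pretraining stage above concrete, here is a minimal sketch of its core objective, next-word prediction. The toy corpus and the bigram count table are my own illustrative assumptions; a real LLM replaces the count table with a trained transformer:

```python
from collections import defaultdict

# Toy corpus: pretraining needs no human labels; the next token IS the label.
corpus = "the weather today is really nice and sunny".split()

# Build (context, next_word) training pairs by sliding over the raw text.
pairs = [(tuple(corpus[:i]), corpus[i]) for i in range(1, len(corpus))]

# A trivial stand-in "model": bigram counts instead of a neural network.
counts = defaultdict(lambda: defaultdict(int))
for context, nxt in pairs:
    counts[context[-1]][nxt] += 1

def predict(prev_word):
    """Return the most frequent next word observed after prev_word."""
    options = counts[prev_word]
    return max(options, key=options.get) if options else None

print(predict("really"))  # → "nice"
```

Fine-tuning then continues training the same kind of model, but on pairs that humans labeled deliberately rather than pairs sliced out of raw text.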

1.4. Introducing the transformer architecture

  • Modern LLMs are de facto based on the transformer architecture introduced in "Attention Is All You Need"

    • Since that paper, most deep learning models that use attention have adopted self-attention
  • The transformer architecture consists of two submodules, an encoder and a decoder

    • Both are built from multiple layers connected by the self-attention mechanism

    • Self-attention: computes how relevant each word in a sentence is to every other word in the same sentence

  • Later variants of the transformer architecture (BERT, GPT)

    • BERT specializes in masked-word prediction; GPT specializes in text generation

    • And yet GPT turned out to be good at prediction too!

1.5. Utilizing large datasets

  • The scale and diversity of the training dataset are what generally make strong performance possible

  • I'm curious how a training dataset actually gets assembled

    • Constitutional AI (CAI) training can reduce extreme bias, misinformation, and the like

    • With self-supervised learning, humans only have to decide which text data to feed in

    • But if humans select the data, isn't bias inevitable? And even before that, what should count as a problem? Couldn't the very process of humans defining what a problem is introduce bias too?

1.6. A closer look at the GPT architecture

  • GPT (Generative Pretrained Transformer) is pretrained on next-word prediction

  • Essentially, it uses only the decoder part of the transformer, without the encoder

  • Because it generates text by predicting one word at a time, it is considered an auto-regressive model

    • An auto-regressive model feeds its previous outputs back in as input for future predictions

    • Each new word is chosen based on the preceding sequence, which improves the coherence of the resulting text

    •         Input: "Today the [MASK] is really nice"

              GPT (auto-regressive): "Today the" → predicts "weather" (sequential, left to right)
              BERT: "Today the" + "is really nice" → predicts "[MASK]" (bidirectional)
      
    • But then, doesn't a longer preceding sequence raise memory and token-cost issues?

      • Context window = the number of tokens the model can keep in view at once

      • Sliding window = attend only to the most recent N tokens

      • Attention efficiency: reduce computation with sparse attention, linear attention, etc.

      • Hierarchical processing: select and compress only the important parts

      • Every transformer uses attention, and attention has O(n²) complexity, so memory and compute constraints are exactly why a context window is needed

    • Supports few-shot prompting
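The auto-regressive loop and the context window above fit together as in this sketch. The one-line toy model is an assumption standing in for the trained transformer:

```python
def generate(model, prompt_ids, max_new, context_len):
    """Greedy auto-regressive decoding: each new token is appended to the
    sequence and fed back in as input; only the last context_len tokens are
    kept, which is exactly the limit the context window imposes."""
    ids = list(prompt_ids)
    for _ in range(max_new):
        window = ids[-context_len:]     # sliding window over recent tokens
        next_id = model(window)         # model predicts a single next token
        ids.append(next_id)             # previous output becomes new input
    return ids

# Stub model: just returns last token + 1 (stands in for a real LLM).
toy_model = lambda window: window[-1] + 1
print(generate(toy_model, [5], max_new=4, context_len=3))  # → [5, 6, 7, 8, 9]
```

The `ids[-context_len:]` slice is the memory/cost answer to the question above: however long the conversation grows, the model only ever attends to a bounded window.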

1.7. Building a large language model

  • From here on, we will use GPT's core ideas as a blueprint and work through three stages

    • Basic preprocessing steps

    • Coding a GPT-like LLM and evaluating it

    • Fine-tuning it to follow instructions, such as question answering or text classification

Summary

  • Modern LLMs are trained in two main stages

    • First, they are pretrained on a large corpus of unlabeled text, using the next word in each sentence as the "label"

    • Then they are fine-tuned on a smaller labeled dataset to follow instructions or perform classification tasks

  • LLMs are based on the transformer architecture

  • The key idea of the transformer architecture is the attention mechanism, which gives the LLM selective access to the entire input sequence while generating output one word at a time

  • LLMs fine-tuned on custom datasets can outperform general-purpose LLMs on specific tasks