What is LoRA?
Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method for large language models. Instead of updating every weight of a pre-trained model, LoRA freezes the original weights and injects small trainable low-rank matrices into selected layers, reducing the number of trainable parameters by orders of magnitude.
Key Benefits
- Memory Efficient: only a small fraction of the model's parameters are trained, commonly well under 1%, depending on the rank and on which layers are adapted (see the worked example after this list)
- Fast Training: smaller gradients and optimizer states make training faster and far lighter on GPU memory than full fine-tuning
- Modular: adapters can be swapped, stacked, or merged back into the base weights for deployment
- Storage Friendly: adapter checkpoints are megabytes, versus gigabytes for a full model copy
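To make the memory numbers concrete, here is a quick back-of-the-envelope calculation. The 4096 × 4096 weight shape and rank r = 8 are illustrative assumptions, not values taken from any particular model:

```python
# Trainable-parameter comparison for a single weight matrix.
m, n, r = 4096, 4096, 8       # illustrative layer shape and LoRA rank

full_params = m * n           # what full fine-tuning would update
lora_params = m * r + r * n   # the A (m x r) and B (r x n) factors

print(f"full fine-tuning: {full_params:,} trainable parameters")
print(f"LoRA (r={r}):     {lora_params:,} trainable parameters")
print(f"fraction trained: {lora_params / full_params:.2%}")  # ~0.39% here
```

The same ratio applies to every adapted layer, which is where the sub-1% figures above come from.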
Basic LoRA Concept
# Instead of updating the full weight matrix W (m × n), W stays frozen.
# LoRA learns a low-rank update: W' = W + ΔW, where ΔW = A × B
# A: m × r matrix (trainable)
# B: r × n matrix (trainable)
# r ≪ min(m, n) is the low-rank dimension (e.g. 4, 8, or 16)
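The scheme above translates directly into a small PyTorch module. This is a minimal sketch following the notation used here, not the API of any real library (Hugging Face's `peft` provides a production implementation); the class name, the alpha/r scaling, and the default hyperparameters are illustrative assumptions:

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)      # W (and bias) stay frozen

        m, n = base.out_features, base.in_features
        # ΔW = A × B, matching the notation above (W is m × n).
        self.A = nn.Parameter(torch.zeros(m, r))   # zero init, so ΔW starts at 0
        self.B = nn.Parameter(torch.empty(r, n))
        nn.init.kaiming_uniform_(self.B, a=math.sqrt(5))
        self.scaling = alpha / r         # the common alpha/r scaling of the update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank update: y = base(x) + s · x (A B)ᵀ
        delta = (x @ self.B.T) @ self.A.T
        return self.base(x) + self.scaling * delta

# Usage: wrap an existing layer; only A and B receive gradients.
layer = LoRALinear(nn.Linear(512, 512), r=8)
y = layer(torch.randn(4, 512))
```

Zero-initializing A makes ΔW start at zero, so the wrapped layer initially behaves exactly like the pre-trained one; this matches the initialization described in the LoRA paper.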