What is LoRA?
Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method for large language models. Instead of updating every weight of a pre-trained model, LoRA freezes the original weights and injects small trainable low-rank matrices into selected layers, reducing the number of trainable parameters by orders of magnitude.
Key Benefits
- Memory Efficient: only a small fraction of the model's parameters are trained, commonly well under 1%, depending on the rank and on which layers are adapted (see the worked example after this list)
- Fast Training: smaller gradients and optimizer states make training faster and far lighter on GPU memory than full fine-tuning
- Modular: adapters can be swapped, stacked, or merged back into the base weights for deployment
- Storage Friendly: adapter checkpoints are megabytes, versus gigabytes for a full model copy
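To make the memory numbers concrete, here is a quick back-of-the-envelope calculation. The 4096 × 4096 weight shape and rank r = 8 are illustrative assumptions, not values taken from any particular model:

```python
# Trainable-parameter comparison for a single weight matrix.
m, n, r = 4096, 4096, 8       # illustrative layer shape and LoRA rank

full_params = m * n           # what full fine-tuning would update
lora_params = m * r + r * n   # the A (m x r) and B (r x n) factors

print(f"full fine-tuning: {full_params:,} trainable parameters")
print(f"LoRA (r={r}):     {lora_params:,} trainable parameters")
print(f"fraction trained: {lora_params / full_params:.2%}")  # ~0.39% here
```

The same ratio applies to every adapted layer, which is where the sub-1% figures above come from.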
Basic LoRA Concept
# Instead of updating the full weight matrix W (m × n), W stays frozen.
# LoRA learns a low-rank update: W' = W + ΔW, where ΔW = A × B
# A: m × r matrix (trainable)
# B: r × n matrix (trainable)
# r ≪ min(m, n) is the low-rank dimension (e.g. 4, 8, or 16)
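The scheme above translates directly into a small PyTorch module. This is a minimal sketch following the notation used here, not the API of any real library (Hugging Face's `peft` provides a production implementation); the class name, the alpha/r scaling, and the default hyperparameters are illustrative assumptions:

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)      # W (and bias) stay frozen

        m, n = base.out_features, base.in_features
        # ΔW = A × B, matching the notation above (W is m × n).
        self.A = nn.Parameter(torch.zeros(m, r))   # zero init, so ΔW starts at 0
        self.B = nn.Parameter(torch.empty(r, n))
        nn.init.kaiming_uniform_(self.B, a=math.sqrt(5))
        self.scaling = alpha / r         # the common alpha/r scaling of the update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank update: y = base(x) + s · x (A B)ᵀ
        delta = (x @ self.B.T) @ self.A.T
        return self.base(x) + self.scaling * delta

# Usage: wrap an existing layer; only A and B receive gradients.
layer = LoRALinear(nn.Linear(512, 512), r=8)
y = layer(torch.randn(4, 512))
```

Zero-initializing A makes ΔW start at zero, so the wrapped layer initially behaves exactly like the pre-trained one; this matches the initialization described in the LoRA paper.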