[May 29] Init doc
Available LLM Fine-tuning Frameworks#
- LLaMA-Factory
- xtuner
- unsloth
Brief Introduction#
- LLaMA-Factory offers the widest range of fine-tuning methods, many drawn from recent academic papers (e.g., LongLoRA), and also integrates other frameworks such as Unsloth.
- XTuner provides relatively rich documentation and many optimization tips, but its fine-tuning techniques are limited to basic LoRA and QLoRA.
- Unsloth offers decent documentation but likewise provides only a small number of fine-tuning options.
- If your requirements are simple, e.g., fine-tuning a common model like Llama3 on a short-dialogue instruction dataset (such as alpaca) with 24 GB of GPU memory, any of the above libraries will do.
General Steps#
Creating the Dataset#
Datasets can generally be divided into two types based on format: alpaca and sharegpt.
By fine-tuning type, they can be divided into supervised fine-tuning (SFT) datasets and pre-training datasets: the former are for instruction-following dialogue, the latter for incremental (continued) pre-training.
For details on creating datasets, refer to LLaMA-Factory/data/README.md at main · hiyouga/LLaMA-Factory.
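To make the two formats concrete, here is a minimal sketch (the field names follow the common alpaca and sharegpt conventions; the converter function and sample text are illustrative, not part of any framework):

```python
# Hypothetical helper: convert one alpaca-style record into a sharegpt-style one.
def alpaca_to_sharegpt(record):
    """Alpaca uses flat instruction/input/output fields;
    sharegpt uses a list of conversation turns."""
    human = record["instruction"]
    if record.get("input"):
        human += "\n" + record["input"]
    return {
        "conversations": [
            {"from": "human", "value": human},
            {"from": "gpt", "value": record["output"]},
        ]
    }

alpaca_example = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
print(alpaca_to_sharegpt(alpaca_example))
```

The key practical difference: alpaca is convenient for single-turn instruction data, while sharegpt naturally represents multi-turn dialogue.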
Choosing Fine-tuning Techniques#
The most basic fine-tuning method is LoRA; if you want to use less GPU memory, you can use QLoRA, where Q means Quantized.
If there are long sequence requirements but only limited GPU memory, consider Unsloth + Flash Attention 2.
LLaMA-Factory offers the widest variety of fine-tuning techniques to choose from.
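The reason LoRA (and by extension QLoRA) saves memory can be shown with a quick parameter count. This is a back-of-the-envelope sketch, not framework code; the layer size and rank are illustrative:

```python
# LoRA idea: instead of updating a full d x k weight W, train two low-rank
# factors B (d x r) and A (r x k); the effective weight is W + (alpha / r) * B @ A.
d, k, r = 4096, 4096, 8        # hypothetical attention-projection size, LoRA rank

full_params = d * k             # parameters updated by full fine-tuning
lora_params = d * r + r * k     # parameters updated by LoRA for the same layer

print(full_params)              # 16777216
print(lora_params)              # 65536 -- about 0.4% of the full weight
```

QLoRA keeps the same low-rank update but additionally stores the frozen base weights in 4-bit quantized form, cutting memory further at the cost of a dequantization step.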
Follow the Framework Documentation#
Common Fine-tuning Techniques#
- RoPE Scaling
  - Supports fine-tuning at context lengths beyond the pre-trained limit; for example, Llama3 is pre-trained only at 8K context length, but with RoPE scaling it can be fine-tuned at longer lengths.
- FlashAttention
- Reduces training time and GPU memory usage.
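The RoPE scaling idea above can be sketched as linear position interpolation: positions beyond the pre-trained context window are compressed back into the trained range by dividing by a scaling factor. This is a simplified illustration with made-up dimensions, not any framework's actual implementation:

```python
# Simplified RoPE rotation angles for one token position, with optional
# linear position interpolation (divide positions by a scale factor).
def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    pos = position / scale  # interpolation: compress positions into trained range
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

pretrained_len, target_len = 8192, 32768
scale = target_len / pretrained_len  # 4x context extension

# With scaling, the last position of the 32K window gets the same angles
# as an unscaled position inside the original 8K window:
print(rope_angles(32767, scale=scale) == rope_angles(32767 / scale))
```

The model therefore never sees rotation angles outside the range it was pre-trained on, which is why fine-tuning at the extended length converges.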
Solutions to Encountered Problems#
- Search the issues in each framework's repo.