Here are some basic operations.
Methods for LLM API Calls#
Obtaining an OpenAI API Key#
- OpenAI Official
https://openai.com/blog/openai-api
Register with a non-CN/HK phone number to get a $5 trial credit, but trial accounts have strict rate limits: fewer than 3 requests per minute.
You can then top up with a non-CN/HK credit card to raise the rate limits.
Accounts are sometimes sold on second-hand platforms such as Xianyu, but note that trial accounts are still rate-limited.
- 3rd Party
Some service providers in China offer OpenAI API relay (transit) services. They are reachable directly from the mainland at a much lower price than the official API, with no rate limits.
For example, with ohmygpt.com you only need to point base_url at the provider's endpoint:
import openai

openai.base_url = "https://your-api-provider.com/v1/"
openai.api_key = "sk-..."  # key issued by the third-party provider
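Concretely, such a relay only changes the URL prefix; the request shape stays identical to the official API. A minimal stdlib sketch that builds (but does not send) such a request, reusing the placeholder URL above:

```python
import json
import urllib.request

# Placeholders -- substitute your provider's endpoint and your own key.
BASE_URL = "https://your-api-provider.com/v1"
API_KEY = "sk-your-key"

def build_chat_request(model, messages):
    """Build an OpenAI-style chat completion request against a custom base URL."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("gpt-3.5-turbo", [{"role": "user", "content": "Hello"}])
print(req.full_url)  # only the host part differs from api.openai.com
```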
Other APIs#
- Google's Gemini Pro
The application address is https://ai.google.dev/
Advantages: free, with a high call limit.
Disadvantages: strict risk control makes it hard to access; not widely used, and performs poorly in Chinese.
- Together AI
Application address: Together AI. It provides $25 of free credit and can call most open-source models.
Advantages: partially free; useful for verifying open-source models.
Disadvantages: performance is partly inferior to gpt-3.5-turbo.
- OpenRouter
Provides access to both open-source and closed-source models, including the OpenAI API and Anthropic API, and accepts Visa and MasterCard payments.
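OpenRouter exposes an OpenAI-compatible endpoint, so switching to it is again just a base-URL change plus vendor-prefixed model IDs (the model name below is an example from its catalog and may change). A stdlib sketch that builds the request without sending it:

```python
import json
import urllib.request

# OpenRouter speaks the OpenAI protocol at this base URL; models are
# addressed by vendor-prefixed IDs.
BASE_URL = "https://openrouter.ai/api/v1"
API_KEY = "sk-or-your-key"  # placeholder

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps({
        "model": "mistralai/mixtral-8x7b-instruct",  # example vendor/model ID
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode("utf-8"),
    headers={"Authorization": f"Bearer {API_KEY}",
             "Content-Type": "application/json"},
    method="POST",
)
print(req.full_url)
```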
Some choices of LLM models#
Ranking#
LMSys Chatbot Arena Leaderboard - a Hugging Face Space by lmsys
I find this ranking fairly reliable and valuable. To summarize:
Claude 3 Opus > GPT-4 > Claude 3 Sonnet > GPT-3.5 ≈ Claude 3 Haiku
For open-source models, those performing on par with GPT-3.5 include
Qwen-72B, Command R (35B), Starling-LM (7B), Mixtral-8x7B (MoE)
(Qwen/Qwen1.5-MoE-A2.7B-Chat is an interesting small MoE model recently.)
Using open source models#
Inference and fine-tuning#
The SOTA is Hugging Face's open-source library Transformers.
The huggingface.co site can help with "Understanding how big of a model can fit on your machine": whether a model can run generally depends on its number of parameters; for example, Llama-2-7b is a 7B-parameter model.
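A rough way to answer "will it fit" yourself: multiply the parameter count by the bytes per parameter of the load dtype. This ignores activations and the KV cache, so treat it as a lower bound:

```python
# Bytes per parameter for common load dtypes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(n_params_billion, dtype="fp16"):
    """Lower-bound GB of memory needed just to hold the weights."""
    return n_params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1e9

# Llama-2-7b: ~28 GB in fp32 (too big for a 24 GB RTX 3090),
# ~14 GB in fp16, ~3.5 GB with 4-bit quantization.
print(weight_memory_gb(7, "fp32"))  # 28.0
print(weight_memory_gb(7, "fp16"))  # 14.0
print(weight_memory_gb(7, "int4"))  # 3.5
```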
Fine-tuning#
The SOTA is hiyouga/LLaMA-Factory: Unify Efficient Fine-Tuning of 100+ LLMs
Provides ways to fine-tune a large number of open-source models, for example with LoRA and QLoRA. (It currently does not cover multimodal models such as LLaVA.)
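What makes LoRA-style fine-tuning feasible on a single consumer GPU is the trainable-parameter count: instead of updating a full d×k weight matrix, LoRA trains two rank-r factors B (d×r) and A (r×k). A small sketch of the arithmetic (pure Python, no training code; the 4096 dimension is a typical attention projection size assumed for illustration):

```python
def lora_trainable_params(d, k, r):
    """Trainable parameters when W (d x k) is frozen and the update
    is factored as B (d x r) @ A (r x k), per LoRA."""
    return d * r + r * k

d = k = 4096          # illustrative projection size in a 7B-class model
full = d * k          # full fine-tuning updates every weight
lora = lora_trainable_params(d, k, r=8)
print(full, lora, full // lora)  # 16777216 65536 256
```

A 256x reduction per matrix is why LoRA adapters fit comfortably next to frozen weights on an RTX 3090.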
Inference only (quantization)#
The SOTA is Ollama.
It offers one-line installation, one-line commands for running LLMs, and ships quantized models directly.
Refer to the available models in its library, and see the recommended models above.
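For reference, the one-line install and run look like this on Linux (model names must match entries in the Ollama library):

```shell
# One-line install (Linux); macOS/Windows use the installer from the site.
curl -fsSL https://ollama.com/install.sh | sh

# Download a quantized model and start chatting in one command.
ollama run llama2

# Or just fetch the weights for later use.
ollama pull mixtral
```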
Workflow construction for projects (engineering)#
The SOTA is LangChain
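LangChain's core idea is composing prompt templates, model calls, and output parsers into a pipeline. The shape of such a chain, sketched in plain Python with a stand-in model function (this illustrates the pattern, not LangChain's actual API):

```python
def prompt_template(question):
    # Fill a fixed instruction template with runtime input.
    return f"Answer concisely.\nQuestion: {question}\nAnswer:"

def fake_model(prompt):
    # Stand-in for a real LLM call (e.g. an OpenAI or Ollama request).
    return " 42 \n"

def output_parser(raw):
    # Post-process the raw completion into a clean value.
    return raw.strip()

def chain(question):
    # template -> model -> parser: the basic pipeline shape.
    return output_parser(fake_model(prompt_template(question)))

print(chain("What is 6 * 7?"))  # 42
```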
Introduction to open source models#
- GPT-2
A relatively small model that can easily be fine-tuned on an RTX 3090; an economical choice, also used in many papers.
- Llama-2-7b
A reasonably large model; it can also be fine-tuned on an RTX 3090 with LoRA, and is the choice of most papers.
- Mixtral 8x7B
A MoE-based model; very powerful, surpassing GPT-3.5 on many leaderboards.
- Vicuna 7B
A Llama-based model with instruction fine-tuning; many models seem to build on it for further instruction fine-tuning (perhaps because there are more examples, making it easier to use).
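Instruction fine-tuning of the kind Vicuna popularized trains on records pairing an instruction with a desired response. One widely used convention is the Alpaca-style JSON record, shown here as one common format among several (the field values are made-up examples):

```python
import json

# One Alpaca-style training record (a common instruction-tuning format).
record = {
    "instruction": "Summarize the following text in one sentence.",
    "input": "LoRA factors weight updates into low-rank matrices...",
    "output": "LoRA reduces trainable parameters via low-rank factorization.",
}

line = json.dumps(record)        # datasets are often stored as JSON/JSONL
print(sorted(json.loads(line)))  # ['input', 'instruction', 'output']
```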