Here are some basic operations.
Methods for LLM API Calls#
Obtaining an OpenAI API Key#
- OpenAI Official
https://openai.com/blog/openai-api
Register with a non-CN/HK phone number to get a $5 trial credit, but trial accounts have strict rate limits: fewer than 3 requests per minute.
You can then top up with a non-CN/HK credit card to raise the rate limits.
Accounts are sometimes sold on second-hand platforms such as Xianyu, but note that trial accounts are still rate-limited.
- 3rd Party
Some service providers in China offer OpenAI API relay (transit) services. They are reachable directly from the mainland at a much lower price than the official API, with no rate limits.
For example, with ohmygpt.com you only need to point base_url at the provider's endpoint:
import openai

openai.base_url = "https://your-api-provider.com/v1/"
openai.api_key = "sk-..."  # key issued by the third-party provider
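Concretely, such a relay only changes the URL prefix; the request shape stays identical to the official API. A minimal stdlib sketch that builds (but does not send) such a request, reusing the placeholder URL above:

```python
import json
import urllib.request

# Placeholders -- substitute your provider's endpoint and your own key.
BASE_URL = "https://your-api-provider.com/v1"
API_KEY = "sk-your-key"

def build_chat_request(model, messages):
    """Build an OpenAI-style chat completion request against a custom base URL."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("gpt-3.5-turbo", [{"role": "user", "content": "Hello"}])
print(req.full_url)  # only the host part differs from api.openai.com
```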
Other APIs#
- Google's Gemini Pro
The application address is https://ai.google.dev/
Advantages: free, with a high call limit.
Disadvantages: strict risk control makes it hard to access; not widely used, and performs poorly in Chinese.
- Together AI
Application address: Together AI. It provides $25 of free credit and can call most open-source models.
Advantages: partially free; useful for verifying open-source models.
Disadvantages: performance is partly inferior to gpt-3.5-turbo.
- OpenRouter
Provides access to both open-source and closed-source models, including the OpenAI API and Anthropic API, and accepts Visa and MasterCard payments.
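OpenRouter exposes an OpenAI-compatible endpoint, so switching to it is again just a base-URL change plus vendor-prefixed model IDs (the model name below is an example from its catalog and may change). A stdlib sketch that builds the request without sending it:

```python
import json
import urllib.request

# OpenRouter speaks the OpenAI protocol at this base URL; models are
# addressed by vendor-prefixed IDs.
BASE_URL = "https://openrouter.ai/api/v1"
API_KEY = "sk-or-your-key"  # placeholder

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps({
        "model": "mistralai/mixtral-8x7b-instruct",  # example vendor/model ID
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode("utf-8"),
    headers={"Authorization": f"Bearer {API_KEY}",
             "Content-Type": "application/json"},
    method="POST",
)
print(req.full_url)
```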
Some choices of LLM models#
Ranking#
LMSys Chatbot Arena Leaderboard - a Hugging Face Space by lmsys
I find this ranking fairly reliable and valuable. To summarize:
Claude 3 Opus > GPT-4 > Claude 3 Sonnet > GPT-3.5 ≈ Claude 3 Haiku
For open-source models, those performing on par with GPT-3.5 include
Qwen-72B, Command R (35B), Starling-LM (7B), Mixtral-8x7B (MoE)
(Qwen/Qwen1.5-MoE-A2.7B-Chat is an interesting small MoE model recently.)
Using open source models#
Inference and fine-tuning#
The SOTA is Hugging Face's open-source library Transformers.
The huggingface.co site can help with "Understanding how big of a model can fit on your machine": whether a model can run generally depends on its number of parameters; for example, Llama-2-7b is a 7B-parameter model.
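A rough way to answer "will it fit" yourself: multiply the parameter count by the bytes per parameter of the load dtype. This ignores activations and the KV cache, so treat it as a lower bound:

```python
# Bytes per parameter for common load dtypes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(n_params_billion, dtype="fp16"):
    """Lower-bound GB of memory needed just to hold the weights."""
    return n_params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1e9

# Llama-2-7b: ~28 GB in fp32 (too big for a 24 GB RTX 3090),
# ~14 GB in fp16, ~3.5 GB with 4-bit quantization.
print(weight_memory_gb(7, "fp32"))  # 28.0
print(weight_memory_gb(7, "fp16"))  # 14.0
print(weight_memory_gb(7, "int4"))  # 3.5
```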
Fine-tuning#
The SOTA is hiyouga/LLaMA-Factory: Unify Efficient Fine-Tuning of 100+ LLMs
Provides ways to fine-tune a large number of open-source models, for example with LoRA and QLoRA. (It currently does not cover multimodal models such as LLaVA.)
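What makes LoRA-style fine-tuning feasible on a single consumer GPU is the trainable-parameter count: instead of updating a full d×k weight matrix, LoRA trains two rank-r factors B (d×r) and A (r×k). A small sketch of the arithmetic (pure Python, no training code; the 4096 dimension is a typical attention projection size assumed for illustration):

```python
def lora_trainable_params(d, k, r):
    """Trainable parameters when W (d x k) is frozen and the update
    is factored as B (d x r) @ A (r x k), per LoRA."""
    return d * r + r * k

d = k = 4096          # illustrative projection size in a 7B-class model
full = d * k          # full fine-tuning updates every weight
lora = lora_trainable_params(d, k, r=8)
print(full, lora, full // lora)  # 16777216 65536 256
```

A 256x reduction per matrix is why LoRA adapters fit comfortably next to frozen weights on an RTX 3090.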
Inference only (quantization)#
The SOTA is Ollama.
It offers one-line installation, one-line commands for running LLMs, and ships quantized models directly.
Refer to the available models in its library, and see the recommended models above.
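For reference, the one-line install and run look like this on Linux (model names must match entries in the Ollama library):

```shell
# One-line install (Linux); macOS/Windows use the installer from the site.
curl -fsSL https://ollama.com/install.sh | sh

# Download a quantized model and start chatting in one command.
ollama run llama2

# Or just fetch the weights for later use.
ollama pull mixtral
```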
Workflow construction for projects (engineering)#
The SOTA is LangChain
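LangChain's core idea is composing prompt templates, model calls, and output parsers into a pipeline. The shape of such a chain, sketched in plain Python with a stand-in model function (this illustrates the pattern, not LangChain's actual API):

```python
def prompt_template(question):
    # Fill a fixed instruction template with runtime input.
    return f"Answer concisely.\nQuestion: {question}\nAnswer:"

def fake_model(prompt):
    # Stand-in for a real LLM call (e.g. an OpenAI or Ollama request).
    return " 42 \n"

def output_parser(raw):
    # Post-process the raw completion into a clean value.
    return raw.strip()

def chain(question):
    # template -> model -> parser: the basic pipeline shape.
    return output_parser(fake_model(prompt_template(question)))

print(chain("What is 6 * 7?"))  # 42
```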
Introduction to open source models#
- GPT-2
A relatively small model that can easily be fine-tuned on an RTX 3090; an economical choice, also used in many papers.
- Llama-2-7b
A reasonably large model; it can also be fine-tuned on an RTX 3090 with LoRA, and is the choice of most papers.
- Mixtral 8x7B
A MoE-based model; very powerful, surpassing GPT-3.5 on many leaderboards.
- Vicuna 7B
A Llama-based model with instruction fine-tuning; many models seem to build on it for further instruction fine-tuning (perhaps because there are more examples, making it easier to use).
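Instruction fine-tuning of the kind Vicuna popularized trains on records pairing an instruction with a desired response. One widely used convention is the Alpaca-style JSON record, shown here as one common format among several (the field values are made-up examples):

```python
import json

# One Alpaca-style training record (a common instruction-tuning format).
record = {
    "instruction": "Summarize the following text in one sentence.",
    "input": "LoRA factors weight updates into low-rank matrices...",
    "output": "LoRA reduces trainable parameters via low-rank factorization.",
}

line = json.dumps(record)        # datasets are often stored as JSON/JSONL
print(sorted(json.loads(line)))  # ['input', 'instruction', 'output']
```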