Basic Operations of Large Language Models

Here are some basic operations for working with LLMs.

Methods for Calling LLM APIs#

Obtaining OpenAI API#

  • OpenAI Official
    Register with a non-CN/HK phone number to get a $5 trial credit, but there are strict rate limits: fewer than 3 calls per minute.
    Then add credit with a non-CN/HK credit card to raise the rate limits.
    Accounts may be available for sale on platforms like Xianyu, but note that trial accounts are rate-limited.
  • 3rd Party
    Some service providers in China offer OpenAI API relay services, with direct domestic access at a much lower price than the official one and no rate limits.
    To use one, you need to modify the corresponding base_url:
import openai

# Fill in the base URL provided by the relay service
openai.base_url = ""
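Since these relays are OpenAI-compatible, the request is ultimately an ordinary HTTP POST to `{base_url}/chat/completions`. A minimal stdlib-only sketch of what the client does under the hood (the base URL, API key, and model name below are placeholders you must replace with the values your provider gives you):

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build the (url, headers, body) for an OpenAI-compatible chat completion."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, headers, body

def chat(base_url: str, api_key: str, model: str, prompt: str) -> str:
    """Send a single-turn chat request and return the reply text."""
    url, headers, body = build_chat_request(base_url, api_key, model, prompt)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Example (requires a valid relay URL and key):
# print(chat("https://your-relay.example/v1", "sk-...", "gpt-3.5-turbo", "hello"))
```

Switching from the official API to a relay is then just a matter of changing the `base_url` argument.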

Other APIs#

  • Google's Gemini Pro
    Application address:
    Advantages: Free, high call limit
    Disadvantages: Strict risk control, difficult to access; not widely used, poor performance in Chinese
  • Together AI
    Application address: Together AI. Provides $25 in free credit and can call most open source models.
    Advantages: Partially free, can be used to verify open source models.
    Disadvantages: Some models perform worse than gpt-3.5-turbo.
  • OpenRouter
    Provides calls to open source and closed source models, including OpenAI API and Anthropic API, and accepts Visa, MasterCard payments.
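Most of these providers (and the relays above) expose an OpenAI-compatible endpoint, so switching providers mostly means changing the base URL and model name. The URLs below are the ones these providers document, but verify them against each provider's docs before relying on them:

```python
# Assumed OpenAI-compatible base URLs; confirm with each provider's documentation.
PROVIDERS = {
    "openai":     "https://api.openai.com/v1",
    "together":   "https://api.together.xyz/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}

def endpoint(provider: str, path: str = "/chat/completions") -> str:
    """Return the full URL for a provider's chat completion endpoint."""
    return PROVIDERS[provider].rstrip("/") + path
```

With this, the same request-building code works against any of the three; only the `Authorization` key and model name differ per provider.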

Some choices for LLMs models#


LMSys Chatbot Arena Leaderboard - a Hugging Face Space by lmsys
I think this ranking is more reliable and valuable. To summarize:
Claude 3 Opus > GPT-4 > Claude 3 Sonnet > GPT-3.5 == Claude 3 Haiku
For open source models, those that perform similarly to GPT-3.5 are
Qwen-72B, Command R (35B), Starling-LM (7B), Mixtral-8x7B (MoE)
(Qwen/Qwen1.5-MoE-A2.7B-Chat is an interesting small MoE model recently.)

Using open source models#

Inference and fine-tuning#

The SOTA is Hugging Face's open source library Transformers. This website can help with "Understanding how big of a model can fit on your machine": whether a model can run generally depends on its parameter count; for example, Llama-2-7b is a 7B-parameter model.
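As a rough rule of thumb (ignoring activations and KV-cache overhead), the weights alone need roughly parameter count × bytes per parameter. A quick back-of-the-envelope calculator, using Llama-2-7b's assumed ~7e9 parameters:

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough GB needed just to hold the weights (excludes activations/KV cache)."""
    return n_params * bytes_per_param / 1e9

# Llama-2-7b at different precisions:
fp32 = weight_memory_gb(7e9, 4)    # ~28 GB: won't fit on a 24 GB RTX 3090
fp16 = weight_memory_gb(7e9, 2)    # ~14 GB: fits
int4 = weight_memory_gb(7e9, 0.5)  # ~3.5 GB: fits easily
print(fp32, fp16, int4)
```

This is why quantization (see the Ollama section below) matters: the same 7B model drops from 28 GB to a few GB.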


The SOTA is hiyouga/LLaMA-Factory: Unify Efficient Fine-Tuning of 100+ LLMs.
It provides ways to fine-tune a large number of open source models, using methods such as LoRA and QLoRA. (It currently does not support multimodal models such as LLaVA.)
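LoRA, one of the methods LLaMA-Factory wraps, freezes the base weight W and learns a low-rank update BA, so a layer computes y = Wx + (alpha/r)·BAx. A dependency-free numeric sketch of that idea (illustrating the math only, not LLaMA-Factory's actual code):

```python
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=4):
    """y = Wx + (alpha/r) * B(Ax): frozen base weight plus low-rank update."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # only A (r x d) and B (d x r) are trained
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

d, r = 8, 4
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
A = [[random.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]  # B starts at zero, so training starts from the base model
x = [1.0] * d
assert lora_forward(W, A, B, x) == matvec(W, x)  # zero update at initialization
```

The appeal is the parameter count: here the update trains 2·d·r = 64 values instead of d² = 64... with realistic sizes (d in the thousands, r of 8 or 16), the trained fraction is well under 1%, which is why a 7B model can be LoRA-tuned on a single RTX 3090.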

Inference only (quantization)#

The SOTA is Ollama.
It offers one-line installation, one-line commands for running LLMs, and directly provides quantized models.
Refer to the available models in its library, and see the recommended models above.
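Besides the CLI, Ollama also serves a local REST API on port 11434. A stdlib-only sketch of calling its /api/generate endpoint (the model name is whatever you have pulled; a running Ollama server is assumed):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model: str, prompt: str) -> bytes:
    """JSON body for Ollama's /api/generate endpoint, with streaming disabled."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the completion text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Example (requires `ollama pull llama2` and a running Ollama server):
# print(generate("llama2", "Why is the sky blue?"))
```

Because the API is local HTTP, project code can swap between a hosted API and a local quantized model by changing only this call site.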

Workflow construction for projects (engineering)#

The SOTA is LangChain
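LangChain's core abstraction is chaining prompt template → model → output parser into a pipeline. A plain-Python sketch of that pattern with a stub model (illustrating the idea, not LangChain's actual API):

```python
def prompt_template(template: str):
    """Stage 1: fill placeholders into a prompt string."""
    def fill(**kwargs):
        return template.format(**kwargs)
    return fill

def fake_llm(prompt: str) -> str:
    """Stage 2: stand-in for a real model call; echoes the prompt for demonstration."""
    return f"ANSWER({prompt})"

def parse(output: str) -> str:
    """Stage 3: post-process the raw model output."""
    return output.removeprefix("ANSWER(").removesuffix(")")

def chain(*stages):
    """Compose stages left to right, like LangChain's pipe (|) operator."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run

pipeline = chain(lambda q: prompt_template("Translate to French: {q}")(q=q),
                 fake_llm, parse)
print(pipeline("hello"))  # Translate to French: hello
```

In real use, `fake_llm` would be replaced by an actual API or local-model call; the value of the framework is that each stage stays swappable.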

Introduction to open source models#

  • GPT2
    A relatively small model that can be easily fine-tuned on an RTX 3090; it is an economical choice and is also used in many papers.
  • Llama-2-7b
    A reasonably sized large model that can also be fine-tuned on an RTX 3090 using LoRA; the choice of most papers.
  • Mixtral 8x7B
    A MoE-based model, very powerful; it surpasses GPT-3.5 on many leaderboards.
  • Vicuna 7B
    A Llama-based model fine-tuned for instruction following; many models seem to build on it for instruction fine-tuning (perhaps because there are more examples, making it easier to use).