Update the software sources

sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup

# Replace the contents of /etc/apt/sources.list with the following
# Aliyun mirror for Ubuntu 22.04 (Jammy)
deb http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
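
The four entries above differ only in the suite name, so they can be generated rather than typed by hand. The following is a minimal sketch; `gen_sources` is a hypothetical helper name, not part of apt, and it only prints the lines so they can be reviewed first.

```shell
# Sketch: print the four Aliyun mirror entries for an Ubuntu release codename.
# gen_sources is a hypothetical helper, not an apt command.
gen_sources() {
  for suite in "$1" "$1-updates" "$1-backports" "$1-security"; do
    printf 'deb http://mirrors.aliyun.com/ubuntu/ %s main restricted universe multiverse\n' "$suite"
  done
}

# Print the entries for Ubuntu 22.04 (Jammy) for review.
gen_sources jammy
```

Once the output looks right, it could be written in place with `gen_sources jammy | sudo tee /etc/apt/sources.list`, followed by `sudo apt update`.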

Update the drivers

sudo apt update
sudo ubuntu-drivers devices

# Install the recommended driver
sudo ubuntu-drivers autoinstall

# Or install a specific driver version
sudo apt install nvidia-driver-535
# Install NCCL for multi-GPU communication
sudo apt install libnccl2 libnccl-dev
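
To choose a specific driver, it helps to pull the recommended package name out of the `ubuntu-drivers devices` output. The sketch below assumes the usual output layout, where each candidate appears on a line such as `driver   : nvidia-driver-535 - distro non-free recommended`; `recommended_driver` is a hypothetical helper name.

```shell
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print the package name from the first line marked "recommended".
recommended_driver() {
  awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Run it against the real tool only when it is installed.
if command -v ubuntu-drivers >/dev/null 2>&1; then
  ubuntu-drivers devices 2>/dev/null | recommended_driver
fi
```

After installing the driver and rebooting, `nvidia-smi` should list the GPUs together with the driver version.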

Install essential tools

sudo apt install lshw vim

Jupyter (optional)

pip install jupyterlab
jupyter lab --ip=0.0.0.0 --port=17733

Configure JupyterLab

1. Generate the config file

jupyter lab --generate-config

Config file location: ~/.jupyter/jupyter_lab_config.py

2. Set a password (optional)

jupyter lab password

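3. Change common settings

Edit the config file:

nano ~/.jupyter/jupyter_lab_config.py

Common settings:

# Listen on all network interfaces
c.ServerApp.ip = '0.0.0.0'
# Set the port
c.ServerApp.port = 8888
# Do not open a browser automatically
c.ServerApp.open_browser = False
# Set the working directory
c.ServerApp.root_dir = '/home/username/projects'
# Allow remote access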
c.ServerApp.allow_remote_access = True

Install extensions

1. Install Language Server Protocol (LSP) support

pip install jupyterlab-lsp
# or
conda install -c conda-forge jupyterlab-lsp

2. Install commonly used extensions

# Code formatting
pip install jupyterlab_code_formatter

# Git integration
pip install jupyterlab-git

# Theme
pip install jupyterlab-theme-solarized-dark

Start at boot (optional)

Create a systemd service:

sudo nano /etc/systemd/system/jupyterlab.service

Add the following:

[Unit]
Description=Jupyter Lab

[Service]
Type=simple
User=your_username
WorkingDirectory=/home/your_username
ExecStart=/home/your_username/jupyter_env/venv/bin/jupyter lab --config=/home/your_username/.jupyter/jupyter_lab_config.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
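
The user name appears four times in the unit file, which makes hand editing error-prone. The sketch below fills it in from arguments. `render_unit` is a hypothetical helper, the home directory is assumed to be /home/<user>, and the `After=network.target` line is an addition that is not in the unit above.

```shell
# Hypothetical helper: print a jupyterlab.service unit for a given user and
# virtual-environment directory. Assumes the home directory is /home/<user>.
render_unit() {
  user="$1"
  venv="$2"
  cat <<EOF
[Unit]
Description=Jupyter Lab
# Addition in this sketch: start only after the network is up.
After=network.target

[Service]
Type=simple
User=$user
WorkingDirectory=/home/$user
ExecStart=$venv/bin/jupyter lab --config=/home/$user/.jupyter/jupyter_lab_config.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}

# Preview the unit for a hypothetical user "alice".
render_unit alice /home/alice/jupyter_env/venv
```

The output could be installed with `render_unit "$USER" "$HOME/jupyter_env/venv" | sudo tee /etc/systemd/system/jupyterlab.service`.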

Enable the service:

sudo systemctl daemon-reload
sudo systemctl enable jupyterlab
sudo systemctl start jupyterlab

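Access JupyterLab

Once it is running, open it in a browser. Use the port you configured: 8888 in the config file above, or whatever value was passed to --port.

Local: http://localhost:8888
Remote: http://your_ip_address:8888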

vLLM

Install vLLM

pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly

Start the API server

vllm serve Qwen/Qwen3-8B
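
`vllm serve` exposes an OpenAI-compatible API, on port 8000 unless `--port` is given. A quick way to check it is to post a small chat request. The sketch below builds the request body with a hypothetical helper, `chat_payload`, which does no JSON escaping and so assumes the prompt contains no double quotes or backslashes. The request is sent only when the `VLLM_URL` variable (also an assumption of this sketch) is set.

```shell
# Hypothetical helper: build a minimal chat-completions request body.
# Arguments: model name, prompt, optional max_tokens (default 64).
# No JSON escaping is done, so the prompt must not contain " or \.
chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "max_tokens": %d}\n' \
    "$1" "$2" "${3:-64}"
}

# Send the request only when VLLM_URL is set, e.g. VLLM_URL=http://localhost:8000
if [ -n "${VLLM_URL:-}" ]; then
  chat_payload "Qwen/Qwen3-8B" "Hello, how are you?" |
    curl -s "$VLLM_URL/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d @-
fi
```

The model name in the request has to match the one passed to `vllm serve`.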

llama.cpp

Install llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_CUDA=ON    # NVIDIA GPU
cmake --build build --config Release

If you run into a compatibility problem between gcc and CUDA, consider building with the older gcc 10 instead

cmake -B build -DGGML_CUDA=ON -DCMAKE_C_COMPILER=gcc-10 -DCMAKE_CXX_COMPILER=g++-10
cmake --build build --config Release

If you see "CUDA Toolkit not found", install the CUDA toolkit

sudo apt-get update
sudo apt-get install nvidia-cuda-toolkit
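
Both failure modes above can be caught before starting a long build by checking that the tools are on PATH. This is a minimal sketch; `missing_tools` is a hypothetical helper name, and the tool list is an assumption based on the commands used in this section.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools used by the build steps above; nvcc ships with the CUDA toolkit.
missing="$(missing_tools git cmake gcc g++ nvcc)"
if [ -n "$missing" ]; then
  echo "Missing before building llama.cpp:" $missing
else
  echo "All build tools found"
fi
```

When everything is present, `nvcc --version` and `gcc --version` show the versions to compare if the compiler mismatch error appears.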

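Run the service

llama-cli

llama-cli is a console program for chatting with large language models. Run the following command from the directory that contains the llama.cpp binaries (build/bin after the CMake build above).

# Download a GGUF model and run it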
./llama-cli \
-hf Qwen/Qwen3-8B-GGUF:Q8_0 \
--jinja \
--color \
-ngl 99 \
-fa on \
-sm row \
--temp 0.6 \
--top-k 20 \
--top-p 0.95 \
--min-p 0 \
-c 40960 \
-n 32768 \
--no-context-shift

llama-server

llama-server is a simple HTTP server that provides a set of LLM REST APIs and a simple web front end for interacting with large language models through llama.cpp.

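# Start the server in the background.
# Comments sit above the command: text after a trailing backslash breaks the line continuation.
# -c 262144: 256K context; -b 1024: logical batch size; --ubatch-size 512: physical (micro) batch size
# -t 14: CPU threads (set this to your core count); -ngl 99: offload all layers to the GPU
# -fa on: enable Flash Attention; --cache-type-k q8_0: quantize the K cache to q8_0
# --no-mmap: load the whole model into memory instead of memory-mapping it
# --mlock: lock the model in RAM to avoid swapping
# --tensor-split 12,12 with --split-mode layer: split the layers evenly across two GPUs
nohup ./llama-server \
  -m Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled.Q8_0.gguf \
  --host 0.0.0.0 \
  --port 12345 \
  -c 262144 \
  -b 1024 \
  --ubatch-size 512 \
  -t 14 \
  -ngl 99 \
  -fa on \
  --cache-type-k q8_0 \
  --no-mmap \
  --mlock \
  --cont-batching \
  --tensor-split 12,12 \
  --split-mode layer \
  > /tmp/llama-server.log 2>&1 &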

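By default the server listens on http://localhost:8080; pass --host and --port to change this. The command above uses --host 0.0.0.0 and --port 12345, so substitute that port in the URLs here. The web front end is served at http://localhost:8080/ and the OpenAI-compatible API at http://localhost:8080/v1/.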
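
A large model can take a while to load, and requests sent before loading finishes will fail. llama-server exposes a /health endpoint that answers with HTTP 200 once the model is ready, so a script can wait on it. This is a minimal sketch; `wait_for_health` is a hypothetical helper, and the retry count and 2-second interval are arbitrary choices.

```shell
# Hypothetical helper: poll <base_url>/health until it returns HTTP 200,
# or give up after a number of attempts (default 30, 2 seconds apart).
wait_for_health() {
  base_url="$1"
  tries="${2:-30}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl fail on HTTP errors such as 503 while the model loads.
    if curl -sf --max-time 2 "$base_url/health" >/dev/null 2>&1; then
      return 0
    fi
    [ "$i" -lt "$tries" ] && sleep 2
    i=$((i + 1))
  done
  return 1
}
```

For the server started on port 12345, usage might look like `wait_for_health http://localhost:12345 && echo "server ready"`.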

Test the connection

Replace the example IP address in the commands with your server's address.

OpenAI format

curl -X POST http://123.23.23.23:12345/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello, how are you?",
    "max_tokens": 50
  }'

Ollama format

curl http://123.23.23.23:12345/api/tags

Anthropic format

curl -X POST http://123.23.23.23:12345/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "max_tokens": 50,
    "messages": [
      {"role": "user", "content": "Write a Python function to check if a number is prime"}
    ]
  }'

Import into coding tools

Codex configuration (config.toml)

# If you hit the error "'type' of tool must be 'function'", add the following line
web_search = "disabled"

model_provider = "ughostx"
model_reasoning_effort = "high"
model = "Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled.Q8_0.gguf"
 
[model_providers.ughostx]
name = "ughostx"
base_url = "http://23.23.23.23:12345/v1"
wire_api = "responses"

Claude Code configuration (settings.json)

{
  "env": {
    "ANTHROPIC_API_KEY": "sk-no-key-required",
    "ANTHROPIC_BASE_URL": "http://23.23.23.23:12345",
    "ANTHROPIC_MODEL": "Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled.Q8_0.gguf",
    "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1"
  },
  "language": "简体中文"
}

AI deployment tutorials