# new-api Channel Configuration
After first start, access the new-api web UI at `http://<server>:4000` to configure channels.

Default admin credentials are `root` / `123456` — change them immediately.
## API Token for Open WebUI

Create an API token in new-api's token management and use it as `OPENWEBUI_API_KEY` in `.env`.
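A minimal `.env` fragment might look like the following (the token value is a placeholder; only the variable names used elsewhere in this document are assumed):

```shell
# Token created in new-api's token management, consumed by Open WebUI
OPENWEBUI_API_KEY=sk-xxxxxxxxxxxxxxxx

# Provider keys referenced by the channel tables below
DEEPINFRA_API_KEY=...
SILICONFLOW_API_KEY=...
OPENROUTER_API_KEY=...
GROQ_API_KEY=...
CEREBRAS_API_KEY=...
```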
## Channels to Create

Configure each channel via Channels > Add Channel in the web UI.
### 1. DeepInfra (Priority 1)

| Field | Value |
|-------|-------|
| Name | DeepInfra |
| Type | OpenAI |
| Base URL | `https://api.deepinfra.com/v1/openai` |
| Key | `$DEEPINFRA_API_KEY` |
| Priority | 1 |
| Models | See model mapping below |
### 2. SiliconFlow (Priority 2)

| Field | Value |
|-------|-------|
| Name | SiliconFlow |
| Type | OpenAI |
| Base URL | `https://api.siliconflow.com/v1` |
| Key | `$SILICONFLOW_API_KEY` |
| Priority | 2 |
| Models | See model mapping below |
### 3. OpenRouter (Priority 3)

| Field | Value |
|-------|-------|
| Name | OpenRouter |
| Type | OpenAI |
| Base URL | `https://openrouter.ai/api/v1` |
| Key | `$OPENROUTER_API_KEY` |
| Priority | 3 |
| Models | See model mapping below |
### 4. Groq (Priority 1)

| Field | Value |
|-------|-------|
| Name | Groq |
| Type | OpenAI |
| Base URL | `https://api.groq.com/openai/v1` |
| Key | `$GROQ_API_KEY` |
| Priority | 1 |
| Models | llama-3.3-70b |
### 5. Cerebras (Priority 1)

| Field | Value |
|-------|-------|
| Name | Cerebras |
| Type | OpenAI |
| Base URL | `https://api.cerebras.ai/v1` |
| Key | `$CEREBRAS_API_KEY` |
| Priority | 1 |
| Models | llama-3.3-70b-cerebras |
## Model Mapping per Channel

new-api uses model aliasing: the "model name" is what clients see; the "actual model" is what is sent to the provider.
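Conceptually, the aliasing behaves like a dictionary lookup (an illustrative sketch, not new-api's actual implementation; the mapping shown is a subset of the DeepInfra table below):

```python
# Client-facing model names map to provider-specific model IDs.
ALIAS_MAP = {
    "deepseek-v3.2": "deepseek-ai/DeepSeek-V3.2",
    "gpt-oss": "openai/gpt-oss-120b",
    "kimi-k2": "moonshotai/Kimi-K2-Instruct-0905",
}

def resolve_model(client_model: str) -> str:
    """Return the provider model sent upstream; unmapped names pass through."""
    return ALIAS_MAP.get(client_model, client_model)

print(resolve_model("gpt-oss"))  # openai/gpt-oss-120b
```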
### DeepInfra Models

| Client Model Name | Actual Provider Model |
|-------------------|-----------------------|
| deepseek-v3.2 | deepseek-ai/DeepSeek-V3.2 |
| deepseek-r1 | deepseek-ai/DeepSeek-R1 |
| gpt-oss | openai/gpt-oss-120b |
| gpt-oss-20b | openai/gpt-oss-20b |
| nemotron-super | nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 |
| nemotron-nano | nvidia/NVIDIA-Nemotron-Nano-9B-v2 |
| devstral | mistralai/Devstral-Small-2505 |
| glm-4.6 | zai-org/GLM-4.6 |
| glm-4.7 | zai-org/GLM-4.7 |
| glm-5 | zai-org/GLM-5 |
| kimi-k2 | moonshotai/Kimi-K2-Instruct-0905 |
| kimi-k2.5 | moonshotai/Kimi-K2.5 |
| deepseek-v3-free | deepseek-ai/DeepSeek-V3 |
### SiliconFlow Models

| Client Model Name | Actual Provider Model |
|-------------------|-----------------------|
| deepseek-v3.2 | deepseek-ai/DeepSeek-V3.2 |
| glm-4.7 | THUDM/GLM-4-32B-0414 |
| kimi-k2 | moonshotai/Kimi-K2-Instruct-0905 |
| qwen3-coder | Qwen/Qwen3-Coder-480B-A35B-Instruct |
| qwen3-coder-30b | Qwen/Qwen3-Coder-30B-A3B-Instruct |
### OpenRouter Models

| Client Model Name | Actual Provider Model |
|-------------------|-----------------------|
| deepseek-v3.2 | deepseek/deepseek-chat-v3-0324 |
| deepseek-v3-free | deepseek/deepseek-chat-v3-0324:free |
| kimi-k2.5 | moonshotai/kimi-k2.5 |
| minimax-m2.5 | minimax/minimax-m2.5 |
| gpt-4.1-mini | openai/gpt-4.1-mini |
| gpt-4.1 | openai/gpt-4.1 |
| gemini-3-flash-preview | google/gemini-3-flash-preview |
| gemini-2.5-pro | google/gemini-2.5-pro-preview |
| claude-sonnet | anthropic/claude-sonnet-4 |
| trinity-large-preview | arcee-ai/trinity-large-preview |
### Groq Models

| Client Model Name | Actual Provider Model |
|-------------------|-----------------------|
| llama-3.3-70b | llama-3.3-70b-versatile |
### Cerebras Models

| Client Model Name | Actual Provider Model |
|-------------------|-----------------------|
| llama-3.3-70b-cerebras | llama-3.3-70b |
## Fallback Behavior

new-api handles fallbacks via priority levels:

- When a model exists on multiple channels, the highest-priority (lowest-number) channel is tried first.
- If that channel fails, new-api automatically falls back to the next priority level.

For example, deepseek-v3.2 exists on:

- DeepInfra (priority 1) — tried first
- SiliconFlow (priority 2) — fallback
- OpenRouter (priority 3) — last resort
# Grafana Setup

After first start, access Grafana at `http://<server>:3001`:

- Log in with `admin` / `$GRAFANA_ADMIN_PASSWORD`
- Add a data source: Prometheus, with URL `http://victoriametrics:8428`
- Import dashboards:
  - Node Exporter Full: dashboard ID 1860
  - Redis: dashboard ID 763
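Instead of adding the data source by hand, it can also be declared via Grafana's file-based provisioning (a sketch; the file path and data source name are assumptions):

```yaml
# /etc/grafana/provisioning/datasources/victoriametrics.yml
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus        # VictoriaMetrics speaks the Prometheus query API
    access: proxy
    url: http://victoriametrics:8428
    isDefault: true
```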