AI Inference API
Built for Scale
OpenAI-compatible API for chat completions, embeddings, and RAG. Deploy your own custom endpoints or use our shared infrastructure. Start free, scale infinitely.
```bash
# Chat completion - OpenAI compatible
curl https://api.solidrust.ai/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vllm-primary",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Trusted by innovative teams
Powering real applications
See how teams are building with our inference platform
The vLLM-powered chat API handles thousands of gaming queries daily. Response times are incredible, and the RAG pipeline means our AI assistant always has up-to-date game knowledge.
MyAshes.ai
Gaming AI Assistant
Running our AI Council with per-member tool access was seamless. The agent endpoints let each council member have their own specialized capabilities while sharing the same inference backend.
Aidiant.com
AI Council Platform
Hybrid RAG search transformed our community wiki. Players can ask natural language questions and get instant, accurate answers from our knowledge base.
SolidRusT.net
Zone AI Companion
Everything you need to build with AI
Production-ready APIs with enterprise-grade reliability. OpenAI compatible, so you can switch with zero code changes.
Chat Completions
OpenAI-compatible chat API powered by Qwen3-4B. Streaming support, function calling, and context management.
/v1/chat/completions

Text Embeddings
1024-dimensional vectors using BAAI/bge-m3. Perfect for semantic search, clustering, and RAG applications.
/v1/embeddings

RAG Pipeline
Built-in retrieval-augmented generation. Semantic search, keyword search, and knowledge graph queries.
/data/v1/query/*

AI Agents
Tool-enabled agents with automatic function execution. Build complex workflows with natural language.
/v1/agent/chat

Custom Endpoints
Deploy your own backend with dedicated routes. PostgreSQL database, custom APIs, and priority support.
/your-api/*

Usage Analytics
Real-time dashboards, token tracking, and cost monitoring. Export data for your own analysis.
Dashboard

Simple, powerful API
Get started in minutes with our OpenAI-compatible endpoints. No vendor lock-in.
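The chat endpoint also advertises function calling; since the API is OpenAI-compatible, a request would follow the OpenAI `tools` schema. A minimal sketch of building such a payload; the `get_weather` tool and its parameters are illustrative, not part of this API's documentation:

```python
import json

# Illustrative tool definition in the OpenAI "tools" schema; get_weather
# is a made-up example, not a built-in of this API.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "vllm-primary",
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

# This JSON body would be POSTed to /v1/chat/completions with the
# usual Authorization header.
body = json.dumps(payload)
```

When the model decides to call the tool, the response carries a `tool_calls` entry whose arguments your code executes before sending the result back as a `tool` role message.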
Get your API key from console.solidrust.ai
```bash
curl -X POST https://api.solidrust.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "vllm-primary",
    "messages": [{"role": "user", "content": "Say hello and introduce yourself briefly."}],
    "max_tokens": 512,
    "temperature": 0.7
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.solidrust.ai/v1",
    api_key="your-api-key",
)

# Chat completion with streaming
response = client.chat.completions.create(
    model="vllm-primary",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    stream=True,
)

for chunk in response:
    # delta.content can be None on the final chunk, so guard it
    print(chunk.choices[0].delta.content or "", end="")
```

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.solidrust.ai/v1',
  apiKey: 'your-api-key',
});

// Chat completion with streaming
const stream = await client.chat.completions.create({
  model: 'vllm-primary',
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```

Available Endpoints
| Endpoint | Description |
|---|---|
| /v1/chat/completions | Chat completions (OpenAI compatible) |
| /v1/embeddings | Text embeddings (1024-dim) |
| /data/v1/query/semantic | Semantic vector search |
| /data/v1/query/hybrid | Hybrid search (vector + keyword + graph) |
| /v1/agent/chat | Tool-enabled AI agent |
| /v1/models | List available models |
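The /v1/embeddings endpoint returns 1024-dimensional bge-m3 vectors, and ranking by cosine similarity is the standard way to turn those vectors into semantic search. A runnable sketch with tiny stand-in vectors; real ones would come from the API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Stand-in 3-dim vectors for illustration; in practice each would be a
# 1024-dim embedding returned by POST /v1/embeddings.
corpus = {
    "boss spawn timers": [0.9, 0.1, 0.0],
    "crafting recipes": [0.1, 0.8, 0.2],
    "server maintenance": [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]

# Rank documents by similarity to the query vector.
ranked = sorted(
    corpus,
    key=lambda doc: cosine_similarity(query_vec, corpus[doc]),
    reverse=True,
)
print(ranked[0])  # → boss spawn timers
```

The hosted /data/v1/query/semantic endpoint performs this ranking server-side over your indexed documents, so the local math here is only needed if you manage your own vector store.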
Simple, transparent pricing
Start free with shared infrastructure, or deploy your own custom backend for premium features.
Standard
Perfect for getting started
- Chat completions API
- Text embeddings API
- RAG pipeline access
- AI agent endpoint
- 1,000 free requests/month
- Community support
Custom Starter
Your own dedicated backend
- Everything in Standard
- Custom API routes
- Dedicated K8s namespace
- PostgreSQL database
- Redis cache allocation
- Email support
Custom Pro
For production workloads
- Everything in Starter
- Dedicated database
- Custom data connectors (2)
- 100K requests included
- 20% usage discount
- Priority Slack support
Usage-Based Pricing
| Endpoint | Standard | Custom |
|---|---|---|
| Chat Completions | $0.01 | $0.05 |
| Embeddings | $0.005 | $0.02 |
| RAG Queries | $0.02 | $0.05 |
| Agent Calls | $0.03 | $0.08 |
Custom tier pricing applies to requests routed through your dedicated backend.
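For rough budgeting, the table's rates can be plugged into a small estimator. A sketch assuming the listed prices are per request (the billing unit is an assumption, not stated in the table) and applying the Custom Pro details above: 100K included requests and the 20% usage discount. Deducting the included requests from chat traffic first is also an assumption made for illustration.

```python
# Assumed per-request Custom-tier rates taken from the table above
# (the per-request unit is an assumption).
CUSTOM_RATES = {"chat": 0.05, "embeddings": 0.02, "rag": 0.05, "agent": 0.08}

def estimate_custom_pro_bill(usage, included=100_000, discount=0.20):
    """Estimate a monthly Custom Pro bill from request counts.

    usage: dict mapping endpoint kind -> request count. Included
    requests are deducted from chat traffic first (an illustrative
    assumption); the 20% usage discount applies to the remainder.
    """
    billable_chat = max(usage.get("chat", 0) - included, 0)
    total = billable_chat * CUSTOM_RATES["chat"]
    for kind in ("embeddings", "rag", "agent"):
        total += usage.get(kind, 0) * CUSTOM_RATES[kind]
    return round(total * (1 - discount), 2)

# 150K chat + 50K embeddings + 10K RAG requests in a month:
bill = estimate_custom_pro_bill({"chat": 150_000, "embeddings": 50_000, "rag": 10_000})
print(bill)  # → 3200.0
```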