Production-ready AI inference

AI Inference API
Built for Scale

OpenAI-compatible API for chat completions, embeddings, and RAG. Deploy your own custom endpoints or use our shared infrastructure. Start free, scale infinitely.

curl
# Chat completion - OpenAI compatible
curl https://api.solidrust.ai/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-4b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
99.9% Uptime SLA
<50ms P99 Latency
5 GPU Nodes

Everything you need to build with AI

Production-ready APIs with enterprise-grade reliability. OpenAI compatible, so switching takes nothing more than pointing your existing SDK at a new base URL.

💬

Chat Completions

OpenAI-compatible chat API powered by Qwen3-4B. Streaming support, function calling, and context management.

/v1/chat/completions
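
The chat card above also lists function calling. Here is a minimal sketch using the OpenAI SDK; the get_weather tool and its schema are placeholders, and it assumes the endpoint accepts the standard OpenAI tools parameter.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.solidrust.ai/v1",
    api_key="your-api-key"
)

# Placeholder tool definition so the model can request a function call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="qwen3-4b",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# If the model chose to call the tool, the requested call appears here
print(response.choices[0].message.tool_calls)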
🔤

Text Embeddings

1024-dimensional vectors using BAAI/bge-m3. Perfect for semantic search, clustering, and RAG applications.

/v1/embeddings
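
Because the embeddings endpoint follows the OpenAI schema, the standard SDK call works as-is. A minimal sketch follows; the model identifier "bge-m3" is an assumption, so check /v1/models for the exact ID.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.solidrust.ai/v1",
    api_key="your-api-key"
)

# Embed a small batch of texts; model ID assumed, see /v1/models
response = client.embeddings.create(
    model="bge-m3",
    input=["semantic search example", "retrieval-augmented generation"]
)

vectors = [item.embedding for item in response.data]
print(len(vectors), len(vectors[0]))  # expect 2 vectors, 1024 dimensions each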
🔍

RAG Pipeline

Built-in retrieval-augmented generation. Semantic search, keyword search, and knowledge graph queries.

/data/v1/query/*
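
The RAG endpoints sit outside the OpenAI surface, so here is a plain-HTTP sketch of a semantic query; the request fields shown (query, top_k) are assumptions about the schema, not documented parameters.

Python
import requests

API_KEY = "your-api-key"

# Semantic search sketch; the body fields (query, top_k) are assumed
resp = requests.post(
    "https://api.solidrust.ai/data/v1/query/semantic",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": "How do I rotate my API key?", "top_k": 5},
)
resp.raise_for_status()
print(resp.json())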
🤖

AI Agents

Tool-enabled agents with automatic function execution. Build complex workflows with natural language.

/v1/agent/chat
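
A quick sketch of calling the agent endpoint over plain HTTP, assuming it accepts an OpenAI-style messages array; the exact request schema may differ.

Python
import requests

API_KEY = "your-api-key"

# Agent call sketch; assumes an OpenAI-style messages payload
resp = requests.post(
    "https://api.solidrust.ai/v1/agent/chat",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "messages": [
            {"role": "user", "content": "Summarize our product docs and draft a launch announcement."}
        ]
    },
)
resp.raise_for_status()
print(resp.json())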

Custom Endpoints

Deploy your own backend with dedicated routes. PostgreSQL database, custom APIs, and priority support.

/your-api/*
📊

Usage Analytics

Real-time dashboards, token tracking, and cost monitoring. Export data for your own analysis.

Dashboard
OpenAI Compatible — Use your existing code and SDKs

Simple, powerful API

Get started in minutes with our OpenAI-compatible endpoints. No vendor lock-in.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.solidrust.ai/v1",
    api_key="your-api-key"
)

# Chat completion with streaming
response = client.chat.completions.create(
    model="qwen3-4b",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
JavaScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.solidrust.ai/v1',
  apiKey: 'your-api-key',
});

// Chat completion with streaming
const stream = await client.chat.completions.create({
  model: 'qwen3-4b',
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Available Endpoints

Endpoint Description
/v1/chat/completions Chat completions (OpenAI compatible)
/v1/embeddings Text embeddings (1024-dim)
/data/v1/query/semantic Semantic vector search
/data/v1/query/hybrid Hybrid search (vector + keyword + graph)
/v1/agent/chat Tool-enabled AI agent
/v1/models List available models
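
Since the API is OpenAI compatible, /v1/models can be queried with the standard SDK call. A short sketch:

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.solidrust.ai/v1",
    api_key="your-api-key"
)

# Print the IDs of the models the API currently serves
for model in client.models.list():
    print(model.id)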

Simple, transparent pricing

Start free with shared infrastructure, or deploy your own custom backend for premium features.

Standard

Perfect for getting started

Free to start
  • Chat completions API
  • Text embeddings API
  • RAG pipeline access
  • AI agent endpoint
  • 1,000 free requests/month
  • Community support
Start Free

Custom Pro

For production workloads

$199/month
  • Everything in Standard
  • Dedicated database
  • Custom data connectors (2)
  • 100K requests included
  • 20% usage discount
  • Priority Slack support
Contact Sales

Usage-Based Pricing

Endpoint Standard Custom
Chat Completions $0.01 $0.05
Embeddings $0.005 $0.02
RAG Queries $0.02 $0.05
Agent Calls $0.03 $0.08

Custom tier pricing applies to requests routed through your dedicated backend.