Guide

1. Product Overview

TokenBay is an API relay platform that aggregates leading large language models. We provide developers with models from OpenAI, Anthropic Claude, Google Gemini, DeepSeek, and many others through a unified OpenAI-compatible API. With one API key and one Base URL, you can call leading models from around the world.

Core Advantages

Unified API: Fully compatible with the OpenAI API format, so existing code can usually migrate by changing one line
Transparent pricing: Every model is clearly priced; pay only for what you use, with no subscription or hidden conditions
Stable and fast: Multi-route load balancing with direct access from mainland China and no proxy required
Broad model selection: Aggregates dozens of models, including GPT, Claude, Gemini, DeepSeek, and Qwen
Flexible management: Supports multiple keys, credit limits, usage statistics, and team collaboration

Intended Users

AI application developers
Enterprise engineering teams
Individuals learning and researching AI
Product teams that need fast access to multiple models

2. Quick Integration

Step 1: Create an API Key

Register or sign in to the console
Create an API key in “API Keys” and copy it

Step 2: Enter the URL and Key

Open the tool or SDK you want to connect
Enter the following Base URL and API key in its settings

Base URL: https://api.tokenbay.com/v1
API Key:  sk-xxxxxxxxxxxxxxxxxxxxxxxx

Step 3: Make a Request

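from openai import OpenAI

# Point the OpenAI SDK at TokenBay by overriding the base URL.
client = OpenAI(
    base_url="https://api.tokenbay.com/v1",
    api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxx",  # replace with your own key
)

# Send a single-turn chat request.
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
)

# Print the text of the first choice.
print(response.choices[0].message.content)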
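
Because the API follows the OpenAI format, the same call can also be made without the SDK. The sketch below builds the equivalent HTTP request with Python's standard library. It assumes TokenBay accepts the key as an `Authorization: Bearer` header, which is how the OpenAI SDK sends it; `build_chat_request` and `send` are illustrative helper names, not part of any TokenBay SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.tokenbay.com/v1"


def build_chat_request(api_key, model, user_message, base_url=BASE_URL):
    """Build (but do not send) a POST request to the chat completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: bearer-token auth, as used by the OpenAI SDK.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Calling `send(build_chat_request(key, "gpt-5.4", "Hello"))` should return the same text the SDK example prints, provided the assumptions above hold.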

3. Billing

Billing Model

Pay as you go: Charges are based on actual usage
No subscription or monthly fee: No mandatory subscription, and your account balance is always available
Prepaid balance: Top up your account balance in advance, and charges are deducted in real time

Pricing Rules

Pricing for every model is publicly displayed on the “Model Pricing” page
Input and output prices are usually different, with output generally costing more
Some models support discounts for cache hits through Prompt Caching
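
To make the input/output split concrete, the sketch below estimates the cost of one request. It assumes prices are quoted per one million tokens and that cached input tokens are billed at a separate, lower rate. Both the unit and the numbers in the example are placeholders, not TokenBay prices; check the “Model Pricing” page for real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price sheet, in currency units per 1,000,000 tokens."""
    input_per_million: float
    output_per_million: float
    cached_input_per_million: float


def estimate_cost(price, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the charge for one request.

    cached_tokens is the part of input_tokens served from the prompt cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * price.input_per_million
        + cached_tokens * price.cached_input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


# Placeholder prices for illustration only.
example_price = ModelPrice(
    input_per_million=2.0,
    output_per_million=8.0,
    cached_input_per_million=0.5,
)
cost = estimate_cost(
    example_price, input_tokens=10_000, output_tokens=2_000, cached_tokens=4_000
)
print(f"Estimated cost: {cost:.4f}")
```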

Top-ups and Balance

Supports multiple payment methods
Top-ups are credited in real time
Balance remains valid permanently and does not expire
Supports low-balance alert notifications

4. Models

Supported Model Categories

See the “Model Pricing” page for the complete model list and real-time prices.

Model Selection Recommendations

Goal	Recommended Models
Highest quality	gpt-5.5, gpt-5.4, claude-opus-4.8, claude-opus-4.7, gemini-3.1-pro-preview, and more
Best value	gpt-5.4-mini, gemini-2.5-flash, gemini-3.5-flash, qwen3.5-flash, and more
Complex reasoning	gpt-5.5, claude-opus-4.8, deepseek-v4-pro, qwen3.7-max, and more
Long-text processing	gemini-3.1-pro-preview, gemini-2.5-pro, claude-sonnet-4.6, kimi-k2.6, and more

API Compatibility

TokenBay is fully compatible with the OpenAI API format and supports the following endpoints:

Endpoint	Purpose
`/v1/chat/completions`	Chat completions with streaming support
`/v1/images/generations`	Image generation
`/v1/models`	Get the model list

The public return scope of /v1/models is still being finalized. Use the Model Pricing page and the console Models page as the source of truth for model IDs, pricing, modalities, and availability.
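
If you do query /v1/models, a defensive client is a reasonable choice while the response scope is unsettled. The sketch below assumes the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`) and skips entries that do not match. `extract_model_ids` and `fetch_model_ids` are illustrative names, and the bearer-token header is an assumption based on OpenAI compatibility.

```python
import json
import urllib.request


def extract_model_ids(payload):
    """Return sorted model IDs from an OpenAI-style model list payload.

    Entries without a string "id" are skipped instead of raising.
    """
    entries = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(entries, list):
        return []
    ids = {
        entry["id"]
        for entry in entries
        if isinstance(entry, dict) and isinstance(entry.get("id"), str)
    }
    return sorted(ids)


def fetch_model_ids(api_key, base_url="https://api.tokenbay.com/v1", timeout=30):
    """Call GET /models and return the model IDs it reports."""
    request = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))
```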

Model Debugging

On the “Model Usage” page, you can:

Debug online: Test model responses directly through chat without writing code
Adjust parameters: Configure parameters required by text, image, video, and other models
Compare multiple models: Ask multiple models the same question and compare their responses
Copy code quickly: Generate the corresponding code snippet after debugging

5. Features

1. API Key Management

On the “API Key Management” page, you can:

Create multiple keys: Create separate keys for different projects or applications to simplify management and statistics
Set credit limits: Cap how much each key can spend to prevent accidental overspending
Set model permissions: Restrict a key to specific models
Set an expiration date: Specify when a key expires
Enable / disable / delete: Control the status of a key at any time
View independent usage: View usage for each individual key

Security recommendation: Treat an API key like a password. Do not commit it to GitHub or expose it publicly. If it is leaked, disable it in the console and generate a new one immediately.
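
One common way to keep keys out of source control is to load them from the environment at startup. The sketch below assumes an environment variable named `TOKENBAY_API_KEY`; the name is a convention chosen for this example, not something the platform requires.

```python
import os


def load_api_key(env_var="TOKENBAY_API_KEY"):
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before starting the app")
    return key


def mask_key(key, visible=4):
    """Return a log-safe form of the key that shows only its last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Pass `load_api_key()` as the `api_key` argument when constructing the client, and write `mask_key(...)` to logs instead of the key itself.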

2. Usage Statistics and Billing

On the “Usage Logs” page, you can:

View daily, weekly, and monthly usage trends
View each project’s usage by key
View the usage of different models
View details for every request, including time, model, token count, and cost
Export billing data as CSV

3. Top-ups, Invoices, and Billing

Supports cards, Google Pay, Apple Pay, Link, and other payment methods
Top-ups are credited in real time, and the balance remains valid permanently
A payment receipt is sent automatically by email after the top-up is completed

4. Teams and Invitations (Being Upgraded)

On the “Organization Management” page, you can:

Organization management: Create organizations and manage members
Invite members: Invite colleagues through an invitation link or email
Member keys: Allow organization members to create their own keys
Credit management: Set a separate monthly credit allowance for each member
Permission control: Distinguish administrators, regular members, and other roles
Unified billing: Deduct all member usage from the primary account

6. Best Practices

1. Control Costs

Use cost-effective models for simple tasks first
Use flagship models for complex tasks
Use Prompt Caching to reduce the cost of repeated input
Set a credit limit for every key to prevent overspending if a key is leaked
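
The first two points amount to routing by task difficulty. A minimal sketch, assuming the caller can label each task as simple or complex; the model IDs come from the recommendation table in this guide, and the two-tier split is an illustrative choice.

```python
# Model IDs are taken from the recommendation table; adjust them to your needs.
MODEL_BY_TIER = {
    "simple": "gpt-5.4-mini",  # listed under "Best value"
    "complex": "gpt-5.5",      # listed under "Highest quality"
}


def choose_model(tier):
    """Map a task tier to a model ID, defaulting to the cheaper model."""
    return MODEL_BY_TIER.get(tier, MODEL_BY_TIER["simple"])
```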

2. Improve Reliability

Retry failed requests with exponential backoff; one retry is recommended
Set a reasonable timeout; 60–120 seconds is recommended
Monitor error rates and switch models when necessary
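
These recommendations can be combined in a small wrapper. The sketch below retries a call with exponential backoff and then falls back to the next model in a list. `call_model` is a hypothetical callable you supply (for example, a function that wraps `client.chat.completions.create`), and the delay values are placeholders.

```python
import time


def call_with_retry(call_model, models, max_retries=1, base_delay=1.0, sleep=time.sleep):
    """Try each model in order, retrying failures with exponential backoff.

    call_model(model) should return a result or raise an exception on failure.
    max_retries counts extra attempts after the first one for each model.
    """
    last_error = None
    for model in models:
        for attempt in range(max_retries + 1):
            try:
                return call_model(model)
            except Exception as exc:  # narrow this to your client's error types
                last_error = exc
                if attempt < max_retries:
                    # Delay doubles each time: base_delay, 2x, 4x, ...
                    sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all models failed") from last_error
```

The OpenAI Python SDK also accepts `timeout` and `max_retries` arguments on the client constructor, which may be enough if you do not need model fallback.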

3. Streaming Responses

For chat applications, use stream=True so users can see the response as it is generated:

from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tokenbay.com/v1",
    api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxx",
)
 
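stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

# Print each piece of text as soon as it arrives.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()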

4. Error Handling

Common status codes and typical scenarios:

Status	Meaning	Typical Scenario
200	Success	The request succeeded; an asynchronous task may still report failure through `status: "failed"`
400	Client request error	Invalid parameters, missing required fields, body parsing failure, or a missing task
401	Authentication failed	The token is missing, invalid, or disabled
403	Permission denied	The user is blocked, credit is insufficient, the group or model is unauthorized, or an IP restriction failed
404	Resource not found	No route matched, or a task is missing from the video content proxy
413	Request body too large	The request body exceeds the size limit
429	Rate limited	Model-level rate limiting, global API limiting, or saturated upstream capacity
500	Internal server error	Request conversion, upstream calls, response parsing, or serialization failed
501	Not implemented	The endpoint or conversion is not implemented
502	Gateway error	The video content proxy could not fetch an upstream URL, or the upstream returned an invalid response
503	Service unavailable	No channel is available, system resources are overloaded, or a channel has no available key
504	Upstream timeout	The channel exceeded its response-time limit

For retryable errors, use exponential backoff and cap the retry count. Fix authentication, permission, and parameter errors before sending the request again.
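
A small classifier keeps this decision in one place. The sketch below treats 429 and the 5xx codes other than 501 as retryable, and everything else as needing a fix first. That grouping is an assumption drawn from the table above, not an official TokenBay policy, and the delay values are placeholders.

```python
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})  # assumed grouping


def is_retryable(status_code):
    """Return True if a request with this HTTP status is worth retrying."""
    return status_code in RETRYABLE_STATUSES


def backoff_delays(max_retries, base_delay=1.0, max_delay=30.0):
    """Return the capped exponential delays (in seconds) for each retry."""
    return [min(base_delay * (2 ** i), max_delay) for i in range(max_retries)]
```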