Guide
1. Product Overview
TokenBay is an API relay platform that aggregates leading large language models. We provide developers with models from OpenAI, Anthropic Claude, Google Gemini, DeepSeek, and many others through a unified OpenAI-compatible API. With one API key and one Base URL, you can call leading models from around the world.
Core Advantages
- Unified API: Fully compatible with the OpenAI API format, so existing code can usually migrate by changing one line
- Transparent pricing: Every model is clearly priced; pay only for what you use, with no subscription or hidden conditions
- Stable and fast: Multi-route load balancing with direct access from mainland China and no proxy required
- Broad model selection: Aggregates dozens of models, including GPT, Claude, Gemini, DeepSeek, and Qwen
- Flexible management: Supports multiple keys, credit limits, usage statistics, and team collaboration
Intended Users
- AI application developers
- Enterprise engineering teams
- Individuals learning and researching AI
- Products that need fast access to multiple models
2. Quick Integration
Step 1: Create an API Key
Step 2: Enter the URL and Key
- Prepare the tool you want to connect
- Enter the URL and key
Base URL: https://api.tokenbay.com/v1
API Key: sk-xxxxxxxxxxxxxxxxxxxxxxxxStep 3: Make a Request
from openai import OpenAI

# Point the OpenAI SDK at TokenBay by overriding the base URL.
client = OpenAI(
    base_url="https://api.tokenbay.com/v1",
    api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxx",
)
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
3. Billing
Billing Model
- Pay as you go: Charges are based on actual usage; pay only for what you use
- No subscription or monthly fee: No mandatory subscription, and your account balance is always available
- Prepaid balance: Top up your account balance in advance, and charges are deducted in real time
Pricing Rules
- Pricing for every model is publicly displayed on the “Model Pricing” page
- Input and output prices are usually different, with output generally costing more
- Some models support discounts for cache hits through Prompt Caching
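Because input and output tokens are priced separately, the cost of a request is the sum of two terms. The sketch below illustrates the arithmetic only: the two prices and the per-million-token unit are assumptions made for the example, not TokenBay rates. Take real values from the "Model Pricing" page.

```python
from decimal import Decimal

# Hypothetical prices per 1 million tokens (illustration only, not real rates).
INPUT_PRICE_PER_M = Decimal("2.00")
OUTPUT_PRICE_PER_M = Decimal("8.00")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the charge for one request, pricing input and output separately."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / TOKENS_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / TOKENS_PER_M
    return input_cost + output_cost


# Example: 1,200 input tokens and 300 output tokens.
print(estimate_cost(1200, 300))
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift.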
Top-ups and Balance
- Supports multiple payment methods
- Top-ups are credited in real time
- Balance remains valid permanently and does not expire
- Supports low-balance alert notifications
4. Models
Supported Model Categories
- See the “Model Pricing” page for the complete model list and real-time prices.
Model Selection Recommendations
| Goal | Recommended Models |
|---|---|
| Highest quality | gpt-5.5, gpt-5.4, claude-opus-4.8, claude-opus-4.7, gemini-3.1-pro-preview, and more |
| Best value | gpt-5.4-mini, gemini-2.5-flash, gemini-3.5-flash, qwen3.5-flash, and more |
| Complex reasoning | gpt-5.5, claude-opus-4.8, deepseek-v4-pro, qwen3.7-max, and more |
| Long-text processing | gemini-3.1-pro-preview, gemini-2.5-pro, claude-sonnet-4.6, kimi-k2.6, and more |
API Compatibility
TokenBay is fully compatible with the OpenAI API format and supports the following endpoints:
| Endpoint | Purpose |
|---|---|
| /v1/chat/completions | Chat completions with streaming support |
| /v1/images/generations | Image generation |
| /v1/models | Get the model list |
The public return scope of /v1/models is still being finalized. Use the Model Pricing page and the console Models page as the source of truth for model IDs, pricing, modalities, and availability.
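For a quick look at what /v1/models returns, a minimal sketch using only the Python standard library could look like this. It assumes Bearer-token authentication and an OpenAI-style list response with a "data" array of objects that carry an "id" field. Both are assumptions drawn from the OpenAI format, and the sample payload is illustrative only.

```python
import json
import urllib.request

BASE_URL = "https://api.tokenbay.com/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build a GET request for /v1/models (Bearer auth is an assumption)."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def extract_model_ids(payload: dict) -> list[str]:
    """Collect model IDs from an OpenAI-style response: {"data": [{"id": ...}]}."""
    return sorted(item["id"] for item in payload.get("data", []) if "id" in item)


def fetch_model_ids(api_key: str) -> list[str]:
    """Call the live endpoint; not executed in this example."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=60) as resp:
        return extract_model_ids(json.load(resp))


# Illustrative payload only; the real response fields may differ.
sample = {"object": "list", "data": [{"id": "gpt-5.4"}, {"id": "claude-opus-4.8"}]}
print(extract_model_ids(sample))
```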
Model Debugging
On the “Model Usage” page, you can:
- Debug online: Test model responses directly through chat without writing code
- Adjust parameters: Configure parameters required by text, image, video, and other models
- Compare multiple models: Ask multiple models the same question and compare their responses
- Copy code quickly: Generate the corresponding code snippet after debugging
5. Features
1. API Key Management
On the “API Key Management” page, you can:
- Create multiple keys: Create separate keys for different projects or applications to simplify management and statistics
- Set credit limits: Set a maximum available balance for each key to prevent accidental overspending
- Set model permissions: Restrict a key to specific models
- Set an expiration date: Specify when a key expires
- Enable / disable / delete: Control the status of a key at any time
- View independent usage: View usage for each individual key
Security recommendation: Treat an API key like a password. Do not commit it to GitHub or expose it publicly. If it is leaked, disable it in the console and generate a new one immediately.
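One common way to keep a key out of source control is to read it from an environment variable at startup. The following is a minimal sketch; the variable name TOKENBAY_API_KEY is a hypothetical choice for this example, not a name the platform defines.

```python
import os


def load_api_key(env_var: str = "TOKENBAY_API_KEY") -> str:
    """Read the API key from the environment so it never appears in the code."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

The returned value can then be passed as api_key when constructing the client, for example OpenAI(base_url="https://api.tokenbay.com/v1", api_key=load_api_key()).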
2. Usage Statistics and Billing
On the “Usage Logs” page, you can:
- View daily, weekly, and monthly usage trends
- View each project’s usage by key
- View the usage of different models
- View details for every request, including time, model, token count, and cost
- Export billing data as CSV
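An exported CSV can be summarized offline, for example to total the cost per model. The sketch below is only an illustration: the column names (time, model, tokens, cost) and the sample rows are made up, so adjust them to match the headers in the actual export.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def cost_by_model(csv_text: str) -> dict[str, Decimal]:
    """Sum the cost column per model; the column names are hypothetical."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["model"]] += Decimal(row["cost"])
    return dict(totals)


# Made-up rows for illustration; a real export may use different headers.
sample_export = """time,model,tokens,cost
2025-01-01T10:00:00Z,gpt-5.4,1500,0.0048
2025-01-01T10:05:00Z,gpt-5.4-mini,900,0.0006
2025-01-01T10:09:00Z,gpt-5.4,500,0.0012
"""

for model, total in cost_by_model(sample_export).items():
    print(model, total)
```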
3. Top-ups, Invoices, and Billing
- Supports cards, Google Pay, Apple Pay, Link, and other payment methods
- Top-ups are credited in real time, and the balance remains valid permanently
- A payment bill is automatically sent by email after the top-up is completed
4. Teams and Invitations (Being Upgraded)
On the “Organization Management” page, you can:
- Organization management: Create organizations and manage members
- Invite members: Invite colleagues through an invitation link or email
- Member keys: Allow organization members to create their own keys
- Credit management: Set a separate monthly credit allowance for each member
- Permission control: Distinguish administrators, regular members, and other roles
- Unified billing: Deduct all member usage from the primary account
6. Best Practices
1. Control Costs
- Use cost-effective models for simple tasks first
- Use flagship models for complex tasks
- Use Prompt Caching to reduce the cost of repeated input
- Set a credit limit for every key to prevent overspending if a key is leaked
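The first two points can be expressed as a small routing function. This is a sketch under stated assumptions: the model IDs come from the recommendation table in section 4, while the routing rule (a reasoning flag plus a prompt-length threshold) is a hypothetical heuristic for illustration.

```python
# Model IDs from the recommendation table; the routing rule is an assumption.
VALUE_MODEL = "gpt-5.4-mini"
FLAGSHIP_MODEL = "gpt-5.5"


def choose_model(
    prompt: str,
    needs_reasoning: bool = False,
    long_prompt_chars: int = 4000,
) -> str:
    """Pick the cheaper model unless the task looks complex."""
    if needs_reasoning or len(prompt) > long_prompt_chars:
        return FLAGSHIP_MODEL
    return VALUE_MODEL


print(choose_model("Translate 'hello' into French"))
print(choose_model("Plan a multi-step data migration", needs_reasoning=True))
```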
2. Improve Reliability
- Implement request retries with exponential backoff; one retry is recommended
- Set a reasonable timeout, with 60–120 seconds recommended
- Monitor error rates and switch models when necessary
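The retry recommendation can be sketched as a small wrapper around any request function. This is an illustrative helper, not part of any SDK: the names call_with_retry and base_delay are hypothetical, and the small random jitter is an added assumption. The sketch catches every exception for brevity; in practice it would be narrowed to errors that are worth retrying.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    max_retries: int = 1,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run request(); after a failure wait base_delay * 2**attempt and try again."""
    if max_retries < 0:
        raise ValueError("max_retries must be 0 or greater")
    for attempt in range(max_retries + 1):
        try:
            return request()
        except Exception:
            if attempt == max_retries:
                raise  # Retries exhausted: surface the last error.
            # Exponential backoff with up to 0.1 s of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise AssertionError("unreachable")
```

The OpenAI Python SDK also accepts timeout and max_retries arguments on the client constructor, which may be enough for simple cases.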
3. Streaming Responses
For chat applications, use stream=True so users can see the response as it is generated:
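from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokenbay.com/v1",
    api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxx",
)
stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
# Print each text fragment as soon as it arrives.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
4. Error Handling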
Common status codes and typical scenarios:
| Status | Meaning | Typical Scenario |
|---|---|---|
| 200 | Success | The request succeeded; an asynchronous task may still report failure through status: "failed" |
| 400 | Client request error | Invalid parameters, missing required fields, body parsing failure, or a missing task |
| 401 | Authentication failed | The token is missing, invalid, or disabled |
| 403 | Permission denied | The user is blocked, credit is insufficient, the group or model is unauthorized, or an IP restriction failed |
| 404 | Resource not found | No route matched, or a task is missing from the video content proxy |
| 413 | Request body too large | The request body exceeds the size limit |
| 429 | Rate limited | Model-level rate limiting, global API limiting, or saturated upstream capacity |
| 500 | Internal server error | Request conversion, upstream calls, response parsing, or serialization failed |
| 501 | Not implemented | The endpoint or conversion is not implemented |
| 502 | Gateway error | The video content proxy could not fetch an upstream URL, or the upstream returned an invalid response |
| 503 | Service unavailable | No channel is available, system resources are overloaded, or a channel has no available key |
| 504 | Upstream timeout | The channel exceeded its response-time limit |
For retryable errors, use exponential backoff and cap the retry count. Fix authentication, permission, and parameter errors before sending the request again.
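A minimal sketch of that decision, based on the status table above. The table does not say which codes are retryable, so the grouping here (429, 500, 502, 503, and 504 as transient) is an assumption, and the function name next_action is hypothetical.

```python
from typing import Optional

# Assumption: these statuses are treated as transient and worth retrying.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def next_action(http_status: int, task_status: Optional[str] = None) -> str:
    """Return "ok", "task_failed", "retry", or "fix" for a response."""
    if http_status == 200:
        # A 200 can still wrap an asynchronous task that reports status: "failed".
        return "task_failed" if task_status == "failed" else "ok"
    if http_status in RETRYABLE_STATUSES:
        return "retry"
    # Everything else (400, 401, 403, 404, 413, 501, ...) needs a change first.
    return "fix"


for status in (200, 401, 429, 503):
    print(status, next_action(status))
```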
