Guide

1. Product Overview

TokenBay is an API relay platform that aggregates leading large language models. We provide developers with models from OpenAI, Anthropic Claude, Google Gemini, DeepSeek, and many others through a unified OpenAI-compatible API. With one API key and one Base URL, you can call leading models from around the world.

Core Advantages

  • Unified API: Fully compatible with the OpenAI API format, so existing code can usually migrate by changing one line
  • Transparent pricing: Every model is clearly priced; pay only for what you use, with no subscription or hidden conditions
  • Stable and fast: Multi-route load balancing with direct access from mainland China and no proxy required
  • Broad model selection: Aggregates dozens of models, including GPT, Claude, Gemini, DeepSeek, and Qwen
  • Flexible management: Supports multiple keys, credit limits, usage statistics, and team collaboration

Intended Users

  • AI application developers
  • Enterprise engineering teams
  • Individuals learning and researching AI
  • Products that need fast access to multiple models

2. Quick Integration

Step 1: Create an API Key

  • Register or sign in to the console
  • Create an API key in “API Keys” and copy it

Step 2: Enter the URL and Key

  • Open the tool or SDK you want to connect
  • Enter the following Base URL and API key in its settings:
Base URL: https://api.tokenbay.com/v1
API Key:  sk-xxxxxxxxxxxxxxxxxxxxxxxx

Step 3: Make a Request

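from openai import OpenAI
 
# Point the official OpenAI SDK at TokenBay instead of api.openai.com.
client = OpenAI(
    base_url="https://api.tokenbay.com/v1",
    api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxx",  # replace with your own key
)
 
# Send a single-turn chat request.
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
)
 
# Print the text of the first (and here, only) choice.
print(response.choices[0].message.content)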

3. Billing

Billing Model

  • Pay as you go: Charges are based on actual usage; pay only for what you use
  • No subscription or monthly fee: No mandatory subscription, and your account balance is always available
  • Prepaid balance: Top up your account balance in advance, and charges are deducted in real time

Pricing Rules

  • Pricing for every model is publicly displayed on the “Model Pricing” page
  • Input and output prices are usually different, with output generally costing more
  • Some models support discounts for cache hits through Prompt Caching
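The pricing rules above can be turned into a rough per-request estimate. The following is a minimal sketch, not TokenBay's billing logic: it assumes prices are quoted per million tokens and that cached input tokens are billed at a separate, lower rate. The `ModelPrice` class, the `estimate_cost` function, and every number used with them are made up for illustration; take real prices from the “Model Pricing” page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price entry, in currency units per million tokens."""
    input_per_million: float
    output_per_million: float
    # Assumed discounted rate for cache hits; None means no discount applies.
    cached_input_per_million: Optional[float] = None


def estimate_cost(price, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the cost of one request.

    cached_tokens is the portion of input_tokens served from the prompt cache.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    cached_rate = price.cached_input_per_million
    if cached_rate is None:
        cached_rate = price.input_per_million  # no discount: bill as normal input
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * price.input_per_million
        + cached_tokens * cached_rate
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000
```

With made-up prices of 2.0 (input), 8.0 (output), and 0.5 (cached input) per million tokens, a request with 10,000 input tokens (4,000 of them cached) and 2,000 output tokens comes to (6,000 × 2.0 + 4,000 × 0.5 + 2,000 × 8.0) / 1,000,000 = 0.03.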

Top-ups and Balance

  • Supports multiple payment methods
  • Top-ups are credited in real time
  • Balance remains valid permanently and does not expire
  • Supports low-balance alert notifications

4. Models

Supported Model Categories

  • See the “Model Pricing” page for the complete model list and real-time prices.

Model Selection Recommendations

| Goal | Recommended Models |
| --- | --- |
| Highest quality | gpt-5.5, gpt-5.4, claude-opus-4.8, claude-opus-4.7, gemini-3.1-pro-preview, and more |
| Best value | gpt-5.4-mini, gemini-2.5-flash, gemini-3.5-flash, qwen3.5-flash, and more |
| Complex reasoning | gpt-5.5, claude-opus-4.8, deepseek-v4-pro, qwen3.7-max, and more |
| Long-text processing | gemini-3.1-pro-preview, gemini-2.5-pro, claude-sonnet-4.6, kimi-k2.6, and more |

API Compatibility

TokenBay is fully compatible with the OpenAI API format and supports the following endpoints:

| Endpoint | Purpose |
| --- | --- |
| /v1/chat/completions | Chat completions with streaming support |
| /v1/images/generations | Image generation |
| /v1/models | Get the model list |

The public return scope of /v1/models is still being finalized. Use the Model Pricing page and the console Models page as the source of truth for model IDs, pricing, modalities, and availability.
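For tools that do not use the OpenAI SDK, it can help to see what a raw request to the chat endpoint might look like. The sketch below only builds the request and does not send it. It assumes the usual OpenAI-style conventions (a JSON body and an `Authorization: Bearer` header), which this guide implies but does not spell out; `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.tokenbay.com/v1"


def build_chat_request(api_key, model, messages, stream=False):
    """Build (but do not send) a POST request to /chat/completions."""
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumed OpenAI-style bearer authentication.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen(request, timeout=60)` and parse the response body as JSON.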

Model Debugging

On the “Model Usage” page, you can:

  • Debug online: Test model responses directly through chat without writing code
  • Adjust parameters: Configure parameters required by text, image, video, and other models
  • Compare multiple models: Ask multiple models the same question and compare their responses
  • Copy code quickly: Generate the corresponding code snippet after debugging

5. Features

1. API Key Management

On the “API Key Management” page, you can:

  • Create multiple keys: Create separate keys for different projects or applications to simplify management and statistics
  • Set credit limits: Set a maximum available balance for each key to prevent accidental overspending
  • Set model permissions: Restrict a key to specific models
  • Set an expiration date: Specify when a key expires
  • Enable / disable / delete: Control the status of a key at any time
  • View independent usage: View usage for each individual key

Security recommendation: Treat an API key like a password. Do not commit it to GitHub or expose it publicly. If it is leaked, disable it in the console and generate a new one immediately.
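One common way to follow this recommendation is to keep the key out of source code and read it from an environment variable at startup. This is a minimal sketch; the variable name `TOKENBAY_API_KEY` and the helper `load_api_key` are assumptions made for this example, not names defined by TokenBay.

```python
import os


def load_api_key(env_var="TOKENBAY_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The result can then be passed to the client, for example `OpenAI(base_url="https://api.tokenbay.com/v1", api_key=load_api_key())`, so the key never appears in the repository.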

2. Usage Statistics and Billing

On the “Usage Logs” page, you can:

  • View daily, weekly, and monthly usage trends
  • View each project’s usage by key
  • View the usage of different models
  • View details for every request, including time, model, token count, and cost
  • Export billing data as CSV

3. Top-ups, Invoices, and Billing

  • Supports cards, Google Pay, Apple Pay, Link, and other payment methods
  • Top-ups are credited in real time, and the balance remains valid permanently
  • A payment bill is automatically sent by email after the top-up is completed

4. Teams and Invitations (Being Upgraded)

On the “Organization Management” page, you can:

  • Organization management: Create organizations and manage members
  • Invite members: Invite colleagues through an invitation link or email
  • Member keys: Allow organization members to create their own keys
  • Credit management: Set a separate monthly credit allowance for each member
  • Permission control: Distinguish administrators, regular members, and other roles
  • Unified billing: Deduct all member usage from the primary account

6. Best Practices

1. Control Costs

  • Start with cost-effective models for simple tasks
  • Use flagship models for complex tasks
  • Use Prompt Caching to reduce the cost of repeated input
  • Set a credit limit for every key to prevent overspending if a key is leaked

2. Improve Reliability

  • Retry failed requests with exponential backoff; one retry is recommended
  • Set a reasonable timeout; 60–120 seconds is recommended
  • Monitor error rates and switch models when necessary
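The retry advice above could be implemented roughly as follows. This is a sketch, not an official TokenBay client: `call_with_retry` is a hypothetical helper, the default of one retry mirrors the recommendation above, and the base delay, delay cap, and 10% jitter are illustrative choices.

```python
import random
import time


def call_with_retry(func, max_retries=1, base_delay=1.0, max_delay=30.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a retryable exception, wait and try again.

    The delay doubles on each attempt (base_delay, 2 * base_delay, ...),
    is capped at max_delay, and has up to 10% random jitter added.
    """
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1
```

In practice, narrow `retry_on` to timeout and connection errors so that authentication or parameter errors are not retried. The OpenAI Python SDK also accepts `timeout` and `max_retries` arguments on the client, which may be enough on their own for simple cases.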

3. Streaming Responses

For chat applications, use stream=True so users can see the response as it is generated:

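from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.tokenbay.com/v1",
    api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxx",
)
 
stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

# Print each piece of text as soon as it arrives.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()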

4. Error Handling

Common status codes and typical scenarios:

| Status | Meaning | Typical Scenario |
| --- | --- | --- |
| 200 | Success | The request succeeded; an asynchronous task may still report failure through status: "failed" |
| 400 | Client request error | Invalid parameters, missing required fields, body parsing failure, or a missing task |
| 401 | Authentication failed | The token is missing, invalid, or disabled |
| 403 | Permission denied | The user is blocked, credit is insufficient, the group or model is unauthorized, or an IP restriction failed |
| 404 | Resource not found | No route matched, or a task is missing from the video content proxy |
| 413 | Request body too large | The request body exceeds the size limit |
| 429 | Rate limited | Model-level rate limiting, global API limiting, or saturated upstream capacity |
| 500 | Internal server error | Request conversion, upstream calls, response parsing, or serialization failed |
| 501 | Not implemented | The endpoint or conversion is not implemented |
| 502 | Gateway error | The video content proxy could not fetch an upstream URL, or the upstream returned an invalid response |
| 503 | Service unavailable | No channel is available, system resources are overloaded, or a channel has no available key |
| 504 | Upstream timeout | The channel exceeded its response-time limit |

For retryable errors, use exponential backoff and cap the retry count. Fix authentication, permission, and parameter errors before sending the request again.
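One way to encode this rule in client code is a small classifier over the status codes listed above. The guide does not say exactly which codes count as retryable, so the set below (429, 500, 502, 503, 504) is an assumption; `classify_status` and `task_failed` are hypothetical helper names.

```python
# Assumed retryable set: rate limiting and transient server or upstream errors.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def classify_status(status):
    """Return "ok", "retry", or "fix" for an HTTP status code."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE_STATUSES:
        return "retry"
    # 400, 401, 403, 404, 413, 501, and anything else: change the request first.
    return "fix"


def task_failed(body):
    """Return True if a parsed 200 response body reports status: "failed"."""
    return isinstance(body, dict) and body.get("status") == "failed"
```

A caller would retry with backoff only when the result is "retry", and would still check `task_failed` on successful responses from asynchronous tasks.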