Execution-Layer Rate Limiting
LongTrainer 1.3.0 introduces a two-layer execution rate limiter built on a token-bucket algorithm. It prevents noisy-neighbor degradation in multi-tenant deployments and distributes resources fairly by enforcing limits at both the tenant and the bot level.
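As a refresher on the algorithm named above, here is a minimal token bucket sized for an RPM limit. This is an illustrative sketch, not LongTrainer's internal class. The `TokenBucket` name, the choice of capacity = `rpm` (a full minute of budget can be spent in one burst), and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal token bucket for an RPM limit (hypothetical sketch).

    ASSUMPTION: capacity equals `rpm`, and the bucket refills continuously
    at rpm / 60 tokens per second.
    """

    def __init__(self, rpm: float, clock: Callable[[], float] = time.monotonic) -> None:
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.rpm = float(rpm)
        self.clock = clock
        self.tokens = self.rpm  # start full
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.rpm, self.tokens + (now - self.last) * self.rpm / 60.0)
        self.last = now

    def try_acquire(self) -> bool:
        """Spend one token if available; return False when rate-limited."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def seconds_until_available(self) -> float:
        """Seconds until at least one whole token is available (0.0 if one is now)."""
        self._refill()
        return max(0.0, (1.0 - self.tokens) * 60.0 / self.rpm)
```

With `TokenBucket(rpm=60)`, 60 back-to-back calls succeed. After that, roughly one call per second is admitted as tokens trickle back in.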
Two-Layer Enforcement
- Layer 1: Tenant Ceiling: Every tenant has a hard RPM (requests per minute) cap for each resource. This prevents a single malicious or runaway tenant from exhausting shared infrastructure quotas (LLM API keys, vector DB limits, etc.).
- Layer 2: Bot Equal Share: Within a single tenant, the tenant's budget is divided equally among its active bots (tenant_rpm / active_bots). This prevents a single noisy bot from starving the tenant's other production bots.
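The two layers can be sketched as a pair of buckets that must both have a token before a call is admitted. This is a hypothetical illustration for one resource (for example, LLM calls), not LongTrainer's implementation. The `TwoLayerLimiter` and `register_bot` names are invented here, and so are two design choices: bots are registered explicitly, and existing bot buckets shrink to the new share when another bot becomes active.

```python
import time
from typing import Callable, Dict, Set, Tuple


class _Bucket:
    """Tiny token bucket: capacity `rpm`, refilled at rpm / 60 tokens per second."""

    def __init__(self, rpm: float, now: float) -> None:
        self.rpm, self.tokens, self.last = float(rpm), float(rpm), now

    def refill(self, now: float) -> None:
        self.tokens = min(self.rpm, self.tokens + (now - self.last) * self.rpm / 60.0)
        self.last = now


class TwoLayerLimiter:
    """Hypothetical two-layer limiter for ONE resource (e.g. LLM calls)."""

    def __init__(self, tenant_rpm: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.tenant_rpm = float(tenant_rpm)
        self.clock = clock
        self.tenants: Dict[str, _Bucket] = {}
        self.bots: Dict[Tuple[str, str], _Bucket] = {}
        self.active: Dict[str, Set[str]] = {}

    def register_bot(self, tenant: str, bot: str) -> None:
        now = self.clock()
        self.tenants.setdefault(tenant, _Bucket(self.tenant_rpm, now))  # Layer 1 ceiling
        bots = self.active.setdefault(tenant, set())
        bots.add(bot)
        share = self.tenant_rpm / len(bots)  # Layer 2: tenant_rpm / active_bots
        for b in bots:
            bucket = self.bots.get((tenant, b))
            if bucket is None:
                self.bots[(tenant, b)] = _Bucket(share, now)
            else:
                # Bring the bucket up to date, then shrink it to the new share.
                bucket.refill(now)
                bucket.rpm = share
                bucket.tokens = min(bucket.tokens, share)

    def try_acquire(self, tenant: str, bot: str) -> bool:
        now = self.clock()
        t, b = self.tenants[tenant], self.bots[(tenant, bot)]
        t.refill(now)
        b.refill(now)
        if t.tokens < 1.0 or b.tokens < 1.0:
            return False  # charge neither bucket if either layer refuses
        t.tokens -= 1.0
        b.tokens -= 1.0
        return True
```

Suppose a tenant has a 60 RPM ceiling and two active bots. A noisy bot is then capped at 30 calls, and its sibling can still make calls from the remaining tenant budget.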
Configuration
Rate limiting is disabled by default to ensure backward compatibility. You can enable it in your longtrainer.yaml:
```yaml
rate_limiting:
  enabled: true
  llm_rpm: 60          # Max LLM generation calls per minute
  embedding_rpm: 120   # Max embedding calls per minute
  tool_rpm: 30         # Max tool executions per minute
  ingestion_rpm: 10    # Max document chunking/ingestion ops per minute
```
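To make the numbers concrete, the hypothetical helper below derives per-bot budgets from the `rate_limiting` section. It assumes the YAML has already been parsed into a dict (parsing is not shown). It also assumes each `*_rpm` value is the Layer 1 tenant ceiling, split equally across active bots as described under Two-Layer Enforcement. `derive_rates` is an illustrative name, not part of the LongTrainer SDK.

```python
from typing import Any, Dict

RESOURCES = ("llm", "embedding", "tool", "ingestion")


def derive_rates(cfg: Dict[str, Any], active_bots: int) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: per-tenant and per-bot budgets implied by a rate_limiting section."""
    if not cfg.get("enabled", False):
        return {}  # rate limiting is off by default
    if active_bots < 1:
        raise ValueError("active_bots must be at least 1")
    rates: Dict[str, Dict[str, float]] = {}
    for resource in RESOURCES:
        tenant_rpm = float(cfg[f"{resource}_rpm"])  # ASSUMPTION: Layer 1 tenant ceiling
        bot_rpm = tenant_rpm / active_bots  # Layer 2 equal share
        rates[resource] = {
            "tenant_rpm": tenant_rpm,
            "bot_rpm": bot_rpm,
            "bot_refill_per_sec": bot_rpm / 60.0,  # token-bucket refill rate
        }
    return rates


# The rate_limiting section above, as a dict.
section = {"enabled": True, "llm_rpm": 60, "embedding_rpm": 120, "tool_rpm": 30, "ingestion_rpm": 10}
rates = derive_rates(section, active_bots=3)
print(rates["llm"]["bot_rpm"])  # 20.0
```

With three active bots, each bot can make at most 20 LLM calls per minute. Its bucket regains one token every three seconds.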
In-Code Overrides
You can programmatically override limits for specific tenants or bots using the SDK:
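```python
from longtrainer.rate_limiter import RateLimitConfig

config = RateLimitConfig(
    enabled=True,
    llm_rpm=60,  # default LLM calls per minute
    tenant_overrides={
        # Raise the LLM limit for one tenant
        "enterprise_tenant_1": {"llm_rpm": 500},
    },
    bot_overrides={
        # Set a dedicated LLM limit for one bot
        "critical_bot_A": {"llm_rpm": 100},
    },
)
```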
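This section does not state how the override maps combine with the default and with the equal-share rule. The sketch below shows one plausible precedence, and every rule in it is an assumption:

- A tenant override replaces the default ceiling.
- A bot override replaces the bot's equal share.
- A bot never exceeds its tenant's ceiling.

`resolve_limits` is a hypothetical function that works on plain dicts shaped like the `RateLimitConfig` arguments above.

```python
from typing import Dict, Tuple

Overrides = Dict[str, Dict[str, int]]


def resolve_limits(
    default_rpm: int,
    tenant_overrides: Overrides,
    bot_overrides: Overrides,
    tenant: str,
    bot: str,
    active_bots: int,
    key: str = "llm_rpm",
) -> Tuple[float, float]:
    """Hypothetical precedence: bot override, else equal share of the tenant ceiling."""
    # Layer 1: ASSUMPTION: a tenant override replaces the default ceiling.
    tenant_rpm = float(tenant_overrides.get(tenant, {}).get(key, default_rpm))
    # Layer 2: ASSUMPTION: a bot override replaces the bot's equal share.
    bot_rpm = float(bot_overrides.get(bot, {}).get(key, tenant_rpm / active_bots))
    # ASSUMPTION: a bot can never exceed its tenant's ceiling.
    return tenant_rpm, min(bot_rpm, tenant_rpm)


tenants = {"enterprise_tenant_1": {"llm_rpm": 500}}
bots = {"critical_bot_A": {"llm_rpm": 100}}
print(resolve_limits(60, tenants, bots, "enterprise_tenant_1", "critical_bot_A", active_bots=4))  # (500.0, 100.0)
print(resolve_limits(60, tenants, bots, "enterprise_tenant_1", "other_bot", active_bots=4))  # (500.0, 125.0)
```

Check LongTrainer's actual precedence before relying on this ordering, especially whether a bot override is capped by the tenant ceiling.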
Error Handling
API Mode
When using the FastAPI server (longtrainer serve), rate-limited requests are caught by a global exception handler. The server responds with 429 Too Many Requests and a Retry-After header giving the number of seconds until the bucket has tokens again, rounded up to a whole second. The detail message reports the precise wait.
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 15

{
  "detail": "Bot 'bot-123' rate limit exceeded for llm_calls. Retry after 14.5s."
}
```
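The relationship between the header and the detail message can be sketched in framework-free Python. In a token bucket refilling at rpm / 60 tokens per second, the wait until one whole token is available is (1 − tokens) × 60 / rpm. Retry-After takes whole seconds, so rounding up keeps clients from retrying too early. `seconds_until_token` and `rate_limit_response` are hypothetical helpers, not LongTrainer's exception handler. Only the message format is copied from the example above.

```python
import json
import math
from typing import Dict, Tuple


def seconds_until_token(tokens: float, rpm: float) -> float:
    """Wait until a bucket refilling at rpm / 60 tokens/sec holds one whole token."""
    return max(0.0, (1.0 - tokens) * 60.0 / rpm)


def rate_limit_response(bot_id: str, resource: str, wait: float) -> Tuple[int, Dict[str, str], str]:
    """Hypothetical helper shaping a 429 like the example above."""
    headers = {
        # Retry-After is whole seconds; round up so clients never retry too early.
        "Retry-After": str(math.ceil(wait)),
        "Content-Type": "application/json",
    }
    detail = f"Bot '{bot_id}' rate limit exceeded for {resource}. Retry after {wait:.1f}s."
    return 429, headers, json.dumps({"detail": detail})
```

For example, an empty bucket limited to 4 RPM must wait 15 seconds for its next token. A wait of 14.5 seconds is advertised as `Retry-After: 15`.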
CLI Mode
When using the interactive terminal (longtrainer chat), rate-limit errors are caught automatically. Instead of crashing, the CLI displays a countdown progress bar and retries your message automatically once the budget is restored.
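Custom clients of the API server can reproduce the CLI's wait-and-retry behavior. The loop below is a hypothetical sketch of that pattern, not the CLI's code. `send_with_retry`, the `(status, headers, body)` response tuple, and the `on_tick` callback (a stand-in for a progress bar) are assumptions. It also assumes Retry-After carries delta-seconds, as in the example response, rather than an HTTP date.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    on_tick: Callable[[int], None] = lambda s: print(f"Rate limited, retrying in {s}s...", end="\r"),
) -> Response:
    """Hypothetical client loop: on 429, count down Retry-After seconds, then resend."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, headers, body = send()
        if status != 429 or attempt >= max_attempts:
            return status, headers, body  # success, other error, or out of attempts
        # ASSUMPTION: Retry-After is an integer number of seconds.
        wait = int(headers.get("Retry-After", "1"))
        for remaining in range(wait, 0, -1):
            on_tick(remaining)  # e.g. advance a countdown progress bar
            sleep(1)
        attempt += 1
```

Injecting `sleep` and `on_tick` keeps the retry policy separate from the display and makes the loop easy to test without real waiting.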