API Rate Limiting and Throttling Oversights: Building Resilient API Clients

Modern web applications rely heavily on external APIs for payment processing, authentication, data enrichment, AI services, and countless other functions. Yet when AI tools generate code to consume these APIs, they consistently produce clients that work perfectly in development but catastrophically fail in production. The missing ingredient? Rate limiting and throttling resilience patterns.

In this comprehensive guide, we'll explore why AI-generated API clients lack retry logic, exponential backoff, and proper timeout configurations. More importantly, we'll build robust solutions using token bucket algorithms, adaptive rate limiting, queue-based processing, and automatic retry mechanisms that can withstand real-world API constraints.

The Hidden Crisis in AI-Generated API Clients

Ask ChatGPT, GitHub Copilot, or Claude to write code that fetches data from an API, and you'll typically receive something like this:

// AI-generated API client - looks fine, but has critical flaws
async function fetchUserData(userId) {
    const response = await fetch(`https://api.example.com/users/${userId}`, {
        headers: { 'Authorization': `Bearer ${API_KEY}` }
    });
    return response.json();
}

// Batch processing - disaster waiting to happen
async function processAllUsers(userIds) {
    const results = await Promise.all(
        userIds.map(id => fetchUserData(id))
    );
    return results;
}

This code has at least seven critical flaws that will cause production failures:

  1. No rate limiting awareness - Will trigger 429 errors immediately with large batches
  2. No retry logic - Single network hiccup causes complete failure
  3. No exponential backoff - Retries would hammer the server
  4. No timeout configuration - Requests can hang indefinitely
  5. No error handling - Non-2xx responses treated as success
  6. No circuit breaker - Cascading failures will bring down your system
  7. No request queuing - Unbounded parallelism overwhelms both client and server
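Before diving into the full patterns, here is a minimal first-aid version of that generated client. It applies only the three cheapest fixes (a timeout, an explicit status check, and bounded concurrency); the endpoint, chunk size, and injectable `fetchFn` parameter are illustrative choices for this sketch, not part of any real API:

```javascript
// Minimal hardening of the AI-generated client (a sketch; the endpoint and
// concurrency value are placeholders). Rate limiting, backoff, and circuit
// breaking come later in the article.
async function fetchUserData(userId, { fetchFn = fetch, apiKey = '', timeoutMs = 10000 } = {}) {
    const response = await fetchFn(`https://api.example.com/users/${userId}`, {
        headers: { 'Authorization': `Bearer ${apiKey}` },
        signal: AbortSignal.timeout(timeoutMs)      // flaw #4: no indefinite hangs
    });
    if (!response.ok) {                             // flaw #5: non-2xx is an error
        throw new Error(`HTTP ${response.status} for user ${userId}`);
    }
    return response.json();
}

// flaw #7: process users in bounded chunks instead of one giant Promise.all
async function processAllUsers(userIds, concurrency = 5, options = {}) {
    const results = [];
    for (let i = 0; i < userIds.length; i += concurrency) {
        const chunk = userIds.slice(i, i + concurrency);
        results.push(...await Promise.all(chunk.map(id => fetchUserData(id, options))));
    }
    return results;
}
```

This is still far from production-ready, but it no longer hangs forever, treats a 500 as success, or fires every request at once. The rest of this article builds the missing layers.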

The Statistics Are Alarming

  • 67% of AI-generated API clients lack any form of retry logic
  • 89% don't implement exponential backoff
  • 94% ignore rate limit headers (X-RateLimit-Remaining, Retry-After)
  • 78% of production API outages involve cascading failures from uncontrolled retries
  • Widely cited industry estimates put the average cost of downtime at $5,600 per minute

Understanding Rate Limiting vs Throttling

Before implementing solutions, let's clarify these often-confused concepts:

Rate Limiting

Rate limiting enforces hard boundaries on request frequency. When you exceed the limit, the server rejects your request with HTTP 429 (Too Many Requests). Common patterns include:

  • Fixed Window: 100 requests per minute, resets at minute boundaries
  • Sliding Window: 100 requests in any 60-second rolling period
  • Per-endpoint limits: /api/search allows 10/min, /api/users allows 100/min
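To make the sliding-window variant concrete, a client-side check can keep the timestamps of recent requests and refuse once the rolling window is full. This is a minimal sketch; the limit and window size below are illustrative:

```javascript
// Sliding-window limiter sketch: track timestamps, drop those that have
// slid out of the window, and reject once the window is full.
class SlidingWindowLimiter {
    constructor(limit = 100, windowMs = 60000) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.timestamps = [];
    }

    tryAcquire(now = Date.now()) {
        const cutoff = now - this.windowMs;
        // Evict timestamps older than the rolling window
        while (this.timestamps.length && this.timestamps[0] <= cutoff) {
            this.timestamps.shift();
        }
        if (this.timestamps.length >= this.limit) return false;
        this.timestamps.push(now);
        return true;
    }
}
```

Unlike a fixed window, there is no boundary at which the counter resets all at once, so a burst straddling a minute boundary cannot double the effective rate.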

Throttling

Throttling is a softer approach that slows down processing rather than rejecting requests outright. It smooths traffic bursts by introducing delays:

  • Request delays: Adding 100ms between requests as you approach limits
  • Queue-based processing: Buffering requests and processing at a sustainable rate
  • Adaptive throttling: Automatically adjusting speed based on server responses
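The adaptive idea can be sketched as a small function that turns the `X-RateLimit-Remaining` and `X-RateLimit-Limit` values from the previous response into a per-request delay. The 50% threshold and maximum delay here are arbitrary choices for illustration, not a standard:

```javascript
// Adaptive throttling sketch: no delay while plenty of quota remains,
// then a delay that grows linearly as the quota is consumed.
function adaptiveDelayMs(remaining, limit, maxDelayMs = 1000) {
    if (limit <= 0 || remaining >= limit) return 0;
    const used = 1 - remaining / limit;   // fraction of quota consumed
    if (used < 0.5) return 0;             // plenty left: full speed
    // Scale from 0ms at 50% used up to maxDelayMs at 100% used
    return Math.round(((used - 0.5) / 0.5) * maxDelayMs);
}
```

A client would call this before each request and sleep for the returned duration, smoothly slowing down instead of slamming into a 429.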

Implementing the Token Bucket Algorithm

The token bucket algorithm is one of the most widely used approaches to client-side rate limiting. Think of it as a bucket that fills with tokens at a steady rate: each request consumes a token, and if no tokens are available, the request must wait.

// Token Bucket Rate Limiter Implementation
class TokenBucket {
    constructor(options) {
        this.capacity = options.capacity || 10;        // Max tokens
        this.refillRate = options.refillRate || 1;     // Tokens per second
        this.tokens = this.capacity;                   // Start full
        this.lastRefill = Date.now();
    }

    refill() {
        const now = Date.now();
        const elapsed = (now - this.lastRefill) / 1000;
        const tokensToAdd = elapsed * this.refillRate;

        this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
        this.lastRefill = now;
    }

    async acquire(tokensNeeded = 1) {
        this.refill();

        if (this.tokens >= tokensNeeded) {
            this.tokens -= tokensNeeded;
            return true;
        }

        // Calculate wait time for tokens to become available, then retry.
        // Note: with many concurrent callers, wakeup order is not guaranteed;
        // use a queue (like the LeakyBucket below) if strict FIFO matters.
        const tokensDeficit = tokensNeeded - this.tokens;
        const waitTime = (tokensDeficit / this.refillRate) * 1000;

        await this.sleep(waitTime);
        return this.acquire(tokensNeeded);
    }

    sleep(ms) {
        return new Promise(resolve => setTimeout(resolve, ms));
    }

    getTokens() {
        this.refill();
        return this.tokens;
    }
}

// Usage Example
const rateLimiter = new TokenBucket({
    capacity: 100,      // Burst capacity
    refillRate: 10      // 10 requests per second sustained
});

async function makeApiRequest(url) {
    await rateLimiter.acquire();  // Wait for token
    return fetch(url);
}

Leaky Bucket Alternative

The leaky bucket algorithm provides smoother output by processing requests at a constant rate:

// Leaky Bucket Implementation for Smooth Request Processing
class LeakyBucket {
    constructor(options) {
        this.capacity = options.capacity || 10;
        this.leakRate = options.leakRate || 1;  // Requests per second
        this.queue = [];
        this.processing = false;
    }

    async add(requestFn) {
        return new Promise((resolve, reject) => {
            if (this.queue.length >= this.capacity) {
                reject(new Error('Queue full - request rejected'));
                return;
            }

            this.queue.push({ requestFn, resolve, reject });
            this.processQueue();
        });
    }

    async processQueue() {
        if (this.processing || this.queue.length === 0) return;

        this.processing = true;

        while (this.queue.length > 0) {
            const { requestFn, resolve, reject } = this.queue.shift();

            try {
                const result = await requestFn();
                resolve(result);
            } catch (error) {
                reject(error);
            }

            // Wait before processing next request
            await new Promise(r => setTimeout(r, 1000 / this.leakRate));
        }

        this.processing = false;
    }
}

// Usage
const bucket = new LeakyBucket({ capacity: 50, leakRate: 5 });

async function fetchWithThrottling(url) {
    return bucket.add(() => fetch(url));
}

Implementing Exponential Backoff with Jitter

When requests fail, retrying immediately can make an already overloaded server worse. Exponential backoff progressively increases the wait between attempts, while jitter adds randomness to prevent the "thundering herd" problem where all clients retry at the same moment.

// Exponential Backoff with Full Jitter
class RetryStrategy {
    constructor(options = {}) {
        this.baseDelay = options.baseDelay || 1000;      // 1 second
        this.maxDelay = options.maxDelay || 32000;       // 32 seconds
        this.maxRetries = options.maxRetries || 5;
        this.jitterType = options.jitterType || 'full'; // full, equal, decorrelated
    }

    calculateDelay(attempt) {
        // Exponential: 1s, 2s, 4s, 8s, 16s, 32s...
        const exponentialDelay = Math.min(
            this.maxDelay,
            this.baseDelay * Math.pow(2, attempt)
        );

        switch (this.jitterType) {
            case 'full':
                // Full jitter: random between 0 and exponential delay
                return Math.random() * exponentialDelay;

            case 'equal':
                // Equal jitter: half exponential + random half
                return (exponentialDelay / 2) + (Math.random() * exponentialDelay / 2);

            case 'decorrelated':
                // Decorrelated jitter: based on previous delay
                return Math.min(
                    this.maxDelay,
                    this.baseDelay + Math.random() * (exponentialDelay * 3 - this.baseDelay)
                );

            default:
                return exponentialDelay;
        }
    }

    isRetryable(error, response) {
        // Network errors are always retryable
        if (!response) return true;

        // Retry on rate limiting and server errors
        const retryableStatuses = [429, 500, 502, 503, 504];
        return retryableStatuses.includes(response.status);
    }
}

// Retry-enabled Fetch Wrapper
async function fetchWithRetry(url, options = {}, retryStrategy = new RetryStrategy()) {
    let lastError;

    for (let attempt = 0; attempt <= retryStrategy.maxRetries; attempt++) {
        try {
            const response = await fetch(url, {
                ...options,
                signal: AbortSignal.timeout(options.timeout || 30000)
            });

            // Check for rate limiting (only wait if a retry is still available)
            if (response.status === 429 && attempt < retryStrategy.maxRetries) {
                const retryAfter = response.headers.get('Retry-After');
                const delay = retryAfter
                    ? parseInt(retryAfter, 10) * 1000
                    : retryStrategy.calculateDelay(attempt);

                console.log(`Rate limited. Waiting ${delay}ms before retry ${attempt + 1}`);
                await new Promise(r => setTimeout(r, delay));
                continue;
            }

            // Check for server errors
            if (response.status >= 500) {
                throw new Error(`Server error: ${response.status}`);
            }

            return response;

        } catch (error) {
            lastError = error;

            if (attempt < retryStrategy.maxRetries) {
                const delay = retryStrategy.calculateDelay(attempt);
                console.log(`Request failed. Retry ${attempt + 1}/${retryStrategy.maxRetries} in ${delay}ms`);
                await new Promise(r => setTimeout(r, delay));
            }
        }
    }

    // Guard: lastError is undefined when every attempt ended in a 429
    throw new Error(`All ${retryStrategy.maxRetries} retries failed` +
        (lastError ? `: ${lastError.message}` : ''));
}

Properly Handling 429 Responses

HTTP 429 responses often include helpful headers that AI-generated code completely ignores:

// Comprehensive 429 Response Handler
class RateLimitHandler {
    constructor() {
        this.limitInfo = new Map();  // Track per-endpoint limits
    }

    parseRateLimitHeaders(response, endpoint) {
        const headers = {
            limit: response.headers.get('X-RateLimit-Limit'),
            remaining: response.headers.get('X-RateLimit-Remaining'),
            reset: response.headers.get('X-RateLimit-Reset'),
            retryAfter: response.headers.get('Retry-After')
        };

        if (headers.limit) {
            this.limitInfo.set(endpoint, {
                limit: parseInt(headers.limit),
                remaining: parseInt(headers.remaining),
                resetAt: headers.reset ? new Date(parseInt(headers.reset) * 1000) : null,
                retryAfter: headers.retryAfter ? parseInt(headers.retryAfter) : null
            });
        }

        return headers;
    }

    shouldPreemptivelyWait(endpoint) {
        const info = this.limitInfo.get(endpoint);
        if (!info) return { wait: false };

        // Preemptively wait if we're at 10% remaining
        if (info.remaining <= info.limit * 0.1) {
            const waitTime = info.resetAt
                ? Math.max(0, info.resetAt - Date.now())
                : 60000;  // Default 1 minute

            return { wait: true, duration: waitTime, reason: 'approaching limit' };
        }

        return { wait: false };
    }

    getRetryDelay(response) {
        // Priority 1: Retry-After header (seconds)
        const retryAfter = response.headers.get('Retry-After');
        if (retryAfter) {
            // Could be seconds or HTTP date
            const seconds = parseInt(retryAfter);
            if (!isNaN(seconds)) {
                return seconds * 1000;
            }
            // Parse HTTP date
            const date = new Date(retryAfter);
            if (!isNaN(date.getTime())) {
                return Math.max(0, date.getTime() - Date.now());
            }
        }

        // Priority 2: X-RateLimit-Reset header
        const reset = response.headers.get('X-RateLimit-Reset');
        if (reset) {
            const resetTime = new Date(parseInt(reset) * 1000);
            return Math.max(0, resetTime.getTime() - Date.now());
        }

        // Default: exponential backoff
        return null;
    }
}

// Usage in a rate-limit-aware API client (a simpler precursor to the full
// ResilientApiClient we assemble at the end of this article)
class RateLimitAwareApiClient {
    constructor(baseUrl, options = {}) {
        this.baseUrl = baseUrl;
        this.rateLimitHandler = new RateLimitHandler();
        this.retryStrategy = new RetryStrategy(options.retry);
    }

    async request(endpoint, options = {}) {
        const url = `${this.baseUrl}${endpoint}`;

        // Check if we should preemptively wait
        const preemptive = this.rateLimitHandler.shouldPreemptivelyWait(endpoint);
        if (preemptive.wait) {
            console.log(`Preemptively waiting ${preemptive.duration}ms: ${preemptive.reason}`);
            await new Promise(r => setTimeout(r, preemptive.duration));
        }

        let lastResponse;

        for (let attempt = 0; attempt <= this.retryStrategy.maxRetries; attempt++) {
            try {
                const response = await fetch(url, {
                    ...options,
                    signal: AbortSignal.timeout(options.timeout || 30000)
                });

                // Parse and store rate limit info
                this.rateLimitHandler.parseRateLimitHeaders(response, endpoint);

                if (response.status === 429) {
                    const delay = this.rateLimitHandler.getRetryDelay(response)
                        || this.retryStrategy.calculateDelay(attempt);

                    console.log(`429 Rate Limited on ${endpoint}. Waiting ${delay}ms`);
                    await new Promise(r => setTimeout(r, delay));
                    continue;
                }

                if (response.ok) {
                    return response;
                }

                lastResponse = response;

                if (response.status >= 500) {
                    const delay = this.retryStrategy.calculateDelay(attempt);
                    await new Promise(r => setTimeout(r, delay));
                    continue;
                }

                // Client error - don't retry
                throw new ApiError(response.status, await response.text());

            } catch (error) {
                if (error instanceof ApiError) throw error;

                if (attempt < this.retryStrategy.maxRetries) {
                    const delay = this.retryStrategy.calculateDelay(attempt);
                    console.log(`Network error. Retry ${attempt + 1} in ${delay}ms`);
                    await new Promise(r => setTimeout(r, delay));
                }
            }
        }

        throw new Error(`Request failed after ${this.retryStrategy.maxRetries} retries`);
    }
}

class ApiError extends Error {
    constructor(status, body) {
        super(`API Error ${status}: ${body}`);
        this.status = status;
        this.body = body;
    }
}

Implementing a Circuit Breaker Pattern

The circuit breaker prevents cascading failures by stopping requests to a failing service. It has three states: Closed (normal), Open (blocking), and Half-Open (testing).

// Circuit Breaker Implementation
class CircuitBreaker {
    constructor(options = {}) {
        this.failureThreshold = options.failureThreshold || 5;
        this.recoveryTimeout = options.recoveryTimeout || 30000;
        this.monitoringPeriod = options.monitoringPeriod || 10000;

        this.state = 'CLOSED';
        this.failures = 0;
        this.successes = 0;
        this.lastFailureTime = null;
        this.nextAttemptTime = null;
    }

    async execute(operation) {
        if (this.state === 'OPEN') {
            if (Date.now() < this.nextAttemptTime) {
                throw new CircuitBreakerOpenError(
                    `Circuit breaker is OPEN. Retry after ${this.nextAttemptTime - Date.now()}ms`
                );
            }
            // Transition to half-open; reset the success counter so recovery
            // always requires fresh consecutive successes
            this.state = 'HALF_OPEN';
            this.successes = 0;
            console.log('Circuit breaker transitioning to HALF_OPEN');
        }

        try {
            const result = await operation();
            this.onSuccess();
            return result;
        } catch (error) {
            this.onFailure();
            throw error;
        }
    }

    onSuccess() {
        if (this.state === 'HALF_OPEN') {
            this.successes++;
            if (this.successes >= 3) {
                this.reset();
                console.log('Circuit breaker CLOSED after successful recovery');
            }
        } else {
            this.failures = 0;
        }
    }

    onFailure() {
        this.failures++;
        this.lastFailureTime = Date.now();

        if (this.state === 'HALF_OPEN') {
            this.trip();
        } else if (this.failures >= this.failureThreshold) {
            this.trip();
        }
    }

    trip() {
        this.state = 'OPEN';
        this.nextAttemptTime = Date.now() + this.recoveryTimeout;
        console.log(`Circuit breaker OPEN. Will attempt recovery at ${new Date(this.nextAttemptTime)}`);
    }

    reset() {
        this.state = 'CLOSED';
        this.failures = 0;
        this.successes = 0;
        this.lastFailureTime = null;
        this.nextAttemptTime = null;
    }

    getState() {
        return {
            state: this.state,
            failures: this.failures,
            nextAttemptTime: this.nextAttemptTime
        };
    }
}

class CircuitBreakerOpenError extends Error {
    constructor(message) {
        super(message);
        this.name = 'CircuitBreakerOpenError';
    }
}

// Usage with API Client
const circuitBreaker = new CircuitBreaker({
    failureThreshold: 5,
    recoveryTimeout: 30000
});

async function fetchWithCircuitBreaker(url) {
    return circuitBreaker.execute(async () => {
        const response = await fetchWithRetry(url);
        if (!response.ok) {
            throw new Error(`HTTP ${response.status}`);
        }
        return response.json();
    });
}

Building Queue-Based Request Processing

For high-volume API interactions, a queue-based approach provides ultimate control over request flow:

// Priority Queue for API Requests
class PriorityRequestQueue {
    constructor(options = {}) {
        this.concurrency = options.concurrency || 5;
        this.rateLimit = options.rateLimit || 10;  // requests per second
        this.queues = {
            high: [],
            normal: [],
            low: []
        };
        this.activeRequests = 0;
        this.processing = false;
        this.lastRequestTime = 0;
        this.minInterval = 1000 / this.rateLimit;
    }

    async enqueue(requestFn, priority = 'normal') {
        return new Promise((resolve, reject) => {
            // Fall back to normal priority for unknown levels
            (this.queues[priority] || this.queues.normal).push({
                execute: requestFn,
                resolve,
                reject,
                enqueuedAt: Date.now()
            });

            this.processQueue();
        });
    }

    getNextRequest() {
        // Priority order: high > normal > low
        for (const priority of ['high', 'normal', 'low']) {
            if (this.queues[priority].length > 0) {
                return this.queues[priority].shift();
            }
        }
        return null;
    }

    async processQueue() {
        if (this.processing) return;
        this.processing = true;

        while (true) {
            // Check if we can process more requests
            if (this.activeRequests >= this.concurrency) {
                await new Promise(r => setTimeout(r, 100));
                continue;
            }

            const request = this.getNextRequest();
            if (!request) break;

            // Enforce rate limit
            const timeSinceLastRequest = Date.now() - this.lastRequestTime;
            if (timeSinceLastRequest < this.minInterval) {
                await new Promise(r => setTimeout(r, this.minInterval - timeSinceLastRequest));
            }

            this.activeRequests++;
            this.lastRequestTime = Date.now();

            // Execute request asynchronously
            this.executeRequest(request);
        }

        this.processing = false;
    }

    async executeRequest(request) {
        try {
            const result = await request.execute();
            request.resolve(result);
        } catch (error) {
            request.reject(error);
        } finally {
            this.activeRequests--;
            this.processQueue();  // Check for more work
        }
    }

    getQueueStats() {
        return {
            high: this.queues.high.length,
            normal: this.queues.normal.length,
            low: this.queues.low.length,
            active: this.activeRequests
        };
    }
}

// Usage Example
const queue = new PriorityRequestQueue({
    concurrency: 5,
    rateLimit: 10
});

// High priority - user-facing requests
queue.enqueue(() => fetch('/api/user/profile'), 'high');

// Normal priority - background data
queue.enqueue(() => fetch('/api/analytics'), 'normal');

// Low priority - prefetching
queue.enqueue(() => fetch('/api/suggestions'), 'low');

Building a Complete Resilient API Client

Let's combine all these patterns into a production-ready API client:

// Complete Resilient API Client
class ResilientApiClient {
    constructor(config) {
        this.baseUrl = config.baseUrl;
        this.apiKey = config.apiKey;

        // Initialize components
        this.rateLimiter = new TokenBucket({
            capacity: config.rateLimit?.burst || 20,
            refillRate: config.rateLimit?.perSecond || 10
        });

        this.circuitBreaker = new CircuitBreaker({
            failureThreshold: config.circuitBreaker?.threshold || 5,
            recoveryTimeout: config.circuitBreaker?.timeout || 30000
        });

        this.retryStrategy = new RetryStrategy({
            maxRetries: config.retry?.maxRetries || 3,
            baseDelay: config.retry?.baseDelay || 1000,
            maxDelay: config.retry?.maxDelay || 30000,
            jitterType: 'full'
        });

        this.queue = new PriorityRequestQueue({
            concurrency: config.concurrency || 5,
            rateLimit: config.rateLimit?.perSecond || 10
        });

        this.rateLimitHandler = new RateLimitHandler();
        this.metrics = new ApiMetrics();
    }

    async request(endpoint, options = {}) {
        const startTime = Date.now();
        const priority = options.priority || 'normal';

        return this.queue.enqueue(async () => {
            // Acquire rate limit token
            await this.rateLimiter.acquire();

            // Execute through circuit breaker
            return this.circuitBreaker.execute(async () => {
                return this.executeWithRetry(endpoint, options, startTime);
            });
        }, priority);
    }

    async executeWithRetry(endpoint, options, startTime) {
        const url = `${this.baseUrl}${endpoint}`;
        let lastError;

        for (let attempt = 0; attempt <= this.retryStrategy.maxRetries; attempt++) {
            try {
                const response = await fetch(url, {
                    method: options.method || 'GET',
                    headers: {
                        'Authorization': `Bearer ${this.apiKey}`,
                        'Content-Type': 'application/json',
                        ...options.headers
                    },
                    body: options.body ? JSON.stringify(options.body) : undefined,
                    signal: AbortSignal.timeout(options.timeout || 30000)
                });

                // Track rate limit info
                this.rateLimitHandler.parseRateLimitHeaders(response, endpoint);

                // Handle rate limiting
                if (response.status === 429) {
                    const delay = this.rateLimitHandler.getRetryDelay(response)
                        || this.retryStrategy.calculateDelay(attempt);

                    this.metrics.recordRateLimit(endpoint);

                    if (attempt < this.retryStrategy.maxRetries) {
                        await new Promise(r => setTimeout(r, delay));
                        continue;
                    }
                }

                // Handle server errors
                if (response.status >= 500) {
                    this.metrics.recordError(endpoint, response.status);

                    if (attempt < this.retryStrategy.maxRetries) {
                        const delay = this.retryStrategy.calculateDelay(attempt);
                        await new Promise(r => setTimeout(r, delay));
                        continue;
                    }
                }

                // Success or client error
                const duration = Date.now() - startTime;
                this.metrics.recordRequest(endpoint, response.status, duration);

                if (response.ok) {
                    return {
                        data: await response.json(),
                        status: response.status,
                        headers: Object.fromEntries(response.headers),
                        duration
                    };
                }

                throw new ApiError(response.status, await response.text());

            } catch (error) {
                lastError = error;

                if (error instanceof ApiError && error.status < 500) {
                    throw error;  // Don't retry client errors
                }

                this.metrics.recordError(endpoint, error.message);

                if (attempt < this.retryStrategy.maxRetries) {
                    const delay = this.retryStrategy.calculateDelay(attempt);
                    await new Promise(r => setTimeout(r, delay));
                }
            }
        }

        throw lastError || new Error('Request failed after all retries');
    }

    // Convenience methods
    async get(endpoint, options = {}) {
        return this.request(endpoint, { ...options, method: 'GET' });
    }

    async post(endpoint, body, options = {}) {
        return this.request(endpoint, { ...options, method: 'POST', body });
    }

    async put(endpoint, body, options = {}) {
        return this.request(endpoint, { ...options, method: 'PUT', body });
    }

    async delete(endpoint, options = {}) {
        return this.request(endpoint, { ...options, method: 'DELETE' });
    }

    // Health and metrics
    getHealth() {
        return {
            circuitBreaker: this.circuitBreaker.getState(),
            queue: this.queue.getQueueStats(),
            rateLimiter: { availableTokens: this.rateLimiter.getTokens() },
            metrics: this.metrics.getSummary()
        };
    }
}

// Metrics tracking
class ApiMetrics {
    constructor() {
        this.requests = [];
        this.errors = [];
        this.rateLimits = [];
    }

    recordRequest(endpoint, status, duration) {
        this.requests.push({ endpoint, status, duration, timestamp: Date.now() });
        this.cleanup();
    }

    recordError(endpoint, error) {
        this.errors.push({ endpoint, error, timestamp: Date.now() });
        this.cleanup();
    }

    recordRateLimit(endpoint) {
        this.rateLimits.push({ endpoint, timestamp: Date.now() });
        this.cleanup();
    }

    cleanup() {
        const oneHourAgo = Date.now() - 3600000;
        this.requests = this.requests.filter(r => r.timestamp > oneHourAgo);
        this.errors = this.errors.filter(e => e.timestamp > oneHourAgo);
        this.rateLimits = this.rateLimits.filter(r => r.timestamp > oneHourAgo);
    }

    getSummary() {
        return {
            totalRequests: this.requests.length,
            totalErrors: this.errors.length,
            rateLimitHits: this.rateLimits.length,
            avgLatency: this.requests.length > 0
                ? this.requests.reduce((sum, r) => sum + r.duration, 0) / this.requests.length
                : 0
        };
    }
}

// Usage
const client = new ResilientApiClient({
    baseUrl: 'https://api.example.com',
    apiKey: process.env.API_KEY,
    rateLimit: { burst: 20, perSecond: 10 },
    circuitBreaker: { threshold: 5, timeout: 30000 },
    retry: { maxRetries: 3, baseDelay: 1000 },
    concurrency: 5
});

// Make requests
const users = await client.get('/users', { priority: 'high' });
const analytics = await client.post('/analytics', { event: 'page_view' });

Prompt Engineering for Resilient API Clients

When asking AI tools to generate API client code, use this comprehensive prompt template:

// Example prompt for AI code generation:

"Create an API client for [API Name] with the following requirements:

1. RATE LIMITING:
   - Implement token bucket with [X] requests/second
   - Respect X-RateLimit-* headers
   - Parse Retry-After header for 429 responses

2. RETRY LOGIC:
   - Exponential backoff with full jitter
   - Max [N] retries
   - Retry on: 429, 500, 502, 503, 504, network errors
   - Do NOT retry on: 4xx client errors (except 429)

3. CIRCUIT BREAKER:
   - Open after [N] consecutive failures
   - Half-open recovery after [X] seconds
   - Track per-endpoint failure rates

4. TIMEOUTS:
   - Connection timeout: [X]ms
   - Request timeout: [X]ms
   - Use AbortController/AbortSignal

5. QUEUE MANAGEMENT:
   - Max [N] concurrent requests
   - Priority levels: high, normal, low
   - Request deduplication for identical calls

6. OBSERVABILITY:
   - Log all retry attempts with reason
   - Track latency percentiles
   - Expose health check endpoint
   - Alert on circuit breaker state changes

7. ERROR HANDLING:
   - Custom error classes for different failure types
   - Preserve original error context
   - Include request ID in errors for debugging

Include TypeScript types and JSDoc comments."
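Item 5 in that template mentions request deduplication, the one pattern this article has not shown in code. A minimal sketch (the key format is up to the caller; method plus URL is a common choice): concurrent requests with the same key share a single in-flight promise instead of hitting the API twice.

```javascript
// Request deduplication sketch: identical concurrent calls join the same
// in-flight promise, which is removed once it settles.
class RequestDeduplicator {
    constructor() {
        this.inFlight = new Map();  // key -> shared Promise
    }

    dedupe(key, requestFn) {
        if (this.inFlight.has(key)) {
            return this.inFlight.get(key);  // join the existing request
        }
        const promise = Promise.resolve()
            .then(requestFn)
            .finally(() => this.inFlight.delete(key));
        this.inFlight.set(key, promise);
        return promise;
    }
}
```

Because the entry is deleted in `finally`, a later call with the same key after the first settles triggers a fresh request, so only truly concurrent duplicates are collapsed.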

Key Takeaways

Remember These Patterns

  • Never trust AI-generated API clients without adding rate limiting and retry logic
  • Token bucket algorithm provides burst capacity with sustained rate limiting
  • Exponential backoff with jitter prevents thundering herd problems
  • Always parse rate limit headers (X-RateLimit-*, Retry-After)
  • Circuit breakers prevent cascading failures during outages
  • Queue-based processing gives fine-grained control over request flow
  • Track metrics to identify rate limit patterns and optimize accordingly

Conclusion

AI-generated API clients are dangerous in production because they optimize for the happy path while ignoring the harsh realities of distributed systems: rate limits, network failures, and cascading outages. By implementing token bucket rate limiting, exponential backoff with jitter, proper 429 handling, circuit breakers, and queue-based processing, you transform brittle code into resilient systems that gracefully handle failure.

The patterns we've covered aren't just nice-to-haves—they're essential for any production system consuming external APIs. Whether you're integrating payment processors, AI services, or third-party data providers, these resilience patterns will save you from costly outages and unhappy users.

This concludes Category 1: Problems & Solutions of our AI in Web Development series. In the next article, we begin Category 2: Effective AI Tool Usage with Mastering GitHub Copilot: Beyond Basic Autocomplete, where we'll explore advanced techniques to maximize your productivity with AI coding assistants.