Natural language interfaces are transforming how users interact with web applications. Instead of navigating complex menus or learning query syntax, users simply describe what they want in plain English. Companies that deploy AI-powered chatbots routinely report substantial reductions in support ticket volume and measurable improvements in user satisfaction. OpenAI's API makes it possible to add this intelligence to any web application.
In this comprehensive guide, we'll explore building conversational interfaces from the ground up. You'll learn how to integrate the OpenAI API securely, implement streaming responses for real-time interactions, build semantic search with embeddings, manage conversation context effectively, and create production-ready chatbots that handle edge cases gracefully.
OpenAI API Integration Fundamentals
The foundation of any natural language interface is a robust API integration. OpenAI's Chat Completions API provides the intelligence, but implementing it correctly requires careful attention to security, error handling, and cost management.
Secure Backend Implementation
Never expose your OpenAI API key to the frontend. All API calls should go through your backend server:
// lib/openai.ts
import OpenAI from 'openai';
// Initialize OpenAI client with API key from environment
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
export interface ChatMessage {
role: 'system' | 'user' | 'assistant';
content: string;
}
export interface ChatCompletionOptions {
model?: string;
temperature?: number;
maxTokens?: number;
stream?: boolean;
}
export async function createChatCompletion(
messages: ChatMessage[],
options: ChatCompletionOptions = {}
): Promise<string> {
const {
model = 'gpt-4-turbo-preview',
temperature = 0.7,
maxTokens = 1000
} = options;
try {
const response = await openai.chat.completions.create({
model,
messages,
temperature,
max_tokens: maxTokens,
});
return response.choices[0]?.message?.content || '';
} catch (error) {
if (error instanceof OpenAI.APIError) {
throw new ChatAPIError(
error.message,
error.status,
error.code
);
}
throw error;
}
}
// Custom error class for API errors
export class ChatAPIError extends Error {
constructor(
message: string,
public status: number | undefined,
public code: string | null | undefined
) {
super(message);
this.name = 'ChatAPIError';
}
}
// API route handler (Next.js App Router)
// app/api/chat/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { createChatCompletion, ChatAPIError, ChatMessage } from '@/lib/openai';
import { rateLimit } from '@/lib/rate-limit';
import { validateMessages } from '@/lib/validation';
export async function POST(request: NextRequest) {
// Rate limiting per IP (x-forwarded-for is set by most proxies; fall back to 'unknown')
const ip = request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() || 'unknown';
const rateLimitResult = await rateLimit(ip, {
limit: 20,
window: 60 * 1000 // 1 minute
});
if (!rateLimitResult.success) {
return NextResponse.json(
{ error: 'Rate limit exceeded. Please try again later.' },
{
status: 429,
headers: {
'X-RateLimit-Limit': rateLimitResult.limit.toString(),
'X-RateLimit-Remaining': rateLimitResult.remaining.toString(),
'X-RateLimit-Reset': rateLimitResult.reset.toString()
}
}
);
}
try {
const body = await request.json();
const { messages, options } = body;
// Validate input messages
const validation = validateMessages(messages);
if (!validation.valid) {
return NextResponse.json(
{ error: validation.error },
{ status: 400 }
);
}
// Add system prompt for your application
const systemMessage: ChatMessage = {
role: 'system',
content: `You are a helpful customer support assistant for TechCorp.
You help users with product questions, troubleshooting, and account issues.
Be concise, friendly, and professional.
If you don't know something, admit it and offer to connect them with a human agent.`
};
const fullMessages = [systemMessage, ...messages];
const response = await createChatCompletion(fullMessages, options);
return NextResponse.json({
message: response,
usage: {
remaining: rateLimitResult.remaining
}
});
} catch (error) {
console.error('Chat API error:', error);
if (error instanceof ChatAPIError) {
return NextResponse.json(
{ error: error.message },
{ status: error.status || 500 }
);
}
return NextResponse.json(
{ error: 'An unexpected error occurred' },
{ status: 500 }
);
}
}
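Calling this route from the browser is then a plain fetch. Here's a minimal sketch; the response shape matches the handler above, and the error handling is illustrative:
// Example client call (sketch)
async function askAssistant(
  messages: { role: 'user' | 'assistant'; content: string }[]
): Promise<string> {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages })
  });
  if (!res.ok) {
    const body = await res.json().catch(() => ({}));
    throw new Error(body.error || `Request failed with status ${res.status}`);
  }
  const { message } = await res.json();
  return message;
}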
Implementing Streaming Responses
Users expect immediate feedback. Streaming responses show text as it's generated, creating a more natural conversational experience. Server-Sent Events (SSE) provide the foundation for real-time streaming:
// lib/openai-stream.ts
import OpenAI from 'openai';
import { ChatMessage } from './openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
export async function* streamChatCompletion(
messages: ChatMessage[],
options: { model?: string; temperature?: number } = {}
): AsyncGenerator<string> {
const { model = 'gpt-4-turbo-preview', temperature = 0.7 } = options;
const stream = await openai.chat.completions.create({
model,
messages,
temperature,
stream: true,
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
yield content;
}
}
}
// Streaming API route
// app/api/chat/stream/route.ts
import { NextRequest } from 'next/server';
import { streamChatCompletion } from '@/lib/openai-stream';
export async function POST(request: NextRequest) {
const { messages } = await request.json();
const systemMessage = {
role: 'system' as const,
content: 'You are a helpful assistant. Respond in a conversational manner.'
};
const encoder = new TextEncoder();
const stream = new ReadableStream({
async start(controller) {
try {
const generator = streamChatCompletion(
[systemMessage, ...messages],
{ model: 'gpt-4-turbo-preview' }
);
for await (const chunk of generator) {
// Send each chunk as SSE data
const data = `data: ${JSON.stringify({ content: chunk })}\n\n`;
controller.enqueue(encoder.encode(data));
}
// Signal stream completion
controller.enqueue(encoder.encode('data: [DONE]\n\n'));
controller.close();
} catch (error) {
const errorData = `data: ${JSON.stringify({
error: 'Stream error occurred'
})}\n\n`;
controller.enqueue(encoder.encode(errorData));
controller.close();
}
}
});
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
},
});
}
// React hook for consuming streaming responses
// hooks/useStreamingChat.ts
import { useState, useCallback, useRef } from 'react';
interface Message {
id: string;
role: 'user' | 'assistant';
content: string;
isStreaming?: boolean;
}
export function useStreamingChat() {
const [messages, setMessages] = useState<Message[]>([]);
const [isLoading, setIsLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
const abortControllerRef = useRef<AbortController | null>(null);
const sendMessage = useCallback(async (content: string) => {
const userMessage: Message = {
id: Date.now().toString(),
role: 'user',
content
};
const assistantMessage: Message = {
id: (Date.now() + 1).toString(),
role: 'assistant',
content: '',
isStreaming: true
};
setMessages(prev => [...prev, userMessage, assistantMessage]);
setIsLoading(true);
setError(null);
// Create abort controller for cancellation
abortControllerRef.current = new AbortController();
try {
const response = await fetch('/api/chat/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [...messages, userMessage].map(m => ({
role: m.role,
content: m.content
}))
}),
signal: abortControllerRef.current.signal
});
if (!response.ok) {
throw new Error('Failed to send message');
}
const reader = response.body?.getReader();
const decoder = new TextDecoder();
if (!reader) {
throw new Error('No response body');
}
let accumulatedContent = '';
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// Buffer partial lines: a network chunk can end mid-way through an SSE event
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop() ?? '';
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') {
setMessages(prev =>
prev.map(m =>
m.id === assistantMessage.id
? { ...m, isStreaming: false }
: m
)
);
continue;
}
try {
const parsed = JSON.parse(data);
if (parsed.error) {
// Surface server-reported errors without aborting the whole stream
setError(parsed.error);
continue;
}
if (parsed.content) {
accumulatedContent += parsed.content;
setMessages(prev =>
prev.map(m =>
m.id === assistantMessage.id
? { ...m, content: accumulatedContent }
: m
)
);
}
} catch (parseError) {
// Skip lines that are not valid JSON
}
}
}
}
} catch (err) {
if (err instanceof Error && err.name === 'AbortError') {
// User cancelled the request
return;
}
setError(err instanceof Error ? err.message : 'An error occurred');
} finally {
setIsLoading(false);
abortControllerRef.current = null;
}
}, [messages]);
const cancelStream = useCallback(() => {
abortControllerRef.current?.abort();
}, []);
const clearMessages = useCallback(() => {
setMessages([]);
setError(null);
}, []);
return {
messages,
isLoading,
error,
sendMessage,
cancelStream,
clearMessages
};
}
Building Semantic Search with Embeddings
Semantic search finds content by meaning rather than exact keyword matches. OpenAI's text embeddings convert text into numerical vectors that capture semantic meaning. Combined with a vector database like Pinecone or Supabase Vector, you can build powerful search experiences:
// lib/embeddings.ts
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
export async function createEmbedding(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
return response.data[0].embedding;
}
export async function createBatchEmbeddings(
texts: string[]
): Promise<number[][]> {
// OpenAI supports up to 2048 inputs per batch
const batchSize = 100;
const embeddings: number[][] = [];
for (let i = 0; i < texts.length; i += batchSize) {
const batch = texts.slice(i, i + batchSize);
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: batch,
});
embeddings.push(...response.data.map(d => d.embedding));
}
return embeddings;
}
// Cosine similarity for comparing embeddings
export function cosineSimilarity(a: number[], b: number[]): number {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
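For small datasets you can skip the vector database entirely and rank documents in memory with cosineSimilarity. A sketch, assuming embeddings were precomputed with createEmbedding above:
// In-memory ranking (sketch) -- practical for up to a few thousand documents
interface EmbeddedDoc {
  id: string;
  content: string;
  embedding: number[];
}
export async function rankByQuery(
  query: string,
  docs: EmbeddedDoc[],
  topK: number = 5
) {
  const queryEmbedding = await createEmbedding(query);
  return docs
    .map(doc => ({
      ...doc,
      similarity: cosineSimilarity(queryEmbedding, doc.embedding)
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}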
// lib/semantic-search.ts
import { createClient } from '@supabase/supabase-js';
import { createEmbedding, createBatchEmbeddings } from './embeddings';
const supabase = createClient(
process.env.SUPABASE_URL!,
process.env.SUPABASE_SERVICE_KEY!
);
interface Document {
id: string;
content: string;
metadata: Record<string, any>;
}
interface SearchResult extends Document {
similarity: number;
}
export class SemanticSearchEngine {
private tableName: string;
constructor(tableName: string = 'documents') {
this.tableName = tableName;
}
// Index documents for search
async indexDocuments(documents: Document[]): Promise<void> {
const contents = documents.map(d => d.content);
const embeddings = await createBatchEmbeddings(contents);
const records = documents.map((doc, index) => ({
id: doc.id,
content: doc.content,
metadata: doc.metadata,
embedding: embeddings[index]
}));
// Upsert in batches
const batchSize = 100;
for (let i = 0; i < records.length; i += batchSize) {
const batch = records.slice(i, i + batchSize);
const { error } = await supabase
.from(this.tableName)
.upsert(batch, { onConflict: 'id' });
if (error) {
throw new Error(`Failed to index documents: ${error.message}`);
}
}
}
// Semantic search
async search(
query: string,
options: { limit?: number; threshold?: number; filter?: Record<string, any> } = {}
): Promise<SearchResult[]> {
const { limit = 10, threshold = 0.7, filter } = options;
const queryEmbedding = await createEmbedding(query);
// Use Supabase's vector similarity function
let rpcQuery = supabase.rpc('match_documents', {
query_embedding: queryEmbedding,
match_threshold: threshold,
match_count: limit
});
const { data, error } = await rpcQuery;
if (error) {
throw new Error(`Search failed: ${error.message}`);
}
return data.map((row: any) => ({
id: row.id,
content: row.content,
metadata: row.metadata,
similarity: row.similarity
}));
}
// Hybrid search combining semantic and keyword
async hybridSearch(
query: string,
options: { limit?: number; semanticWeight?: number } = {}
): Promise<SearchResult[]> {
const { limit = 10, semanticWeight = 0.7 } = options;
const keywordWeight = 1 - semanticWeight;
// Get semantic results
const semanticResults = await this.search(query, { limit: limit * 2 });
// Get keyword results using full-text search
const { data: keywordResults } = await supabase
.from(this.tableName)
.select('id, content, metadata')
.textSearch('content', query, { type: 'websearch' })
.limit(limit * 2);
// Combine and re-rank results
const scoreMap = new Map<string, { document: any; score: number }>();
semanticResults.forEach((result, index) => {
const normalizedScore = (semanticResults.length - index) / semanticResults.length;
scoreMap.set(result.id, {
document: result,
score: normalizedScore * semanticWeight
});
});
keywordResults?.forEach((result, index) => {
const normalizedScore = (keywordResults.length - index) / keywordResults.length;
const existing = scoreMap.get(result.id);
if (existing) {
existing.score += normalizedScore * keywordWeight;
} else {
scoreMap.set(result.id, {
document: result,
score: normalizedScore * keywordWeight
});
}
});
return Array.from(scoreMap.values())
.sort((a, b) => b.score - a.score)
.slice(0, limit)
.map(({ document, score }) => ({
...document,
similarity: score
}));
}
}
// SQL function for Supabase (run in SQL editor)
/*
create or replace function match_documents (
query_embedding vector(1536),
match_threshold float,
match_count int
)
returns table (
id text,
content text,
metadata jsonb,
similarity float
)
language sql stable
as $$
select
documents.id,
documents.content,
documents.metadata,
1 - (documents.embedding <=> query_embedding) as similarity
from documents
where 1 - (documents.embedding <=> query_embedding) > match_threshold
order by similarity desc
limit match_count;
$$;
*/
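With the table and match_documents function in place, indexing and querying come down to a few calls. The documents and the query below are illustrative:
// Usage sketch for SemanticSearchEngine
const engine = new SemanticSearchEngine('documents');
await engine.indexDocuments([
  {
    id: 'faq-1',
    content: 'To reset your password, open Settings > Security and click Reset.',
    metadata: { title: 'Password reset' }
  },
  {
    id: 'faq-2',
    content: 'Invoices are emailed on the first business day of each month.',
    metadata: { title: 'Billing schedule' }
  }
]);
const results = await engine.search('how do I change my password?', {
  limit: 3,
  threshold: 0.7
});
// Expect 'faq-1' to rank first: the query is semantically close to the
// password-reset document even though the wording differs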
Managing Conversation Context
Effective chatbots maintain context across multiple turns. This requires careful management of conversation history, token limits, and memory strategies:
// lib/conversation-manager.ts
import { ChatMessage } from './openai';
import { encode } from 'gpt-tokenizer';
interface ConversationConfig {
maxTokens: number;
maxMessages: number;
summarizeThreshold: number;
}
export interface ConversationState {
messages: ChatMessage[];
summary?: string;
metadata: {
userId: string;
startedAt: Date;
lastMessageAt: Date;
turnCount: number;
};
}
export class ConversationManager {
private config: ConversationConfig;
constructor(config: Partial<ConversationConfig> = {}) {
this.config = {
maxTokens: 4000,
maxMessages: 20,
summarizeThreshold: 15,
...config
};
}
// Count tokens in messages
private countTokens(messages: ChatMessage[]): number {
let total = 0;
for (const message of messages) {
// Add overhead for message formatting
total += encode(message.content).length + 4;
}
return total;
}
// Truncate messages to fit token limit
private truncateMessages(
messages: ChatMessage[],
systemMessage: ChatMessage
): ChatMessage[] {
const systemTokens = encode(systemMessage.content).length + 4;
const availableTokens = this.config.maxTokens - systemTokens;
const result: ChatMessage[] = [];
let currentTokens = 0;
// Keep messages from most recent, working backwards
for (let i = messages.length - 1; i >= 0; i--) {
const message = messages[i];
const messageTokens = encode(message.content).length + 4;
if (currentTokens + messageTokens > availableTokens) {
break;
}
result.unshift(message);
currentTokens += messageTokens;
}
return result;
}
// Summarize older messages
async summarizeConversation(
messages: ChatMessage[],
openaiClient: any
): Promise<string> {
const conversationText = messages
.map(m => `${m.role}: ${m.content}`)
.join('\n');
const response = await openaiClient.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [
{
role: 'system',
content: 'Summarize the following conversation concisely, preserving key information, decisions, and context.'
},
{
role: 'user',
content: conversationText
}
],
max_tokens: 300
});
return response.choices[0]?.message?.content || '';
}
// Prepare messages for API call with context management
async prepareMessages(
state: ConversationState,
systemPrompt: string,
openaiClient?: any
): Promise<ChatMessage[]> {
const systemMessage: ChatMessage = {
role: 'system',
content: systemPrompt
};
let messages = [...state.messages];
// Check if we need to summarize
if (messages.length > this.config.summarizeThreshold && openaiClient) {
const oldMessages = messages.slice(0, -5);
const recentMessages = messages.slice(-5);
const summary = await this.summarizeConversation(oldMessages, openaiClient);
// Replace old messages with summary
const summaryMessage: ChatMessage = {
role: 'system',
content: `Previous conversation summary: ${summary}`
};
messages = [summaryMessage, ...recentMessages];
}
// Truncate if still too long
const truncatedMessages = this.truncateMessages(messages, systemMessage);
return [systemMessage, ...truncatedMessages];
}
// Add message to conversation
addMessage(
state: ConversationState,
message: ChatMessage
): ConversationState {
const newMessages = [...state.messages, message];
// Enforce max messages limit
const trimmedMessages = newMessages.length > this.config.maxMessages
? newMessages.slice(-this.config.maxMessages)
: newMessages;
return {
...state,
messages: trimmedMessages,
metadata: {
...state.metadata,
lastMessageAt: new Date(),
turnCount: state.metadata.turnCount + 1
}
};
}
}
// Conversation persistence with Redis
// lib/conversation-store.ts
import Redis from 'ioredis';
import { ConversationState } from './conversation-manager';
const redis = new Redis(process.env.REDIS_URL!);
export class ConversationStore {
private prefix: string = 'conversation:';
private ttl: number = 60 * 60 * 24; // 24 hours
async save(conversationId: string, state: ConversationState): Promise<void> {
const key = this.prefix + conversationId;
await redis.setex(key, this.ttl, JSON.stringify(state));
}
async get(conversationId: string): Promise<ConversationState | null> {
const key = this.prefix + conversationId;
const data = await redis.get(key);
if (!data) return null;
const state = JSON.parse(data);
state.metadata.startedAt = new Date(state.metadata.startedAt);
state.metadata.lastMessageAt = new Date(state.metadata.lastMessageAt);
return state;
}
async delete(conversationId: string): Promise<void> {
const key = this.prefix + conversationId;
await redis.del(key);
}
async extend(conversationId: string): Promise<void> {
const key = this.prefix + conversationId;
await redis.expire(key, this.ttl);
}
}
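Tying the manager and store together, a single conversational turn might look like the sketch below. The handleTurn function and the way conversationId reaches the server (a cookie or the request body, for instance) are assumptions:
// Sketch: one conversational turn with persistence
import OpenAI from 'openai';
import { createChatCompletion } from './openai';
import { ConversationManager } from './conversation-manager';
import { ConversationStore } from './conversation-store';
const manager = new ConversationManager({ maxTokens: 4000 });
const store = new ConversationStore();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
export async function handleTurn(
  conversationId: string,
  userId: string,
  userInput: string
): Promise<string> {
  // Load existing state or start a fresh conversation
  const state = (await store.get(conversationId)) ?? {
    messages: [],
    metadata: {
      userId,
      startedAt: new Date(),
      lastMessageAt: new Date(),
      turnCount: 0
    }
  };
  const withUser = manager.addMessage(state, { role: 'user', content: userInput });
  const apiMessages = await manager.prepareMessages(
    withUser,
    'You are a helpful assistant.',
    openai
  );
  const reply = await createChatCompletion(apiMessages);
  const withReply = manager.addMessage(withUser, { role: 'assistant', content: reply });
  await store.save(conversationId, withReply);
  return reply;
}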
Building the Chat UI Component
A well-designed chat interface is crucial for user experience. Here's a production-ready React component with accessibility features:
// components/ChatInterface.tsx
import React, { useState, useRef, useEffect, KeyboardEvent } from 'react';
import { useStreamingChat } from '@/hooks/useStreamingChat';
import { Message } from '@/types';
interface ChatInterfaceProps {
systemPrompt?: string;
placeholder?: string;
welcomeMessage?: string;
onError?: (error: string) => void;
}
export function ChatInterface({
systemPrompt,
placeholder = 'Type your message...',
welcomeMessage = 'Hello! How can I help you today?',
onError
}: ChatInterfaceProps) {
const [input, setInput] = useState('');
const messagesEndRef = useRef<HTMLDivElement>(null);
const inputRef = useRef<HTMLTextAreaElement>(null);
const {
messages,
isLoading,
error,
sendMessage,
cancelStream
} = useStreamingChat();
// Auto-scroll to bottom
useEffect(() => {
messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
}, [messages]);
// Report errors
useEffect(() => {
if (error && onError) {
onError(error);
}
}, [error, onError]);
// Auto-resize textarea
useEffect(() => {
if (inputRef.current) {
inputRef.current.style.height = 'auto';
inputRef.current.style.height = `${inputRef.current.scrollHeight}px`;
}
}, [input]);
const handleSubmit = async () => {
if (!input.trim() || isLoading) return;
const message = input.trim();
setInput('');
await sendMessage(message);
};
const handleKeyDown = (e: KeyboardEvent) => {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
handleSubmit();
}
};
return (
  <div className="chat-container">
    {/* Messages area */}
    <div className="messages" role="log" aria-live="polite">
      {/* Welcome message */}
      {messages.length === 0 && (
        <div className="welcome-message">{welcomeMessage}</div>
      )}
      {/* Message list */}
      {messages.map((message) => (
        <ChatMessage key={message.id} message={message} />
      ))}
      {/* Typing indicator */}
      {isLoading && messages[messages.length - 1]?.isStreaming && (
        <div className="typing-indicator" aria-label="Assistant is typing" />
      )}
      {/* Error display */}
      {error && (
        <div className="error-banner" role="alert">{error}</div>
      )}
      <div ref={messagesEndRef} />
    </div>
    {/* Input area */}
    <div className="input-area">
      <textarea
        ref={inputRef}
        value={input}
        onChange={(e) => setInput(e.target.value)}
        onKeyDown={handleKeyDown}
        placeholder={placeholder}
        rows={1}
        aria-label="Chat message"
      />
      {isLoading ? (
        <button type="button" onClick={cancelStream}>Stop</button>
      ) : (
        <button type="button" onClick={handleSubmit} disabled={!input.trim()}>Send</button>
      )}
    </div>
  </div>
);
}
// Individual message component
function ChatMessage({ message }: { message: Message }) {
const isUser = message.role === 'user';
return (
  <div className={`message ${isUser ? 'user' : 'assistant'}`}>
    {isUser ? (
      <span className="avatar" aria-hidden="true">You</span>
    ) : (
      <span className="avatar" aria-hidden="true">AI</span>
    )}
    <MessageContent content={message.content} />
    {message.isStreaming && (
      <span className="streaming-cursor" aria-hidden="true" />
    )}
  </div>
);
}
// Parse and render message content with markdown
function MessageContent({ content }: { content: string }) {
// Simple markdown parsing for code blocks
const parts = content.split(/(```[\s\S]*?```)/g);
return (
<>
{parts.map((part, index) => {
if (part.startsWith('```')) {
const match = part.match(/```(\w+)?\n?([\s\S]*?)```/);
const language = match?.[1] || 'text';
const code = match?.[2] || part.slice(3, -3);
return (
  <pre key={index} className={`language-${language}`}>
    <code>{code}</code>
  </pre>
);
}
return <span key={index}>{part}</span>;
})}
</>
);
}
Prompt Engineering and Management
Effective prompts are crucial for consistent, high-quality responses. Implement a structured prompt management system:
// lib/prompts.ts
interface PromptTemplate {
name: string;
version: string;
systemPrompt: string;
examples?: Array<{ user: string; assistant: string }>;
variables?: string[];
}
export const promptTemplates: Record<string, PromptTemplate> = {
customerSupport: {
name: 'Customer Support Agent',
version: '2.1',
systemPrompt: `You are a helpful customer support agent for {{company_name}}.
Your responsibilities:
1. Answer product questions accurately using the knowledge base
2. Help troubleshoot common issues step-by-step
3. Guide users through account-related tasks
4. Escalate complex issues to human agents when needed
Guidelines:
- Be friendly, professional, and empathetic
- Keep responses concise (under 150 words unless explaining steps)
- Ask clarifying questions when the issue is unclear
- Never make up information - admit when you don't know
- For billing issues, always verify identity first
- Include relevant links to documentation when helpful
Current date: {{current_date}}
User tier: {{user_tier}}`,
examples: [
{
user: 'How do I reset my password?',
assistant: 'I can help you reset your password! Here\'s how:\n\n1. Go to the login page\n2. Click "Forgot Password"\n3. Enter your email address\n4. Check your inbox for a reset link (also check spam)\n5. Click the link and create a new password\n\nThe reset link expires in 24 hours. Let me know if you need any help with these steps!'
}
],
variables: ['company_name', 'current_date', 'user_tier']
},
technicalAssistant: {
name: 'Technical Assistant',
version: '1.3',
systemPrompt: `You are a technical assistant helping developers with {{product_name}}.
Your expertise includes:
- API integration and troubleshooting
- Code examples in JavaScript, Python, and TypeScript
- Best practices and optimization tips
- Error debugging and resolution
When providing code:
- Use clear, well-commented examples
- Include error handling
- Follow modern best practices
- Specify language and version requirements
When debugging:
1. Identify the likely cause
2. Explain why the issue occurs
3. Provide a solution with code
4. Suggest preventive measures
API Version: {{api_version}}
Documentation URL: {{docs_url}}`,
variables: ['product_name', 'api_version', 'docs_url']
}
};
export function compilePrompt(
templateName: string,
variables: Record<string, string>
): string {
const template = promptTemplates[templateName];
if (!template) {
throw new Error(`Prompt template '${templateName}' not found`);
}
let compiled = template.systemPrompt;
// Replace variables
for (const [key, value] of Object.entries(variables)) {
const pattern = new RegExp(`\\{\\{${key}\\}\\}`, 'g');
compiled = compiled.replace(pattern, value);
}
// Check for unreplaced variables
const unreplaced = compiled.match(/\{\{\w+\}\}/g);
if (unreplaced) {
console.warn(`Unreplaced variables in prompt: ${unreplaced.join(', ')}`);
}
return compiled;
}
// Prompt versioning and A/B testing
export class PromptManager {
private activeVersions: Map<string, string> = new Map();
private metrics: Map<string, PromptMetrics> = new Map();
async getPrompt(
templateName: string,
variables: Record<string, string>,
userId?: string
): Promise<{ prompt: string; version: string }> {
// Check for A/B test assignment
const version = this.getAssignedVersion(templateName, userId);
const template = this.getTemplateVersion(templateName, version);
const prompt = this.compile(template, variables);
return { prompt, version };
}
private getAssignedVersion(templateName: string, userId?: string): string {
// Simple A/B testing - hash user to version
if (userId) {
const hash = this.hashString(userId + templateName);
const versions = this.getAvailableVersions(templateName);
return versions[hash % versions.length];
}
return this.activeVersions.get(templateName) || 'default';
}
private hashString(str: string): number {
let hash = 0;
for (let i = 0; i < str.length; i++) {
hash = ((hash << 5) - hash) + str.charCodeAt(i);
hash = hash & hash;
}
return Math.abs(hash);
}
// Minimal lookups so the sketch compiles; a real system would back
// these with a versioned template store
private getAvailableVersions(templateName: string): string[] {
const template = promptTemplates[templateName];
return template ? [template.version] : ['default'];
}
private getTemplateVersion(templateName: string, version: string): PromptTemplate {
const template = promptTemplates[templateName];
if (!template) {
throw new Error(`Prompt template '${templateName}' not found`);
}
return template;
}
private compile(template: PromptTemplate, variables: Record<string, string>): string {
let compiled = template.systemPrompt;
for (const [key, value] of Object.entries(variables)) {
compiled = compiled.replace(new RegExp(`\\{\\{${key}\\}\\}`, 'g'), value);
}
return compiled;
}
trackMetric(
templateName: string,
version: string,
metric: 'success' | 'failure' | 'thumbsUp' | 'thumbsDown'
): void {
const key = `${templateName}:${version}`;
const current = this.metrics.get(key) || {
success: 0,
failure: 0,
thumbsUp: 0,
thumbsDown: 0
};
current[metric]++;
this.metrics.set(key, current);
}
}
interface PromptMetrics {
success: number;
failure: number;
thumbsUp: number;
thumbsDown: number;
}
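Compiling a template then reduces to supplying its variables; the values below are illustrative:
// Usage sketch for compilePrompt
const supportPrompt = compilePrompt('customerSupport', {
  company_name: 'TechCorp',
  current_date: new Date().toISOString().slice(0, 10),
  user_tier: 'pro'
});
// supportPrompt now has {{company_name}}, {{current_date}}, and
// {{user_tier}} replaced, and a warning is logged if any placeholder was missed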
RAG: Connecting to Your Knowledge Base
Retrieval-Augmented Generation (RAG) enhances chatbot responses with your specific data. Here's how to implement RAG with semantic search:
// lib/rag-chatbot.ts
import { SemanticSearchEngine } from './semantic-search';
import { createChatCompletion, ChatMessage } from './openai';
interface RAGConfig {
searchEngine: SemanticSearchEngine;
maxContext?: number;
minRelevance?: number;
}
interface RAGResponse {
answer: string;
sources: Array<{
id: string;
title: string;
excerpt: string;
relevance: number;
}>;
confidence: 'high' | 'medium' | 'low';
}
export class RAGChatbot {
private config: Required<RAGConfig>;
private conversationHistory: ChatMessage[] = [];
constructor(config: RAGConfig) {
this.config = {
maxContext: 3,
minRelevance: 0.75,
...config
};
}
async query(userQuery: string): Promise<RAGResponse> {
// Step 1: Search for relevant documents
const searchResults = await this.config.searchEngine.search(userQuery, {
limit: this.config.maxContext,
threshold: this.config.minRelevance
});
// Step 2: Build context from search results
const context = searchResults
.map((result, index) => {
return `[Source ${index + 1}]: ${result.content}`;
})
.join('\n\n');
// Step 3: Determine confidence based on search quality
const avgRelevance = searchResults.length > 0
? searchResults.reduce((sum, r) => sum + r.similarity, 0) / searchResults.length
: 0;
const confidence: RAGResponse['confidence'] =
avgRelevance > 0.85 ? 'high' :
avgRelevance > 0.75 ? 'medium' : 'low';
// Step 4: Build system prompt with context
const systemPrompt = `You are a helpful assistant that answers questions based on the provided context.
IMPORTANT GUIDELINES:
1. Only answer based on the provided context
2. If the context doesn't contain enough information, say so clearly
3. Cite sources by number when using information from them
4. Be concise and direct
CONTEXT:
${context || 'No relevant context found.'}
If no relevant context is found, politely explain that you don't have information about that topic and suggest the user contact support.`;
// Step 5: Generate response
const messages: ChatMessage[] = [
{ role: 'system', content: systemPrompt },
...this.conversationHistory.slice(-6), // Keep last 3 exchanges
{ role: 'user', content: userQuery }
];
const answer = await createChatCompletion(messages, {
temperature: 0.3, // Lower temperature for factual responses
maxTokens: 500
});
// Step 6: Update conversation history
this.conversationHistory.push(
{ role: 'user', content: userQuery },
{ role: 'assistant', content: answer }
);
// Step 7: Return structured response
return {
answer,
sources: searchResults.map(r => ({
id: r.id,
title: r.metadata.title || 'Untitled',
excerpt: r.content.substring(0, 200) + '...',
relevance: r.similarity
})),
confidence
};
}
async queryWithFallback(userQuery: string): Promise<RAGResponse> {
const response = await this.query(userQuery);
// If low confidence, try to provide helpful fallback
if (response.confidence === 'low' && response.sources.length === 0) {
const fallbackPrompt = `The user asked: "${userQuery}"
We couldn't find specific information in our knowledge base. Please:
1. Acknowledge that you don't have specific information
2. Offer general guidance if appropriate
3. Suggest they contact support for personalized help
4. Ask if there's something else you can help with`;
const fallbackAnswer = await createChatCompletion([
{ role: 'system', content: fallbackPrompt },
{ role: 'user', content: userQuery }
]);
return {
...response,
answer: fallbackAnswer
};
}
return response;
}
clearHistory(): void {
this.conversationHistory = [];
}
}
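Putting the pieces together, a typical query flow looks like the sketch below; the question and table name are illustrative:
// Usage sketch for RAGChatbot
import { SemanticSearchEngine } from './semantic-search';
import { RAGChatbot } from './rag-chatbot';
const bot = new RAGChatbot({
  searchEngine: new SemanticSearchEngine('documents'),
  maxContext: 3,
  minRelevance: 0.75
});
const result = await bot.queryWithFallback('How do I rotate my API keys?');
console.log(result.answer);
console.log(`Confidence: ${result.confidence}`);
for (const source of result.sources) {
  console.log(`- ${source.title} (relevance ${source.relevance.toFixed(2)})`);
}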
Cost Optimization and Rate Limiting
OpenAI API costs can grow quickly. Implement strategies to optimize costs while maintaining quality:
// lib/cost-optimizer.ts
import { encode } from 'gpt-tokenizer';
import { ChatMessage } from './openai';
interface ModelPricing {
input: number; // per 1K tokens
output: number; // per 1K tokens
}
// Prices per 1K tokens at the time of writing -- check OpenAI's pricing page for current rates
const MODEL_PRICING: Record<string, ModelPricing> = {
'gpt-4-turbo-preview': { input: 0.01, output: 0.03 },
'gpt-4': { input: 0.03, output: 0.06 },
'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
'gpt-3.5-turbo-16k': { input: 0.003, output: 0.004 }
};
export class CostOptimizer {
private usageTracker: Map<string, { tokens: number; cost: number }> = new Map();
// Estimate cost before making request
estimateCost(
messages: ChatMessage[],
model: string,
estimatedOutputTokens: number = 500
): { inputTokens: number; estimatedCost: number } {
const pricing = MODEL_PRICING[model];
if (!pricing) {
throw new Error(`Unknown model: ${model}`);
}
let inputTokens = 0;
for (const message of messages) {
inputTokens += encode(message.content).length + 4;
}
const inputCost = (inputTokens / 1000) * pricing.input;
const outputCost = (estimatedOutputTokens / 1000) * pricing.output;
return {
inputTokens,
estimatedCost: inputCost + outputCost
};
}
// Choose optimal model based on task complexity
selectModel(
messages: ChatMessage[],
taskComplexity: 'simple' | 'medium' | 'complex'
): string {
const totalTokens = messages.reduce(
(sum, m) => sum + encode(m.content).length,
0
);
// Use GPT-3.5 for simple tasks
if (taskComplexity === 'simple') {
return totalTokens > 4000 ? 'gpt-3.5-turbo-16k' : 'gpt-3.5-turbo';
}
// Use GPT-4 Turbo for complex reasoning
if (taskComplexity === 'complex') {
return 'gpt-4-turbo-preview';
}
// Medium complexity: use GPT-3.5 unless conversation is long
return totalTokens > 2000 ? 'gpt-4-turbo-preview' : 'gpt-3.5-turbo';
}
// Detect task complexity from user input
detectComplexity(userMessage: string): 'simple' | 'medium' | 'complex' {
const message = userMessage.toLowerCase();
// Complex indicators
const complexPatterns = [
/explain.*in detail/i,
/compare.*and/i,
/analyze/i,
/write.*code/i,
/debug/i,
/optimize/i,
/architect/i
];
if (complexPatterns.some(p => p.test(message))) {
return 'complex';
}
// Simple indicators
const simplePatterns = [
/^(yes|no|ok|thanks)/i,
/^what (is|are)/i,
/^how (do|can) i/i,
/\?$/
];
if (simplePatterns.some(p => p.test(message)) && message.length < 100) {
return 'simple';
}
return 'medium';
}
// Track usage by user/tenant
trackUsage(
userId: string,
inputTokens: number,
outputTokens: number,
model: string
): void {
const pricing = MODEL_PRICING[model];
if (!pricing) return; // Unknown model: skip tracking rather than crash
const cost =
(inputTokens / 1000) * pricing.input +
(outputTokens / 1000) * pricing.output;
const current = this.usageTracker.get(userId) || { tokens: 0, cost: 0 };
current.tokens += inputTokens + outputTokens;
current.cost += cost;
this.usageTracker.set(userId, current);
}
// Check if user is within budget
checkBudget(userId: string, dailyLimit: number): boolean {
const usage = this.usageTracker.get(userId);
return !usage || usage.cost < dailyLimit;
}
}
// Response caching for common queries
// lib/response-cache.ts
import Redis from 'ioredis';
import crypto from 'crypto';
import { ChatMessage } from './openai';
const redis = new Redis(process.env.REDIS_URL!);
export class ResponseCache {
private ttl: number = 60 * 60; // 1 hour default
private hashQuery(messages: ChatMessage[]): string {
const content = messages.map(m => `${m.role}:${m.content}`).join('|');
return crypto.createHash('sha256').update(content).digest('hex');
}
async get(messages: ChatMessage[]): Promise<string | null> {
const hash = this.hashQuery(messages);
return redis.get(`cache:${hash}`);
}
async set(
messages: ChatMessage[],
response: string,
ttl?: number
): Promise<void> {
const hash = this.hashQuery(messages);
await redis.setex(`cache:${hash}`, ttl || this.ttl, response);
}
// Semantic cache using embeddings
async getSemanticMatch(
query: string,
threshold: number = 0.95
): Promise<string | null> {
// Implementation would use vector similarity
// to find cached responses for semantically similar queries
return null;
}
}
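Combining both classes, a request path can consult the cache and the user's budget before spending tokens. A sketch; the $1 daily limit is illustrative:
// Sketch: cache + budget check before calling the API
import { createChatCompletion, ChatMessage } from './openai';
import { CostOptimizer } from './cost-optimizer';
import { ResponseCache } from './response-cache';
const optimizer = new CostOptimizer();
const cache = new ResponseCache();
export async function answerWithBudget(
  userId: string,
  messages: ChatMessage[]
): Promise<string> {
  // Serve repeated conversations straight from cache
  const cached = await cache.get(messages);
  if (cached) return cached;
  // Refuse the call once the user's daily spend is exhausted
  if (!optimizer.checkBudget(userId, 1.0 /* $1/day, illustrative */)) {
    throw new Error('Daily AI budget exceeded');
  }
  // Pick the cheapest model the task allows
  const lastUser = messages[messages.length - 1]?.content ?? '';
  const model = optimizer.selectModel(messages, optimizer.detectComplexity(lastUser));
  const answer = await createChatCompletion(messages, { model });
  await cache.set(messages, answer);
  return answer;
}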
Production Deployment Checklist
Before deploying your chatbot to production, ensure you've addressed these critical areas:
// lib/production-checklist.ts
import { createChatCompletion, ChatAPIError, ChatMessage } from './openai';
export const productionChecklist = {
security: [
'API keys stored in environment variables',
'Rate limiting implemented per user/IP',
'Input validation and sanitization',
'Output sanitization for XSS prevention',
'CORS configured correctly',
'Authentication required for API routes'
],
reliability: [
'Error handling for API failures',
'Retry logic with exponential backoff',
'Circuit breaker for API outages',
'Graceful degradation when AI unavailable',
'Health check endpoints',
'Request timeout handling'
],
performance: [
'Response caching for common queries',
'Connection pooling for database',
'Streaming responses enabled',
'Bundle size optimized',
'CDN for static assets'
],
monitoring: [
'Request/response logging',
'Error tracking (Sentry)',
'Usage metrics (tokens, costs)',
'Response quality metrics',
'Latency monitoring',
'Alerting for anomalies'
],
compliance: [
'Privacy policy updated',
'Data retention policies',
'User consent for AI interactions',
'Content moderation for inputs/outputs',
'Audit logging for sensitive operations'
]
};
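As one concrete example from the reliability list, a health check endpoint can be as small as the sketch below; the /api/health path is an assumption, and the check deliberately avoids calling OpenAI to stay cheap:
// app/api/health/route.ts (sketch)
import { NextResponse } from 'next/server';
export async function GET() {
  // Liveness plus a cheap configuration check -- no outbound API call
  const hasApiKey = Boolean(process.env.OPENAI_API_KEY);
  return NextResponse.json(
    { status: hasApiKey ? 'ok' : 'degraded', timestamp: new Date().toISOString() },
    { status: hasApiKey ? 200 : 503 }
  );
}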
// Production-ready error handling
export class ProductionChatService {
private circuitBreaker = {
failures: 0,
lastFailure: 0,
isOpen: false,
threshold: 5,
resetTime: 60000 // 1 minute
};
async sendMessage(messages: ChatMessage[]): Promise<string> {
// Check circuit breaker
if (this.isCircuitOpen()) {
return this.getFallbackResponse();
}
try {
const response = await this.callOpenAI(messages);
this.resetCircuitBreaker();
return response;
} catch (error) {
this.recordFailure();
if (this.shouldRetry(error)) {
return this.retryWithBackoff(messages);
}
throw error;
}
}
private isCircuitOpen(): boolean {
if (!this.circuitBreaker.isOpen) return false;
const timeSinceLastFailure = Date.now() - this.circuitBreaker.lastFailure;
if (timeSinceLastFailure > this.circuitBreaker.resetTime) {
this.circuitBreaker.isOpen = false;
this.circuitBreaker.failures = 0;
return false;
}
return true;
}
private recordFailure(): void {
this.circuitBreaker.failures++;
this.circuitBreaker.lastFailure = Date.now();
if (this.circuitBreaker.failures >= this.circuitBreaker.threshold) {
this.circuitBreaker.isOpen = true;
}
}
private getFallbackResponse(): string {
return "I'm currently experiencing technical difficulties. " +
"Please try again in a few minutes or contact support " +
"at support@example.com for immediate assistance.";
}
private async retryWithBackoff(
messages: ChatMessage[],
attempt: number = 1
): Promise<string> {
const maxAttempts = 3;
const baseDelay = 1000;
if (attempt > maxAttempts) {
throw new Error('Max retry attempts exceeded');
}
const delay = baseDelay * Math.pow(2, attempt - 1);
await new Promise(resolve => setTimeout(resolve, delay));
try {
return await this.callOpenAI(messages);
} catch (error) {
return this.retryWithBackoff(messages, attempt + 1);
}
}
private shouldRetry(error: unknown): boolean {
// Retry rate limits and transient server errors; fail fast otherwise
if (error instanceof ChatAPIError) {
return error.status === 429 || (error.status ?? 0) >= 500;
}
return false;
}
private resetCircuitBreaker(): void {
this.circuitBreaker.failures = 0;
this.circuitBreaker.isOpen = false;
}
private async callOpenAI(messages: ChatMessage[]): Promise<string> {
// Thin wrapper so retry and circuit-breaker logic stay testable
return createChatCompletion(messages);
}
}
Key Takeaways
Remember These Points
- Never expose API keys: All OpenAI calls should go through your backend with proper authentication
- Implement streaming: Real-time response streaming dramatically improves perceived performance and user experience
- Use embeddings for search: Semantic search with vector embeddings finds content by meaning, not just keywords
- Manage context carefully: Token limits require smart truncation and summarization strategies
- RAG enhances accuracy: Connecting to your knowledge base ensures accurate, relevant responses
- Optimize costs proactively: Model selection, caching, and usage tracking prevent unexpected bills
- Build for production: Rate limiting, circuit breakers, and fallbacks ensure reliability
Conclusion
Natural language interfaces powered by ChatGPT and similar models are becoming essential for modern web applications. The combination of conversational AI, semantic search, and RAG creates intelligent systems that understand user intent and provide accurate, contextual responses.
Start with a basic integration, then progressively add streaming, semantic search, and conversation management as your needs grow. Focus on user experience by implementing proper error handling, loading states, and fallback behaviors. Monitor costs carefully and optimize model selection based on task complexity.
For further learning, explore the OpenAI Text Generation Guide, LangChain.js documentation for more advanced orchestration, and Pinecone's learning resources for vector database best practices.