The database schema is your application's foundation. Get it wrong, and you'll face cascading problems: slow queries, data integrity issues, migration nightmares, and technical debt that compounds with every feature. Yet AI code generators consistently produce schemas that would make any DBA wince—missing indexes on foreign keys, improper normalization, no constraints, and data types chosen without thought.
Even in 2025, with powerful database tools and AI assistants, developers still make elementary schema mistakes. The problem is amplified when AI generates schemas without understanding your query patterns, data relationships, or performance requirements.
AI tools like ChartDB and DB Designer are improving—they can now highlight normalization issues, suggest missing indexes, and flag constraint problems. But fully automated schema design remains largely experimental. AI is better used for schema review than schema creation.
Critical: Foreign Keys Are NOT Automatically Indexed. PostgreSQL does not create indexes on foreign key columns. Without an index, JOIN operations on those columns require full table scans. Always create indexes on foreign key columns; omitting them is one of the most common AI schema mistakes.
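Because this omission is so mechanical, it can be caught mechanically. Below is a rough sketch (regex-based, not a real SQL parser; the function name and sample DDL are ours) that flags `REFERENCES` columns with no covering `CREATE INDEX`:

```javascript
// Naive lint sketch: find foreign key columns in SQL DDL that lack an index.
// Assumes one column definition per line, as in formatted DDL. Illustration only.
function findUnindexedForeignKeys(ddl) {
  // Collect columns covered by any CREATE INDEX statement
  const indexedCols = new Set();
  for (const m of ddl.matchAll(/CREATE\s+(?:UNIQUE\s+)?INDEX\s+\w+\s+ON\s+\w+\s*\(([^)]+)\)/gi)) {
    m[1].split(',').forEach(c => indexedCols.add(c.trim().split(/\s/)[0].toLowerCase()));
  }
  // Any line shaped like "col type ... REFERENCES ..." is a foreign key column
  const missing = [];
  for (const line of ddl.split('\n')) {
    const m = line.match(/^\s*(\w+)\s+\w+.*\bREFERENCES\b/i);
    if (m && !indexedCols.has(m[1].toLowerCase())) missing.push(m[1]);
  }
  return missing;
}

const ddl = `
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  product_id INTEGER REFERENCES products(id)
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
`;
console.log(findUnindexedForeignKeys(ddl)); // [ 'product_id' ]
```

A real check should use a proper SQL parser or query `pg_catalog`, but even this level of linting catches the most common AI omission before it reaches production.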
Why AI Fails at Schema Design
1. Domain Expertise Gap
Good schema design requires understanding:
- Business rules: Which relationships are mandatory vs. optional?
- Data lifecycle: How often is data created, updated, deleted?
- Query patterns: Which columns are filtered, sorted, joined?
- Data volumes: Thousands of rows or billions?
- Consistency requirements: ACID compliance needs?
AI has none of this context unless explicitly provided.
2. No Query Pattern Knowledge
Index design requires knowing how data will be queried. AI generates schemas based on entity relationships, but:
-- AI generates this schema
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER,
  status VARCHAR(50),
  created_at TIMESTAMP,
  total DECIMAL(10,2)
);

-- But doesn't know you'll frequently run:
SELECT * FROM orders WHERE status = 'pending' AND created_at > NOW() - INTERVAL '7 days';
SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC;

-- So it misses the indexes you actually need:
CREATE INDEX idx_orders_status_created ON orders(status, created_at);
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
3. Training Data Problems
AI is trained on schemas from public repositories, which often have:
- Tutorial-quality schemas (not production-ready)
- Missing constraints (simplified for learning)
- No performance optimization (not tested at scale)
- Inconsistent naming conventions
4. Premature Optimization Fear
Ironically, while AI over-engineers application code, it under-engineers database schemas. AI seems to treat schema design as "get the structure right, optimize later"—but database optimization is expensive to retrofit.
Common AI Schema Design Flaws
Flaw 1: Missing Indexes
-- AI-generated: No indexes except primary key
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  category_id INTEGER,
  price DECIMAL(10,2),
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INTEGER,
  product_id INTEGER,
  quantity INTEGER,
  price DECIMAL(10,2)
);

-- Problems:
-- 1. No index on category_id (slow category filtering)
-- 2. No index on order_items.order_id (slow order lookups)
-- 3. No index on order_items.product_id (slow product history)
-- 4. No composite indexes for common queries
Flaw 2: Missing Foreign Key Constraints
-- AI-generated: References without constraints
CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  user_id INTEGER, -- No REFERENCES!
  title VARCHAR(255),
  content TEXT
);

CREATE TABLE comments (
  id SERIAL PRIMARY KEY,
  post_id INTEGER, -- No REFERENCES!
  user_id INTEGER, -- No REFERENCES!
  content TEXT
);

-- What happens: You can insert comments for non-existent posts
INSERT INTO comments (post_id, user_id, content)
VALUES (99999, 88888, 'Orphaned comment');
-- No error! Data integrity destroyed.
Flaw 3: Improper Normalization (Under-Normalized)
-- AI-generated: Under-normalized (everything in one table)
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name VARCHAR(255),
  customer_email VARCHAR(255),
  customer_phone VARCHAR(50),
  customer_address TEXT,
  customer_city VARCHAR(100),
  customer_country VARCHAR(100),
  product_name VARCHAR(255),
  product_price DECIMAL(10,2),
  product_category VARCHAR(100),
  quantity INTEGER,
  total DECIMAL(10,2)
);

-- Problems:
-- 1. Customer data repeated for every order
-- 2. Product data repeated for every order item
-- 3. Update anomalies (change customer email in one row, not others)
-- 4. Deletion anomalies (delete order, lose customer data)
-- 5. Insertion anomalies (can't add customer without order)
Flaw 4: Over-Normalization
-- AI-generated: Over-normalized (too many tables)
CREATE TABLE users (id SERIAL PRIMARY KEY);
CREATE TABLE user_emails (id SERIAL, user_id INT, email VARCHAR(255));
CREATE TABLE user_names (id SERIAL, user_id INT, first_name VARCHAR(100));
CREATE TABLE user_last_names (id SERIAL, user_id INT, last_name VARCHAR(100));
CREATE TABLE user_phones (id SERIAL, user_id INT, phone VARCHAR(50));

-- Fetching one user now touches five tables and four JOINs!
SELECT u.id, ue.email, un.first_name, uln.last_name, up.phone
FROM users u
LEFT JOIN user_emails ue ON u.id = ue.user_id
LEFT JOIN user_names un ON u.id = un.user_id
LEFT JOIN user_last_names uln ON u.id = uln.user_id
LEFT JOIN user_phones up ON u.id = up.user_id
WHERE u.id = 1;
Flaw 5: Wrong Data Types
-- AI-generated: Poor data type choices
CREATE TABLE events (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  date VARCHAR(50),           -- Should be DATE
  time VARCHAR(50),           -- Should be TIME
  price VARCHAR(20),          -- Should be DECIMAL
  is_active VARCHAR(10),      -- Should be BOOLEAN
  attendee_count VARCHAR(10), -- Should be INTEGER
  metadata TEXT               -- Should be JSONB for queries
);

-- Problems:
-- 1. Can't use date functions on VARCHAR date
-- 2. Can't do numeric comparisons on VARCHAR price
-- 3. Storage inefficiency (VARCHAR vs native types)
-- 4. No type validation (can insert "not a number" as price)
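The comparison problem in point 2 is easy to demonstrate outside the database: string comparison is lexicographic, so numeric text sorts and compares incorrectly. A quick Node illustration (the sample values are ours):

```javascript
// Lexicographic vs numeric comparison: why VARCHAR prices break queries.
// '9.99' > '10.00' as strings, because '9' > '1' character-by-character.
const stringPrices = ['9.99', '10.00', '100.00', '2.50'];

// Sorted the way a database sorts a VARCHAR column:
const asText = [...stringPrices].sort();
console.log(asText); // [ '10.00', '100.00', '2.50', '9.99' ]

// Sorted the way a DECIMAL column sorts:
const asNumbers = [...stringPrices].sort((a, b) => Number(a) - Number(b));
console.log(asNumbers); // [ '2.50', '9.99', '10.00', '100.00' ]

console.log('9.99' > '10.00'); // true (lexicographic surprise)
```

The same surprise hits `WHERE price > '10'` and `ORDER BY price` on a VARCHAR column; native types make the database do the right thing.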
Flaw 6: No Soft Delete Strategy
-- AI-generated: No deletion strategy
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE,
  name VARCHAR(255)
);

-- Problem: Hard delete loses audit trail
DELETE FROM users WHERE id = 1;
-- User gone forever, along with any audit history

-- Better: Soft delete with a partial unique index
-- (PostgreSQL doesn't allow a WHERE clause on an inline UNIQUE constraint,
-- so the conditional uniqueness must be a separate CREATE UNIQUE INDEX)
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255),
  name VARCHAR(255),
  deleted_at TIMESTAMP NULL
);
CREATE UNIQUE INDEX idx_users_email_active ON users(email)
WHERE deleted_at IS NULL; -- Email unique among non-deleted users only
CREATE INDEX idx_users_active ON users(id) WHERE deleted_at IS NULL;
Indexing Strategies
Index Decision Framework
Choose the right index type for your query patterns:
- B-tree (default): exact-match lookups (WHERE email = ?) and range queries (WHERE created_at > ?)
- Composite B-tree: filters on multiple columns (WHERE status = ? AND created_at > ?)
- GIN/GiST: full-text search (WHERE to_tsvector(content) @@ to_tsquery('search'))
- GIN: JSONB containment queries (WHERE metadata @> '{"key": "value"}')
- Partial index: queries that always filter the same subset (WHERE status = 'active')
- Partial unique index: uniqueness with conditions (e.g., unique email for non-deleted users)
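As a sketch of what each bullet looks like in PostgreSQL DDL (the table, column, and index names here are illustrative, not from a real schema):

```sql
-- B-tree (default) on a single column
CREATE INDEX idx_users_email ON users(email);

-- Composite B-tree: equality column first, then range/sort column
CREATE INDEX idx_orders_status_created ON orders(status, created_at);

-- GIN for full-text search (assumes a precomputed tsvector column)
CREATE INDEX idx_posts_search ON posts USING GIN (search_vector);

-- GIN for JSONB containment queries
CREATE INDEX idx_events_metadata ON events USING GIN (metadata);

-- Partial index: only rows matching the predicate are indexed
CREATE INDEX idx_orders_active ON orders(created_at) WHERE status = 'active';

-- Partial unique index: uniqueness enforced only among live rows
CREATE UNIQUE INDEX idx_users_email_live ON users(email) WHERE deleted_at IS NULL;
```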
Index Best Practices
-- 1. Always index foreign keys
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER NOT NULL REFERENCES users(id),
  status VARCHAR(50) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- 2. Composite indexes for common query patterns
-- Column order matters: equality-filtered columns first, then range/sort columns
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);

-- 3. Partial indexes for filtered queries
-- If 90% of queries filter for active status:
CREATE INDEX idx_orders_active ON orders(created_at)
WHERE status = 'active';

-- 4. Covering indexes to avoid table lookups (PostgreSQL 11+)
CREATE INDEX idx_orders_summary ON orders(user_id, created_at)
INCLUDE (status, total);

-- 5. Index for sorting
CREATE INDEX idx_orders_recent ON orders(created_at DESC);

-- 6. Don't over-index (each index slows writes)
-- Monitor index usage:
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%pkey';
-- These indexes are never used - consider dropping
Normalization Guide
Normal Forms Explained
-- UNNORMALIZED (0NF): Repeating groups
CREATE TABLE orders_bad (
  order_id INT,
  customer_name VARCHAR(255),
  product1_name VARCHAR(255),
  product1_qty INT,
  product2_name VARCHAR(255),
  product2_qty INT,
  product3_name VARCHAR(255),
  product3_qty INT
  -- What if order has 4 products?
);

-- FIRST NORMAL FORM (1NF): Eliminate repeating groups
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name VARCHAR(255)
);
CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  product_name VARCHAR(255),
  quantity INT
);

-- SECOND NORMAL FORM (2NF): Remove partial dependencies
-- (All non-key columns depend on the ENTIRE primary key)
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  price DECIMAL(10,2)
);
CREATE TABLE order_items_2nf (
  id SERIAL PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id),
  quantity INT
  -- product_name is now in products table
);

-- THIRD NORMAL FORM (3NF): Remove transitive dependencies
-- (Non-key columns don't depend on other non-key columns)
-- Bad: category_name depends on category_id, not product_id
CREATE TABLE products_bad (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  category_id INT,
  category_name VARCHAR(255) -- Transitive dependency!
);

-- Good: category_name in separate table
CREATE TABLE categories (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255)
);
CREATE TABLE products_3nf (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  category_id INT REFERENCES categories(id)
);
When to Denormalize
-- Denormalization is acceptable when:
-- 1. Read performance is critical and writes are rare
-- 2. Data rarely changes (lookup tables)
-- 3. Avoiding complex JOINs in hot paths

-- Example: Storing user display_name on posts
CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  user_id INT REFERENCES users(id),
  author_name VARCHAR(255), -- Denormalized from users
  title VARCHAR(255),
  content TEXT
);

-- Trade-off: Faster reads, but must update author_name if user changes name
-- Use triggers or application logic to maintain consistency
CREATE OR REPLACE FUNCTION update_post_author_names()
RETURNS TRIGGER AS $$
BEGIN
  UPDATE posts SET author_name = NEW.name WHERE user_id = NEW.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_author_names
AFTER UPDATE OF name ON users
FOR EACH ROW EXECUTE FUNCTION update_post_author_names();
Foreign Key Best Practices
Constraint Naming Convention
-- Naming pattern: fk_{child_table}_{parent_table}
ALTER TABLE orders
  ADD CONSTRAINT fk_orders_users
  FOREIGN KEY (user_id) REFERENCES users(id);

ALTER TABLE order_items
  ADD CONSTRAINT fk_order_items_orders
  FOREIGN KEY (order_id) REFERENCES orders(id);

ALTER TABLE order_items
  ADD CONSTRAINT fk_order_items_products
  FOREIGN KEY (product_id) REFERENCES products(id);
Cascading Actions
-- ON DELETE CASCADE: Delete children when parent deleted
CREATE TABLE comments (
  id SERIAL PRIMARY KEY,
  post_id INT REFERENCES posts(id) ON DELETE CASCADE,
  content TEXT
);
-- Deleting a post automatically deletes its comments

-- ON DELETE SET NULL: Preserve children, null the reference
CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  author_id INT REFERENCES users(id) ON DELETE SET NULL,
  title VARCHAR(255)
);
-- Deleting a user keeps posts but sets author_id to NULL

-- ON DELETE RESTRICT: Prevent deletion if children exist
-- (PostgreSQL's actual default is NO ACTION, which behaves similarly
-- but is checked at the end of the statement and can be deferred)
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT REFERENCES users(id) ON DELETE RESTRICT
);
-- Can't delete user if they have orders

-- Choose based on business rules:
-- CASCADE: Comments, likes, notifications (derived data)
-- SET NULL: Posts, authored content (preserve history)
-- RESTRICT: Orders, financial records, audit logs (prevent data loss)
Avoiding Circular Dependencies
-- BAD: Circular foreign keys
-- (As written, this DDL fails: employees doesn't exist yet when
-- departments references it)
CREATE TABLE departments (
  id SERIAL PRIMARY KEY,
  manager_id INT REFERENCES employees(id) -- Circular!
);
CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  department_id INT REFERENCES departments(id) -- Circular!
);

-- GOOD: Break the cycle with a nullable FK or a junction table
CREATE TABLE departments (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255)
);
CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  department_id INT REFERENCES departments(id)
);

-- Add the manager relationship separately
ALTER TABLE departments
  ADD COLUMN manager_id INT REFERENCES employees(id);
-- Or use a junction table for many-to-many management roles
EXPLAIN Query Analysis
Use EXPLAIN to identify schema performance issues.
Reading EXPLAIN Output
-- Basic EXPLAIN
EXPLAIN SELECT * FROM orders WHERE user_id = 1;
-- EXPLAIN ANALYZE (actually runs the query)
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 1;
-- EXPLAIN with all options (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 1;
Identifying Problems
-- Problem: Sequential Scan on large table
EXPLAIN SELECT * FROM orders WHERE user_id = 1;
-- Output:
-- Seq Scan on orders (cost=0.00..1250.00 rows=50 width=100)
-- Filter: (user_id = 1)
--
-- "Seq Scan" = Full table scan = MISSING INDEX!
-- Solution: Add index
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- After index:
-- Index Scan using idx_orders_user_id on orders (cost=0.29..8.31 rows=50)
-- Index Cond: (user_id = 1)
Common EXPLAIN Red Flags
- Seq Scan on large table: Full table scan - add appropriate index
- Nested Loop with high rows: O(n*m) join - add index on join column
- Sort with high cost: In-memory/disk sort - add index on ORDER BY column
- Hash Join with work_mem exceeded: Spilling to disk - increase work_mem or add index
- Filter with low selectivity: Index not used effectively - composite or partial index
EXPLAIN Helper Script
// scripts/analyze-slow-queries.js
const { Pool } = require('pg');

async function analyzeQuery(query, params = []) {
  const pool = new Pool();
  try {
    // Get execution plan
    const explainResult = await pool.query(
      `EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) ${query}`,
      params
    );
    const plan = explainResult.rows[0]['QUERY PLAN'][0];

    // Extract key metrics
    const analysis = {
      totalTime: plan['Execution Time'],
      planningTime: plan['Planning Time'],
      nodeType: plan.Plan['Node Type'],
      actualRows: plan.Plan['Actual Rows'],
      estimatedRows: plan.Plan['Plan Rows'],
      sharedHit: plan.Plan['Shared Hit Blocks'],
      sharedRead: plan.Plan['Shared Read Blocks'],
    };

    // Flag issues
    const issues = [];
    if (plan.Plan['Node Type'] === 'Seq Scan' && plan.Plan['Actual Rows'] > 1000) {
      issues.push('Sequential scan on large result - consider adding index');
    }
    if (analysis.actualRows > analysis.estimatedRows * 10) {
      issues.push('Row estimate significantly off - run ANALYZE on table');
    }
    if (analysis.sharedRead > analysis.sharedHit) {
      issues.push('More disk reads than cache hits - query not cached');
    }

    return { analysis, issues, fullPlan: plan };
  } finally {
    await pool.end();
  }
}

// Usage
analyzeQuery('SELECT * FROM orders WHERE status = $1', ['pending'])
  .then(result => console.log(JSON.stringify(result, null, 2)));
Schema Linting Tools
Database CI/CD with Schema Checks
# .github/workflows/database-lint.yml
name: Database Schema Lint

on:
  pull_request:
    paths:
      - 'prisma/schema.prisma'
      - 'migrations/**'

jobs:
  lint-schema:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Validate Prisma schema
        run: npx prisma validate
      - name: Check for schema drift
        run: npx prisma migrate diff --from-schema-datamodel prisma/schema.prisma --to-schema-datasource prisma/schema.prisma --exit-code
      - name: Run schema linting
        run: node scripts/lint-schema.js

  test-migrations:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v4
      - name: Run migrations on test database
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test
        run: |
          npx prisma migrate deploy
          npx prisma db seed
Custom Schema Linter
// scripts/lint-schema.js
const fs = require('fs');

function lintPrismaSchema(schemaPath) {
  const schema = fs.readFileSync(schemaPath, 'utf-8');
  const issues = [];

  // Check for missing indexes on foreign keys
  const relationFields = schema.match(/@relation\([^)]+\)/g) || [];
  const indexFields = schema.match(/@@index\(\[([^\]]+)\]\)/g) || [];

  // Extract field names from relations
  const foreignKeyFields = [];
  relationFields.forEach(rel => {
    const match = rel.match(/fields:\s*\[([^\]]+)\]/);
    if (match) {
      foreignKeyFields.push(...match[1].split(',').map(f => f.trim()));
    }
  });

  // Check if foreign keys have indexes
  foreignKeyFields.forEach(fk => {
    const hasIndex = indexFields.some(idx => idx.includes(fk));
    if (!hasIndex) {
      issues.push({
        type: 'warning',
        message: `Foreign key "${fk}" should have an index for JOIN performance`
      });
    }
  });

  // Check for missing @updatedAt
  const models = schema.match(/model\s+\w+\s*\{[^}]+\}/g) || [];
  models.forEach(model => {
    const modelName = model.match(/model\s+(\w+)/)[1];
    if (!model.includes('@updatedAt') && !model.includes('updatedAt')) {
      issues.push({
        type: 'info',
        message: `Model "${modelName}" might need an updatedAt field`
      });
    }
  });

  // Check for String fields that should be enums
  const stringFields = schema.match(/\w+\s+String\s*(@[^\n]+)?/g) || [];
  stringFields.forEach(field => {
    if (field.includes('status') || field.includes('type') || field.includes('role')) {
      issues.push({
        type: 'warning',
        message: `Field "${field.split(' ')[0]}" might be better as an enum`
      });
    }
  });

  return issues;
}

// Run linter
const issues = lintPrismaSchema('./prisma/schema.prisma');
if (issues.length > 0) {
  console.log('Schema lint issues found:\n');
  issues.forEach(issue => {
    console.log(`[${issue.type.toUpperCase()}] ${issue.message}`);
  });
  const hasErrors = issues.some(i => i.type === 'error');
  if (hasErrors) process.exit(1);
} else {
  console.log('Schema lint: No issues found');
}
Prisma Migration Best Practices
Schema-First Workflow
// prisma/schema.prisma
generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

model User {
  id        Int      @id @default(autoincrement())
  email     String   @unique // @unique already creates an index on email
  name      String?
  posts     Post[]
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  @@index([createdAt])
}

model Post {
  id        Int      @id @default(autoincrement())
  title     String
  content   String?
  published Boolean  @default(false)
  author    User     @relation(fields: [authorId], references: [id], onDelete: Cascade)
  authorId  Int
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  // Indexes for common queries
  @@index([authorId])
  @@index([published, createdAt])
}
Migration Commands
# Development: Create and apply migration
npx prisma migrate dev --name add_user_posts

# Production: Apply pending migrations
npx prisma migrate deploy

# Check migration status
npx prisma migrate status

# Reset database (development only!)
npx prisma migrate reset

# Generate migration SQL without applying
npx prisma migrate diff \
  --from-schema-datasource prisma/schema.prisma \
  --to-schema-datamodel prisma/schema.prisma \
  --script
Expand and Contract Pattern
-- Safe schema changes using expand and contract
-- PHASE 1: EXPAND - Add new column alongside old
ALTER TABLE users ADD COLUMN email_new VARCHAR(255);
-- PHASE 2: MIGRATE - Copy data to new column
UPDATE users SET email_new = email WHERE email_new IS NULL;
-- PHASE 3: UPDATE CODE - Application writes to both columns
-- PHASE 4: CONTRACT - Remove old column (after verification)
ALTER TABLE users DROP COLUMN email;
ALTER TABLE users RENAME COLUMN email_new TO email;
Safe Migration Checklist
# Before deploying migrations:

1. Review the generated SQL:
   npx prisma migrate diff \
     --from-schema-datasource prisma/schema.prisma \
     --to-schema-datamodel prisma/schema.prisma \
     --script

2. Test on staging with production-like data

3. Check for breaking changes:
   - Column drops (data loss)
   - Type changes (may fail for existing data)
   - NOT NULL on existing columns

4. Plan for rollback:
   - Keep a rollback migration ready
   - Test the rollback procedure

5. Consider deployment order:
   - New columns: deploy migration first, then code
   - Removed columns: deploy code first, then migration
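Step 3 of the checklist can be partially automated. Here is a rough sketch of a guard you could run in CI against generated migration SQL; the pattern list and messages are illustrative, not from any particular tool:

```javascript
// Rough sketch: scan migration SQL for operations that commonly break
// production deploys. The pattern list is illustrative, not exhaustive.
const DANGEROUS_PATTERNS = [
  { regex: /DROP\s+COLUMN/i, message: 'DROP COLUMN loses data: use expand and contract' },
  { regex: /DROP\s+TABLE/i, message: 'DROP TABLE loses data: confirm a backup exists' },
  { regex: /ALTER\s+COLUMN\s+\w+\s+TYPE/i, message: 'Type change may fail for existing data or rewrite the table' },
  { regex: /SET\s+NOT\s+NULL/i, message: 'NOT NULL on an existing column fails if NULLs exist' },
];

function checkMigration(sql) {
  return DANGEROUS_PATTERNS
    .filter(p => p.regex.test(sql))
    .map(p => p.message);
}

const migration = `
ALTER TABLE users DROP COLUMN legacy_email;
ALTER TABLE orders ALTER COLUMN total TYPE BIGINT;
`;
const warnings = checkMigration(migration);
warnings.forEach(w => console.warn('WARNING:', w));
// Flags the DROP COLUMN and the type change above
```

A check like this fits naturally as another step in the schema-lint workflow shown earlier; failing the build on warnings forces a human to confirm the migration is intentional.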
Schema Design Prompting for AI
Bad Prompt (Produces Flawed Schema)
// DON'T: Vague prompt
"Create a database schema for an e-commerce app"
// AI will miss: indexes, foreign keys, constraints, proper types
Good Prompt (Comprehensive Schema)
"Create a PostgreSQL schema for an e-commerce app with these requirements:
TABLES NEEDED:
- users (customers and admins)
- products (with categories)
- orders (with status tracking)
- order_items (line items)
REQUIREMENTS:
1. Use proper data types (not VARCHAR for everything)
2. Add FOREIGN KEY constraints with appropriate ON DELETE actions
3. Create indexes on ALL foreign key columns
4. Create indexes for these query patterns:
- Find orders by user
- Find orders by status and date range
- Find products by category
- Search products by name (case-insensitive)
5. Include:
- created_at and updated_at timestamps on all tables
- Soft delete (deleted_at) on users and products
- UNIQUE constraints where appropriate
- NOT NULL constraints where appropriate
- CHECK constraints for status fields
6. Use enums for:
- Order status (pending, paid, shipped, delivered, cancelled)
- User role (customer, admin)
7. Naming conventions:
- snake_case for table and column names
- FK constraint: fk_{child_table}_{parent_table}
- Index: idx_{table}_{column(s)}
Generate the complete SQL DDL with all constraints and indexes."
Schema Review Prompt
"Review this database schema for issues:
[paste schema SQL]
CHECK FOR:
1. Missing indexes on foreign keys
2. Missing indexes for common query patterns
3. Normalization issues (too much or too little)
4. Missing constraints (FK, NOT NULL, CHECK, UNIQUE)
5. Poor data type choices
6. Missing audit columns (created_at, updated_at)
7. Circular foreign key dependencies
8. Missing cascade rules
9. Performance concerns for large tables
Provide specific recommendations with SQL fixes."
Schema Review Checklist
Pre-Production Schema Checklist
- Constraints: All tables have PK, all FKs have constraints, appropriate ON DELETE actions, NOT NULL on required fields, UNIQUE on natural keys, CHECK on status fields
- Indexes: Index on every FK column, indexes for WHERE/ORDER BY columns, composite indexes for multi-column queries, partial indexes for filtered queries, no unused indexes
- Data Types: DATE/TIMESTAMP for dates, DECIMAL for money, BOOLEAN for true/false, INTEGER for counts, ENUM/CHECK for status, JSONB for structured metadata
- Normalization: At least 3NF for transactional data, no repeating groups, no partial/transitive dependencies, intentional denormalization documented
- Operations: created_at and updated_at on all tables, soft delete strategy defined, migration plan ready, backup and recovery tested
Key Takeaways
Database Schema Essentials
- AI Lacks Query Pattern Knowledge: AI generates schemas based on entity relationships but doesn't know your actual query patterns—always add indexes based on how you'll actually query the data
- Index All Foreign Keys: Foreign keys are NOT automatically indexed—without indexes, JOINs require full table scans (most common AI schema mistake)
- Enforce Referential Integrity: AI often omits FK constraints entirely—without them, invalid data silently enters your database
- Target Third Normal Form: Normalize to 3NF to eliminate data anomalies—denormalize intentionally for specific performance needs, not by accident
- Use EXPLAIN to Validate: Run EXPLAIN on common queries to identify sequential scans and missing indexes
- Use AI for Review, Not Creation: AI is better at reviewing schemas than creating them—design based on domain knowledge, then ask AI to check for issues
Conclusion
Database schema design is where AI assistance requires the most human expertise. AI can help generate boilerplate and review for common issues, but it cannot understand your business domain, query patterns, or performance requirements.
The key is to design your schema based on domain knowledge, then use AI for review and refinement. Document your query patterns before writing the schema. Add indexes based on actual usage, not just foreign key relationships. Use EXPLAIN to validate performance before going to production.
Remember: The database schema is your application's foundation. Taking the time to get it right saves countless hours of performance debugging and migration headaches later.
In our next article, we'll explore CI/CD Pipeline Configuration Errors: When AI Misunderstands Build Processes, examining how AI generates broken deployment configurations.