### 1. Primary Index (Clustered Index)
In storage engines with clustered tables (e.g., MySQL's InnoDB, SQL Server), the primary index determines the physical order of rows on disk. (PostgreSQL stores tables as heaps, so every index there is technically secondary.)
```sql
-- Creating a table with a primary key (automatically creates the clustered index)
CREATE TABLE users (
    id INT PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    status VARCHAR(20) DEFAULT 'active',  -- referenced by later examples
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
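Because rows are kept in primary-key order, range scans on the key are cheap. A minimal sketch (the id range is illustrative):

```sql
-- Rows with adjacent ids are physically adjacent, so this reads
-- a contiguous range of pages rather than scattered ones:
SELECT * FROM users WHERE id BETWEEN 1000 AND 2000;
```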
### 2. Secondary Index (Non-Clustered Index)
A secondary index stores pointers to row locations (or, in InnoDB, the primary-key values) without changing the physical order of the table.
```sql
-- Creating secondary indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_created_at ON users(created_at);
```
### 3. Composite Index
An index on multiple columns, useful for queries that filter on several fields at once.
```sql
-- Composite index for queries filtering by status and created_at
CREATE INDEX idx_users_status_created ON users(status, created_at);

-- This index is effective for queries like:
-- SELECT * FROM users WHERE status = 'active' AND created_at > '2025-01-01';
-- SELECT * FROM users WHERE status = 'active'; -- uses the leftmost prefix
```
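The flip side of the leftmost-prefix rule is worth showing: a query that skips the leading column generally cannot use the index.

```sql
-- Filtering on created_at alone skips the leading status column,
-- so idx_users_status_created is typically not usable here:
SELECT * FROM users WHERE created_at > '2025-01-01'; -- likely a sequential scan
```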
### 4. Unique Index
Ensures uniqueness while providing fast lookups.
```sql
-- Unique index on email
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- This prevents duplicate emails and speeds up email lookups
```
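A quick sketch of the constraint in action (the ids are illustrative and the error text below is PostgreSQL-style):

```sql
INSERT INTO users (id, username, email) VALUES (1, 'alice', 'alice@example.com');
INSERT INTO users (id, username, email) VALUES (2, 'bob',   'alice@example.com');
-- ERROR: duplicate key value violates unique constraint "idx_users_email_unique"
```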
### 5. Partial Index
Indexes only rows that meet certain conditions (PostgreSQL, SQLite).
```sql
-- Index only active users
CREATE INDEX idx_active_users ON users(username) WHERE status = 'active';

-- This index is smaller and more efficient for queries on active users
```
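Note that the planner can only use a partial index when the query's WHERE clause implies the index predicate, as in this sketch:

```sql
-- The status = 'active' condition matches the index predicate,
-- so idx_active_users is a candidate here:
SELECT id, username FROM users
WHERE status = 'active' AND username = 'alice';
```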
## Indexing Strategies
### 1. Analyze Query Patterns
Before creating indexes, understand your application's query patterns:
```sql
-- Common query patterns to analyze
SELECT * FROM users WHERE email = ?;                              -- single-column lookup
SELECT * FROM users WHERE status = ? AND created_at > ?;          -- multi-column filter
SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC;  -- filter + sort
SELECT COUNT(*) FROM users WHERE status = ?;                      -- aggregation
```
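Each pattern suggests a candidate index. The filter + sort pattern in particular maps to a composite index whose trailing column matches the ORDER BY (the orders table here is assumed, not defined above):

```sql
-- Lets the engine walk the index in output order instead of sorting:
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
```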
### 2. Index Selectivity
Create indexes on columns with high selectivity (many unique values):
```sql
-- Check the selectivity of a candidate column (email shown as the example)
SELECT
    COUNT(DISTINCT email) AS unique_values,
    COUNT(*) AS total_rows,
    COUNT(DISTINCT email) * 100.0 / COUNT(*) AS selectivity_percentage
FROM users;

-- High selectivity (good for indexing): email, username, id
-- Low selectivity (poor candidates): status, gender, boolean flags
```
### 3. Covering Indexes
Include all columns needed by a query in the index to avoid table lookups:
```sql
-- Covering index for a specific query
-- (INCLUDE is supported by PostgreSQL 11+ and SQL Server; in MySQL,
-- add the extra columns to the index key instead)
CREATE INDEX idx_users_covering ON users(status, created_at)
INCLUDE (username, email);

-- This query can be satisfied entirely from the index:
SELECT username, email FROM users
WHERE status = 'active' AND created_at > '2025-01-01';
```
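To verify the index actually covers the query, check the execution plan; in PostgreSQL an index-only scan confirms no heap lookups were needed (sketch):

```sql
EXPLAIN SELECT username, email FROM users
WHERE status = 'active' AND created_at > '2025-01-01';
-- Expect: Index Only Scan using idx_users_covering on users
```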
## Advanced Indexing Techniques
### 1. Function-Based Indexes
Index the result of functions or expressions:
```sql
-- Index for case-insensitive searches
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Index for date-based queries
CREATE INDEX idx_orders_year_month ON orders(EXTRACT(YEAR FROM created_at), EXTRACT(MONTH FROM created_at));

-- Usage: the query must use the same expression as the index
SELECT * FROM users WHERE LOWER(email) = LOWER('user@example.com');
```
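MySQL (8.0.13+) supports the same idea as functional indexes, with slightly different syntax: the expression must be wrapped in its own parentheses. A sketch:

```sql
-- MySQL functional index: note the extra parentheses around the expression
CREATE INDEX idx_users_email_lower ON users ((LOWER(email)));
```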
### 2. Hash Indexes
Hash indexes serve equality comparisons only. PostgreSQL supports them natively (and they are crash-safe since version 10); in MySQL, only the MEMORY engine honors `USING HASH`, while InnoDB silently builds a B-tree instead:
```sql
-- Hash index for exact matches (PostgreSQL syntax)
CREATE INDEX idx_users_email_hash ON users USING HASH (email);

-- Good for:     WHERE email = 'specific@email.com'
-- Not good for: WHERE email LIKE '%@gmail.com' or any range/ordering predicate
```
### 3. GIN/GiST Indexes (PostgreSQL)
For complex data types like arrays, JSON, and full-text search:
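The examples below assume a posts table and a JSONB preferences column on users, neither of which appears in the schema above; a minimal assumed setup:

```sql
-- Hypothetical schema assumed by the GIN examples below
CREATE TABLE posts (
    id INT PRIMARY KEY,
    tags TEXT[],      -- array column for the tags index
    content TEXT      -- full-text searched below
);
ALTER TABLE users ADD COLUMN preferences JSONB;  -- @> requires jsonb, not json
```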
```sql
-- GIN index for JSON data
CREATE INDEX idx_users_preferences ON users USING GIN(preferences);

-- GIN index for array data
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);

-- Full-text search index
CREATE INDEX idx_posts_content_fts ON posts USING GIN(to_tsvector('english', content));

-- Usage examples
SELECT * FROM users WHERE preferences @> '{"theme": "dark"}';
SELECT * FROM posts WHERE tags @> ARRAY['javascript', 'react'];
SELECT * FROM posts WHERE to_tsvector('english', content) @@ to_tsquery('database & optimization');
```
## Index Maintenance and Monitoring
### 1. Monitor Index Usage
```sql
-- PostgreSQL: check index usage
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS index_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
```
```sql
-- MySQL: check index cardinality (a proxy for selectivity, not usage)
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    INDEX_NAME,
    CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY CARDINALITY DESC;
```
### 2. Identify Unused Indexes
```sql
-- PostgreSQL: find indexes that have never been scanned
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey';
```
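MySQL exposes a similar view through the sys schema (available in 5.7+), provided performance_schema is enabled:

```sql
-- MySQL: indexes with no recorded reads since server start
SELECT * FROM sys.schema_unused_indexes;
```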
### 3. Index Maintenance
```sql
-- Rebuild fragmented indexes (PostgreSQL)
REINDEX INDEX idx_users_email;
REINDEX TABLE users;
-- PostgreSQL 12+: REINDEX INDEX CONCURRENTLY avoids blocking writes

-- Update planner statistics
ANALYZE users;

-- MySQL: rebuild the table and update statistics
OPTIMIZE TABLE users;
```
## Performance Testing
### 1. Query Execution Plans
```sql
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

-- MySQL
EXPLAIN FORMAT=JSON SELECT * FROM users WHERE email = 'test@example.com';

-- SQL Server
SET STATISTICS IO ON;
SELECT * FROM users WHERE email = 'test@example.com';
```
### 2. Benchmark Queries
```python
import time

import psycopg2


def benchmark_query(connection, query, iterations=1000):
    """Run a query repeatedly and report the average execution time."""
    cursor = connection.cursor()
    start_time = time.time()
    for _ in range(iterations):
        cursor.execute(query)
        cursor.fetchall()
    end_time = time.time()
    avg_time = (end_time - start_time) / iterations
    print(f"Average query time: {avg_time:.4f} seconds")
    cursor.close()
    return avg_time


# Usage: run once, create the index, then run again to compare
conn = psycopg2.connect("dbname=test user=postgres")
benchmark_query(conn, "SELECT * FROM users WHERE email = 'test@example.com'")
conn.close()
```
## Best Practices
### 1. Index Design Guidelines
```sql
-- ✅ Good: index on a frequently queried column
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- ✅ Good: composite index with column order matching query patterns
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

-- ❌ Bad: many overlapping single-column indexes
CREATE INDEX idx_users_first_name ON users(first_name);
CREATE INDEX idx_users_last_name ON users(last_name);
CREATE INDEX idx_users_age ON users(age);

-- ✅ Better: one composite index (serves leftmost-prefix queries:
-- first_name alone, first_name + last_name, or all three columns)
CREATE INDEX idx_users_name_age ON users(first_name, last_name, age);
```
### 2. Avoid Over-Indexing
```sql
-- PostgreSQL: monitor per-table index count and size
SELECT
    table_name,
    COUNT(*) AS index_count,
    pg_size_pretty(SUM(size_bytes)) AS total_index_size
FROM (
    SELECT
        relname AS table_name,
        pg_relation_size(indexrelid) AS size_bytes
    FROM pg_stat_user_indexes
) index_sizes
GROUP BY table_name
ORDER BY SUM(size_bytes) DESC;
```
### 3. Regular Maintenance Schedule
```bash
#!/bin/bash
# Daily index maintenance script

# Update planner statistics
psql -d mydb -c "ANALYZE;"

# Rebuild large, actively used indexes (example size threshold: ~1 MB).
# Generate the REINDEX statements, then pipe them back into psql to run them.
psql -d mydb -t -A -c "
    SELECT 'REINDEX INDEX ' || quote_ident(schemaname) || '.' || quote_ident(indexrelname) || ';'
    FROM pg_stat_user_indexes
    WHERE idx_scan > 0
      AND pg_relation_size(indexrelid) > 1000000;
" | psql -d mydb
```
## Common Pitfalls
### 1. Wrong Column Order in Composite Indexes
```sql
-- ❌ Wrong: low-selectivity column first
CREATE INDEX idx_users_status_email ON users(status, email);

-- ✅ Right: high-selectivity column first
CREATE INDEX idx_users_email_status ON users(email, status);
```
### 2. Indexing Small Tables
```sql
-- Indexing very small tables (roughly under a thousand rows) is usually
-- not worth the overhead: the planner tends to scan them anyway
SELECT COUNT(*) FROM small_lookup_table; -- returns 50 rows
-- No need for indexes here
```
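You can confirm this with an execution plan (the code column here is hypothetical):

```sql
EXPLAIN SELECT * FROM small_lookup_table WHERE code = 'X';
-- Expect a Seq Scan: the whole table fits in a page or two,
-- so an index lookup would not save any I/O
```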
### 3. Ignoring Index Maintenance
```sql
-- Set up automated statistics updates

-- PostgreSQL: ensure autovacuum keeps statistics fresh
ALTER TABLE users SET (autovacuum_enabled = true);

-- MySQL: enable automatic statistics recalculation
SET GLOBAL innodb_stats_auto_recalc = ON;
```
## Conclusion
Effective database indexing requires understanding your data, query patterns, and performance requirements. Key takeaways:
- **Analyze before indexing**: Understand your query patterns
- **Choose the right index type**: B-tree, hash, GIN, etc.
- **Consider composite indexes**: For multi-column queries
- **Monitor and maintain**: Regular performance reviews
- **Avoid over-indexing**: Balance query speed with write performance
Remember: indexes speed up reads but slow down writes. Always measure the impact and find the right balance for your specific use case.