Weaviate Production Playbook

Enterprise deployment guide for the Weaviate vector database: architecture, performance, and operational excellence
Platform Overview
AI-native vector database built for semantic search at scale
Why Weaviate Stands Out: Weaviate is an AI-native vector database built from the ground up for semantic search, not retrofitted from traditional databases. It excels in multi-tenant architectures with its ability to handle 50,000+ active shards per node and millions of tenants. The platform's dynamic indexing automatically transitions from flat to HNSW indices at 10,000 objects, optimizing for both small and large datasets without manual intervention.
| Capability | Key Features | Performance Metrics | Optimal Use Cases |
|---|---|---|---|
| Core Platform | Native hybrid search, GraphQL API, dynamic indexing, streaming ingestion | 50k+ shards/node; sub-5ms p99 latency | Multi-tenant SaaS, real-time RAG, graph-based architectures |
| Multi-Tenancy | Shard-level isolation, automatic tenant lifecycle, resource quotas | Millions of tenants; 30-50% memory savings | B2B SaaS, GDPR compliance, variable dataset sizes |
| Hybrid Search | Combined BM25 + semantic search, single-query fusion, native result ranking | Zero fusion overhead; 97%+ recall accuracy | Enterprise search, e-commerce, content discovery |
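
To make the hybrid-search row concrete, here is a minimal query sketch using the v4 Python client; the collection name and alpha weighting are illustrative assumptions, not values from this playbook, and helper signatures should be checked against your client version.

```python
# Minimal hybrid-search sketch (v4 Python client). "Articles" and
# alpha=0.5 are illustrative assumptions.
import weaviate

client = weaviate.connect_to_local()
articles = client.collections.get("Articles")

# alpha blends the two signals: 0 = pure BM25 keyword score, 1 = pure vector.
response = articles.query.hybrid(
    query="gdpr-compliant multi-tenant storage",
    alpha=0.5,
    limit=5,
)
for obj in response.objects:
    print(obj.properties)
client.close()
```

Because the BM25 and vector scores are fused inside a single query plan, there is no client-side merge of two separate result sets.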
Performance Benchmarks
Production performance on standard cloud hardware (2024 results)
| Dataset | Vector Count | Dimensions | Throughput | P99 Latency | Recall |
|---|---|---|---|---|---|
| DBPedia OpenAI | 1 million | 1,536 | 5,639 QPS | 4.43ms | 97.24% |
| SIFT1M | 1 million | 128 | 10,940 QPS | 3.13ms | 98.35% |
| MSMARCO | 8.8 million | 768 | 7,363 QPS | 3.69ms | 95.8% |
| Sphere DPR | 10 million | 768 | 3,523 QPS | 7.73ms | 96% |

* Benchmarks on 16 vCPU, 128GB RAM instances with NVMe storage

Index Selection Strategy
Choosing the right index type for your workload characteristics
| Index Type | Best For | Memory Usage | Build Time | Query Performance |
|---|---|---|---|---|
| Flat | Datasets < 10k objects, cold tenants, exact-search requirements | Baseline (raw vector size) | Instant | O(n) linear scan |
| HNSW | Datasets > 10k objects, hot data, sub-100ms latency needs | +30-50% overhead | Slower initial build | O(log n) graph traversal |
| Dynamic | Multi-tenant workloads with varying sizes, unpredictable growth patterns | Adaptive | Progressive | Auto-optimized |
| PQ Compressed | Very large datasets, memory-constrained environments | 60-75% reduction | Additional training/preprocessing | ~5% recall trade-off |

Dynamic Indexing Benefits

  • Automatic optimization: Switches from flat to HNSW at the 10,000-object threshold
  • Multi-tenant efficiency: Reduces memory by 30-50% vs uniform HNSW
  • Zero manual tuning: Ideal for unpredictable growth patterns (see the configuration sketch below)
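
A hedged sketch of enabling the dynamic index at collection creation with the v4 Python client; the collection name and HNSW parameters are illustrative assumptions, and the Configure helper signatures may differ across client versions.

```python
# Dynamic index sketch (v4 Python client): flat below the threshold,
# HNSW above it. All parameter values here are illustrative.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()
client.collections.create(
    "TenantDocs",
    vector_index_config=wc.Configure.VectorIndex.dynamic(
        distance_metric=wc.VectorDistances.COSINE,
        threshold=10_000,  # switch from flat to HNSW at 10k objects
        hnsw=wc.Configure.VectorIndex.hnsw(max_connections=32),
        flat=wc.Configure.VectorIndex.flat(),
    ),
)
client.close()
```

Note that the dynamic index type also requires async indexing to be enabled on the server.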
Production Architecture
Reference topologies for different scale requirements
| Scale Tier | Cluster Configuration | Capacity | Performance | Monthly Cost* |
|---|---|---|---|---|
| Development | 1 node: 4 vCPU, 16GB RAM, 500GB SSD | Up to 1M vectors | 500 QPS, p99 < 100ms | $150-200 |
| Small Production | 3 nodes: 8 vCPU, 32GB RAM, 1TB SSD | Up to 10M vectors | 2,500 QPS, p99 < 50ms | $600-800 |
| Medium Production | 5 nodes: 16 vCPU, 64GB RAM, 2TB NVMe | Up to 100M vectors | 10,000 QPS, p99 < 100ms | $2,500-3,500 |
| Large Production | 10+ nodes: 32 vCPU, 128GB RAM, 4TB NVMe | 1B+ vectors | 50,000 QPS, p99 < 150ms | $8,000-12,000 |

* Self-hosted costs including compute, storage, networking, and operational overhead

Multi-Tenancy Architecture
Shard-level isolation for compliance and performance
| Feature | Implementation | Benefits | Limitations |
|---|---|---|---|
| Tenant Isolation | Dedicated shards with separate storage and vector indices per tenant | Complete data isolation, GDPR compliance | Minimum per-tenant overhead |
| Lifecycle Management | Automatic creation, activation/deactivation, resource preservation | On-demand resources, cost optimization | Reactivation latency |
| Resource Quotas | Per-tenant rate limiting, storage caps, query complexity bounds | Fair resource sharing, predictable costs | Manual configuration needed |
| Scale Limits | 50,000+ active shards per node, millions of total tenants | Massive multi-tenancy support | Monitoring complexity at scale |
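
The sketch below shows shard-level multi-tenancy with the v4 Python client: creating a tenant, deactivating it, and scoping queries to its shard. Collection and tenant names are illustrative, and the activity-status enum values have changed across client versions (older releases use HOT/COLD), so verify against your installed client.

```python
# Multi-tenancy sketch (v4 Python client). Names are illustrative;
# older clients call the statuses HOT/COLD instead of ACTIVE/INACTIVE.
import weaviate
import weaviate.classes.config as wc
from weaviate.classes.tenants import Tenant, TenantActivityStatus

client = weaviate.connect_to_local()
docs = client.collections.create(
    "Documents",
    multi_tenancy_config=wc.Configure.multi_tenancy(enabled=True),
)

# Each tenant maps to its own shard with a separate vector index.
docs.tenants.create([Tenant(name="acme-corp")])

# Deactivate an idle tenant to free resources; it reactivates on demand,
# at the cost of first-query latency.
docs.tenants.update([
    Tenant(name="acme-corp", activity_status=TenantActivityStatus.INACTIVE)
])

# All reads and writes are scoped to a single tenant's shard.
acme = docs.with_tenant("acme-corp")
client.close()
```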
Operational Excellence
Monitoring, disaster recovery, and performance optimization
| Operational Area | Key Metrics & Thresholds | Best Practices | Common Issues |
|---|---|---|---|
| Monitoring | cache_hit_ratio > 80%; bloom_filter_hit > 90%; indexing_queue < 100k | Prometheus integration, granular alerts, monitoring groups for multi-tenant setups | Missing aggregation config, alert fatigue |
| Performance Tuning | ef: 64-128; maxConnections: 32-64; vectorCacheMaxObjects: 2M | Async indexing, batch processing, HNSW parameter tuning | Over-indexing, cache thrashing |
| Disaster Recovery | RTO < 15 minutes; RPO ≈ 0 (with replication); hourly backups | S3/GCS backups, blue-green deployments, cross-AZ replication | Incomplete backups, version mismatches |
| Memory Management | HNSW overhead: +30-50%; PQ compression: -60-75%; tombstone cleanup cycles | Memory-mapped files, configurable WAL, Product Quantization | Memory leaks in high-churn workloads, OOM on imports |
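
For the disaster-recovery row, here is a hedged sketch of triggering a backup to S3 through the v4 Python client; the backup ID is an illustrative naming scheme, and the "s3" backend assumes the backup-s3 module is enabled on the cluster.

```python
# Backup sketch (v4 Python client). Assumes the backup-s3 module is
# configured server-side; backup_id is an illustrative naming scheme.
import weaviate

client = weaviate.connect_to_local()
result = client.backup.create(
    backup_id="hourly-2024-06-01-1200",
    backend="s3",
    wait_for_completion=True,
)
print(result)
client.close()
```

Pairing a call like this with an hourly scheduler and cross-AZ replication is what gets you toward the RTO < 15 minutes and RPO ≈ 0 targets above.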
Cost Optimization
Strategies for reducing operational expenses
| Optimization Strategy | Implementation | Cost Impact | Trade-offs |
|---|---|---|---|
| Dynamic Indexing | Auto-switch between flat and HNSW based on tenant size | -30-50% memory costs | Slightly higher management complexity |
| Product Quantization | Compress vectors using PQ encoding for cold data | -60-75% storage costs | ~5% recall degradation |
| Tenant Deactivation | Automatically deactivate idle tenants after a threshold | -40-60% resource usage | Reactivation latency on first query |
| Module Optimization | Disable unused modules and vectorizers | -20-30% CPU overhead | Reduced feature availability |
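
A minimal sketch of the Product Quantization row: enabling PQ on an HNSW index with the v4 Python client. The collection name and training_limit are illustrative assumptions; PQ trains its codebook only after that many vectors have been imported.

```python
# PQ compression sketch (v4 Python client). training_limit is
# illustrative; the codebook trains once that many vectors exist.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()
client.collections.create(
    "ColdArchive",
    vector_index_config=wc.Configure.VectorIndex.hnsw(
        quantizer=wc.Configure.VectorIndex.Quantizer.pq(
            training_limit=100_000,
        ),
    ),
)
client.close()
```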

Weaviate Cloud Pricing

  • Usage-based: $0.095/GB storage + $0.075/million queries
  • Dedicated instances: Starting at $900/month for 4 vCPU, 16GB RAM
  • Includes: Daily backups, SSO, automatic scaling
Troubleshooting Patterns
Common issues and resolution strategies
| Issue Category | Symptoms | Root Causes | Resolution |
|---|---|---|---|
| Slow Queries | P95 latency > 150ms, degrading response times | Low ef parameter, cache misses, wrong index type | Increase ef to 128, expand vector cache, verify index selection |
| Memory Pressure | OOM kills, pod evictions, import failures | Insufficient vector cache, large batch sizes, memory leaks | Reduce cache size, enable PQ compression, scale horizontally |
| Split Brain | Inconsistent cluster state, gossip failures | Network segmentation, gossip ports 7946/7947 blocked | Verify network connectivity, check pod hostname consistency |
| Import Failures | Timeouts, partial imports, data loss | Oversized batches, duplicate detection, schema mismatches | Reduce batch size to 100, verify schema, check deduplication |
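
For the slow-query row, a hedged sketch of applying the recommended fixes to a live collection via the v4 client's Reconfigure helpers; the collection name is illustrative, and which parameters are mutable at runtime depends on your server version.

```python
# Live HNSW retuning sketch (v4 Python client): raise ef and grow the
# vector cache on an existing collection, per the resolutions above.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()
articles = client.collections.get("Articles")
articles.config.update(
    vector_index_config=wc.Reconfigure.VectorIndex.hnsw(
        ef=128,
        vector_cache_max_objects=2_000_000,
    ),
)
client.close()
```

For the import-failure row, the v4 client's fixed-size batching (collection.batch.fixed_size(batch_size=100)) is one way to apply the reduced batch size the table recommends.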
Integration Ecosystem
Vectorization options and deployment platforms
| Integration Type | Options Available | Use Cases | Considerations |
|---|---|---|---|
| Vectorization | OpenAI, Cohere, Hugging Face, Google Vertex AI, Sentence Transformers | Text/image embeddings, multi-modal search | API costs, latency, data privacy |
| Deployment | Kubernetes Helm charts, Docker Compose, Embedded, cloud marketplaces | Production clusters, development, edge computing | Operational complexity, costs |
| Query Interface | GraphQL API, REST API, gRPC, client SDKs (Python, JS, Go, Java) | Application integration, batch operations | Learning curve, performance |
| Observability | Prometheus metrics, Grafana dashboards, OpenTelemetry traces | Monitoring, alerting, debugging | Metric cardinality at scale |
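
To illustrate the vectorization row, a sketch of attaching a hosted embedding module at collection creation with the v4 Python client; the API key header and model name are assumptions for illustration, and the call requires the text2vec-openai module to be enabled server-side.

```python
# Vectorizer module sketch (v4 Python client). The key and model name
# are placeholders; requires the text2vec-openai module server-side.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": "sk-..."},  # placeholder key
)
client.collections.create(
    "Products",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
)
client.close()
```

Keeping vectorization inside Weaviate simplifies the pipeline, but the API-cost, latency, and data-privacy considerations in the table apply to every insert and query.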
Recent Innovations (v1.25+)
Latest features improving performance and scalability
| Feature | Description | Performance Impact | Best For |
|---|---|---|---|
| Dynamic Indexing | Automatic transition from flat to HNSW at configurable thresholds | 30-50% memory reduction | Multi-tenant deployments with varying data sizes |
| Async Indexing | Decouples object creation from vector indexing | 40-60% ingestion speedup | High-throughput streaming workloads |
| Enhanced Multi-Tenancy | 50,000+ active shards per node, automatic tenant management | 10x tenant density | Large-scale B2B SaaS platforms |
| Binary Quantization | 1-bit vector compression for flat indices | 32x memory reduction | Massive datasets with relaxed recall requirements |
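
For the binary-quantization row, a minimal sketch of a BQ-compressed flat index via the v4 Python client; the collection name is illustrative, and helper names should be verified against your client version.

```python
# Binary quantization sketch (v4 Python client): 1-bit compression on a
# flat index, trading recall for a ~32x smaller memory footprint.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()
client.collections.create(
    "MassiveCorpus",
    vector_index_config=wc.Configure.VectorIndex.flat(
        quantizer=wc.Configure.VectorIndex.Quantizer.bq(),
    ),
)
client.close()
```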
Resources & Documentation
Official resources and community support channels