Weaviate Production Playbook

Enterprise deployment guide for the Weaviate vector database: architecture, performance, and operational excellence
Platform Overview
AI-native vector database built for semantic search at scale
Why Weaviate Stands Out: Weaviate is an AI-native vector database built from the ground up for semantic search, not retrofitted from traditional databases. It excels in multi-tenant architectures with its ability to handle 50,000+ active shards per node and millions of tenants. The platform's dynamic indexing automatically transitions from flat to HNSW indices at 10,000 objects, optimizing for both small and large datasets without manual intervention.
| Capability | Key Features | Performance Metrics | Optimal Use Cases |
|---|---|---|---|
| Core Platform | Native hybrid search, GraphQL API, dynamic indexing, streaming ingestion | 50k+ shards/node; sub-5ms p99 latency | Multi-tenant SaaS, real-time RAG, graph-based architectures |
| Multi-Tenancy | Shard-level isolation, automatic tenant lifecycle, resource quotas | Millions of tenants; 30-50% memory savings | B2B SaaS, GDPR compliance, variable dataset sizes |
| Hybrid Search | Combined BM25 + semantic search, single-query fusion, native result ranking | Zero fusion overhead; 97%+ recall accuracy | Enterprise search, e-commerce, content discovery |
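
To make the hybrid-search row concrete, here is a minimal query sketch using the v4 Python client; the collection name and alpha weighting are illustrative assumptions, not values from this playbook, and helper signatures should be checked against your client version.

```python
# Minimal hybrid-search sketch (v4 Python client). "Articles" and
# alpha=0.5 are illustrative assumptions.
import weaviate

client = weaviate.connect_to_local()
articles = client.collections.get("Articles")

# alpha blends the two signals: 0 = pure BM25 keyword score, 1 = pure vector.
response = articles.query.hybrid(
    query="gdpr-compliant multi-tenant storage",
    alpha=0.5,
    limit=5,
)
for obj in response.objects:
    print(obj.properties)
client.close()
```

Because the BM25 and vector scores are fused inside a single query plan, there is no client-side merge of two separate result sets.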
Performance Benchmarks
Production performance on standard cloud hardware (2024 results)
| Dataset | Vector Count | Dimensions | Throughput | P99 Latency | Recall |
|---|---|---|---|---|---|
| DBPedia OpenAI | 1 million | 1,536 | 5,639 QPS | 4.43ms | 97.24% |
| SIFT1M | 1 million | 128 | 10,940 QPS | 3.13ms | 98.35% |
| MSMARCO | 8.8 million | 768 | 7,363 QPS | 3.69ms | 95.8% |
| Sphere DPR | 10 million | 768 | 3,523 QPS | 7.73ms | 96% |

* Benchmarks on 16 vCPU, 128GB RAM instances with NVMe storage

Index Selection Strategy
Choosing the right index type for your workload characteristics
| Index Type | Best For | Memory Usage | Build Time | Query Performance |
|---|---|---|---|---|
| Flat | Datasets < 10k objects, cold tenants, exact-search requirements | Baseline (raw vector size) | Instant | O(n) linear scan |
| HNSW | Datasets > 10k objects, hot data, sub-100ms latency needs | +30-50% overhead | Slower initial build | O(log n) graph traversal |
| Dynamic | Multi-tenant workloads with varying sizes, unpredictable growth patterns | Adaptive | Progressive | Auto-optimized |
| PQ Compressed | Very large datasets, memory-constrained environments | 60-75% reduction | Additional training/preprocessing | ~5% recall trade-off |

Dynamic Indexing Benefits

  • Automatic optimization: Switches from flat to HNSW at the 10,000-object threshold
  • Multi-tenant efficiency: Reduces memory by 30-50% vs uniform HNSW
  • Zero manual tuning: Ideal for unpredictable growth patterns (see the configuration sketch below)
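
A hedged sketch of enabling the dynamic index at collection creation with the v4 Python client; the collection name and HNSW parameters are illustrative assumptions, and the Configure helper signatures may differ across client versions.

```python
# Dynamic index sketch (v4 Python client): flat below the threshold,
# HNSW above it. All parameter values here are illustrative.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()
client.collections.create(
    "TenantDocs",
    vector_index_config=wc.Configure.VectorIndex.dynamic(
        distance_metric=wc.VectorDistances.COSINE,
        threshold=10_000,  # switch from flat to HNSW at 10k objects
        hnsw=wc.Configure.VectorIndex.hnsw(max_connections=32),
        flat=wc.Configure.VectorIndex.flat(),
    ),
)
client.close()
```

Note that the dynamic index type also requires async indexing to be enabled on the server.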
Production Architecture
Reference topologies for different scale requirements
| Scale Tier | Cluster Configuration | Capacity | Performance | Monthly Cost* |
|---|---|---|---|---|
| Development | 1 node: 4 vCPU, 16GB RAM, 500GB SSD | Up to 1M vectors | 500 QPS, p99 < 100ms | $150-200 |
| Small Production | 3 nodes: 8 vCPU, 32GB RAM, 1TB SSD | Up to 10M vectors | 2,500 QPS, p99 < 50ms | $600-800 |
| Medium Production | 5 nodes: 16 vCPU, 64GB RAM, 2TB NVMe | Up to 100M vectors | 10,000 QPS, p99 < 100ms | $2,500-3,500 |
| Large Production | 10+ nodes: 32 vCPU, 128GB RAM, 4TB NVMe | 1B+ vectors | 50,000 QPS, p99 < 150ms | $8,000-12,000 |

* Self-hosted costs including compute, storage, networking, and operational overhead

Multi-Tenancy Architecture
Shard-level isolation for compliance and performance
| Feature | Implementation | Benefits | Limitations |
|---|---|---|---|
| Tenant Isolation | Dedicated shards with separate storage and vector indices per tenant | Complete data isolation, GDPR compliance | Minimum per-tenant overhead |
| Lifecycle Management | Automatic creation, activation/deactivation, resource preservation | On-demand resources, cost optimization | Reactivation latency |
| Resource Quotas | Per-tenant rate limiting, storage caps, query complexity bounds | Fair resource sharing, predictable costs | Manual configuration needed |
| Scale Limits | 50,000+ active shards per node, millions of total tenants | Massive multi-tenancy support | Monitoring complexity at scale |
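
The sketch below shows shard-level multi-tenancy with the v4 Python client: creating a tenant, deactivating it, and scoping queries to its shard. Collection and tenant names are illustrative, and the activity-status enum values have changed across client versions (older releases use HOT/COLD), so verify against your installed client.

```python
# Multi-tenancy sketch (v4 Python client). Names are illustrative;
# older clients call the statuses HOT/COLD instead of ACTIVE/INACTIVE.
import weaviate
import weaviate.classes.config as wc
from weaviate.classes.tenants import Tenant, TenantActivityStatus

client = weaviate.connect_to_local()
docs = client.collections.create(
    "Documents",
    multi_tenancy_config=wc.Configure.multi_tenancy(enabled=True),
)

# Each tenant maps to its own shard with a separate vector index.
docs.tenants.create([Tenant(name="acme-corp")])

# Deactivate an idle tenant to free resources; it reactivates on demand,
# at the cost of first-query latency.
docs.tenants.update([
    Tenant(name="acme-corp", activity_status=TenantActivityStatus.INACTIVE)
])

# All reads and writes are scoped to a single tenant's shard.
acme = docs.with_tenant("acme-corp")
client.close()
```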
Operational Excellence
Monitoring, disaster recovery, and performance optimization
| Operational Area | Key Metrics & Thresholds | Best Practices | Common Issues |
|---|---|---|---|
| Monitoring | cache_hit_ratio > 80%; bloom_filter_hit > 90%; indexing_queue < 100k | Prometheus integration, granular alerts, monitoring groups for multi-tenant setups | Missing aggregation config, alert fatigue |
| Performance Tuning | ef: 64-128; maxConnections: 32-64; vectorCacheMaxObjects: 2M | Async indexing, batch processing, HNSW parameter tuning | Over-indexing, cache thrashing |
| Disaster Recovery | RTO < 15 minutes; RPO ≈ 0 (with replication); hourly backups | S3/GCS backups, blue-green deployments, cross-AZ replication | Incomplete backups, version mismatches |
| Memory Management | HNSW overhead: +30-50%; PQ compression: -60-75%; tombstone cleanup cycles | Memory-mapped files, configurable WAL, Product Quantization | Memory leaks in high-churn workloads, OOM on imports |
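
For the disaster-recovery row, here is a hedged sketch of triggering a backup to S3 through the v4 Python client; the backup ID is an illustrative naming scheme, and the "s3" backend assumes the backup-s3 module is enabled on the cluster.

```python
# Backup sketch (v4 Python client). Assumes the backup-s3 module is
# configured server-side; backup_id is an illustrative naming scheme.
import weaviate

client = weaviate.connect_to_local()
result = client.backup.create(
    backup_id="hourly-2024-06-01-1200",
    backend="s3",
    wait_for_completion=True,
)
print(result)
client.close()
```

Pairing a call like this with an hourly scheduler and cross-AZ replication is what gets you toward the RTO < 15 minutes and RPO ≈ 0 targets above.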
Cost Optimization
Strategies for reducing operational expenses
| Optimization Strategy | Implementation | Cost Impact | Trade-offs |
|---|---|---|---|
| Dynamic Indexing | Auto-switch between flat and HNSW based on tenant size | -30-50% memory costs | Slightly higher management complexity |
| Product Quantization | Compress vectors using PQ encoding for cold data | -60-75% storage costs | ~5% recall degradation |
| Tenant Deactivation | Automatically deactivate idle tenants after a threshold | -40-60% resource usage | Reactivation latency on first query |
| Module Optimization | Disable unused modules and vectorizers | -20-30% CPU overhead | Reduced feature availability |
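
A minimal sketch of the Product Quantization row: enabling PQ on an HNSW index with the v4 Python client. The collection name and training_limit are illustrative assumptions; PQ trains its codebook only after that many vectors have been imported.

```python
# PQ compression sketch (v4 Python client). training_limit is
# illustrative; the codebook trains once that many vectors exist.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()
client.collections.create(
    "ColdArchive",
    vector_index_config=wc.Configure.VectorIndex.hnsw(
        quantizer=wc.Configure.VectorIndex.Quantizer.pq(
            training_limit=100_000,
        ),
    ),
)
client.close()
```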

Weaviate Cloud Pricing

  • Usage-based: $0.095/GB storage + $0.075/million queries
  • Dedicated instances: Starting at $900/month for 4 vCPU, 16GB RAM
  • Includes: Daily backups, SSO, automatic scaling
Troubleshooting Patterns
Common issues and resolution strategies
| Issue Category | Symptoms | Root Causes | Resolution |
|---|---|---|---|
| Slow Queries | P95 latency > 150ms, degrading response times | Low ef parameter, cache misses, wrong index type | Increase ef to 128, expand vector cache, verify index selection |
| Memory Pressure | OOM kills, pod evictions, import failures | Insufficient vector cache, large batch sizes, memory leaks | Reduce cache size, enable PQ compression, scale horizontally |
| Split Brain | Inconsistent cluster state, gossip failures | Network segmentation, gossip ports 7946/7947 blocked | Verify network connectivity, check pod hostname consistency |
| Import Failures | Timeouts, partial imports, data loss | Oversized batches, duplicate detection, schema mismatches | Reduce batch size to 100, verify schema, check deduplication |
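
For the slow-query row, a hedged sketch of applying the recommended fixes to a live collection via the v4 client's Reconfigure helpers; the collection name is illustrative, and which parameters are mutable at runtime depends on your server version.

```python
# Live HNSW retuning sketch (v4 Python client): raise ef and grow the
# vector cache on an existing collection, per the resolutions above.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()
articles = client.collections.get("Articles")
articles.config.update(
    vector_index_config=wc.Reconfigure.VectorIndex.hnsw(
        ef=128,
        vector_cache_max_objects=2_000_000,
    ),
)
client.close()
```

For the import-failure row, the v4 client's fixed-size batching (collection.batch.fixed_size(batch_size=100)) is one way to apply the reduced batch size the table recommends.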
Integration Ecosystem
Vectorization options and deployment platforms
| Integration Type | Options Available | Use Cases | Considerations |
|---|---|---|---|
| Vectorization | OpenAI, Cohere, Hugging Face, Google Vertex AI, Sentence Transformers | Text/image embeddings, multi-modal search | API costs, latency, data privacy |
| Deployment | Kubernetes Helm charts, Docker Compose, Embedded, cloud marketplaces | Production clusters, development, edge computing | Operational complexity, costs |
| Query Interface | GraphQL API, REST API, gRPC, client SDKs (Python, JS, Go, Java) | Application integration, batch operations | Learning curve, performance |
| Observability | Prometheus metrics, Grafana dashboards, OpenTelemetry traces | Monitoring, alerting, debugging | Metric cardinality at scale |
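
To illustrate the vectorization row, a sketch of attaching a hosted embedding module at collection creation with the v4 Python client; the API key header and model name are assumptions for illustration, and the call requires the text2vec-openai module to be enabled server-side.

```python
# Vectorizer module sketch (v4 Python client). The key and model name
# are placeholders; requires the text2vec-openai module server-side.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": "sk-..."},  # placeholder key
)
client.collections.create(
    "Products",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
)
client.close()
```

Keeping vectorization inside Weaviate simplifies the pipeline, but the API-cost, latency, and data-privacy considerations in the table apply to every insert and query.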
Recent Innovations (v1.25+)
Latest features improving performance and scalability
| Feature | Description | Performance Impact | Best For |
|---|---|---|---|
| Dynamic Indexing | Automatic transition from flat to HNSW at configurable thresholds | 30-50% memory reduction | Multi-tenant deployments with varying data sizes |
| Async Indexing | Decouples object creation from vector indexing | 40-60% ingestion speedup | High-throughput streaming workloads |
| Enhanced Multi-Tenancy | 50,000+ active shards per node, automatic tenant management | 10x tenant density | Large-scale B2B SaaS platforms |
| Binary Quantization | 1-bit vector compression for flat indices | 32x memory reduction | Massive datasets with relaxed recall requirements |
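
For the binary-quantization row, a minimal sketch of a BQ-compressed flat index via the v4 Python client; the collection name is illustrative, and helper names should be verified against your client version.

```python
# Binary quantization sketch (v4 Python client): 1-bit compression on a
# flat index, trading recall for a ~32x smaller memory footprint.
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()
client.collections.create(
    "MassiveCorpus",
    vector_index_config=wc.Configure.VectorIndex.flat(
        quantizer=wc.Configure.VectorIndex.Quantizer.bq(),
    ),
)
client.close()
```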
Resources & Documentation
Official resources and community support channels