Optimize data retrieval strategies. Accelerate queries and enforce data integrity with high-performance indexes.
Indexes are essential for query performance in SoliDB. Without an index, SDBQL queries with FILTER conditions must perform a full collection scan, which becomes slower as the collection grows.
Primary Index
Every collection in SoliDB automatically has a Primary Index on the `_key` attribute.
Unique: The `_key` is guaranteed to be unique within a collection.
Automatic: Created automatically when the collection is created. Cannot be deleted.
Fastest Access: Retrieving a document by its `_key` is the most efficient operation (O(1)).
Hash Index
A Hash Index provides constant-time O(1) lookups for exact matches. It is the fastest index type but supports only equality comparisons (`==`).
Best For
Exact match lookups
High-cardinality fields (e.g. UUIDs, email addresses)
Uniqueness constraints
Limitations
No range queries (`>`, `<`)
No sorting
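The trade-off above can be illustrated with a plain Python dictionary, which is itself a hash table. This is only a sketch of the idea, not SoliDB's implementation; the `build_hash_index` helper and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_hash_index(docs, field):
    """Map each value of `field` to the keys of the documents that hold it."""
    index = defaultdict(list)
    for doc in docs:
        value = doc.get(field)
        if value is not None:  # null or missing values are not indexed
            index[value].append(doc["_key"])
    return index

users = [
    {"_key": "1", "email": "ann@example.com"},
    {"_key": "2", "email": "bob@example.com"},
    {"_key": "3", "email": "cy@example.com"},
]
email_index = build_hash_index(users, "email")

# Equality lookup: one hash probe, independent of collection size.
match = email_index.get("bob@example.com", [])
```

Because hashed values carry no ordering, answering `email > "b"` from this structure would require visiting every entry, which is why range filters and sorting need a sorted index instead.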
Bloom Filter
A Bloom Filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. It may return false positives, but never false negatives.
Best For
Determining if a document might exist before performing expensive lookups.
Large datasets where memory-efficient membership testing is needed.
Characteristics
No False Negatives: If it says "no", the element is definitely not in the set.
False Positives: Might say "yes" when the element is not there.
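A minimal Bloom filter sketch in Python shows where both properties come from. The bit-array size, the number of hash functions, and the use of salted SHA-256 are illustrative assumptions, not SoliDB's actual parameters.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # Any unset bit proves the item was never added (no false negatives).
        return all(self.bits[pos] for pos in self._positions(item))

bloom = BloomFilter()
bloom.add("users/42")
```

Bits are only ever set, never cleared, so an added item always tests positive. A false positive occurs when other items happen to have set all of the positions an absent item maps to.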
Cuckoo Filter
A Cuckoo Filter is an advanced probabilistic data structure that improves upon Bloom filters by supporting item deletion and providing higher space efficiency at low false-positive rates.
Best For
Workloads requiring both insertion and deletion of indexed items.
High-performance probabilistic set membership tests.
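The following is a simplified Python sketch of the general cuckoo filter technique (fingerprints stored in one of two candidate buckets), written to show why deletion is possible. Bucket count, bucket size, fingerprint width, and the hash choice are assumptions made for illustration and do not describe SoliDB internals.

```python
import hashlib
import random

class CuckooFilter:
    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=100):
        if num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        return self._hash(b"fp:" + item.encode()) % 65535 + 1  # 16 bits, never 0

    def _alt_index(self, index, fp):
        # XOR with the fingerprint's hash, so applying it twice returns `index`.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) & self.mask

    def _candidates(self, item):
        fp = self._fingerprint(item)
        i1 = self._hash(item.encode()) & self.mask
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item):
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # treated as full; this sketch drops the last evicted entry

    def contains(self, item):
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item):
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False

sessions = CuckooFilter()
sessions.insert("session/abc")
```

Unlike a Bloom filter, each item leaves a discrete fingerprint that can be removed again, so `sessions.delete("session/abc")` undoes the insert without disturbing other entries.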
Persistent Index
A Persistent Index is a sorted index stored on disk (using RocksDB/SkipLists) that allows for fast lookups, range queries (`<`, `<=`, `>=`, `>`), and sorting.
Unique Constraints: Enforce unique values for fields.
Compound Indexes: SoliDB supports multi-field (compound) indexes. Use the fields array to specify multiple fields.
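To see why a sorted structure supports range queries and ordering, consider this small in-memory sketch built on Python's `bisect` module. The sorted list stands in for the on-disk structure, and the `SortedIndex` class is hypothetical.

```python
import bisect

class SortedIndex:
    def __init__(self):
        self._entries = []  # (value, _key) pairs, kept in sorted order

    def insert(self, value, key):
        bisect.insort(self._entries, (value, key))

    def range(self, low, high):
        """Return keys whose value v satisfies low <= v < high, in value order."""
        # A 1-tuple sorts before every (value, key) pair with the same value.
        start = bisect.bisect_left(self._entries, (low,))
        end = bisect.bisect_left(self._entries, (high,))
        return [key for _, key in self._entries[start:end]]

ages = SortedIndex()
for age, key in [(41, "c"), (25, "a"), (33, "b"), (33, "d")]:
    ages.insert(age, key)

in_their_thirties = ages.range(30, 40)
```

Two binary searches locate the boundaries of the range, and the matching entries are already in sorted order, so no separate sort step is needed.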
Columnar Index
A Columnar Index stores data column-by-column rather than row-by-row. This layout is highly optimized for analytical queries (OLAP), aggregations, and scanning large datasets where only a subset of attributes is needed.
Best For
Analytics & Reporting
Aggregations (SUM, AVG, COUNT)
Queries on specific fields across many documents
Characteristics
High Compression: Similar data is stored contiguously, allowing for excellent compression ratios.
Fast Scans: Reads only the necessary columns, minimizing I/O.
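The difference between the two layouts can be sketched with ordinary Python containers. The sample rows and field names are made up for illustration.

```python
# Row-oriented: each document keeps all of its attributes together.
rows = [
    {"_key": "1", "region": "eu", "amount": 10.0},
    {"_key": "2", "region": "us", "amount": 25.5},
    {"_key": "3", "region": "eu", "amount": 4.5},
]

# Column-oriented: one contiguous list per attribute.
columns = {field: [row[field] for row in rows] for field in rows[0]}

# SUM(amount) reads a single column and never touches `_key` or `region`.
total = sum(columns["amount"])
```

Values in one column share a type and often repeat (as `region` does here), which is what makes columnar data compress well.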
Index Subtypes
Columnar indexes support specialized subtypes to further optimize specific access patterns:
MinMax
Stores min/max values per chunk. Excellent for range queries on time-series or sorted data.
Bitmap
Efficient for low-cardinality columns (e.g. status, category) and boolean fields.
Bloom
Probabilistic existence check to skip chunks that definitely don't contain the value.
Sorted
Keeps data physically sorted. Ideal for primary keys or frequently ranged columns.
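As an illustration of the MinMax subtype, the sketch below records the minimum and maximum of each chunk and skips chunks that cannot overlap the requested range. The chunk size and helper names are assumptions, not SoliDB settings.

```python
def build_minmax(values, chunk_size):
    """Split a column into chunks and record (min, max, chunk) for each."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    return [(min(c), max(c), c) for c in chunks]

def range_scan(minmax, low, high):
    """Return values in [low, high] and the number of chunks actually read."""
    hits, chunks_read = [], 0
    for lo, hi, chunk in minmax:
        if hi < low or lo > high:
            continue  # the whole chunk lies outside the range: skip it
        chunks_read += 1
        hits.extend(v for v in chunk if low <= v <= high)
    return hits, chunks_read

timestamps = list(range(100, 200))  # sorted, time-series-like column
minmax = build_minmax(timestamps, chunk_size=25)
hits, chunks_read = range_scan(minmax, 130, 140)
```

Here only one of the four chunks is read. On unsorted data the per-chunk ranges overlap heavily and fewer chunks can be skipped, which is why this subtype suits time-series or otherwise ordered columns.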
Vector Index
A Vector Index enables semantic search by indexing high-dimensional vectors (embeddings). It uses HNSW (Hierarchical Navigable Small World) graphs for fast Approximate Nearest Neighbor (ANN) search.
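For intuition, here is the exact nearest-neighbor search that an ANN index approximates. This is a brute-force linear scan, not HNSW, and the choice of cosine similarity as the metric is an assumption; the tiny three-dimensional embeddings are invented for the example.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, vectors, k=1):
    """Exact k-nearest neighbours by cosine similarity (linear scan)."""
    ranked = sorted(
        vectors,
        key=lambda key: cosine_similarity(query, vectors[key]),
        reverse=True,
    )
    return ranked[:k]

embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "car": [0.0, 0.2, 0.9],
}
top = nearest([1.0, 0.1, 0.0], embeddings, k=2)
```

The scan compares the query against every stored vector, so its cost grows linearly with the collection. A graph-based index such as HNSW trades a small amount of recall for visiting only a fraction of the vectors.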
Fulltext Index
A Fulltext Index allows for fuzzy search and relevance scoring on text fields. It uses BM25 scoring and n-gram tokenization to find relevant documents even with typos or partial matches.
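The reason n-gram tokenization tolerates typos can be shown in a few lines of Python. This sketch uses trigrams and a simple Jaccard overlap as a stand-in measure; it does not reproduce BM25 scoring, and the n-gram length SoliDB uses is not stated here.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of `text`, lowercased and padded."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the n-gram sets of two strings."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0

typo_score = ngram_similarity("database", "databse")
unrelated_score = ngram_similarity("database", "network")
```

A single dropped letter changes only the few n-grams that span it, so the misspelled term still shares most of its tokens with the correct one, while an unrelated word shares none.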
Compound Index
A Compound Index spans multiple fields, enabling efficient queries that filter on field combinations. The order of fields matters for prefix-based lookups.
A compound index on ["a", "b", "c"] can accelerate queries filtering on:
a alone
a AND b
a AND b AND c
It cannot accelerate queries that filter on b or c alone, because they do not include the leading field a.
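The prefix rule can be expressed as a short check. The `usable_prefix` helper is hypothetical and assumes simple equality filters; it is meant to mirror the cases listed above, not the query planner's real logic.

```python
def usable_prefix(index_fields, filter_fields):
    """Return the leading index fields that a filter can make use of."""
    prefix = []
    for field in index_fields:
        if field not in filter_fields:
            break  # a gap ends the usable prefix
        prefix.append(field)
    return prefix

index_fields = ["a", "b", "c"]
only_a = usable_prefix(index_fields, {"a"})        # leading field present
a_and_b = usable_prefix(index_fields, {"a", "b"})  # two-field prefix
b_and_c = usable_prefix(index_fields, {"b", "c"})  # leading field missing
```

An empty result, as for the filter on b and c, means the index offers no starting point and the query falls back to another index or a scan.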
Geo Spatial Index
A Geo Spatial Index accelerates queries involving geographic coordinates (latitude and longitude). It uses an R-Tree structure to efficiently find points within a radius or bounding box.
Field Format
The indexed field should contain an array with two numbers: [latitude, longitude].
"location": [48.8566, 2.3522] // Paris
Accelerated Queries
Queries using DISTANCE or GEO_DISTANCE will automatically utilize the index if available.
FOR place IN locations
FILTER DISTANCE(place.loc[0], place.loc[1], 48.85, 2.35) < 1000
RETURN place
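To make the filter above concrete, the following Python sketch computes a great-circle distance with the haversine formula. That DISTANCE returns meters and uses this formula is an assumption made for illustration; the function name `haversine_m` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Central Paris to the query point used in the example above.
d = haversine_m(48.8566, 2.3522, 48.85, 2.35)
within_radius = d < 1000
```

Without an index this calculation runs once per document. A spatial index first narrows the candidates to those whose bounding region can intersect the search radius.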
TTL Index (Time-To-Live)
A TTL Index automatically removes documents from the collection after a certain amount of time. This is useful for caching, session management, or storing temporary logs.
How it works
A background thread periodically checks the index and removes expired documents. The `expireAfter` value sets the document lifetime in seconds.
Usage
expireAfter: 3600
Documents expire 1 hour after creation/update.
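The expiry rule amounts to a timestamp comparison, sketched here in Python. The `updated_at` field name and the `sweep` helper are hypothetical; which timestamp field the index tracks depends on how it is configured.

```python
import time

def is_expired(doc, expire_after, now=None, field="updated_at"):
    """True once `expire_after` seconds have passed since the timestamp in `field`."""
    now = time.time() if now is None else now
    return now - doc[field] >= expire_after

def sweep(docs, expire_after, now=None):
    """One pass of a background cleanup: keep only documents that are still live."""
    return [doc for doc in docs if not is_expired(doc, expire_after, now)]

docs = [
    {"_key": "old", "updated_at": 0},
    {"_key": "new", "updated_at": 3000},
]
live = sweep(docs, expire_after=3600, now=3600)
```

Because removal happens in periodic passes, a document can remain readable for a short time after its lifetime has elapsed, until the next pass runs.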
Sparse Behavior
All SoliDB indexes are implicitly sparse. Documents where the indexed field is null or missing are not added to the index.
When Documents Are Indexed
Field exists with a value → Indexed
Field is null → Not indexed
Field is missing → Not indexed
Compound Indexes
For compound indexes on multiple fields, a document is indexed if at least one field has a non-null value.
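The rules in this section reduce to a single predicate, sketched below in Python. The `is_indexed` helper is hypothetical and simply restates the behavior described above.

```python
def is_indexed(doc, fields):
    """Sparse rule: index the document if any indexed field is present and non-null."""
    return any(doc.get(field) is not None for field in fields)

# Single-field index on "email"
has_value = is_indexed({"email": "ann@example.com"}, ["email"])
is_null = is_indexed({"email": None}, ["email"])
is_missing = is_indexed({"name": "Ann"}, ["email"])

# Compound index on "a" and "b": one non-null field is enough
partial = is_indexed({"a": 1, "b": None}, ["a", "b"])
```

A practical consequence is that an index cannot be used to find documents where the field is null or absent, since those documents have no entry in it.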