Indexes

Optimize data retrieval strategies. Accelerate queries and enforce data integrity with high-performance indexes.

Indexes are essential for query performance in SoliDB. Without an index, SDBQL queries with FILTER conditions must perform a full collection scan, which becomes slower as the collection grows.

Primary Index

Every collection in SoliDB automatically has a Primary Index on the `_key` attribute.

  • Unique: The `_key` is guaranteed to be unique within a collection.
  • Automatic: Created automatically when the collection is created. Cannot be deleted.
  • Fastest Access: Retrieving a document by its `_key` is the most efficient operation (O(1)).
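
For example, fetching a document by its `_key` uses the primary index directly (the collection name `users` is illustrative):

FOR user IN users
  FILTER user._key == "abc123"
  RETURN user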

Hash Index

A Hash Index provides constant-time O(1) lookups for exact matches. It is the fastest index type but supports only equality comparisons (`==`).

Best For

  • Exact match lookups
  • High-cardinality fields (e.g. UUIDs, email addresses)
  • Uniqueness constraints

Limitations

  • No range queries (`>`, `<`)
  • No sorting
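
Usage

A minimal sketch of a hash index definition, assuming the same JSON shape as the other index types on this page (the "hash" type string is inferred from the Management API section, which lists Hash among the standard index types):

{
   "type": "hash",
   "field": "email",
   "unique": true
}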

Bloom Filter

A Bloom Filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. It may return false positives, but never false negatives.

Best For

  • Determining if a document might exist before performing expensive lookups.
  • Large datasets where memory-efficient membership testing is needed.

Characteristics

  • No False Negatives: If it says "no", the element is definitely not in the set.
  • False Positives: Might say "yes" when the element is not there.

Usage

{
   "type": "bloom",
   "field": "username",
   "unique": true
}

Cuckoo Filter

A Cuckoo Filter is an advanced probabilistic data structure that improves upon Bloom filters by supporting item deletion and providing higher space efficiency for low false positive rates.

Best For

  • Workloads requiring both insertion and deletion of indexed items.
  • High-performance probabilistic set membership tests.

Characteristics

  • Supports Deletion: Unlike standard Bloom filters, Cuckoo filters allow removing items.
  • Better Space Efficiency: Typically uses less space than Bloom filters for the same false positive rate.

Usage

{
   "type": "cuckoo",
   "field": "user_id",
   "unique": true
}

Persistent Index

A Persistent Index is a sorted index stored on disk (using RocksDB/SkipLists) that allows for fast lookups, range queries (`<`, `<=`, `>=`, `>`), and sorting.

Capabilities

  • Equality lookups: `FILTER user.email == "[email protected]"`
  • Range queries: `FILTER user.age >= 21`
  • Sorting: `SORT user.createdAt DESC`
  • Unique constraints: enforce unique values for fields.

Compound Indexes: SoliDB supports multi-field (compound) indexes. Use the `fields` array to specify multiple fields.
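
For example, a persistent index on `age` and another on `createdAt` can serve both the filter and the sort in a query like the following (collection and field names are illustrative):

FOR user IN users
  FILTER user.age >= 21
  SORT user.createdAt DESC
  RETURN user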

Columnar Index

A Columnar Index stores data column-by-column rather than row-by-row. This layout is highly optimized for analytical queries (OLAP), aggregations, and scanning large datasets where only a subset of attributes is needed.

Best For

  • Analytics & Reporting
  • Aggregations (SUM, AVG, COUNT)
  • Queries on specific fields across many docs

Characteristics

  • High Compression: Similar data is stored contiguously, allowing for excellent compression ratios.
  • Fast Scans: Reads only the necessary columns, minimizing I/O.

Index Subtypes

Columnar indexes support specialized subtypes to further optimize specific access patterns:

MinMax

Stores min/max values per chunk. Excellent for range queries on time-series or sorted data.

Bitmap

Efficient for low-cardinality columns (e.g. status, category) and boolean fields.

Bloom

Probabilistic existence check to skip chunks that definitely don't contain the value.

Sorted

Keeps data physically sorted. Ideal for primary keys or columns that are frequently range-queried.

Usage

{
   "type": "columnar",
   "fields": ["revenue", "category", "region"],
   "subtypes": {
     "revenue": "minmax",
     "category": "bitmap",
     "region": "sorted"
   }
}
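
For example, an analytical query that reads only the `region` and `revenue` columns defined above is exactly the access pattern this layout optimizes (the collection name `sales` is illustrative):

FOR sale IN sales
  FILTER sale.region == "EU"
  RETURN sale.revenue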

Vector Index

A Vector Index enables semantic search by indexing high-dimensional vectors (embeddings). It uses HNSW (Hierarchical Navigable Small World) graphs for fast Approximate Nearest Neighbor (ANN) search.

Key Features

  • Metrics: Cosine Similarity, Euclidean Distance, Dot Product.
  • Quantization: Scalar quantization (u8) for 4x memory savings.
  • Performance: HNSW graph structure for log(N) search complexity.
  • Dimensions: Specific to your embedding model (e.g. 1536 for OpenAI).

Usage

{
   "type": "vector",
   "field": "embedding",
   "dimension": 1536,
   "metric": "cosine",
   "quantization": "scalar"
}
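
The indexed field is expected to hold an array of `dimension` numbers, one per vector component. For example (values truncated and purely illustrative):

"embedding": [0.021, -0.114, 0.087, ...]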

Fulltext Index

A Fulltext Index allows for fuzzy search and relevance scoring on text fields. It uses BM25 scoring and n-gram tokenization to find relevant documents even with typos or partial matches.

Capabilities

  • Fuzzy matching (Levenshtein distance)
  • Relevance scoring (BM25)
  • Prefix/Suffix matching

Configuration

{
   "type": "fulltext",
   "fields": ["title", "description"],
   "ngram_size": 3
}
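
With `ngram_size: 3`, indexed text is typically split into overlapping trigrams; "running", for instance, yields `run`, `unn`, `nni`, `nin`, `ing`, which is what makes partial matches and typo tolerance possible.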

Compound Index

A Compound Index spans multiple fields, enabling efficient queries that filter on field combinations. The order of fields matters for prefix-based lookups.

Best For

  • Multi-field queries (e.g., user + date)
  • Composite unique constraints
  • Prefix lookups on first field

Example

{
  "name": "idx_user_date",
  "type": "persistent",
  "fields": ["user_id", "created_at"],
  "unique": false
}

Prefix Matching

A compound index on ["a", "b", "c"] can accelerate queries filtering on:

  • `a` alone
  • `a` AND `b`
  • `a` AND `b` AND `c`

It cannot accelerate filters on `b` or `c` alone; queries must always include the leading field.
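
For example, with the `idx_user_date` index defined above, a query filtering on both fields can use the full index, while a filter on `created_at` alone cannot (the collection name `events` is illustrative):

FOR event IN events
  FILTER event.user_id == "abc" AND event.created_at >= "2024-01-01"
  RETURN event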

Geo Spatial Index

A Geo Spatial Index accelerates queries involving geographic coordinates (latitude and longitude). It uses an R-Tree structure to efficiently find points within a radius or bounding box.

Field Format

The indexed field should contain an array with two numbers: [latitude, longitude].

"location": [48.8566, 2.3522] // Paris

Accelerated Queries

Queries using DISTANCE or GEO_DISTANCE will automatically utilize the index if available.

FOR place IN locations
  FILTER DISTANCE(place.location[0], place.location[1], 48.85, 2.35) < 1000
  RETURN place

TTL Index (Time-To-Live)

A TTL Index automatically removes documents from the collection after a certain amount of time. This is useful for caching, session management, or storing temporary logs.

How it works

A background thread periodically checks the index and removes expired documents. The `expireAfter` value defines the time in seconds.

Usage

expireAfter: 3600

Documents expire 1 hour after creation/update.
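
A complete index definition matching the Management API call shown later on this page:

{
   "name": "idx_expire",
   "field": "createdAt",
   "expire_after_seconds": 3600
}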

Sparse Behavior

All SoliDB indexes are implicitly sparse. Documents where the indexed field is null or missing are not added to the index.

When Documents Are Indexed

  • Field exists with a value → Indexed
  • Field is `null` → Not indexed
  • Field is missing → Not indexed

Compound Indexes

For compound indexes on multiple fields, a document is indexed if at least one field has a non-null value.

{"user_id": "abc", "email": null} → Indexed (user_id exists)

Benefits of Sparse Indexes

Storage Efficiency

Documents without the field don't consume index space.

Unique Constraints

Multiple docs can have null for a unique field (no conflict).

Query Performance

Smaller indexes = faster lookups and less memory.

Management API

Manage indexes via the HTTP API.

Create Standard Index

POST /_api/database/:db/index/:collection

For Persistent, Hash, or Fulltext indexes. Use `field` for single-field or `fields` for compound indexes.

Single Field:

{
  "name": "idx_email",
  "type": "persistent",
  "field": "email",
  "unique": true
}

Compound Index:

{
  "name": "idx_user_date",
  "type": "persistent",
  "fields": ["user_id", "created_at"],
  "unique": false
}
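
For instance, the single-field `idx_email` definition above can be submitted with a plain HTTP call (host, port, database name `mydb`, and collection name `users` are placeholders):

curl -X POST "http://<host>:<port>/_api/database/mydb/index/users" \
  -H "Content-Type: application/json" \
  -d '{"name": "idx_email", "type": "persistent", "field": "email", "unique": true}'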

Create Columnar Index

POST /_api/database/:db/index/:collection

Optimize for analytics by indexing specific columns.

{
  "name": "idx_analytics",
  "type": "columnar",
  "fields": ["price", "category"]
}

Create Geo Index

POST /_api/database/:db/geo/:collection

{
  "name": "idx_location",
  "field": "location"
}

Create TTL Index

POST /_api/database/:db/ttl/:collection

{
  "name": "idx_expire",
  "field": "createdAt",
  "expire_after_seconds": 3600
}

List Indexes

GET /_api/index?collection=<name>

Delete Index

DELETE /_api/index/<index-id>