Hybrid Search

Combine vector similarity with fulltext search for superior RAG and semantic search results. Get the best of both worlds.

Why Hybrid Search?

Neither pure vector search nor pure fulltext search is perfect for all queries. Hybrid search combines both approaches to deliver 15-30% better relevance in production RAG systems.

Vector Search Alone

Understands meaning but misses exact keywords. Example: "error code 500" won't find "HTTP 500".

Fulltext Search Alone

Finds exact keywords but misses synonyms. Example: "car" won't find "automobile".

Hybrid Search

Best of both: meaning plus exact matches, with 15-30% better relevance.

How Hybrid Search Works

1. Query Input: Provide query vector + text query
2. Vector Search: Find semantically similar docs
3. Fulltext Search: Find keyword matches
4. Score Fusion: Combine & normalize scores
5. Ranked Results: Return merged, sorted results

Key Insight: Documents that match both vector and text queries rank highest, while documents matching only one source still appear in results.

Score Fusion Methods

SoliDB supports two score fusion methods. Both normalize scores to [0,1] before combining.

Weighted Sum (default)

Linearly combine normalized scores with configurable weights.

score = α × vector_score + β × text_score

Here α is vector_weight and β is text_weight.

Best for: When you know the relative importance of semantic vs. keyword matching for your use case.
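The weighted sum above can be sketched in Python as follows. This is a minimal illustration, not SoliDB's implementation: min-max normalization is an assumption (the text only says scores are normalized to [0,1]), as is treating a document missing from one search as contributing 0 on that side. The function names `min_max_normalize` and `weighted_sum_fusion` are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] (assumed normalization scheme)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every hit as a full match.
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def weighted_sum_fusion(
    vector_scores: Dict[str, float],
    text_scores: Dict[str, float],
    vector_weight: float = 0.5,
    text_weight: float = 0.5,
    limit: int = 10,
) -> List[dict]:
    """Fuse two maps of doc key -> raw score into one ranked result list."""
    vec = min_max_normalize(vector_scores)
    txt = min_max_normalize(text_scores)
    results = []
    for key in set(vec) | set(txt):
        sources = [
            name
            for name, hits in (("vector", vec), ("fulltext", txt))
            if key in hits
        ]
        # Assumption: a missing side contributes 0 to the combined score.
        score = vector_weight * vec.get(key, 0.0) + text_weight * txt.get(key, 0.0)
        results.append({
            "doc_key": key,
            "score": score,
            "vector_score": vec.get(key),  # None when the vector search missed
            "text_score": txt.get(key),    # None when the fulltext search missed
            "sources": sources,
        })
    # Highest score first; ties broken by key so the order is deterministic.
    results.sort(key=lambda r: (-r["score"], r["doc_key"]))
    return results[:limit]
```

With these assumptions, a document found by both searches accumulates both weighted terms, which is why it tends to outrank documents found by only one.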
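Reciprocal Rank Fusion (RRF)

Enabled with fusion: "rrf". Combines results based on rank position, not raw scores. More robust when score distributions differ.

score = Σ 1/(k + rank_i)

Here rank_i is the document's rank in result list i (vector or fulltext), and k is a smoothing constant.

Best for: When vector and text scores have very different distributions.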
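To make the RRF formula concrete, here is a small Python sketch. It is illustrative only: the function name `reciprocal_rank_fusion` is hypothetical, and k = 60 is the value commonly used in the RRF literature, not a documented SoliDB default.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    k: int = 60,  # assumed constant; SoliDB's actual value is not stated here
    limit: int = 10,
) -> List[dict]:
    """Fuse ranked lists of doc keys, each ordered best-first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit in each list contributes 1 / (k + 1).
        for rank, key in enumerate(ranking, start=1):
            scores[key] += 1.0 / (k + rank)
    fused = [{"doc_key": key, "score": score} for key, score in scores.items()]
    # Highest fused score first; ties broken by key for determinism.
    fused.sort(key=lambda r: (-r["score"], r["doc_key"]))
    return fused[:limit]
```

Because only rank positions enter the sum, the raw vector and fulltext scores never need to be on comparable scales.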

SDBQL: HYBRID_SEARCH Function

Syntax

HYBRID_SEARCH(collection, vector_index, fulltext_field, query_vector, text_query, [options])

Parameters

| Parameter | Type | Description |
|---|---|---|
| collection | string | Name of the collection to search |
| vector_index | string | Name of the vector index to use |
| fulltext_field | string | Document field to search with fulltext (must have fulltext index) |
| query_vector | array | Query embedding vector |
| text_query | string | Text query for fulltext search |
| options | object | Optional configuration (see below) |

Options

| Option | Type | Default | Description |
|---|---|---|---|
| vector_weight | float | 0.5 | Weight for vector similarity scores (0-1) |
| text_weight | float | 0.5 | Weight for fulltext scores (0-1) |
| limit | integer | 10 | Maximum results to return |
| fusion | string | "weighted" | "weighted" or "rrf" (Reciprocal Rank Fusion) |

Return Format

Returns an array of result objects, sorted by score descending:

{
  "doc": { /* full document */ },
  "score": 0.85,                     // combined score
  "vector_score": 0.9,               // normalized vector similarity
  "text_score": 0.7,                 // normalized fulltext score
  "sources": ["vector", "fulltext"]  // which searches matched
}

Examples

Basic Hybrid Search

LET results = HYBRID_SEARCH(
    "articles",
    "embedding_idx",
    "content",
    @query_vector,
    "machine learning tutorial"
)
FOR result IN results
  RETURN {
    title: result.doc.title,
    score: result.score,
    sources: result.sources
  }

Favor Semantic Similarity (70/30)

LET results = HYBRID_SEARCH(
    "articles",
    "embedding_idx",
    "content",
    @query_vector,
    "kubernetes deployment",
    { vector_weight: 0.7, text_weight: 0.3, limit: 20 }
)
FOR result IN results
  RETURN result.doc

Using RRF Fusion

LET results = HYBRID_SEARCH(
    "products",
    "embed_idx",
    "description",
    @vec,
    "laptop gaming",
    { fusion: "rrf", limit: 10 }
)
FOR result IN results
  FILTER result.score > 0.01
  RETURN {
    name: result.doc.name,
    price: result.doc.price,
    relevance: result.score
  }

REST API Endpoint

POST /_api/database/:db/hybrid/:collection/search

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| vector | array[float] | Yes | Query embedding vector |
| text_query | string | Yes | Text query for fulltext search |
| vector_index | string | Yes | Name of the vector index |
| fulltext_field | string | Yes | Field to search with fulltext |
| vector_weight | float | No | Weight for vector scores (default: 0.5) |
| text_weight | float | No | Weight for text scores (default: 0.5) |
| limit | integer | No | Max results (default: 10) |
| fusion | string | No | "weighted" (default) or "rrf" |

Example Request

curl -X POST \
  http://localhost:6745/_api/database/mydb/hybrid/articles/search \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.1, 0.2, 0.3, ...],
    "text_query": "machine learning tutorial",
    "vector_index": "embedding_idx",
    "fulltext_field": "content",
    "vector_weight": 0.6,
    "text_weight": 0.4,
    "limit": 10
  }'

Response

200 OK

{
  "results": [
    {
      "doc_key": "article_001",
      "score": 0.85,
      "vector_score": 0.9,
      "text_score": 0.7,
      "sources": ["vector", "fulltext"]
    },
    {
      "doc_key": "article_042",
      "score": 0.72,
      "vector_score": 0.8,
      "text_score": null,
      "sources": ["vector"]
    }
  ]
}
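For callers that prefer a script over curl, the request above could be issued from Python roughly as follows. This is a sketch using only the standard library: the helper names `build_hybrid_request` and `hybrid_search` are hypothetical, the base URL is taken from the curl example, and it assumes the server needs no authentication, which this section does not cover.

```python
import json
import urllib.request
from typing import List, Optional


def build_hybrid_request(
    vector: List[float],
    text_query: str,
    vector_index: str,
    fulltext_field: str,
    vector_weight: Optional[float] = None,
    text_weight: Optional[float] = None,
    limit: Optional[int] = None,
    fusion: Optional[str] = None,
) -> dict:
    """Build the JSON body; optional fields are omitted so server defaults apply."""
    body = {
        "vector": vector,
        "text_query": text_query,
        "vector_index": vector_index,
        "fulltext_field": fulltext_field,
    }
    optional = {
        "vector_weight": vector_weight,
        "text_weight": text_weight,
        "limit": limit,
        "fusion": fusion,
    }
    body.update({name: value for name, value in optional.items() if value is not None})
    return body


def hybrid_search(base_url: str, db: str, collection: str, body: dict,
                  timeout: float = 10.0) -> list:
    """POST the body to the hybrid search endpoint and return the "results" array."""
    url = f"{base_url}/_api/database/{db}/hybrid/{collection}/search"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]
```

A call might look like `hybrid_search("http://localhost:6745", "mydb", "articles", body)`, where `body` comes from `build_hybrid_request`. Entries with a null score in the response, such as `text_score` for a vector-only match, arrive as `None`.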

Setup Requirements

Hybrid search requires both a vector index and a fulltext index on your collection.

1. Create Vector Index

curl -X POST \
  http://localhost:6745/_api/database/mydb/vector/articles \
  -d '{
    "name": "embedding_idx",
    "field": "embedding",
    "dimension": 1536,
    "metric": "cosine"
  }'

2. Create Fulltext Index

curl -X POST \
  http://localhost:6745/_api/database/mydb/index/articles \
  -d '{
    "type": "fulltext",
    "name": "content_ft",
    "fields": ["content"],
    "min_length": 3
  }'

Note: The fulltext_field parameter in HYBRID_SEARCH should match the field name, not the index name. The function will automatically use the fulltext index on that field.

Best Practices

When to Use Hybrid Search

  • RAG applications where both semantic and keyword relevance matter
  • Queries with specific terms (product codes, error messages, names)
  • When users expect exact keyword matches to rank highly
  • E-commerce search combining product descriptions with SKUs

Tuning Tips

  • Start with 50/50 weights, then adjust based on result quality
  • Use RRF when vector and text scores have different distributions
  • Favor vector_weight for semantic/conceptual queries
  • Favor text_weight for queries with specific keywords/codes
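
One way to "adjust based on result quality" is to sweep a few weight settings against a small set of queries whose best document you already know. The Python sketch below assumes you have such a labeled set and a `search` callable (hypothetical) that runs a hybrid search with the given weights and returns doc keys best-first. It scores each setting by mean reciprocal rank; the metric and the candidate weights are illustrative choices, not SoliDB recommendations.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# search(query, vector_weight, text_weight) -> doc keys ordered best-first
SearchFn = Callable[[str, float, float], List[str]]


def mean_reciprocal_rank(
    search: SearchFn,
    labeled: Dict[str, str],
    vector_weight: float,
    text_weight: float,
) -> float:
    """Average of 1/rank of the known-relevant doc; a miss counts as 0."""
    if not labeled:
        return 0.0
    total = 0.0
    for query, relevant_key in labeled.items():
        ranked = search(query, vector_weight, text_weight)
        if relevant_key in ranked:
            total += 1.0 / (ranked.index(relevant_key) + 1)
    return total / len(labeled)


def sweep_weights(
    search: SearchFn,
    labeled: Dict[str, str],
    vector_weights: Sequence[float] = (0.3, 0.5, 0.7),
) -> List[Tuple[float, float]]:
    """Return (vector_weight, MRR) pairs, best first; text_weight is 1 - vector_weight."""
    scored = [
        (w, mean_reciprocal_rank(search, labeled, w, 1.0 - w))
        for w in vector_weights
    ]
    # Stable sort: settings with equal MRR keep their input order.
    scored.sort(key=lambda pair: -pair[1])
    return scored
```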