Hybrid Search

Combine vector similarity with fulltext search for superior RAG and semantic search results. Get the best of both worlds.

Why Hybrid Search?

Neither pure vector search nor pure fulltext search is perfect for all queries. Hybrid search combines both approaches to deliver 15-30% better relevance in production RAG systems.

🔍

Vector Search Alone

Understands meaning but misses exact keywords

"error code 500" won't find "HTTP 500"

📝

Fulltext Search Alone

Finds exact keywords but misses synonyms

"car" won't find "automobile"

🎯

Hybrid Search

Best of both: meaning + exact matches

15-30% better relevance

How Hybrid Search Works

Query Input

Provide query vector + text query

Vector Search

Find semantically similar docs

Fulltext Search

Find keyword matches

Score Fusion

Combine & normalize scores

Ranked Results

Return merged, sorted results

Key Insight: Documents that match both vector and text queries rank highest, while documents matching only one source still appear in results.
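The merge step described above can be sketched in a few lines of Python. This is an illustration of the idea, not SoliDB's implementation: the function name merge_results is hypothetical, scores are assumed to be already normalized to [0,1], and a missing score is assumed to contribute 0 to the combined score.

```python
from typing import Dict, List, Optional


def merge_results(
    vector_hits: Dict[str, float],
    text_hits: Dict[str, float],
    vector_weight: float = 0.5,
    text_weight: float = 0.5,
) -> List[dict]:
    """Union two result sets keyed by document key and rank by combined score.

    Assumption: scores are already in [0, 1]; a missing score counts as 0.
    """
    merged = []
    for key in vector_hits.keys() | text_hits.keys():
        v: Optional[float] = vector_hits.get(key)
        t: Optional[float] = text_hits.get(key)
        # Record which searches actually returned this document.
        sources = [name for name, s in (("vector", v), ("fulltext", t)) if s is not None]
        score = vector_weight * (v or 0.0) + text_weight * (t or 0.0)
        merged.append(
            {"doc_key": key, "score": score, "vector_score": v,
             "text_score": t, "sources": sources}
        )
    # Highest combined score first; ties broken by key for a stable order.
    merged.sort(key=lambda r: (-r["score"], r["doc_key"]))
    return merged


ranked = merge_results({"a": 0.9, "b": 0.8}, {"a": 0.7, "c": 0.6})
for row in ranked:
    print(row["doc_key"], round(row["score"], 2), row["sources"])
```

Document "a" matches both searches and ranks first, while "b" and "c" each match one source and still appear in the output.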

Score Fusion Methods

SoliDB supports two score fusion methods. Both normalize scores to [0,1] before combining.

Weighted Sum

Default

Linearly combine normalized scores with configurable weights.

score = α × vector_score + β × text_score

Best for: When you know the relative importance of semantic vs keyword matching for your use case.
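As a rough sketch of the weighted-sum formula, the Python below normalizes a list of raw fulltext scores and combines one of them with a vector score. The normalization method is an assumption: min-max scaling is used here for illustration, and this page does not state which method SoliDB applies. The function names are hypothetical.

```python
from typing import List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Scale raw scores into [0, 1] (assumed normalization, for illustration)."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every hit as a full match.
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum(vector_score: float, text_score: float,
                 alpha: float = 0.5, beta: float = 0.5) -> float:
    """score = alpha * vector_score + beta * text_score"""
    return alpha * vector_score + beta * text_score


raw_text_scores = [12.0, 6.0, 3.0]            # made-up raw fulltext scores
normalized = min_max_normalize(raw_text_scores)
combined = weighted_sum(0.9, normalized[1], alpha=0.7, beta=0.3)
print(normalized, combined)
```

With α = 0.7 and β = 0.3, a document with vector score 0.9 and a middling text score still lands near the top, which is the intended effect of favoring semantic similarity.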

Reciprocal Rank Fusion (RRF)

fusion: "rrf"

Combine based on rank position, not raw scores. More robust when score distributions differ.

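score = Σ 1/(k + rank_i)

Here rank_i is the document's position in result list i (vector or fulltext), and k is a constant that damps the contribution of the top ranks.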

Best for: When vector and text scores have very different distributions.
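A minimal Python sketch of Reciprocal Rank Fusion follows. It assumes 1-based ranks and k = 60, a value commonly used in the RRF literature; this page does not document the constant SoliDB uses, so treat both as assumptions. The function name rrf is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Sum 1 / (k + rank) over every ranked list a document appears in.

    Assumptions: ranks are 1-based and k = 60 (not a documented SoliDB value).
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, key in enumerate(ranking, start=1):
            scores[key] += 1.0 / (k + rank)
    return dict(scores)


vector_ranking = ["a", "b", "c"]   # best vector match first
text_ranking = ["c", "a", "d"]     # best fulltext match first
fused = rrf([vector_ranking, text_ranking])
order = sorted(fused, key=lambda key: (-fused[key], key))
print(order)
```

Only positions matter here, so a fulltext score of 40.0 and a cosine similarity of 0.8 are treated the same if both sit at rank 1. Documents that appear in both lists ("a" and "c") accumulate two terms and outrank those found by a single search.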

SDBQL: HYBRID_SEARCH Function

Syntax

HYBRID_SEARCH(collection, vector_index, fulltext_field, query_vector, text_query, [options])

Parameters

Parameter	Type	Description
collection	string	Name of the collection to search
vector_index	string	Name of the vector index to use
fulltext_field	string	Document field to search with fulltext (must have fulltext index)
query_vector	array	Query embedding vector
text_query	string	Text query for fulltext search
options	object	Optional configuration (see below)

Options

Option	Type	Default	Description
vector_weight	float	0.5	Weight for vector similarity scores (0-1)
text_weight	float	0.5	Weight for fulltext scores (0-1)
limit	integer	10	Maximum results to return
fusion	string	"weighted"	"weighted" or "rrf" (Reciprocal Rank Fusion)

Return Format

Returns an array of result objects, sorted by score descending:

{
  "doc": { /* full document */ },
  "score": 0.85,                      // combined score
  "vector_score": 0.9,                // normalized vector similarity
  "text_score": 0.7,                  // normalized fulltext score
  "sources": ["vector", "fulltext"]   // which searches matched
}

Examples

Basic Hybrid Search

LET results = HYBRID_SEARCH(
    "articles",
    "embedding_idx",
    "content",
    @query_vector,
    "machine learning tutorial"
)
FOR result IN results
  RETURN {
    title: result.doc.title,
    score: result.score,
    sources: result.sources
  }

Favor Semantic Similarity (70/30)

LET results = HYBRID_SEARCH(
    "articles",
    "embedding_idx",
    "content",
    @query_vector,
    "kubernetes deployment",
    { vector_weight: 0.7, text_weight: 0.3, limit: 20 }
)
FOR result IN results
  RETURN result.doc

Using RRF Fusion

LET results = HYBRID_SEARCH(
    "products",
    "embed_idx",
    "description",
    @vec,
    "laptop gaming",
    { fusion: "rrf", limit: 10 }
)
FOR result IN results
  FILTER result.score > 0.01
  RETURN {
    name: result.doc.name,
    price: result.doc.price,
    relevance: result.score
  }

REST API Endpoint

POST /_api/database/:db/hybrid/:collection/search

Request Body

Field	Type	Required	Description
vector	array[float]	Yes	Query embedding vector
text_query	string	Yes	Text query for fulltext search
vector_index	string	Yes	Name of the vector index
fulltext_field	string	Yes	Field to search with fulltext
vector_weight	float	No	Weight for vector scores (default: 0.5)
text_weight	float	No	Weight for text scores (default: 0.5)
limit	integer	No	Max results (default: 10)
fusion	string	No	"weighted" (default) or "rrf"

Example Request

curl -X POST \
  http://localhost:6745/_api/database/mydb/hybrid/articles/search \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.1, 0.2, 0.3, ...],
    "text_query": "machine learning tutorial",
    "vector_index": "embedding_idx",
    "fulltext_field": "content",
    "vector_weight": 0.6,
    "text_weight": 0.4,
    "limit": 10
  }'

Response

200 OK

{
  "results": [
    {
      "doc_key": "article_001",
      "score": 0.85,
      "vector_score": 0.9,
      "text_score": 0.7,
      "sources": ["vector", "fulltext"]
    },
    {
      "doc_key": "article_042",
      "score": 0.72,
      "vector_score": 0.8,
      "text_score": null,
      "sources": ["vector"]
    }
  ]
}
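The same request can be built from Python using only the standard library. This is a sketch under assumptions: the helper name build_hybrid_request is hypothetical, the three-element vector is a placeholder for a real embedding, and no authentication is shown because this page does not describe any.

```python
import json
import urllib.request
from typing import List, Optional


def build_hybrid_request(
    base_url: str,
    db: str,
    collection: str,
    vector: List[float],
    text_query: str,
    vector_index: str,
    fulltext_field: str,
    *,
    vector_weight: Optional[float] = None,
    text_weight: Optional[float] = None,
    limit: Optional[int] = None,
    fusion: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the hybrid search endpoint."""
    body = {
        "vector": vector,
        "text_query": text_query,
        "vector_index": vector_index,
        "fulltext_field": fulltext_field,
    }
    optional = {
        "vector_weight": vector_weight,
        "text_weight": text_weight,
        "limit": limit,
        "fusion": fusion,
    }
    # Omit unset options so the server-side defaults apply.
    body.update({k: v for k, v in optional.items() if v is not None})
    url = f"{base_url}/_api/database/{db}/hybrid/{collection}/search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_hybrid_request(
    "http://localhost:6745", "mydb", "articles",
    [0.1, 0.2, 0.3],  # placeholder; use a real query embedding
    "machine learning tutorial", "embedding_idx", "content",
    vector_weight=0.6, text_weight=0.4, limit=10,
)
print(req.get_method(), req.full_url)
# To send it against a running server:
#   with urllib.request.urlopen(req) as resp:
#       results = json.load(resp)["results"]
```

Note that a result matched by only one search carries null for the other score (as text_score does for article_042 above), so client code should check sources or handle None before doing arithmetic on the per-source scores.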

Setup Requirements

Hybrid search requires both a vector index and a fulltext index on your collection.

1 Create Vector Index

curl -X POST \
  http://localhost:6745/_api/database/mydb/vector/articles \
  -d '{
    "name": "embedding_idx",
    "field": "embedding",
    "dimension": 1536,
    "metric": "cosine"
  }'

2 Create Fulltext Index

curl -X POST \
  http://localhost:6745/_api/database/mydb/index/articles \
  -d '{
    "type": "fulltext",
    "name": "content_ft",
    "fields": ["content"],
    "min_length": 3
  }'

Note: The fulltext_field parameter in HYBRID_SEARCH should match the field name, not the index name. The function will automatically use the fulltext index on that field.

Best Practices

When to Use Hybrid Search

✓ RAG applications where both semantic and keyword relevance matter
✓ Queries with specific terms (product codes, error messages, names)
✓ When users expect exact keyword matches to rank highly
✓ E-commerce search combining product descriptions with SKUs

Tuning Tips

→ Start with 50/50 weights, then adjust based on result quality
→ Use RRF when vector and text scores have different distributions
→ Favor vector_weight for semantic/conceptual queries
→ Favor text_weight for queries with specific keywords/codes
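One way to "adjust based on result quality" is to sweep a few weight settings against queries whose relevant documents you already know. The Python below is a minimal offline sketch: the scores and relevance labels are made up, text_weight is assumed to be 1 - vector_weight, and precision@k is just one possible quality measure. In practice the rankings would come from HYBRID_SEARCH calls rather than from local dictionaries.

```python
from typing import Dict, List, Sequence, Set, Tuple


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are labeled relevant."""
    return sum(1 for key in ranked[:k] if key in relevant) / k


def rank_with_weights(vector_scores: Dict[str, float],
                      text_scores: Dict[str, float],
                      vector_weight: float) -> List[str]:
    """Rank by weighted sum; assumes text_weight = 1 - vector_weight."""
    text_weight = 1.0 - vector_weight
    keys = vector_scores.keys() | text_scores.keys()

    def combined(key: str) -> float:
        return (vector_weight * vector_scores.get(key, 0.0)
                + text_weight * text_scores.get(key, 0.0))

    return sorted(keys, key=lambda key: (-combined(key), key))


def sweep(vector_scores: Dict[str, float], text_scores: Dict[str, float],
          relevant: Set[str], k: int = 2,
          candidates: Sequence[float] = (0.3, 0.5, 0.7)) -> List[Tuple[float, float]]:
    """Return (vector_weight, precision@k) for each candidate weight."""
    return [
        (w, precision_at_k(rank_with_weights(vector_scores, text_scores, w), relevant, k))
        for w in candidates
    ]


# Made-up normalized scores and relevance labels for a single query.
vector_scores = {"a": 0.9, "b": 0.8, "c": 0.2}
text_scores = {"c": 1.0, "a": 0.4}
relevant = {"a", "c"}

results = sweep(vector_scores, text_scores, relevant)
for weight, p in results:
    print(f"vector_weight={weight}: precision@2={p}")
```

For this toy query, pushing vector_weight to 0.7 lifts the irrelevant document "b" into the top two, which suggests keeping the text weight at 0.5 or higher. A real evaluation would average over many labeled queries before changing the defaults.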
