Fulltext Search Functions
Fulltext search capabilities with n-gram indexing and BM25 relevance scoring.
Fulltext Search Operations
SDBQL supports fulltext search with n-gram indexing for fuzzy matching and BM25 for relevance scoring.
FULLTEXT N-Gram Fulltext Search
Fuzzy search using n-gram indexing. Returns matching documents with similarity scores.
FULLTEXT(collection, field, query, distance?)
Parameters
- collection - Name of the collection to search
- field - Field name to search within
- query - Search query string
- distance - Optional maximum edit distance (default: 2)
Example
LET matches = FULLTEXT("articles", "title", "rust", 2)
FOR m IN matches
  RETURN m.doc
N-Gram Indexing
FULLTEXT uses n-gram (trigram) indexing to enable fuzzy matching. This allows finding documents even when there are typos or slight variations in the search query.
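To make the idea concrete, here is a minimal Python sketch of trigram-based fuzzy lookup. It is not SDBQL's implementation, which this page does not describe. The two-step approach (collect candidates that share a trigram, then keep those within a maximum edit distance), the padding scheme, and the helper names `trigrams`, `edit_distance`, `build_index`, and `fuzzy_lookup` are all assumptions made for illustration.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return the set of lowercase 3-character substrings of text."""
    # Assumed padding: two leading spaces and one trailing space,
    # so that short words still produce a few trigrams.
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def build_index(words: list[str]) -> dict[str, set[str]]:
    """Map each trigram to the set of words that contain it."""
    index = defaultdict(set)
    for word in words:
        for gram in trigrams(word):
            index[gram].add(word)
    return index


def fuzzy_lookup(index: dict[str, set[str]], query: str,
                 max_distance: int = 2) -> list[str]:
    """Gather words sharing a trigram with the query, then filter by distance."""
    candidates = set()
    for gram in trigrams(query):
        candidates |= index.get(gram, set())
    q = query.lower()
    return sorted(w for w in candidates
                  if edit_distance(q, w.lower()) <= max_distance)


index = build_index(["rust", "trust", "rest", "python"])
print(fuzzy_lookup(index, "rost"))     # typo for "rust"
print(fuzzy_lookup(index, "rost", 1))  # stricter distance
```

The index keeps the expensive edit-distance check away from words that share no trigram with the query, which is the usual reason to pair the two techniques.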
BM25 BM25 Relevance Scoring
BM25 relevance scoring for ranking search results. Returns a numeric score that can be used in SORT clauses.
BM25(field, query)
BM25 Algorithm
BM25 (Best Matching 25) is a probabilistic ranking function. It considers:
- Term frequency (TF) - How often the term appears in the document
- Inverse document frequency (IDF) - How rare the term is across all documents
- Document length - Normalizes scores so that longer documents are not favored merely for containing more words
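The three factors above can be sketched in a few lines of Python. This is an illustration of the standard BM25 formula, not SDBQL's code: the whitespace tokenizer, the parameter values `k1 = 1.2` and `b = 0.75`, and the non-negative IDF variant used here are common choices assumed for the example, and the function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(docs: list[str], query: str,
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document in docs against query with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(t) for t in tokenized) / n_docs or 1.0

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: above 1 for longer-than-average documents.
        norm = 1.0 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # IDF: rare terms contribute more than common ones.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # TF saturates: each extra occurrence adds less than the last.
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = [
    "rust database engine",
    "python web framework for building apps quickly",
    "rust rust rust",
]
for doc, score in zip(docs, bm25_scores(docs, "rust")):
    print(f"{score:.3f}  {doc}")
```

A document that does not contain any query term scores 0, which is why a threshold such as `FILTER score > 0` removes non-matches.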
Basic Scoring
FOR doc IN articles
  RETURN {
    title: doc.title,
    score: BM25(doc.content, "machine learning")
  }
Sort by Relevance
FOR doc IN articles
  SORT BM25(doc.content, "rust database") DESC
  LIMIT 10
  RETURN doc
Combined with Filters
FOR doc IN articles
  FILTER doc.published == true
  LET score = BM25(doc.content, "query optimization")
  FILTER score > 0.1
  SORT score DESC
  LIMIT 5
  RETURN {title: doc.title, score}
HYBRID_SEARCH Vector + Fulltext Hybrid Search
Combines vector similarity search with fulltext search to improve retrieval quality, for example in retrieval-augmented generation (RAG). Returns documents ranked by combined score.
HYBRID_SEARCH(collection, vector_index, fulltext_field, query_vector, text_query, [options])
Parameters
- collection - Collection name to search
- vector_index - Name of the vector index
- fulltext_field - Field with fulltext index
- query_vector - Query embedding vector
- text_query - Text query string
- options - Optional: {vector_weight, text_weight, limit, fusion}
Example
LET results = HYBRID_SEARCH(
  "articles", "embedding_idx", "content",
  @query_vec, "machine learning"
)
FOR r IN results
  RETURN { title: r.doc.title, score: r.score }
With Custom Weights
LET results = HYBRID_SEARCH(
  "articles", "embedding_idx", "content",
  @query_vec, "kubernetes deployment",
  { vector_weight: 0.7, text_weight: 0.3, limit: 20, fusion: "rrf" }
)
FOR r IN results
  RETURN r.doc
Hybrid Search Benefits
Documents that match both the vector and text queries rank highest. Hybrid search provides a reported 15-30% improvement in relevance over vector-only or text-only search. Two fusion methods are supported: weighted sum (the default) and reciprocal rank fusion (RRF).
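The two fusion methods can be illustrated with a short Python sketch. It shows the general techniques only. How SDBQL normalizes scores, what its default weights are, and how it applies weights under RRF are not stated on this page, so the equal default weights, the RRF constant `k = 60` (a commonly used value), and the function names `weighted_sum`, `rrf`, and `top` are assumptions.

```python
def weighted_sum(vector_scores: dict[str, float],
                 text_scores: dict[str, float],
                 vector_weight: float = 0.5,
                 text_weight: float = 0.5) -> dict[str, float]:
    """Combine two score maps; a document missing from one side scores 0 there.

    Assumes both inputs are already normalized to a comparable range.
    """
    ids = set(vector_scores) | set(text_scores)
    return {
        doc_id: vector_weight * vector_scores.get(doc_id, 0.0)
                + text_weight * text_scores.get(doc_id, 0.0)
        for doc_id in ids
    }


def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank)."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


def top(fused: dict[str, float], limit: int = 10) -> list[str]:
    """Document ids by descending fused score, ties broken by id."""
    return sorted(fused, key=lambda d: (-fused[d], d))[:limit]


vector_scores = {"a": 0.9, "b": 0.8, "c": 0.1}
text_scores = {"b": 0.7, "c": 0.9, "d": 0.6}

print(top(weighted_sum(vector_scores, text_scores, 0.7, 0.3)))
print(top(rrf([["a", "b", "c"], ["c", "b", "d"]])))
```

Weighted sum depends on the raw score values, so it needs comparable scales on both sides. RRF uses only rank positions, which makes it less sensitive to differences in score scale between the vector and text results.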
HIGHLIGHT Search Result Highlighting
Wraps matched search terms in HTML bold tags for highlighting in search results.
HIGHLIGHT(text, terms)
Parameters
- text - The text to search and highlight within
- terms - Search term string or array of terms to highlight
Returns
Text with matched terms wrapped in <b> tags (case-insensitive matching)
Single Term
RETURN HIGHLIGHT("Hello World", "world")
-- "Hello <b>World</b>"
Multiple Terms
RETURN HIGHLIGHT("Rust database engine", ["rust", "engine"])
-- "<b>Rust</b> database <b>engine</b>"
Usage in Search Results
FOR doc IN articles
  LET score = BM25(doc.content, @query)
  FILTER score > 0.1
  SORT score DESC
  LIMIT 10
  RETURN {
    title: HIGHLIGHT(doc.title, @query),
    preview: HIGHLIGHT(SUBSTRING(doc.content, 0, 200), @query),
    score
  }
HTML Output
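HIGHLIGHT returns HTML with <b> tags. Your frontend must render the output as HTML (not escaped text) to display the highlighting. If the highlighted text can contain user-supplied content, sanitize it before rendering, because inserting unescaped HTML into a page is an injection risk.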
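For applications that prefer to highlight on the client or application side, the following Python sketch shows one way to wrap matches in <b> tags while escaping everything else. It is not SDBQL's HIGHLIGHT implementation. Splitting a term string on whitespace and escaping the surrounding text are assumptions of this sketch, and the function name `highlight` is hypothetical.

```python
import html
import re


def highlight(text: str, terms) -> str:
    """Wrap case-insensitive matches of terms in <b> tags, escaping the rest."""
    # Assumption: a plain string is treated as whitespace-separated terms.
    if isinstance(terms, str):
        terms = terms.split()
    terms = [t for t in terms if t]
    if not terms:
        return html.escape(text)

    # Longest terms first so "database" is preferred over "data".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered),
                         re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the unmatched text between the previous match and this one.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f"<b>{html.escape(match.group(0))}</b>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight("Hello World", "world"))
print(highlight("Rust database engine", ["rust", "engine"]))
print(highlight("<script>rust</script>", "rust"))
```

Because every segment is escaped before the <b> tags are added, the only markup in the result is the highlighting itself, so it can be rendered as HTML without exposing markup from the source text.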
SAMPLE Random Document Sampling
Returns a random sample of documents from a collection. Useful for testing, data exploration, and machine learning workflows.
SAMPLE(collection, count)
Parameters
- collection - Name of the collection to sample from
- count - Number of random documents to return
Returns
Array of randomly selected documents (may return fewer if the collection has fewer documents than requested)
Basic Usage
LET random_users = SAMPLE("users", 5)
FOR user IN random_users
  RETURN user.name
ML Training Set
LET training_data = SAMPLE("articles", 1000)
FOR doc IN training_data
  RETURN { text: doc.content, label: doc.category }
Data Exploration
-- Get random products to inspect data quality
LET samples = SAMPLE("products", 10)
FOR p IN samples
  RETURN {
    _key: p._key,
    name: p.name,
    hasPrice: IS_NUMBER(p.price),
    hasCategory: IS_STRING(p.category)
  }
Use Cases
- Testing with representative data
- Machine learning training/validation splits
- Data quality inspection
- Building demo datasets
- A/B testing candidate selection
Practical Search Examples
-- Ranked search with a score threshold
FOR doc IN products
  LET score = BM25(doc.description, @searchQuery)
  FILTER score > 0.1
  SORT score DESC
  LIMIT 20
  RETURN { name: doc.name, score, snippet: SUBSTRING(doc.description, 0, 100) }
-- Multi-field search with boosting
FOR doc IN articles
  LET titleScore = BM25(doc.title, @query) * 2 -- Boost title matches
  LET contentScore = BM25(doc.content, @query)
  LET totalScore = titleScore + contentScore
  FILTER totalScore > 0
  SORT totalScore DESC
  LIMIT 10
  RETURN { title: doc.title, score: totalScore }
-- Search with category filter
FOR doc IN products
  FILTER doc.category == @category
  LET score = BM25(doc.name, @search)
  FILTER score > 0.5
  SORT score DESC
  RETURN doc
-- Combine fuzzy matching (FULLTEXT) with BM25 ranking
LET fulltextResults = FULLTEXT("products", "name", @query, 2)
FOR result IN fulltextResults
  LET doc = result.doc
  LET relevance = BM25(doc.description, @query)
  SORT relevance DESC
  RETURN { doc, relevance }
-- Autocomplete search
FOR doc IN products
  FILTER doc.name LIKE CONCAT(@prefix, "%")
  LET score = BM25(doc.name, @prefix)
  SORT score DESC
  LIMIT 10
  RETURN { name: doc.name, category: doc.category }
-- Search with a truncated content preview
FOR doc IN articles
  LET score = BM25(doc.content, @query)
  FILTER score > 0
  SORT score DESC
  LIMIT 10
  LET preview = SUBSTRING(doc.content, 0, 200)
  RETURN {
    title: doc.title,
    preview: CONCAT(preview, "..."),
    score
  }
Search Tips
Improving Relevance
- Boost important fields (title, name) with multipliers
- Use multiple search terms for better matching
- Filter by score threshold to remove weak matches
- Combine with FILTER for faceted search
Performance Tips
- Create fulltext indexes on frequently searched fields
- Use LIMIT to restrict result set size
- Pre-filter with indexed fields before scoring
- Cache common search results
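As an illustration of the caching tip, here is a small application-side cache with a time-to-live, sketched in Python. It assumes the application has some function that runs a search query and returns results. That function is represented by the hypothetical `run_query` callable, and the class name `SearchCache` and the 60-second default are likewise assumptions for the example.

```python
import time


class SearchCache:
    """Cache search results per (query, limit) for a fixed time-to-live."""

    def __init__(self, run_query, ttl_seconds: float = 60.0,
                 clock=time.monotonic):
        self._run_query = run_query   # hypothetical: executes the real search
        self._ttl = ttl_seconds
        self._clock = clock           # injectable so tests can control time
        self._entries = {}            # key -> (stored_at, results)

    def search(self, query: str, limit: int = 10):
        key = (query.strip(), limit)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]           # still fresh: skip the database
        results = self._run_query(query, limit)
        self._entries[key] = (now, results)
        return results


def fake_run_query(query, limit):
    """Stand-in for a real database call."""
    print(f"running search for {query!r}")
    return [f"result for {query}"][:limit]


cache = SearchCache(fake_run_query, ttl_seconds=60.0)
cache.search("rust database")   # runs the query
cache.search("rust database")   # served from the cache
```

A short time-to-live bounds how stale a cached result can be, which matters when documents are added or updated frequently. This sketch never evicts old keys, so a production cache would also need a size limit.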