Fulltext Search Functions

Fulltext search capabilities with n-gram indexing and BM25 relevance scoring.

Fulltext Search Operations

SDBQL supports fulltext search through n-gram indexing for fuzzy matching and BM25 for relevance scoring.

FULLTEXT N-Gram Fulltext Search

Fuzzy search using n-gram indexing. Returns matching documents with similarity scores.

FULLTEXT(collection, field, query, distance?)

Parameters

collection - Name of the collection to search
field - Field name to search within
query - Search query string
distance - Optional max edit distance (default: 2)
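
To make the distance parameter concrete, here is a minimal Python sketch of edit distance between two strings. It assumes the Levenshtein definition (single-character insertions, deletions, and substitutions); the source does not say which variant FULLTEXT uses, and the function name edit_distance is illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(edit_distance("rust", "trust"))  # 1: one inserted letter
print(edit_distance("rust", "rsut"))   # 2: two substituted letters
```

Under this assumption, a query of "rsut" would still fall within the default distance of 2 from "rust".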

Example

LET matches = FULLTEXT("articles", "title", "rust", 2)
FOR m IN matches
  RETURN m.doc

N-Gram Indexing

FULLTEXT uses n-gram (trigram) indexing to enable fuzzy matching. This allows finding documents even when there are typos or slight variations in the search query.
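
The idea behind trigram matching can be sketched in a few lines of Python. This is an illustration of the general technique, not SDBQL's implementation: the padding scheme, the Jaccard similarity measure, and the function names are all assumptions.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of 3-character substrings of the lowercased text.
    The text is padded with spaces so word boundaries produce trigrams too."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


# A typo still shares most trigrams with the intended word.
print(round(trigram_similarity("database", "databse"), 2))  # 0.55
print(trigram_similarity("database", "rust"))               # 0.0
```

Because a misspelled word keeps most of its trigrams, an index keyed on trigrams can find candidate documents without comparing the query against every stored value.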

BM25 BM25 Relevance Scoring

BM25 relevance scoring for ranking search results. Returns a numeric score that can be used in SORT clauses.

BM25(field, query)

BM25 Algorithm

BM25 (Best Matching 25) is a probabilistic ranking function that scores how well a document matches a query. It considers:

Term frequency (TF) - How often the term appears in the document
Inverse document frequency (IDF) - How rare the term is across all documents
Document length - Normalizes scores so longer documents are not favored simply for containing more words
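
The three factors above can be sketched with the textbook Okapi BM25 formula. This Python example is only an illustration: the whitespace tokenizer, the k1 = 1.2 and b = 0.75 constants, and the IDF variant are common defaults assumed here, not values confirmed for SDBQL.

```python
import math
from collections import Counter


def bm25_scores(docs: list[str], query: str,
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # IDF: rarer terms weigh more (the "+ 1" keeps it non-negative).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: longer documents get a larger denominator.
            norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = ["machine learning with rust", "rust database engine", "cooking pasta at home"]
print(bm25_scores(docs, "machine learning"))
```

Only the first document contains the query terms, so it is the only one with a non-zero score.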

Basic Scoring

FOR doc IN articles
  RETURN {
    title: doc.title,
    score: BM25(doc.content, "machine learning")
  }

Sort by Relevance

FOR doc IN articles
  SORT BM25(doc.content, "rust database") DESC
  LIMIT 10
  RETURN doc

Combined with Filters

FOR doc IN articles
  FILTER doc.published == true
  LET score = BM25(doc.content, "query optimization")
  FILTER score > 0.1
  SORT score DESC
  LIMIT 5
  RETURN {title: doc.title, score}

HYBRID_SEARCH Vector + Fulltext Hybrid Search

Combines vector similarity search with fulltext search to improve retrieval-augmented generation (RAG) results. Returns documents ranked by a combined score.

HYBRID_SEARCH(collection, vector_index, fulltext_field, query_vector, text_query, [options])

Parameters

collection - Collection name to search
vector_index - Name of the vector index
fulltext_field - Field with fulltext index
query_vector - Query embedding vector
text_query - Text query string
options - Optional: {vector_weight, text_weight, limit, fusion}

Example

LET results = HYBRID_SEARCH(
    "articles", "embedding_idx", "content",
    @query_vec, "machine learning"
)
FOR r IN results
  RETURN { title: r.doc.title, score: r.score }

With Custom Weights

LET results = HYBRID_SEARCH(
    "articles", "embedding_idx", "content",
    @query_vec, "kubernetes deployment",
    { vector_weight: 0.7, text_weight: 0.3, limit: 20, fusion: "rrf" }
)
FOR r IN results
  RETURN r.doc

Hybrid Search Benefits

Documents that match both the vector and text queries rank highest. Hybrid search provides 15-30% better relevance than vector-only or text-only search. Two fusion methods are supported: weighted sum (default) and RRF (reciprocal rank fusion).
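
The two fusion methods can be sketched as follows. This Python example shows the general techniques only: the equal default weights, the RRF constant k = 60, the assumption that input scores are already on a comparable scale, and the function names are not taken from SDBQL.

```python
def weighted_sum(vector_scores: dict[str, float], text_scores: dict[str, float],
                 vector_weight: float = 0.5, text_weight: float = 0.5) -> dict[str, float]:
    """Combine per-document scores; a document missing from one side scores 0 there."""
    keys = vector_scores.keys() | text_scores.keys()
    return {key: vector_weight * vector_scores.get(key, 0.0)
                 + text_weight * text_scores.get(key, 0.0)
            for key in keys}


def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal rank fusion: each ranked list adds 1 / (k + rank), ranks start at 1."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, key in enumerate(ranking, start=1):
            fused[key] = fused.get(key, 0.0) + 1.0 / (k + rank)
    return fused


vector_scores = {"a": 0.9, "b": 0.8, "c": 0.1}
text_scores = {"b": 0.7, "d": 0.6}

combined = weighted_sum(vector_scores, text_scores)
print(max(combined, key=combined.get))  # b: it appears in both result sets

fused = rrf([["a", "b", "c"], ["b", "d"]])
print(max(fused, key=fused.get))        # b
```

Weighted sum depends on the raw score values, so the two inputs need comparable scales. RRF uses only rank positions, which makes it less sensitive to how each search scores its results.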

HIGHLIGHT Search Result Highlighting

Wraps matched search terms in HTML bold tags for highlighting in search results.

HIGHLIGHT(text, terms)

Parameters

text - The text to search and highlight within
terms - Search term string or array of terms to highlight

Returns

Text with matched terms wrapped in <b> tags (case-insensitive matching)

Single Term

RETURN HIGHLIGHT("Hello World", "world")
-- "Hello <b>World</b>"

Multiple Terms

RETURN HIGHLIGHT("Rust database engine", ["rust", "engine"])
-- "<b>Rust</b> database <b>engine</b>"

Usage in Search Results

FOR doc IN articles
  LET score = BM25(doc.content, @query)
  FILTER score > 0.1
  SORT score DESC
  LIMIT 10
  RETURN {
    title: HIGHLIGHT(doc.title, @query),
    preview: HIGHLIGHT(SUBSTRING(doc.content, 0, 200), @query),
    score
  }

HTML Output

HIGHLIGHT returns HTML containing <b> tags. Render the output as HTML rather than escaped text so the highlighting displays, and sanitize any untrusted document content first, since rendering raw HTML can expose the page to script injection.
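
For application code that must produce highlighted output which is safe to render, the same idea can be sketched in Python. This is not SDBQL's implementation: whether HIGHLIGHT escapes the source text is not stated, and this sketch treats a string argument as a single term, which is also an assumption.

```python
import html
import re


def highlight(text: str, terms) -> str:
    """Wrap case-insensitive matches of terms in <b> tags and HTML-escape
    everything else. terms is a single string or a list of strings."""
    if isinstance(terms, str):
        terms = [terms]
    # Longest terms first so a long term wins over a shorter prefix of it.
    terms = sorted((t for t in terms if t), key=len, reverse=True)
    if not terms:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))    # unmatched text
        parts.append(f"<b>{html.escape(match.group())}</b>")   # matched term
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight("Rust database engine", ["rust", "engine"]))
# <b>Rust</b> database <b>engine</b>
print(highlight("<script>rust</script>", "rust"))
# &lt;script&gt;<b>rust</b>&lt;/script&gt;
```

Matching on the raw text and escaping each piece separately keeps markup in the document from reaching the browser, while the <b> tags added by the function stay intact.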

SAMPLE Random Document Sampling

Returns a random sample of documents from a collection. Useful for testing, data exploration, and machine learning workflows.

SAMPLE(collection, count)

Parameters

collection - Name of the collection to sample from
count - Number of random documents to return

Returns

Array of randomly selected documents (may return fewer if the collection has fewer documents than requested)
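
The return behavior described above corresponds to sampling without replacement, capped at the collection size. A minimal Python sketch of that behavior follows; the uniform selection and the function name are assumptions, since the source does not describe how SAMPLE picks documents.

```python
import random


def sample(documents: list[dict], count: int) -> list[dict]:
    """Pick up to count distinct documents uniformly at random.
    Returns every document (in random order) when count exceeds the collection size."""
    size = min(max(count, 0), len(documents))
    return random.sample(documents, size)


users = [{"name": "ada"}, {"name": "linus"}, {"name": "grace"}]
print(len(sample(users, 2)))   # 2
print(len(sample(users, 10)))  # 3: the collection has only three documents
```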

Basic Usage

LET random_users = SAMPLE("users", 5)
FOR user IN random_users
  RETURN user.name

ML Training Set

LET training_data = SAMPLE("articles", 1000)
FOR doc IN training_data
  RETURN { text: doc.content, label: doc.category }

Data Exploration

-- Get random products to inspect data quality
LET samples = SAMPLE("products", 10)
FOR p IN samples
  RETURN {
    _key: p._key,
    name: p.name,
    hasPrice: IS_NUMBER(p.price),
    hasCategory: IS_STRING(p.category)
  }

Use Cases

Testing with representative data
Machine learning training/validation splits
Data quality inspection
Building demo datasets
A/B testing candidate selection
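
As an illustration of the training/validation use case, the following Python sketch splits an already-sampled list of documents into two disjoint sets. The 80/20 ratio, the seed parameter, and the function name are assumptions chosen for the example.

```python
import random


def train_validation_split(documents: list, validation_fraction: float = 0.2,
                           seed=None) -> tuple[list, list]:
    """Shuffle a copy of the documents and split it into (train, validation)."""
    rng = random.Random(seed)   # a fixed seed makes the split reproducible
    shuffled = documents[:]     # leave the caller's list untouched
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


documents = [{"id": i} for i in range(10)]
train, validation = train_validation_split(documents, seed=42)
print(len(train), len(validation))  # 8 2
```

Splitting after sampling, rather than sampling twice, guarantees that no document appears in both sets.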

Practical Search Examples

-- Ranked search with a score threshold
FOR doc IN products
  LET score = BM25(doc.description, @searchQuery)
  FILTER score > 0.1
  SORT score DESC
  LIMIT 20
  RETURN { name: doc.name, score, snippet: SUBSTRING(doc.description, 0, 100) }

-- Multi-field search with boosting
FOR doc IN articles
  LET titleScore = BM25(doc.title, @query) * 2  -- Boost title matches
  LET contentScore = BM25(doc.content, @query)
  LET totalScore = titleScore + contentScore
  FILTER totalScore > 0
  SORT totalScore DESC
  LIMIT 10
  RETURN { title: doc.title, score: totalScore }

-- Search with category filter
FOR doc IN products
  FILTER doc.category == @category
  LET score = BM25(doc.name, @search)
  FILTER score > 0.5
  SORT score DESC
  RETURN doc

-- Combine fuzzy matching (FULLTEXT) with BM25 ranking
LET fulltextResults = FULLTEXT("products", "name", @query, 2)
FOR result IN fulltextResults
  LET doc = result.doc
  LET relevance = BM25(doc.description, @query)
  SORT relevance DESC
  RETURN { doc, relevance }

-- Autocomplete search
FOR doc IN products
  FILTER doc.name LIKE CONCAT(@prefix, "%")
  LET score = BM25(doc.name, @prefix)
  SORT score DESC
  LIMIT 10
  RETURN { name: doc.name, category: doc.category }

-- Search with preview snippets
FOR doc IN articles
  LET score = BM25(doc.content, @query)
  FILTER score > 0
  SORT score DESC
  LIMIT 10
  LET preview = SUBSTRING(doc.content, 0, 200)
  RETURN {
    title: doc.title,
    preview: CONCAT(preview, "..."),
    score
  }

Search Tips

Improving Relevance

Boost important fields (title, name) with multipliers
Use multiple search terms for better matching
Filter by score threshold to remove weak matches
Combine with FILTER for faceted search

Performance Tips

Create fulltext indexes on frequently searched fields
Use LIMIT to restrict result set size
Pre-filter with indexed fields before scoring
Cache common search results
