Fulltext Search Functions

Fulltext search capabilities with n-gram indexing and BM25 relevance scoring.

Fulltext Search Operations

SDBQL provides fulltext search using n-gram indexing for fuzzy (typo-tolerant) matching and BM25 for relevance scoring.

FULLTEXT N-Gram Fulltext Search

Fuzzy search using n-gram indexing. Returns an array of matches, each holding the matched document in its doc field along with a similarity score.

FULLTEXT(collection, field, query, distance?)

Parameters

  • collection - Name of the collection to search
  • field - Field name to search within
  • query - Search query string
  • distance - Optional max edit distance (default: 2)
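
The distance parameter bounds how many single-character edits may separate the query from a match. This page does not say which edit-distance variant FULLTEXT uses, so the Python sketch below assumes plain Levenshtein distance (insertions, deletions, substitutions). Under that assumption, swapping two adjacent letters costs 2 edits. The names edit_distance and within_distance are hypothetical helpers for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,            # delete ca
                            curr[j - 1] + 1,        # insert cb
                            prev[j - 1] + cost))    # substitute (or keep)
        prev = curr
    return prev[-1]


def within_distance(term: str, query: str, max_distance: int = 2) -> bool:
    """Hypothetical match rule: case-insensitive, at most max_distance edits."""
    return edit_distance(term.lower(), query.lower()) <= max_distance


print(edit_distance("rust", "rusty"))    # one insertion
print(within_distance("Rust", "rsut"))   # transposition = 2 edits, within default 2
```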

Example

LET matches = FULLTEXT("articles", "title", "rust", 2)
FOR m IN matches
  RETURN m.doc

N-Gram Indexing

FULLTEXT uses n-gram (trigram) indexing to enable fuzzy matching. This allows finding documents even when there are typos or slight variations in the search query.
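
To illustrate why trigrams tolerate typos: a misspelled word still shares most of its three-character substrings with the correct word, while unrelated words share none. SDBQL's exact tokenization, padding, and similarity formula are not documented here, so this Python sketch assumes unpadded lowercase character trigrams compared with Jaccard overlap (shared trigrams divided by all distinct trigrams).

```python
def trigrams(word: str) -> set:
    """All overlapping 3-character substrings of the lowercased word."""
    w = word.lower()
    return {w[i:i + 3] for i in range(len(w) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        # Both words are shorter than 3 characters: fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    return len(ta & tb) / len(ta | tb)


# "databse" keeps dat, ata, tab from "database": 3 shared out of 8 distinct.
print(trigram_similarity("database", "databse"))
print(trigram_similarity("rust", "python"))
```

An index keyed by trigram can find candidates that share several trigrams with the query without scanning every document; in a typical design the candidates are then checked against a similarity or edit-distance cutoff.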

BM25 Relevance Scoring

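Computes a BM25 relevance score for ranking search results. The first argument is the field value to score (such as doc.content), not a field name as in FULLTEXT. Returns a numeric score that can be used in SORT and FILTER clauses.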

BM25(field, query)

BM25 Algorithm

BM25 (Best Matching 25) is a probabilistic ranking function. A document's score combines three factors:

  • Term frequency (TF) - How often the term appears in the document
  • Inverse document frequency (IDF) - How rare the term is across all documents
  • Document length - Normalizes against the average document length, so longer documents are not favored just for containing more words
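
For reference, the three factors combine in the standard Okapi BM25 formula. The Python sketch below is an illustration, not SDBQL's implementation: the tokenizer (lowercase, split on whitespace), the parameters k1 = 1.2 and b = 0.75, and the smoothed IDF ln((N - n + 0.5) / (n + 0.5) + 1) are assumptions, because this page does not state which variant SDBQL uses. The BM25 class name is hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    return text.lower().split()


class BM25:
    def __init__(self, documents, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in documents]
        self.n_docs = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n_docs
        # Document frequency: how many documents contain each term.
        self.df = Counter(term for d in self.docs for term in set(d))

    def idf(self, term: str) -> float:
        """Rare terms get a large weight, common terms a small one."""
        n = self.df.get(term, 0)
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1)

    def score(self, document: str, query: str) -> float:
        tokens = tokenize(document)
        tf = Counter(tokens)
        # Length normalization: > 1 for longer-than-average documents.
        length_norm = 1 - self.b + self.b * len(tokens) / self.avgdl
        total = 0.0
        for term in tokenize(query):
            f = tf[term]
            if f == 0:
                continue
            # Term frequency saturates: each extra occurrence adds less.
            total += self.idf(term) * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total


corpus = [
    "rust database engine",
    "python web framework",
    "rust rust systems language",
]
bm25 = BM25(corpus)
for doc in sorted(corpus, key=lambda d: bm25.score(d, "rust database"), reverse=True):
    print(f"{bm25.score(doc, 'rust database'):.3f}  {doc}")
```

Because IDF and the average length depend on the whole collection, absolute scores shift as documents are added or removed, which is worth keeping in mind when choosing fixed thresholds such as score > 0.1.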

Basic Scoring

FOR doc IN articles
  RETURN {
    title: doc.title,
    score: BM25(doc.content, "machine learning")
  }

Sort by Relevance

FOR doc IN articles
  SORT BM25(doc.content, "rust database") DESC
  LIMIT 10
  RETURN doc

Combined with Filters

FOR doc IN articles
  FILTER doc.published == true
  LET score = BM25(doc.content, "query optimization")
  FILTER score > 0.1
  SORT score DESC
  LIMIT 5
  RETURN {title: doc.title, score}

HIGHLIGHT Search Result Highlighting

Wraps matched search terms in HTML bold tags for highlighting in search results.

HIGHLIGHT(text, terms)

Parameters

  • text - The text to search and highlight within
  • terms - Search term string or array of terms to highlight

Returns

Text with matched terms wrapped in <b> tags (case-insensitive matching)

Single Term

RETURN HIGHLIGHT("Hello World", "world")
-- "Hello <b>World</b>"

Multiple Terms

RETURN HIGHLIGHT("Rust database engine", ["rust", "engine"])
-- "<b>Rust</b> database <b>engine</b>"

Usage in Search Results

FOR doc IN articles
  LET score = BM25(doc.content, @query)
  FILTER score > 0.1
  SORT score DESC
  LIMIT 10
  RETURN {
    title: HIGHLIGHT(doc.title, @query),
    preview: HIGHLIGHT(SUBSTRING(doc.content, 0, 200), @query),
    score
  }

HTML Output

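HIGHLIGHT returns HTML with <b> tags. Ensure your frontend renders the output as HTML (not escaped text) so the highlighting displays. Rendering as HTML also renders any markup already present in the source text, so sanitize the output if documents may contain untrusted content.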
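
A minimal Python sketch of the documented behavior (case-insensitive matching, a single term or a list of terms, matches wrapped in <b>) that also HTML-escapes the surrounding text, so only the inserted tags render as markup. Whether SDBQL's HIGHLIGHT escapes its input, matches whole words or substrings, or splits a multi-word string into separate terms is not stated on this page; this sketch escapes, matches substrings, and treats a string argument as one phrase. The highlight function is a hypothetical stand-in.

```python
import html
import re


def highlight(text, terms):
    """Wrap case-insensitive matches of each term in <b>...</b>,
    escaping everything else so stray markup in text stays inert."""
    if isinstance(terms, str):
        terms = [terms]
    terms = [t for t in terms if t]
    if not terms:
        return html.escape(text)
    # Longest terms first so "data" does not pre-empt "database".
    pattern = re.compile(
        "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True)),
        re.IGNORECASE,
    )
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        out.append("<b>" + html.escape(m.group(0)) + "</b>")  # keep original casing
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(highlight("Hello World", "world"))
print(highlight("Rust database engine", ["rust", "engine"]))
print(highlight("<script>alert(1)</script> rust", "rust"))
```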

SAMPLE Random Document Sampling

Returns a random sample of documents from a collection. Useful for testing, data exploration, and machine learning workflows.

SAMPLE(collection, count)

Parameters

  • collection - Name of the collection to sample from
  • count - Number of random documents to return

Returns

Array of randomly selected documents (may return fewer if the collection has fewer documents than requested)
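
The Returns rule above can be sketched in Python over an in-memory list. This page does not say whether SAMPLE can return the same document twice; the sketch assumes sampling without replacement (no duplicates) and caps the count at the collection size, which matches the "may return fewer" behavior. The sample function and its optional rng argument are hypothetical.

```python
import random


def sample(documents, count, rng=None):
    """Return up to `count` distinct documents chosen uniformly at random.

    Assumes sampling without replacement; when the collection is smaller
    than `count`, every document is returned in random order.
    """
    rng = rng or random.Random()
    k = min(count, len(documents))
    return rng.sample(documents, k)


users = [{"_key": str(i), "name": f"user{i}"} for i in range(3)]
print(len(sample(users, 5)))   # only 3 documents exist
```

Passing a seeded random.Random makes the sketch reproducible, which is handy for tests; whether SAMPLE itself can be seeded is not covered here.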

Basic Usage

LET random_users = SAMPLE("users", 5)
FOR user IN random_users
  RETURN user.name

ML Training Set

LET training_data = SAMPLE("articles", 1000)
FOR doc IN training_data
  RETURN { text: doc.content, label: doc.category }

Data Exploration

-- Get random products to inspect data quality
LET samples = SAMPLE("products", 10)
FOR p IN samples
  RETURN {
    _key: p._key,
    name: p.name,
    hasPrice: IS_NUMBER(p.price),
    hasCategory: IS_STRING(p.category)
  }

Use Cases

  • Testing with representative data
  • Machine learning training/validation splits
  • Data quality inspection
  • Building demo datasets
  • A/B testing candidate selection
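
For training/validation splits, the two sets should not share documents. If separate SAMPLE calls draw independently (not stated on this page), they can overlap; one option is to draw a single sample and split it on the client. A Python sketch, where train_validation_split is a hypothetical helper and the 80/20 ratio is only an example:

```python
import random


def train_validation_split(documents, validation_fraction=0.2, seed=None):
    """Shuffle once, then cut into disjoint training and validation lists."""
    rng = random.Random(seed)
    shuffled = list(documents)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# e.g. documents returned by SAMPLE("articles", 1000), represented here by ids
train, val = train_validation_split(range(1000), validation_fraction=0.2, seed=42)
print(len(train), len(val))
```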

Practical Search Examples

-- Keyword search ranked by BM25 relevance
FOR doc IN products
  LET score = BM25(doc.description, @searchQuery)
  FILTER score > 0.1
  SORT score DESC
  LIMIT 20
  RETURN { name: doc.name, score, snippet: SUBSTRING(doc.description, 0, 100) }

-- Multi-field search with boosting
FOR doc IN articles
  LET titleScore = BM25(doc.title, @query) * 2  -- Boost title matches
  LET contentScore = BM25(doc.content, @query)
  LET totalScore = titleScore + contentScore
  FILTER totalScore > 0
  SORT totalScore DESC
  LIMIT 10
  RETURN { title: doc.title, score: totalScore }

-- Search with category filter
FOR doc IN products
  FILTER doc.category == @category
  LET score = BM25(doc.name, @search)
  FILTER score > 0.5
  SORT score DESC
  RETURN doc

-- Fuzzy-match names with FULLTEXT, then rank the candidates with BM25
LET fulltextResults = FULLTEXT("products", "name", @query, 2)
FOR result IN fulltextResults
  LET doc = result.doc
  LET relevance = BM25(doc.description, @query)
  SORT relevance DESC
  RETURN { doc, relevance }

-- Autocomplete search
FOR doc IN products
  FILTER doc.name LIKE CONCAT(@prefix, "%")
  LET score = BM25(doc.name, @prefix)
  SORT score DESC
  LIMIT 10
  RETURN { name: doc.name, category: doc.category }

-- Search results with a plain-text preview (use HIGHLIGHT to mark matched terms)
FOR doc IN articles
  LET score = BM25(doc.content, @query)
  FILTER score > 0
  SORT score DESC
  LIMIT 10
  LET preview = SUBSTRING(doc.content, 0, 200)
  RETURN {
    title: doc.title,
    preview: CONCAT(preview, "..."),
    score
  }

Search Tips

Improving Relevance
  • Boost important fields (title, name) with multipliers
  • Use multiple search terms for better matching
  • Filter by score threshold to remove weak matches
  • Combine with FILTER for faceted search

Performance Tips
  • Create fulltext indexes on frequently searched fields
  • Use LIMIT to restrict result set size
  • Pre-filter with indexed fields before scoring
  • Cache common search results