String Functions

20+ string manipulation functions including fuzzy matching, phonetic algorithms, and text processing.

STARTS_WITH(str, prefix)

Check if string starts with prefix.

RETURN STARTS_WITH("hello", "he") --true
ENDS_WITH(str, suffix)

Check if string ends with suffix.

RETURN ENDS_WITH("file.txt", ".txt") --true
PAD_LEFT(str, len, char?)

Pad string from left. Alias: LPAD.

RETURN PAD_LEFT("42", 5, "0") --"00042"
PAD_RIGHT(str, len, char?)

Pad string from right. Alias: RPAD.

RETURN PAD_RIGHT("hi", 5) --"hi   "
REPEAT(str, count)

Repeat string n times.

RETURN REPEAT("ab", 3) --"ababab"
CAPITALIZE(str)

Capitalize first letter of string.

RETURN CAPITALIZE("hello") --"Hello"
TITLE_CASE(str)

Title case all words. Alias: INITCAP.

RETURN TITLE_CASE("hello world") --"Hello World"
ENCODE_URI(str)

URL encode string. Alias: URL_ENCODE.

RETURN ENCODE_URI("hello world") --"hello%20world"
DECODE_URI(str)

URL decode string. Alias: URL_DECODE.

RETURN DECODE_URI("hello%20world") --"hello world"
WORD_COUNT(str)

Count words in string.

RETURN WORD_COUNT("hello world") --2
TRUNCATE_TEXT(str, len, suffix?)

Truncate with ellipsis. Alias: ELLIPSIS.

RETURN TRUNCATE_TEXT("Hello World", 8) --"Hello..."
MASK(str, start?, end?, char?)

Mask string for PII protection.

RETURN MASK("4111111111111111", 0, -4) --"************1111"
CONCAT(str1, ...)

Concatenates strings.

RETURN CONCAT("A", "B") --"AB"
CONCAT_SEPARATOR(sep, arr)

Joins array with separator.

RETURN CONCAT_SEPARATOR(",", ["A","B"])
LENGTH(str)

Returns string length.

RETURN LENGTH("Hello") --5
SUBSTRING(str, start, len?)

Extracts substring.

RETURN SUBSTRING("Hello", 0, 2) --"He"
LOWER(str) / UPPER(str)

Case conversion.

RETURN LOWER("Hi") --"hi"
LEFT(str, n) / RIGHT(str, n)

Extracts n characters from start or end.

RETURN LEFT("Hello", 2) --"He"
CHAR_LENGTH(str)

Returns number of characters (Unicode aware).

RETURN CHAR_LENGTH("hello") --5
TRIM(val, type?/chars?)

Trims whitespace or chars. type: 0=both, 1=left, 2=right.

RETURN TRIM(" foo ") --"foo"
RETURN TRIM("--foo--", "-") --"foo"
LTRIM(val, chars?) / RTRIM(val, chars?)

Trims from start/end.

RETURN LTRIM(" foo ") --"foo "
SPLIT(val, sep, limit?)

Splits string into array.

RETURN SPLIT("foo-bar-baz", "-") --["foo", "bar", "baz"]
SUBSTITUTE(val, search, replace, limit?)

Replaces occurrences of search with replace.

RETURN SUBSTITUTE("foobar", "foo", "baz") --"bazbar"
CONTAINS(text, search, returnIndex?)

Checks if search string is in text. Returns boolean or index.

RETURN CONTAINS("foobar", "bar") --true
RETURN CONTAINS("foobar", "bar", true) --3
FIND_FIRST(str, search) / FIND_LAST

Returns index of first/last occurrence.

RETURN FIND_FIRST("hello", "l") --2
REGEX_TEST(str, pattern)

Tests if string matches regex. Alias: REGEX_MATCH.

RETURN REGEX_TEST("abc", "a.*") --true
REGEX_REPLACE(text, pattern, replacement)

Replace occurrences of a pattern in a string.

RETURN REGEX_REPLACE("the quick brown fox", "the (.*) fox", "a $1 dog")
JSON_PARSE(text)

Parses JSON string to value. Returns NULL on error.

RETURN JSON_PARSE("{\"a\":1}") --{a: 1}
JSON_STRINGIFY(value)

Serializes value to JSON string.

RETURN JSON_STRINGIFY({a: 1}) --"{\"a\":1}"

Fuzzy Matching & Similarity

LEVENSHTEIN(s1, s2)

Calculates edit distance between strings.

RETURN LEVENSHTEIN("foo", "bar") --3
SIMILARITY(s1, s2)

Returns trigram similarity score (0.0 to 1.0).

RETURN SIMILARITY("hello", "hello") --1.0
RETURN SIMILARITY("hello", "hallo") --~0.6
FUZZY_MATCH(text, pattern, max_distance?)

Returns true if text matches pattern within edit distance. Default max_distance is 2.

RETURN FUZZY_MATCH("hello", "hello", 0) --true
RETURN FUZZY_MATCH("hello", "hallo", 1) --true
RETURN FUZZY_MATCH("jonathan", "jonathen") --true (default distance 2)

FOR doc IN users
  FILTER FUZZY_MATCH(doc.name, @search, 2)
  RETURN doc

Phonetic Functions

SOUNDEX(str, locale?)

Returns phonetic code for name matching. Supports locales: en, de, fr, es, it, pt, nl, el, ja.

-- English (default)
RETURN SOUNDEX("Smith") == SOUNDEX("Smyth") --true

-- German (de) - Cologne Phonetic
RETURN SOUNDEX("Mueller", "de") == SOUNDEX("Muller", "de") --true

-- French (fr)
RETURN SOUNDEX("Dupont", "fr") == SOUNDEX("Dupon", "fr") --true
METAPHONE(str)

Returns phonetic encoding using Metaphone algorithm.

RETURN METAPHONE("Katherine") --"K0RN"
RETURN METAPHONE("Wright") == METAPHONE("Right") --true
DOUBLE_METAPHONE(str)

Returns array with [primary, secondary] phonetic codes.

RETURN DOUBLE_METAPHONE("Schmidt") --["SMT", "SMT"]
COLOGNE(str)

Cologne Phonetic algorithm for German names.

RETURN COLOGNE("Mueller") == COLOGNE("Muller") --true
CAVERPHONE(str)

Caverphone algorithm for European surnames.

RETURN CAVERPHONE("Lee") == CAVERPHONE("Leigh") --true
NYSIIS(str)

New York State algorithm. Accurate for various ethnic origins.

RETURN NYSIIS("MacDonald") == NYSIIS("McDonald") --true

Text Transformation

Functions for transforming and sanitizing text input for URLs, forms, and security.

SLUGIFY URL-Friendly Slug Generation

Converts text to a URL-friendly slug: lowercase, spaces become hyphens, special characters removed.

SLUGIFY(text)

Basic Usage

RETURN SLUGIFY("Hello World!")
-- "hello-world"

RETURN SLUGIFY("My Blog Post Title")
-- "my-blog-post-title"

Special Characters

RETURN SLUGIFY("Café & Restaurant")
-- "cafe-restaurant"

RETURN SLUGIFY("Product #42: 50% Off!")
-- "product-42-50-off"

Use in Insert

INSERT {
  title: @title,
  slug: SLUGIFY(@title),
  content: @content
} INTO articles

SANITIZE Input Sanitization

Cleans and sanitizes input strings using one or more operations. Supports chaining multiple sanitization options.

SANITIZE(text, options?)

Available Options

  • "trim" - Remove leading/trailing whitespace (default)
  • "lowercase" / "lower" - Convert to lowercase
  • "uppercase" / "upper" - Convert to uppercase
  • "alphanumeric" / "alnum" - Keep only letters and numbers
  • "alpha" - Keep only letters
  • "numeric" / "digits" - Keep only numbers (and . -)
  • "email" - Sanitize email (lowercase, valid chars only)
  • "url" - Keep only URL-safe characters
  • "html" - Escape HTML entities
  • "strip_html" - Remove HTML tags
  • "normalize" - Collapse multiple spaces to single space

Single Option

RETURN SANITIZE("  Hello  ")
-- "Hello" (default: trim)

RETURN SANITIZE("Hello World", "uppercase")
-- "HELLO WORLD"

RETURN SANITIZE("abc123!@#", "alphanumeric")
-- "abc123"

Multiple Options (Array)

RETURN SANITIZE("  [email protected]  ", ["trim", "lowercase"])
-- "[email protected]"

RETURN SANITIZE("  John  Doe  ", ["trim", "normalize"])
-- "John Doe"

Security Sanitization

RETURN SANITIZE("<script>alert('xss')</script>", "strip_html")
-- "alert('xss')"

RETURN SANITIZE("<b>Bold</b> text", "html")
-- "&lt;b&gt;Bold&lt;/b&gt; text"

Form Input Sanitization

INSERT {
  email: SANITIZE(@email, "email"),
  name: SANITIZE(@name, ["trim", "normalize"]),
  phone: SANITIZE(@phone, "numeric"),
  bio: SANITIZE(@bio, ["trim", "strip_html"])
} INTO users

Security Best Practice

Always sanitize user input before storing. Use "html" to escape output displayed in HTML, or "strip_html" to remove tags entirely. Combine with validation for defense in depth.