katana_public_api_client.helpers.search

Text search utilities with tokenization, fuzzy matching, and relevance scoring.

Provides in-memory search capabilities modeled after the PostgreSQL trigram-based search in katana-tools, adapted for client-side use with difflib.SequenceMatcher.

Usage

from katana_public_api_client.helpers.search import score_match, search_and_rank

Score a single item against a query

score = score_match(
    query="fox fork",
    fields={
        "sku": ("FOX-FORK-160", 100),
        "name": ("Fox 36 Factory Fork", 30),
    },
)

Search and rank a collection

results = search_and_rank(
    query="fox fork",
    items=variants,
    field_extractor=lambda v: {
        "sku": (v.sku or "", 100),
        "name": (v.get_display_name(), 30),
    },
    limit=20,
)

Functions

score_field(query_tokens, full_query, field_value, weight)

Score a single field against query tokens.

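Scoring tiers (as a fraction of weight). The tiers are checked in order and the first one that applies determines the score:

  • 1.0x: Exact match (lowercased field == full query)

  • 0.8x: Field starts with the full query

  • 0.6x: All tokens found as substrings (AND logic)

  • Up to 0.4x: All tokens fuzzy-match a field token (similarity >= FUZZY_THRESHOLD); the 0.4x factor is scaled by the average similarity

  • 0.0x: No match, or an empty field value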
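
The fuzzy tier is the least obvious of these, so here is a minimal sketch of how it could work. It is not the library's code: `tokenize`, `best_token_similarity`, and `ASSUMED_FUZZY_THRESHOLD` are hypothetical stand-ins for `_tokenize`, `_best_token_similarity`, and `FUZZY_THRESHOLD`, whose definitions and actual values are not shown on this page. Only `difflib.SequenceMatcher` is taken from the module description.

```python
import re
from difflib import SequenceMatcher

# Assumed value for illustration; the real FUZZY_THRESHOLD is not shown here.
ASSUMED_FUZZY_THRESHOLD = 0.8


def tokenize(text: str) -> list[str]:
    """Hypothetical stand-in for _tokenize: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_token_similarity(query_token: str, field_tokens: list[str]) -> float:
    """Hypothetical stand-in: best SequenceMatcher ratio over the field tokens."""
    return max(
        (SequenceMatcher(None, query_token, ft).ratio() for ft in field_tokens),
        default=0.0,
    )


def fuzzy_tier_score(query: str, field_value: str, weight: int) -> float:
    """Score the fuzzy tier only: 0.4 * weight * average best similarity."""
    query_tokens = tokenize(query)
    field_tokens = tokenize(field_value)
    if not query_tokens or not field_tokens:
        return 0.0
    sims = [best_token_similarity(qt, field_tokens) for qt in query_tokens]
    if min(sims) < ASSUMED_FUZZY_THRESHOLD:
        return 0.0  # at least one query token has no close field token
    return weight * 0.4 * (sum(sims) / len(sims))


# A misspelled token still scores, but below the 0.4 * weight ceiling.
typo_score = fuzzy_tier_score("factry", "Fox 36 Factory Fork", 30)
# A token with no similar field token yields no fuzzy match at all.
miss_score = fuzzy_tier_score("brake", "Fox 36 Factory Fork", 30)
```

Under these assumptions a near-miss such as "factry" earns a little under 0.4 of the weight, while an unrelated token drops the field to zero because every query token must clear the threshold.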

Parameters:

  • query_tokens (list[str]) –

    Lowercase query tokens.

  • full_query (str) –

    The full query string (lowercase).

  • field_value (str) –

    The field value to score against.

  • weight (int) –

    Maximum points this field can contribute.

Returns:

  • float

    Score between 0.0 and weight.

Source code in katana_public_api_client/helpers/search.py
def score_field(
    query_tokens: list[str],
    full_query: str,
    field_value: str,
    weight: int,
) -> float:
    """Score a single field against query tokens.

    Scoring tiers (as fraction of weight):
    - 1.0x: Exact match (full field == full query)
    - 0.8x: Field starts with query
    - 0.6x: All tokens found as substrings (AND logic)
    - Up to 0.4x: All tokens fuzzy-match a field token
      (similarity >= threshold), scaled by the average similarity
    - 0.0x: No match

    Args:
        query_tokens: Lowercase query tokens.
        full_query: The full query string (lowercase).
        field_value: The field value to score against.
        weight: Maximum points this field can contribute.

    Returns:
        Score between 0.0 and weight.
    """
    if not field_value:
        return 0.0

    field_lower = field_value.lower()

    # Exact match
    if field_lower == full_query:
        return weight

    # Prefix match
    if field_lower.startswith(full_query):
        return weight * 0.8

    # All tokens as substrings
    if all(token in field_lower for token in query_tokens):
        return weight * 0.6

    # Fuzzy match: each query token must fuzzy-match at least one field token
    field_tokens = _tokenize(field_value)
    if field_tokens and all(
        _best_token_similarity(qt, field_tokens) >= FUZZY_THRESHOLD
        for qt in query_tokens
    ):
        # Scale by average similarity
        avg_sim = sum(
            _best_token_similarity(qt, field_tokens) for qt in query_tokens
        ) / len(query_tokens)
        return weight * 0.4 * avg_sim

    return 0.0

score_match(query, fields)

Score an item against a search query across multiple fields.

Parameters:

  • query (str) –

    The search query string.

  • fields (dict[str, tuple[str, int]]) –

    Mapping of field_name -> (field_value, weight). Weight determines how many points this field can contribute. Higher weight = more important field.

Returns:

  • float

    Total relevance score (sum across all fields). Zero means no match.
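
As a rough worked example of the sum, the sketch below walks the query "fox fork" through the two fields used elsewhere on this page. It assumes the tokenizer yields the tokens "fox" and "fork" (the real `_tokenize` is not shown here, so plain whitespace splitting is used as a stand-in) and it leaves out the fuzzy tier, which is not needed for these inputs.

```python
query = "fox fork"
tokens = query.split()  # assumption: tokens are "fox" and "fork"

fields = {
    "sku": ("FOX-FORK-160", 100),
    "name": ("Fox 36 Factory Fork", 30),
}

contributions: dict[str, float] = {}
for name, (value, weight) in fields.items():
    lowered = value.lower()
    if lowered == query:
        multiplier = 1.0  # exact match
    elif lowered.startswith(query):
        multiplier = 0.8  # prefix match
    elif all(token in lowered for token in tokens):
        multiplier = 0.6  # every token appears as a substring
    else:
        multiplier = 0.0  # fuzzy tier omitted in this sketch
    contributions[name] = weight * multiplier

total = sum(contributions.values())
```

Neither field equals or starts with "fox fork" (the SKU uses a hyphen where the query has a space), but both contain both tokens, so each lands in the 0.6x tier: 60 points from `sku` and 18 from `name`, for a total of 78.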

Example

score = score_match(
    query="fox fork",
    fields={
        "sku": ("FOX-FORK-160", 100),
        "name": ("Fox 36 Factory Fork", 30),
    },
)

Source code in katana_public_api_client/helpers/search.py
def score_match(
    query: str,
    fields: dict[str, tuple[str, int]],
) -> float:
    """Score an item against a search query across multiple fields.

    Args:
        query: The search query string.
        fields: Mapping of field_name -> (field_value, weight).
            Weight determines how many points this field can contribute.
            Higher weight = more important field.

    Returns:
        Total relevance score (sum across all fields). Zero means no match.

    Example:
        score = score_match(
            query="fox fork",
            fields={
                "sku": ("FOX-FORK-160", 100),
                "name": ("Fox 36 Factory Fork", 30),
            },
        )
    """
    query = query.strip()
    if not query:
        return 0.0

    query_lower = query.lower()
    query_tokens = _tokenize(query)
    if not query_tokens:
        return 0.0

    total = 0.0
    for _field_name, (value, weight) in fields.items():
        total += score_field(query_tokens, query_lower, value, weight)

    return total

search_and_rank(query, items, field_extractor, limit=50, min_score=0.0)

Search and rank items by relevance.

Parameters:

  • query (str) –

    Search query string.

  • items (list[T]) –

    List of items to search.

  • field_extractor (Callable[[T], dict[str, tuple[str, int]]]) –

    Function that takes an item and returns a dict of field_name -> (field_value, weight) for scoring.

  • limit (int, default: 50 ) –

    Maximum results to return.

  • min_score (float, default: 0.0 ) –

    Exclusive minimum score: only items scoring strictly above this value are returned (the default of 0.0 keeps any match).

Returns:

  • list[T]

    Items sorted by relevance score (highest first), limited to top N.
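
The filter, sort, and truncate behavior can be illustrated with a self-contained sketch. Everything in it is hypothetical: `Variant` is an illustrative item type, `rank` mirrors the described pipeline rather than calling the library, and `substring_scorer` is a simplified scorer that only applies the 0.6x substring tier with whitespace tokenization.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical item type used only for this illustration."""

    sku: str
    name: str


def substring_scorer(query: str) -> Callable[[Variant], float]:
    """Build a simplified scorer: 0.6 * weight per field containing all tokens."""
    tokens = query.lower().split()

    def score(v: Variant) -> float:
        total = 0.0
        for value, weight in ((v.sku, 100), (v.name, 30)):
            if all(t in value.lower() for t in tokens):
                total += weight * 0.6
        return total

    return score


def rank(
    items: list[Variant],
    scorer: Callable[[Variant], float],
    limit: int = 50,
    min_score: float = 0.0,
) -> list[Variant]:
    """Keep items scoring strictly above min_score, best first, top `limit`."""
    kept = [(item, s) for item in items if (s := scorer(item)) > min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _score in kept[:limit]]


catalog = [
    Variant("F36-KIT", "Fox Fork Service Kit"),      # name matches only
    Variant("RS-PIKE-140", "RockShox Pike Fork"),    # no "fox": filtered out
    Variant("FOX-FORK-160", "Fox 36 Factory Fork"),  # sku and name both match
]

results = rank(catalog, substring_scorer("fox fork"))
top_one = rank(catalog, substring_scorer("fox fork"), limit=1)
```

Because the comparison against `min_score` is strict, a zero-scoring item never appears in the results even with the default threshold, and `limit` is applied only after sorting.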

Example

results = search_and_rank(
    query="fox fork",
    items=all_variants,
    field_extractor=lambda v: {
        "sku": (v.sku or "", 100),
        "name": (v.get_display_name(), 30),
    },
    limit=20,
)

Source code in katana_public_api_client/helpers/search.py
def search_and_rank[T](
    query: str,
    items: list[T],
    field_extractor: Callable[[T], dict[str, tuple[str, int]]],
    limit: int = 50,
    min_score: float = 0.0,
) -> list[T]:
    """Search and rank items by relevance.

    Args:
        query: Search query string.
        items: List of items to search.
        field_extractor: Function that takes an item and returns a dict of
            field_name -> (field_value, weight) for scoring.
        limit: Maximum results to return.
        min_score: Exclusive minimum score; only items scoring strictly
            above this value are returned (default 0 = any match).

    Returns:
        Items sorted by relevance score (highest first), limited to top N.

    Example:
        results = search_and_rank(
            query="fox fork",
            items=all_variants,
            field_extractor=lambda v: {
                "sku": (v.sku or "", 100),
                "name": (v.get_display_name(), 30),
            },
            limit=20,
        )
    """
    query = query.strip()
    if not query:
        return []

    scored: list[tuple[T, float]] = []
    for item in items:
        fields = field_extractor(item)
        score = score_match(query, fields)
        if score > min_score:
            scored.append((item, score))

    scored.sort(key=lambda x: x[1], reverse=True)
    return [item for item, _score in scored[:limit]]