Centinel AnalyticaCentinel Analytica

Policy rules

Write rules that decide which requests to allow, block, rate-limit, or scrutinize. Covers evaluation order, matcher fields, pattern syntax (RE2 regex, glob, CIDR), directives, and examples.

A policy rule pairs a matcher (when the rule applies) with a set of directives (what the rule does). Rules live on the Policy page of the dashboard. The validator evaluates them top to bottom in priority order, and the first matching rule wins per directive slot.

How rules are evaluated

Rules are sorted by priority, ascending. Lower numbers run first.

For each request, the validator walks the list and fills five directive slots: verdict, bot_detect, rate_limit, monitor, challenge. The first rule that matches each slot wins.

Once a slot is filled, later rules cannot overwrite it. A high-priority rule can set just bot_detect and let a later rule decide the verdict.

If nothing matches, the default policy applies: allow, with normal bot detection.

A rule flagged is_default matches every request. Use one as a final catch-all at the lowest priority (highest number).

Tip: leave gaps in your priority numbers

Spacing rules at 100, 200, 300, ... lets you slot new rules in later without renumbering.

Matcher fields

A matcher has up to five clauses. All clauses on a rule must match for the rule to apply (logical AND). To express OR, write two rules at the same priority, or use a single regex with |.

FieldAllowed kindsMatches
urlliteral, glob, regexThe request path. No scheme, no host.
ualiteral, regexThe User-Agent header.
ipliteral, cidrThe client IP. IPv4 and IPv6.
hostnameliteral, globThe Host header.
crawler(composite)What the validator already knows about the request: identification, verification, allowlist status, name, and category.

crawler clause

The crawler clause has up to five sub-fields, all combined with AND:

  • identified (boolean): true if the request matches a known crawler signature.
  • verified (boolean): true if the crawler passed ownership verification (reverse DNS, IP range). Requires identified=true.
  • allowed (boolean): true if the crawler is on your allowlist. Requires identified=true.
  • name: literal or regex match against the crawler name (e.g. Googlebot, GPTBot).
  • category: one of search, seo, ai_training, ai_assistant, ai_search, ai_agent, scraper, archive, monitoring, social_media, aggregator, accessibility, advertising, feed_reader, preview, research, security, other.

Pattern syntax

Each clause has a kind that selects how the pattern is interpreted. Pick the simplest one that fits.

literal

Exact, byte-for-byte string equality. No wildcards, no escaping, no case folding. Use this when you have a single fixed value.

/login
MyApp/1.0
shop.example.com

glob (shell-style)

Available for url and hostname. Implemented with gobwas/glob. Supports:

TokenMeaning
*Any run of characters except the path separator (/).
**Any run of characters including separators.
?Any single character.
[abc]Any one of the listed characters.
{a,b,c}Alternation.
/api/*           matches /api/users but not /api/users/42
/api/**          matches /api/users and /api/users/42
/files/*.json    matches /files/data.json
*.example.com    matches shop.example.com and api.example.com

regex (RE2)

Available for url, ua, and crawler.name. Patterns are compiled with Go's regexp package, which uses RE2 syntax.

RE2 looks like Perl/PCRE but with two notable differences:

  • No backreferences (\1, \2).
  • No lookaround ((?=...), (?!...), (?<=...), (?<!...)).

In return, matching runs in linear time. A bad pattern cannot stall the validator the way it can with PCRE.

Regex matches are substring matches

The matcher uses MatchString, which looks for the pattern anywhere in the value. To require a full-string match, anchor with ^ and $. Without anchors, /admin as a regex matches /v2/admin/users too.

(?i)bot|crawler|spider     matches anywhere in the UA, any case
^/admin(/|$)               path is /admin, or starts with /admin/
^Mozilla/5\.0 \(.*Linux    literal dot, parens are escaped
[a-z]{3,8}                 three to eight lowercase letters
(?P<vendor>googlebot)      named capture (works in RE2)

For case-insensitive matching, use the inline flag (?i) at the start of the pattern. Patterns that fail to compile are rejected at save time, and the rule is skipped at runtime if compilation ever fails. The compile error appears in the rule editor.

cidr

Available for ip. Uses Go's net.ParseCIDR. IPv4 and IPv6 both work.

192.168.0.0/16
10.0.0.0/8
2001:db8::/32

Directives

A rule must set at least one directive. Each directive slot is filled by the first matching rule and cannot be overwritten.

DirectiveValuesEffect
verdictallow, blockFinal decision. allow lets the request through. block returns a 403 (or your configured block response).
bot_detectoff, low, normal, highBot-detection sensitivity. off skips scoring. high is the strictest threshold.
rate_limitobjectPer-scope budget. Fields: max_requests, window_seconds, scope (session, ip, or session_or_ip), phase (currently only pre).
monitortrue, falseWhen true, the rule logs its decision but does not change the verdict. Use it to shadow-test new rules.
challenge{ kind: string }Issue an interactive challenge of the given kind. Only fires for non-allowed verdicts.

Examples

Block obvious scraping clients on the checkout

{
  "priority": 100,
  "when_matcher": {
    "url": { "kind": "glob", "value": "/checkout/**" },
    "ua":  { "kind": "regex", "value": "(?i)curl|wget|python-requests|httpie" }
  },
  "set_directives": { "verdict": "block" }
}

Always allow verified search engines, no scoring

{
  "priority": 50,
  "when_matcher": {
    "crawler": {
      "identified": true,
      "verified":   true,
      "category":   "search"
    }
  },
  "set_directives": {
    "verdict": "allow",
    "bot_detect": "off"
  }
}

Rate-limit the public API per IP

{
  "priority": 200,
  "when_matcher": {
    "url": { "kind": "glob", "value": "/api/v1/**" }
  },
  "set_directives": {
    "rate_limit": {
      "max_requests": 60,
      "window_seconds": 60,
      "scope": "ip",
      "phase": "pre"
    }
  }
}

Shadow-test a new block rule for a week

{
  "priority": 80,
  "note": "Trial run for AI-agent block, 2026-05",
  "when_matcher": {
    "crawler": { "identified": true, "category": "ai_agent" }
  },
  "set_directives": { "verdict": "block", "monitor": true }
}

Default fallback at the bottom

{
  "priority": 9999,
  "when_matcher": { "is_default": true },
  "set_directives": { "bot_detect": "normal" }
}

Common mistakes

  • Forgetting to anchor a regex. /admin matches /v2/admin/users. Use ^/admin(/|$) if you mean only /admin and its sub-paths.
  • Treating . as a literal dot. In regex, . is any character. Escape it as \. inside hostnames or paths.
  • Confusing * and ** in globs. * does not cross the / separator. Use ** for nested paths.
  • Trying to use lookaround or backreferences. RE2 does not support them. Split the matcher into a glob plus a second clause, or rewrite the pattern.
  • Combining clauses expecting OR. Several clauses on one rule are AND. For OR, write two rules at the same priority, or use | inside a regex.
  • Skipping monitor: true on first deploy. New rules can match more traffic than expected. Run them in monitor mode for a few days, watch the logs, then flip monitor off.

On this page