Policy rules
Write rules that decide which requests to allow, block, rate-limit, or scrutinize. Covers evaluation order, matcher fields, pattern syntax (RE2 regex, glob, CIDR), directives, and examples.
A policy rule pairs a matcher (when the rule applies) with a set of directives (what the rule does). Rules live on the Policy page of the dashboard. The validator evaluates them top to bottom in priority order, and the first matching rule wins per directive slot.
How rules are evaluated
Rules are sorted by priority, ascending. Lower numbers run first.
For each request, the validator walks the list and fills five directive slots: verdict, bot_detect, rate_limit, monitor, challenge. The first rule that matches each slot wins.
Once a slot is filled, later rules cannot overwrite it. A high-priority rule can set just bot_detect and let a later rule decide the verdict.
If nothing matches, the default policy applies: allow, with normal bot detection.
A rule flagged is_default matches every request. Use one as a final catch-all at the lowest priority (highest number).
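The slot-filling walk described above can be sketched in Go. This is a minimal illustration, not the validator's real implementation: the `Rule` type, the string-keyed slot map, and the `Matches` callback are all assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Rule is a hypothetical, simplified stand-in for a policy rule:
// a priority, a match predicate, and the directives it sets.
type Rule struct {
	Priority   int
	Matches    func(path string) bool
	Directives map[string]string // slot name -> value
}

// evaluate walks rules in ascending priority and fills each directive
// slot from the first matching rule; later rules cannot overwrite a
// slot that is already filled.
func evaluate(rules []Rule, path string) map[string]string {
	sort.Slice(rules, func(i, j int) bool { return rules[i].Priority < rules[j].Priority })
	slots := map[string]string{}
	for _, r := range rules {
		if !r.Matches(path) {
			continue
		}
		for slot, val := range r.Directives {
			if _, filled := slots[slot]; !filled {
				slots[slot] = val
			}
		}
	}
	// Default policy when no rule filled these slots: allow, normal detection.
	if _, ok := slots["verdict"]; !ok {
		slots["verdict"] = "allow"
	}
	if _, ok := slots["bot_detect"]; !ok {
		slots["bot_detect"] = "normal"
	}
	return slots
}

func main() {
	rules := []Rule{
		// High-priority rule sets only bot_detect and leaves the verdict open.
		{Priority: 100, Matches: func(p string) bool { return strings.HasPrefix(p, "/api/") },
			Directives: map[string]string{"bot_detect": "high"}},
		// Later catch-all decides the verdict; its bot_detect value loses.
		{Priority: 200, Matches: func(p string) bool { return true },
			Directives: map[string]string{"verdict": "allow", "bot_detect": "low"}},
	}
	got := evaluate(rules, "/api/users")
	fmt.Println(got["bot_detect"], got["verdict"]) // high allow
}
```

Note how the priority-100 rule wins the `bot_detect` slot while the priority-200 rule still decides the verdict: first match wins per slot, not per request.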
Tip: leave gaps in your priority numbers
Spacing rules at 100, 200, 300, ... lets you slot new rules in later without renumbering.
Matcher fields
A matcher has up to five clauses. All clauses on a rule must match for the rule to apply (logical AND). To express OR, write two rules at the same priority, or use a single regex with |.
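As a sketch of the OR workaround, here are two rules at the same priority that apply the same directive to either of two paths (the rule shape follows the Examples section; the paths and priority are illustrative):

```json
[
  {
    "priority": 300,
    "when_matcher": { "url": { "kind": "literal", "value": "/login" } },
    "set_directives": { "bot_detect": "high" }
  },
  {
    "priority": 300,
    "when_matcher": { "url": { "kind": "literal", "value": "/signup" } },
    "set_directives": { "bot_detect": "high" }
  }
]
```

The single-rule equivalent uses regex alternation: `{ "kind": "regex", "value": "^/(login|signup)$" }`.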
| Field | Allowed kinds | Matches |
|---|---|---|
| `url` | literal, glob, regex | The request path. No scheme, no host. |
| `ua` | literal, regex | The User-Agent header. |
| `ip` | literal, cidr | The client IP. IPv4 and IPv6. |
| `hostname` | literal, glob | The Host header. |
| `crawler` | (composite) | What the validator already knows about the request: identification, verification, allowlist status, name, and category. |
crawler clause
The crawler clause has up to five sub-fields, all combined with AND:
- `identified` (boolean): true if the request matches a known crawler signature.
- `verified` (boolean): true if the crawler passed ownership verification (reverse DNS, IP range). Requires `identified=true`.
- `allowed` (boolean): true if the crawler is on your allowlist. Requires `identified=true`.
- `name`: literal or regex match against the crawler name (e.g. `Googlebot`, `GPTBot`).
- `category`: one of `search`, `seo`, `ai_training`, `ai_assistant`, `ai_search`, `ai_agent`, `scraper`, `archive`, `monitoring`, `social_media`, `aggregator`, `accessibility`, `advertising`, `feed_reader`, `preview`, `research`, `security`, `other`.
Pattern syntax
Each clause has a kind that selects how the pattern is interpreted. Pick the simplest one that fits.
literal
Exact, byte-for-byte string equality. No wildcards, no escaping, no case folding. Use this when you have a single fixed value.
/login
MyApp/1.0
shop.example.com
glob (shell-style)
Available for url and hostname. Implemented with gobwas/glob. Supports:
| Token | Meaning |
|---|---|
| `*` | Any run of characters except the path separator (`/`). |
| `**` | Any run of characters including separators. |
| `?` | Any single character. |
| `[abc]` | Any one of the listed characters. |
| `{a,b,c}` | Alternation. |
/api/* matches /api/users but not /api/users/42
/api/** matches /api/users and /api/users/42
/files/*.json matches /files/data.json
*.example.com matches shop.example.com and api.example.com
regex (RE2)
Available for url, ua, and crawler.name. Patterns are compiled with Go's regexp package, which uses RE2 syntax.
RE2 looks like Perl/PCRE but with two notable differences:
- No backreferences (`\1`, `\2`).
- No lookaround (`(?=...)`, `(?!...)`, `(?<=...)`, `(?<!...)`).
In return, matching runs in linear time. A bad pattern cannot stall the validator the way it can with PCRE.
Regex matches are substring matches
The matcher uses MatchString, which looks for the pattern anywhere in the value. To require a full-string match, anchor with ^ and $. Without anchors, /admin as a regex matches /v2/admin/users too.
(?i)bot|crawler|spider matches anywhere in the UA, any case
^/admin(/|$) path is /admin, or starts with /admin/
^Mozilla/5\.0 \(.*Linux literal dot, parens are escaped
[a-z]{3,8} three to eight lowercase letters
(?P<vendor>googlebot) named capture (works in RE2)
For case-insensitive matching, use the inline flag (?i) at the start of the pattern. Patterns that fail to compile are rejected at save time, and the rule is skipped at runtime if compilation ever fails. The compile error appears in the rule editor.
cidr
Available for ip. Uses Go's net.ParseCIDR. IPv4 and IPv6 both work.
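A sketch of what CIDR matching amounts to with the standard library the doc names; the helper function and its error handling are illustrative, not the validator's actual code:

```go
package main

import (
	"fmt"
	"net"
)

// ipInCIDR reports whether ipStr falls inside the cidr block.
// Invalid patterns return false here; in the validator they are
// rejected at save time.
func ipInCIDR(ipStr, cidr string) bool {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(ipStr)
	return ip != nil && network.Contains(ip)
}

func main() {
	fmt.Println(ipInCIDR("192.168.4.7", "192.168.0.0/16")) // true
	fmt.Println(ipInCIDR("10.1.2.3", "192.168.0.0/16"))    // false
	fmt.Println(ipInCIDR("2001:db8::1", "2001:db8::/32"))  // true (IPv6)
}
```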
192.168.0.0/16
10.0.0.0/8
2001:db8::/32
Directives
A rule must set at least one directive. Each directive slot is filled by the first matching rule and cannot be overwritten.
| Directive | Values | Effect |
|---|---|---|
| `verdict` | `allow`, `block` | Final decision. `allow` lets the request through. `block` returns a 403 (or your configured block response). |
| `bot_detect` | `off`, `low`, `normal`, `high` | Bot-detection sensitivity. `off` skips scoring. `high` is the strictest threshold. |
| `rate_limit` | object | Per-scope budget. Fields: `max_requests`, `window_seconds`, `scope` (`session`, `ip`, or `session_or_ip`), `phase` (currently only `pre`). |
| `monitor` | `true`, `false` | When `true`, the rule logs its decision but does not change the verdict. Use it to shadow-test new rules. |
| `challenge` | `{ kind: string }` | Issue an interactive challenge of the given kind. Only fires for non-allowed verdicts. |
Examples
Block obvious scraping clients on the checkout
{
"priority": 100,
"when_matcher": {
"url": { "kind": "glob", "value": "/checkout/**" },
"ua": { "kind": "regex", "value": "(?i)curl|wget|python-requests|httpie" }
},
"set_directives": { "verdict": "block" }
}
Always allow verified search engines, no scoring
{
"priority": 50,
"when_matcher": {
"crawler": {
"identified": true,
"verified": true,
"category": "search"
}
},
"set_directives": {
"verdict": "allow",
"bot_detect": "off"
}
}
Rate-limit the public API per IP
{
"priority": 200,
"when_matcher": {
"url": { "kind": "glob", "value": "/api/v1/**" }
},
"set_directives": {
"rate_limit": {
"max_requests": 60,
"window_seconds": 60,
"scope": "ip",
"phase": "pre"
}
}
}
Shadow-test a new block rule for a week
{
"priority": 80,
"note": "Trial run for AI-agent block, 2026-05",
"when_matcher": {
"crawler": { "identified": true, "category": "ai_agent" }
},
"set_directives": { "verdict": "block", "monitor": true }
}
Default fallback at the bottom
{
"priority": 9999,
"when_matcher": { "is_default": true },
"set_directives": { "bot_detect": "normal" }
}
Common mistakes
- Forgetting to anchor a regex. `/admin` matches `/v2/admin/users`. Use `^/admin(/|$)` if you mean only `/admin` and its sub-paths.
- Treating `.` as a literal dot. In regex, `.` is any character. Escape it as `\.` inside hostnames or paths.
- Confusing `*` and `**` in globs. `*` does not cross the `/` separator. Use `**` for nested paths.
- Trying to use lookaround or backreferences. RE2 does not support them. Split the matcher into a glob plus a second clause, or rewrite the pattern.
- Combining clauses expecting OR. Several clauses on one rule are AND. For OR, write two rules at the same priority, or use `|` inside a regex.
- Skipping `monitor: true` on first deploy. New rules can match more traffic than expected. Run them in monitor mode for a few days, watch the logs, then flip `monitor` off.