Rate Limiting
Rate Limiting controls how much traffic a single visitor can send to your site before they are throttled or blocked. It is one of the cheapest and most effective ways to mitigate scraping, low-effort denial-of-service attempts, and runaway misbehaving crawlers. The defense works per IP address: the firewall counts requests per IP per time window, and applies a configured action when the count crosses a threshold.
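The counting described above can be sketched as a fixed-window counter. This is a minimal illustration and not the firewall's actual implementation: the class name, the method name, and the choice of fixed (rather than sliding) windows are all assumptions made for the example.

```python
from collections import defaultdict


class FixedWindowCounter:
    """Counts requests per IP in fixed windows (hypothetical sketch)."""

    def __init__(self, threshold: int, window_seconds: int = 60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # Maps (ip, window index) -> requests seen in that window.
        # Old windows are never purged here; a real limiter would expire them.
        self.counts = defaultdict(int)

    def over_limit(self, ip: str, now: float) -> bool:
        """Record one request and report whether the IP exceeded the threshold."""
        window = int(now // self.window_seconds)
        self.counts[(ip, window)] += 1
        return self.counts[(ip, window)] > self.threshold


# A threshold of 3 per minute: the fourth request in the same window trips it.
limiter = FixedWindowCounter(threshold=3)
results = [limiter.over_limit("203.0.113.7", now=t) for t in (0, 1, 2, 3)]
print(results)  # [False, False, False, True]
```

Each IP has its own counter, and the count starts over when a new window begins, so a visitor that trips the threshold in one minute starts the next minute from zero.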
In This Article
- Configuring rate-limit thresholds
- Humans vs. crawlers
- Known-good bots
- Throttle, block, or 404
- Troubleshooting
Configuring rate-limit thresholds
Open VMP Security → Firewall and scroll to the Rate Limiting card. The card presents three scenarios, each with its own threshold:
- If a visitor’s requests exceed N per minute — the basic throttle. Default 240.
- If a crawler’s requests exceed N per minute — the crawler-specific threshold. Default 120.
- If a visitor’s page-not-found errors exceed N per minute — treats a flood of 404s as a strong signal of a scanner probing for paths.
The defaults are tuned for typical content sites and are usually fine. Lower the thresholds (more aggressive) if you have data showing that legitimate human users never come close to those rates and you want to cut off automated traffic faster. Raise them (more permissive) if your site has legitimate clients that send sustained high request rates, for example a single-page-app frontend that makes many small data requests, or an API client integration.
Humans vs. crawlers
Rate Limiting distinguishes between visitors that look like real browsers and visitors that look like crawlers. The distinction is heuristic: the firewall looks at the User-Agent string, the request pattern (do they fetch CSS and images, or only HTML?), and other behavioral signals.
Two thresholds exist because the right rate limit is different for the two cases. A real browser pulling a page typically issues 30 to 80 requests in a few seconds (HTML, scripts, stylesheets, images, fonts) and then sits idle for a while. A crawler issues a steady trickle of HTML requests and pulls almost no associated assets. Setting a single threshold high enough for the burst pattern of a browser would let crawlers run wild; setting it low enough to constrain crawlers would block the third image on every page for real users.
The User-Agent-based detection is not perfect, since an attacker can claim to be a browser, but combined with the behavioral signals it is good enough that the two-threshold approach catches the cases that matter with few false positives in normal browsing.
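To make the heuristic concrete, here is a rough sketch that combines a User-Agent check with the share of asset requests. Everything in it is an assumption for illustration: the hint strings, the asset suffixes, the 20% ratio, and the names VisitorSample and looks_like_crawler are hypothetical and do not describe the signals or weights the firewall actually uses.

```python
from dataclasses import dataclass

# Assumed hints and suffixes, chosen only for this example.
CRAWLER_UA_HINTS = ("bot", "crawler", "spider")
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".svg", ".woff2")


@dataclass
class VisitorSample:
    user_agent: str
    paths: list  # URL paths this visitor requested recently


def looks_like_crawler(sample: VisitorSample, min_asset_ratio: float = 0.2) -> bool:
    """Guess whether a visitor is a crawler from its UA and request mix."""
    ua = sample.user_agent.lower()
    if any(hint in ua for hint in CRAWLER_UA_HINTS):
        return True  # the visitor identifies itself as a bot
    if not sample.paths:
        return False  # nothing to judge yet; treat as a browser
    assets = sum(1 for p in sample.paths if p.lower().endswith(ASSET_SUFFIXES))
    # Browsers pull many assets per page; crawlers fetch mostly HTML.
    return assets / len(sample.paths) < min_asset_ratio


browser = VisitorSample("Mozilla/5.0", ["/", "/style.css", "/app.js", "/logo.png"])
scraper = VisitorSample("Mozilla/5.0", ["/a", "/b", "/c", "/d", "/e"])
print(looks_like_crawler(browser))  # False
print(looks_like_crawler(scraper))  # True
```

The second sample shows why the behavioral signal matters: the scraper claims a browser User-Agent, but it fetches no assets at all.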
Known-good bots
Search engine crawlers like Googlebot, Bingbot, and DuckDuckBot are an exception. You generally want them to crawl your site quickly and thoroughly, even if their request rate exceeds the crawler threshold. The rate limiter recognizes the major search engine User-Agents and verifies them by reverse-DNS lookup of the source IP — if the IP’s reverse-DNS does not match the claimed search engine, the bot is treated as a regular crawler and rate-limited normally.
You can extend the known-good list from the bottom of the Rate Limiting page. Add a User-Agent pattern and the reverse-DNS suffix that should authorize it (e.g. googlebot.com or search.msn.com). Any bot that matches both the pattern and a verified reverse-DNS suffix is exempted from rate limits.
Do not exempt User-Agents alone, without a reverse-DNS check. User-Agent strings are trivially forged; the reverse-DNS check is what makes the exemption safe.
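A reverse-DNS check of this kind might look like the sketch below. It goes one step beyond what this article describes: after the reverse lookup it also resolves the hostname forward and confirms that it maps back to the same IP, a common hardening step, because the owner of an IP range controls that range's reverse-DNS records. Whether the firewall performs this forward confirmation is not stated here. The function name is_verified_bot and its parameters are hypothetical, and the resolvers are injectable so the example runs without network access.

```python
import socket
from typing import Callable, List, Tuple


def _reverse_lookup(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname: str) -> List[str]:
    # socket.gethostbyname_ex returns (hostname, aliaslist, ipaddrlist); IPv4 only.
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_bot(
    ip: str,
    allowed_suffixes: Tuple[str, ...],
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Check that the IP's reverse DNS ends in an allowed suffix and resolves back."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # lookup failures (socket.herror, socket.gaierror) subclass OSError
        return False
    # Match on a label boundary so "evilgooglebot.com" does not pass.
    if not any(hostname == s or hostname.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        return ip in forward(hostname)  # forward confirmation (assumed extra step)
    except OSError:
        return False


# Offline demonstration with stub resolvers standing in for real DNS.
def stub_reverse(ip: str) -> str:
    return "crawl-66-249-66-1.googlebot.com"


def stub_forward(hostname: str) -> List[str]:
    return ["66.249.66.1"]


print(is_verified_bot("66.249.66.1", ("googlebot.com",), stub_reverse, stub_forward))  # True
print(is_verified_bot("203.0.113.9", ("googlebot.com",), stub_reverse, stub_forward))  # False
```

Calling is_verified_bot(ip, ("googlebot.com", "search.msn.com")) without the stub arguments uses the real resolver from the standard library.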
Throttle, block, or 404
When a threshold is crossed, you have three response options:
- Throttle. The firewall slows the visitor down by injecting delays into responses. From the visitor’s perspective, the site simply gets slow. This is the gentlest action and is the right choice for thresholds that may sometimes be crossed by legitimate users.
- Block. The firewall blocks all further requests from that IP for a configurable duration. Visitors get a clear error page explaining the block. This is the right choice for the page-not-found threshold (legitimate users do not flood your site with bad URLs) and for a low crawler threshold.
- Return 404. The firewall returns a 404 to all further requests from that IP. The visitor sees the same response they would get for any non-existent URL. This is useful against scanners specifically: it tells the scanner there is nothing here without giving it the rich error page it might use to identify the firewall.
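The three actions can be compared side by side in a small sketch. The names Action and respond are hypothetical, and the status code and wording of the block page are assumptions, since this article does not say what the firewall actually sends for a block.

```python
import time
from enum import Enum
from typing import Callable, Tuple


class Action(Enum):
    THROTTLE = "throttle"
    BLOCK = "block"
    RETURN_404 = "404"


def respond(
    action: Action,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Return (status code, body) for a visitor that crossed a threshold."""
    if action is Action.THROTTLE:
        sleep(delay_seconds)  # inject a delay, then serve the page as usual
        return 200, "normal page"
    if action is Action.BLOCK:
        # 403 and this message are assumed; the real block page may differ.
        return 403, "Your access to this site has been temporarily limited."
    # Same bare response as any missing URL, so a scanner learns nothing.
    return 404, "Not Found"


print(respond(Action.THROTTLE, delay_seconds=0.01))  # (200, 'normal page')
print(respond(Action.RETURN_404))                    # (404, 'Not Found')
```

Only the block action identifies itself to the visitor; throttle and 404 both look like ordinary site behavior from the outside.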
Troubleshooting
A legitimate user got rate-limited
Check the Live Traffic page and find the request that triggered the action. The page shows which threshold was crossed, the request rate at the time, and the IP’s recent history. If the user is genuinely legitimate, raise the relevant threshold or add their IP to the trusted-IP list.
An attacker is rotating IPs faster than rate limits can keep up
Per-IP rate limits are weak against a botnet that can use many IPs. For sustained distributed attacks, the firewall’s per-IP limits will not be enough on their own; layer in Country Blocking, the Real-Time IP Blocklist, and consider putting a CDN or DDoS provider in front of your origin.
Rate limits are not triggering at all
If your site is behind a reverse proxy or CDN and the firewall sees every request as coming from the proxy IP, all rate limits will be applied to the proxy as a single “visitor.” If that proxy IP is also on the trusted-IP list, no limit will ever trigger; otherwise the limits will trigger against all visitors at once. Configure the firewall to read the visitor IP from X-Forwarded-For or the equivalent trusted header from your proxy. This is set on the All Options page under the “How does VMP Security get IPs?” option.
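The reason the header must come from a trusted proxy is that any client can send its own X-Forwarded-For value. The sketch below shows one common way to resolve the visitor IP safely: trust the header only when the connection comes from a known proxy, then walk the list from right to left and take the first address that is not itself a trusted proxy. The function client_ip and its parameters are hypothetical and do not describe how the firewall implements this option.

```python
import ipaddress
from typing import List, Optional


def _parse(addr: str):
    """Return an ip_address object, or None if the text is not a valid IP."""
    try:
        return ipaddress.ip_address(addr.strip())
    except ValueError:
        return None


def client_ip(remote_addr: str, x_forwarded_for: Optional[str], trusted_proxies: List[str]) -> str:
    """Pick the visitor IP, trusting X-Forwarded-For only behind known proxies."""
    trusted = [ipaddress.ip_network(p) for p in trusted_proxies]

    def is_trusted(ip) -> bool:
        return any(ip in net for net in trusted)

    peer = _parse(remote_addr)
    # Ignore the header unless the direct connection is from a trusted proxy.
    if peer is None or not x_forwarded_for or not is_trusted(peer):
        return remote_addr
    # Rightmost entries were appended by the proxies closest to the server.
    for hop in reversed(x_forwarded_for.split(",")):
        ip = _parse(hop)
        if ip is None:
            return remote_addr  # malformed header: fall back to the socket peer
        if not is_trusted(ip):
            return str(ip)
    return remote_addr


proxies = ["10.0.0.0/8"]
print(client_ip("10.0.0.5", "198.51.100.23, 10.0.0.9", proxies))  # 198.51.100.23
print(client_ip("203.0.113.50", "1.2.3.4", proxies))              # 203.0.113.50
```

In the second call the connection does not come from a trusted proxy, so the forged header is ignored and the rate limit is counted against the connecting address.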