Recently our hosting company disabled our sites, including www.URTech.ca, for a few hours because “bots” were crawling and scraping the sites, consuming far too much CPU:

System administration has identified your account as using higher resources on the server housing your account.  Unfortunately, this increased resource usage has started to have a negative impact on other customers on the server.

Upon further investigation we have found that a domain on your account is being heavily crawled by specific User Agents: AhrefsBot, Amazonbot, Barkrowler, Bytespider, ClaudeBot, DotBot, Googlebot, GPTBot, MJ12bot, PetalBot, SemrushBot, bingbot, facebookexternalhit, meta-externalagent

…In order to prevent resource overages on your account while avoiding causing you downtime due to an account suspension, we have taken action to block these user agents from accessing the domains on your account.  We are able to block specific user agents by adding a rule to your .htaccess file.

To clarify this traffic is not human traffic, but rather traffic generated by automated scripts also referred to as Bots.  There are many reasons why a Bot User Agent would be crawling a domain, such as Search Engine Indexing, AI Training, or even malicious actors.
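For context, the User-Agent block our host describes is typically a short mod_rewrite rule in .htaccess. The sketch below is illustrative only, assuming Apache with mod_rewrite enabled and using a subset of the bots named in the notice; it is not the exact rule our host added:

    <IfModule mod_rewrite.c>
        RewriteEngine On
        # Send 403 Forbidden to any request whose User-Agent matches one of these bots
        RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|Bytespider|ClaudeBot|GPTBot|MJ12bot|PetalBot|SemrushBot) [NC]
        RewriteRule .* - [F,L]
    </IfModule>

Keep in mind that User-Agent blocking is easy to evade, since a scraper can simply change the string it sends; that is part of why the behavior-based approaches below are worth a look.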

What is a Bad Bot?


Put simply, bad bots are automated scripts that harm websites by scraping content, launching attacks, and wasting server resources.

Stopping bad bots is crucial for site performance and security.

There are several solutions; here are two you may want to use to deal with bad bots.

1 – ‘Blackhole for Bad Bots’ WordPress Plugin Explained

The Blackhole for Bad Bots plugin is a simple WordPress security tool that uses a “honeypot” method to block bad bots. Instead of relying on a list of known bad bots, it sets a trap:

  1. The Trap: The plugin adds a hidden link to the footer of every page on your site.
    • This link is invisible to human users.
  2. The Rule: The plugin also adds a rule to your robots.txt file, explicitly telling all bots to “Disallow” crawling that specific hidden link (see the sketch after this list).
  3. The Action: Legitimate bots (like Googlebot, Bingbot, etc.) will read and obey the robots.txt rule, so they will never follow the hidden link. However, malicious or “bad” bots often ignore robots.txt because their purpose is to scrape or spam the site, not to index it for search.
  4. The Blackhole: When a bot ignores the robots.txt rule and follows the hidden link, it “falls into the blackhole.” The plugin then records the bot’s IP address and blocks it from accessing your site entirely.
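To make the trap concrete, here is a minimal sketch of the two pieces, using a hypothetical /blackhole/ trigger URL; the plugin generates its own markup and path, so treat both snippets as illustrative only. First, the hidden footer link:

    <!-- Hidden trap link in the page footer; invisible to humans, -->
    <!-- but still present in the HTML that bots parse -->
    <a href="/blackhole/" rel="nofollow" style="display:none">Do not follow this link</a>

And the matching robots.txt rule:

    # Tell all well-behaved bots to stay away from the trap URL
    User-agent: *
    Disallow: /blackhole/

Any client that requests the trap URL has, by definition, read the page and ignored robots.txt, so there is little risk of snaring a well-behaved crawler.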

This method is effective because it targets bots based on their behavior rather than their user agent, so it can catch new or unknown bad bots. The plugin also provides a way to view a log of blocked bots and manually add or remove IPs from the blocklist.

2 – Use ‘Wordfence’ To Block Bad Bots

We already have Wordfence installed on our WordPress sites, and hate to add the complexity of new plugins. Wordfence takes a different approach to handling bad bots: it considers the usage created by each visitor, including bots, and throttles or blocks them if they are consuming too much. We think that is a better solution, as it covers more than just bots.

Rate limiting settings are not a one-size-fits-all solution, as the ideal configuration depends on your website’s size, traffic patterns, and the type of content you serve. A small personal blog should have much stricter limits than a large e-commerce site.

The default Wordfence setting of “unlimited” for many of the rules is a safety measure to prevent you from accidentally blocking legitimate users or search engine crawlers. However, with some careful thought, you can set reasonable limits that will significantly improve your bot security.


Here are some suggested starting points for your Wordfence rate-limiting rules, along with an explanation for each. The key is to start with a less aggressive action like “throttle” and then escalate to “block” if you’re confident it’s a malicious bot:

Note that this feature is available even in the free version of Wordfence.

Also remember that a REQUEST is not a visitor. It takes 10 to 20 requests from a visitor to your server to complete the loading of a single webpage, so the 480-requests-per-minute limit below works out to roughly 24 to 48 full page loads per minute.


Setting | Suggested Value | Action | Rationale
------- | --------------- | ------ | ---------
If anyone’s requests exceed | 480 per minute | Throttle it | A very safe starting point to control high-volume traffic from bots without affecting legitimate users
If a crawler’s page views exceed | 120 per minute | Throttle it | Allows for efficient crawling by legitimate bots while preventing them from overwhelming your server
If a crawler’s pages not found (404s) exceed | 30 per minute | Block it | An indicator of malicious bots or vulnerability scanners; a legitimate bot should not generate many 404s
If a human’s page views exceed | 180 per minute | Throttle it | A high limit to avoid blocking normal human users, even those on fast connections or shared IPs
If a human’s pages not found (404s) exceed | 30 per minute | Block it | Unlikely for a human to generate this many 404s; this helps block human attackers
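Wordfence’s rate limiting is configured through its UI and implemented inside the plugin, but the underlying idea is simple enough to sketch. The following is not Wordfence’s code, just a minimal WordPress-flavored illustration of counting requests per IP in one-minute buckets and refusing service past a threshold (the 480-per-minute figure matches the first row of the table):

    <?php
    // Illustrative sketch only -- NOT Wordfence's implementation.
    // Count requests per IP in one-minute buckets using WordPress transients
    // and answer with HTTP 429 once a threshold is exceeded.
    add_action( 'init', function () {
        $limit = 480; // requests per minute, per the table above
        $ip    = $_SERVER['REMOTE_ADDR'] ?? 'unknown';
        $key   = 'rate_' . md5( $ip . gmdate( 'YmdHi' ) ); // bucket key for the current minute
        $count = (int) get_transient( $key );
        if ( $count >= $limit ) {
            status_header( 429 );          // Too Many Requests
            header( 'Retry-After: 60' );
            exit( 'Rate limit exceeded.' );
        }
        set_transient( $key, $count + 1, 2 * MINUTE_IN_SECONDS );
    } );

A real implementation like Wordfence’s also distinguishes crawlers from humans, whitelists known search engines, and throttles before it blocks, which is why using the plugin beats rolling your own.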

The Wrap

Bad bots are automated programs that negatively impact websites by stealing content, carrying out attacks, and consuming server resources. Bad bots need to be controlled.

Wordfence uses rate limiting and a firewall to block bots based on their behavior, such as making too many requests in a short time or hitting a high number of nonexistent pages. This method is highly configurable and works well for controlling aggressive crawlers and other over-usage problems.

Blackhole for Bad Bots uses a unique “honeypot” approach targeted specifically at aggressive bots. It places a hidden link on the site that is disallowed in robots.txt. Legitimate bots obey this rule, while malicious bots that ignore it fall into the trap and are permanently blocked. This is a simple, lightweight method for catching new and unknown bots. While both are effective, their different strategies make them suitable for different use cases.

We hope this helped solve your bad bots problem.


