Recently our hosting company disabled our sites, including www.URTech.ca, for a few hours because a “bot” was crawling and scraping the sites, which was consuming far too much CPU:
System administration has identified your account as using higher resources on the server housing your account. Unfortunately, this increased resource usage has started to have a negative impact on other customers on the server.
Upon further investigation we have found that a domain on your account is being heavily crawled by specific User Agents: AhrefsBot, Amazonbot, Barkrowler, Bytespider, ClaudeBot, DotBot, Googlebot, GPTBot, MJ12bot, PetalBot, SemrushBot, bingbot, facebookexternalhit, meta-externalagent
…In order to prevent resource overages on your account while avoiding causing you downtime due to an account suspension, we have taken action to block these user agents from accessing the domains on your account. We are able to block specific user agents by adding a rule to your .htaccess file.
To clarify, this traffic is not human traffic, but rather traffic generated by automated scripts also referred to as Bots. There are many reasons why a Bot User Agent would be crawling a domain, such as Search Engine Indexing, AI Training, or even malicious actors.
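For reference, the kind of .htaccess rule the host describes usually looks something like the sketch below. This is our illustration rather than the exact rule they applied, and the bot list is trimmed; note that their list above also includes Googlebot and bingbot, which you almost certainly do not want to block yourself.

```
# Sketch of a User-Agent block in .htaccess (assumes mod_rewrite is available).
# Adjust the bot list to suit your site; blocking search engine bots
# such as Googlebot or bingbot will hurt your search rankings.
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|Bytespider|ClaudeBot|DotBot|GPTBot|MJ12bot|PetalBot|SemrushBot) [NC]
  RewriteRule .* - [F,L]
</IfModule>
```

Any request whose User-Agent header matches one of the listed names is refused with a 403 Forbidden before WordPress ever loads, which is what saves the CPU.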
What is a Bad Bot?
Put simply, bad bots are automated scripts that harm websites by scraping content, launching attacks, and wasting server resources.
Stopping bad bots is crucial for site performance and security.
There are several solutions; here are two you may want to use to stop bad bots.
1 – ‘Blackhole for Bad Bots’ WordPress Plugin Explained
The Blackhole for Bad Bots plugin is a simple WordPress security tool that uses a “honeypot” method to block bad bots. Instead of relying on a list of known bad bots, it sets a trap:
- The Trap: The plugin adds a hidden link to the footer of every page on your site. This link is invisible to human users.
- The Rule: The plugin also adds a rule to your robots.txt file, explicitly telling all bots to “Disallow” crawling that specific hidden link.
- The Action: Legitimate bots (like Googlebot, Bingbot, etc.) will read and obey the robots.txt rule, so they will never follow the hidden link. However, malicious or “bad” bots often ignore robots.txt files because their purpose is to scrape or spam the site, not to index it for search.
- The Blackhole: When a bot ignores the robots.txt rule and follows the hidden link, it “falls into the blackhole.” The plugin then records the bot’s IP address and blocks it from accessing your site entirely.
This method is effective because it targets bots based on their behavior rather than their user agent, so it can catch new or unknown bad bots. The plugin also provides a way to view a log of blocked bots and manually add or remove IPs from the blocklist.
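As a rough illustration, the rule the plugin adds to robots.txt looks something like the snippet below. The trap path shown here is made up for the example; the plugin generates its own hidden link and rule automatically, so you do not need to edit robots.txt yourself.

```
# Illustrative only – the plugin writes its own rule and the real trap
# URL may differ; this just shows the "honeypot + Disallow" idea.
User-agent: *
Disallow: /hidden-trap-link/
```

Any bot that requests the disallowed URL anyway has identified itself as one that ignores robots.txt, which is exactly the behavior the blackhole punishes.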
2 – Use ‘Wordfence’ To Block Bad Bots
We already have Wordfence installed on our WordPress sites, and we hate to add the complexity of new plugins. Wordfence handles bad bots differently: it measures the load created by each visitor, including bots, and throttles or blocks them if they are consuming too much. We think that is a better solution, as it covers more than just bots.
Rate limiting settings are not a one-size-fits-all solution, as the ideal configuration depends on your website’s size, traffic patterns, and the type of content you serve. A small personal blog should have much stricter limits than a large e-commerce site.
The default Wordfence setting of “unlimited” for many of the rules is a safety measure to prevent you from accidentally blocking legitimate users or search engine crawlers. However, with some careful thought, you can set reasonable limits that will significantly improve your bot security.
Here are some suggested starting points for your Wordfence rate-limiting rules, along with an explanation for each. The key is to start with a less aggressive action like “throttle” and then escalate to “block” if you’re confident it’s a malicious bot:
Note that this feature is available even in the free version of Wordfence.
Also remember that a REQUEST is not a visitor. It typically takes 10 to 20 requests to your server to completely load a single webpage, so a limit of 480 requests per minute works out to roughly 24 to 48 full page loads.
| Setting | Suggested Value | Action | Rationale |
|---|---|---|---|
| If anyone’s requests exceed | 480 per minute | Throttle it | A very safe starting point to control high-volume traffic from bots without affecting legitimate users |
| If a crawler’s page views exceed | 120 per minute | Throttle it | Allows for efficient crawling by legitimate bots while preventing them from overwhelming your server |
| If a crawler’s pages not found (404s) exceed | 30 per minute | Block it | An indicator of malicious bots or vulnerability scanners; a legitimate bot should not generate many 404s |
| If a human’s page views exceed | 180 per minute | Throttle it | A high limit to avoid blocking normal human users, even those on fast connections or shared IPs |
| If a human’s pages not found (404s) exceed | 30 per minute | Block it | Unlikely for a human to generate this many 404s. This helps block human attackers |
The Wrap
Bad bots are automated programs that negatively impact websites by stealing content, carrying out attacks, and consuming server resources. Bad bots need to be controlled.
Wordfence uses rate limiting and a firewall to block bots based on their behavior, such as making too many requests in a short time or hitting a high number of nonexistent pages. This method is highly configurable and works well for controlling aggressive crawlers and other over-usage problems.
Blackhole for Bad Bots uses a unique “honeypot” approach targeted specifically at aggressive bots. It places a hidden link on the site that is disallowed in robots.txt. Legitimate bots obey this rule, while malicious bots that ignore it fall into the trap and are permanently blocked. This is a simple, lightweight method for catching new and unknown bots. While both are effective, their different strategies make them suitable for different use cases.
We hope this helped solve your bad bot problem.