Your website is kind of like a coffee shop. People come in, browse the menu, order lattes, sit, sip, and leave.
But what if half your “customers” just occupied tables, wasted your baristas’ time, and never bought any coffee?
Meanwhile, real customers walk out because there are no free tables and service is slow.
Well, that’s the world of web crawlers and bots.
These automated programs gobble up your bandwidth, slow down your site, and drive away actual customers.
Recent studies show that nearly 51% of internet traffic comes from bots. That’s right: more than half of your digital visitors may be wasting your server resources.
But don’t panic!
This guide will help you spot trouble and take back control of your site’s performance, all without coding or calling your techy cousin.
A Quick Refresher on Bots
Bots are automated software programs that perform tasks on the internet without human intervention. They:
- Visit websites
- Interact with digital content
- Execute specific functions based on their programming
Some bots analyze and index your site (potentially improving its search engine rankings). Others spend their time scraping your content for AI training datasets, or worse: posting spam, generating fake reviews, or hunting for exploits and security holes in your website.
Of course, not all bots are created equal. Some are critical to the health and visibility of your website. Others are arguably neutral, and a few are downright toxic. Knowing the difference, and deciding which bots to block and which to allow, is crucial for protecting your site and its reputation.
Good Bot, Bad Bot: What’s What?
Bots are a built-in part of how the web works.
For instance, Google’s bot crawls web pages and adds them to Google’s index for ranking. That bot helps bring you valuable search traffic, which is vital for the health of your website.
However, not every bot provides value, and some are outright bad. Here’s what to keep and what to block.
The VIP Bots (Keep These)
- Search engine crawlers like Googlebot and Bingbot. Don’t block them, or you’ll become invisible online.
- Analytics bots that gather data about your site’s performance, like the Google PageSpeed Insights bot or the GTmetrix bot.
The Troublemakers (Need Managing)
- Content scrapers that steal your content for use elsewhere
- Spam bots that flood your forms and comments with junk
- Bad actors that try to hack accounts or exploit vulnerabilities
The scale of bad bots might surprise you. In 2024, advanced bots made up 55% of all bad bot traffic, while good bots accounted for 44%.
These advanced bots are sneaky: they can mimic human behavior, including mouse movements and clicks, which makes them harder to detect.
Are Bots Bogging Down Your Website? Look for These Warning Signs
Before jumping into solutions, let’s make sure bots are actually your problem. Check for the signs below.
Red Flags in Your Analytics
- Traffic spikes without explanation: If your visitor count suddenly jumps but sales don’t, bots might be the culprit.
- Everything s-l-o-w-s down: Pages take longer to load, frustrating real customers who may leave for good. Aberdeen research shows that 40% of visitors abandon websites that take over three seconds to load, which leads to…
- High bounce rates: Rates above 90% often indicate bot activity.
- Weird session patterns: Humans don’t usually visit for mere milliseconds or stay on one page for hours.
- A flood of unusual traffic: Especially from countries where you don’t do business. That’s suspicious.
- Form submissions with random text: Classic bot behavior.
- Your server gets overwhelmed: Imagine serving 100 customers at once when 75 of them are just window shopping.
Check Your Server Logs
Your website’s server logs contain a record of every visitor.
Here’s what to look for:
- Many consecutive requests from the same IP address
- Strange user-agent strings (the identification that bots provide)
- Requests for unusual URLs that don’t exist on your site
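If you’re comfortable poking at a raw access log, a few lines of Python can surface the noisiest IP addresses. This is a minimal sketch, assuming a standard combined-format log saved as access.log (the filename and path will vary by host):

# tally_ips.py: count requests per IP in a combined-format access log
from collections import Counter

counts = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        # The combined log format starts each line with the client IP
        counts[line.split(" ", 1)[0]] += 1

# The busiest IPs are the ones worth investigating
for ip, hits in counts.most_common(10):
    print(f"{hits:>8}  {ip}")

An IP making thousands of requests while typical visitors make a few dozen is a strong hint that you’re looking at a bot.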
User Agent
A user agent is software that retrieves and renders web content so users can interact with it. The most common examples are web browsers and email readers.
A legitimate Googlebot request might look like this in your logs:
66.249.78.17 - - [13/Jul/2015:07:18:58 -0400] "GET /robots.txt HTTP/1.1" 200 0 "-" "Mozilla/5.0 (suitable; Googlebot/2.1; +http://www.google.com/bot.html)"
If you see patterns that don’t match normal human browsing behavior, it’s time to take action.
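Keep in mind that anyone can claim to be Googlebot in a user-agent string. Google’s documented way to verify is a reverse DNS lookup followed by a matching forward lookup. Here’s a minimal Python sketch using the IP from the sample log entry above (a real script would also handle lookup failures):

# verify_googlebot.py: reverse/forward DNS check for a claimed Googlebot IP
import socket

ip = "66.249.78.17"  # IP from the sample log entry above
host, _, _ = socket.gethostbyaddr(ip)  # e.g. crawl-66-249-78-17.googlebot.com
genuine = (
    host.endswith((".googlebot.com", ".google.com"))
    and ip in socket.gethostbyname_ex(host)[2]  # forward lookup must return the same IP
)
print(host, "-> genuine Googlebot" if genuine else "-> impostor")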
The GPTBot Problem as AI Crawlers Surge
Recently, many website owners have reported problems with AI crawlers generating abnormal traffic patterns.
According to Imperva’s research, OpenAI’s GPTBot made 569 million requests in a single month, while Claude’s bot made 370 million, across Vercel’s network.
Look for:
- Error spikes in your logs: If you suddenly see hundreds or thousands of 404 errors, check whether they’re coming from AI crawlers.
- Extremely long, nonsensical URLs: AI bots may request bizarre URLs like the following:
/Odonto-lieyectoresli-541.aspx/assets/js/plugins/Docs/Productos/assets/js/Docs/Productos/assets/js/assets/js/assets/js/vendor/images2021/Docs/...
- Recursive parameters: Look for endlessly repeating parameters, for example:
amp;amp;amp;page=6&page=6
- Bandwidth spikes: Read the Docs, a well-known technical documentation platform, reported that one AI crawler downloaded 73TB of ZIP files, 10TB of it in a single day, costing them over $5,000 in bandwidth charges.
These patterns can indicate AI crawlers that are either malfunctioning or being manipulated to cause problems.
When To Get Technical Help
If you spot these signs but don’t know what to do next, it’s time to bring in professional help. Ask your developer to check for specific user agents like this one:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
There are many documented user agent strings for other AI crawlers that you can look up and block. Note that the strings change over time, so you may end up with quite a large list. If you’d rather start with a polite request than a hard block, see the robots.txt snippet below.
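OpenAI documents that GPTBot respects robots.txt, and most mainstream AI crawlers publish similar user agent tokens. A minimal example that asks GPTBot to stay away from the entire site (robots.txt itself is covered in step 1 below):

User-agent: GPTBot
Disallow: /

Swap in or add other published tokens, such as Anthropic’s ClaudeBot, as you spot them in your logs.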
👉 Don’t have a developer on speed dial? DreamHost’s DreamCare team can analyze your logs and implement protective measures. They’ve seen these problems before and know exactly how to handle them.
Now for the good part: how to stop these bots from slowing down your site. Roll up your sleeves and let’s get to work.
1. Create a Proper robots.txt File

robots.txt is a simple text file that sits in your root directory and tells well-behaved bots which parts of your site they shouldn’t access.
You can view the robots.txt for virtually any website by adding /robots.txt to its domain. For instance, to see DreamHost’s robots.txt file, add robots.txt to the end of the domain like this: https://dreamhost.com/robots.txt
No bot is under any obligation to follow these rules.
But polite bots will respect them, while troublemakers may choose not to. It’s best to add a robots.txt anyway so the good bots don’t start indexing your admin login, post-checkout pages, thank-you pages, and so on.
How To Implement
1. Create a plain text file named robots.txt
2. Add your instructions using this format:
User-agent: *          # This line applies to all bots
Disallow: /admin/      # Don't crawl the admin area
Disallow: /private/    # Stay out of private folders
Crawl-delay: 10        # Wait 10 seconds between requests
User-agent: Googlebot  # Specific rules just for Google
Allow: /               # Google can access everything
3. Upload the file to your website’s root directory (so it lives at yourdomain.com/robots.txt)
The Crawl-delay directive is your secret weapon here. It forces bots to wait between requests, preventing them from hammering your server.
Most major crawlers respect it, although Googlebot follows its own system (which you can control through Google Search Console).
Pro tip: Test your robots.txt with Google’s robots.txt testing tool to make sure you haven’t accidentally blocked important content.
2. Set Up Rate Limiting
Rate limiting restricts how many requests a single visitor can make within a specific period of time.
It keeps bots from overwhelming your server so that normal humans can browse your site without interruption.
How To Implement
If you’re using Apache (common for WordPress sites), you can filter obvious offenders with a few lines in your .htaccess file. Note that mod_rewrite can only match requests, not count them, so this blocks known bad user agents rather than enforcing a true per-request limit (BadBot and EvilScraper are placeholder names; substitute the offenders from your own logs):
RewriteEngine On
# Leave static assets and robots.txt alone
RewriteCond %{REQUEST_URI} !\.(css|js|png|jpg|gif)$ [NC]
RewriteCond %{REQUEST_URI} !robots\.txt$ [NC]
# Never block the major search engine crawlers
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|Bingbot) [NC]
# Return 403 Forbidden to anything identifying as a listed bad bot
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
RewriteRule .* - [F,L]
For genuine per-IP rate limiting on Apache (say, a maximum of 3 requests every 10 seconds), ask your host about server-level modules such as mod_evasive.
.htaccess
“.htaccess” is a configuration file used by the Apache web server. It contains directives (instructions) that tell Apache how to behave for a particular website or directory.
If you’re on Nginx, add this to your server configuration:
# Track each client IP and allow an average of 30 requests per minute
limit_req_zone $binary_remote_addr zone=one:10m rate=30r/m;
server {
    ...
    location / {
        # Apply the limit, allowing short bursts of up to 5 extra requests
        limit_req zone=one burst=5;
        ...
    }
}
Note that the limit_req_zone line belongs in the http block, outside the server block.
Many hosting control panels, like cPanel or Plesk, also offer rate-limiting tools in their security sections.
Pro tip: Start with conservative limits (like 30 requests per minute) and monitor your site. You can always tighten the restrictions if bot traffic continues.
3. Use a Content Delivery Network (CDN)
CDNs do two nice things for you:
- Distribute your content across global server networks so your website is delivered quickly worldwide
- Filter traffic before it reaches your website, blocking irrelevant bots and attacks
The “irrelevant bots” part is what matters to us right now, but the other benefits are useful too. Most CDNs include built-in bot management that identifies and blocks suspicious visitors automatically.
How To Implement
- Sign up for a CDN service like DreamHost CDN, Cloudflare, Amazon CloudFront, or Fastly.
- Follow the setup instructions (this may require changing your name servers).
- Configure the security settings to enable bot protection.
If your hosting service provides a CDN by default, you can skip these steps since your website will automatically be served through the CDN.
Once set up, your CDN will:
- Cache static content to reduce server load.
- Filter suspicious traffic before it reaches your site.
- Apply machine learning to differentiate between legitimate and malicious requests.
- Block known malicious actors automatically.
Pro tip: Cloudflare’s free tier includes basic bot protection that works well for most small business sites. Their paid plans offer more advanced options if you need them.
4. Add CAPTCHA for Sensitive Actions

CAPTCHAs are those little puzzles that ask you to identify traffic lights or bicycles. They’re annoying for humans but nearly impossible for most bots, which makes them good gatekeepers for important areas of your site.
How To Implement
- Sign up for Google’s reCAPTCHA (free) or hCaptcha.
- Add the CAPTCHA code to your sensitive forms (a sketch follows this list):
- Login pages
- Contact forms
- Checkout processes
- Comment sections
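For a rough idea of what “adding the CAPTCHA code” looks like, here’s a minimal sketch of the classic reCAPTCHA v2 checkbox on a bare-bones contact form (YOUR_SITE_KEY is a placeholder you get from the reCAPTCHA admin console, and the form fields are illustrative):

<!-- Load Google's reCAPTCHA widget script -->
<script src="https://www.google.com/recaptcha/api.js" async defer></script>

<form action="/contact" method="POST">
  <textarea name="message"></textarea>
  <!-- Renders the "I'm not a robot" checkbox; use your own site key -->
  <div class="g-recaptcha" data-sitekey="YOUR_SITE_KEY"></div>
  <button type="submit">Send</button>
</form>

Your server (or your form plugin) still needs to verify the response token with Google before trusting the submission.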
For WordPress users, plugins like Akismet can handle this automatically for comments and form submissions.
Pro tip: Modern invisible CAPTCHAs (like reCAPTCHA v3) work behind the scenes for most visitors, showing challenges only to suspicious users. Use this approach to gain protection without annoying legitimate customers.
5. Consider the New llms.txt Standard

The llms.txt standard is a recent development designed to guide how AI crawlers interact with your content.
It’s like robots.txt, but specifically for telling AI systems what information they can access and what they should avoid.
How To Implement
1. Create a markdown file named llms.txt with this content structure:
# Your Website Name
> Brief description of your site
## Main Content Areas
- [Product Pages](https://yoursite.com/products): Information about products
- [Blog Articles](https://yoursite.com/blog): Educational content
## Restrictions
- Please don't use our pricing data in training
2. Upload it to your root directory (at yourdomain.com/llms.txt). Reach out to a developer if you don’t have direct access to the server.
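Once the file is uploaded, it’s worth confirming that it’s publicly reachable. A minimal sketch, assuming Python with the third-party requests package installed and your own domain swapped in:

# check_llms.py: confirm llms.txt is live at your site root
import requests  # pip install requests

resp = requests.get("https://yourdomain.com/llms.txt", timeout=10)
print(resp.status_code)   # expect 200 if the file is reachable
print(resp.text[:200])    # first characters of the file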
Is llms.txt the official standard? Not yet.
It’s a standard proposed in late 2024 by Jeremy Howard, and it has since been adopted by Zapier, Stripe, Cloudflare, and many other large companies. Here’s a growing list of websites adopting llms.txt.
So, if you want to jump on board, there’s official documentation on GitHub with implementation guidelines.
Pro tip: Once it’s implemented, see whether ChatGPT (with web search enabled) can access and understand the llms.txt file.
Verify that the llms.txt is accessible to these bots by asking ChatGPT (or another LLM) to “check if you can read this page” or “tell me what the page says.”
We can’t know whether bots will respect llms.txt anytime soon. However, if AI search tools can read and understand your llms.txt file now, they may start respecting it in the future, too.
Monitoring and Maintaining Your Site’s Bot Protection
So you’ve set up your bot defenses. Awesome work!
Just keep in mind that bot technology is always evolving, and bots come back with new tricks. Let’s make sure your site stays protected for the long haul.
- Schedule regular security check-ups: Once a month, look through your server logs for anything fishy and make sure your robots.txt and llms.txt files are updated with any new page links you’d like bots to access (or avoid).
- Keep your bot blocklist fresh: Bots keep changing their disguises. Follow security blogs (or let your hosting provider do it for you) and update your blocking rules at regular intervals.
- Watch your speed: Bot protection that slows your site to a crawl isn’t doing you any favors. Keep an eye on your page load times and fine-tune your protection if things start getting sluggish. Remember, real humans are impatient creatures!
- Consider going on autopilot: If all this sounds like too much work (we get it, you have a business to run!), look into automated solutions or managed hosting that handles security for you. Sometimes the best DIY is DIFM: Do It For Me!
A Bot-Free Website While You Sleep? Yes, Please!
Pat yourself on the back. You’ve covered a lot of ground here!
Still, even with our step-by-step guidance, this stuff can get pretty technical. (What exactly is an .htaccess file, anyway?)
And while DIY bot management is certainly possible, you might find that your time is better spent running the business.
DreamCare is the “we’ll handle it for you” button you’re looking for.
Our team keeps your site protected with:
- 24/7 monitoring that catches suspicious activity while you sleep
- Regular security reviews to stay ahead of emerging threats
- Automatic software updates that patch vulnerabilities before bots can exploit them
- Comprehensive malware scanning and removal if anything sneaks through
Bots are here to stay, and given their rise over the past few years, we may well see more bots than humans online in the near future. No one knows.
But why lose sleep over it?