Got a warning for my blog going over 100GB in bandwidth this month… which sounded incredibly unusual. My blog is text and a couple images and I haven’t posted anything to it in ages… like how would that even be possible?
Turns out it’s possible when you have crawlers going apeshit on your server. Am I even reading this right? 12,181 with 181 zeros at the end for ‘Unknown robot’? This is actually bonkers.
Edit: As Thunraz points out below, there’s a footnote that reads “Numbers after + are successful hits on ‘robots.txt’ files” — so it’s a hit count, not scientific notation.
Edit 2: After doing more digging, the culprit is a post where I shared a few wallpapers for download. The bots have been downloading those wallpapers over and over, burning through 100GB of bandwidth in the first 12 days of November. That’s when my account was suspended for exceeding bandwidth (it’s an artificial limit I set a while back and forgot about…), which is also why the ‘last visit’ for all the bots is November 12th.
Fucking hell.
Yeah, and that’s why people are using Cloudflare so much.
AI scrapers are the new internet DDoS.
Might want to throw something in front of your blog to ward them off, like Anubis or a tarpit.
I run an ecommerce site, and lately they’ve latched onto one very specific product, hammering its page and any pages branching from it for no readily identifiable reason, at a rate of several hundred requests every second. I found out pretty quickly, because our view stats for that page in particular suddenly rocketed into the millions.
I had to insert a little script to IP ban these fuckers, which kicks in if I see a malformed user agent string or if you try to hit this page specifically more than 100 times. Through this I discovered that the requests are coming from hundreds of thousands of individual random IP addresses, many of which are located in Singapore, Brazil, and India, and mostly resolve down into those owned by local ISPs and cell phone carriers.
Of course they ignore your robots.txt as well. This smells like some kind of botnet thing to me.
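For anyone wanting to do something similar, here’s a rough sketch of that kind of ban logic in Python/Flask; the path, threshold, and user-agent check are placeholders, not the commenter’s actual script (which they didn’t share).

```python
# Rough sketch only: in-memory counters and bans, assuming a Flask app.
# A real setup would persist bans (fail2ban, firewall rules, etc.).
from collections import defaultdict
from flask import Flask, request, abort

app = Flask(__name__)

HOT_PATH = "/products/that-one-product"  # placeholder for the hammered page
HIT_LIMIT = 100                          # ban an IP after this many hits
hit_counts = defaultdict(int)
banned_ips = set()

def malformed_ua(ua: str) -> bool:
    # Naive example check: empty or suspiciously short user agent strings.
    return not ua or len(ua) < 10

@app.before_request
def block_abusers():
    ip = request.remote_addr
    if ip in banned_ips:
        abort(403)
    if malformed_ua(request.headers.get("User-Agent", "")):
        banned_ips.add(ip)
        abort(403)
    if request.path == HOT_PATH:
        hit_counts[ip] += 1
        if hit_counts[ip] > HIT_LIMIT:
            banned_ips.add(ip)
            abort(403)

@app.route(HOT_PATH)
def product_page():
    return "product page"
```

The catch, as the commenter found, is that with hundreds of thousands of source IPs, a per-IP counter only goes so far.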
- Get a blocklist
- Enable rate limits
- Get a proper robots.txt (rough sketch below)
- Profit

Silence.
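On the robots.txt point, a minimal sketch looks something like this. GPTBot and CCBot are real crawler names, Crawl-delay is a non-standard directive that only some bots honor, and as noted above the worst offenders ignore the file entirely:

```
# Block a couple of known AI crawlers outright
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else: allowed, but asked to crawl slowly
User-agent: *
Crawl-delay: 10
Disallow:
```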
Can you just turn the robots.txt into a click wrap agreement to charge robots high fees for access above a certain threshold?
Why do an agreement when you can serve a zip bomb :D
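For context, the usual version of that trick is serving a small pre-compressed file with a Content-Encoding: gzip header so the bot inflates it on its end. A minimal Python sketch for generating one (filename and sizes are arbitrary):

```python
# Writes ~10 GB of zeros through gzip; the file on disk ends up around 10 MB,
# but a client that honors Content-Encoding: gzip inflates the whole thing.
import gzip

ONE_MB_OF_ZEROS = b"\0" * (1024 * 1024)

with gzip.open("bomb.gz", "wb", compresslevel=9) as f:
    for _ in range(10 * 1024):  # 10 GB total, uncompressed
        f.write(ONE_MB_OF_ZEROS)
```

Only serve it to clients you’ve already flagged, or you’ll nuke legitimate browsers too.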
Puts the full EU regulations in robots.txt
This is why I use Cloudflare. They block the worst offenders and cache for me to reduce the load from the rest. It’s not 100%, but it does help.
LOL Someone took exception to your use of Cloudflare. Hilarious. Anyways, yeah, what Cloudflare doesn’t get, pfSense does.
I just geo-restrict my server to my country; for certain services I’ll run an IP blacklist and only whitelist the few known networks.
Works okay, I suppose. Kills the need for a WAF, and I haven’t had any issues with it.
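If anyone wants to do the same at the application level rather than in the firewall, here’s a sketch using MaxMind’s free GeoLite2 database with the geoip2 Python library; the database path and allowed country are placeholders:

```python
# Sketch: allow only traffic that geolocates to one country.
# Requires the geoip2 package and a downloaded GeoLite2-Country.mmdb file.
import geoip2.database
import geoip2.errors

ALLOWED_COUNTRY = "DE"  # placeholder home country
reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

def allowed(ip: str) -> bool:
    try:
        return reader.country(ip).country.iso_code == ALLOWED_COUNTRY
    except geoip2.errors.AddressNotFoundError:
        return False  # unknown IPs get rejected

print(allowed("203.0.113.7"))  # documentation-range IP, will not resolve
```

In practice this usually lives in the firewall or reverse proxy, like the commenter describes.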
Unknown Robot is your biggest fan.
Hydrogen bomb vs coughing baby type shit
Downloading your wallpapers? Lol, what for?
Had the same thing happen on one of my servers. Got up one day a few weeks ago and the server was suspended (luckily the hosting provider unsuspended it for me quickly).
It’s mostly business sites, but we do have an old personal blog on there with a lot of travel pictures, and 4 or 5 AI bots were just pounding it. Went from a 300GB-per-month average to 5TB in August, and 10 or 11TB in September and October.







