DDoSed by Alibaba Cloud

neptronix

Administrator
Staff member
Joined
Jun 15, 2010
Messages
20,375
Location
Utah, USA
Just FYI: our site hit some serious instability several times this week. I traced it to Alibaba Cloud web scrapers effectively doing a DDoS by ignoring robots.txt and scraping at a very high rate from a few dozen IPs (this is clearly designed to get around common server protection mechanisms).
This time, Ali eluded my previous ban by pretending to be a regular desktop computer instead of identifying itself as a bot.

It pegged our CPU hard for the last hour, to the point where it took the site down, and apparently it has been hammering us for a week.

1740017449456.png

So here's a little message from the new defenses i installed to whoever is running this demon spawn of a web crawler:

1740023842792.png

 
But now where will they get their pictures of the bikes, batteries, and other products that they don't actually have but are advertising for sale? ;)
 
It's probably just scraping text for training AI models.
The joke is on them. Often, my writings get mistaken for AI-generated text. This has been the case on other places where I have posted, where subsequently I have been accused of being a "shitposting bot". It is amusing.
 
It's probably just scraping text for training AI models.

Bingo. AFAIK they don't have a search engine, but they produce a number of AI-based tools, like:
Alibaba Unveils AI-Powered Search Engine for Global B2B Sourcing | PYMNTS.com.

They also have an open source LLM that i happen to be a big fan of:
GitHub - QwenLM/Qwen2.5: Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

Too bad their scraper is so rude, obfuscates itself, and ignores rules; otherwise they'd be sitting on >100 GB of text and images.
 
Wow. I was wondering why the site took way too long to load, sometimes timing out. I thought it was my ISP. Glad you found the issue.

This is because whatever they are doing exhausts the maximum number of TCP/IP connections that Linux can handle. Basically, they don't bother hanging up the line after making the phone call.
 
I automatically assume nefarious activity from any product or service from Asia... cellphones included. Not that they're the only bad actors, but it's rampant. I had a friend who worked for Intel tell me that he traveled to China with his cellphone. I asked him if he considered carrying a burner since they now probably own everything that's going in and out...
 
Too bad their scraper is so rude
Why not just threshold requests by IP? Sliding window of say 5 minutes and limit to 10 pages. That seems pretty reasonable for a human. Then captcha them in case they go over. Or adjustable threshold based on age of page. I'd imagine new pages get more access than older. High volume on older content means scraping.
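For anyone who wants to try this, here is a minimal sketch of the sliding-window idea in Python. The 5-minute window and 10-page limit come from the suggestion above; the class name, the `allow` method, and the injectable `clock` are made up for illustration, and the "captcha" step is left to whatever your forum software offers.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-IP sliding-window counter (hypothetical helper, not from any library)."""

    def __init__(self, max_requests=10, window_seconds=300.0, clock=time.monotonic):
        self.max_requests = max_requests      # pages allowed per window
        self.window_seconds = window_seconds  # 300 s = 5 minutes
        self.clock = clock                    # injectable so it can be tested
        self._hits = defaultdict(deque)       # ip -> timestamps of allowed hits

    def allow(self, ip):
        """Return True if this request fits in the window, False if over the limit."""
        now = self.clock()
        hits = self._hits[ip]
        cutoff = now - self.window_seconds
        # Forget hits that have slid out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # caller would show a captcha or return HTTP 429 here
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
if not limiter.allow("203.0.113.7"):  # documentation-range IP, just an example
    print("over the limit, challenge this client")
```

Keep in mind that a per-IP key only counts one address at a time, so a crawler spread across dozens of IPs (as described in the first post) can stay under the limit on each one. Keying on a subnet or network owner instead is one possible variation, but that is an assumption, not something tested here.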
 
Why not just threshold requests by IP? Sliding window of say 5 minutes and limit to 10 pages. That seems pretty reasonable for a human. Then captcha them in case they go over. Or adjustable threshold based on age of page. I'd imagine new pages get more access than older. High volume on older content means scraping.

I used to do that until Google decided to also not follow robots.txt. This resulted in their bot getting banned (they don't follow robots.txt or give the site owner manual control over scraping speed anymore), so I had to tune it back.

At least Google identifies itself as a bot, so in theory you could whitelist their user-agent... but that could be abused.
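One way around the abuse problem is to not trust the user-agent string alone and instead check the IP with a reverse DNS lookup followed by a forward lookup, which is the approach Google documents for verifying its crawlers. Below is a rough sketch, assuming the crawler hostnames end in `googlebot.com` or `google.com`. The function names and the injectable lookup parameters are hypothetical, and `socket.gethostbyname_ex` only returns IPv4 addresses, so IPv6 clients would need extra handling.

```python
import socket

# Hostname suffixes assumed for Google crawlers; the leading dot keeps
# something like "evilgooglebot.com" from matching.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    # PTR lookup: IP -> primary hostname
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # A lookup: hostname -> list of IPv4 addresses
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(ip, reverse_lookup=_reverse, forward_lookup=_forward):
    """Return True only if the IP reverse-resolves to a Google hostname
    and that hostname resolves back to the same IP."""
    try:
        hostname = reverse_lookup(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

DNS lookups are slow, so in practice you would probably cache the result per IP and only run the check for requests that claim to be Googlebot.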
 
I automatically assume nefarious activity from any product or service from Asia... cellphones included. Not that they're the only bad actors, but it's rampant. I had a friend who worked for Intel tell me that he traveled to China with his cellphone. I asked him if he considered carrying a burner since they now probably own everything that's going in and out...

I manage a few dozen servers these days and see more tomfoolery from China than Russia lately, so I totally feel that.

All extremely rude in their software behavior, too, more so than other countries.

China's govt spies quite aggressively too, and there's a high chance the line is tapped, so I refuse to use online services from there for any serious purpose. I will use their open source (LLMs like Qwen2.5 and DeepSeek), but I will make the extra effort to firewall the crap out of it just in case.
 
I was top boy scout hometown trustworthy loyal helpful friendly courteous kind obedient cheerful thrifty brave clean reverent and at 14 it was habit I knew every one and when they started lying and what effect it had on them and victims 1. They delete the truth in their brains so it doesn't cause confusion during delivery that adds credibility. To test I told my sister she did something she didn't and she apologized. 2. They are bad at checking traffic and I've had sister and acquaintance admit 4 accidents some years. They despise truth as it must be deleted quicker the better. So at times habit gets better of it and bam. 3. They lie when telling truth is known to be better way to go. 4. They project their constant lying onto everyone and everything to keep focus off them and position themselves as high brow liars in comparison. 5. I postulated 10 years ago to Philadelphia professor that 7 out of ten will lie to you. Without hesitation said 9 out of ten. Exactly the ratio in our family. If I'm breathing I'm fact checking mostly. Done.
 
?
 
I was top boy scout hometown trustworthy loyal helpful friendly courteous kind obedient cheerful thrifty brave clean reverent and at 14 it was habit I knew every one and when they started lying and what effect it had on them and victims 1. They delete the truth in their brains so it doesn't cause confusion during delivery that adds credibility. To test I told my sister she did something she didn't and she apologized. 2. They are bad at checking traffic and I've had sister and acquaintance admit 4 accidents some years. They despise truth as it must be deleted quicker the better. So at times habit gets better of it and bam. 3. They lie when telling truth is known to be better way to go. 4. They project their constant lying onto everyone and everything to keep focus off them and position themselves as high brow liars in comparison. 5. I postulated 10 years ago to Philadelphia professor that 7 out of ten will lie to you. Without hesitation said 9 out of ten. Exactly the ratio in our family. If I'm breathing I'm fact checking mostly. Done.
Drugs or issue with brain. Or could be issue with me because I do not understand.

What do you call a deer with no eyes? ......
I have noIdeer.
 
Comments are in a different DB/system!
Thanks for letting me know, i don't have a notifications system on the knowledgebase software for comments yet. Seems like we need to add that feature already.
I also found out i can't delete those comments, so i submitted a bug report to our developer for it.
 
There are some similarly odd comments in the KB Controller section that provide no value. I guess comments there don't show up in a member's posting history (or in searches), so I'm assuming that the KB is a separate system/DB?
I'm hesitant to admit that if I focus really hard on not focusing with my eyes as I read, and use a religious/poetic level of personal interpretation, the paragraph appears to be useful as a warning for naive men who've never met those heartless pretty women -- the ones who are congenitally incapable of seeing their mate as anything but a beast of burden.

Oh wait, when my eyes focus again, it's clearly insanity. Never mind.
 
what in tarnation doge.png
 

"Each one of us is alone in the world. He is shut in a tower of brass, and can communicate with his fellows only by signs, and the signs have no common value, so that their sense is vague and uncertain. We seek pitifully to convey to others the treasures of our heart, but they have not the power to accept them, and so we go lonely, side by side but not together, unable to know our fellows and unknown by them." - W. Somerset Maugham

Apparently my joke was really, really not part of the treasures my heart carries with me. Thanks for the pointer :)
 
It's probably just scraping text for training AI models.
Oh yeah, technical forums are absolute goldmines for datasets, particularly if you format them really well.

The Qwen team should honestly calm down with the intense polling rate of their scrapers.
 