Friday, May 9, 2008

Cuill Joins The Hog Spotting Limelight

Cuill will steal all your bandwidth with its regular robot attacks!

Cuill, pronounced “cool,” came to our attention in the last month as it pounded our sites with a single thrust of robots. Grabbing 350 MB in a few minutes, it brought down a site of a mere 127 pages by using a circular and redundant search process. On one Drupal site with the archive module active, it pulled up a page for every day of this century and the last, whether or not the day had any content. If a date is on a calendar, it is searched. How many centuries will it go before it turns off?

If it were not for Hog Spotter I would not have figured it out, even though the IP numbers were among the highest users. Cuill uses a series of IP numbers so that you don’t see one big block coming at you.
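
How Hog Spotter works inside is not covered here, so what follows is only a rough Python sketch of the same idea: add up the bytes served to each IP in an access log, then roll the totals up by address prefix so a crawler spread over neighboring IPs shows up as one block. It assumes Apache common or combined log format. The function names and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Start of an Apache common/combined log line (assumed format):
# client IP, identity, user, [timestamp], "request", status, bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (\d+|-)')


def bytes_per_ip(lines):
    """Return a Counter mapping client IP -> total response bytes."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, size = match.groups()
        if size != "-":  # "-" means no response body was sent
            totals[ip] += int(size)
    return totals


def group_by_prefix(totals, octets=3):
    """Roll per-IP totals up by leading IPv4 octets, e.g. 38.99.13."""
    grouped = Counter()
    for ip, size in totals.items():
        prefix = ".".join(ip.split(".")[:octets])
        grouped[prefix] += size
    return grouped


# Made-up sample lines; the byte counts are illustrative, not real data.
sample = [
    '38.99.13.121 - - [09/May/2008:10:00:00 -0700] "GET /archive/1901/01/01 HTTP/1.1" 200 52000',
    '38.99.13.122 - - [09/May/2008:10:00:00 -0700] "GET /archive/1901/01/02 HTTP/1.1" 200 48000',
    '38.99.13.121 - - [09/May/2008:10:00:01 -0700] "GET /archive/1901/01/03 HTTP/1.1" 200 51000',
    '192.0.2.10 - - [09/May/2008:10:00:02 -0700] "GET /index.html HTTP/1.1" 200 9000',
    '192.0.2.10 - - [09/May/2008:10:00:03 -0700] "GET /logo.png HTTP/1.1" 304 -',
]

totals = bytes_per_ip(sample)
for prefix, size in group_by_prefix(totals).most_common():
    print(prefix, size)
```

Grouping by the first three octets is a crude choice. It only handles IPv4, and a crawler spread across unrelated networks would still slip past it.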

The Cuill site is rather sparse, with a bragging piece on their 25 million dollar venture capital infusion and a vague reference to important people hired from the real search engines. No names, of course, but a real homey feel to it. Their site claims they are pioneering a new approach to search, and that may well be, but Hog Spotter wants to put them by the wayside for their rather rude spider.

Cuill sends out its robot spiders all at once, and they dig through all your links multiple times and, it seems, from all directions, much like the notorious Munax monsters. However, Cuill seems to have some kind of spider that rams through your site at high speed. This may be good or bad depending on where you stand.

This is a good thing, as the spiders only ram your site for a couple of minutes or so. It is a bad thing because even small sites of 120 pages, with graphics taking up 1 MB of disk space, can find the Cuill search engine glomming a huge half gig in bandwidth to get it all before it leaves. In the case of one of our sites, the Cuill spider visit pushed us over our system resource limits and our service was cut off. Cuill has no form on their site for web owners to recoup the money for the bandwidth or for any damage caused by their abusive programs. So my sparse users and legitimate search engine traffic were cut off from the site for four hours while we restored it.

I said it was a good thing the Cuill spiders only ram your site for a couple of minutes? Well, that good time wears out fast, as the spider returns several times each month. If you can afford to make such generous donations to a Silicon Valley startup as they rip off every site they can find, you can enjoy the hits, since they will show up as users as well as robots. Fake hits, of course, as no person is there, just their spider. Once the content is gathered, will they stop bothering you? No, they will just show your content alongside some great advertising to keep users in their own loop.

This is just another case of "I wanna catch up with Google, Yahoo and Microsoft" at the expense of website owners.

I am not sure where it was written that if you want to get into the search engine game you can ignore rules and propriety and just grab content in whatever scuzzy way you can, without regard to anyone or anything. But the day of the stealth theft of your content and bandwidth has ended with the rise of the Hog Spotter!

We have not only put a disallow in the robots.txt but have also put a deny in the .htaccess to make sure they do not return disguised as browsers.

For robots.txt, the format to deny them is to place this at the top:
User-agent: twiceler
Disallow: /
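
If you want to check that the rule above reads the way you intend before uploading it, Python's standard urllib.robotparser can parse it offline. This is only a sketch: the example.com URLs and the second robot name are placeholders, and it tells you what a well-behaved robot would do with the file, not what a rude one actually does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule from this post, held in a string so nothing is fetched.
ROBOTS_TXT = """\
User-agent: twiceler
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and "SomeOtherBot" are placeholders for illustration.
print(is_allowed(ROBOTS_TXT, "twiceler", "http://example.com/archive/2008/05/09"))
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://example.com/"))
```

The rule should come back as blocking twiceler everywhere and leaving other robots alone. A spider that ignores robots.txt will not care either way.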

Never trust a Hog. To ensure their exclusion, use the .htaccess file. Here are their claimed IP addresses and the code to deny them in .htaccess:
#Cuill
deny from 208.36.144.10
deny from 208.36.144.6
deny from 208.36.144.7
deny from 208.36.144.8
deny from 208.36.144.9
deny from 38.99.13.121
deny from 38.99.13.122
deny from 38.99.13.123
deny from 38.99.13.124
deny from 38.99.13.125
deny from 38.99.13.126
deny from 38.99.44.10
deny from 38.99.44.101
deny from 38.99.44.102
deny from 38.99.44.103
deny from 38.99.44.104
deny from 64.1.215.162
deny from 64.1.215.163
deny from 64.1.215.164
deny from 64.1.215.165
deny from 64.1.215.166

For complete protection, we suggest blocking their servers at the .htaccess level. As usual, check our deny.txt at Tek Talk for regular updates and to find out what we use on our websites. Please let us know your experiences so we can add new IPs and search hogs to the lists.

Randy Penn

Tuesday, May 6, 2008

Beware the Eyes of MUNAX LLC

Look at your site's log files and you will surely find a lot of hits from a strange user. Check your logs for a user, that is right, a user, with IP numbers from 82.99.30.2 - 82.99.30.73, and you will find this user probably took several thousand gig of bandwidth trying to suck your site in. They disguise their robots as a web browser used by a normal human being to get past the robots.txt and any exclusions you may put in there.
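
Here is one rough way to run that check in Python using the standard ipaddress module. It assumes the client IP is the first field on each log line, as in Apache's common log format. The function names and the sample log lines are made up for illustration, and the deny lines it prints are only a suggestion for covering the range in CIDR form.

```python
import ipaddress

# The range quoted in this post.
RANGE_START = ipaddress.ip_address("82.99.30.2")
RANGE_END = ipaddress.ip_address("82.99.30.73")


def in_range(ip_text, start=RANGE_START, end=RANGE_END):
    """Return True if ip_text is an address between start and end inclusive."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # not an IP address at all
    if ip.version != start.version:
        return False  # do not compare IPv6 against an IPv4 range
    return start <= ip <= end


def hits_from_range(lines):
    """Return the log lines whose first field is an IP inside the range."""
    hits = []
    for line in lines:
        fields = line.split(None, 1)
        if fields and in_range(fields[0]):
            hits.append(line)
    return hits


def deny_lines(start=RANGE_START, end=RANGE_END):
    """Return 'deny from' lines in CIDR form covering exactly the range."""
    networks = ipaddress.summarize_address_range(start, end)
    return ["deny from %s" % net for net in networks]


# Made-up sample lines for illustration only.
sample = [
    '82.99.30.5 - - [06/May/2008:09:00:00 -0700] "GET / HTTP/1.1" 200 4000 "-" "Mozilla/4.0"',
    '82.99.30.74 - - [06/May/2008:09:00:01 -0700] "GET / HTTP/1.1" 200 4000 "-" "Mozilla/4.0"',
    '192.0.2.10 - - [06/May/2008:09:00:02 -0700] "GET / HTTP/1.1" 200 4000 "-" "Mozilla/4.0"',
]

print(len(hits_from_range(sample)), "hit(s) from the range")
for line in deny_lines():
    print(line)
```

Matching on the IP rather than the user agent is the point here, since the user agent is exactly what these robots fake.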

This is the evil MUNAX search engine. Lowlifes slip onto your site disguised as a browser but hit your site with spiders. Hundreds at a time. You think your site is very popular, but the page views do not grow. Why? Because it is a robot trying to avoid your robots.txt exclusions.

A lot of sites are talking about it now, and Hog Spotting is now dedicated to finding and reporting the bandwidth hogs - especially the dirty bastards like Munax!

Read about it at Tek Talk, the best news in gadgets, widgets and tek news, and you can download the deny.txt from there. The deny.txt is our complete list of spammers, jammers and places where spam comes from. It has every network known so far in Russia, China, South Korea, and India. Unless you have some reason to expect legitimate traffic from these networks, you will find them scraping, scamming and slamming your site frequently if they are not denied entrance. It is updated all the time, so check back often if you find your bandwidth being eaten.

With your help we may be able to keep them at bay, so give us a tip if you see something while out Hog Spotting!