L.webis (Lumbricus webis) is a web crawler developed at the Institute of Informatics and Telematics (IIT) of the National Research Council (CNR) of Italy, in Pisa. Its purpose is to support the compilation of statistics about the .it ccTLD and the portions of the WWW reachable from the .it ccTLD. It also supports advanced analytic tools such as ComWatch.
The crawler follows the Robot Exclusion Standard: if you need to block or limit automatic access to your site, follow the instructions at http://en.wikipedia.org/wiki/Robots.txt and http://www.robotstxt.org.
L.webis considers all the "Disallow" directives following a default block:
User-agent: *
or a specific block:
User-agent: L.webis
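
As a way to check a rule before publishing it, the sketch below feeds a small robots.txt with a specific block for L.webis to Python's standard-library parser. The parser is only a stand-in: it is an assumption that L.webis evaluates "Disallow" lines in the same way, and the host www.example.it, the /private/ path and the helper name is_allowed are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a block that targets L.webis only.
ROBOTS_TXT = """\
User-agent: L.webis
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the standard-library parser lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Paths under /private/ are closed to L.webis, the rest of the site stays open.
blocked = is_allowed(ROBOTS_TXT, "L.webis", "http://www.example.it/private/report.html")
allowed = is_allowed(ROBOTS_TXT, "L.webis", "http://www.example.it/index.html")
# A crawler with a different name is not affected by the specific block.
other = is_allowed(ROBOTS_TXT, "OtherBot", "http://www.example.it/private/report.html")
```

Replacing "User-agent: L.webis" with "User-agent: *" in the sample would apply the same rule to every crawler that honors the default block.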

L.webis (version 0.22 or newer) collects data from the .it ccTLD; domains and sub-domains outside the .it ccTLD may also be crawled if a document in the .it ccTLD links to them directly. The current policy is to download a limited number of documents from each sub-domain in each crawl iteration, waiting a few seconds between successive downloads. The interval between successive crawl iterations may be several days, and L.webis may use a copy of robots.txt cached during a previous iteration. The delay between successive downloads on the same sub-domain cannot be set through the non-standard "Crawl-delay" directive in robots.txt.
For complaints, comments or observations, please send an email to WebAlgo.
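
The download policy described above can be pictured with a minimal sketch. This is not the L.webis implementation: the function name plan_downloads, the limit of 3 documents per sub-domain and the 5-second delay are all assumptions chosen for illustration, since the actual values are not stated here.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def plan_downloads(urls, max_per_host=3, delay_seconds=5.0):
    """Plan one crawl iteration under a per-sub-domain budget and delay.

    Returns (url, offset) pairs, where offset is the earliest start time in
    seconds, relative to the first download from the same sub-domain.
    Both limits are hypothetical defaults, not published L.webis settings.
    """
    seen_per_host = defaultdict(int)
    plan = []
    for url in urls:
        host = urlsplit(url).hostname
        # Skip malformed URLs and hosts whose budget is already used up.
        if host is None or seen_per_host[host] >= max_per_host:
            continue
        # Space out requests to the same sub-domain by a fixed delay.
        plan.append((url, seen_per_host[host] * delay_seconds))
        seen_per_host[host] += 1
    return plan


sample = [
    "http://a.example.it/1",
    "http://a.example.it/2",
    "http://b.example.it/1",
    "http://a.example.it/3",
    "http://a.example.it/4",  # over the budget for a.example.it, so skipped
]
schedule = plan_downloads(sample)
```

Under these assumed limits, a site sees only a few spaced-out requests per crawl iteration, whatever the number of its pages linked from elsewhere.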
