L.webis (Lumbricus webis) is a web crawler developed at the
Institute of Informatics and Telematics (IIT) of the National Research Council (CNR) of Italy, in Pisa.
Its purpose is to support the compilation of statistics about the .it ccTLD and the portions of the WWW reachable from the .it ccTLD.
It also supports advanced analytic tools such as ComWatch.
The crawler follows the Robots Exclusion Standard: if you need to block
or limit automatic access to your site, follow the
instructions in (
L.webis considers all the "Disallow" directives following a default block:
or a specific block:
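The difference between a default block and a specific block can be illustrated with a short Python sketch based on the standard library's urllib.robotparser. The user-agent token "L.webis", the paths, and the example.it host are assumptions made for illustration only. The sketch shows how Python's parser interprets such a file (a matching specific block takes precedence over the default block), which may differ in detail from how L.webis itself combines the two.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the "L.webis" token and both paths are
# assumptions for illustration, not values confirmed by the crawler's authors.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: L.webis
Disallow: /archive/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("L.webis", "OtherBot"):
        for path in ("/index.html", "/private/a.html", "/archive/a.html"):
            url = "http://www.example.it" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

In this interpretation, the specific block governs the named crawler and the default block governs every other robot.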
L.webis (version 0.22 or newer) collects data on the .it ccTLD
(domains and sub-domains outside the .it ccTLD can be crawled if they are directly linked by a
document in the .it ccTLD).
The current policy is to download a limited number of documents for every
sub-domain in each crawl iteration, waiting a few seconds between successive downloads.
The time interval between successive crawl iterations may be several
days, and L.webis may use a version of robots.txt cached in a previous iteration.
The delay between successive downloads on the same sub-domain cannot be
specified by the non-standard directive "Crawl-delay" in robots.txt.
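The crawl policy described above (a cap on documents per sub-domain in each iteration, plus a fixed pause between downloads from the same sub-domain) could be sketched in Python roughly as follows. This is not the L.webis implementation: the class name CrawlBudget, its methods, and the default values of 100 documents and 5 seconds are hypothetical choices for illustration.

```python
from urllib.parse import urlsplit


class CrawlBudget:
    """Per-sub-domain document cap and fixed delay for one crawl iteration."""

    def __init__(self, max_docs_per_host=100, delay_seconds=5.0):
        # Both defaults are illustrative assumptions, not L.webis settings.
        self.max_docs_per_host = max_docs_per_host
        self.delay_seconds = delay_seconds
        self._fetched = {}       # host -> documents downloaded this iteration
        self._next_allowed = {}  # host -> earliest time of the next download

    def wait_time(self, url, now):
        """Return the seconds to wait before fetching url, or None if the
        host's budget for this iteration is already used up."""
        host = urlsplit(url).hostname
        if self._fetched.get(host, 0) >= self.max_docs_per_host:
            return None
        return max(0.0, self._next_allowed.get(host, now) - now)

    def record_fetch(self, url, now):
        """Count one download for the host and schedule its next slot."""
        host = urlsplit(url).hostname
        self._fetched[host] = self._fetched.get(host, 0) + 1
        self._next_allowed[host] = now + self.delay_seconds
```

The caller passes the current time explicitly (for example from time.monotonic()), which keeps the sketch easy to test. Because the delay is a fixed value held by the crawler, nothing in robots.txt changes it, which matches the statement about "Crawl-delay" above.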
For complaints, comments or observations, please send email to:
Styled and maintained by: