I would like a spider that crawls only certain websites, for example all *.com websites (no .net, .org, etc.).
My intention is to store some information in a database, such as the title, keywords, and description (all taken from the meta tags).
The spider should start from a few predefined URLs and then just follow links, but only within the specified TLD.
It should not use too many server resources. There is no rush; it should just run at a pace that does not eat too much of the server's RAM.
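A minimal sketch of what such a spider could look like, using only the Python standard library. The function and class names (`is_com`, `MetaParser`, `crawl`), the seed URLs, the delay, and the page limit are all illustrative choices, not part of any existing tool; the `time.sleep()` between requests is one simple way to keep the resource footprint low, and the `limit` caps how large the in-memory `seen` set can grow.

```python
import time
import urllib.parse
import urllib.request
from collections import deque
from html.parser import HTMLParser

def is_com(url):
    """True only for hosts ending in .com (the TLD restriction)."""
    host = urllib.parse.urlparse(url).hostname or ""
    return host.lower().endswith(".com")

class MetaParser(HTMLParser):
    """Collects <title>, meta keywords/description, and outgoing links."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content", "")
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def crawl(seeds, delay=2.0, limit=1000):
    """Breadth-first crawl of .com pages, throttled by `delay` seconds."""
    queue = deque(u for u in seeds if is_com(u))
    seen = set(queue)
    while queue:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue
        page = MetaParser()
        page.feed(html)
        # This is the point where you would INSERT into your database.
        yield url, page.title.strip(), page.meta
        for href in page.links:
            absolute = urllib.parse.urljoin(url, href)
            if is_com(absolute) and absolute not in seen and len(seen) < limit:
                seen.add(absolute)
                queue.append(absolute)
        time.sleep(delay)  # spread requests out; keeps CPU/network use modest
```

For a long-running crawler you would probably also want to honor robots.txt (the stdlib has `urllib.robotparser` for that) and persist the `seen` set to disk so RAM use stays bounded regardless of crawl size.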