We are looking for an email crawler that collects email addresses from web pages/URLs. The crawler must have the following functionality:
1) We must be able to submit one or more web pages that it should collect emails from.
2) The program must use all available threads on the system, and it must not block the main application thread, so that we can still add more URLs etc. to collect emails from while it is running.
3) The crawler must collect all emails from the specified URL/domain and ONLY emails from the same URL/domain.
This means that if it finds two links on the first page in the same domain, it must crawl these two pages too; if it finds new links in the same domain on those two pages, it must crawl those pages as well, and so on (so it must crawl all pages within the same URL/domain).
If it finds links to other URLs/domains, it should put these links in a separate list in the program. From this “link-list” it must be possible to “transfer” links to the list of links to collect emails from (and it must be possible to do this while the program is crawling).
4) When the application collects emails on the web pages it crawls, it must put ALL emails in the same list in the program.
5) All lists etc. must be shown on one screen.
6) It must be possible to save each list to a separate text file.
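The rules in items 1 to 6 boil down to a crawl that follows every link on the same host, runs on a thread pool so the caller is never blocked, and routes links to other hosts into a separate list. Since the reference crawler is written in Java, here is a minimal Java 11+ sketch under stated assumptions; it is not the reference application's code. All names (`EmailCrawler`, `submit`, `transfer`, `awaitIdle`, `saveList`, `httpFetcher`) are hypothetical. It assumes that "same url/domain" means an exact host match, finds emails and links with simple regular expressions rather than a real HTML parser, and takes the page fetcher as a function so the crawl logic can be exercised without network access.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.regex.*;

/** Hypothetical sketch of a multi-threaded, same-domain email crawler. */
final class EmailCrawler {
    // Simplified regexes (assumption); a production crawler would use an HTML parser.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);
    // One worker per available core (item 2); submit() never blocks the caller.
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    private final Function<String, String> fetcher; // URL -> HTML, or null on failure
    private final Set<String> visited = ConcurrentHashMap.newKeySet();
    private final Set<String> emails = new ConcurrentSkipListSet<>();        // one shared email list (item 4)
    private final Set<String> externalLinks = new ConcurrentSkipListSet<>(); // the "link-list"
    private final AtomicInteger pending = new AtomicInteger(); // pages queued or running
    private final Object idle = new Object();

    EmailCrawler(Function<String, String> fetcher) { this.fetcher = fetcher; }
    /** Queues a start URL (item 1); its host becomes the domain to stay within. */
    void submit(String url) { schedule(url, URI.create(url).getHost()); }
    /** Moves a link from the link-list into the crawl queue, even while crawling. */
    void transfer(String url) { if (externalLinks.remove(url)) submit(url); }
    Set<String> emails() { return Collections.unmodifiableSet(emails); }
    Set<String> externalLinks() { return Collections.unmodifiableSet(externalLinks); }

    private void schedule(String url, String domain) {
        if (!visited.add(url)) return; // crawl each page at most once
        pending.incrementAndGet();
        pool.execute(() -> {
            try {
                crawl(url, domain);
            } catch (RuntimeException e) {
                // skip pages that fail to load or parse
            } finally {
                if (pending.decrementAndGet() == 0) {
                    synchronized (idle) { idle.notifyAll(); }
                }
            }
        });
    }

    private void crawl(String url, String domain) {
        String html = fetcher.apply(url);
        if (html == null) return;
        Matcher m = EMAIL.matcher(html);
        while (m.find()) emails.add(m.group());
        URI base = URI.create(url);
        Matcher h = HREF.matcher(html);
        while (h.find()) {
            URI link;
            try { link = base.resolve(h.group(1).trim()); }
            catch (IllegalArgumentException e) { continue; } // malformed href
            String scheme = link.getScheme();
            if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) continue;
            String target = link.toString().split("#", 2)[0]; // drop any #fragment
            if (domain.equalsIgnoreCase(link.getHost())) schedule(target, domain); // same domain
            else if (!visited.contains(target)) externalLinks.add(target);      // other domain
        }
    }

    /** Blocks until no pages are queued or running (handy for tests and batch runs). */
    void awaitIdle() throws InterruptedException {
        synchronized (idle) { while (pending.get() > 0) idle.wait(); }
    }
    void shutdown() { pool.shutdown(); }

    /** Saves one list to its own text file, one entry per line (item 6). */
    static void saveList(Collection<String> list, Path file) throws IOException {
        Files.write(file, list, StandardCharsets.UTF_8);
    }

    /** Live fetcher on the JDK 11+ HttpClient; returns null if a page cannot be fetched. */
    static Function<String, String> httpFetcher() {
        HttpClient client = HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build();
        return url -> {
            try {
                HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
                return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
            } catch (IOException e) {
                return null;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        };
    }
}
```

For example, `new EmailCrawler(EmailCrawler.httpFetcher()).submit("https://example.com/")` returns immediately while worker threads fill `emails()` and `externalLinks()`, and `transfer(...)` can be called at any time to move a link from the link-list into the crawl. In a desktop UI that shows all lists on one screen (item 5), the views could simply poll these sets on a timer. Sizing the pool to `availableProcessors()` follows item 2 literally; because crawling mostly waits on the network, a larger pool may crawl faster in practice.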
For more information about the specifications/functionality, please send an email to: four (X) [url removed, login to view] and write "CRAWLER" in the mail SUBJECT.
We will then send you an existing crawler developed in Java/J2EE, and we want the new crawler to function just like it. Please refer to this application if anything is unclear, and don’t hesitate to contact me for more information by email at four (X) [url removed, login to view]