A client would like to have an email crawler script with these features:
- Extract emails from any URL, folder or subfolder in the domain
- Crawl pages only within the specified URL, or a folder within that URL, with a maximum crawl depth of 5-7
- Multithreaded extraction of emails: connect to URLs in multiple threads for faster speed.
- Capture the email, the owner of the email if available, and the URL/ID where it was found.
- Delete duplicate emails automatically at the end
- Option (tick box) to delete all emails that were extracted from a given URL.
- Authentication details. If the target is a forum, the client needs to enter a username/password. The script should allow for this.
- Add an unlimited number of URLs to a queue. We should be able to add any URL or job, and each job should start automatically when the previous job has ended. We should be able to add a new job while a crawl is running in the background.
- Possibility of pausing, stopping or deleting a crawl job.
- Have a list of all queues and extractions done, with date/time started, date/time finished, and number of emails. That is, a log of everything done.
- Password-protected area to access the online PHP application
- Detection of badly formatted emails
- Export to .xls after the crawl has finished.
- Language: PHP + MySQL
- The script should be hosted on a server where this kind of operation is allowed; this should be provided by you somehow.
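
To make the scope, bad-format detection and duplicate-removal points above more concrete, here is a rough sketch of the behaviour I have in mind. It is written in Python only for illustration (the delivered script should still be PHP + MySQL). The function names, the simple regular expression and the default depth of 5 are my own assumptions, not a required design, and the pattern is not a complete email validator.

```python
import re
from urllib.parse import urlparse

# Deliberately simple pattern (an assumption for this sketch); it is not a
# full RFC 5322 validator and will miss some unusual but valid addresses.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def in_scope(url, start_url, depth, max_depth=5):
    """Return True if url is on the same host as start_url, inside the
    start folder, and not deeper than max_depth (hypothetical helper)."""
    if depth > max_depth:
        return False
    start, target = urlparse(start_url), urlparse(url)
    if target.netloc.lower() != start.netloc.lower():
        return False
    # Folder of the start URL: "/forum/index.php" and "/forum/" both give "/forum/".
    if start.path.endswith("/"):
        folder = start.path
    else:
        folder = start.path.rsplit("/", 1)[0] + "/"
    return (target.path or "/").startswith(folder)


def is_well_formed(address):
    """Reject addresses the pattern accepts but that are clearly malformed."""
    if not EMAIL_RE.fullmatch(address):
        return False
    local = address.split("@", 1)[0]
    return ".." not in address and not local.startswith(".") and not local.endswith(".")


def extract_emails(text, source_url, seen):
    """Return (email, source_url) pairs for new, well-formed addresses.
    `seen` is a set shared across pages so duplicates are dropped."""
    found = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()  # compare case-insensitively
        if email in seen or not is_well_formed(email):
            continue
        seen.add(email)
        found.append((email, source_url))
    return found


if __name__ == "__main__":
    seen = set()
    page = "Contact john@example.com, JOHN@example.com or bad..dots@example.com"
    print(extract_emails(page, "http://example.com/forum/thread/1", seen))
    print(in_scope("http://example.com/blog/", "http://example.com/forum/", 1))
```

In the real script the `seen` set would presumably be replaced by a unique index on the email column in MySQL, and each stored row would keep the source URL so that emails from one URL can be deleted together.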
Maybe you can use some ready-made PHP email extractor script as the basis to build this one, starting with one like [url removed, login to view] or similar.
Looking forward to your PM. If you have done something similar, show it to me and send me specs of what you've done and what it does.