I need a script that runs on a server and extracts emails from the sites and forums my client needs. I've posted a similar project before but got almost no responses. This shouldn't be difficult for anyone who has built data scrapers or extractors, since the script does just that: download pages, grab the emails from them, and save them to an Excel file.
- Extract emails from a given URL, or from a folder or subfolder within the domain, such as [url removed, login to view], or only from [url removed, login to view] downward and not from the root
- Crawl only pages within the specified URL, or a folder within it ([url removed, login to view]), with a maximum crawl depth of 5 to 7 levels
- Multithreaded extraction: connect to URLs in multiple threads for faster crawling.
- Capture each email, the owner's name if available, and the URL/ID of the page where it was found.
- Remove duplicate emails automatically at the end of each job
- Delete all emails extracted from a given URL (if we tick that option).
- Authentication details. If the site is a forum, he needs to enter a username/password. The script should allow entering a username/password and logging in.
- Add an unlimited number of URLs to a queue. We should be able to add any URL or job so that it starts automatically when the previous job has ended, including while a crawl is running in the background.
- Ability to pause, stop, or delete a crawl job.
- A list of all queued and completed extractions, with date/time started, date/time finished, and the number of emails extracted after duplicates have been removed. In other words, a log of everything done.
- Password-protected area to access the online application
- Export to .xls after the crawl has finished.
- Language: PHP+MySQL, ASP, or whatever language you can do this in
Maybe you can use some ready-made PHP email extractor script as the basis for this one, starting with one like [url removed, login to view] or similar.
Looking forward to your PM and a reasonable bid. I've been outsourcing for quite some time now, so I know the business. I've had quite a few extraction scripts made, and I guess those could be adapted for this, so if you have one like that I'm sure you can get this done as well. If you have done something similar, or can do this easily, send me specs of what you've built, what it does, and how you'd do this for me.