This is a simple project for an experienced developer and should not take long to complete. The program must:
1. Read a keyword from a MySQL table.
2. Submit it to Overture.
3. Record the results in a MySQL table.
4. Based on a config file:
a) Perform a "deep query" based on the results it gets back.
b) Retry if the website times out or does not respond in time.
c) Scrape 3 fields and save them to the MySQL database along with 2 other fields.
5. Detect when the script gets blocked and switch to a different IP/port (proxy) to connect.
6. Be multi-threaded, meaning it must run multiple instances simultaneously.
7. Include a config file with settings that:
a) Limit the number of concurrent threads
b) Stop, pause and resume checking (continuing from where it stopped)
c) Limit the number of retries
d) Set a deep query variable (if it is 0 then no deep queries are performed)
e) Set a pause in between queries
f) Show a progress bar
8. Filter out any duplicate data.
9. Once finished, copy the data to another table and change the data in a particular way. (I will tell you how later).
10. Based on this second data set, perform a whois lookup (using URLs found in a database table) and record the results.
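To make items 4b, 5, 6 and 8 concrete, here is a rough sketch of the control flow I have in mind. It is written in Python only for brevity (the deliverable itself must be PHP), and every name in it (`CONFIG`, `Blocked`, `fetch_with_retry`, `run`, the setting keys) is a hypothetical placeholder, not a required design. The `fetch` callable stands in for the real Overture request and is assumed to raise `Blocked` when the site refuses the request and `TimeoutError` when it takes too long.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical settings mirroring the config file in item 7.
# Key names and values are assumptions for illustration only.
CONFIG = {
    "max_threads": 4,      # limit on concurrent threads
    "max_retries": 3,      # attempts per keyword before giving up
    "pause_seconds": 0.0,  # pause between attempts
}


class Blocked(Exception):
    """Raised by the fetch callable when the site refuses the request."""


def fetch_with_retry(fetch, keyword, proxies, config):
    """Try a keyword up to max_retries times, rotating proxies when blocked."""
    proxy_index = 0
    for _attempt in range(config["max_retries"]):
        proxy = proxies[proxy_index % len(proxies)]
        try:
            return fetch(keyword, proxy)
        except Blocked:
            proxy_index += 1  # item 5: switch to the next proxy
        except TimeoutError:
            pass              # item 4b: keep the same proxy and try again
        time.sleep(config["pause_seconds"])
    return None               # all attempts failed; caller records the failure


def run(keywords, fetch, proxies, config=CONFIG):
    """Process keywords on a bounded thread pool and drop duplicate rows."""
    seen = set()
    unique_rows = []
    # item 6: several keywords are processed at once, capped by max_threads
    with ThreadPoolExecutor(max_workers=config["max_threads"]) as pool:
        results = pool.map(
            lambda kw: fetch_with_retry(fetch, kw, proxies, config), keywords
        )
        for rows in results:
            for row in rows or []:  # a failed keyword yields None
                if row not in seen:  # item 8: filter duplicates
                    seen.add(row)
                    unique_rows.append(row)
    return unique_rows
```

Because the network call is passed in as `fetch`, the retry, proxy rotation and duplicate filtering can be checked with a fake function before any real request is sent. Stop/pause/resume, deep queries and the progress bar are left out of this sketch.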
* All code must be modular, fast, clear and well documented.
* The program must be fast and not take up too much memory / CPU processing power.
* The program must be written in PHP (AJAX interface preferred) and be menu driven.
* The program must be able to be run remotely on a hosted server (without my desktop turned on).
Terms and Conditions
1) If you do not have experience in building scraping/search engine scripts then DO NOT bid. If you have built one before, please tell me how it was similar to this and provide examples, links, etc. The more info you give me, the more likely I am to select you as the coder!
2) You will need to agree to a non-disclosure / confidentiality agreement first.
3) You must provide me with a complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
4) Deliverables must be in ready-to-run condition. For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment, deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
5) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased.
6) All databases need to be set up and working, AND the code to re-create the databases also needs to be supplied.