I'm reposting this project because the maximum price was stated incorrectly in the previous post. Unfortunately, SL doesn't let me edit a job after it has been posted.
This is a very simple project for anyone who knows PHP and Linux. Here's what you need to do:
1. Read a keyword from MySQL table(s).
2. Submit it to Overture.
3. Record the results in a MySQL table.
4. Based on a config file:
a) Perform a "deep query" based on the results it gets back.
b) Re-try if the website times out or doesn't respond (takes too long).
c) Scrape 3 fields and save them to the MySQL database along with 2 other fields.
5. Identify if the script gets blocked and use a different IP/port to connect.
6. The application must be multi-threaded. This means it must run multiple instances simultaneously. The program will be run by cron, so several copies will be running at the same time. You need to factor this in when it comes to checking Overture, as you don't want to check the same data twice!
7. Contain a config file with settings that can:
a) Limit the number of concurrent threads
b) Stop, pause and resume (continue from where it stopped) checking
c) Limit the number of re-tries
d) Set a deep query variable (if it's 0 then no deep queries are performed)
e) Set a pause in between queries
f) Show a progress bar
8. Filters out any duplicate data.
9. Once finished it should copy the data to another table and change the data in a particular way. (I will tell you how later).
10. Based on this second data set, it should perform a different lookup (using information found in a database table) and record the results.
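To illustrate what step 6 is asking for, here is a rough sketch of one way several cron-started instances could avoid checking the same keyword twice. It is only an illustration under assumptions: the table name `keywords`, the columns `status` and `worker`, and the function names are all made up, SQLite stands in for MySQL so the snippet runs on its own, and Python is used only because the back end does not have to be PHP. Each instance picks a pending row and then claims it with a conditional update; if another instance got there first, the update touches no rows and the instance simply tries the next one.

```python
import sqlite3


def make_demo_db(keywords):
    """Create an in-memory table of pending keywords (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE keywords ("
        "id INTEGER PRIMARY KEY, keyword TEXT, "
        "status TEXT DEFAULT 'pending', worker TEXT)"
    )
    conn.executemany(
        "INSERT INTO keywords (keyword) VALUES (?)", [(k,) for k in keywords]
    )
    conn.commit()
    return conn


def claim_next_keyword(conn, worker_id):
    """Claim one pending keyword for worker_id; return (id, keyword) or None."""
    while True:
        row = conn.execute(
            "SELECT id, keyword FROM keywords "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left to check
        # The update only succeeds if the row is still pending, so two
        # instances can never both claim the same keyword.
        cur = conn.execute(
            "UPDATE keywords SET status = 'claimed', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker_id, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row
        # Another instance claimed it first; loop and try the next row.


if __name__ == "__main__":
    db = make_demo_db(["red shoes", "blue shoes"])
    print(claim_next_keyword(db, "instance-1"))
    print(claim_next_keyword(db, "instance-2"))
    print(claim_next_keyword(db, "instance-1"))
```

The same claim-then-verify idea should carry over to MySQL, but the exact locking approach is up to the coder.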
* All code must be modular, fast, clear and well documented.
* The program must be fast and not take up too much memory / CPU processing power.
* The program must be written in PHP (AJAX interface preferred) and be menu driven.
* The program must be able to be run remotely on a hosted server (without my desktop turned on).
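As a rough illustration of the retry and pause settings described in steps 4 and 7, the sketch below reads a small config and retries a lookup when it times out. Everything named here is an assumption for the example: the `[scraper]` section, the keys `max_retries`, `pause_seconds` and `deep_query`, and the `fetch` callable, which is a placeholder for whatever actually queries the site. It is not a required design.

```python
import configparser
import time

# Hypothetical config contents; the real file layout is up to the coder.
DEFAULT_CONFIG = """
[scraper]
max_retries = 3
pause_seconds = 2
deep_query = 0
"""


def load_settings(text=DEFAULT_CONFIG):
    """Parse the config text into a plain dict of typed settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["scraper"]
    return {
        "max_retries": section.getint("max_retries"),
        "pause_seconds": section.getfloat("pause_seconds"),
        "deep_query": section.getint("deep_query"),  # 0 means no deep queries
    }


def fetch_with_retries(fetch, keyword, max_retries, pause_seconds,
                       sleep=time.sleep):
    """Call fetch(keyword), retrying on TimeoutError up to max_retries times.

    The pause between attempts comes from the config. Once the retry
    limit is used up, the last TimeoutError is re-raised to the caller.
    """
    retries_used = 0
    while True:
        try:
            return fetch(keyword)
        except TimeoutError:
            if retries_used >= max_retries:
                raise
            retries_used += 1
            sleep(pause_seconds)
```

Passing `sleep` in as a parameter is just a convenience so the pause can be skipped when testing.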
Terms and Conditions
1) If you do not have experience in building scraping/search engine scripts then DO NOT bid. If you have built one before, please tell me how it was similar to this and provide examples, links, etc. The more info you give me, the more likely I will select you as the coder!
2) You will need to agree to a non-disclosure / confidentiality agreement first.
3) You must provide me with a complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
4) Deliverables must be in ready-to-run condition. For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment, deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
5) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased.
6) All databases need to be set up and working, AND the code to re-create the databases also needs to be supplied.
*** NOTES ***
I have invited a number of people to bid on this project. If you have created a spider, crawler or scraper before then please mention this in your bid.
You can program the "back end" in Perl, C or another language. However, it MUST be able to run on a hosted Linux box. The management front end must be accessible via the web and therefore must be in PHP.