PHP, Linux, MySQL, Ruby, web data extraction, screen scraping, link directory building, spidering
If you have good knowledge of regex, cURL, the Snoopy class or similar tools (larbin, htdig, etc.) and have already spidered/parsed websites (please, no newbies), your bids are welcome.
We need to spider data from approx. 12 domains. Each of them has a link directory/web catalog containing categories
(e.g. car dealers, realtors, real-estate ads and others) that are of interest to us (example: www?catall?de).
Please bid *per domain* (= per web catalog) after reading the program specifications. (Skilled people need 3-6 hours
per domain.)
The extracted data is to be inserted into two given MySQL tables. We may also be able to accept the data in MS Excel format.
Please tell us what work you have already done in this field (screen scraping) and which tools you have
used for web data extraction (Watir, FireWatir, WWW::Mechanize, Rubyful Soup, Hpricot, ...).
(We can also pay via Moneybookers.)
Required: PHP, Debian Linux, cURL or the Snoopy class, good regex skills, MySQL
This bid request includes IMPORTANT attached files. Please download them and read them fully before bidding. In them
you will find the two given MySQL tables that are to be filled with the extracted data.
0) First, e-mail us a MySQL dump of the two required tables for each scraped domain (all extracted records, all fields) so we can check whether the structure, field content, character set, etc. are OK.
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment:
deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive
and complete copyrights to all work purchased.