In Progress

134383 Scraping project


I am looking for someone with experience spidering directories. I want to spider Google's directory, which is based on the ODP, and extract only the sites in each category/sub category with a page rank of 6+.

[url removed, login to view]

Google displays the page rank as an image in which the green bar has a variable width. The total PR bar is 40px wide, so any listing where the associated green bar image ([url removed, login to view]) is set to width=24 (or more) should be recorded (title, description, url, category/sub category, and page rank).

This is all easily parsed from the HTML on each page of the directory, so it should not be too hard. An ODP dump is of course easily acquired, but what I am interested in is those sites with a PR of 6+, which is not information that comes with the ODP dump.
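The width check described above could be sketched roughly as follows, using only the Python standard library. This is a minimal sketch under stated assumptions: the bar image file name `green-bar.gif` is a hypothetical placeholder (the real image URL is not shown here), the mapping from bar width to page rank is assumed to be linear (40px = PR 10, so 24px = PR 6), and the function and class names are made up for illustration.

```python
from html.parser import HTMLParser

BAR_TOTAL_PX = 40   # total PR bar width, as stated in the description
MIN_WIDTH_PX = 24   # green bar width treated as PR 6
BAR_IMAGE_HINT = "green-bar.gif"  # hypothetical name; the real image URL is not known


def pagerank_from_width(width_px, total_px=BAR_TOTAL_PX):
    """Map a green bar width to a 0-10 page rank, assuming a linear scale."""
    return width_px * 10 // total_px


class BarWidthParser(HTMLParser):
    """Collects the width attribute of every <img> that looks like the green bar."""

    def __init__(self):
        super().__init__()
        self.widths = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or ""
        width = attr.get("width") or ""
        if BAR_IMAGE_HINT in src and width.isdigit():
            self.widths.append(int(width))


def qualifying_pageranks(html):
    """Return the page rank of each green bar that is at least MIN_WIDTH_PX wide."""
    parser = BarWidthParser()
    parser.feed(html)
    return [pagerank_from_width(w) for w in parser.widths if w >= MIN_WIDTH_PX]
```

A real spider would also need to pair each bar with its listing's title, description and url, which depends on the actual page markup and is left out here.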

I want the output in a MySQL database with a simple query form so I can search listings and filter by category and/or pagerank.
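As a rough illustration of the storage and search side, the sketch below defines one table and a filter function. It uses Python's built-in `sqlite3` module as a stand-in so it runs without a database server; this is an assumption for illustration only, and a MySQL version would need its own driver, column types and placeholder style. The table name `listings`, its columns, and `search_listings` are hypothetical.

```python
import sqlite3

# Hypothetical schema covering the fields to be recorded for each listing.
SCHEMA = """
CREATE TABLE IF NOT EXISTS listings (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT,
    url TEXT NOT NULL,
    category TEXT NOT NULL,
    pagerank INTEGER NOT NULL
)
"""


def search_listings(conn, category=None, min_pagerank=None):
    """Filter listings by category prefix and/or minimum page rank."""
    clauses, params = [], []
    if category is not None:
        # Prefix match, so "Computers" also matches "Computers/Programming".
        clauses.append("category LIKE ?")
        params.append(category + "%")
    if min_pagerank is not None:
        clauses.append("pagerank >= ?")
        params.append(min_pagerank)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT title, url, category, pagerank FROM listings"
        + where
        + " ORDER BY pagerank DESC, title"
    )
    return conn.execute(sql, params).fetchall()
```

The query form would then only need to pass its category and page rank fields to a function like this; values are bound as parameters rather than concatenated into the SQL string.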

- Payment by PayPal or SL escrow.

- Hit the PMB with any questions.

- Examples of similar work would be great to support your bid.

- Auto bids ignored.

Skills: Anything Goes, C Programming, MySQL, Perl, PHP, Python

See more: wide 6 search, scraping com, We Scraping, url scraping, mysql output html, googles search, output mysql html, category filter html, simple payment form html, project bar, image query, database scraping search, mysql simple pos, hit google, mysql database query form, php odp, php project output, page scraping mysql, project extract information, variable search, scraping experience, search url scraping google, extract images url, simple payment form php html, pos php based

About the Employer:
(143 reviews) Auckland, New Zealand

Project ID: #1880555