In Progress

134804 Scraping project for Siv

Hi Sebastien,

Per my private message, this is a private listing for the scraping project. All the details are the same...

---

I am looking for someone with experience spidering directories. I want to spider Google's directory, which is based on the ODP, and extract only the sites in each category/subcategory with a PageRank of 6+.

[url removed, login to view]

Google displays the PageRank as an image in which the green bar has a variable width. The full PageRank bar is 40px wide, so any listing whose green bar image ([url removed, login to view]) is set to width=24 or more (6/10 of the full bar) should be recorded: title, description, URL, category/subcategory, and PageRank.
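To illustrate the width rule above, here is a minimal sketch in Python. It assumes the bar scales linearly from 0 to 10 across the 40px, which is consistent with width=24 meaning a PageRank of 6 but is not confirmed anywhere in the brief. The function names are hypothetical.

```python
def pagerank_from_width(width_px, bar_px=40, max_rank=10):
    """Map the green bar width in pixels to an approximate PageRank.

    Assumption: the bar is linear, so 40px corresponds to 10 and
    24px corresponds to 6 (4px per PageRank point).
    """
    if not 0 <= width_px <= bar_px:
        raise ValueError(f"width {width_px} is outside 0..{bar_px}")
    return width_px * max_rank // bar_px


def meets_threshold(width_px, min_rank=6):
    """Return True when the listing should be recorded (PageRank 6+)."""
    return pagerank_from_width(width_px) >= min_rank
```

Under that assumption, checking width >= 24 and checking PageRank >= 6 select the same listings.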

This is all easily parsed from the HTML on each page of the directory, so it should not be too hard. An ODP dump is of course easily acquired, but what I am interested in is those sites with a PR of 6+, which is not information that comes with the ODP dump.
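As a rough idea of what the parsing could look like, here is a sketch using Python's standard html.parser module. The sample markup, the "pos.gif" file name, and the class and function names are all assumptions for illustration; the real directory pages would need to be inspected and the parser adjusted to match.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real directory HTML may differ.
SAMPLE = """
<table>
<tr><td><img src="/images/pos.gif" width="24" height="4"></td>
    <td><a href="http://example.com/">Example Site</a></td></tr>
<tr><td><img src="/images/pos.gif" width="12" height="4"></td>
    <td><a href="http://example.org/">Other Site</a></td></tr>
</table>
"""


class ListingParser(HTMLParser):
    """Pair each green bar image with the first link that follows it."""

    def __init__(self, bar_marker="pos.gif"):
        super().__init__()
        self.bar_marker = bar_marker  # assumed file name of the green bar
        self.listings = []
        self._width = None    # width of the most recent green bar
        self._current = None  # listing whose <a> tag is currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and self.bar_marker in (attrs.get("src") or ""):
            try:
                self._width = int(attrs.get("width"))
            except (TypeError, ValueError):
                self._width = None  # missing or non-numeric width
        elif tag == "a" and self._width is not None:
            self._current = {"url": attrs.get("href") or "",
                             "title": "",
                             "bar_width": self._width}
            self._width = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.listings.append(self._current)
            self._current = None


def parse_listings(html):
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.listings
```

Calling parse_listings(SAMPLE) yields one dictionary per listing; keeping only those with bar_width of 24 or more gives the PR 6+ sites. Description and category extraction are left out because they depend on the actual page layout.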

I want the output in a CSV file so I can search listings and filter by category and/or PageRank.
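One possible shape for that CSV, sketched with Python's standard csv module. The column names and both helper functions are hypothetical; they simply follow the fields listed in this brief.

```python
import csv
import io

# Assumed column layout, taken from the fields requested in the brief.
FIELDS = ["title", "description", "url", "category", "pagerank"]


def write_listings(rows, fileobj):
    """Write listing dictionaries to an open text file as CSV with a header."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def filter_listings(fileobj, category_prefix="", min_pagerank=0):
    """Read the CSV back and keep rows matching a category prefix and minimum PageRank."""
    reader = csv.DictReader(fileobj)
    return [row for row in reader
            if row["category"].startswith(category_prefix)
            and int(row["pagerank"]) >= min_pagerank]
```

When writing to a real file, open it with newline="" as the csv module documentation recommends. Storing the category as a single path such as "Computers/Programming" keeps filtering by category or subcategory down to a prefix match.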

Skills: Anything Goes, Perl

See more: wide search, googles search, category filter html, project bar, perl csv html, project extract information, variable search, extract images url, project pos, project information directory, spider csv, image spider, search directory listing project, perl siv, google pagerank project, spider extract, file search project perl, perl directory file, extract information html file, html image extract, spider html, search image bar, search bar image, perl html csv, google googles

About the Employer:
( 1 review )

Project ID: #1880976

Awarded to:

cuco

Hi! Thanks for the invitation. I changed the timeframe, so I am offering a discount. Please see PMB.

$90 USD in 7 days
(20 Reviews)
4.5