134839 Content scraper bot

In Progress Posted on Apr 25, 2007 Paid on delivery

We would like you to develop scripts that create two tab-delimited text files, where each row represents a record. The records are:

* file 1: quotes

* file 2: people who said the quotes.
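
A minimal sketch of the "one row per record" requirement, assuming Python as the implementation language. Quotations often contain line breaks or tabs, which would split a record across rows or columns, so this collapses them before writing. The names `clean_field` and `write_rows` are hypothetical helpers, not part of any required interface.

```python
import io


def clean_field(value):
    """Return value as text with tabs, newlines and whitespace runs collapsed.

    None (field not available on the site) becomes an empty string.
    """
    if value is None:
        return ""
    return " ".join(str(value).split())


def write_rows(handle, header, rows):
    """Write a header line and one tab-delimited line per record."""
    handle.write("\t".join(header) + "\n")
    for row in rows:
        handle.write("\t".join(clean_field(cell) for cell in row) + "\n")


if __name__ == "__main__":
    buffer = io.StringIO()
    write_rows(buffer, ["quote", "author"], [["To be,\tor not\nto be", None]])
    print(buffer.getvalue(), end="")
```

Joining the cleaned fields by hand avoids quoting rules altogether; this is safe only because `clean_field` guarantees no field still contains a tab or newline.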

Build these tables by extracting content from these sites. They're all large quotation sites:

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

Where available get these fields:

For quotes (note that most of these fields won't be available on most sites, so it's not as much work as it looks):

* quote (NOT NULL)

* author of the quote

* when it was said

* where it was said (eg Lincoln Memorial, The Moon)

* type of source where it was said (eg Movies, Literature)

* which book it comes from

* source site (don't worry about merging data when a quote is featured on more than one site. Keep them as separate rows.)

* primary source site (where the site you're scraping is citing another place, perhaps even another site). This can be in HTML format if they have a link.

* origin (eg on [url removed, login to view])

* class of quote (proverb, joke, etc)

* topic (eg computers, fear, faith)

* rating (eg on [url removed, login to view])

* number of times favourited (eg on [url removed, login to view])

* number of votes (eg on [url removed, login to view])

* URL where quote is available on the site
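
The quote fields above could be fixed into a column order along these lines. This is only a sketch: the column names in `QUOTE_COLUMNS` and the function `quote_row` are assumptions made for illustration, since the posting names the fields but not their column headers. Fields a site does not provide are left empty, and only `quote` is mandatory.

```python
# Hypothetical column names, one per field in the list above, in the same order.
QUOTE_COLUMNS = [
    "quote", "author", "said_when", "said_where", "source_type", "book",
    "source_site", "primary_source", "origin", "quote_class", "topic",
    "rating", "favourites", "votes", "url",
]


def quote_row(record):
    """Turn a dict of scraped values into a row ordered like QUOTE_COLUMNS."""
    quote = (record.get("quote") or "").strip()
    if not quote:
        raise ValueError("quote is NOT NULL")
    unknown = set(record) - set(QUOTE_COLUMNS)
    if unknown:
        raise ValueError("unknown fields: " + ", ".join(sorted(unknown)))
    row = [quote]
    for column in QUOTE_COLUMNS[1:]:
        value = record.get(column)
        # Missing or unavailable fields become empty cells.
        row.append("" if value is None else str(value))
    return row


if __name__ == "__main__":
    print(quote_row({"quote": "Example quote text", "author": "Example Author"}))
```

Rejecting unknown keys catches a misspelled field name in one site's scraper before it silently produces an empty column.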

For people (again, most of these fields will not be available for most sites, so it is simpler than it looks):

* person (NOT NULL) (please use exact same format as "author of the quote" above so that we can map the two tables)

* their type (celebrity, politician, etc)

* occupation(s)

* birth date

* date of death

* source site (don't worry about merging data when a person is featured on more than one site. Keep them as separate rows.)

* blurb about them

* URL where description is available on the site
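
Because `person` has to match "author of the quote" exactly for the two tables to be mapped, one option is to pass every name through the same function before writing either file, then check for authors with no people row. The sketch below assumes rows are held as dicts keyed `author` and `person`; `person_key` and `unmatched_authors` are hypothetical names, and whitespace collapsing is just one possible normalisation.

```python
def person_key(name):
    """Normalise a name the same way for both tables (collapse whitespace)."""
    return " ".join(name.split())


def unmatched_authors(quote_rows, people_rows):
    """Return the sorted authors that appear in quotes but not in people."""
    known = {person_key(person["person"]) for person in people_rows}
    missing = set()
    for quote in quote_rows:
        author = person_key(quote.get("author") or "")
        # The author field is optional, so skip quotes without one.
        if author and author not in known:
            missing.add(author)
    return sorted(missing)


if __name__ == "__main__":
    quotes = [{"author": "Example  Author"}, {"author": "Someone Else"}]
    people = [{"person": "Example Author"}]
    print(unmatched_authors(quotes, people))
```

An unmatched author is not necessarily an error, since many sites will have no description page for a person, but the list is a quick way to spot formatting mismatches between the two files.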

I'm not fussed about what technology you use or elegance of the code, provided you submit the code and you document how to install and run it, and it's runnable on a Red Hat machine without much effort. You may use your own modules, provided you submit them and allow us to use them.

Please describe:

* How much you would charge to do this project

* The latest date it would be delivered.

* What experience you have writing bots.

If all goes well, there are projects you could work on with us.

Skills: Odd Jobs, Perl, PHP, Python

Project ID: #1881011

About the project

1 proposal Remote project Open Jul 11, 2012

Awarded to:

xhunter12sl

Check PMB

$60 USD in 4 days
(0 Reviews)
0.0