In Progress

Open Access Harvest Project (data gathering scripting)

We are looking to build an Open Access archive of freely available scholarly journals. [url removed, login to view] is a good explanation of what the content is and the field the project relates to.


A. Create a harvesting engine in the language of your choice (parallel processing has produced the best results) that can:

1.) Crawl specific Internet sites (targets). We will help with the choice of targets; OAI is one method some sites support

2.) If not crawling, read from an input file to glean the data, which some sites supply

3.) Ensure the data is accurate and test URLs for correctness

4.) Dump the defined data to a delimited text file format

5.) Transfer the data via ftp to us

B. Work with us to find new resources and refresh existing sources on a monthly basis.

C. Provide new and updated data feeds continually

D. Provide your own platform to run the harvests; a multi-core processor should be sufficient
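
The steps under A could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the required design: the posting does not define the data fields, so the `title` and `url` columns, the tab delimiter, and every function name here are hypothetical. The sketch reads Dublin Core records from an OAI-PMH `ListRecords` response, applies a syntax-only URL check, writes a delimited file, and uploads it over FTP. It fetches a single page per target and does not follow resumption tokens.

```python
import csv
import os
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP
from urllib.parse import urlparse
from urllib.request import urlopen

# XML namespaces used by OAI-PMH responses that carry Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def fetch_list_records(base_url):
    """Step 1: fetch one ListRecords page from an OAI-PMH endpoint."""
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def parse_records(xml_data):
    """Extract (title, identifier) pairs from a ListRecords response."""
    root = ET.fromstring(xml_data)
    rows = []
    for rec in root.iterfind(".//oai:record", NS):
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        link = rec.findtext(".//dc:identifier", default="", namespaces=NS)
        rows.append((title.strip(), link.strip()))
    return rows


def is_valid_url(url):
    """Step 3, syntax only: a live check would also request the URL."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def write_delimited(rows, out, delimiter="\t"):
    """Step 4: write rows with well-formed URLs; return how many were kept."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["title", "url"])  # hypothetical column set
    kept = 0
    for title, link in rows:
        if is_valid_url(link):
            writer.writerow([title, link])
            kept += 1
    return kept


def upload(path, host, user, password):
    """Step 5: send the finished file over FTP."""
    with FTP(host) as ftp, open(path, "rb") as fh:
        ftp.login(user, password)
        ftp.storbinary("STOR " + os.path.basename(path), fh)


def harvest_all(base_urls, workers=4):
    """Harvest several targets in parallel; the work is network-bound,
    so a thread pool is used here as one possible approach."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(fetch_list_records, base_urls)
        return [row for page in pages for row in parse_records(page)]
```

A run would call `harvest_all` with the agreed target endpoints, pass the rows to `write_delimited` with an open output file, and then call `upload` with the FTP details supplied by the employer.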

Skills: C Programming, Java, Perl, Ruby on Rails, Web Scraping

See more: access data project, oai, programming resources, programming org, one harvest, ftp engine, ftp dump site, find wikipedia, find sites programming work, file processor, programming wiki, open text, internet programming project, find programming project, parallel programming, help data gathering, harvest, find new programming, data harvest, data gathering, data en, crawl data, correctness, build accurate, access d

About the Employer:
(5 reviews) Windsor, United States

Project ID: #1608716

4 freelancers are bidding an average of $600 for this job


I already have this system built, but I need to have the sites and output format.

$500 USD in 3 days
(2 Reviews)

I can start on this project

$400 USD in 20 days
(1 Review)

Great to know about this task. I am interested and will do it for you.

$1000 USD in 23 days
(0 Reviews)

Hi, I have experience harvesting OAI data. Please see the demo in the PMB. Thank you very much.

$500 USD in 30 days
(0 Reviews)