In Progress

Open Access Harvest Project (data gathering scripting)

We are looking to build an Open Access archive of freely available scholarly journals. [url removed, login to view] is a good explanation of what the content and project field relate to.


A. Create a harvesting engine in the language of your choice (parallel processing has produced the best results) that can:

1.) Crawl specific Internet sites (targets). We will help with the target choices; OAI is one method some sites support.

2.) If not crawling, read from an input file to glean the data, which some sites supply

3.) Ensure the data is accurate and test URLs for correctness

4.) Dump the defined data to a delimited text file

5.) Transfer the data to us via FTP

B. Work with us to find new resources and refresh existing sources on a monthly basis.

C. Provide new and updated data feeds continually

D. Provide your own platform to run the harvests; a multi-core processor should be sufficient
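
For targets that expose OAI, the harvesting step might look roughly like the sketch below. It assumes the target returns standard OAI-PMH `ListRecords` responses with Dublin Core (`oai_dc`) metadata, which the posting does not confirm. The function name `parse_list_records`, the `SAMPLE` response, and the `example.org` identifiers are hypothetical and only illustrate the parsing step; fetching the response over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response.

    resumption_token is None when the response carries no further pages.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            # Keep only dc:identifier values that look like web links.
            "urls": [e.text for e in rec.iterfind(".//dc:identifier", NS)
                     if e.text and e.text.startswith("http")],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)


# Hypothetical response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:article/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Article</dc:title>
          <dc:identifier>http://example.org/article/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

A harvester would typically keep requesting pages while the returned token is not `None`, and each target could be handed to a separate worker process to get the parallelism mentioned in item A.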

Note: we are looking for someone to develop and maintain this on a monthly basis.

Note: This is also known as data extraction. The data provided will be article-level data for each journal. Each detail record will need these output fields:

"Publisher", "Journal Title","ISSN", "Alternate ISSN", "Journal Year", "JournalVol","JournalIssue", "HTML URL", "PDF URL", "Start Page", "End Page"

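As a rough illustration of the accuracy check and the delimited dump, the sketch below validates each ISSN with the standard ISSN check-digit rule and writes rows using the field names listed above. The tab delimiter, the quoting of every field, and the helper names `issn_is_valid` and `write_rows` are assumptions, since the posting does not specify them.

```python
import csv
import io
import re

# Output columns, in the order given in the posting.
FIELDS = ["Publisher", "Journal Title", "ISSN", "Alternate ISSN",
          "Journal Year", "JournalVol", "JournalIssue",
          "HTML URL", "PDF URL", "Start Page", "End Page"]

ISSN_RE = re.compile(r"^[0-9]{4}-[0-9]{3}[0-9X]$")


def issn_is_valid(issn):
    """Check the NNNN-NNNC shape and the mod-11 check digit of an ISSN."""
    issn = issn.strip().upper()
    if not ISSN_RE.match(issn):
        return False
    digits = issn.replace("-", "")
    # The first seven digits are weighted 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))


def write_rows(rows, out, delimiter="\t"):
    """Write rows with a valid ISSN to out; return how many were written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter=delimiter,
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    written = 0
    for row in rows:
        if issn_is_valid(row.get("ISSN", "")):
            # Missing fields are emitted as empty strings.
            writer.writerow({f: row.get(f, "") for f in FIELDS})
            written += 1
    return written


# Hypothetical rows: the second one has a wrong check digit and is skipped.
buf = io.StringIO()
count = write_rows(
    [{"ISSN": "0378-5955", "Journal Title": "Example Journal"},
     {"ISSN": "1234-5678", "Journal Title": "Bad ISSN Journal"}],
    buf,
)
```

Rows that fail the check are silently dropped here; a real harvest would probably log them for review instead.
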
We will be selecting two developers for this project.

Skills: C Programming, Java, Perl, Ruby on Rails, Web Scraping

About the Employer:
(5 reviews) Windsor, United States

Project ID: #1608716

4 freelancers are bidding an average of $600 for this job


I already have this system built but I need to have the sites and output format

$500 USD in 3 days
(2 reviews)

I can start on this project

$400 USD in 20 days
(1 review)

Great to know about this task. I am interested and will do it for you.

$1000 USD in 23 days
(0 reviews)

Hi, I have experience of harvesting OAI data. Please see a demo as in the PMB. Thank you very much.

$500 USD in 30 days
(0 reviews)