Coder Needed For Complex Web Scraping Script

$100-500 USD

Cancelled
Posted about 14 years ago

Paid on delivery
This will be a multi-part script that will:

1. record the project name and data fields
2. learn data locations via a web-based interactive script
3. retrieve data automatically
4. report any errors in the retrieval process

The scraped data will need to be incorporated into a MySQL database for data extraction by an existing website. New pages will be needed for this as a secondary project.

## Deliverables

I need a complex web scraper built for me. I say complex because it will be required to pull data from many websites with different layouts.

The first task for the winning bidder will be to create an input file of URLs from my existing database. Each of these URLs will be the home page of one of the sites we will be collecting data from. Creating this input file should be very simple, as my current website displays these URLs on one of my pages. The file will contain two pieces of data: the website's unique number and the URL of the website's home page.

There will be four parts to the actual scraper script.

The first part will work with a user to name a project and all of the data fields that will need to be captured. For my first project with the script that you will build, there might be 8 to 12 pieces of data to collect from each site, and they may reside on multiple pages. Each of these data fields will need a unique name. So, I might call the project "toy prices" and the 8 data fields might be "mattel-truck", "Hess-truck", "dump-truck", etc.

The second part of the script will work as a web-based interactive program. In this part, each data field's location at every website in the input file (both the URL and the exact location on the page) will be recorded by the script with the help of a user. The script will start by reading from the input file of URLs one at a time and will display the home page of the first site in a work box on the user's screen.
By "work box" I mean that part of the screen will be for the user to communicate with the script (like the header and left-hand column), while the rest of the screen will show the actual website (its URL and data). The user will then go through each of the data fields needed from this site, one by one, and define the URL and exact page location on the screen so that the script can record this information for each field for later automatic retrieval in part three of the script. To do this, the user must be able to change the URL (navigate away from the home page) to reach the URL where the data resides. The user will select each of the data fields (maybe they will all show in the left-hand column of the user's screen) one at a time and then highlight (select) the data on the website. From the user's highlighting, the script must be able to record each data field's exact position, so that in the end, for every data field at every website we want to collect data from, the script will learn and create a record.

The record layout will look something like this:

Positions:

- 1-6: website unique number
- 7-29: data-field-1-name
- 30-60: data-field-1-name-description (text/decimal/size)
- 61-90: data-field-1-name-url
- 91-119: data-field-1-name-page-location (starting row/column)
- 121-130: current date of data collection
- 131-140: exact time of data capture
- 141-150: data-field-1-data
- 151-180: error-message-if-any (blank if none)

So, if there were 1,000 websites to collect data from and 8 pieces of data to collect from each, we should have 8,000 records in the project file. Each record shows the exact location of a piece of data and the data itself, along with any error message there might be if the data could not be collected (for example, the URL was no good, or the data was supposed to be decimal but the script found text). All of these 8,000 records will be written during the user-interactive second section of the script.
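The fixed-width record layout above could be read and written along these lines. This is only a sketch under stated assumptions: the short field names (`site_id`, `field_name`, and so on) are hypothetical labels for the columns in the posting, position 120 is treated as an unused filler column because the layout skips it, and values longer than their column are simply truncated.

```python
# Sketch only: column names are hypothetical; positions come from the posting.
# Positions are 1-based and inclusive, as written in the layout.
FIELDS = [
    ("site_id", 1, 6),
    ("field_name", 7, 29),
    ("field_type", 30, 60),       # text/decimal/size
    ("url", 61, 90),
    ("page_location", 91, 119),   # starting row/column
    ("capture_date", 121, 130),   # position 120 is left blank (assumption)
    ("capture_time", 131, 140),
    ("data", 141, 150),
    ("error", 151, 180),          # blank if none
]
RECORD_LENGTH = 180


def format_record(values):
    """Build one 180-character record from a dict of column values."""
    buf = [" "] * RECORD_LENGTH
    for name, start, end in FIELDS:
        width = end - start + 1
        # Truncate or pad each value so it fills its column exactly.
        text = str(values.get(name, ""))[:width].ljust(width)
        buf[start - 1:end] = text
    return "".join(buf)


def parse_record(line):
    """Split one fixed-width record back into a dict of stripped values."""
    line = line.rstrip("\n").ljust(RECORD_LENGTH)
    return {name: line[start - 1:end].strip() for name, start, end in FIELDS}
```

A record built with `format_record` and passed through `parse_record` returns the same values, provided each one fits in its column. The 30-character URL column is the one most likely to truncate real data.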
Also, with this file you can see how we could selectively go out and scrape the data for just one website, or go to every website and gather just data-field-2 from all of them, and so on. The script will be able to do this because section three, the automated retrieval of the data, will first read an input record containing the information it uses to determine exactly what to do. This auto-update section will need to be a cron-type job.

The third section's auto-update record will look something like this:

Positions:

- 1-9: starting website number
- 10-20: ending website number
- 30: if position 30 has a 1 in it, get data-field-1; if it is zero, do not
- 31: if position 31 has a 1 in it, get data-field-2; if it is zero, do not
- 32: if position 32 has a 1 in it, get data-field-3; if it is zero, do not
- 33-38: likewise, all the way through data-field-8

From this record we can see that if the starting website number is 1, the ending number is equal to the last website, and all of the data-field characters are set to 1, then the script will go and retrieve all 8 data fields from all 1,000 websites.

The fourth section of the script will be the exception reporting. During the auto-updating cycle, any time the script encounters an error, a message should be written on the record as well as to an error report. This error report will describe the error as well as possible so that a user can use section two of the script to correct the defined position for the error that was encountered.
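As an illustration of how the auto-update control record might drive part three, the sketch below parses one control record and lists the (website number, data-field number) pairs to retrieve. It rests on assumptions: the function names are hypothetical, data-field n is taken to sit at position 29 + n (so eight fields occupy positions 30 through 37), and positions 21-29 are treated as unused.

```python
# Sketch only: function names and the exact flag positions are assumptions.
def parse_control_record(line, num_fields=8):
    """Read the site range and per-field flags from one control record."""
    line = line.rstrip("\n").ljust(29 + num_fields)
    start_site = int(line[0:9])    # positions 1-9
    end_site = int(line[9:20])     # positions 10-20
    flags = line[29:29 + num_fields]  # positions 30 onward, one per field
    wanted = [i + 1 for i, ch in enumerate(flags) if ch == "1"]
    return {"start_site": start_site, "end_site": end_site, "fields": wanted}


def select_jobs(control, site_ids):
    """Return (site, field number) pairs that the control record asks for."""
    return [
        (site, field)
        for site in site_ids
        if control["start_site"] <= site <= control["end_site"]
        for field in control["fields"]
    ]


if __name__ == "__main__":
    # Sites 1 through 1000, data-field-1 and data-field-3 only.
    record = "000000001" + "00000001000" + " " * 9 + "10100000"
    control = parse_control_record(record)
    for site, field in select_jobs(control, [1, 2, 1001]):
        print(site, field)
```

A cron job could run a script like this on a schedule and pass each selected pair to the retrieval step, writing any failure into the error column of the matching project record.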
Project ID: 3053946

About the project

6 proposals
Remote project
Active 14 years ago

6 freelancers are bidding an average of $300 USD for this job

- $425 USD in 14 days; 4.9 (59 reviews); 7.0; see private message.
- $85 USD in 14 days; 3.0 (33 reviews); 4.8; see private message.
- $382.50 USD in 14 days; 5.0 (2 reviews); 1.9; see private message.
- $313.65 USD in 14 days; 2.8 (8 reviews); 1.3; see private message.
- $340 USD in 14 days; 0.0 (0 reviews); 0.0; see private message.
- $255 USD in 14 days; 0.0 (0 reviews); 0.0; see private message.

About the client

Oakland, United States
5.0
97
Payment method verified
Member since Sep 21, 2006

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)