In Progress

PHP website crawler

Hi,

I need a programmer to set up a crawler for 9 daily deal sites and crawl 7 values on each site.

You must use this class and the functions inside it wherever possible to crawl the websites: [url removed, login to view]. Each site crawler should be placed in its own file and just include the parser class, like this:

<?php

include_once('../[url removed, login to view]');

// parser stuff

?>

It means the folder/file structure will be like this:

- [url removed, login to view]

- [url removed, login to view]

- Image (folder)

- Crawlers (folder)

--- [url removed, login to view]

--- [url removed, login to view]

--- web....

The script will crawl the site when you visit [url removed, login to view] and the crawled content should go in a database like this:

CREATE TABLE `deal_crawler` (

`deal_id` int(11) NOT NULL AUTO_INCREMENT,

`site` varchar(64) NOT NULL,

`text` text NOT NULL,

`dealprice` int(11) NOT NULL,

`dealvalue` int(11) NOT NULL,

`endtime` datetime NOT NULL,

`picture` varchar(64) NOT NULL,

`url` varchar(256) NOT NULL,

`lastsucces` datetime NOT NULL,

PRIMARY KEY (`deal_id`)

) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=2 ;

INSERT INTO `deal_crawler` VALUES(1, '[url removed, login to view]', 'This is todays deal...', 100, 200, '2011-05-10 23:31:47', '[url removed, login to view]', '[url removed, login to view]', '2011-05-02 23:31:47');

The fields contain:

deal_id

– unique id

site

- static name, defined in a variable at the top of the crawler PHP file

text

– text from the site

dealprice

– price from the site

dealvalue

– full price of the deal

endtime

– the end time (sometimes it has to be calculated from the “time remaining” countdown, sometimes the actual end time can be grabbed from the page code. Please make sure the time is correct Danish time [[url removed, login to view]])

picture

– the picture will be stored in the Image folder with the deal_id as its name; this field will then contain [url removed, login to view], [url removed, login to view] or something similar.

url

– the deals url.

lastsucces

- If nothing goes wrong when crawling (no PHP error and no field left improperly filled), this field should be updated each time the crawler finds that the deal is still active on the site's front page.
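As an illustration of the endtime field described above, here is a minimal sketch of turning a countdown into an absolute end time. It assumes that "Danish time" means the Europe/Copenhagen time zone and that the countdown is shown as `HH:MM:SS`; real sites may use other formats. The function names `countdown_to_seconds()` and `endtime_from_remaining()` are hypothetical and not part of the supplied parser class.

```php
<?php
// Hypothetical helper: parse a countdown such as "12:30:05" into seconds.
// The HH:MM:SS format is an assumption; each site may need its own pattern.
function countdown_to_seconds(string $countdown): int
{
    if (!preg_match('/^(\d+):(\d{2}):(\d{2})$/', trim($countdown), $m)) {
        throw new InvalidArgumentException("Unrecognised countdown: $countdown");
    }
    return (int) $m[1] * 3600 + (int) $m[2] * 60 + (int) $m[3];
}

// Hypothetical helper: add the remaining seconds to "now" and format the
// result for the endtime DATETIME column in Danish local time.
function endtime_from_remaining(int $secondsLeft, ?DateTimeImmutable $now = null): string
{
    // Assumption: "Danish time" is the Europe/Copenhagen zone (handles DST).
    $tz = new DateTimeZone('Europe/Copenhagen');
    $now = $now ?? new DateTimeImmutable('now', $tz);

    // Work on the Unix timestamp so daylight-saving changes cannot skew the result.
    $end = $now->setTimezone($tz)->setTimestamp($now->getTimestamp() + $secondsLeft);

    return $end->format('Y-m-d H:i:s');
}
```

When a site exposes the actual end time in its markup, that value can be stored directly after converting it to the same time zone, and the countdown helper is not needed.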

If a new deal is active, a new row should be inserted in the database.
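The insert-or-update rule above could be sketched roughly as follows. This is only one possible approach under stated assumptions: it uses PDO rather than whatever database layer the supplied class provides, the function name `save_deal()` is hypothetical, and a deal is treated as "still the same" when both `site` and `url` match, which the brief does not specify.

```php
<?php
// Hypothetical shared helper. Returns the deal_id on success, or null when a
// crawled field is missing, in which case nothing is written.
function save_deal(PDO $db, array $deal, string $now): ?int
{
    // Refuse to touch the table if any crawled field is missing or empty,
    // so lastsucces only moves forward after a clean crawl.
    foreach (['site', 'text', 'dealprice', 'dealvalue', 'endtime', 'url'] as $field) {
        if (!isset($deal[$field]) || $deal[$field] === '') {
            return null;
        }
    }

    // Assumption: site + url identifies a deal that is already stored.
    $find = $db->prepare('SELECT deal_id FROM deal_crawler WHERE site = ? AND url = ?');
    $find->execute([$deal['site'], $deal['url']]);
    $dealId = $find->fetchColumn();

    if ($dealId !== false) {
        // Deal is still active: only refresh lastsucces.
        $update = $db->prepare('UPDATE deal_crawler SET lastsucces = ? WHERE deal_id = ?');
        $update->execute([$now, $dealId]);
        return (int) $dealId;
    }

    // New deal: insert a fresh row. picture may be filled in afterwards.
    $insert = $db->prepare(
        'INSERT INTO deal_crawler
            (site, `text`, dealprice, dealvalue, endtime, picture, url, lastsucces)
         VALUES (?, ?, ?, ?, ?, ?, ?, ?)'
    );
    $insert->execute([
        $deal['site'], $deal['text'], $deal['dealprice'], $deal['dealvalue'],
        $deal['endtime'], $deal['picture'] ?? '', $deal['url'], $now,
    ]);

    return (int) $db->lastInsertId();
}
```

The timestamp is passed in rather than generated in SQL so the caller can supply Danish local time. The returned deal_id can then be used to name the picture file.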

I would highly appreciate it if you build functions/classes whenever the same code is used more than once, and place the new functions/classes in the [url removed, login to view] file. For example, the insert statement for the MySQL database can be a function, as can the transfer of the picture.
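As a sketch of what such a shared function might look like, here is a possible picture-transfer helper. The name `save_picture()`, its parameters, the list of accepted extensions and the `jpg` fallback are all assumptions made for illustration; fetching remote URLs with `file_get_contents()` also requires `allow_url_fopen` to be enabled.

```php
<?php
// Hypothetical helper: copy a deal picture into the image folder and name it
// after the deal_id. Returns the stored file name, or null on failure.
function save_picture(int $dealId, string $sourceUrl, string $imageDir): ?string
{
    // Take the extension from the URL path; fall back to jpg (an assumption).
    $path = parse_url($sourceUrl, PHP_URL_PATH);
    $ext = strtolower(pathinfo((string) $path, PATHINFO_EXTENSION));
    if (!in_array($ext, ['jpg', 'jpeg', 'png', 'gif'], true)) {
        $ext = 'jpg';
    }

    // Download the picture; @ suppresses the warning so failure is reported
    // through the return value instead.
    $data = @file_get_contents($sourceUrl);
    if ($data === false || $data === '') {
        return null;
    }

    $filename = $dealId . '.' . $ext;
    if (file_put_contents(rtrim($imageDir, '/') . '/' . $filename, $data) === false) {
        return null;
    }

    return $filename;
}
```

The returned file name is what would go into the picture column; a null result can be treated as a failed crawl so that lastsucces is not updated.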

Furthermore, the code has to be well commented.

You will have to make the crawler work 3 days in a row before your work is done, just to make sure it does not only work with today's deal. If the site changes sometime in the future, I will of course pay you (or someone else) again.

A quick deadline and a low price are essential.

----> Please see the attached file for a detailed description of the data that needs to be crawled. <--- (The PDF and the Doc file have the same content)

Best regards

Peter

Skills: MySQL, PHP


About the Employer:
(38 reviews) København Ø, Denmark

Project ID: #1045703