In Progress

126907 Web spider/crawler

I need a spider that starts crawling from the URL I enter and then crawls every website it finds.

It must have the following features:

-TLD filter: it must visit only domains under the TLDs I specify (e.g. visit only .com or visit only .net).

-It must not visit subdomains.

-It must save only the domain names it finds and visits. I don't need web content indexed, just the domain names.

-I must be able to set a limit on the number of pages visited per website.

-It must be fast and index thousands of domain names per day. It must run 24/7, automatically and indefinitely.

-It must save results to an external MySQL database, so that the spider can run on several computers at once.

-I must be able to set exclusion filters (e.g. don't follow [url removed, login to view], [url removed, login to view], etc.).

-I don't need a graphical interface; it can run from the command line.
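The TLD, subdomain and exclusion rules above can be combined into one host check. This is a minimal Python sketch, not a specified design: the function name `site_domain`, the sample settings `ALLOWED_TLDS` and `EXCLUDED_DOMAINS`, treating `www.` as the bare domain rather than a subdomain, and matching exclusions against the whole domain are all assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical settings: the client chooses the real TLDs and exclusions.
ALLOWED_TLDS = {"com", "net"}
EXCLUDED_DOMAINS = {"example-excluded.com"}


def site_domain(url, allowed_tlds=ALLOWED_TLDS, excluded=EXCLUDED_DOMAINS):
    """Return the bare domain of url if the spider may visit it, else None.

    Assumptions: a leading "www." counts as the bare domain (not a subdomain),
    and exclusions are matched against the whole bare domain.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is lowercased
    if host.startswith("www."):
        host = host[len("www."):]
    # Check longer suffixes first so "co.uk" wins over "uk" if both are listed.
    for tld in sorted(allowed_tlds, key=len, reverse=True):
        suffix = "." + tld.lower().lstrip(".")
        if host.endswith(suffix):
            label = host[: -len(suffix)]
            # Exactly one label before the TLD: "blog.example.com" is rejected.
            if label and "." not in label and host not in excluded:
                return host
            return None
    return None
```

The sketch does not consult the Public Suffix List, so second-level registries such as .co.uk only work if they are listed explicitly in `ALLOWED_TLDS`.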
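The per-website page limit can be enforced in the crawl loop with one counter per domain. A minimal sketch under stated assumptions: `crawl`, `fetch_links` and `domain_of` are hypothetical names, the HTTP fetch and link extraction are passed in as a function so the loop can be tested offline, and the in-memory queue and sets stand in for the shared MySQL storage a 24/7, multi-machine deployment would use.

```python
from collections import defaultdict, deque
from urllib.parse import urldefrag, urljoin


def crawl(start_url, fetch_links, domain_of, max_pages_per_site=10, max_total=None):
    """Breadth-first crawl that fetches at most max_pages_per_site pages per domain.

    fetch_links(url): hypothetical callback returning the href strings on a page.
    domain_of(url): hypothetical callback returning the domain to count the page
    against, or None if the URL fails the TLD/subdomain/exclusion filters.
    max_total: optional overall cap; None keeps going until the frontier is empty.
    """
    pages_per_domain = defaultdict(int)  # pages fetched so far, per domain
    seen_urls = {start_url}              # URLs already queued, to avoid repeats
    frontier = deque([start_url])        # FIFO queue gives breadth-first order
    visited_domains = set()              # what a real crawler would write to MySQL
    fetched = 0
    while frontier and (max_total is None or fetched < max_total):
        url = frontier.popleft()
        domain = domain_of(url)
        if domain is None or pages_per_domain[domain] >= max_pages_per_site:
            continue  # filtered out, or this site's page budget is used up
        pages_per_domain[domain] += 1
        visited_domains.add(domain)
        fetched += 1
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments so a page is queued once.
            link = urldefrag(urljoin(url, href)).url
            if link not in seen_urls:
                seen_urls.add(link)
                frontier.append(link)
    return visited_domains
```

A version that runs forever would pull its frontier from a database table instead of returning when the local queue empties; keeping the queue in memory here just makes the page-limit logic easy to check.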
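Several machines can share one results table safely if the database itself rejects duplicate domains, so no crawler needs to check first. This sketch uses Python's built-in `sqlite3` module purely as a stand-in so it runs without a server; the table name `domains` and the functions `open_db` and `save_domain` are hypothetical. In MySQL the same idea is a UNIQUE key on the domain column plus `INSERT IGNORE`, with the column declared as a VARCHAR, since a UNIQUE index on a TEXT column needs a prefix length.

```python
import sqlite3

# sqlite3 stands in for MySQL here; the MySQL equivalent of the insert below is
# INSERT IGNORE INTO domains (name) VALUES (%s).
SCHEMA = """
CREATE TABLE IF NOT EXISTS domains (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    first_seen TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_db(path=":memory:"):
    """Open (or create) the results database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_domain(conn, name):
    """Store name once; return True if this call added it, False if it existed."""
    cur = conn.execute("INSERT OR IGNORE INTO domains (name) VALUES (?)", (name,))
    conn.commit()
    # rowcount is 1 for a new row and 0 when the UNIQUE constraint ignored it.
    return cur.rowcount == 1
```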

Please contact me for more info: eraser [ a t ] [url removed, login to view]

Skills: .NET, Anything Goes, C Programming, MySQL


About the Employer:
(0 reviews)

Project ID: #1873075