I'm looking to create a web spider/crawler that will crawl and index any websites I specify in order to track changes. Specifically, my goal is to track target websites closely enough that I will know when a page has changed or a new page has been added.
While I'm completely open to suggestions, I was thinking the best way to do it would be to have the spider visit the target site. When the spider crawls, it will:
1. Mark any new URLs it finds.
2. Mark any changes to pages found in the previous crawl. To detect a change, the spider compares each page's file size with the size recorded during the previous crawl.
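The two steps above could be sketched in Python roughly as follows. This is a minimal sketch, not a finished spider, and it rests on several assumptions: the crawl stays on the starting host, "file size" means the byte length of the downloaded page body, and the names `crawl`, `diff_crawls`, `http_fetch`, `LinkExtractor` and `max_pages` are hypothetical. Fetching uses only the standard library's `urllib.request`, and `fetch` is a parameter so the crawl can be tested without a network.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url: str) -> Optional[bytes]:
    """Download a page; return None on network errors or malformed URLs."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except (OSError, ValueError):  # URLError/HTTPError subclass OSError
        return None


def crawl(start_url: str,
          fetch: Callable[[str], Optional[bytes]] = http_fetch,
          max_pages: int = 500) -> Dict[str, int]:
    """Breadth-first crawl restricted to start_url's host.

    Returns {url: size of the page body in bytes}.
    """
    host = urlparse(start_url).netloc
    sizes: Dict[str, int] = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(sizes) < max_pages:
        url = queue.popleft()
        body = fetch(url)
        if body is None:  # fetch failed; skip this URL
            continue
        sizes[url] = len(body)
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sizes


def diff_crawls(previous: Dict[str, int],
                current: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (new_urls, changed_urls) using the size-based rule."""
    new_urls = sorted(u for u in current if u not in previous)
    changed_urls = sorted(
        u for u in current if u in previous and current[u] != previous[u]
    )
    return new_urls, changed_urls
```

One caveat of the size rule as described: an edit that leaves the page length unchanged goes unnoticed. Storing a content hash (for example with `hashlib.sha256`) instead of, or alongside, the size would catch those edits; that is a substitute technique, not something the post asks for.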
Then there would be a way for me to generate an exportable (CSV) report of new pages and altered pages on that site.
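A CSV report like the one described could be produced with Python's built-in `csv` module. This is a sketch that assumes the new and changed URLs arrive as two plain lists; the column names `url` and `status` and the function names `write_report` and `report_as_text` are hypothetical choices.

```python
import csv
import io
from typing import Iterable, TextIO


def write_report(new_urls: Iterable[str], changed_urls: Iterable[str],
                 out: TextIO) -> None:
    """Write one row per URL, tagged 'new' or 'changed'."""
    writer = csv.writer(out)
    writer.writerow(["url", "status"])
    for url in new_urls:
        writer.writerow([url, "new"])
    for url in changed_urls:
        writer.writerow([url, "changed"])


def report_as_text(new_urls: Iterable[str], changed_urls: Iterable[str]) -> str:
    """Convenience wrapper returning the CSV as a string (e.g. for a web download)."""
    buf = io.StringIO()
    write_report(new_urls, changed_urls, buf)
    return buf.getvalue()
```

To save the report to disk instead, open the file with `newline=""` as the `csv` documentation recommends: `with open("report.csv", "w", newline="") as f: write_report(new, changed, f)`.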
I'm also aware of the list of open-source web crawlers at [url removed, login to view]; you can use one of those if you're able to modify it to meet my needs and requirements.
I'm open to any type of setup: ideally this would be completely web-based, but a desktop setup is fine if necessary.
Awarded to:
21 freelancers are bidding an average of $422 for this job
I can create this custom web spider to scrape the website and maintain logs of changed and added links. I can complete this work in 4 days. Thanks, Suresh
Hi sir. I'm an experienced developer of dynamic websites. I'm able to write the crawler you want based on PHP-Crawler, so it would be done in PHP and MySQL. Best regards.
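Comparing against "the previous crawl" means each crawl's results have to be stored somewhere, which is what the MySQL database in this bid would be for. The following is a minimal sketch of that storage layer, using Python's built-in `sqlite3` module as a stand-in for MySQL; the `pages` table, its columns, and the integer `crawl_id` numbering are assumptions made for illustration.

```python
import sqlite3
from typing import Dict, Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database holding one row per page per crawl."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " crawl_id INTEGER NOT NULL,"
        " url TEXT NOT NULL,"
        " size INTEGER NOT NULL,"
        " PRIMARY KEY (crawl_id, url))"
    )
    return conn


def save_crawl(conn: sqlite3.Connection, crawl_id: int,
               sizes: Dict[str, int]) -> None:
    """Store a {url: size} snapshot; the with-block commits on success."""
    with conn:
        conn.executemany(
            "INSERT INTO pages (crawl_id, url, size) VALUES (?, ?, ?)",
            [(crawl_id, url, size) for url, size in sizes.items()],
        )


def load_crawl(conn: sqlite3.Connection, crawl_id: int) -> Dict[str, int]:
    """Read a stored snapshot back as {url: size}."""
    rows = conn.execute(
        "SELECT url, size FROM pages WHERE crawl_id = ?", (crawl_id,)
    )
    return dict(rows.fetchall())


def latest_crawl_id(conn: sqlite3.Connection) -> Optional[int]:
    """Return the highest stored crawl id, or None if nothing is stored yet."""
    return conn.execute("SELECT MAX(crawl_id) FROM pages").fetchone()[0]
```

Typical use: look up `latest_crawl_id`, load that snapshot, run a new crawl, save it under the next id, and compare the two snapshots to find new and changed pages.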