Write a simple Web Crawler Manager with a Windows Forms or WPF interface.
- It should be possible to create a job by specifying: the site name; the starting URLs for the crawl (a simple text area, one URL per line); the URL patterns to download (one per line; if the list is not empty, use a CONTAINS match to decide whether a page should be downloaded to disk); the crawl depth; the folder to download the HTML to; and the retry count.
- It should be possible to pause and resume any job. A job should also pause automatically if the internet connection drops or the program is closed. When a job has finished, it should be possible to run it again. It should also be possible to delete a job.
- It should be possible to specify a proxy list and turn it on or off. When it is on, every request uses a different proxy from the list.
- Must be fast.
- Must be multithreaded.
- Must be written in C#.
- You may use any open-source code, such as the NCrawler or Abot frameworks, to complete the job.
- Downloaded HTML files can follow a file name pattern such as '[site name]-[date].htm'. You can also suggest an alternative.
- The application must have a job list showing which jobs are running, with a brief status for each.
- The application must have a simple text area that shows any relevant log/console messages, for example Starting..., Downloading..., Stopping...
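
To make a few of the requirements above concrete, here is a minimal sketch of the CONTAINS filter, the file name pattern, and the proxy rotation. All class and method names (`CrawlRules`, `ProxyRotator`, and so on) are hypothetical and do not come from NCrawler or Abot. The case-insensitive match, the sequence number added to the file name, and the round-robin order are assumptions, not part of the requirements.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading;

// Hypothetical helpers; names are illustrative only.
public static class CrawlRules
{
    // CONTAINS rule: an empty pattern list means every page is saved.
    // Assumption: the match ignores case.
    public static bool ShouldDownload(string url, IList<string> patterns)
    {
        if (patterns == null || patterns.Count == 0) return true;
        foreach (string pattern in patterns)
        {
            if (url.IndexOf(pattern, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }

    // Builds '[site name]-[date]-[sequence].htm'.
    // Assumption: a per-job sequence number is appended so that several
    // pages saved on the same day do not overwrite each other.
    public static string BuildFileName(string siteName, DateTime date, int sequence)
    {
        return siteName + "-"
            + date.ToString("yyyyMMdd", CultureInfo.InvariantCulture) + "-"
            + sequence.ToString("D6", CultureInfo.InvariantCulture) + ".htm";
    }
}

public sealed class ProxyRotator
{
    private readonly IList<string> _proxies;
    private int _next = -1;

    public ProxyRotator(IList<string> proxies)
    {
        _proxies = proxies;
        Enabled = true;
    }

    // The on/off switch from the requirements.
    public bool Enabled { get; set; }

    // Thread-safe round robin. Returns null when rotation is off or the
    // list is empty, meaning "send the request without a proxy".
    public string NextAddress()
    {
        if (!Enabled || _proxies.Count == 0) return null;
        int index = Interlocked.Increment(ref _next);
        // Clear the sign bit so the index stays valid after int overflow.
        return _proxies[(index & int.MaxValue) % _proxies.Count];
    }
}
```

With at least two entries in the list, consecutive calls to `NextAddress()` return different proxies. The returned address could be wrapped in `new WebProxy(address)` and assigned to the `Proxy` property of an `HttpWebRequest` before the request is sent.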
Our system has .NET 4 and Microsoft SQL Server Express.
We need a working sample with clean code, including all C# source files, that can index <[url removed, login to view]> to a link depth of 3 and that can be paused and resumed. Pause and resume must also work when the internet connection drops and reconnects, and when the program is closed and reopened.
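
One possible way to support resuming after a restart is to persist the crawl frontier (the URLs still to fetch, with their depth) together with the set of URLs already seen. The sketch below is only an illustration under stated assumptions: `JobState`, `PendingUrl`, and the tab-separated file layout are hypothetical, the class is not thread-safe (callers would need a lock), and the same data could equally be kept in a SQL Server Express table.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical types; names and file layout are illustrative only.
public sealed class PendingUrl
{
    public string Url;
    public int Depth;
}

public sealed class JobState
{
    public readonly Queue<PendingUrl> Pending = new Queue<PendingUrl>();
    public readonly HashSet<string> Seen = new HashSet<string>();

    // Queues a URL only if it is new and within the depth limit
    // (the posting asks for a link depth of 3).
    public bool TryEnqueue(string url, int depth, int maxDepth)
    {
        if (depth > maxDepth || !Seen.Add(url)) return false;
        Pending.Enqueue(new PendingUrl { Url = url, Depth = depth });
        return true;
    }

    // Writes one record per line: "S<tab>url" or "P<tab>depth<tab>url".
    // Assumption: URLs contain no raw tab characters.
    public void Save(string path)
    {
        var lines = new List<string>();
        foreach (string url in Seen) lines.Add("S\t" + url);
        foreach (PendingUrl p in Pending) lines.Add("P\t" + p.Depth + "\t" + p.Url);

        // Write to a temporary file first so that a crash during the
        // write does not leave a half-written state file behind.
        string temp = path + ".tmp";
        File.WriteAllLines(temp, lines.ToArray());
        if (File.Exists(path)) File.Delete(path);
        File.Move(temp, path);
    }

    // Returns an empty state when no file exists yet (a fresh job).
    public static JobState Load(string path)
    {
        var state = new JobState();
        if (!File.Exists(path)) return state;
        foreach (string line in File.ReadAllLines(path))
        {
            string[] parts = line.Split('\t');
            if (parts.Length == 2 && parts[0] == "S")
                state.Seen.Add(parts[1]);
            else if (parts.Length == 3 && parts[0] == "P")
                state.Pending.Enqueue(
                    new PendingUrl { Url = parts[2], Depth = int.Parse(parts[1]) });
        }
        return state;
    }
}
```

A job could call `Save` whenever it is paused and when the main window closes, then call `Load` on startup to continue from the saved queue. For the disconnect case, `NetworkInterface.GetIsNetworkAvailable()` and the `NetworkChange.NetworkAvailabilityChanged` event in `System.Net.NetworkInformation` are one way to detect when to pause and resume.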
Awarded to:
17 freelancers are bidding an average of $438 for this job
Hi sir, I am a scraping expert and have done many similar projects; please check my feedback and you will see. Can you give me more details? Then I will provide demo data for you. Thanks, Kimi