I would like a script written to crawl (spider) the web for full content (aka full text) RSS feeds. The main goal of the project is to find full content feeds and to keep discovering fresh ones on an ongoing basis. Sources for finding these feeds include search engine results, blog communities, blog directories, and blog XML-RPC update services. The script and its features should also be easy to configure.
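To illustrate what "full content" could mean in practice, here is a rough Python sketch of one possible heuristic, not a required design. It treats a feed as full content when most entries carry a long body in `content:encoded`, `description`, or Atom `content`. The function names, the 500 character threshold, and the 0.8 ratio are all assumptions made up for this example, and a real implementation would likely need something more robust (RSS 1.0/RDF feeds and XHTML Atom content are not handled here).

```python
import re
import xml.etree.ElementTree as ET

# Namespaces for the RSS "content" module (content:encoded) and for Atom.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Assumed threshold: an entry with at least this much text counts as "full".
MIN_FULL_TEXT_CHARS = 500

TAG_RE = re.compile(r"<[^>]+>")


def text_length(html):
    """Length of the text after stripping tags and collapsing whitespace."""
    text = TAG_RE.sub(" ", html or "")
    return len(" ".join(text.split()))


def entry_bodies(root):
    """Yield the longest body candidate for each RSS item or Atom entry."""
    for item in root.iter("item"):
        candidates = [item.findtext(CONTENT_NS + "encoded"),
                      item.findtext("description")]
        yield max((c or "" for c in candidates), key=len)
    for entry in root.iter(ATOM_NS + "entry"):
        candidates = [entry.findtext(ATOM_NS + "content"),
                      entry.findtext(ATOM_NS + "summary")]
        yield max((c or "" for c in candidates), key=len)


def looks_full_content(feed_xml, min_chars=MIN_FULL_TEXT_CHARS, min_ratio=0.8):
    """Guess whether a feed is full content (heuristic, see assumptions above)."""
    root = ET.fromstring(feed_xml)
    lengths = [text_length(body) for body in entry_bodies(root)]
    if not lengths:
        return False
    full = sum(1 for n in lengths if n >= min_chars)
    return full / len(lengths) >= min_ratio
```

A crawler could call `looks_full_content(xml_text)` on each candidate feed and keep only the ones that return `True`.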
Here are some of the required features.
- Executed from the command line
- Threaded or Forked processes
- Multiple Network Interface Support
- Output RSS feed links
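As a rough idea of how the required features could fit together, here is a minimal Python sketch, offered only as one possible shape for a proposal. It takes URLs on the command line, fetches them in a thread pool, and rotates outgoing connections across local IP addresses (one per network interface) via the `source_address` argument of `http.client`. The option names `--workers` and `--source-ip`, the user agent string, and all function names are hypothetical. Feed discovery itself is left out, so this version only prints the URLs it managed to fetch.

```python
import argparse
import http.client
import itertools
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def fetch(url, source_ip=None, timeout=10):
    """Fetch a URL, optionally binding the outgoing socket to source_ip."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    source = (source_ip, 0) if source_ip else None  # port 0 = any free port
    conn = conn_cls(parts.netloc, timeout=timeout, source_address=source)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": "feed-finder-sketch"})
        return conn.getresponse().read()
    finally:
        conn.close()


def crawl(urls, source_ips, workers, fetcher=fetch):
    """Fetch urls concurrently, rotating over source_ips; map url -> body or None."""
    ips = itertools.cycle(source_ips or [None])
    jobs = [(url, next(ips)) for url in urls]

    def run(job):
        url, ip = job
        try:
            return url, fetcher(url, ip)
        except (OSError, http.client.HTTPException):
            return url, None  # treat network and protocol errors as a failed fetch

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run, jobs))


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Feed crawler sketch")
    parser.add_argument("urls", nargs="+", help="seed URLs to fetch")
    parser.add_argument("--workers", type=int, default=8)
    parser.add_argument("--source-ip", action="append", default=[],
                        dest="source_ips", help="local IP to bind (repeatable)")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    for url, body in crawl(args.urls, args.source_ips, args.workers).items():
        print(url if body is not None else f"# failed: {url}")
```

Saved to a file that calls `main()` from a `__main__` guard, this could be run with several seed URLs and one `--source-ip` option per interface. Threads are used here because the work is I/O bound; forked processes would be an equally valid choice.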
Here are some of the optional features.
- Categorization of Feeds (for instance, by the search engine query or the blog directory category through which a feed was found)
- MySQL Support
- Web Administration Interface
- HTTP Proxy Support
- Input Source Management
I am open to any ideas for this project and will consider both simple and complex proposals. Please send me your proposal along with your bid.