Personal web spider
Budget: $100-600 USD
Job Description:
I'd like to have a basic 'personal search spider' created. It would be used to help me manage academic research work. It should have the ability to:
1. Accept and store a history of search terms (one or a few words).
2. Be set up to search for the entered search terms, by:
a. Executing searches in Google, and then looking at the first n (set by user) sites that match - and then saving those sites that match (AND contain the key words) in a table/XML results file.
b. Looking within previously defined folders (and their sub-folders) on the user's local hard drive - and then saving those documents that either contain the search words in their names OR within the documents themselves (e.g., TXT, DOC, RTF) in a table/XML results file.
c. Ideally (optional), the system should be able to look for changes/additions in specified blogs that are new entries AND match the search terms, adding these entries to the table/XML results file.
d. Also ideally (optional), the system should be able to look for updates/changes in colleagues' MySpace or Facebook pages. These changes may or may not have to match the search terms. It's just nice to have all of the information I would research, or look up, updated in one place.
3. During the search process, it should flag or somehow mark sites or documents that are new or have been recently changed (not sure how to determine that). These marks should be saved with the entries in the table/XML results file.
4. The software should have a simple interface, and be able to display its status (searching, idle, or ??) when it is running.
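One possible way to approach the "new or recently changed" flag in item 3 is to store a hash of each page's or document's content and compare it on the next run. The sketch below is only an illustration under that assumption; `ChangeDetector`, `Status`, and `classify` are hypothetical names, and a real solution would persist the hashes alongside the table/XML results file rather than keep them in memory.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: labels a site or document as NEW, CHANGED, or UNCHANGED
// by comparing a SHA-256 hash of its content with the hash seen on the previous run.
class ChangeDetector {
    enum Status { NEW, CHANGED, UNCHANGED }

    // Maps a URL or file path to the content hash recorded last time.
    private final Map<String, String> lastSeen = new HashMap<>();

    // Records the current hash and reports how it compares with the previous one.
    Status classify(String key, byte[] content) {
        String hash = sha256Hex(content);
        String previous = lastSeen.put(key, hash);
        if (previous == null) {
            return Status.NEW;
        }
        return previous.equals(hash) ? Status.UNCHANGED : Status.CHANGED;
    }

    // Hex-encoded SHA-256 digest, zero-padded to 64 characters.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return String.format("%064x", new BigInteger(1, md.digest(data)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

For local files, the last-modified timestamp could serve as a cheaper first check; for web pages, hashing only the extracted text (rather than the raw HTML) would likely avoid false "changed" flags caused by rotating ads or timestamps.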
When bidding, PLEASE let me know whether or not you can do the optional items #2.c and #2.d above.
Finally, multi-threaded solutions would be nice.
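To make the local-folder requirement (item 2.b) and the multi-threading wish more concrete, here is a rough Java sketch of what such a search might look like. Everything in it is an assumption for illustration: the class name `LocalFolderSearch`, the case-insensitive matching, and the fixed-size thread pool. It only reads inside plain-text (`.txt`) files; DOC and RTF files are matched by name only, since reading their contents would need a document parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: finds files under a folder whose name or text content contains a term.
class LocalFolderSearch {

    // True if the file name contains the term, or (for .txt files only) the file text does.
    static boolean matches(Path file, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        if (name.contains(needle)) {
            return true;
        }
        if (!name.endsWith(".txt")) {
            return false; // DOC/RTF contents would need a dedicated parser
        }
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return text.toLowerCase(Locale.ROOT).contains(needle);
        } catch (IOException e) {
            return false; // unreadable file: skip it in this sketch
        }
    }

    // Walks the folder and its sub-folders, then checks the files in parallel.
    static List<Path> search(Path root, String term, int threads)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (Path f : files) {
                results.add(pool.submit(() -> matches(f, term)));
            }
            List<Path> hits = new ArrayList<>();
            for (int i = 0; i < files.size(); i++) {
                if (results.get(i).get()) {
                    hits.add(files.get(i));
                }
            }
            return hits;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `LocalFolderSearch.search(Paths.get("C:/Research"), "graph theory", 4)` (the path here is just an example) would return the matching files, which could then be written to the table/XML results file.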
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
This spider would run on a user's local computer (as a stand-alone app or as a browser plug-in), and needs to be written in Java or C++ or Flash.