We are interested in a script (Cold Fusion preferred, PHP as an alternative) running on an IIS web server that can:
1. index, and regularly re-index to pick up new additions, a relatively large number of PDF files (PDF/A format, OCR processed)
2. the PDF files are placed (by FTP; the script does not need to handle this) in a series of folders and sub-folders on the same server. The script should have an administration section where you select all the folders/sub-folders that contain the PDF files
3. the search algorithm must be very efficient and, if possible, allow partial-match searches (e.g. wildcards such as "*"). Simplistic or inefficient search implementations won't be considered valid
4. the front end should have a search box and a results display. Results should be presented Google-style, and the layout should be attractive and easy to modify, if necessary.
5. the number of results per page (10 by default) should be easy to change, possibly by the user on the front end. When there are many results, links to further pages should be provided.
6. results should be sorted by relevance
7. the searched keyword(s) inside the listed results should be highlighted.
8. in the results list, clicking the linked file name should open the related PDF file (in Acrobat Reader, for example). Inside the PDF document the searched keyword(s) should be highlighted, if possible.
9. consider also the presence of a separate Access or Excel file containing a few fields, such as:
* file name (the PDF document placed in the folders)
* source name
* (possibly a couple of other fields)
When a keyword search finds a PDF document, the script should also query that separate Access/Excel file (uploaded to the server by FTP, done separately), read the additional data for the found document(s) and display it in the results page.
For example, if the search finds the PDF file [url removed, login to view], the script should look in that Access or Excel file and see which other data fields are associated with [url removed, login to view]
In such a case, the published result could look something like this:
[Date] [Source Name]
Summary of the text found in the PDF, around the matched keyword(s)
URL link (to display the PDF document in a new browser window)
The source and date fields should not be compulsory, since many documents won't have them.
10. the text labels of the front end must be in English, but easily translatable
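The result entry described in point 9 could be assembled roughly as follows. This is only an illustrative sketch in Python (the deliverable itself would be Cold Fusion or PHP), and it assumes the Access/Excel data has been exported to CSV. The column names `file_name`, `date` and `source_name`, and both function names, are hypothetical.

```python
import csv
import io


def load_metadata(csv_text):
    """Index metadata rows by PDF file name (assumed column: file_name)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["file_name"]: row for row in reader}


def format_result(file_name, snippet, url, metadata):
    """Build one plain-text result entry; date and source are optional."""
    row = metadata.get(file_name, {})
    # Missing or empty fields are simply skipped, since many documents lack them.
    header_parts = [(row.get(key) or "").strip() for key in ("date", "source_name")]
    header = " ".join(part for part in header_parts if part)
    lines = [header] if header else []
    lines.extend([snippet, url])
    return "\n".join(lines)
```

A document with no row in the metadata file, or with empty date and source fields, yields just the summary and the link.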
The script should be easy to maintain and open to future upgrades.
We also accept customization of existing scripts, if they match the requirements above.
ATTENTION! Please use the Message Boards (besides placing a bid) to let us know where you are from and, above all, to give references to past work like this or similar enough. Links to working applications will be much appreciated.
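To make points 3 and 7 concrete, here is a minimal sketch of "*" wildcard matching and keyword highlighting in a result summary. It is written in Python purely for illustration and scans raw text with a regular expression. A real implementation would presumably query a full-text index instead. The function names and the `<b>` highlight markup are assumptions.

```python
import re
from html import escape


def wildcard_to_regex(term):
    """Translate a search term with '*' wildcards into a whole-word regex."""
    parts = [re.escape(piece) for piece in term.split("*")]
    # Each '*' becomes "any run of word characters".
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)


def highlighted_snippet(text, term, context=40):
    """Return HTML for the text around the first match, or None if no match."""
    match = wildcard_to_regex(term).search(text)
    if match is None:
        return None
    start = max(0, match.start() - context)
    end = min(len(text), match.end() + context)
    return (
        escape(text[start:match.start()])
        + "<b>" + escape(match.group(0)) + "</b>"
        + escape(text[match.end():end])
    )
```

For example, searching for `ind*` in "We must re-index all PDF files nightly." would wrap "index" in the highlight tags, while a plain `pdf` would not match "PDFs" because the term is treated as a whole word.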
By the way, this simple project is a sort of test bed for us. We are looking for a skilled and reliable Cold Fusion developer (with excellent communication skills and discretion) to partner with on our Cold Fusion projects.