I need a system for research and translation of documents. Translation here means translating a source text with the help of a translation memory.
I am looking for someone with knowledge of XML and of logically organizing data, for example when bookmarking sites and their places of reference.
Knowledge of search behavior (how engines treat spaces, and whether they autocorrect or ignore capitalization) and of regular expressions is required.
The general process will be:
1. Analysis of the text (simple types such as English, plus others including CJK) using segmentation rules and the database/memory. This produces an extracted term list.
--- Analyze the text for individual terms. These can span multiple words and mix character types (e.g. Japanese with symbols, or English numerals and letters); a corresponding English example would be "4-stroke engine". ---
2. After the system's initial resolution of terms, some are corrected manually and the database/memory is updated.
3. Searching of new terms.
--- Take the list of new terms to the browser and look them up one by one, i.e. parse them into different site searches. Once the translation for a term has been found, it is stored together with its source data in a database. ---
4. Finally, the text is batch translated.
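To make step 1 concrete, here is a minimal sketch of regex-based term extraction, assuming terms are simply runs of CJK characters or Latin/numeric tokens with internal hyphens (so "4-stroke" survives intact). This is only an illustration; proper CJK segmentation would need a dictionary-based tokenizer, and the multi-word phrase rules are left out.

```python
import re

# Hypothetical pattern: CJK runs, or Latin/numeric tokens that may contain
# internal hyphens (e.g. "4-stroke"). Real segmentation rules would be richer.
TERM_RE = re.compile(
    r"[\u4e00-\u9fff\u3040-\u30ff]+"      # CJK ideographs, hiragana, katakana
    r"|[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*"    # Latin/numeric tokens like "4-stroke"
)

def extract_terms(text):
    """Return candidate terms in order of first appearance, without duplicates."""
    seen, terms = set(), []
    for match in TERM_RE.finditer(text):
        term = match.group(0)
        if term not in seen:
            seen.add(term)
            terms.append(term)
    return terms
```

For example, `extract_terms("The 4-stroke engine is common.")` keeps "4-stroke" as one token rather than splitting it at the hyphen.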
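Steps 2-3 can be sketched as follows, assuming the memory is a local SQLite table and the browser lookups are driven by percent-encoded search URLs. The site list, table name, and column names here are illustrative assumptions, not part of the project spec.

```python
import sqlite3
import urllib.parse

# Illustrative site templates only; the real list of reference sites is open.
SEARCH_SITES = [
    "https://www.google.com/search?q={q}",
    "https://en.wikipedia.org/w/index.php?search={q}",
]

def search_urls(term):
    """Build one search URL per site; quote_plus encodes spaces and reserved chars."""
    q = urllib.parse.quote_plus(term)
    return [site.format(q=q) for site in SEARCH_SITES]

def open_memory(path=":memory:"):
    """Open (or create) the translation memory as a SQLite database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        "source TEXT PRIMARY KEY, target TEXT, origin TEXT)"
    )
    return db

def store_translation(db, source, target, origin):
    """Step 3: once a translation is found, store it with its source data."""
    db.execute(
        "INSERT OR REPLACE INTO memory VALUES (?, ?, ?)",
        (source, target, origin),
    )
    db.commit()
```

The manual-correction pass in step 2 would call `store_translation` again with the corrected target, which overwrites the earlier row.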
If you have experience with this type of project, please give your recommendations on the best scripts/technologies and the best browser, state whether you can build it for a server or a desktop, and include your quote. I am open to suggestions; the most important thing is that you understand the current FOSS software available for this type of work.