Microsoft SQL Server 2008 has FREETEXT and FREETEXTTABLE, which apply a thesaurus and stemming to input parameters. The problem is that the thesaurus is bi-directional. While it is OK for a search on 'moan' to return results for 'bitch', I don't want a search on 'dog' to give results for 'moan'. Ordering by relevancy (rank) also does not work well with MSSQL.
I have a thesaurus in MS SQL format. The thesaurus also contains dictionary meanings etc. which will be discarded.
I have a query which returns related words (Lemma) from this thesaurus for a given input word. This could be used as the basis for (1.) below.
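To make the one-directional requirement concrete, here is a minimal sketch of a directed lookup. It uses Python with an in-memory SQLite table purely for illustration rather than T-SQL; the sample rows and the `related_words` helper are assumptions, and only the 'MyThesaurus' (base, Lemma) shape comes from this brief.

```python
import sqlite3

# In-memory stand-in for the real thesaurus; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyThesaurus ("
    " base TEXT NOT NULL, lemma TEXT NOT NULL, PRIMARY KEY (base, lemma))"
)
conn.executemany(
    "INSERT INTO MyThesaurus (base, lemma) VALUES (?, ?)",
    [("moan", "bitch"), ("moan", "complain"), ("dog", "bitch")],
)


def related_words(word):
    """Return lemmas for `word`, following base -> lemma only (never the reverse)."""
    rows = conn.execute(
        "SELECT lemma FROM MyThesaurus WHERE base = ? ORDER BY lemma", (word,)
    )
    return [row[0] for row in rows]


print(related_words("moan"))
print(related_words("dog"))
```

Because the lookup only ever reads from base to Lemma, 'dog' reaches 'bitch' but never continues on to 'moan'.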
Stemmers return the stemmed form of input words, e.g. 'bask' for inputs of 'basking', 'basked' or 'basker'.
I have a stemmer algorithm as an MS SQL function (also available as VB or C# functions) which returns stemmed forms of input words. This can be used for the requirements below.
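For illustration only, the behaviour expected of the stemmer can be sketched with a toy suffix stripper. This is not the existing MS SQL function and not a real stemming algorithm such as Porter's; `toy_stem`, `stem_phrase` and the suffix list are hypothetical stand-ins written in Python.

```python
# Hypothetical suffix list; a real stemmer handles far more cases than this.
SUFFIXES = ("ing", "ed", "er", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_phrase(phrase):
    """Stem every word of a short phrase and rejoin with single spaces."""
    return " ".join(toy_stem(word) for word in phrase.split())


for word in ("basking", "basked", "basker"):
    print(word, "->", toy_stem(word))
```

Whatever stemmer is used, the same function has to be applied to the thesaurus, the keywords and the search input so that the stems are comparable.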
The text to be searched (Keywords) is a single column containing comma-delimited, lower-case words and phrases. Each value is at most 250 characters and there are about 10,000 rows.
I need queries, functions or procedures for the following, including source code where applicable:
1. All words and their related words extracted from the thesaurus into a relational table 'MyThesaurus' (base, Lemma), with the stemmed forms (base stem, Lemma stem) stored alongside both and indexed.
2. Column(s) added to the keyword table holding the stems of all keywords and phrases, and indexed. A separate related table would probably be best for this. You can assume lower-case, trimmed, comma-delimited keywords and phrases. You can assume English language only. Phrases are short, with 3 or 4 words only. There is no need to consider stop words.
3. A query that takes search words or short phrases, stems them, looks up the Lemma word or phrase stems in MyThesaurus, and returns the rows of the keywords table containing those terms. If the input is a phrase (more than one word), the query should search for the phrase and for each separate word within it. The output should be ordered by a simple relevancy, i.e. perfect matches to the original input phrase first, followed by matches on the separate input words, then matches on Lemma words. Ordering beyond that is not important. Stop words should be removed from the query input; the system stop word list or a separate list can be used. A count of rows returned should be available, or at least a Boolean value indicating whether zero rows were returned.
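The three requirements above could fit together roughly as follows. This is a sketch under stated assumptions, written in Python with SQLite instead of T-SQL: the table and column names other than 'MyThesaurus', the stop word list, the toy stemmer and the sample rows are all hypothetical, and a keyword row is assumed to match only when one of its comma-delimited entries equals a search term after stemming.

```python
import sqlite3

STOPWORDS = {"a", "an", "and", "of", "the"}  # hypothetical stop word list


def toy_stem(word):
    # Stand-in for the existing stemmer function (assumption, not the real one).
    for suffix in ("ing", "ed", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_phrase(phrase):
    return " ".join(toy_stem(word) for word in phrase.split())


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Keywords (id INTEGER PRIMARY KEY, keywords TEXT NOT NULL);
CREATE TABLE KeywordStems (keyword_id INTEGER NOT NULL, term_stem TEXT NOT NULL);
CREATE INDEX ix_KeywordStems ON KeywordStems (term_stem, keyword_id);
CREATE TABLE MyThesaurus (base_stem TEXT NOT NULL, lemma_stem TEXT NOT NULL);
CREATE INDEX ix_MyThesaurus ON MyThesaurus (base_stem);
""")


def add_keyword_row(row_id, keywords):
    """Store the raw keywords plus one stemmed entry per comma-delimited term."""
    conn.execute("INSERT INTO Keywords VALUES (?, ?)", (row_id, keywords))
    stems = {stem_phrase(t) for t in keywords.split(",") if t.strip()}
    conn.executemany(
        "INSERT INTO KeywordStems VALUES (?, ?)", [(row_id, s) for s in stems]
    )


def search(text):
    """Return (tier, id, keywords) tuples: 1 = phrase, 2 = word, 3 = Lemma."""
    words = text.lower().split()
    if not words:
        return []
    tiers = {stem_phrase(" ".join(words)): 1}      # whole input phrase
    for word in words:
        if word not in STOPWORDS:                  # stop words dropped here
            tiers.setdefault(toy_stem(word), 2)    # separate input words
    for base in list(tiers):                       # one-directional expansion
        lemmas = conn.execute(
            "SELECT lemma_stem FROM MyThesaurus WHERE base_stem = ?", (base,)
        )
        for (lemma,) in lemmas.fetchall():
            tiers.setdefault(lemma, 3)
    marks = ",".join("?" * len(tiers))
    rows = conn.execute(
        "SELECT k.id, k.keywords, s.term_stem FROM KeywordStems s "
        f"JOIN Keywords k ON k.id = s.keyword_id WHERE s.term_stem IN ({marks})",
        list(tiers),
    ).fetchall()
    best = {}
    for row_id, keywords, stem in rows:            # keep the best tier per row
        if row_id not in best or tiers[stem] < best[row_id][0]:
            best[row_id] = (tiers[stem], keywords)
    return sorted((tier, row_id, kw) for row_id, (tier, kw) in best.items())


add_keyword_row(1, "big dogs,kennel")
add_keyword_row(2, "dog,puppy")
add_keyword_row(3, "hound,hunting")
add_keyword_row(4, "cat")
conn.execute("INSERT INTO MyThesaurus VALUES ('dog', 'hound')")

results = search("big dog")
print(len(results), results)
```

The length of the returned list gives the row count, and an empty list is the zero-rows case. In a T-SQL version the same three tiers could be built as a small table of search terms joined to the stem table, with the minimum tier per row used for ordering.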
I haven't used technical terms for all the thesaurus functions as this is probably a job for a technician rather than a linguist, however if you've worked with thesauri before it would be an advantage.
3 freelancers are bidding an average of $197 for this job
We understand the project requirements regarding the stemmer algorithm to be used, and we can build the queries and procedures as required. We will deliver the project within the timeline. Thank you.