I have vector sets in the following format:
(and 8 more)
The numbers are relevance scores for each word.
The goal is to reduce/cluster these into fewer sets, e.g., 2 sets.
There are existing methods for clustering, e.g., ones based on cosine similarity.
Your input is a text file similar to the above (my actual file format is slightly more complex and larger). The output should include a measure of similarity, i.e., a way to know when to stop clustering. In some cases we will reduce to 8 clusters; in others we will cluster down to 1-2 clusters.
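To make the stopping criterion concrete, here is a rough sketch of one possible approach: greedy agglomerative merging driven by cosine similarity. It assumes each vector set has already been parsed into a word -> relevance score dict (the real file format is not shown here, so parsing is left out). The names `cosine`, `merge`, `cluster`, `min_similarity` and `min_clusters` are made up for illustration, and summing scores on merge is just one choice among several.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse word -> score dicts."""
    dot = sum(score * b[word] for word, score in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge(a, b):
    """Combine two sets by summing scores per word (an assumed merge rule)."""
    merged = dict(a)
    for word, score in b.items():
        merged[word] = merged.get(word, 0.0) + score
    return merged


def cluster(vector_sets, min_similarity=0.5, min_clusters=1):
    """Repeatedly merge the most similar pair of sets.

    Stops when the best remaining pair is less similar than
    min_similarity, or when only min_clusters sets are left.
    Returns the final sets and the similarity of each merge performed.
    """
    clusters = [dict(v) for v in vector_sets]
    history = []
    while len(clusters) > max(min_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i], clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        if sim < min_similarity:
            break  # nothing left that is similar enough to merge
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append(sim)
    return clusters, history


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real input file.
    sample = [
        {"apple": 0.9, "fruit": 0.5},
        {"apple": 0.8, "juice": 0.4},
        {"engine": 0.9, "car": 0.7},
        {"car": 0.8, "wheel": 0.3},
    ]
    final_sets, merge_similarities = cluster(sample, min_similarity=0.5)
    print(len(final_sets), "sets remain")
    print("similarity at each merge:", merge_similarities)
```

The returned list of merge similarities is the "when to stop" signal: a sharp drop from one merge to the next suggests the previous step was a reasonable place to stop. Lowering `min_similarity` lets the clustering run further, down to 1-2 sets.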
If you are familiar with nltk or similar, this should be a simple task.
Awarded to:
Hello Sir. I have a really strong understanding of NLP. In addition, I have worked with nltk for quite a long time, so I can certainly handle this task.