I have an input list of band names, one per line. I would like these names compared to each other, with all close matches reported in an output file. I would recommend using something like Lucene. Do not try to create your own search algorithm.

For example:

Input file:
Fresh Band
Dave Mathwes
Dave Matthews Band
Dave Matthew Band
Dave Matthe
Funky Band

Output file:
Set 1:
Dave Mathwes
Dave Matthews Band
Dave Matthew Band
Dave Matthe

The only close band names are the 4 that look like Dave Matthews.

You should have a variable in the program that I can set to determine the "closeness" cut-off. For example, in one version of Lucene that I've seen, there is a number between 0 and 1 that shows the probability that it is a match. In this case, I would want to see everything that is a match with a probability of 0.7 or higher.
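To make the requested behaviour concrete, here is a minimal sketch in Python. It is not the Lucene-based solution the posting recommends: it swaps in the standard library's `difflib.SequenceMatcher`, whose `ratio()` is a similarity score between 0 and 1 rather than a true probability. The names `CLOSENESS_CUTOFF`, `similarity`, `group_close_names`, and `format_sets` are hypothetical, and reading and writing the actual files is left out.

```python
from difflib import SequenceMatcher

# Hypothetical name for the adjustable "closeness" cut-off from the posting.
CLOSENESS_CUTOFF = 0.7


def similarity(a, b):
    """Return a 0..1 similarity score (difflib ratio, not a probability)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_close_names(names, cutoff=CLOSENESS_CUTOFF):
    """Group names that are linked, directly or via other names, by score >= cutoff."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once and merge the groups of close pairs.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= cutoff:
                parent[find(j)] = find(i)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Names with no close match are not reported.
    return [group for group in groups.values() if len(group) > 1]


def format_sets(groups):
    """Render groups in the 'Set N:' layout of the example output file."""
    lines = []
    for n, group in enumerate(groups, start=1):
        lines.append(f"Set {n}:")
        lines.extend(group)
    return "\n".join(lines)


names = [
    "Fresh Band",
    "Dave Mathwes",
    "Dave Matthews Band",
    "Dave Matthew Band",
    "Dave Matthe",
    "Funky Band",
]
groups = group_close_names(names)
print(format_sets(groups))
```

With the cut-off at 0.7, this sketch puts the four Dave Matthews variants in one set and leaves out Fresh Band and Funky Band. Comparing every pair is fine for a short list but grows quadratically, which is one reason an indexed engine such as Lucene suits larger inputs.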
Awarded to:
This is for a refined version of what I showed you in PMB, made to work on a list instead of just 2 files :) I have all the messengers for easy communication too.