I am involved in a language/grammar project.
I need a simple application that will open a large text file (for example, a book in electronic format), extract the words, count how often each one occurs, and sort them in order of frequency.
This sorted list is then exported to a spreadsheet with the list of words in one column and the frequency of the word in the column beside it.
The following is an example of the spreadsheet …
The program has one variable, “Phrase length”. This can be set before the text file is processed, and has a range of 1 to 5.
For example, if the “Phrase length” variable is set to 3, the spreadsheet might look like the following …
went to the 12
you have been 9
1. Capitals are ignored
2. Phrases do not cross punctuation marks such as , . ; : ? ! ” ( ) - /
For example, with the phrase length set to 3, the following text is processed as shown below:
I am a boy. The
I am a 1
Am a boy 1
(NB “a boy. The” does not appear in the list, because it would cross a full stop)
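
To make the rules above concrete, here is a minimal sketch in Python of how the counting and export could work. It is an illustration only, not a finished application. The function names (count_phrases, export_csv), the use of a CSV file as the spreadsheet format, and the exact punctuation set are assumptions based on the examples above. Because capitals are ignored, the sketch reports every phrase in lower case.

```python
import csv
import re
from collections import Counter

# Assumption: these are the marks that end a phrase, taken from rule 2 above
# (straight and curly double quotes are both included).
PUNCTUATION = re.compile(r'[,.;:?!"“”()\-/]+')


def count_phrases(text, phrase_length=1):
    """Count every run of `phrase_length` consecutive words in `text`."""
    if not 1 <= phrase_length <= 5:
        raise ValueError("phrase_length must be between 1 and 5")
    counts = Counter()
    # Rule 1: ignore capitals. Rule 2: split at punctuation so that
    # no phrase can span two segments.
    for segment in PUNCTUATION.split(text.lower()):
        words = segment.split()
        for i in range(len(words) - phrase_length + 1):
            counts[" ".join(words[i:i + phrase_length])] += 1
    return counts


def export_csv(counts, path):
    """Write phrase and frequency in two columns, most frequent first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for phrase, frequency in counts.most_common():
            writer.writerow([phrase, frequency])
```

For the sample text above, count_phrases("I am a boy. The", 3) counts "i am a" once and "am a boy" once, and nothing else. The resulting CSV file can be opened directly in a spreadsheet program.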