Latent Dirichlet Allocation (LDA) is a method for clustering documents.
Write a Latent Dirichlet Allocation implementation for PHP, most probably using Gibbs Sampling. I do not know of any PHP implementation, but there are for python ([url removed, login to view]), C ([url removed, login to view]), and other languages.
The input will be an array of strings (each string = a document).
The output will be the LDA clusters (topics, word probabilities within each topic, and topic assignments for each document). It should run in native PHP, no extensions.
We will incorporate your code into a larger system which will cluster WordPress blog articles.
Please read the project first and write the word "Understood". Also write your steps and suggestions to complete the project with a short description of what you understood.