The project is built on Laravel (PHP).
Here is the situation: I receive news from several sources every minute. The items are essentially WordPress posts, because the news aggregator script we use is a WordPress plugin.
We then fetch those posts into the Laravel site using a WordPress-to-Laravel package ([login to view URL]).
So far, using TextRank ([login to view URL]), we can do the following for any post:
Create integer values by finding and counting the matching words,
Adjust each integer value using the integer values of its related words,
Normalize the values to create scores,
Order the words by score.
To be more precise, we can already extract a bag of words from any WordPress post.
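The four scoring steps above can be sketched roughly as follows. This is only an illustration in Python (the same logic ports directly to PHP), and it is a simplified stand-in, not the TextRank library's actual implementation. The `related` map, the `boost` weight, and the function name `score_words` are all assumptions made up for this example.

```python
import re
from collections import Counter


def score_words(text, related=None, boost=0.5):
    """Return (word, score) pairs for one post, highest score first.

    `related` is a hypothetical map of word -> list of related words.
    `boost` is an assumed weight for how much related words contribute.
    """
    related = related or {}

    # Step 1: create integer values by finding and counting words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)

    # Step 2: adjust each value using the counts of its related words.
    raw = {}
    for word, count in counts.items():
        neighbours = related.get(word, [])
        raw[word] = count + boost * sum(counts.get(n, 0) for n in neighbours)

    # Step 3: normalize so that all scores sum to 1.
    total = sum(raw.values())
    if total == 0:
        return []
    scores = {word: value / total for word, value in raw.items()}

    # Step 4: order by score (ties broken alphabetically).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    ranked = score_words(
        "bank raises rates bank rates rates",
        related={"bank": ["rates"]},
    )
    for word, score in ranked:
        print(f"{word}: {score:.3f}")
```

The resulting list of (word, score) pairs is the bag of words that the clustering step would take as its input.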
What I need now is a complete algorithm and guide, preferably in PHP (using a library if one exists), that can cluster/group similar articles under the same coverage in a Coverage table. The Coverage table can hold whatever data makes the algorithm work well. At a minimum, I think it needs a coverage ID field and a field that stores an array of the IDs of posts that are similar to each other and share that coverage ID.
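To make the expected output concrete, here is one possible shape for the grouping step. It is a minimal sketch in Python and not the requested algorithm itself. It uses single-pass (leader) clustering with cosine similarity over the bag-of-words scores, which is just one candidate approach. The `threshold` value, the function names, and the sample post IDs are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse word -> weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_coverage(posts, threshold=0.5):
    """Group posts into coverages in a single pass.

    `posts` is a list of (post_id, bag_of_words) pairs.
    `threshold` is an assumed minimum similarity for joining a coverage.
    Returns a dict of coverage_id -> list of post IDs.
    """
    coverages = []
    for post_id, vector in posts:
        # Find the most similar existing coverage above the threshold.
        best, best_sim = None, threshold
        for coverage in coverages:
            sim = cosine(vector, coverage["centroid"])
            if sim >= best_sim:
                best, best_sim = coverage, sim

        if best is None:
            # No coverage is similar enough, so start a new one.
            coverages.append({
                "coverage_id": len(coverages) + 1,
                "post_ids": [post_id],
                "centroid": dict(vector),
            })
        else:
            # Join the coverage and fold the post into its centroid.
            # A running sum is enough because cosine ignores scale.
            best["post_ids"].append(post_id)
            for term, weight in vector.items():
                best["centroid"][term] = best["centroid"].get(term, 0.0) + weight

    return {c["coverage_id"]: c["post_ids"] for c in coverages}


if __name__ == "__main__":
    sample_posts = [
        (101, {"election": 3, "vote": 2}),
        (102, {"election": 2, "vote": 3, "poll": 1}),
        (201, {"football": 4, "goal": 2}),
    ]
    print(assign_coverage(sample_posts))
```

Each key in the returned dict corresponds to the coverage ID field, and each value to the array of post IDs described above. Because posts are processed one at a time, this approach could run as new posts arrive every minute, although the result depends on arrival order and on the chosen threshold.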
We also have a table called newsTag with the following fields: postId and the most important topic mentioned. You can ignore the topic mentioned, because it depends only on the category. If we clustered on the topic from newsTag, we would limit the clustering, since some posts have no topic mentioned at all.
Please provide the complete algorithm, ask me any questions you need to along the way, and send me a PDF describing the algorithm, with examples if possible.