I need a script that extracts articles by keyword category from [url removed, login to view]
The script will extract all the articles in a given category. For example, I will input the keyword category:
[url removed, login to view]
The script will then extract the 30 articles listed on that page by following their links.
Next, it will follow the "Next 30" link and extract the next 30 articles, and so on, until there is no "Next 30" link left, meaning every article in that category has been extracted.
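To illustrate the paging described above, here is a minimal sketch in Python. It assumes the listing page contains an ordinary link whose text starts with "Next 30"; the real markup may differ. The names LinkCollector, find_next_page, iter_category_pages and the fetch callable are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_next_page(html, base_url, label="Next 30"):
    """Return the absolute URL of the 'Next 30' link, or None on the last page."""
    parser = LinkCollector()
    parser.feed(html)
    for href, text in parser.links:
        # Assumption: the paging link text starts with the label.
        if text.startswith(label):
            return urljoin(base_url, href)
    return None


def iter_category_pages(start_url, fetch):
    """Yield (url, html) for each listing page until no 'Next 30' link remains.

    `fetch` is any callable that takes a URL and returns the page HTML.
    """
    url, seen = start_url, set()
    while url and url not in seen:  # `seen` guards against paging loops
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = find_next_page(html, url)
```

Passing the download function in as fetch keeps the paging logic separate from the network code, so it can be tried on saved pages before it touches the live site.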
The script will collect the following from every article:
Author in "By" field: NO
Article word count: NO
Article body: YES
Adsense ads: NO
Anything starting with http or www: NO (I don't want any URLs collected)
Article Source: http://EzineArticles ..... NO
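As a rough idea of the filtering in the list above, here is a small sketch in Python. It assumes the article body has already been pulled out as plain text, so the author, word count and ads are already excluded, and that the source line begins with "Article Source:". The function name clean_body and both patterns are hypothetical and would need checking against real pages.

```python
import re

# Assumption: a URL is any run of non-space characters that starts
# with http://, https:// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
# Assumption: the source credit sits on its own line.
SOURCE_RE = re.compile(r"^\s*Article Source:", re.IGNORECASE)


def clean_body(text):
    """Return the article body without URLs and without the source line."""
    kept = []
    for line in text.splitlines():
        if SOURCE_RE.match(line):
            continue  # drop the "Article Source: ..." line entirely
        line = URL_RE.sub("", line)
        # Collapse the gaps left behind by removed URLs.
        line = re.sub(r"[ \t]{2,}", " ", line).strip()
        kept.append(line)
    return "\n".join(kept).strip()
```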
The script will save all the articles as txt files in two ways:
First, separately, one file per article, in txt files named from 001 onward.
Second, in batch files of 40 articles each. So if there are 120 articles in a category, it will save them in 3 files. If there are 130 articles, it will use 4 files, and so on.
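The two saving modes could look roughly like this in Python. This is only a sketch: the function name save_articles, the folder names "single" and "batches", and the batch file names are assumptions made for the example, not requirements.

```python
from pathlib import Path


def save_articles(articles, out_dir, batch_size=40):
    """Save each article to its own file and also in batches of `batch_size`.

    `articles` is a list of already cleaned article bodies (strings).
    Returns the list of batch file paths that were written.
    """
    out = Path(out_dir)
    single_dir = out / "single"    # hypothetical folder names
    batch_dir = out / "batches"
    single_dir.mkdir(parents=True, exist_ok=True)
    batch_dir.mkdir(parents=True, exist_ok=True)

    # Mode 1: one file per article, named 001.txt, 002.txt, ...
    for i, body in enumerate(articles, start=1):
        (single_dir / f"{i:03d}.txt").write_text(body, encoding="utf-8")

    # Mode 2: files of up to 40 articles each; the last file holds the remainder.
    batch_files = []
    for n, start in enumerate(range(0, len(articles), batch_size), start=1):
        chunk = articles[start:start + batch_size]
        path = batch_dir / f"batch_{n:03d}.txt"
        path.write_text("\n\n".join(chunk), encoding="utf-8")
        batch_files.append(path)
    return batch_files
```

With 120 articles this writes 3 batch files, and with 130 it writes 4, the last one holding the remaining 10.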
The script should be tested extensively to prove that it does not get the IP banned by EzineArticles.
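One common way to lower the risk of a ban is to pause between requests. The sketch below shows that idea in Python. It is not a guarantee against bans, and the delay values are guesses that would have to be tuned during testing. The class name Throttle and its parameters are hypothetical.

```python
import random
import time


class Throttle:
    """Enforces a minimum, slightly randomized pause between requests."""

    def __init__(self, min_delay=5.0, jitter=3.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # assumed value, in seconds
        self.jitter = jitter        # extra random wait, in seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None           # time of the previous request

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            target = self.min_delay + random.uniform(0, self.jitter)
            remaining = target - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Calling wait() before every page download spaces the requests out. The clock and sleep arguments exist only so the timing can be tested without real waiting.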
I have a dedicated server available, so if your solution cannot run on my PC, we will host it there.
Please post your experience with extraction scripts, and samples if possible.