Given web articles containing textual content, we'd like the following entities extracted:
The entities will be located around certain points in the article, which we will specify.
Your module will take English text that we provide (HTML or plain text, whichever you prefer) as input and return the entities at these points.
You will probably want to use existing software. If so, please specify the name of the software in your bid.
We will provide the hardware for this. You must install your module on our Linux servers.
If you have more questions, please don't hesitate to ask.
Some answers to questions so far:
* Hardware: it would preferably run on a single P4, but may run on a dual Xeon. Max 2GB memory.
* It absolutely cannot be done manually. There will be millions of articles.
* (See [url removed, login to view] for more info on the process.)
* We will provide the data you need (HTML or text) in a database. You will then write your results back to this database, where you can also store a flag indicating whether each individual record has been processed.
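
To illustrate the database workflow described in the last answer, here is a minimal sketch of a batch loop that reads unprocessed records, runs an extractor on each one, and writes the results back along with a processed flag. Everything specific in it is an assumption made for illustration: the table name `articles`, the columns `id`, `body`, `points`, `entities` and `processed`, the use of JSON to store the points and the results, and SQLite as the database. The extractor itself is passed in as a function, since the choice of extraction software is left to the bidder.

```python
import json
import sqlite3


def process_pending(conn, extractor, batch_size=100):
    """Run `extractor` on every unprocessed record and store the results.

    Assumed (hypothetical) schema:
        articles(id, body, points, entities, processed)
    `points` and `entities` are stored as JSON text.
    Returns the number of records handled.
    """
    handled = 0
    while True:
        # Fetch a small batch so memory use stays bounded.
        rows = conn.execute(
            "SELECT id, body, points FROM articles "
            "WHERE processed = 0 LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return handled
        for row_id, body, points in rows:
            entities = extractor(body, json.loads(points))
            # Store the result and set the flag in the same statement,
            # so a record is never marked done without its entities.
            conn.execute(
                "UPDATE articles SET entities = ?, processed = 1 "
                "WHERE id = ?",
                (json.dumps(entities), row_id),
            )
        # Commit once per batch; an interrupted run resumes from the
        # first record that is still unflagged.
        conn.commit()
        handled += len(rows)
```

Fetching in batches keeps memory use well below the 2GB limit regardless of how many articles are waiting, and the flag makes the job restartable after an interruption. A call would look like `process_pending(conn, my_extractor)`, where `my_extractor(text, points)` is a hypothetical function that returns a JSON-serializable list of entities.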