We are looking for a script (preferably in PHP) that will parse a wiki page from either a Wikipedia dump or the MediaWiki API. The information to be returned is:
1. abstract of the page
2. page image
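For the API route, both items above are exposed by MediaWiki's TextExtracts and PageImages extensions, which Wikipedia runs. Below is a minimal sketch in Python (the final script could be ported to PHP; the request/JSON handling translates directly). The function and field names in the response parsing follow the standard `action=query` output format.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def build_query_url(title):
    """Build a MediaWiki API URL requesting the intro extract and page image."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts|pageimages",
        "exintro": 1,       # only the text before the first section heading
        "explaintext": 1,   # plain text instead of HTML
        "piprop": "thumbnail",
        "pithumbsize": 300,
        "titles": title,
    }
    return API + "?" + urllib.parse.urlencode(params)

def parse_response(raw_json):
    """Pull the abstract and thumbnail URL out of the API response."""
    pages = json.loads(raw_json)["query"]["pages"]
    page = next(iter(pages.values()))  # one title -> one page entry
    return {
        "abstract": page.get("extract", ""),
        "image": page.get("thumbnail", {}).get("source"),
    }
```

Fetching would then just be `urllib.request.urlopen(build_query_url("Alec Baldwin"))` followed by `parse_response` on the body. Note the API rate-limits bulk crawling, which matters for the full-database plan discussed below.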
For more info, please see the attached image from Facebook; note how they parsed the image and text from the wiki.
We can discuss the exact details of how to accomplish this. We are flexible in terms of your skill set; however, we do want competent programmers to participate.
We intend to parse a large number of wiki pages (possibly the entire Wikipedia database), so please include a proposal on how you would generate the pages (scraping, the API, or a static dump).
If you go to http://www.facebook.com/pages/Alec-Baldwin/112352762113130 , you will see that Facebook's paragraph contains links just like Wikipedia's does, but the links point back to the Facebook version of each page. It would be nice to achieve that.
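If the abstract is kept as HTML rather than plain text, that link behavior is a straightforward rewrite of the `href` attributes. A sketch, where `http://example.com/wiki` is a placeholder for wherever the parsed copies end up being hosted:

```python
import re

# Matches article links, whether relative (/wiki/Foo) or absolute.
WIKI_LINK = re.compile(r'href="(?:https?://en\.wikipedia\.org)?/wiki/([^"#]+)(#[^"]*)?"')

def rewrite_links(html, base="http://example.com/wiki"):
    """Point Wikipedia article links at our own copy of each page.
    `base` is a placeholder base URL, not a real endpoint."""
    return WIKI_LINK.sub(
        lambda m: 'href="%s/%s%s"' % (base, m.group(1), m.group(2) or ""),
        html,
    )
```

A regex is fine for links emitted by the API's HTML extracts, which are uniform; for arbitrary scraped HTML a real parser (e.g. DOMDocument in PHP) would be safer.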
For the most part we believe the wiki dump will give us the best results, but we'd like to hear your opinion.
Check out our other projects. The winner will definitely get more work from us!