*XML data design
*XML & XHTML parsing and serializing
*UTF-8 and ISO-8859 character encoding and serialization
Screen scraping and data repackaging guru needed to extract HTML from a multi-page, formatted forum written in Portuguese and repackage it as UTF-8 XML files containing XHTML data.
1) Experience with screen scraping and with both HTML and XML parsing tools. Alternative 1: the person has significant experience using robot software that can crawl a site's HTML, parse it, and repackage the data into XML. Alternative 2: the person has significant experience using standard open-source tools that can accomplish this task. I AM NOT INTERESTED IN DEVELOPERS CREATING THEIR OWN TOOL FROM SCRATCH.
2) Significant experience with UTF-8 and Latin (ISO-8859) character encodings and with entity escaping in XML and text.
Brazilians, or at least someone who speaks Portuguese from Portugal. All of the content is in Portuguese.
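To illustrate the kind of encoding and escaping work involved, here is a minimal sketch using only the Python standard library. It assumes the forum pages arrive as ISO-8859-1 bytes; the function name `latin1_to_utf8_xml` and the `<post>` element are hypothetical, chosen only for this example, and this is not a prescribed toolchain for the job.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def latin1_to_utf8_xml(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode Latin-encoded text and wrap it in a UTF-8 XML document."""
    # Assumption: the source bytes really are in source_encoding.
    text = raw.decode(source_encoding)
    # escape() replaces &, < and > so the text is safe as XML character data.
    body = escape(text)
    # <post> is a hypothetical element name used only for illustration.
    doc = '<?xml version="1.0" encoding="UTF-8"?>\n<post>' + body + "</post>"
    return doc.encode("utf-8")


# Sample Portuguese text with accented characters and XML-special characters.
sample = "Conteúdo & discussão <fórum>".encode("iso-8859-1")
xml_bytes = latin1_to_utf8_xml(sample)

# Round-trip through a real XML parser to confirm the output is well-formed.
root = ET.fromstring(xml_bytes)
print(root.text)
```

Parsing the result back with `xml.etree.ElementTree` is a cheap check that accented characters survived the re-encoding and that `&` and `<` were escaped rather than left raw.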