We need to export structured data from a German paintings catalogue spread over around 10 Quark documents, probably via XML export. The catalogue contains about 1500 paintings with a description of each. We want these descriptions (work number, German title, English title, owner, notes etc.) in a database. In the Quark files, every painting has a text frame that contains its description. We believe these frames can be exported as structured XML, so that we can import the data into a database. The content of the frames follows fairly consistent rules, so it should be possible to map it to fields in an XML file. The attached sample pages show those fields. Rules would include:
* First paragraph, first row: Work number
* Second paragraph, first row, bold font: Title German
* Second paragraph, second row, normal font: Title English
* Third paragraph, first row: Size of the painting
* Third paragraph, second row: Signature
* Any of the next paragraphs, containing the word "Provenienz:" at the beginning: Sales history
* Any of the next paragraphs, containing the word "Literature:" at the beginning: Literature concerning the painting
* Three or four more rules may be added.
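To make the rules above concrete, here is a rough Python sketch of how they could be applied once the text of a frame has been extracted from Quark. It assumes (this is not confirmed by the files) that each frame arrives as a list of paragraphs and each paragraph as a list of rows of plain text. Font information such as bold is not available in that form, so the sketch relies on row position only. The function name, the field names, the alternative German spelling "Literatur:" and the sample values are all made up for illustration.

```python
from typing import Dict, List

# Paragraph prefix -> field name. "Literatur:" is an assumed German spelling,
# accepted in addition to "Literature:".
LABELLED_FIELDS = {
    "Provenienz:": "sales_history",
    "Literatur:": "literature",
    "Literature:": "literature",
}


def parse_frame(paragraphs: List[List[str]]) -> Dict[str, str]:
    """Map one text frame (list of paragraphs, each a list of rows) to fields."""

    def row(p: int, r: int) -> str:
        # Return the requested row, or "" if the frame is shorter than expected.
        try:
            return paragraphs[p][r].strip()
        except IndexError:
            return ""

    record = {
        "work_number": row(0, 0),
        "title_german": row(1, 0),
        "title_english": row(1, 1),
        "size": row(2, 0),
        "signature": row(2, 1),
        "sales_history": "",
        "literature": "",
    }
    # Everything after the third paragraph is identified by its leading keyword.
    for para in paragraphs[3:]:
        text = " ".join(line.strip() for line in para).strip()
        for prefix, field in LABELLED_FIELDS.items():
            if text.startswith(prefix):
                record[field] = text[len(prefix):].strip()
                break
    return record


# Invented sample frame, only to show the expected input shape.
sample_frame = [
    ["123"],
    ["Landschaft mit Fluss", "Landscape with River"],
    ["50 x 70 cm", "signed lower right"],
    ["Provenienz: Private collection, Berlin"],
    ["Literature: Exhibition catalogue, p. 12"],
]
print(parse_frame(sample_frame))
```

Paragraphs that match no rule are ignored here; in practice they would probably need to be logged so that the three or four additional rules can be worked out from them.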
In the end, we need SQL or some structured data that we can use to create SQL. I suppose the way to do it would be Quark's XML export (maybe with the Atomik extension), but I am open to anything that will give me structured data. I have attached a sample Quark file containing a few pages of this catalogue and, for quicker viewing, a sample PDF file (also sample pages). We don't have Quark ourselves, so the bidder must be able to download the files from our server (around 250-300 MB) and hand back the structured data, without us opening anything in Quark.
I have selected the lowest budget range for this project. If, however, someone can make it clear to me that this project is harder than I anticipate, I am willing to pay more.
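As an illustration of the kind of deliverable we have in mind, the following Python sketch turns a list of painting records into plain SQL using the standard `sqlite3` module. It assumes each painting is already available as a dictionary of fields; the table name `paintings`, the column names and the sample record are placeholders, not a fixed schema.

```python
import sqlite3
from typing import Dict, Iterable

# Placeholder column list; the real schema would follow the final set of rules.
COLUMNS = ["work_number", "title_german", "title_english", "size",
           "signature", "sales_history", "literature"]


def records_to_sql(records: Iterable[Dict[str, str]]) -> str:
    """Load records into an in-memory SQLite table and return it as SQL text."""
    conn = sqlite3.connect(":memory:")
    column_defs = ", ".join(f"{c} TEXT" for c in COLUMNS)
    conn.execute(f"CREATE TABLE paintings ({column_defs})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    # Parameter binding takes care of quotes and umlauts in the values.
    conn.executemany(
        f"INSERT INTO paintings VALUES ({placeholders})",
        [tuple(r.get(c, "") for c in COLUMNS) for r in records],
    )
    conn.commit()
    # iterdump() yields the CREATE TABLE and INSERT statements as strings.
    sql = "\n".join(conn.iterdump())
    conn.close()
    return sql


# Invented sample record, for illustration only.
example = [{
    "work_number": "123",
    "title_german": "Landschaft mit Fluss",
    "title_english": "Landscape with River",
    "size": "50 x 70 cm",
    "signature": "signed lower right",
}]
print(records_to_sql(example))
```

The dump is SQLite-flavoured SQL; for another database the statements may need minor adjustment, and CSV or XML would be equally acceptable as long as the fields are cleanly separated.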