In Progress

145109 Word Data Extract to PHP/MySQL

Word Doc Data Extraction & PHP/MySQL Database Script w/ Search

Summary: Extract formatted text and images from Word documents organized as tables into a searchable database.

This is a unique project in its specifications, so please read carefully before bidding.

On a server, I will have an indefinite number of Microsoft Word files in a specified folder, matching a specified file mask, e.g. test*.doc. The number and content of the files will change over time.

Each Word file consists of a single Word table with the same layout on every page of the document: three rows and two columns. The files vary in length but always conform to this format.

Each cell contains formatted text and occasionally pictures. Together the cells form a question bank: one cell holds a question and another holds its answer. The documents are laid out for duplex printing, so a question and its answer are on different pages and mirrored left to right, but the relationship between their positions is consistent.

Odd numbered pages have answers and even numbered pages have questions.
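The pairing rule can be sketched as below. This is a minimal sketch under stated assumptions: each page's table has already been read into a 3×2 grid of cell contents (reading the tables out of the .doc files is not shown), page 2k (questions) is the back of page 2k-1 (answers), and the duplex flip mirrors columns while keeping rows in place. The `pair_cells` function and the grid representation are hypothetical and should be checked against the sample files.

```python
from typing import List, Tuple

Grid = List[List[str]]  # rows x columns of cell contents (hypothetical representation)


def pair_cells(pages: List[Grid]) -> List[Tuple[str, str]]:
    """Pair question and answer cells across duplex pages.

    Assumes pages[0] is page 1 (answers), pages[1] is page 2 (questions), and so on,
    and that the cell in row r, column c of a question page is printed behind
    row r, column (ncols - 1 - c) of the preceding answer page.
    """
    pairs: List[Tuple[str, str]] = []
    # Walk the pages two at a time: (odd answer page, even question page).
    # A trailing unpaired page is ignored.
    for i in range(0, len(pages) - 1, 2):
        answers, questions = pages[i], pages[i + 1]
        for r, row in enumerate(questions):
            ncols = len(row)
            for c, question in enumerate(row):
                answer = answers[r][ncols - 1 - c]  # mirrored column on the other side
                if question.strip() or answer.strip():  # skip fully empty cell pairs
                    pairs.append((question, answer))
    return pairs
```

For example, an answer page `[["A1", "A2"], ...]` backed by a question page `[["Q2", "Q1"], ...]` yields the pairs `("Q2", "A2")` and `("Q1", "A1")` for the first row.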

You will need to write a script that, once a day, scans for new or changed documents (on the first run, every file matching the file mask) and inserts the questions and answers into a MySQL database, pairing each question with its answer according to the duplex-printing cell pattern and preserving the formatting, layout, and any images.

The database table storing the content needs columns for a unique ID for each question/answer pair, the filename it was extracted from, a timestamp of when it was captured, the question, the answer, and an active/inactive boolean.
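A minimal sketch of such a table follows. It runs against SQLite from Python's standard library only so that it is self-contained; on MySQL the types would differ (for example `INT AUTO_INCREMENT`, `MEDIUMTEXT`, `TINYINT(1)`) and a `FULLTEXT` index would be added on the question and answer columns. The table name `qa_pairs`, the column names, and the helper functions are hypothetical.

```python
import sqlite3

# Hypothetical schema; SQLite is used only as a portable stand-in for MySQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS qa_pairs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique ID of the Q/A pair
    filename    TEXT    NOT NULL,                   -- source Word file
    captured_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    question    TEXT    NOT NULL,                   -- formatted content, e.g. HTML
    answer      TEXT    NOT NULL,
    active      INTEGER NOT NULL DEFAULT 1          -- 1 = active, 0 = inactive
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and make sure the Q/A table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_pair(conn: sqlite3.Connection, filename: str, question: str, answer: str) -> int:
    """Store one question/answer pair and return its generated ID."""
    cur = conn.execute(
        "INSERT INTO qa_pairs (filename, question, answer) VALUES (?, ?, ?)",
        (filename, question, answer),
    )
    conn.commit()
    return cur.lastrowid
```

New rows default to active; the capture timestamp is filled in by the database.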

The script needs a mechanism to detect new or changed files on subsequent runs. This should be based on a checksum of each file, not its modified date. If a file has changed, all previous question/answer entries extracted from it are marked "inactive" in the table (this is what the active/inactive boolean is for).
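Checksum-based change detection can be sketched as follows, assuming SHA-256 as the checksum (any stable hash would do; in PHP, `hash_file('sha256', $path)` computes the same digest). The `known` dictionary is a hypothetical stand-in for wherever the previous run's checksums are stored, most likely a small database table.

```python
import glob
import hashlib
import os
from typing import Dict, List, Tuple


def file_checksum(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_new_or_changed(folder: str, mask: str, known: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare files matching `mask` in `folder` against previously stored checksums.

    `known` maps filename -> checksum from the last run (hypothetical storage) and is
    updated in place with the current checksums. Returns (new_files, changed_files).
    """
    new_files: List[str] = []
    changed_files: List[str] = []
    for path in sorted(glob.glob(os.path.join(folder, mask))):
        name = os.path.basename(path)
        checksum = file_checksum(path)
        if name not in known:
            new_files.append(name)
        elif known[name] != checksum:
            changed_files.append(name)  # its old Q/A pairs should be marked inactive
        known[name] = checksum  # remember the current checksum for the next run
    return new_files, changed_files
```

For each changed file, the script would run something like `UPDATE qa_pairs SET active = 0 WHERE filename = ...` before re-importing it, so the old pairs remain in the table but drop out of searches.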

The interface to this database must be limited and secure. Essentially, we need a "Google-like" full-text search over the questions and answers. A search returns the question/answer pairs that match the query, and the matches may come from multiple Word files.
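One natural fit for this is MySQL's `FULLTEXT` indexing with `MATCH ... AGAINST`. The sketch below only builds a parameterized query string (with `%s` placeholders in the DB-API style used by common Python MySQL drivers); it assumes a hypothetical `qa_pairs` table with a `FULLTEXT` index on `(question, answer)` and searches active rows only.

```python
from typing import Optional, Tuple


# Assumes, e.g.: ALTER TABLE qa_pairs ADD FULLTEXT INDEX ft_qa (question, answer);
def build_search_query(terms: str, limit: Optional[int]) -> Tuple[str, tuple]:
    """Build a parameterized MySQL full-text query over active Q/A pairs.

    Rows are ordered by the MATCH() relevance score; a `limit` of None means
    no cap (for an unrestricted account).
    """
    sql = (
        "SELECT id, filename, question, answer, "
        "MATCH(question, answer) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
        "FROM qa_pairs "
        "WHERE active = 1 "
        "AND MATCH(question, answer) AGAINST (%s IN NATURAL LANGUAGE MODE) "
        "ORDER BY score DESC"
    )
    params: tuple = (terms, terms)
    if limit is not None:
        sql += " LIMIT %s"
        params += (int(limit),)
    return sql, params
```

If the question and answer columns store HTML, words inside the markup would be indexed as well, so keeping a parallel plain-text copy for searching may be worth considering.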

Results should be returned to an HTML page as a table, with full formatting and any images embedded, using the same layout as the source Word document except that each question and its answer appear side by side.
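A rough sketch of the output side, assuming the extraction step has already turned each cell into an HTML fragment and that images are embedded inline as base64 `data:` URIs (one option among several; serving image files separately would also work). The function names are hypothetical.

```python
import base64
import html
from typing import Iterable, Tuple


def image_data_uri(data: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data: URI so it can be embedded inline in HTML."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def render_results(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Render (filename, question_html, answer_html) rows as a two-column HTML table.

    The question and answer fragments are assumed to be trusted HTML produced by the
    extraction step, so they are inserted as-is; only the filename is escaped.
    """
    parts = ["<table>", "<tr><th>Question</th><th>Answer</th></tr>"]
    for filename, question_html, answer_html in rows:
        parts.append(
            f'<tr title="{html.escape(filename)}">'
            f"<td>{question_html}</td><td>{answer_html}</td></tr>"
        )
    parts.append("</table>")
    return "\n".join(parts)
```

An extracted picture would then appear in a cell as `<img src="...">` with the value from `image_data_uri`, so the page needs no separate image files.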

Access to this interface will be controlled by a username and password.
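Passwords would normally be stored as salted hashes rather than plain text. A sketch using PBKDF2 from Python's standard library follows; in PHP, the built-in `password_hash()` and `password_verify()` serve the same purpose. The iteration count is an assumed work factor, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune for the server


def hash_password(password: str, salt: bytes = b"") -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; a random 16-byte salt is generated if none is given."""
    salt = salt or os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)
```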

We need the ability to create usernames and passwords, and to set per-user limits on the number of queries per day and on the maximum number of results returned for a query (a hard cap on what that user will ever see for the query, not just a per-page limit). Results should be sorted by relevance.

Certain portions of the question field must be suppressed from display but remain searchable. These portions will be identified by text patterns in the question cells, which we will show you after you accept the project.
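Since the real patterns are not yet known, the regular expression below is a purely hypothetical placeholder. The idea sketched here is that the database keeps the full question text, so full-text search still matches it, and suppression is applied only when rendering, with a flag to skip it for an unrestricted account such as the master account described below.

```python
import re

# Hypothetical placeholder pattern: the real patterns will be supplied by the client.
SUPPRESS_PATTERNS = [re.compile(r"\[\[.*?\]\]", re.DOTALL)]


def display_question(question_html: str, is_master: bool) -> str:
    """Hide suppressed portions of a question at display time.

    The stored copy is left untouched so searches still match the hidden text;
    the master account sees the question unmodified.
    """
    if is_master:
        return question_html
    for pattern in SUPPRESS_PATTERNS:
        question_html = pattern.sub("", question_html)
    return question_html
```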

A master user account must also be available, with no limits on queries or number of results and no suppression of the content described above.
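The per-user limits and the unrestricted master account can be modeled with optional settings, where `None` means "no limit". This is a minimal in-memory sketch; the `UserLimits` fields and `authorize_query` are hypothetical, and a real implementation would persist the counters in the database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class UserLimits:
    """Hypothetical per-user settings; None means unlimited (master account)."""
    queries_per_day: Optional[int]
    max_results: Optional[int]
    queries_today: int = 0
    last_query_day: Optional[date] = None


def authorize_query(user: UserLimits, today: date) -> Optional[int]:
    """Count one query against the user's daily quota.

    Returns the result cap to apply (None = no cap), or raises PermissionError
    when the daily quota is exhausted.
    """
    if user.last_query_day != today:  # first query of a new day: reset the counter
        user.last_query_day = today
        user.queries_today = 0
    if user.queries_per_day is not None and user.queries_today >= user.queries_per_day:
        raise PermissionError("daily query limit reached")
    user.queries_today += 1
    return user.max_results
```

The returned cap would be applied as the `LIMIT` of the search query, which enforces the hard limit on results regardless of pagination.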

This project, while unique, should be straightforward for someone experienced in this kind of work. Sample target files will be provided when the project starts.

Skills: Anything Goes, MySQL, PHP


About the Employer:
(10 reviews) United States

Project ID: #1891285

Awarded to:


Hello Tom, This is Reuben Mollel. I had an opportunity to work with you on another project. I would like to help you with this project. I am going to build you a min-admin system using PHP and MySQL. This system will…

$500 USD in 10 days
(0 reviews)