In Progress

158743 scrape HTML table data for DB

Summary

Ruby or Python script for Win32 that will convert table data in HTML source files into a database-consumable file format (XML). Creating a relational structure for the data is also welcome.
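The core conversion starts with pulling the table cells out of each HTML file. Below is a minimal sketch that uses only Python's standard-library `html.parser`. The posting names no parser, so that choice is an assumption, and the sketch also assumes the tables are not nested. It tolerates omitted `</td>` and `</tr>` tags, which are common in older generated HTML.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell strings.

    Assumes tables are not nested. Missing </td>, </th> and </tr> tags are
    closed implicitly when the next cell, row or </table> appears.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &amp; etc. arrive decoded
        self.tables = []    # finished tables
        self._table = None  # rows of the table being read
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace (including newlines) to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None and self._table is not None:
            self._table.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table" and self._table is not None:
            self._close_row()
            self.tables.append(self._table)
            self._table = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table in `html` as a list of rows of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.tables
```

To use it on a file, read the file as text and pass the string to `extract_tables`. For older Windows-generated pages the encoding may be cp1252 rather than UTF-8. That is a guess to verify against the real files.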

Steps

1) Write an HTML file parser (preferably in Python or Ruby) that grabs all the relevant table data from a set of HTML files. Repeat this step over a series of sequentially numbered files.

2) After grabbing all the table data in a file, the program will analyze certain HTML tables. These tables can be populated in several variations, so the code should be smart enough to recognize the different table structures (not that difficult, since there are only a couple of slight variations).

3) After parsing the table, write the relevant element data to an XML text file (or some other format that can be easily imported into a MySQL database).

(optional)

4) Create a relational structure for the data in the MySQL database.
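Steps 1 to 3 above can be sketched in Python, one of the two languages the posting accepts. This sketch starts from tables already extracted as rows of cell text by any HTML table parser. The posting does not describe the actual table variations, so the two layouts handled here are hypothetical stand-ins: a key/value table and a grid with a header row. The `pageN.html` naming pattern is also an assumption. The XML wraps each record in a `<row>` element with `<field name="...">` children. This is the layout `mysqldump --xml` writes and one that MySQL's `LOAD XML` is documented to accept.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def numbered_files(directory, pattern=r"(\d+)\.html?$"):
    """Return files whose names end in a number plus .htm/.html, in numeric order.

    The naming pattern is an assumption. Sorting by the captured number keeps
    page10 after page9, which plain string sorting would not.
    """
    found = []
    for path in Path(directory).iterdir():
        match = re.search(pattern, path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path))
    found.sort(key=lambda item: item[0])
    return [path for _, path in found]


def to_records(rows):
    """Normalize one table (rows of cell strings) into a list of dicts.

    Hypothetical variants, since the posting does not describe the real ones:
      - key/value: every row has exactly two cells -> a single record
      - grid: first row is the header, each later row is one record
    A genuine two-column grid would be misread as key/value, so the real
    distinguishing marker (caption, header cell, class attribute) should
    replace this test once the actual files are known.
    """
    rows = [r for r in rows if any(cell.strip() for cell in r)]  # drop empty rows
    if not rows:
        return []
    if all(len(r) == 2 for r in rows):
        return [{key: value for key, value in rows}]
    header = rows[0]
    # zip() stops at the shorter list: extra cells are dropped and short rows
    # simply lack the trailing keys.
    return [dict(zip(header, r)) for r in rows[1:]]


def records_to_xml(records):
    """Serialize records as <resultset><row><field name="...">value</field>..."""
    root = ET.Element("resultset")
    for record in records:
        row = ET.SubElement(root, "row")
        for name, value in record.items():
            ET.SubElement(row, "field", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")
```

A driver loop would walk `numbered_files(...)`, run each page's tables through the parser and `to_records`, and write `records_to_xml(...)` to one `.xml` file per page. Speed is not a priority, so none of this is optimized.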

Speed and efficiency are not a priority. This is basically a one-time data port.

Skill Requirements

Ruby or Python (Ruby preferred). Basic MySQL knowledge would help (how to import XML or other data into a database).
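For the import itself, MySQL can load either CSV or XML directly. The sketch below writes CSV with Python's `csv` module and then shows roughly matching `LOAD DATA` and `LOAD XML` statements as comments. The table name `items`, its columns, and the file paths are hypothetical. Check the statement options against the MySQL reference manual for the server version in use. `LOCAL` also requires `local_infile` to be enabled on both client and server.

```python
import csv


def write_csv(records, columns, path):
    """Write dict records to a UTF-8 CSV file with a header row.

    Missing keys become empty strings, and keys not listed in `columns` are
    ignored. Quotes inside values are doubled ("" style), which LOAD DATA
    accepts when fields are ENCLOSED BY '"'.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="",
                                extrasaction="ignore", lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)


# Then, in the mysql client (hypothetical table and path; forward slashes
# avoid backslash escaping inside the SQL string literal on Windows):
#   LOAD DATA LOCAL INFILE 'C:/scrape/items.csv'
#   INTO TABLE items
#   CHARACTER SET utf8mb4
#   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
#   LINES TERMINATED BY '\n'
#   IGNORE 1 LINES
#   (name, qty);
#
# The XML route, for files made of <row> elements:
#   LOAD XML LOCAL INFILE 'C:/scrape/items.xml'
#   INTO TABLE items
#   ROWS IDENTIFIED BY '<row>';
```

By default `LOAD DATA` treats backslashes in the data as escape characters. Scraped text that contains literal backslashes may need an `ESCAPED BY` clause.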

Skills: Anything Goes, Python, SQL


About the Employer:
(5 reviews) Cambridge, United States

Project ID: #1904932

Awarded to:

andrepl

This can be done easily. I suggest bypassing the XML and entering the data directly from the parser into the database. I could do it faster in PHP, but I can also do it without any problem in Ruby or Python.
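The bid's suggestion of skipping XML and writing parsed records straight into the database can be combined with the optional relational structure from step 4. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs with the standard library alone. The two-table schema (`page` and `item`) and the columns `name`, `qty`, and `price` are hypothetical. With a MySQL DB-API driver such as PyMySQL or mysql-connector-python, the same `execute`/`executemany` pattern applies, but the placeholder is typically `%s` instead of `?`.

```python
import sqlite3

# Hypothetical relational layout: one row per source page, many items per page.
SCHEMA = """
CREATE TABLE IF NOT EXISTS page (
    id       INTEGER PRIMARY KEY,
    filename TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS item (
    id      INTEGER PRIMARY KEY,
    page_id INTEGER NOT NULL REFERENCES page(id),
    name    TEXT,
    qty     INTEGER,
    price   REAL
);
"""


def to_number(text, kind):
    """Convert scraped text such as '1,200' to int/float; blank or None -> None."""
    text = (text or "").replace(",", "").strip()
    return kind(text) if text else None


def store_page(conn, filename, records):
    """Insert one parsed page and its records in a single transaction."""
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute("INSERT INTO page (filename) VALUES (?)", (filename,))
        page_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO item (page_id, name, qty, price) VALUES (?, ?, ?, ?)",
            [
                (page_id, r.get("Name"), to_number(r.get("Qty"), int),
                 to_number(r.get("Price"), float))
                for r in records
            ],
        )
    return page_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    store_page(conn, "page1.html", [{"Name": "Widget", "Qty": "3", "Price": "1.50"}])
    print(conn.execute("SELECT * FROM item").fetchall())
```

On MySQL the DDL would need small changes. The keys would become `INT AUTO_INCREMENT PRIMARY KEY`. The unique filename would need a `VARCHAR(255)` column, because MySQL cannot index a plain `TEXT` column without a prefix length.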

$45 USD in 1 day
(0 reviews)
0.0