In Progress

158743 scrape HTML table data for DB

Summary

Ruby or Python script for Win32 that converts table data in HTML source files into a database-consumable file format (XML). Creating a relational structure for the data is also welcome.

Steps

1) Write an HTML file parser (Python or Ruby preferred) that grabs all the relevant table data from a set of HTML files. Repeat this step across a series of sequentially numbered files.

2) After grabbing all the table data in a file, the program analyzes certain HTML tables. A table can be populated in several variations, so the code should be smart enough to recognize the various table structures (not that difficult, since there are just a couple of slight variations).

3) After parsing the table, write the relevant element data to an XML text file (or some other format that can be easily imported into a MySQL database).

(optional)

4) Create a relational structure for the data in the MySQL database.
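Steps 1 to 3 above could be sketched roughly as follows, using only the Python standard library. This is a minimal sketch, not a finished solution: the posting does not describe the actual table variations, so the code only assumes two common ones (header cells written as th or td, and cells or rows left unclosed). Nested tables are not handled. The names TableGrabber, parse_html, parse_numbered_files and tables_to_xml, the file-name pattern, and the XML element names are all made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableGrabber(HTMLParser):
    """Collects every <table> in a page as a list of rows of cell strings.

    Assumption: tables are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # row currently being filled, or None
        self._cell = None  # text fragments of the open cell, or None

    def _close_cell(self):
        if self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:  # skip rows that ended up with no cells
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def parse_html(html):
    """Return the tables found in one HTML string."""
    grabber = TableGrabber()
    grabber.feed(html)
    grabber.close()
    return grabber.tables


def parse_numbered_files(pattern, first, last):
    """Parse files such as page1.html .. pageN.html (hypothetical naming)."""
    pages = {}
    for n in range(first, last + 1):
        path = pattern.format(n)
        with open(path, encoding="utf-8", errors="replace") as fh:
            pages[path] = parse_html(fh.read())
    return pages


def tables_to_xml(pages):
    """Serialize {file name: tables} into one XML string."""
    root = ET.Element("tables")
    for source, tables in pages.items():
        for index, rows in enumerate(tables):
            table_el = ET.SubElement(root, "table", source=source, index=str(index))
            for row in rows:
                row_el = ET.SubElement(table_el, "row")
                for cell in row:
                    ET.SubElement(row_el, "cell").text = cell
    return ET.tostring(root, encoding="unicode")
```

A call such as tables_to_xml(parse_numbered_files("page{}.html", 1, 50)) would then cover a numbered series, assuming the files really follow that naming scheme. Picking out only the "relevant" tables would need an extra filter that depends on the real pages.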

Speed and efficiency are not a priority. This is basically a one-time data port.

Skill Requirements

Ruby or Python (Ruby preferred). Basic MySQL knowledge would help (how to import XML or other data into a database).
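As one possible "other" import format, a parsed table could be written out as CSV, which MySQL can bulk-load (for example with LOAD DATA INFILE). The sketch below assumes a table is already available as a list of rows of strings. The function name table_to_csv is hypothetical, and padding short rows with empty strings is an assumption about how ragged tables should be handled.

```python
import csv
import io


def table_to_csv(rows):
    """Render one table (list of rows of strings) as CSV text.

    Short rows are padded with empty strings so every line has the
    same number of fields.
    """
    width = max((len(row) for row in rows), default=0)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row + [""] * (width - len(row)))
    return buf.getvalue()
```

The field and line terminators given to the import command would need to match what the csv module writes here (comma-separated, double-quoted where needed, newline-terminated).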

Skills: Anything Goes, Python, SQL


About the Employer:
( 5 reviews ) Cambridge, United States

Project ID: #1904932

Awarded to:

andrepl

This can be done easily. I suggest bypassing the XML and entering the data directly from the parser into the database. I could accomplish it faster with PHP, but can also do it with no problem in Ruby or Python.
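The direct-to-database route suggested here might look something like the sketch below. To keep it runnable without a server it uses Python's built-in sqlite3 module as a stand-in for MySQL. With a MySQL driver the connection setup would differ and the parameter placeholder is typically %s rather than ?. The two-table layout (source_table and cell) is only one guess at a "relational structure", and all table, column and function names are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per HTML table, one row per cell.
SCHEMA = """
CREATE TABLE source_table (
    id       INTEGER PRIMARY KEY,
    source   TEXT NOT NULL,      -- file the table came from
    position INTEGER NOT NULL    -- order of the table within that file
);
CREATE TABLE cell (
    table_id INTEGER NOT NULL REFERENCES source_table(id),
    row_num  INTEGER NOT NULL,
    col_num  INTEGER NOT NULL,
    value    TEXT,
    PRIMARY KEY (table_id, row_num, col_num)
);
"""


def store_tables(conn, source, tables):
    """Insert the tables parsed from one file (nested lists of strings)."""
    cur = conn.cursor()
    for position, rows in enumerate(tables):
        cur.execute(
            "INSERT INTO source_table (source, position) VALUES (?, ?)",
            (source, position),
        )
        table_id = cur.lastrowid
        cur.executemany(
            "INSERT INTO cell (table_id, row_num, col_num, value) "
            "VALUES (?, ?, ?, ?)",
            [
                (table_id, r, c, value)
                for r, row in enumerate(rows)
                for c, value in enumerate(row)
            ],
        )
    conn.commit()
```

For a quick trial, conn = sqlite3.connect(":memory:") followed by conn.executescript(SCHEMA) sets up an empty in-memory database to pass to store_tables.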

$45 USD in 1 day
(0 Reviews)
0.0