Completed

Scraping WebPage Extenda

This project was successfully completed by INneerajlodhi for €94 EUR in 3 days.

Project budget: €30 - €250 EUR
Completed in: 3 days
Total bids: 19
Project description

This project consists of developing a program/script to scrape this webpage: [url removed, login to view]

All company profile information from every page of the website must be collected into a CSV dataset.

The deliverables are the program and the CSV containing all profile information.

The fields to extract are shown in the attached image (see also the field names below).

The attached CSV shows an example of the information extracted from that profile.

Profiles must be searched by selecting values in the Sector, Provincia (province) and Municipio (municipality) list boxes.
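As a rough sketch of how the list box combinations might be enumerated: the sector and province values below are the ones listed in the field definitions of this brief, while `municipalities_for` is a hypothetical callback (not part of the brief) that would return the municipality options the site offers for a given province.

```python
from itertools import product
from typing import Callable, Iterator, List, Tuple

# Values taken from the f1 and f2 field definitions in this brief.
SECTORS = ["AGROALIMENTARIO", "CONSUMO", "INDUSTRIA", "SERVICIO"]
PROVINCES = ["ALMERIA", "CADIZ", "CORDOBA", "GRANADA",
             "HUELVA", "JAEN", "MALAGA", "SEVILLA"]


def search_combinations(
    municipalities_for: Callable[[str], List[str]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield every (sector, province, municipality) search to run.

    municipalities_for is a hypothetical hook: in a real scraper it would
    read the Municipio list box after a province has been selected.
    """
    for sector, province in product(SECTORS, PROVINCES):
        for municipality in municipalities_for(province):
            yield sector, province, municipality
```

Each yielded tuple corresponds to one search to submit; how the search itself is performed depends on the site, which is not described here.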

The number of company profiles per province is:

Province    No. of profiles
ALMERIA       822
CADIZ        1010
CORDOBA      1379
GRANADA      1066
HUELVA        515
JAEN          741
MALAGA       1447
SEVILLA      3813
TOTAL       10793
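These counts can serve as a completeness check on the scraped data. The following is a minimal sketch, assuming each scraped row is a mapping with a "provincia" key as in the field list of this brief; the function name `count_mismatches` is made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

# Expected profile counts per province, as stated in the brief.
EXPECTED_PROFILES = {
    "ALMERIA": 822,
    "CADIZ": 1010,
    "CORDOBA": 1379,
    "GRANADA": 1066,
    "HUELVA": 515,
    "JAEN": 741,
    "MALAGA": 1447,
    "SEVILLA": 3813,
}
EXPECTED_TOTAL = 10793


def count_mismatches(
    rows: Iterable[Mapping[str, str]],
) -> Dict[str, Tuple[int, int]]:
    """Return {province: (expected, found)} for provinces whose count differs."""
    found = Counter(row["provincia"] for row in rows)
    return {
        province: (expected, found[province])
        for province, expected in EXPECTED_PROFILES.items()
        if found[province] != expected
    }
```

An empty result means every province matches the stated count. The counts may have changed on the site since the brief was written, so a mismatch is a prompt to investigate rather than proof of a bug.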

The CSV must have:

- Separator: pipe -> '|'

- Encoding: Latin-1 (ISO 8859-1)

- Enclosure (for string fields): double quotes, as in "field_string"

- All fields trimmed

- Fields (all fields of a company profile):

f1 : sector (only 4 possible values: "AGROALIMENTARIO", "CONSUMO", "INDUSTRIA", "SERVICIO")

f2 : provincia (only 8 possible values: "ALMERIA", "CADIZ", "CORDOBA", "GRANADA", "HUELVA", "JAEN", "MALAGA", "SEVILLA")

f3 : municipio (the set of municipality values depends on the province)

f4 : razon_social

f5 : nombre_empresa

f6 : direccion

f7 : telefono

f7b : telefono2 (any additional phone numbers must be added with the suffix "2", "3", etc.)

f8 : fax

f8b : fax2 (any additional fax numbers must be added with the suffix "2", "3", etc.)

f9 : actividad (multiple values must be separated by ", ")

f10 : productos (multiple values must be separated by ", ")

f11 : codigo_postal

f12 : correo_electronico

f12b: correo_electronico2 (any additional email addresses must be added with the suffix "2", "3", etc.)

f13 : web

f14 : marcas (multiple values must be separated by ", ")

(...)
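The CSV format above could be produced with Python's standard `csv` module. This is only a sketch under stated assumptions: `BASE_FIELDS` contains just the fields named explicitly in the list (the list is truncated with "(...)", so the real set is larger), extra columns such as telefono2 are appended at the end in first-seen order rather than next to their base field, every field is quoted, and characters that cannot be encoded in Latin-1 are replaced with "?". The function name `write_profiles` is made up for illustration.

```python
import csv
from typing import Dict, List

# Only the fields named explicitly in the brief; its list ends with "(...)".
BASE_FIELDS = [
    "sector", "provincia", "municipio", "razon_social", "nombre_empresa",
    "direccion", "telefono", "fax", "actividad", "productos",
    "codigo_postal", "correo_electronico", "web", "marcas",
]


def write_profiles(path: str, profiles: List[Dict[str, str]]) -> None:
    """Write profiles as a pipe-separated, double-quoted, Latin-1 CSV."""
    # Assumption: columns not in BASE_FIELDS (telefono2, fax2, ...) are
    # appended in the order they are first seen.
    fieldnames = list(BASE_FIELDS)
    for profile in profiles:
        for key in profile:
            if key not in fieldnames:
                fieldnames.append(key)

    # errors="replace" is an assumption: unencodable characters become "?".
    with open(path, "w", newline="", encoding="latin-1", errors="replace") as fh:
        writer = csv.DictWriter(
            fh,
            fieldnames=fieldnames,
            delimiter="|",
            quotechar='"',
            quoting=csv.QUOTE_ALL,  # assumption: quote every field
            restval="",             # missing fields become empty strings
        )
        writer.writeheader()
        for profile in profiles:
            # Trim every value, as the brief requires.
            writer.writerow({k: str(v).strip() for k, v in profile.items()})
```

Whether numeric-looking fields should also be enclosed in quotes is not stated in the brief; the attached example CSV would be the reference for that choice.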

Additional rules for fields:

- If a profile contains fields not listed above, each new field must be added to the CSV.

- If a field has a null value such as "-", it must be output as an empty string "".

- The "telefono", "telefono2", "fax" and "fax2" fields must be numeric values without any spaces " ".
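The field rules could be applied with a few small helpers before writing the CSV. This is a minimal sketch, not a definitive implementation: the helper names are made up, only "-" is treated as a null marker (the brief says "like", so other markers may exist), and phone and fax values only have whitespace removed, leaving any other punctuation untouched.

```python
import re
from typing import Dict, List

# Assumption: "-" is the only null marker; the brief hints there may be others.
NULL_MARKERS = {"-", ""}
# Field name prefixes whose values must not contain spaces.
NO_SPACE_PREFIXES = ("telefono", "fax")


def clean_value(name: str, value: str) -> str:
    """Trim a value, blank out null markers, strip spaces from phone/fax."""
    value = value.strip()
    if value in NULL_MARKERS:
        return ""
    if name.startswith(NO_SPACE_PREFIXES):
        value = re.sub(r"\s+", "", value)
    return value


def expand_multi(name: str, values: List[str]) -> Dict[str, str]:
    """Map repeated values to name, name2, name3, ... as the brief asks."""
    out = {}
    for i, value in enumerate(values, start=1):
        key = name if i == 1 else f"{name}{i}"
        out[key] = clean_value(name, value)
    return out


def join_multi(values: List[str]) -> str:
    """Join multi-valued fields (actividad, productos, marcas) with ', '."""
    kept = [v.strip() for v in values if v.strip() not in NULL_MARKERS]
    return ", ".join(kept)
```

For instance, `expand_multi("fax", [...])` with two values yields the keys "fax" and "fax2", matching the suffix rule in the field list.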

Finally, to receive payment for the project you must issue an invoice with my company information, which I will provide.
