In Progress

Parquet is more space-efficient than JSON/CSV

Job Description:

Parquet is generally said to be more storage-efficient than JSON and CSV. The first link below states: "Apache Parquet is a columnar file format that provides optimizations to speed up queries and is a far more efficient file format than CSV or JSON".

[login to view URL] to an external site.

[login to view URL] to an external site.

Now, let us try to demonstrate this. Download this CSV file (with 50,000 rows).

[login to view URL] to an external site.

Load the file as a DataFrame in Spark, save the DataFrame again in JSON and in Parquet format, and compare the sizes of the resulting files. Do you see differences in file size? Report them here.
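The size comparison could be sketched in PySpark roughly as follows. This is only a minimal sketch under assumptions: the function names `dir_size_bytes` and `convert_and_measure`, the output folder names, and the paths passed in are hypothetical, the CSV is assumed to have a header row, and its column names are assumed to be acceptable to the Parquet writer. Spark writes each output as a directory of part files, so the sizes are summed per directory.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of a file, or of all files under a directory.

    Spark writes each dataset as a directory of part files (plus small
    marker/checksum files), so everything under the directory is summed.
    """
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def convert_and_measure(csv_path, out_dir):
    """Read a CSV with Spark, rewrite it as JSON and Parquet, return sizes.

    Hypothetical helper: csv_path and out_dir are supplied by the caller.
    """
    # Imported lazily so the size helper above works without pyspark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-size-comparison").getOrCreate()

    # Assumption: the CSV has a header row; column types are inferred.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    json_path = os.path.join(out_dir, "data_json")
    parquet_path = os.path.join(out_dir, "data_parquet")
    df.write.mode("overwrite").json(json_path)
    df.write.mode("overwrite").parquet(parquet_path)

    return {
        "csv": dir_size_bytes(csv_path),
        "json": dir_size_bytes(json_path),
        "parquet": dir_size_bytes(parquet_path),
    }
```

Calling `convert_and_measure("data.csv", "out")` (both paths are placeholders) would return a dictionary of byte counts for the three formats that can be pasted into the report.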

Queries on Parquet are supposed to run faster than the same queries on CSV. Show one query result that demonstrates this (for example, counting the unique values in a column).
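One possible way to time such a query is sketched below. The names `time_call` and `compare_distinct_count`, the paths, and the column name are hypothetical; the sketch assumes the Parquet copy of the data has already been written and that the CSV has a header row. It reports the best of a few runs, since the first run usually includes Spark start-up cost.

```python
import time


def time_call(fn, repeats=3):
    """Call fn() `repeats` times; return (best wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


def compare_distinct_count(csv_path, parquet_path, column):
    """Time a distinct-count query on the same data stored as CSV and Parquet.

    Hypothetical helper: paths and column name are supplied by the caller.
    """
    # Imported lazily so time_call above works without pyspark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-speed-comparison").getOrCreate()

    def count_from_csv():
        # The read is inside the timed function so that parsing cost is included.
        df = spark.read.csv(csv_path, header=True, inferSchema=True)
        return df.select(column).distinct().count()

    def count_from_parquet():
        # Parquet is columnar, so Spark only needs to read the selected column.
        df = spark.read.parquet(parquet_path)
        return df.select(column).distinct().count()

    csv_seconds, csv_count = time_call(count_from_csv)
    parquet_seconds, parquet_count = time_call(count_from_parquet)
    return {
        "csv": {"seconds": csv_seconds, "distinct": csv_count},
        "parquet": {"seconds": parquet_seconds, "distinct": parquet_count},
    }
```

With only 50,000 rows the measured gap may be small and noisy, because fixed Spark overhead can dominate; repeating the measurement a few times gives a more trustworthy comparison.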

Skills: Big Data, JSON, Spark

About the client:
(1 review) Kansas, United States

Project ID: #35261383

Awarded to:


✔✔✔✔ Nice to see your posting ✔✔✔✔ Hi, sir. I read your job posting and I am interested in Parquet. So what do I have to do? Please tell me via chat. I hope to work with you. Best regards. Thanks!

$8 USD / hour
(0 reviews)