bigdata project

$10-30 USD

Cancelled
Posted over 7 years ago

Paid on delivery
In this project, you will work on the ETL concept. ETL stands for Extract, Transform, and Load. The extract phase refers to obtaining and mining the data required for the analysis project; this may include some cleaning and combining of the data. The transform phase makes the acquired data comply with the format you plan to use in later steps. The load phase ships the data into the enterprise systems (e.g. the database). In this project our focus is on the first two stages, namely Extract and Transform. We may see the last stage in a future assignment.

You will be given a group of files. These files represent the customer reviews of a group of products: Canon G3 camera, DVD player, Jukebox, Nikon Coolpix, and Nokia 6610. The review files are semi-structured in a format very specific to the website that generated the data. Usually this format limits the gains that can be attained from the data. To overcome this, you need to convert the files into a popular format, namely JSON. Please see below:

1. General Information: Data Description
   a. You will be given a folder that contains the review data files. Each file is entirely about one product.
   b. Each file is semi-structured in the following manner:
      i. The top of the file has a block of information related to the source of the data. This part is important, but it is not of interest to us at the moment.
         1. This block will be ignored.
      ii. Every review starts with [t] followed by the review title.
         1. Ex: [t]great camera
      iii. Any positive aspect starts with [+n], where n can be 1, 2, 3, ... or any number of points representing a good score, followed by ## and then the text of the review.
         1. Ex: [+2]##i have only had this camera for one full day and i have to say that it is wonderful .
      iv. Any negative aspect starts with [-m], where m can be 1, 2, 3, ... or any number of points representing a bad score, followed by ## and then the text of the review.
         1. Ex: [-1]##* main dial is not backlit .

2. First part: Extract, Clean, and Combine
   a. In this part you will extract the data from the files described in point 1 above.
   b. In the extract process, you have to identify the product that is being reviewed.
   c. After identifying the product, each review has to be identified by its title.
   d. In a review you will identify the positive and negative aspects. To distinguish between them, look at point 1 above.
      i. Extract the positive aspects of a review.
      ii. Extract the negative aspects of a review.
   e. You should know that the text of any review, title and content, will not be clean. The cleaning should be done in the following manner:
      i. The text should not contain any special characters (!,@,#,$,%,^,&,”,’,<,>,/,? and *).
      ii. Speech punctuation symbols also have to be removed (, ; : \t).
      iii. The reviews are written in English. If for any reason a review is in a language other than English, you can ignore that review and leave it out of the output.
   f. After cleaning all the positive aspects, these will be combined into one single text block of positive criticism.
   g. The same will be done for the negative aspects.

3. Second part: Data Transformation
   a. After extracting, cleaning, and combining every review, the data will be put into the outpu
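
The extract, clean, and combine steps described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the expected solution: the function names (`clean_text`, `parse_reviews`), the JSON field names, and the sample header line are hypothetical, and the output layout is a guess because the posting is cut off before it describes the output. Straight quotes are stripped in addition to the curly quotes listed in the posting, which is also an assumption. The non-English filter from step 2.e.iii is not covered here.

```python
import json
import re

# Characters the posting lists for removal. The straight quotes are an
# assumption: the curly quotes in the posting may just be typographic.
SPECIAL_CHARS = "!@#$%^&\u201d\u2019\"'<>/?*"
SPEECH_PUNCT = ",;:\t"

# Special characters are deleted; speech punctuation becomes a space so that
# neighbouring words are not glued together (a design choice, not a rule
# stated in the posting).
_CLEAN_TABLE = str.maketrans(
    {**{c: None for c in SPECIAL_CHARS}, **{c: " " for c in SPEECH_PUNCT}}
)

# Matches lines such as "[+2]##some text" or "[-1]##some text".
ASPECT_RE = re.compile(r"^\[([+-])(\d+)\]##(.*)$")


def clean_text(text):
    """Strip the listed characters and collapse runs of whitespace."""
    return " ".join(text.translate(_CLEAN_TABLE).split())


def parse_reviews(lines, product):
    """Return one dict per review with combined positive/negative text."""
    reviews = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[t]"):
            # A title line opens a new review.
            current = {"title": clean_text(line[3:]), "positive": [], "negative": []}
            reviews.append(current)
            continue
        if current is None:
            continue  # still inside the source header block, which is ignored
        match = ASPECT_RE.match(line)
        if match:
            sign, _score, text = match.groups()
            key = "positive" if sign == "+" else "negative"
            current[key].append(clean_text(text))
    # Combine the cleaned aspects into one text block per polarity.
    return [
        {
            "product": product,
            "title": r["title"],
            "positive": " ".join(r["positive"]),
            "negative": " ".join(r["negative"]),
        }
        for r in reviews
    ]


# Hypothetical sample built from the examples in the posting.
sample = [
    "*** source header, ignored ***",
    "[t]great camera",
    "[+2]##i have only had this camera for one full day and i have to say that it is wonderful .",
    "[-1]##* main dial is not backlit .",
]
records = parse_reviews(sample, "Canon G3 camera")
print(json.dumps(records, indent=2))
```

In practice the product name would come from the file being read (for example its file name), and `parse_reviews` would be called once per file in the folder.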
Project ID: 11657395

About the project

5 proposals
Remote project
Active 8 years ago

5 freelancers are bidding an average of $31 USD for this job
Hi, I'm Abhijit Mondal from Bangladesh, and my background is in Computer Science and Engineering at Bangladesh University of Engineering and Technology. I am an expert Python, Java and Android developer with 5 years of experience coding in these languages. I have read the job description you provided above. I am familiar with this kind of work and interested in developing your required software with Python/Java. Once I get the full job description, I will start working on it right away, as I am completely free to work in the next few weeks. I think it will be nice working for you.
$61 USD in 1 day
5.0 (5 reviews)
3.0
We can use Apache Spark to do this. It has built-in libraries to read unstructured data and convert it into JSON. Further preprocessing can be done. I have also worked with Spark and NoSQL databases such as Cassandra and HBase.
$40 USD in 3 days
4.8 (2 reviews)
2.2
Experienced in handling big data projects and in Python. I have used the pandas module in Python for big data projects.
$15 USD in 3 days
0.0 (0 reviews)
0.0

About the client

United States
0.0
0
Member since Sep 25, 2016

Other jobs from this client

python using pandas
$30-250 USD
bigdata python
$10-30 USD
Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)