Closed

Big data management

Task: Data Warehouse Design and Implementation in Apache Hive

The objective of this task is to design and implement a sample data warehouse in Apache Hive, which is described in the following narrative.

A university plans to create a data warehouse to store information about the submissions of student assignments and later on to analyse the contents of the data warehouse. It is expected that the planned data warehouse will contain historical information collected over a long period of time.

This data warehouse will contain information about assignment submissions (abbreviated as “submissions” hereafter), assignments, subjects, students, and degrees.

The following relationships exist between the above domain entities: Each submission belongs to one assignment and is submitted by one or more students (for individual or group submissions). Each student is enrolled in one degree. Each assignment belongs to one subject.

A submission is described by a mark, a submission date, and a file path (which refers to a location on HDFS).

An assignment is described by a weight (percentage), a due date and a specification file path.

A subject is described by a subject code and subject name.

A student is described by a student number, first name, last name, and email address. The student number and the email address each uniquely identify a student.

The time dimension contains four levels: day, week, session (Autumn or Spring) and year.

This data warehouse should support OLAP queries, including the common aggregations of submissions per subject, per student, per degree, per day, per week, per session, or per year.

You can make reasonable assumptions on the keys of domain entities.

Complete the following parts:

Part 1. Develop a conceptual model for the above data warehouse. The dimensions and hierarchies must be correctly presented.

Part 2. Specify the OLAP operations for the following specific queries using the relational algebraic notation from the slides of Lectures 4 and 5:

(i) “Find the average slack period (i.e., the number of days between the submission date and the due date) for submissions per subject and per session.”

(ii) “Find the average mark for each assignment for the subject ‘ISIT312-912’ in 2017.” (Hint: use the “DICE” operation on page 46 of the Lecture 4 slides.)
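As a hedged illustration only (the operator names and layout used in the Lecture 4 and 5 slides may differ, and the relation names Submission, Assignment, Subject and Time are assumptions), query (ii) can be read as a DICE, i.e., a selection that fixes the subject and the year, followed by an aggregation per assignment:

\gamma_{assignment\_id;\ \mathrm{AVG}(mark)}\Big(\sigma_{subject\_code = \text{'ISIT312-912'}\ \wedge\ year = 2017}\big(Submission \bowtie Assignment \bowtie Subject \bowtie Time\big)\Big)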

Part 3. Transform your conceptual model from Part 1 into a logical model with a star schema. Note that all level tables in a star schema are flattened, i.e., denormalized.

Part 4. Create an (internal or external) Hive table (schema) for each table in your logical model from Part 3.
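As a rough starting point only (all table names, column names, and types below are assumptions rather than part of the specification), the star schema could be declared in HiveQL roughly as follows, with the degree level flattened into the student dimension, the subject level flattened into the assignment dimension, and the day/week/session/year hierarchy flattened into a single time dimension:

-- Time dimension, flattened to one row per day (illustrative schema).
CREATE TABLE time_dim (
  time_id   INT,      -- surrogate key at day grain
  day_date  DATE,
  week      INT,
  session   STRING,   -- 'Autumn' or 'Spring'
  year      INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Student dimension with the degree level denormalized into it.
CREATE TABLE student_dim (
  student_id     INT,
  student_number STRING,
  first_name     STRING,
  last_name      STRING,
  email          STRING,
  degree_name    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Assignment dimension with the subject level denormalized into it.
CREATE TABLE assignment_dim (
  assignment_id INT,
  subject_code  STRING,
  subject_name  STRING,
  weight        DOUBLE,   -- percentage
  due_time_id   INT,      -- due date, referencing time_dim
  spec_path     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Fact table: one row per (submission, student) so that group submissions
-- can be aggregated per student and per degree.
CREATE TABLE submission_fact (
  submission_id      INT,
  assignment_id      INT,
  student_id         INT,
  submission_time_id INT,
  mark               DOUBLE,
  file_path          STRING   -- location on HDFS
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';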

Part 5. Populate the Hive tables from Part 4 with some sample data. More specifically, create a file containing a few (e.g., three) sample records, which you determine yourself, for each table in the local file system, and then load those files into Hive. Once done, use HQL to show all data.
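For example (the local paths below are placeholders), comma-separated files prepared on the local file system can be loaded with LOAD DATA LOCAL INPATH and then inspected with plain SELECT statements:

-- Hypothetical local paths; adjust to wherever the sample CSV files live.
LOAD DATA LOCAL INPATH '/home/bigdata/sample/time_dim.csv' OVERWRITE INTO TABLE time_dim;
LOAD DATA LOCAL INPATH '/home/bigdata/sample/student_dim.csv' OVERWRITE INTO TABLE student_dim;
LOAD DATA LOCAL INPATH '/home/bigdata/sample/assignment_dim.csv' OVERWRITE INTO TABLE assignment_dim;
LOAD DATA LOCAL INPATH '/home/bigdata/sample/submission_fact.csv' OVERWRITE INTO TABLE submission_fact;

-- Show all loaded data.
SELECT * FROM time_dim;
SELECT * FROM student_dim;
SELECT * FROM assignment_dim;
SELECT * FROM submission_fact;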

Part 6. Implement the OLAP operations from Part 2 as HQL statements on the Hive tables from Part 5.
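A possible HiveQL rendering of the two queries from Part 2, assuming the illustrative schema sketched under Part 4 (all identifiers are assumptions); DATEDIFF(due_date, submission_date) gives the slack in days:

-- (i) Average slack period per subject and per session.
SELECT a.subject_code,
       st.session,
       AVG(DATEDIFF(dt.day_date, st.day_date)) AS avg_slack_days
FROM submission_fact f
JOIN assignment_dim a  ON f.assignment_id      = a.assignment_id
JOIN time_dim       st ON f.submission_time_id = st.time_id   -- submission date
JOIN time_dim       dt ON a.due_time_id        = dt.time_id   -- due date
GROUP BY a.subject_code, st.session;

-- (ii) Average mark per assignment for subject 'ISIT312-912' in 2017;
--      the DICE becomes the WHERE clause fixing the subject and the year.
SELECT f.assignment_id,
       AVG(f.mark) AS avg_mark
FROM submission_fact f
JOIN assignment_dim a ON f.assignment_id      = a.assignment_id
JOIN time_dim       t ON f.submission_time_id = t.time_id
WHERE a.subject_code = 'ISIT312-912'
  AND t.year = 2017
GROUP BY f.assignment_id;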

Skills: Ubuntu, Apache, Hive, Virtual Machines


About the Employer:
(19 reviews) MARSFIELD, Australia

Project ID: #31602148

3 freelancers are bidding an average of $238 for this job

sufyanjamil9

Hi, I've read the description of your posted job with the title "Big data management". I've been doing this kind of job for the last 7 years, carrying out SQL/MySQL/PostgreSQL/MS Access/MariaDB tasks of different complexity levels …

$600 AUD in 7 days
(1 review)
2.1
talha39

I'm a data scientist and big data expert working with Hadoop, Scala, Spark, Hive, Pig, HBase, PySpark, and Python. I can easily do your work and deliver it ahead of time.

$55 AUD in 2 days
(0 reviews)
0.0
jsljaisingh

I have 6 years of IT experience with Python, PySpark, Spark, SQL, Hive, ETL, Databricks, data analysis, and engineering. I have exposure to the Azure and AWS cloud platforms. I am a Microsoft Certified Azure Data Engineer …

$58 AUD in 7 days
(0 reviews)
0.0