University: University of Wollongong (UOW)
Subject: CSCI312: Big Data Management
Question 3
Consider the following logical schema that implements a two-dimensional data cube.
The data cube contains information about the students' enrolments in subjects.
Assume that the files student.txt, subject.txt, and enrolment.txt contain data consistent with the logical schema of the two-dimensional data cube given above. The internal format of each file is a sequence of values separated by commas (CSV format).
(1) Write a sequence of commands that load the files into HDFS. The location of the files in HDFS is up to you.
(2) Write HQL statements that create the Hive tabular views of the files student.txt, subject.txt, and enrolment.txt loaded into HDFS.
(3) Write HQL statements to retrieve the following information from the data warehouse. Each correctly implemented statement is worth 1 mark.
(i) Find the total number of enrolments per student, per subject, per both student and subject, and the overall total number of enrolments. List the values of the attributes student-number and subject-code and the total number of enrolments.
(ii) For each subject and for each year, list the year (from enrolment-date), the subject-code, and the scores in the subject in that year in ascending order of score, together with the average of all scores in a year.
(iii) Find the average score in all subjects per year, per both subject and year, and per both student and year. You can use the row function year to extract a year from a date. List the values of the attributes year (from enrolment-date), student-number, and subject-code, and the average score.
(iv) For each student and for each subject, list the pair student-number and subject-code together with the average score over all subjects in which the student is enrolled.
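Item (i) above asks for counts at every level of aggregation over two dimensions, which is what a Hive `GROUP BY ... WITH CUBE` clause produces. The following Python sketch is not an HQL answer; it only illustrates which result rows such a query is expected to return. The enrolment rows and the tuple layout (student number, subject code) are hypothetical, since the schema diagram is not reproduced here, and `None` stands in for the NULL that marks an aggregated-away dimension.

```python
from collections import Counter

# Hypothetical enrolment rows: (student_number, subject_code).
# These values are illustrative only and do not come from the assignment.
enrolments = [
    (1001, "ISIT312"),
    (1001, "CSCI317"),
    (1002, "ISIT312"),
]


def cube_counts(rows):
    """Count rows for every combination of the two dimensions.

    None marks a dimension that has been aggregated away.
    """
    counts = Counter()
    for student, subject in rows:
        counts[(student, subject)] += 1  # per student and subject
        counts[(student, None)] += 1     # per student
        counts[(None, subject)] += 1     # per subject
        counts[(None, None)] += 1        # overall total
    return counts


result = cube_counts(enrolments)
for key in sorted(result, key=lambda k: (str(k[0]), str(k[1]))):
    print(key, result[key])
```

Each input row contributes to four groups, so a single pass over the data covers all the subtotals that item (i) requests.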
Question 4
Consider the following logical schema of a relational database that implements a data cube with historical information about the subjects enrolled in and dropped by the students.
Write HBase shell commands to create a single HBase table that implements the logical schema given above.
Write HBase commands to load into the table information about at least two subjects, one student, two enrolments, and one drop. Remember that a student is allowed to enrol in and/or drop many subjects, and a subject can be enrolled in or dropped by many students.
Your HBase table must be created in a way that does not introduce any data redundancy when information about students, subjects, enrolments, and drops is entered into the table.
(2) Write HBase shell commands that implement the following queries and data manipulations on the HBase table created and loaded with data in the previous step. Each correctly implemented task is worth 1 mark.
(i) Find all information (student number and full name) about the students enrolled in the subject ISIT312.
(ii) Find all information (subject code and title) about the subject ISIT312.
(iii) Add a column family LECTURER and allow for two versions in each cell of the new column family.
(iv) Assume that lecturers are described by an employee number and full name. Insert into the table information about a lecturer and about a subject taught by the lecturer. Assume that a lecturer teaches one subject and each subject is taught by one lecturer.
Question 5
In this question, we use the same logical schema of the two-dimensional data cube as in Question 3.
Assume that the files student.txt, subject.txt, and enrolment.txt contain data consistent with the logical schema of the two-dimensional data cube given above. The internal format of each file is a sequence of values separated by commas (CSV format).
Assume that the files have already been loaded into HDFS. Write Pig Latin statements that implement the following queries. Correct implementation of each query is worth 1 mark.
(1) Find the full names of students who enrolled in the subject with the code ISIT312.
(2) Find the student numbers of students who never enrolled in the subject with the code ISIT312.
(3) Find the student numbers of students who enrolled in both subjects with the codes ISIT312 and CSCI317.
(4) Find the subject codes together with the total number of students enrolled in each subject.
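Queries (2) and (3) above are easy to get wrong with a plain row filter: "never enrolled" is a set difference against all students, and "enrolled in both" is a set intersection, not a filter on individual enrolment rows. The Python sketch below is not Pig Latin; it only illustrates the intended semantics. The data and the tuple layout (student number, subject code) are hypothetical, since the schema diagram is not reproduced here.

```python
# Hypothetical data, for illustration only.
students = [1001, 1002, 1003]  # all student numbers
enrolments = [                 # (student_number, subject_code)
    (1001, "ISIT312"),
    (1001, "CSCI317"),
    (1002, "CSCI317"),
]


def enrolled_in(rows, code):
    """Return the set of student numbers enrolled in the given subject code."""
    return {student for student, subject in rows if subject == code}


isit = enrolled_in(enrolments, "ISIT312")
csci = enrolled_in(enrolments, "CSCI317")

# Query (2): students with no ISIT312 enrolment at all, including
# students who have no enrolments whatsoever (here, 1003).
never_isit = set(students) - isit

# Query (3): students who appear in both enrolment sets.
both = isit & csci

print(sorted(never_isit))
print(sorted(both))
```

Filtering enrolment rows on `subject != "ISIT312"` would wrongly return student 1001 for query (2) and would miss student 1003, which is why the difference is taken against the full student list.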
Question 6
In this question, we use the same logical schema of the two-dimensional data cube as in Question 3.
Assume that the files student.txt, subject.txt, and enrolment.txt contain data consistent with the logical schema of the two-dimensional data cube given above. The internal format of each file is a sequence of values separated by commas (CSV format).
Assume that the files have already been loaded into HDFS. Implement the following Spark-shell operations. Correct implementation of each operation is worth 1 mark.
(1) Create the DataFrames that contain information about students, enrolments, and subjects.
(2) Implement a query that accesses the DataFrames created in the previous step and finds the total number of enrolments in the subject ISIT312.
(3) Implement a query that accesses the DataFrames created in the previous step and, for each student, finds the total number of enrolments performed by the student.
(4) Register the DataFrames that contain information about the students, enrolments, and subjects as SQL temporary views.
(5) Use SQL views created in the previous step to find the titles of subjects together with the total number of students enrolled in each subject.