University: Singapore University of Social Sciences (SUSS)
Subject: ICT233 Data Programming Assignment
Question 1 (46 marks)
Objectives:
● Understand datasets with a data scientist mindset.
● Understand and design computation logic and routines in Python.
● Assess the use of Python only and Python data structures to perform extract, load, and transformation operations.
● Assess the use of the Pandas data frame to perform extract, load, transformation, and calculation operations.
● Structure code in appropriate methods (functions), looping and conditions.
● Conduct visualization in an appropriate way.
The dataset in question provides a rich overview of Housing and Development Board (HDB) flat transactions in Singapore. It is derived from the national database managed by Singapore’s open data initiative.
The data captured includes vital information such as the resale price, flat type, address, lease commencement date, and floor area, among other details. These elements allow for robust analysis on a multitude of aspects such as price trends and geographical price disparities. You may refer to more information at `https://data.gov.sg/dataset/resale-flat-prices`.
Additionally, this dataset provides an invaluable resource for understanding the evolution of Singapore’s public housing landscape, the preferences of the populace, and market dynamics over time. As such, it is an essential tool for policy makers, real estate professionals, urban planners, and researchers studying Singapore’s unique public housing model.
By addressing the given tasks, you will gain data analysis competencies, including data preprocessing and manipulation, which are fundamental for preparing and managing datasets. You will also enhance your ability to comprehend data relationships through the practice of creating data visualizations and executing correlation analysis.
(a) Load all CSV files containing transacted flats in a given `data` directory and merge them all into a single Pandas DataFrame. Drop the `remaining_lease` column from the merged DataFrame. Are there any columns that contain null values or empty strings?
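One possible way to approach part (a) is sketched below. It assumes every file in the directory has a `.csv` extension and a compatible set of columns; the helper names `load_resale_data` and `missing_value_report` are hypothetical and not part of the assignment.

```python
from pathlib import Path

import pandas as pd


def load_resale_data(data_dir="data"):
    """Read every CSV in data_dir and stack the rows into one DataFrame."""
    csv_paths = sorted(Path(data_dir).glob("*.csv"))
    if not csv_paths:
        raise FileNotFoundError(f"no CSV files found in {data_dir!r}")
    frames = [pd.read_csv(path) for path in csv_paths]
    merged = pd.concat(frames, ignore_index=True)
    # Some files may not contain this column, so ignore it when absent.
    return merged.drop(columns=["remaining_lease"], errors="ignore")


def is_blank(value):
    """True for strings that are empty or contain only whitespace."""
    return isinstance(value, str) and value.strip() == ""


def missing_value_report(df):
    """Count nulls and blank strings per column."""
    nulls = df.isna().sum()
    blanks = df.apply(lambda col: col.map(is_blank).sum())
    return pd.DataFrame({"nulls": nulls, "empty_strings": blanks}).astype(int)
```

Note that `pd.read_csv` converts empty fields to `NaN` by default, so truly empty strings usually show up in the `nulls` column; the `empty_strings` count mainly catches whitespace-only values.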
(b) Convert the `month` column to date-time format. Design a visualization to analyse the `month` column by considering it as a numeric date-time and share insights.
(c) The column `storey_range` is in the format “lower TO upper” (e.g. 1 TO 3). Compute a new column called `storey_level` by calculating the average of the lower and upper storey values. Drop the `storey_range` column from the DataFrame.
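The midpoint calculation in part (c) could look roughly like this. The sketch assumes every `storey_range` value matches the "lower TO upper" pattern; the function name `add_storey_level` and the sample rows are illustrative only.

```python
import pandas as pd


def add_storey_level(df):
    """Replace storey_range ('lower TO upper') with the midpoint storey_level."""
    # Capture the two numbers on either side of "TO" and convert them to floats.
    bounds = df["storey_range"].str.extract(r"(\d+)\s*TO\s*(\d+)").astype(float)
    out = df.copy()
    out["storey_level"] = bounds.mean(axis=1)
    return out.drop(columns=["storey_range"])


# Made-up sample rows for illustration.
sample = pd.DataFrame({"storey_range": ["01 TO 03", "10 TO 12"]})
print(add_storey_level(sample)["storey_level"].tolist())  # [2.0, 11.0]
```

Rows that do not match the pattern would end up with `NaN` in `storey_level`, which makes them easy to spot afterwards.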
(d) Identify inconsistent `flat_model` and `flat_type` values and perform the standardization of the values.
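A minimal sketch for part (d) follows. It assumes the inconsistencies are limited to letter case, stray whitespace, and hyphen-versus-space spelling (for example, hypothetical variants such as `Improved` / `IMPROVED` or `MULTI-GENERATION` / `MULTI GENERATION`). Inspecting `df["flat_model"].value_counts()` and `df["flat_type"].value_counts()` first is the only way to confirm which variants actually occur in the data.

```python
import pandas as pd


def standardize_labels(df, columns=("flat_type", "flat_model")):
    """Normalize case, whitespace and hyphens in the given text columns."""
    out = df.copy()
    for col in columns:
        out[col] = (
            out[col]
            .str.strip()
            .str.upper()
            .str.replace("-", " ", regex=False)     # treat hyphens as spaces
            .str.replace(r"\s+", " ", regex=True)   # collapse repeated spaces
        )
    return out
```

Replacing hyphens is a deliberate simplification: it also rewrites values that are legitimately hyphenated, which is harmless only as long as the rule is applied uniformly to the whole column.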
(e) Perform the following visualizations:
(i). Plot a histogram of the `resale_price` to understand its distribution. Is it normally distributed or skewed?
(ii). Generate a boxplot for the `floor_area_sqm` column. Are there any values that lie outside the expected range? If outliers are present, please provide an explanation for their occurrence.
(f) Design and identify FIVE (5) factors that influence the resale price and offer a rationale for each of these correlations.
Question 2 (60 marks)
Objectives:
● Understand dataset with data scientist mindset
● Design computation logic and routines in Python
● Conduct visualization in an appropriate way
● Assess the design and use of database ORM / SQLite methods to perform extract, load, transformation and calculation operations
The Mass Rapid Transit (MRT) exits dataset, a spatial dataset obtained from Singapore’s open data portal, provides exit coordinates and associated metadata. It is instrumental in geographic analysis such as the calculation of distance metrics. Harnessing this data source facilitates a deeper understanding of the impact of public transportation infrastructure on various urban phenomena, such as residential property resale prices.
(a) Use the `geopandas` and `contextily` libraries to visualize MRT exits based on the contents of the GeoJSON file named `mrt-exits.geojson`.
(b) Perform the following tasks:
Extract the longitude and latitude values from the `geometry` field and create two new columns in the GeoPandas DataFrame.
Use `KMeans` (https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) clustering from the `sklearn` library to identify `5` clusters of these MRT exits based on their geographical coordinates.
Create a plot visualizing these clusters with different colors and add the map of Singapore as the background using `geopandas` and `contextily`.
(c) Perform the following tasks:
Map each cluster of MRT exits to one of the five main regions of Singapore: Central Region, East Region, North Region, North-East Region, and West Region.
Update the GeoPandas DataFrame by adding a new column `region` representing the region to which each MRT exit belongs.
(d) Calculate the number of MRT exits for each region using three different methods:
1) Utilize the pandas DataFrame.
2) Leverage the sqlite3 library.
3) Employ SQLAlchemy and ORM approach: Here, we first define a Python class representing the MRT exits (`longitude`, `latitude`, `region`). We then use this class to insert our data into a SQLite database and execute a query to get the number of exits for each region.
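The first two counting methods in part (d) can be compared side by side as in the sketch below. The `exits` frame is a made-up stand-in for the real GeoPandas DataFrame, and the table name `mrt_exits` is an assumption; the SQLAlchemy ORM variant is not covered here.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the exits table produced in parts (b) and (c).
exits = pd.DataFrame(
    {
        "longitude": [103.85, 103.95, 103.70, 103.86],
        "latitude": [1.29, 1.35, 1.34, 1.30],
        "region": ["Central Region", "East Region", "West Region", "Central Region"],
    }
)

# Method 1: pandas groupby.
pandas_counts = exits.groupby("region").size().to_dict()

# Method 2: sqlite3 with an in-memory database and a GROUP BY query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mrt_exits (longitude REAL, latitude REAL, region TEXT)")
conn.executemany(
    "INSERT INTO mrt_exits VALUES (?, ?, ?)",
    exits.itertuples(index=False, name=None),
)
rows = conn.execute(
    "SELECT region, COUNT(*) FROM mrt_exits GROUP BY region ORDER BY region"
).fetchall()
conn.close()
sqlite_counts = dict(rows)

# Both methods should agree on the per-region totals.
print(pandas_counts == sqlite_counts)
```

Checking that the methods return identical counts is a cheap sanity test before adding the ORM version on top of the same table layout.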
(e) Perform the following tasks:
Draw a random sample of 100 transacted flats from Question 1 with the random seed set to 0.
Utilize the `geopy` library’s `Nominatim` or `GoogleV3` geocoder to obtain the longitude and latitude data for the sampled transacted flats.
(f) Perform the following tasks:
Incorporate the data from the `data/town_to_region_mapping.json` file to introduce a new column named `region` into the DataFrame. (Note: Disregard the `region` column present in the `addresses.csv` file during this process.)
Based on your visualizations and data analyses, articulate two key conclusions.
(g) Perform the following tasks:
Formulate a scatter plot to depict the correlation between the resale prices of flats and their haversine (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.haversine_distances.html) distances to the Central Business District.
Incorporate additional dimensions into your plot: the year of the transaction (specifically 2015, 2020, and 2023) and the region of the flat’s location.
Use distinct color codes to denote different regions.
Also, display the town of each transaction as individual data points on the plot.
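The distance used in part (g) can be illustrated with a plain-Python haversine formula, shown here instead of the `sklearn` function so the arithmetic is visible. The reference point for the Central Business District is an assumption (approximate coordinates near Raffles Place), as are the names `haversine_km` and `distance_to_cbd_km`.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
# Assumed CBD reference point (approximately Raffles Place); adjust as needed.
CBD_LAT, CBD_LON = 1.2840, 103.8515


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def distance_to_cbd_km(lat, lon):
    """Distance in kilometres from a flat's coordinates to the assumed CBD point."""
    return haversine_km(lat, lon, CBD_LAT, CBD_LON)
```

When using `sklearn.metrics.pairwise.haversine_distances` instead, the inputs must be `[latitude, longitude]` pairs in radians, and the result is on the unit sphere, so it has to be multiplied by the Earth's radius to obtain kilometres.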