University | Singapore University of Social Sciences (SUSS) |
Subject | ANL303: Fundamentals of Data Mining |
Question 1
Diabetes is a chronic disease. Because patient awareness is low, it may be present in a patient’s body for many years before clinical diagnosis. Once the window for early diagnosis has passed, diabetes may have an irreversible effect on the patient’s condition. Fatal complications such as heart attacks, strokes, and kidney damage can result as the disease progresses.
Yet these complications can be controlled, or in some cases even prevented, with early detection and treatment of diabetes. Medical researchers have proposed various data analytics solutions to provide timely diagnosis and improve knowledge of the disease.
A medical firm, GoodHealth, proposed association analysis. It collected a dataset (“diabetes_symptoms.csv”) that records the demographics of diabetic patients as well as their diabetes-related symptoms. The description of the dataset is listed in Table 1.
Another medical firm, PerfectCare, proposed cluster analysis. It collected a dataset (“diabetes_diagnosis.csv”) that contains the diagnostic measurements of 768 patients who may or may not be diabetic (as indicated in the field Outcome). The description of the dataset is listed in Table 2.
Table 1. Description of the dataset “diabetes_symptoms.csv”
Category | Field | Values |
Demographics | Age | Continuous |
Demographics | Gender | Male / Female |
Symptom | Polyuria | Yes (Presence) / No (Absence) |
Symptom | Polydipsia | Yes (Presence) / No (Absence) |
Symptom | Sudden weight loss | Yes (Presence) / No (Absence) |
Symptom | Weakness | Yes (Presence) / No (Absence) |
Symptom | Polyphagia | Yes (Presence) / No (Absence) |
Symptom | Genital thrush | Yes (Presence) / No (Absence) |
Symptom | Visual blurring | Yes (Presence) / No (Absence) |
Symptom | Itching | Yes (Presence) / No (Absence) |
Symptom | Irritability | Yes (Presence) / No (Absence) |
Symptom | Delayed healing | Yes (Presence) / No (Absence) |
Symptom | Partial paresis | Yes (Presence) / No (Absence) |
Symptom | Muscle stiffness | Yes (Presence) / No (Absence) |
Symptom | Alopecia | Yes (Presence) / No (Absence) |
Symptom | Obesity | Yes (Presence) / No (Absence) |
Table 2. Description of the dataset “diabetes_diagnosis.csv”
Field | Values |
Pregnancies | Continuous |
Glucose | Continuous |
Blood Pressure | Continuous |
Skin Thickness | Continuous |
Insulin | Continuous |
BMI | Continuous |
Diabetes Pedigree Function | Continuous |
Age | Continuous |
Outcome | 0 (Negative, i.e. non-diabetic) / 1 (Positive, i.e. diabetic) |
(a) Both association analysis and cluster analysis can generate value-added information for improving healthcare services for diabetic patients. However, they analyze the data differently, so the insights obtained from them also differ.
(i) Based on the dataset collected by GoodHealth, identify one (1) data mining objective that can be achieved by using association analysis. Discuss how the healthcare sector can benefit from this application of association analysis.
(ii) Based on the dataset collected by PerfectCare, identify one (1) data mining objective that can be achieved by using cluster analysis. Discuss how the healthcare sector can benefit from this application of cluster analysis.
(b) As GoodHealth is interested in the symptoms that patients have, rather than those they do not have, only the presence of symptoms is considered in the association analysis. Using the dataset collected by GoodHealth, execute the Apriori algorithm in IBM SPSS Modeler with the minimum antecedent support and minimum rule confidence thresholds set to 20% and 85%, respectively. Examine the association rules in the output and select one (1) association rule to explain its meaning in terms of support, confidence, and rule support. Provide a screenshot that clearly shows the values of support, confidence, and rule support of your selected rule.
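The three measures named in (b) can be illustrated outside SPSS Modeler with a minimal Python sketch. It assumes the Modeler convention that support is the share of records in which the antecedent is true, rule support is the share in which both antecedent and consequent are true, and confidence is rule support divided by support. The records and the helper `rule_metrics` are hypothetical and are not taken from “diabetes_symptoms.csv”.

```python
from fractions import Fraction

# Hypothetical toy records: each set holds the symptoms PRESENT for one patient.
records = [
    {"Polyuria", "Polydipsia"},
    {"Polyuria", "Polydipsia", "Weakness"},
    {"Polyuria"},
    {"Weakness"},
    {"Polydipsia", "Weakness"},
]

def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, rule_support) for antecedent -> consequent."""
    n = len(records)
    # Records in which every antecedent item is present.
    n_antecedent = sum(1 for r in records if antecedent <= r)
    # Records in which antecedent and consequent items are all present.
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = Fraction(n_antecedent, n)
    rule_support = Fraction(n_both, n)
    confidence = Fraction(n_both, n_antecedent) if n_antecedent else Fraction(0)
    return support, confidence, rule_support

support, confidence, rule_support = rule_metrics(
    records, {"Polyuria"}, {"Polydipsia"}
)
print(
    f"support={float(support):.0%} "
    f"confidence={float(confidence):.1%} "
    f"rule_support={float(rule_support):.0%}"
)
```

For the toy rule Polyuria -> Polydipsia, three of five records contain the antecedent (support 60%), two contain both items (rule support 40%), and confidence is 2/3, about 66.7%. This rule would pass a 20% support threshold but fail an 85% confidence threshold.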
(c) A medical expert suggested that age could be a risk factor for diabetes. It would be interesting to find out how age above 40 associates with the presence of diabetes-related symptoms. To prepare the data, derive a new field that indicates whether the patient’s age is above 40. Then, revise your association analysis in (b) (i.e., the minimum antecedent support and confidence thresholds remain at 20% and 85%, respectively) by including the newly derived field. In your answer, explain, with screenshot(s), how you prepared the new field for association analysis in IBM SPSS Modeler. In addition, provide a screenshot that shows the total number of association rules obtained.
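The derivation described in (c) amounts to turning the continuous Age field into a Yes/No flag. A minimal Python sketch of that logic follows; the field name `Age_Above_40`, the helper `add_age_flag`, and the sample rows are assumptions for illustration, and the question itself expects the step to be done in IBM SPSS Modeler.

```python
# Hypothetical rows standing in for records of diabetes_symptoms.csv.
patients = [
    {"Age": 35, "Polyuria": "Yes"},
    {"Age": 40, "Polyuria": "No"},
    {"Age": 58, "Polyuria": "Yes"},
]

def add_age_flag(rows, threshold=40, field="Age_Above_40"):
    """Return copies of rows with a Yes/No flag for Age strictly above threshold."""
    return [
        {**row, field: "Yes" if row["Age"] > threshold else "No"}
        for row in rows
    ]

flagged = add_age_flag(patients)
for row in flagged:
    print(row["Age"], row["Age_Above_40"])
```

Note that "above 40" is read here as strictly greater than 40, so a patient aged exactly 40 is flagged "No".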
(d) GoodHealth tried to include the flag field “Gender” in the association analysis that considers only true values for flags. However, it is observed that there is a problem with the rules obtained. Identify the problem in this case and suggest a method to solve the problem. You are not required to generate any association analysis results from IBM SPSS Modeler for this question.
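One possible reading of the situation in (d), offered as an assumption rather than a model answer: a two-valued flag has only one "true" value, so an analysis restricted to true values can mention only one gender. A common remedy is to recode the field into one flag per value. The sketch below shows that recoding in Python; the helper `split_gender` and the field names `Gender_Male` and `Gender_Female` are hypothetical.

```python
def split_gender(rows):
    """Replace the single Gender field with one Yes/No flag per gender value."""
    out = []
    for row in rows:
        # Copy every field except the original Gender field.
        recoded = {k: v for k, v in row.items() if k != "Gender"}
        recoded["Gender_Male"] = "Yes" if row["Gender"] == "Male" else "No"
        recoded["Gender_Female"] = "Yes" if row["Gender"] == "Female" else "No"
        out.append(recoded)
    return out

# Hypothetical rows, not taken from diabetes_symptoms.csv.
sample = [
    {"Gender": "Male", "Polyuria": "Yes"},
    {"Gender": "Female", "Polyuria": "No"},
]
recoded_rows = split_gender(sample)
print(recoded_rows)
```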
(e) Using the dataset collected by PerfectCare, only continuous fields are selected as the clustering criteria. PerfectCare considers the clustering model useful if it can identify particular group(s) of patients who are more likely to be diabetic. Assuming that there is no compelling reason to omit outliers from the cluster analysis, construct a K-means model with two clusters. Describe the profile of each cluster and assess whether the model is useful.
In your answer, provide screenshots of your model (showing Model Summary, Cluster Quality, and Cluster Comparison) and any analytics results that you have generated to support your assessment.
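The workflow in (e) can be sketched in plain Python: cluster on continuous fields only, then compare the share of Outcome = 1 in each cluster to judge usefulness. This is a minimal illustration under stated assumptions, not the SPSS Modeler implementation: the rows are hypothetical, only two fields (Glucose, BMI) are used, fields are min-max scaled, and the starting centroids are fixed to the first and last rows so the run is deterministic.

```python
import math

# Hypothetical toy rows: (Glucose, BMI, Outcome). Not from diabetes_diagnosis.csv.
rows = [
    (85, 22.0, 0), (90, 24.5, 0), (95, 23.0, 0), (100, 26.0, 1),
    (160, 33.0, 1), (170, 35.5, 1), (155, 31.0, 0), (180, 36.0, 1),
]

def min_max_scale(points):
    """Rescale each feature to [0, 1] so no single field dominates the distance."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]

def k_means(points, k=2, max_iter=100):
    """Lloyd's algorithm with evenly spaced rows as deterministic start centroids."""
    idx = [round(i * (len(points) - 1) / max(k - 1, 1)) for i in range(k)]
    centroids = [points[i] for i in idx]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Outcome is NOT a clustering input; it is used afterwards to assess the clusters.
features = min_max_scale([(g, b) for g, b, _ in rows])
labels, centroids = k_means(features, k=2)

rates = {}
for c in sorted(set(labels)):
    outcomes = [o for (_, _, o), lab in zip(rows, labels) if lab == c]
    rates[c] = sum(outcomes) / len(outcomes)
print(rates)
```

If one cluster shows a clearly higher proportion of positive outcomes than the other, as in this toy data, that is one piece of evidence that the clustering separates patients who are more likely to be diabetic.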