ANL303: Fundamentals of Data Mining Assignment, SUSS, Singapore

University: Singapore University of Social Sciences (SUSS)
Subject: ANL303: Fundamentals of Data Mining

Question 1
Diabetes is a chronic disease. Because of patients’ low awareness, it may exist in a patient’s body for many years before clinical diagnosis. If it is not diagnosed early, diabetes may have an irreversible effect on the patient’s condition. Fatal complications such as heart attacks, strokes, and kidney damage can result as the disease progresses.


Yet these complications could be easily controlled, or in some cases even prevented, with early detection and treatment of diabetes. Medical researchers have proposed various data analytics solutions to provide timely diagnosis and improve knowledge of the disease.

A medical firm, GoodHealth, proposed association analysis. It collected a dataset (“diabetes_symptoms.csv”) that records the demographics of diabetic patients as well as their diabetes-related symptoms. The description of the dataset is listed in Table 1.

On the other hand, another medical firm, PerfectCare, proposed cluster analysis. It collected a dataset (“diabetes_diagnosis.csv”) that contains the diagnostic measurements of 768 patients who may or may not be diabetic (as indicated in the field Outcome). The description of the dataset is listed in Table 2.

Table 1. Description of the dataset “diabetes_symptoms.csv”

Category      Field               Values
Demographics  Age                 Continuous
Demographics  Gender              Male / Female
Symptom       Polyuria            Yes (Presence) / No (Absence)
Symptom       Polydipsia          Yes (Presence) / No (Absence)
Symptom       Sudden weight loss  Yes (Presence) / No (Absence)
Symptom       Weakness            Yes (Presence) / No (Absence)
Symptom       Polyphagia          Yes (Presence) / No (Absence)
Symptom       Genital thrush      Yes (Presence) / No (Absence)
Symptom       Visual blurring     Yes (Presence) / No (Absence)
Symptom       Itching             Yes (Presence) / No (Absence)
Symptom       Irritability        Yes (Presence) / No (Absence)
Symptom       Delayed healing     Yes (Presence) / No (Absence)
Symptom       Partial paresis     Yes (Presence) / No (Absence)
Symptom       Muscle stiffness    Yes (Presence) / No (Absence)
Symptom       Alopecia            Yes (Presence) / No (Absence)
Symptom       Obesity             Yes (Presence) / No (Absence)

Table 2. Description of the dataset “diabetes_diagnosis.csv”

Field                       Values
Pregnancies                 Continuous
Glucose                     Continuous
Blood Pressure              Continuous
Skin Thickness              Continuous
Insulin                     Continuous
BMI                         Continuous
Diabetes Pedigree Function  Continuous
Age                         Continuous
Outcome                     0 (Negative, i.e. non-diabetic) / 1 (Positive, i.e. diabetic)

(a) Both association analysis and cluster analysis can generate value-added information for improving healthcare services for diabetic patients. Yet they analyze the data differently, so the insights obtained from them also differ.

(i) Based on the dataset collected by GoodHealth, identify one (1) data mining objective that can be achieved by using association analysis. Discuss how the healthcare sector can benefit from this application of association analysis.

(ii) Based on the dataset collected by PerfectCare, identify one (1) data mining objective that can be achieved by using cluster analysis. Discuss how the healthcare sector can benefit from this application of cluster analysis.

(b) As GoodHealth is interested in the symptoms that patients have, rather than those they do not have, only the presence of symptoms is considered in the association analysis. Using the dataset collected by GoodHealth, execute the Apriori algorithm in IBM SPSS Modeler with the minimum antecedent support and minimum rule confidence thresholds set to 20% and 85%, respectively. Examine the association rules in the output and select one (1) association rule; explain its meaning in terms of support, confidence, and rule support. Provide a screenshot that clearly shows the values of support, confidence, and rule support for your selected rule.
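
As a rough illustration of the three measures named in (b), the sketch below computes them in Python for a single rule on a few made-up records. It assumes the usual IBM SPSS Modeler definitions: support is the share of records in which the antecedent is present, rule support is the share in which both the antecedent and the consequent are present, and confidence is rule support divided by support. The records and the helper name rule_metrics are hypothetical and are not taken from “diabetes_symptoms.csv”.

```python
# Toy records: each set lists the symptoms PRESENT for one hypothetical patient.
# These values are invented for illustration only.
records = [
    {"Polyuria", "Polydipsia"},
    {"Polyuria", "Polydipsia", "Weakness"},
    {"Polyuria"},
    {"Weakness"},
    {"Polydipsia", "Weakness"},
]


def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, rule_support) for antecedent -> consequent."""
    n = len(records)
    # Records in which every antecedent item is present.
    antecedent_count = sum(1 for r in records if antecedent <= r)
    # Records in which the antecedent AND the consequent are all present.
    rule_count = sum(1 for r in records if (antecedent | consequent) <= r)
    support = antecedent_count / n
    rule_support = rule_count / n
    confidence = rule_count / antecedent_count if antecedent_count else 0.0
    return support, confidence, rule_support


support, confidence, rule_support = rule_metrics(
    records, antecedent={"Polyuria"}, consequent={"Polydipsia"}
)
print(f"support={support:.2f} confidence={confidence:.2f} rule_support={rule_support:.2f}")
```

With thresholds of 20% and 85%, a rule would be kept only if its support is at least 0.20 and its confidence is at least 0.85, so the toy rule here would be dropped on confidence.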

(c) A medical expert suggested that age could be a risk factor for diabetes. It would be interesting to find out how being above 40 years of age is associated with the presence of diabetes-related symptoms. To prepare the data, derive a new field that indicates whether the patient’s age is above 40. Then, revise your association analysis in (b) (i.e., the minimum antecedent support and minimum rule confidence thresholds remain at 20% and 85%, respectively) by including the newly derived field. In your answer, explain, with screenshot(s), how you prepared the new field for association analysis in IBM SPSS Modeler. In addition, provide a screenshot that shows the total number of association rules obtained.
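
Outside IBM SPSS Modeler, the derivation asked for in (c) can be sketched in a few lines of Python. This is only an illustration of the logic, assuming that “above 40” means strictly greater than 40; the field name Age_above_40, the helper add_age_flag, and the sample rows are hypothetical.

```python
def add_age_flag(rows, threshold=40, new_field="Age_above_40"):
    """Return copies of the rows with a Yes/No flag for Age > threshold."""
    flagged = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[new_field] = "Yes" if row["Age"] > threshold else "No"
        flagged.append(new_row)
    return flagged


# Invented rows, not taken from the assignment dataset.
patients = [
    {"Age": 35, "Polyuria": "No"},
    {"Age": 40, "Polyuria": "Yes"},
    {"Age": 58, "Polyuria": "Yes"},
]

flagged = add_age_flag(patients)
for row in flagged:
    print(row["Age"], row["Age_above_40"])
```

The new flag uses the same Yes/No coding as the symptom fields, so it can be treated like any other flag when only true values are considered.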

(d) GoodHealth tried to include the flag field “Gender” in the association analysis that considers only true values for flags. However, it is observed that there is a problem with the rules obtained. Identify the problem in this case and suggest a method to solve the problem. You are not required to generate any association analysis results from IBM SPSS Modeler for this question.
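
For context on (d), one common way to handle a two-category field in an analysis that looks only at true values is to split it into one flag per category, so that each category can appear in a rule as a true value. The Python sketch below shows that recoding only; it is one possible approach, not necessarily the expected answer, and the helper name one_flag_per_category and the sample rows are hypothetical. In IBM SPSS Modeler a comparable recoding is typically done with a node such as SetToFlag.

```python
def one_flag_per_category(rows, field):
    """Replace `field` with one True/False flag per category it contains."""
    categories = sorted({row[field] for row in rows})
    recoded = []
    for row in rows:
        # Keep every other field, then add a flag for each category.
        new_row = {k: v for k, v in row.items() if k != field}
        for category in categories:
            new_row[f"{field}_{category}"] = row[field] == category
        recoded.append(new_row)
    return recoded


# Invented rows for illustration.
rows = [
    {"Gender": "Male", "Polyuria": "Yes"},
    {"Gender": "Female", "Polyuria": "No"},
]

recoded = one_flag_per_category(rows, "Gender")
for row in recoded:
    print(row)
```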

(e) Using the dataset collected by PerfectCare, only continuous fields are selected as the clustering criteria. PerfectCare considers the clustering model useful if it can identify particular group(s) of patients who are more likely to be diabetic. Assuming that there is no compelling reason to omit outliers from the cluster analysis, construct a K-Means model with two clusters. Describe the profile of each cluster and assess whether the model is useful.
In your answer, provide screenshots of your model (showing Model Summary, Cluster Quality, and Cluster Comparison) and any analytics results that you have generated to support your assessment.
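
The idea behind (e) can be sketched in plain Python: cluster on continuous fields only, then compare the share of diabetic patients (Outcome = 1) in each cluster. This is a minimal sketch under stated assumptions, not a reproduction of the IBM SPSS Modeler K-Means node: it uses min-max scaling, a simple farthest-point initialization chosen for determinism, and six invented (Glucose, BMI) pairs rather than the 768 records in “diabetes_diagnosis.csv”. All function names are hypothetical.

```python
def min_max_scale(points):
    """Rescale every column to the 0-1 range so no field dominates the distance."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(p, lows, highs)]
        for p in points
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Start from the first point, then repeatedly add the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k=2, max_iterations=50):
    """Lloyd's algorithm: assign to nearest centroid, recompute means, repeat."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def diabetic_rate_by_cluster(labels, outcomes, k=2):
    """Share of Outcome = 1 in each cluster; Outcome is NOT a clustering input."""
    rates = {}
    for j in range(k):
        members = [o for lab, o in zip(labels, outcomes) if lab == j]
        rates[j] = sum(members) / len(members) if members else None
    return rates


# Invented (Glucose, BMI) values and outcomes, for illustration only.
patients = [(85, 22.0), (90, 24.5), (95, 23.0), (160, 35.0), (170, 33.5), (180, 36.0)]
outcomes = [0, 0, 0, 1, 1, 1]

labels, centroids = kmeans(min_max_scale(patients), k=2)
rates = diabetic_rate_by_cluster(labels, outcomes)
print(labels, rates)
```

A model would look useful in the sense of (e) when the diabetic share differs clearly between the two clusters; if both clusters show a similar share, the clustering says little about who is more likely to be diabetic.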
