MTH220 is a natural continuation of MTH219 and aims to equip students with more in-depth statistical knowledge. Students learn to apply concepts and methods for analysing data, carrying out hypothesis tests, estimating parameters, and drawing inferences. Correlation analysis and regression techniques are introduced, along with non-parametric statistics, to help students make informed decisions about real-world problems using more advanced statistical models.

## TOA, TMA, GBA assignment sample for the Statistical Methods and Inference module, SUSS Singapore

By the end of this course, students should be able to demonstrate the following learning outcomes for the Statistical Methods and Inference module:

### 1. Calculate statistical parameters from data.

Identify the population the data are drawn from and the sample size.

Calculate the mean, median, and standard deviation.

Find the highest and lowest values for each variable. Flag potential outliers; a common rule of thumb treats values more than 3 standard deviations from the mean as outliers. For a small data set, a student should be able to check this with a quick calculation.
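The summary statistics and the 3-standard-deviation rule of thumb described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names `summarize` and `flag_outliers` and the sample values are hypothetical, and the sample standard deviation (dividing by n - 1) is assumed.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


def flag_outliers(values, k=3.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]


# Hypothetical sample: 19 ordinary readings and one extreme value.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11,
        10, 9, 12, 10, 11, 10, 9, 11, 10, 50]

print(summarize(data))
print(flag_outliers(data))
```

Note that in very small samples a single value can never lie 3 sample standard deviations from the mean, so the rule is only informative once the sample is reasonably large.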

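Finally, we want to report whether we should reject or fail to reject the null hypothesis, for example the hypothesis that all of our data points come from a single distribution. For simple univariate data sets, reject the null hypothesis when the p-value is below the chosen significance level (0.01 here), or equivalently when the test statistic Z falls in the rejection region for that level; otherwise, fail to reject. Tests involving more than one variable follow the same logic but are not covered here. The student should also be able to communicate their understanding of the topic.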
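As a hedged illustration of the reject / fail-to-reject decision, the sketch below runs a two-sided one-sample z-test. It assumes the population standard deviation is known and the sample mean is approximately normal; the function name `two_sided_z_test` and all numbers are hypothetical and not taken from the course material.

```python
import math


def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.01):
    """Two-sided one-sample z-test with known population sd (an assumption)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # P(|Z| > |z|) for a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical example: sample mean 52 from n = 25 observations,
# null value 50, known sd 5, significance level 0.01.
z, p_value, decision = two_sided_z_test(52, 50, 5, 25, alpha=0.01)
print(f"z = {z:.2f}, p = {p_value:.4f}, decision: {decision}")
```

The decision depends only on comparing the p-value with the significance level chosen before looking at the data.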

### 3. Determine the equation of the least squares linear regression line.

The equation of the least squares linear regression line is \(y = ax + b\) where \(a\) and \(b\) are constants to be determined.

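The constants are chosen to minimise the sum of squared residuals \(\sum_i (y_i - a x_i - b)^2\), where each residual is the vertical distance between an observed point and the line. Minimising this sum gives the slope \(a = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}\) and the intercept \(b = \bar{y} - a\bar{x}\), where \(\bar{x}\) and \(\bar{y}\) are the sample means. The slope \(a\) is the same at every point on the line and measures the average change in \(y\) for a one-unit increase in \(x\).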
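A minimal sketch of fitting the line \(y = ax + b\) by least squares, using the standard closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the sample means. The function name `least_squares_line` and the data points are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = sxy / sxx          # slope
    b = y_bar - a * x_bar  # intercept
    return a, b


# Hypothetical data lying close to the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")
```

The fitted line always passes through the point of sample means, which is a quick sanity check on any hand calculation.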


### 4. Comment on the results of hypothesis tests.

A hypothesis test is a statistical procedure for deciding whether sample data provide enough evidence against a null hypothesis; such tests are widely used, for example, in clinical trials. The result indicates whether an observed difference or association is statistically significant, that is, unlikely to be due to chance alone at a chosen significance level. It does not prove that the effect is real or that the null hypothesis is true. For example, suppose a study of 100 participants with Parkinson’s disease who were given 40 mg of Cognex for 30 days found no statistically significant interaction between Cognex and the PDD-melatonin combination. The appropriate comment is that the data do not provide sufficient evidence of an interaction at the chosen significance level, not that no interaction exists.

### 5. Apply suitable hypothesis tests, non-parametric tests, or goodness-of-fit tests.

Non-parametric tests are typically used when the data cannot be assumed to follow a normal distribution, and goodness-of-fit tests check whether observed data are consistent with a specified distribution. The Chi-Square test is one of the most widely used tests of this kind. It is applied to categorical data in the form of counts; a continuous variable must first be grouped into categories before the test can be used. Parametric tests such as the t-test, by contrast, rely on distributional assumptions that may not hold for every group being compared.

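The Chi-Square test of independence can tell you whether there is a statistically significant association between two categorical variables, for example whether the proportion of cases in which an event occurs differs between two groups. It does so by comparing the observed counts in each cell of a contingency table with the counts expected if the two variables were unrelated.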

Even when all the necessary experimental conditions are met, the observed counts will still differ somewhat from the expected counts through chance alone. The test assesses whether the differences are larger than chance would plausibly explain.
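The comparison of observed and expected counts can be sketched for the simplest case, a 2×2 contingency table. This is an illustration under stated assumptions: no continuity correction is applied, the p-value formula used is valid only for 1 degree of freedom, and the function name `chi_square_2x2` and the counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value, expected). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    # Expected count in each cell if the two variables were independent.
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected


# Hypothetical counts: rows are two groups, columns are event / no event.
observed = [[30, 20], [20, 30]]
stat, p_value, expected = chi_square_2x2(observed)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

For larger tables the statistic is computed the same way, but the p-value must come from a chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom, which statistical software provides.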

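The appropriate test cannot be chosen without looking at the data: it depends on the type of variable, the number of groups, and whether the test's assumptions are reasonable. The t-test is a common choice for comparing means when the data are approximately normally distributed; when that assumption is doubtful, a non-parametric alternative is usually preferable.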

If you are using SPSS or other statistical software, it can run these tests for you. When you analyse the relationship between an explanatory variable (x) and a response variable (y) in your raw data table, the output typically reports three things: the linear regression coefficients, the correlation coefficient, and the p-value (statistical significance).

### 6. Compute probability or expected frequency of an event.

Compute the probability of an event by subtracting the number of failures from the total number of trials and dividing the result by the total number of trials, which is the same as 1 – (number of failures ÷ total number of trials). Multiply by 100 to express the probability as a percentage.

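The probability can be estimated as P = x/n, where x is the number of times the event occurred and n is the total number of trials. The expected frequency of the event is then the probability multiplied by the number of trials: if the situation is repeated N times, the event is expected to occur P × N times.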
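The two calculations in this section can be sketched as follows. The function names `empirical_probability` and `expected_frequency` and the counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def empirical_probability(successes, trials):
    """Estimate the probability of an event as successes / trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def expected_frequency(probability, repetitions):
    """Expected number of occurrences in a given number of repetitions."""
    return probability * repetitions


# Hypothetical experiment: 40 trials, of which 12 were failures.
trials = 40
failures = 12
p_success = empirical_probability(trials - failures, trials)
same_value = 1 - failures / trials  # equivalent form: 1 - failures / trials

print(f"P(success) = {p_success:.2f} ({p_success * 100:.0f}%)")
print(f"Expected successes in 200 repetitions: {expected_frequency(p_success, 200):.0f}")
```

Both forms of the probability give the same value, which is a useful check when working by hand.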

### 7. Use R to perform data analysis.

R is a programming language and software environment for statistical computing and graphics. It provides clear data structures for storing data in vectors, matrices, and higher-dimensional arrays. Its built-in graphics routines can, for example, plot the estimated density of a probability distribution over the observed values.

R offers vectorized operations, interfaces to other languages such as C and, through add-on packages, Java and Python, and access to data from other statistics packages through the `foreign` package. It also has powerful graphics capabilities, including SVG device output. Together these features balance elegance and power, and R remains highly extensible through user-written packages and its own object systems.

