Analyzing Data: Hypothesis Testing, Confidence Intervals, and Correlations

> # One-sample z-test of H0: mu = 70 with known sigma
> mu <- 70

> xbar <- 69.1
> sigma <- 3.5
> n <- 49
> 
> z <- (xbar - mu) / (sigma/sqrt(n))
> p <- pnorm(z)   # one-tailed (lower-tail) p-value, i.e. H1: mu < 70
> z
[1] -1.8
> p
[1] 0.03593032
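The same test can be wrapped in a small reusable function. This is a sketch, not the assignment's code: the name z_test, the 5% significance level alpha, and the two-tailed p-value are my own additions. It shows why the conclusion can hinge on the alternative hypothesis. The lower-tail p-value (about 0.036) falls below 0.05, while the two-tailed p-value (about 0.072) does not.

```r
# One-sample z-test with a known population sigma (hypothetical helper).
# Returns the z statistic plus lower-tail and two-tailed p-values.
z_test <- function(xbar, mu, sigma, n) {
  z <- (xbar - mu) / (sigma / sqrt(n))    # standardized distance from mu
  list(
    z          = z,
    p_lower    = pnorm(z),                # H1: mu < mu0
    p_two_tail = 2 * pnorm(-abs(z))       # H1: mu != mu0
  )
}

res <- z_test(xbar = 69.1, mu = 70, sigma = 3.5, n = 49)
alpha <- 0.05                 # assumed significance level
res$z                         # -1.8
res$p_lower < alpha           # TRUE: reject H0 in the one-tailed test
res$p_two_tail < alpha        # FALSE: two-tailed p is about 0.072
```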
> 
> # 95% z confidence interval for the population mean (known sigma)
> xbar <- 85
> sigma <- 8
> n <- 64
> z_alpha <- 1.96   # critical value z_(alpha/2) for 95% confidence
> 
> se <- sigma/sqrt(n)
> lower <- xbar - z_alpha*se
> upper <- xbar + z_alpha*se
> c(lower, upper)
[1] 83.04 86.96
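Hard-coding 1.96 only covers the 95% level. A sketch that derives the critical value with qnorm() instead, so other confidence levels drop in directly; the function name z_ci and the 99% comparison are illustrative assumptions, not part of the original.

```r
# Two-sided z confidence interval for a mean when sigma is known (hypothetical helper).
z_ci <- function(xbar, sigma, n, conf = 0.95) {
  z_crit <- qnorm(1 - (1 - conf) / 2)   # about 1.959964 for conf = 0.95
  se <- sigma / sqrt(n)                 # standard error of the mean
  c(lower = xbar - z_crit * se, upper = xbar + z_crit * se)
}

z_ci(xbar = 85, sigma = 8, n = 64)              # about 83.04 and 86.96
z_ci(xbar = 85, sigma = 8, n = 64, conf = 0.99) # wider: about 82.42 and 87.58
```

Raising the confidence level widens the interval, which is the usual trade-off between confidence and precision.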
> 
> # Girls data
> girls_goals <- c(4, 5, 6)
> girls_grades <- c(49, 50, 69)
> girls_popular <- c(24, 36, 38)
> girls_time <- c(19, 22, 28)
> 
> # Boys data
> boys_goals <- c(4, 5, 6)
> boys_grades <- c(46.1, 54.2, 67.7)
> boys_popular <- c(26.9, 31.6, 39.5)
> boys_time <- c(18.9, 22.2, 27.8)
> 
> # Create dataframe
> df <- data.frame(girls_goals, girls_grades, girls_popular, girls_time,
+                  boys_goals, boys_grades, boys_popular, boys_time)
> 
> # Correlations
> cor(df)                         # Pearson correlation matrix (the default method)
              girls_goals girls_grades girls_popular girls_time boys_goals boys_grades boys_popular boys_time
girls_goals     1.0000000    0.8873565     0.9244735  0.9819805  1.0000000   0.9897433    0.9894203 0.9890517
girls_grades    0.8873565    1.0000000     0.6445509  0.9585035  0.8873565   0.9441243    0.9448614 0.9456833
girls_popular   0.9244735    0.6445509     1.0000000  0.8357661  0.9244735   0.8605276    0.8593826 0.8580918
girls_time      0.9819805    0.9585035     0.8357661  1.0000000  0.9819805   0.9989061    0.9990085 0.9991175
boys_goals      1.0000000    0.8873565     0.9244735  0.9819805  1.0000000   0.9897433    0.9894203 0.9890517
boys_grades     0.9897433    0.9441243     0.8605276  0.9989061  0.9897433   1.0000000    0.9999975 0.9999887
boys_popular    0.9894203    0.9448614     0.8593826  0.9990085  0.9894203   0.9999975    1.0000000 0.9999968
boys_time       0.9890517    0.9456833     0.8580918  0.9991175  0.9890517   0.9999887    0.9999968 1.0000000
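As a check on one entry of the matrix, the Pearson coefficient for girls_goals and girls_grades can be worked by hand from the data above (means 5 and 56):

```latex
r = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \,\sum_i (y_i-\bar{y})^2}}
  = \frac{(-1)(-7) + (0)(-6) + (1)(13)}{\sqrt{(1+0+1)(49+36+169)}}
  = \frac{20}{\sqrt{2 \cdot 254}} = \frac{20}{\sqrt{508}} \approx 0.8874
```

This agrees with the 0.8873565 that cor() reports for that pair.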
> cor(df, method="pearson")       # explicit Pearson; same output as cor(df)
              girls_goals girls_grades girls_popular girls_time boys_goals boys_grades boys_popular boys_time
girls_goals     1.0000000    0.8873565     0.9244735  0.9819805  1.0000000   0.9897433    0.9894203 0.9890517
girls_grades    0.8873565    1.0000000     0.6445509  0.9585035  0.8873565   0.9441243    0.9448614 0.9456833
girls_popular   0.9244735    0.6445509     1.0000000  0.8357661  0.9244735   0.8605276    0.8593826 0.8580918
girls_time      0.9819805    0.9585035     0.8357661  1.0000000  0.9819805   0.9989061    0.9990085 0.9991175
boys_goals      1.0000000    0.8873565     0.9244735  0.9819805  1.0000000   0.9897433    0.9894203 0.9890517
boys_grades     0.9897433    0.9441243     0.8605276  0.9989061  0.9897433   1.0000000    0.9999975 0.9999887
boys_popular    0.9894203    0.9448614     0.8593826  0.9990085  0.9894203   0.9999975    1.0000000 0.9999968
boys_time       0.9890517    0.9456833     0.8580918  0.9991175  0.9890517   0.9999887    0.9999968 1.0000000
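cor() reports only the coefficients. A sketch of checking whether one of them is distinguishable from zero with base R's cor.test(); pairing girls_goals with girls_grades is just an example I chose. With only three observations per variable the test has one degree of freedom, so even r of about 0.89 yields a p-value near 0.31. That is a caveat for reading the matrix, not a result from the original assignment.

```r
# Significance test for a single Pearson correlation (n = 3 observations).
girls_goals  <- c(4, 5, 6)
girls_grades <- c(49, 50, 69)

ct <- cor.test(girls_goals, girls_grades, method = "pearson")
ct$estimate   # r, about 0.887 (matches the matrix above)
ct$parameter  # degrees of freedom: n - 2 = 1
ct$p.value    # about 0.31, well above 0.05
```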
> cor(df, method="spearman")      # rank-based; all 1 because every column increases across the three rows
              girls_goals girls_grades girls_popular girls_time boys_goals boys_grades boys_popular boys_time
girls_goals             1            1             1          1          1           1            1         1
girls_grades            1            1             1          1          1           1            1         1
girls_popular           1            1             1          1          1           1            1         1
girls_time              1            1             1          1          1           1            1         1
boys_goals              1            1             1          1          1           1            1         1
boys_grades             1            1             1          1          1           1            1         1
boys_popular            1            1             1          1          1           1            1         1
boys_time               1            1             1          1          1           1            1         1
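Spearman's coefficient is Pearson's coefficient applied to ranks, which explains the matrix of 1s. A short sketch (rebuilding df so it runs on its own) confirms that every column ranks as 1, 2, 3 across the three rows:

```r
# Spearman's rho is Pearson's r computed on ranks.
df <- data.frame(
  girls_goals  = c(4, 5, 6),          girls_grades  = c(49, 50, 69),
  girls_popular = c(24, 36, 38),      girls_time    = c(19, 22, 28),
  boys_goals   = c(4, 5, 6),          boys_grades   = c(46.1, 54.2, 67.7),
  boys_popular = c(26.9, 31.6, 39.5), boys_time     = c(18.9, 22.2, 27.8)
)

ranks <- sapply(df, rank)   # rank each column separately
ranks                       # every column is 1, 2, 3
all.equal(cor(ranks), cor(df, method = "spearman"))  # TRUE
```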
> 
> # Scatterplots
> pairs(df, main="Scatterplot Matrix")
> 
> # Correlogram
> install.packages("corrgram")
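The transcript stops after installing corrgram, so the call that produced the correlogram isn't shown. A minimal sketch of what it could look like; the panel choices (shaded squares below the diagonal, pies above) are my assumption, using panel functions the corrgram package provides.

```r
# Correlogram of the girls/boys data with the corrgram package.
# install.packages("corrgram")   # one-time install, as in the transcript
df <- data.frame(
  girls_goals  = c(4, 5, 6),          girls_grades  = c(49, 50, 69),
  girls_popular = c(24, 36, 38),      girls_time    = c(19, 22, 28),
  boys_goals   = c(4, 5, 6),          boys_grades   = c(46.1, 54.2, 67.7),
  boys_popular = c(26.9, 31.6, 39.5), boys_time     = c(18.9, 22.2, 27.8)
)

if (requireNamespace("corrgram", quietly = TRUE)) {
  library(corrgram)
  corrgram(df,
           lower.panel = panel.shade,  # assumed choice: colour and shading reflect r
           upper.panel = panel.pie,    # assumed choice: pie filled in proportion to r
           text.panel  = panel.txt,    # variable names on the diagonal
           main = "Correlogram")
} else {
  message("corrgram is not installed; run install.packages(\"corrgram\") first")
}
```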




This assignment investigates whether a new cookie machine meets the manufacturer's specifications using hypothesis testing and p-values, showing that in some scenarios the machine may fall short (in the test shown, z = -1.8 with a one-tailed p-value of about 0.036). I also calculated a 95% confidence interval (83.04 to 86.96) to estimate the population mean for a second dataset, demonstrating how sample data can be used to estimate population parameters. Finally, a correlation analysis compared girls' and boys' goals, grades, popularity, and time spent on assignments, revealing strong positive relationships across all variables (Pearson coefficients from about 0.64 to 1.00, and a Spearman coefficient of 1 for every pair), which I illustrated with scatterplots and a correlogram.


