Posts

Netflix Data Final Analysis: Duration, Release Year, and Growth Trends

Introduction: The purpose of this project is to examine patterns in Netflix content using a dataset of 8,807 titles. The analysis focuses on three main questions: whether movies and TV shows differ in average release year, whether movie duration varies across rating categories such as G, PG-13, and TV-MA, and whether the number of titles added to Netflix has increased over time. These questions help explore how Netflix’s content has evolved, how rating categories relate to the length of films, and how Netflix’s library has grown in recent years.

This project uses the same inferential techniques practiced in class: two-sample t-tests to compare group means, ANOVA to test for differences across multiple categories, and linear regression to examine trends over time. Together, these statistical methods allow us to draw evidence-based co...

A Time Series Analysis of Student Credit Card Charges (2012–2013)

The time series plot shows student credit card charges from January 2012 to December 2013. Charges generally increased over the two years, with noticeable rises during the summer and at the end of each year, likely reflecting holiday periods and the start of the school year. This suggests students tend to spend more during these predictable seasonal peaks.

An Exponential Smoothing (ETS) model was fitted to the combined data, and the procedure selected ETS(M,N,N): multiplicative error, no trend component, and no seasonal component. The smoothing parameter α = 0.8635 indicates that the model gives relatively high weight to recent observations. The initial level (l) is 30.9107, and the model’s sigma (standard deviation of the residuals) is 0.1354. The fitted values closely follow the actual charges: although the model has no explicit trend term, the high α lets the estimated level track the rising series while smoothing out some month-to-month fluctuation. Overall, the time series plot and ETS model together reveal a consistent increase in s...
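To make the role of α concrete, here is a minimal base-R sketch of the level recursion behind an ETS model with no trend and no seasonality. It reuses the reported α and initial level, but the `charges` vector is made-up illustrative data, not the values from the post, and the loop is a hand-written stand-in rather than the fitting routine the post used.

```r
# Level recursion for exponential smoothing without trend or seasonality.
# alpha and l0 are the values reported in the post; `charges` is hypothetical.
alpha   <- 0.8635
l0      <- 30.9107
charges <- c(31, 34, 33, 38, 40, 39)  # made-up monthly charges

level <- numeric(length(charges))
prev  <- l0
for (i in seq_along(charges)) {
  # new level = alpha * latest observation + (1 - alpha) * previous level
  level[i] <- alpha * charges[i] + (1 - alpha) * prev
  prev <- level[i]
}

# One-step-ahead fitted value for each month is the previous month's level
fitted_vals <- c(l0, head(level, -1))
data.frame(charges, fitted = round(fitted_vals, 2), level = round(level, 2))
```

Because α is close to 1, each new level sits much nearer to the latest observation than to the previous level, which is why the fitted values follow the series so closely.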

Additive Models vs. Paired t-Tests: Insights from the Ashina Data

The ANOVA results show that treatment has a highly significant effect on pain scores (F(1,15) = 10.41, p = 0.0056), indicating that the active treatment significantly reduced pain compared to the placebo. The period effect is also statistically significant (F(1,15) = 5.15, p = 0.038), suggesting that the order or timing of treatment sessions slightly influenced pain levels. The subject effect is marginally significant (p ≈ 0.069), which makes sense because individual patients have different baseline pain responses.

Comparison with Paired t-Test: When comparing these results to the paired t-test, the conclusion is consistent: both methods indicate that the active treatment is significantly more effective than the placebo. However, the additive model is more informative because it also adjusts for differences between subjects and periods, giving a more accurate estimate of the treatment effect.

Summary: The additive model analysis of the ashina d...
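The link between the two methods can be sketched in base R. The data below are hypothetical (six subjects, made-up scores), not the ashina data, and the model omits the period term used in the post. Under those assumptions, the F statistic for treatment in the additive model equals the square of the paired t statistic.

```r
# Hypothetical data: 6 subjects, each measured under placebo and active treatment
scores <- data.frame(
  subject = factor(rep(1:6, times = 2)),
  treat   = factor(rep(c("placebo", "active"), each = 6),
                   levels = c("placebo", "active")),
  score   = c(12, 9, 15, 11, 8, 14,   # placebo
              9, 8, 11, 10, 5, 10)    # active
)

# Paired t-test on the within-subject differences
placebo <- scores$score[scores$treat == "placebo"]
active  <- scores$score[scores$treat == "active"]
tt <- t.test(active, placebo, paired = TRUE)

# Additive model: subject as a blocking factor plus a treatment effect
fit     <- lm(score ~ subject + treat, data = scores)
aov_tab <- anova(fit)

# With only subject and treatment in the model, F for treatment equals t^2
c(t_squared = unname(tt$statistic)^2,
  F_treat   = aov_tab["treat", "F value"])
```

Adding a period factor to the formula, as the post's model does, is what lets the additive model go beyond the paired t-test.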

Exploring the Cystfibr Dataset: What Affects Lung Performance?

For this analysis, I used the cystfibr dataset from the ISwR package in R to explore what factors influence maximum expiratory pressure (pemax) in cystic fibrosis patients. I focused on four predictors: age, weight, bmp (body mass as a percentage of normal), and fev1 (forced expiratory volume, a measure of lung function). I ran a multiple linear regression:

    model <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)
    summary(model)

Key Results
Intercept (179.30): the predicted pemax when all predictors are zero.
Age (-3.42, p = 0.31): Older age is associated with slightly lower pemax, but the effect is not statistically significant.
Weight (2.69, p = 0.033): Higher weight is associated with higher pemax.
BMP (-2.07, p = 0.020): Surprisingly, a higher bmp is associated with slightly lower pemax in this dataset.
FEV1 (1.09, p = 0.047): Better lung function is associated with higher pemax, as expected.

The model explains about 59% of the variation in pemax (R² = 0.59), and the overall F-test confirms that the mod...

Exploring Data Frames and Tables in R

In this assignment, I made a simple data frame in R with details like country, age, salary, and whether someone made a purchase. Then I used the built-in mtcars dataset to create a table showing how car gears and cylinders relate to each other. I added totals using addmargins() and found both overall and row proportions with prop.table(). Doing this helped me see how R makes it easy to organize, compare, and understand data.
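The steps described above could look roughly like this. The `customers` data frame and its values are made up for illustration, since the post does not list the actual rows; the mtcars part uses only base R functions.

```r
# A small data frame; the rows are hypothetical
customers <- data.frame(
  country   = c("France", "Spain", "Germany", "Spain"),
  age       = c(44, 27, 30, 38),
  salary    = c(72000, 48000, 54000, 61000),
  purchased = c("No", "Yes", "No", "Yes")
)
str(customers)

# Cross-tabulate gears against cylinders in the built-in mtcars data
gear_cyl <- table(gear = mtcars$gear, cyl = mtcars$cyl)

addmargins(gear_cyl)               # add row and column totals
prop.table(gear_cyl)               # each cell as a share of all cars
prop.table(gear_cyl, margin = 1)   # each cell as a share of its row (gear)
```

Passing `margin = 1` makes each row sum to 1, which is what turns the overall proportions into row proportions.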

Exploring Stress Levels, Drug Effects, and Reaction Time with ANOVA in R

For this study, I wanted to see whether stress level affects how a drug influences reaction time. I used three groups: high stress, moderate stress, and low stress. Each group’s reaction times were recorded, and I ran a one-way ANOVA in R to check for differences between them. The results showed a clear difference between the stress levels: the F-value was about 18.92 and the p-value was less than 0.05. That means we can reject the null hypothesis and conclude that stress level has a significant effect on reaction time after taking the drug. In simpler terms, the drug didn’t affect everyone the same way; how stressed a person was made a noticeable difference.

Next, I explored the zelazo dataset from the ISwR package, which includes four groups labeled active, passive, none, and ctr.8w. I converted the data into a format R could read easily and ran another one-way ANOVA to compare the groups. This time, the F-value came out ar...
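A one-way ANOVA of the kind described in the first study can be sketched as follows. The reaction times are hypothetical, so the F-value will not match the 18.92 reported in the post; the point is only the shape of the analysis.

```r
# Hypothetical reaction times for three stress groups (5 people each)
reaction <- data.frame(
  stress = factor(rep(c("high", "moderate", "low"), each = 5),
                  levels = c("low", "moderate", "high")),
  time   = c(10, 9, 8, 9, 10,    # high stress
             8, 10, 6, 7, 8,     # moderate stress
             4, 6, 6, 4, 2)      # low stress
)

# One-way ANOVA: does mean reaction time differ across stress levels?
fit <- aov(time ~ stress, data = reaction)
summary(fit)

# Pull the F statistic and p-value out of the ANOVA table
res     <- summary(fit)[[1]]
f_value <- res[["F value"]][1]
p_value <- res[["Pr(>F)"]][1]
```

A p-value below 0.05 here leads to the same kind of conclusion as in the post: at least one group mean differs from the others.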

Understanding Regression Models: Predicting, Analyzing, and Interpreting Data in R

In this assignment, I explored simple and multiple linear regression using different datasets in R. Regression helps us understand how one variable is related to another, and it can also be used to make predictions. For the first example, I looked at a dataset with x as the predictor and y as the response. Using the linear model Y = α + βX + ϵ, I used R’s lm() function to find the intercept and slope. The slope told me how much y changes for each one-unit change in x, while the intercept gave the value of y when x = 0. This simple model makes it easy to see the trend in the data and predict y for any given x.

Next, I worked with the faithful dataset to predict eruption durations based on the waiting time since the last eruption. Again, the regression model helped me quantify the relationship. Using lm(eruptions ~ waiting, data=faithful) and the predict() function, I estimated the eruption duration for a waiting ti...
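The faithful example can be reproduced with base R alone. The excerpt does not show which waiting time was used, so the 80 minutes below is an illustrative assumption, not the value from the post.

```r
# faithful ships with base R: eruption length and waiting time, both in minutes
fit <- lm(eruptions ~ waiting, data = faithful)
coef(fit)   # intercept (alpha) and slope (beta) of the fitted line

# Predicted eruption duration at an assumed waiting time of 80 minutes
new_wait <- data.frame(waiting = 80)
pred <- predict(fit, newdata = new_wait)
pred

# The same prediction written out by hand from the fitted line
by_hand <- unname(coef(fit)[1] + coef(fit)[2] * 80)
by_hand
```

The hand calculation is just the model equation with the estimated intercept and slope plugged in, which is all predict() does for a simple linear model.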