t tests — make sure you know every detail of these

including var.equal = TRUE / FALSE

When TRUE: the population variances are assumed equal. This is the pooled $t$-test. Use it when var.test fails to reject its null hypothesis of equal variances.

t.test(group1_data, group2_data, 
       var.equal = TRUE, 
       alternative = "two.sided")

When FALSE: Welch $t$-test, the default in t.test and the safer choice unless there is strong evidence for equal variances. Use it when var.test rejects its null hypothesis. The call is the same as above, but with var.equal = FALSE.

Null $H_{0}$ and alternative $H_{a}$ hypotheses

$H_{0}$: a statement of “no effect,” “no difference,” or “no relationship”.
- Always includes an equality sign ($=$, $\leq$, or $\geq$).
$H_{a}$: the claim you are trying to find evidence for.

t.test(Group_A, Group_B, alternative = "less")
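A minimal sketch of how the `alternative` argument maps onto $H_{a}$. The data values are made up for illustration; only `Group_A` and `Group_B` are names from the notes.

```r
# Hypothetical measurements for two independent groups
Group_A <- c(4.1, 5.0, 4.7, 5.3, 4.4, 4.9)
Group_B <- c(5.6, 5.9, 5.1, 6.2, 5.8, 5.4)

# H_a: mean(A) - mean(B) < 0
p_less <- t.test(Group_A, Group_B, alternative = "less")$p.value
# H_a: mean(A) - mean(B) > 0
p_greater <- t.test(Group_A, Group_B, alternative = "greater")$p.value
# H_a: mean(A) - mean(B) != 0
p_two <- t.test(Group_A, Group_B, alternative = "two.sided")$p.value

c(less = p_less, greater = p_greater, two.sided = p_two)
```

The two one-sided P-values sum to 1, and the two-sided P-value is twice the smaller of them.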

Using data in R to create tests

Predictions

SLR:

$\hat{E} (Y \mid x_{p}) = \hat{Y}_{p} = \hat{β}_{0} + \hat{β}_{1} x_{p}$
- Expected value of the response variable $Y$ for a specific value $x_{p}$ of the predictor variable $x$.
- The estimated mean response of the population when the predictor variable is fixed at $x_{p}$.
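A short sketch of the prediction formula in R, assuming made-up data and a hypothetical value `x_p`. It computes $\hat{Y}_{p}$ by hand and with `predict`, which can also return a confidence interval for the mean response.

```r
# Hypothetical data; x_data and y_data follow the naming used in these notes
x_data <- c(1, 2, 3, 4, 5, 6)
y_data <- c(2.3, 4.1, 6.2, 7.9, 10.1, 11.8)
my_model <- lm(y_data ~ x_data)

x_p <- 3.5  # hypothetical predictor value
by_hand <- unname(coef(my_model)[1] + coef(my_model)[2] * x_p)

# newdata must use the same column name as the predictor in the formula
pred <- predict(my_model, newdata = data.frame(x_data = x_p),
                interval = "confidence", level = 0.95)
pred
```

The `fit` column of `pred` should match `by_hand`; `lwr` and `upr` bound the mean response at `x_p`.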

Least Squares estimates
- Find the values of the slope ($\hat{β}_{1}$) and the intercept ($\hat{β}_{0}$) that minimize the Sum of Squared Errors (SSE). In R, use:

a = lm(y_data ~ x_data)
summary(a)

Confidence interval (CI) creation in R

# Find the 95% Confidence Intervals for the Intercept and Slope
confint(my_model, level = 0.95)
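To see what `confint` is doing, here is a sketch that rebuilds the same intervals as estimate ± t_crit × standard error. The data are made up for illustration.

```r
# Hypothetical data
x_data <- c(1, 2, 3, 4, 5, 6)
y_data <- c(2.3, 4.1, 6.2, 7.9, 10.1, 11.8)
my_model <- lm(y_data ~ x_data)

coefs <- summary(my_model)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
t_crit <- qt(0.975, df.residual(my_model))

ci_by_hand <- cbind(
  lower = coefs[, "Estimate"] - t_crit * coefs[, "Std. Error"],
  upper = coefs[, "Estimate"] + t_crit * coefs[, "Std. Error"]
)
ci_by_hand
confint(my_model, level = 0.95)
```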

Use s20x package

MGF

The moment generating function (MGF), $M (t) = E [e^{t X}]$, packages all raw moments of a random variable $X$ into one expression, which gives a way to find $μ$ (mean) and $σ^{2}$ (variance). To find the $r$-th raw moment, $E [X^{r}]$, take the $r$-th derivative and evaluate it at $t = 0$: $M^{(r)} (0) = \left. \frac{d^{r}}{d t^{r}} M (t) \right|_{t = 0} = E [X^{r}]$

Find $μ$: $μ = E [X] = M^{'} (0)$
Find $σ^{2}$: calculated using the second and first raw moments: $Var (X) = E [X^{2}] - (E [X])^{2} = M^{''} (0) - [M^{'} (0)]^{2}$
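As a worked check, the sketch below differentiates the Exponential MGF $M (t) = λ / (λ - t)$ symbolically with base R's `D` and evaluates at $t = 0$. The choice of distribution and the rate `lambda = 2` are assumptions for illustration.

```r
# MGF of an Exponential(lambda) random variable, valid for t < lambda
mgf <- expression(lambda / (lambda - t))

d1 <- D(mgf, "t")   # M'(t)
d2 <- D(d1, "t")    # M''(t)

at_zero <- list(lambda = 2, t = 0)   # lambda = 2 is a hypothetical rate
m1 <- eval(d1, at_zero)              # E[X]   = M'(0)
m2 <- eval(d2, at_zero)              # E[X^2] = M''(0)

mu <- m1
sigma2 <- m2 - m1^2
c(mean = mu, variance = sigma2)
```

This agrees with the Exponential results $E (Y) = 1/ λ$ and $V (Y) = 1/ λ^{2}$.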

Identify distributions

Gaps

$L = a_{1} Y_{1} + a_{2} Y_{2} + \cdots + a_{n} Y_{n}$ (or a specific case such as $l = a_{1} Y_{1} + \cdots + a_{4} Y_{4}$)

$E (L)$ (or $E (l)$ )
$V (L)$ (or $V (l)$ )
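A small numeric sketch of both quantities, with made-up coefficients, means, and variances. $E (l) = \sum a_{i} E (Y_{i})$ always holds; $V (l) = \sum a_{i}^{2} V (Y_{i})$ assumes the $Y_{i}$ are independent (otherwise covariance terms are needed).

```r
a      <- c(1, -2, 3, 0.5)     # hypothetical coefficients a_1..a_4
mu     <- c(10, 4, 2, 8)       # hypothetical means E(Y_i)
sigma2 <- c(4, 1, 9, 16)       # hypothetical variances V(Y_i)

E_l <- sum(a * mu)             # E(l) = sum of a_i * E(Y_i)
V_l <- sum(a^2 * sigma2)       # V(l) = sum of a_i^2 * V(Y_i), independence assumed
c(E = E_l, V = V_l)
```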

Power

Power is the probability of correctly rejecting $H_{0}$ when $H_{a}$ is true: $Power = 1 - β$, where $β$ is the probability of a Type II error.

power_result <- power.t.test(
  n = 30,             # Sample size
  delta = 1,          # The true difference in means (Effect Size)
  sd = 2,             # Population standard deviation
  sig.level = 0.05,   # Significance level (alpha)
  type = "one.sample",# Type of t-test
  alternative = "two.sided" # Hypothesis direction
)
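`power.t.test` solves for whichever of `n`, `delta`, `sd`, `sig.level`, or `power` is left unspecified. The sketch below reuses the settings above and asks for the sample size needed to reach a target power of 0.80 (the target is an assumption for illustration).

```r
# Leave n unspecified and supply power; power.t.test solves for n
n_needed <- power.t.test(
  delta = 1, sd = 2, sig.level = 0.05,
  power = 0.80,                 # hypothetical target power
  type = "one.sample", alternative = "two.sided"
)
ceiling(n_needed$n)             # round up to a whole number of observations

# Power achieved by the n = 30 design
power_n30 <- power.t.test(n = 30, delta = 1, sd = 2, sig.level = 0.05,
                          type = "one.sample", alternative = "two.sided")$power
power_n30
```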

Calculate probabilities in R
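A few representative calls, as a sketch: the prefix `p` gives cumulative probabilities, `d` gives densities or point probabilities, and `q` gives quantiles. The distributions and numbers are chosen only for illustration.

```r
# P(Z <= 1.96) for a standard Normal
p_z <- pnorm(1.96)
# P(T > 2.048) for a t distribution with 28 degrees of freedom
p_t <- pt(2.048, df = 28, lower.tail = FALSE)
# P(X = 3) and P(X <= 3) for X ~ Binomial(10, 0.5)
p_x_eq <- dbinom(3, size = 10, prob = 0.5)
p_x_le <- pbinom(3, size = 10, prob = 0.5)
# P(Y > 1) for Y ~ Exponential(rate = 2)
p_y <- pexp(1, rate = 2, lower.tail = FALSE)
# Quantile: the value q with P(Z <= q) = 0.975
q_z <- qnorm(0.975)

c(p_z, p_t, p_x_eq, p_x_le, p_y, q_z)
```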

LSE and variance

Make predictions using the estimated regression line

$\hat{β}_{1} = \frac{SS_{xy}}{SS_{xx}}$: LSE for the slope
- Sample estimate of the slope: the sample covariance of $X$ and $Y$ divided by the sample variance of $X$ (both are scaled by $n - 1$, which cancels).
- Defines the steepness of the estimated regression line.
$V (\hat{β}_{0}) = σ^{2} \left( \frac{\sum_{i = 1}^{n} x_{i}^{2}}{n SS_{xx}} \right)$: variance of the LSE for the intercept $β_{0}$
- Used to calculate the Standard Error of $\hat{β}_{0}$, which is then used in the $t$-test for $H_{0} : β_{0} = 0$ and the CI for $β_{0}$.

my_model <- lm(y_data ~ x_data)
coefficients(my_model)

beta_0_hat <- coefficients(my_model)[1] # The Intercept
beta_1_hat <- coefficients(my_model)[2] # The Slope (x_data)
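The formulas can be checked against `lm` directly. This sketch uses made-up data, the standard intercept formula $\hat{β}_{0} = \bar{y} - \hat{β}_{1} \bar{x}$, and replaces the unknown $σ^{2}$ with its estimate $\hat{σ}^{2} = SSE / (n - 2)$.

```r
# Hypothetical data
x_data <- c(1, 2, 3, 4, 5, 6)
y_data <- c(2.3, 4.1, 6.2, 7.9, 10.1, 11.8)
n <- length(x_data)

SS_xx <- sum((x_data - mean(x_data))^2)
SS_xy <- sum((x_data - mean(x_data)) * (y_data - mean(y_data)))

beta_1_hand <- SS_xy / SS_xx
beta_0_hand <- mean(y_data) - beta_1_hand * mean(x_data)

my_model <- lm(y_data ~ x_data)
sigma2_hat <- sigma(my_model)^2            # SSE / (n - 2), estimates sigma^2
var_beta_0 <- sigma2_hat * sum(x_data^2) / (n * SS_xx)

c(beta_0_hand, beta_1_hand)
sqrt(var_beta_0)                           # compare with Std. Error of (Intercept)
```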

Show the estimators $\hat{θ}$ are unbiased

An estimator $\hat{θ}$ is unbiased if its expected value is equal to the true population parameter $θ$ being estimated: $E (\hat{θ}) = θ$
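As a standard illustration (not necessarily the estimator used in class), the sample mean $\bar{Y}$ is unbiased for $μ$, assuming each $Y_{i}$ has the same mean $E (Y_{i}) = μ$. The argument uses only linearity of expectation:

```latex
\begin{aligned}
E(\bar{Y}) &= E\left( \frac{1}{n} \sum_{i=1}^{n} Y_i \right) \\
           &= \frac{1}{n} \sum_{i=1}^{n} E(Y_i) \\
           &= \frac{1}{n} \cdot n \mu = \mu
\end{aligned}
```

Independence is not needed for this step; it only matters when finding the variance of the estimator.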

Find their variances etc

t-tests:

Can you derive results by hand (P-values, t-values, CIs)?

Perform tests using the three methods: CI, P values, RAR

Setup: we are already given the output from t.test and need to confirm the remaining quantities by hand. Testing for the population mean $μ$:

$H_{0} : μ = 50$, $H_{a} : μ \neq 50$

Confidence interval:
- Reject $H_{0}$ if the null value ($μ_{0}$) is not in the CI.

To get bounds:

estimate <- 7.9140
std_error <- 0.2471
t_critical <- 2.048
lower_bound <- estimate - t_critical * std_error
upper_bound <- estimate + t_critical * std_error
c(lower_bound, upper_bound) # the CI bounds

P-value:
- Fail to reject $H_{0}$ if the P-value from t.test is $> α$; reject if it is $\leq α$.
Rejection/Acceptance Region (RAR):
- Fail to reject $H_{0}$ if the calculated test statistic ($t_{calc}$) falls into the acceptance region (i.e., $| t_{calc} | \leq t_{crit}$).

# RAR example
alpha <- 0.05
df <- 28 # From the SLR summary example in the file
t_critical <- qt(1 - alpha / 2, df)
# e.g. if t_calc is 3.0 and t_critical is about 2.05, then |t_calc| > t_critical, so reject H_0

$t_{calc} = \frac{\text{Estimate} - \text{Null Value}}{\text{Standard Error of the Estimate}}$
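The three methods can be run side by side. This sketch reuses the estimate, standard error, and degrees of freedom quoted in these notes, and assumes a null value of 0 and $α = 0.05$.

```r
estimate   <- 7.9140  # from the CI example in these notes
std_error  <- 0.2471
df         <- 28
alpha      <- 0.05
null_value <- 0       # assumption: testing H_0: parameter = 0

t_calc  <- (estimate - null_value) / std_error
t_crit  <- qt(1 - alpha / 2, df)
p_value <- 2 * pt(abs(t_calc), df, lower.tail = FALSE)  # two-sided P-value
ci      <- estimate + c(-1, 1) * t_crit * std_error

reject_by_ci  <- null_value < ci[1] || null_value > ci[2]
reject_by_p   <- p_value <= alpha
reject_by_rar <- abs(t_calc) > t_crit
c(CI = reject_by_ci, P = reject_by_p, RAR = reject_by_rar)
```

All three methods give the same decision for a two-sided test at the same $α$.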

When to use var.test?
- Before conducting a two-sample independent $t$-test, to decide whether var.equal should be TRUE or FALSE.
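A sketch of that workflow, using made-up samples and $α = 0.05$: run `var.test` first, then pass its decision to `t.test` through `var.equal`.

```r
# Hypothetical samples from two independent groups
group1_data <- c(12.1, 11.4, 13.0, 12.6, 11.9, 12.8, 12.3, 11.7)
group2_data <- c(13.5, 10.2, 15.1, 11.0, 14.4, 9.8, 13.9, 12.6)

alpha <- 0.05
f_test <- var.test(group1_data, group2_data)   # H_0: variances are equal

# Fail to reject -> pooled test; reject -> Welch test
equal_var <- f_test$p.value > alpha
result <- t.test(group1_data, group2_data,
                 var.equal = equal_var, alternative = "two.sided")
result$method
```

`result$method` reports which version of the test was actually run.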

Estimation:

Max Lik: can you perform these calculations and proofs?
Bootstrap — do you understand the code?

# Basic Bootstrap for a mean in R
n <- length(my_data)
B <- 10000 # Number of resamples
results <- numeric(B)
 
for(i in 1:B) {
  # Resample with replacement
  resample <- sample(my_data, size = n, replace = TRUE)
  results[i] <- mean(resample)
}
 
# Find the Bootstrap Confidence Interval
quantile(results, c(0.025, 0.975))

Interval estimation: can you derive results that we did in class?

Expected Value $E (L)$ :
- $E (\sum Y_{i}) = \sum E (Y_{i})$
- For Exponential data, $E (Y_{i}) = 1/ λ$ , so $E (L) = n / λ$
Variance $V (L)$ :
- Assuming independence, $V (\sum Y_{i}) = \sum V (Y_{i})$
- For Exponential data, $V (Y_{i}) = 1/ λ^{2}$ , so $V (L) = n / λ^{2}$
Distribution:
- By the Central Limit Theorem, if $n$ is large, $L = \sum Y_{i}$ will be approximately Normal.
- Approximately, $L \sim N (\frac{n}{λ}, \frac{n}{λ^{2}})$
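To get a feel for the approximation, the sketch below compares the Normal probability with the exact one, using the fact that a sum of $n$ independent Exponential($λ$) variables is Gamma with shape $n$ and rate $λ$. The values of `n`, `lambda`, and the cutoff `q` are made up.

```r
n <- 50        # hypothetical sample size
lambda <- 2    # hypothetical rate

mean_L <- n / lambda
sd_L   <- sqrt(n) / lambda      # square root of n / lambda^2

q <- 30                         # hypothetical cutoff
approx_prob <- pnorm(q, mean = mean_L, sd = sd_L)
exact_prob  <- pgamma(q, shape = n, rate = lambda)  # sum of iid Exponentials is Gamma
c(approx = approx_prob, exact = exact_prob)
```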

Regression:

# Use
a <- lm(y_data ~ x_data)
summary(a)

$\hat{β}_{0}$ (Intercept) = 5.14…
$\hat{β}_{1}$ (Slope) = 7.91..
For SLR, sample size = residual degrees of freedom + 2 (30 in this case).
Residual Standard Error ($\hat{σ}$): 11.71
Multiple R-squared: 97.3% of the variation in $Y$ is explained by the linear relationship with $X$.
F-statistic: the p-value is very small, so we reject the null hypothesis (for SLR, $H_{0} : β_{1} = 0$).
- A p-value is considered small if it is less than or equal to $α$.
Verify t-values:
- Divide the Estimate by the Std. Error in the x row.
Find t_crit:

t_crit <- qt(0.975, 28) # df = 28; 0.975 gives the critical value for a 95% confidence interval
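The same checks can be scripted by pulling the pieces out of the summary object. This sketch uses R's built-in `cars` data set as a stand-in, since the data behind the numbers above are not included in these notes.

```r
a <- lm(dist ~ speed, data = cars)   # cars ships with base R
s <- summary(a)

coefs <- s$coefficients
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]
cbind(t_by_hand, reported = coefs[, "t value"])

n <- a$df.residual + 2               # SLR: sample size = residual df + 2
s$sigma                              # residual standard error
s$r.squared                          # Multiple R-squared
t_crit <- qt(0.975, a$df.residual)
```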

Bootstrap and other R functions are shown in the labs

Be able to recognize options, default values and ellipses

All relevant Labs — especially tests, Max lik (lab 10) and labs 14-16 on regression

Arika's Notes

Stats_cheat_sheet_fina
