see here for canvas stuff
see here for Calvin's notes (this should be transcribed w/ OCR later..)

Left off on: stats_quiz 4 Tables and Probability & dpqr2

Some basic variable definitions (maths)

Say we have

$X \sim N(μ = 2, σ = 5)$

Expected value and variance

E(X) = expected value. Always equal to $μ = 2$

V(X) = variance. Always equal to $σ^{2} = 5^{2} = 25$

Directly plug in these values.

For a normal distribution, E(X) is always mu and the variance is always sigma^2.

z-score: how far a data pt is from the mean, measured in standard deviations: $z = \frac{x - μ}{σ}$. See: z-score (standard score) aka z-transformation
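
A minimal sketch of the z-score for the $X \sim N(μ = 2, σ = 5)$ example above. The data point x = 12 is an arbitrary value picked for illustration, not something from the course material.

```r
mu <- 2      # mean of X
sigma <- 5   # standard deviation of X (so the variance is sigma^2 = 25)
x <- 12      # hypothetical data point, chosen only for illustration

z <- (x - mu) / sigma   # how many standard deviations x is from the mean
z                       # 2

# standardizing does not change the probability: both are P(Z <= 2)
all.equal(pnorm(x, mean = mu, sd = sigma), pnorm(z))
```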

read/filter/table

Read in csv, filter some cols, generate table, find probability

# read in csv
ddt <- read.csv("filename")
 
# short reminder on filtering data with base R:
ddt[ddt$LENGTH > 50 & ddt$SPECIES == "CCATFISH",]
 
# we can also use dplyr's filter() for the same thing:
library(dplyr)
ddt %>% filter(LENGTH > 50 & SPECIES == "CCATFISH") # TODO: this should have a better example
 
# if we want to make a table:
tab <- with(ddt, table(SPECIES, RIVER)) # with() just means we don't need to write ddt$ twice
addmargins(tab)

Tables w/ OR, AND, GIVEN

stats_quiz 4 Tables and Probability

# given the MTBE dataset, load two columns into a table:
tb = table(MTBE$col1, MTBE$col2)
addmargins(tb) # prints the table with row/column sums
# outputs below..
          Below Limit Detect Sum
  Private          81     22 103
  Public           72     48 120
  Sum             153     70 223

$P(\text{Private} \mid \text{Detect})$

22/70 # note: the denominator is just the Detect column sum

$P(\text{Public} \cap \text{Below Limit})$

72/223 # intersection over total

$P(\text{Detect})$

70/223
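
The three calculations above can be sketched directly from the counts. This is a sketch that rebuilds the table by hand from the printed output rather than from the MTBE data. The OR case is not worked in the notes, so it is added here as an assumption using the standard addition rule.

```r
# rebuild the counts from the addmargins() output above
tb <- matrix(c(81, 22,
               72, 48),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("Private", "Public"),
                             c("Below Limit", "Detect")))
total <- sum(tb)  # 223

# GIVEN: restrict to the Detect column, then take the Private share
p_private_given_detect <- tb["Private", "Detect"] / sum(tb[, "Detect"])  # 22/70

# AND: one cell over the grand total
p_public_and_below <- tb["Public", "Below Limit"] / total  # 72/223

# OR: add both margins, subtract the overlap so it is not counted twice
p_private_or_detect <- (sum(tb["Private", ]) + sum(tb[, "Detect"]) -
                          tb["Private", "Detect"]) / total  # (103 + 70 - 22)/223
```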

Filtering qns

Sources:

# How many fish have a WEIGHT strictly between 1000 and 1600 and are of the LMBASS SPECIES? Use dplyr!
ddt %>% filter(WEIGHT > 1000 & WEIGHT < 1600 & SPECIES == "LMBASS") %>% nrow()
 
# How many fish have a WEIGHT larger than 1600? Use "[]"!
d = ddt[ddt$WEIGHT > 1600,]
nrow(d) # number of rows = number of fish
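
A small base-R sketch of the "how many" step. The `toy` data frame is made up for illustration and only stands in for `ddt`.

```r
# toy stand-in for ddt; these rows are made up for illustration only
toy <- data.frame(
  SPECIES = c("LMBASS", "LMBASS", "CCATFISH", "LMBASS"),
  WEIGHT  = c(1200, 1700, 1500, 1000)
)

# nrow() of the filtered rows gives the count
n_between <- nrow(toy[toy$WEIGHT > 1000 & toy$WEIGHT < 1600 &
                        toy$SPECIES == "LMBASS", ])
n_between  # 1 (1000 is excluded because the bounds are strict)

# summing the logical condition counts the TRUEs directly
n_heavy <- sum(toy$WEIGHT > 1600)
n_heavy    # 1
```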

Outliers: 3 types (extreme,mild,all) IMPT using 3xIQR

# Boxplot method using 3x IQR
b1 = boxplot(ddt$DDT, range = 3) # Extreme outliers
length(b1$out)
# [1] 12
 
b2 = boxplot(ddt$DDT, range = 1.5) # All outliers
setdiff(b2$out, b1$out) # Only mild outliers
# [1] 28 31 33
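
The same three outlier types can be sketched without drawing a plot, using `boxplot.stats()` (its `coef` argument plays the role of `range`). The vector below is made up, not the DDT data.

```r
x <- c(10:20, 35, 60)  # made-up values, not the DDT data

extreme <- boxplot.stats(x, coef = 3)$out    # beyond 3 x IQR from the hinges
all_out <- boxplot.stats(x, coef = 1.5)$out  # beyond 1.5 x IQR from the hinges
mild    <- setdiff(all_out, extreme)         # outliers that are not extreme

extreme  # 60
all_out  # 35 60
mild     # 35
```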

Stats_Quiz 3 boxplot

Outliers using z-score method

We use the function scale() on a vector to generate its z-scores. We then count the values with |z| > 3 as outliers.

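# z-score method..
library(dplyr)
df <- ddt %>% 
  filter(SPECIES == "CCATFISH") %>% # create subset w/ only catfish
  mutate(z = as.numeric(scale(DDT))) # compute z-scores for the subset (scale() returns a matrix)
 
sum(abs(df$z) > 3) # count outliers; 3 because ~99.7% of normal data lies within 3 SDs of the mean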
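
A small sketch of what scale() computes and of where a cutoff of 3 comes from, using a made-up vector rather than the DDT column.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up sample for illustration

z_scale  <- as.numeric(scale(x))    # scale() returns a 1-column matrix
z_manual <- (x - mean(x)) / sd(x)   # same thing by hand (sd() divides by n - 1)
all.equal(z_scale, z_manual)        # TRUE

# share of a normal distribution within 3 SDs of the mean
within3 <- pnorm(3) - pnorm(-3)
within3                             # about 0.9973
```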

z-score (standard score) aka z-transformation

2) Find all possible DDT outliers. Submit the number of fish classified as possible outliers using boxplot()!

dpqr

Stats_mix_problems ch1-6

Formal definition (not too impt)

density (d):
- binom/poisson = prob of getting exactly a certain value
- norm = height of the density curve at x, not a probability; the prob over a range (e.g. a value between 0.9 and 1.1) is the area under the curve, which comes from p
probability (p): total probability of getting a value $\leq$ a certain point (aka AUC up to the x-value)
quantile (q): inverse of p; give it a probability and it returns the x-value. E.g. what x-value has 97.5% of the data below it?
random (r): generates a random sample
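
A quick sketch of the four prefixes on the normal distribution. The mean of 1 and sd of 0.1 in the range example are assumed values picked to match the "between 0.9 and 1.1" wording, and the seed is arbitrary.

```r
# standard normal N(0, 1) unless stated otherwise
dnorm(0)        # d: height of the density at x = 0, about 0.3989
pnorm(1.96)     # p: P(X <= 1.96), about 0.975
qnorm(0.975)    # q: x-value with 97.5% of the data below it, about 1.96
set.seed(1)     # arbitrary seed, only for reproducibility
rnorm(3)        # r: three random draws

# probability over a range is a difference of two p values, not a d value
p_range <- pnorm(1.1, mean = 1, sd = 0.1) - pnorm(0.9, mean = 1, sd = 0.1)
p_range         # about 0.6827

# for a discrete distribution, d is an actual probability
dbinom(3, size = 10, prob = 0.4)  # P(X = 3)
```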

Condensed table of discrete and cont. probabilities:

Some basic defs: upper tail = $X > k$ or $X \geq k$; lower tail is the opposite ($X < k$ or $X \leq k$).

Discrete (bar chart, countable)
Cont. (AuC)

Condensed table:

$P(X \leq k)$? p function
$P(X > k)$? 1 - p
$<$ or $\geq$? use k-1 (first arg - 1); discrete only
$P(X = k)$: d function

| Probability you want | R expression |
| --- | --- |
| $P(X \leq k)$ (lower tail) | `pbinom(k, n, p)` |
| $P(X < k)$ | `pbinom(k-1, n, p)` |
| $P(X \geq k)$ | `1 - pbinom(k-1, n, p)` |
| $P(X > k)$ | `1 - pbinom(k, n, p)` |
| $P(X = k)$ | `dbinom(k, n, p)` |
| $P(k < X \leq m)$ | `pbinom(m, n, p) - pbinom(k, n, p)` |
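
Each row of the table can be checked by summing `dbinom()` over the matching values. The values of n, p, k and m below are hypothetical, picked only to make the check concrete.

```r
n <- 10; p <- 0.4   # hypothetical binomial parameters
k <- 3;  m <- 6     # hypothetical cut points

checks <- c(
  isTRUE(all.equal(pbinom(k, n, p),         sum(dbinom(0:k, n, p)))),        # P(X <= k)
  isTRUE(all.equal(pbinom(k - 1, n, p),     sum(dbinom(0:(k - 1), n, p)))),  # P(X < k)
  isTRUE(all.equal(1 - pbinom(k - 1, n, p), sum(dbinom(k:n, n, p)))),        # P(X >= k)
  isTRUE(all.equal(1 - pbinom(k, n, p),     sum(dbinom((k + 1):n, n, p)))),  # P(X > k)
  isTRUE(all.equal(pbinom(m, n, p) - pbinom(k, n, p),
                   sum(dbinom((k + 1):m, n, p))))                            # P(k < X <= m)
)
checks  # all TRUE

# lower.tail = FALSE is an alternative to writing 1 - p for P(X > k)
upper <- pbinom(k, n, p, lower.tail = FALSE)
```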

Distribution types

Normal distribution (bell curve)
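Binom: # successes in fixed # trials
Geo: # trials till first success (R's `*geom` functions count failures before the first success, so pass trials - 1)
Poisson: # events happening in an interval
Hyper: # successes when drawing w/o replacement (a drawn card does not go back into the deck)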

| Distribution | PMF/PDF (`d*`) | CDF (`p*`) | Quantile (`q*`) | Random (`r*`) |
| --- | --- | --- | --- | --- |
| Normal | `dnorm` | `pnorm` | `qnorm` | `rnorm` |
| Binomial | `dbinom` | `pbinom` | `qbinom` | `rbinom` |
| Geometric | `dgeom` | `pgeom` | `qgeom` | `rgeom` |
| Hypergeometric | `dhyper` | `phyper` | `qhyper` | `rhyper` |
| Poisson | `dpois` | `ppois` | `qpois` | `rpois` |
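
A short sketch of how the less familiar `d*` functions are parameterized, each checked against the formula by hand. All the numbers (p = 0.2, the card counts, lambda = 5) are made-up illustrations.

```r
# Geometric: R counts failures BEFORE the first success, not trials
p <- 0.2                              # hypothetical success probability
geo_r    <- dgeom(3 - 1, prob = p)    # P(first success on trial 3)
geo_hand <- (1 - p)^2 * p             # two failures, then a success: 0.128

# Hypergeometric: dhyper(x, m, n, k) with m successes and n failures in the
# population, and k draws without replacement
hyp_r    <- dhyper(2, m = 4, n = 48, k = 5)                # P(2 aces in a 5-card hand)
hyp_hand <- choose(4, 2) * choose(48, 3) / choose(52, 5)

# Poisson: number of events in an interval with rate lambda
poi_r    <- dpois(2, lambda = 5)      # P(exactly 2 events)
poi_hand <- exp(-5) * 5^2 / factorial(2)
```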

Train type problem -m dpqr

canvas

The problem:

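$$f(y) = \begin{cases} \frac{c}{500}(25 - y^{2}) & \text{if } -5 < y < 5 \\ 0 & \text{elsewhere} \end{cases}$$

Solving for $c$, let $a = \frac{c}{500}$:

$$\begin{aligned} a \int_{-5}^{5} (25 - y^{2}) \, dy &= 1 \\ a \left[ 25y - \frac{y^{3}}{3} \right]_{-5}^{5} &= 1 \\ a \cdot \frac{500}{3} &= 1 \\ \frac{c}{500} \cdot \frac{500}{3} &= 1 \\ \frac{c}{3} = 1, \quad c &= 3 \end{aligned}$$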
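
A quick numerical sanity check of c = 3 using `integrate()`. The names `c_val` and `f` are just for this sketch.

```r
c_val <- 3  # constant found in the derivation above

# density from the problem statement, 0 outside (-5, 5)
f <- function(y) ifelse(y > -5 & y < 5, c_val / 500 * (25 - y^2), 0)

area <- integrate(f, lower = -5, upper = 5)$value
area  # should be 1 up to numerical error, as required of a density
```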

Generating own dpqr

dtrain <- function(x){
  # density from the problem: f(y) = 3/500 * (25 - y^2) on (-5, 5), 0 elsewhere
  ifelse(x > -5 & x < 5, 3/500 * (25 - x^2), 0)
}
 
ptrain <- function(q){
  # CDF = integral of dtrain from -5 to q = 1/500 * (75q - q^3 + 250)
  ifelse(q <= -5,
         0,  # Case 1: left of the support
         ifelse(q < 5,
                1/500 * (75*q - q^3 + 250),  # Case 2: inside (-5, 5)
                1)) # Case 3: right of the support
}
 
qtrain <- function(p){
  myroot <- function(p) {
    # solve ptrain(x) = p for x on [-5, 5]
    k <- function(x){
      p - 1/500*(75*x - x^3 + 250)
    }
    l <- stats::uniroot(f = k, interval = c(-5, 5))
    l$root
  }
  sapply(p, myroot) # actually call myroot, vectorized over p so rtrain() works
}
#Doubt this will be tested
rtrain <- function(n){
  r <- runif(n, min = 0, max = 1)  
  qtrain(r)
}

dpqr code examples

1) Y ~ Bin(n = 10, p = 0.4). $P (Y \geq 8)$

1- pbinom(8-1,10,.4)
#just following the table we saw

2) X ~ Pois(lambda = 5). $P (3 < X \leq 8)$

ppois(8,5) - ppois(3,5) # no 1 - p or k-1 needed: ppois(3,5) is P(X <= 3), so subtracting it leaves 3 < X <= 8

Bayes Testing problem (drug testing)

$P(\text{User} \mid \text{Positive}) = \frac{\text{sensitivity} \cdot P(\text{User})}{\text{sensitivity} \cdot P(\text{User}) + (1 - \text{specificity}) \cdot P(\text{Non-user})}$

Just plug in:

$P (User) = 0.05$

$P (Non-user) = 0.95$ aka 1-P(user)

$P (Positive ∣ User) = 0.95$

$P (Positive ∣ Non-user) = 1 - 0.87 = 0.13$

$P (User ∣ Positive) = \frac{0.95 \cdot 0.05}{0.95 \cdot 0.05 + 0.13 \cdot 0.95} = \frac{0.0475}{0.171} \approx 0.278$

| Event | Notation | Value | Description from Problem |
| --- | --- | --- | --- |
| Prior Prob. (User) | $P(U)$ | $0.05$ | 5% of people actually use cannabis. |
| Prior Prob. (Non-user) | $P(N)$ | $1 - 0.05 = 0.95$ | $1 - P(U)$ |
| Sensitivity (True Positive Rate) | $P(P \mid U)$ | $0.95$ | $P(\text{Positive} \mid \text{User})$ |
| Specificity (True Negative Rate) | $P(\text{Neg} \mid N)$ | $0.87$ | $P(\text{Negative} \mid \text{Non-user})$ |
| False Positive Rate | $P(P \mid N)$ | $1 - 0.87 = 0.13$ | $P(\text{Positive} \mid \text{Non-user})$ |

$$P(U \mid P) = \frac{P(P \mid U) \cdot P(U)}{P(P \mid U) \cdot P(U) + P(P \mid N) \cdot P(N)}$$

$$P(U \mid P) = \frac{0.95 \cdot 0.05}{(0.95 \cdot 0.05) + (0.13 \cdot 0.95)} \approx 0.278$$

See here for the actual theorem: 8. Bayes’ Theorem
See here if we ran again

Drug test problem

$P(A \mid B) = \frac{P(A) P(B \mid A)}{P(A) P(B \mid A) + P(A^{c}) P(B \mid A^{c})}$
A particular test for whether someone has been using cannabis is 95% sensitive and 87% specific, meaning it leads to 95% true “positive” results (meaning, “Yes he used cannabis”) for cannabis users and 87% true negative results for non-users. Assuming 5% of people actually do use cannabis, what is the probability that a random person who tests positive is really a cannabis user?

A: The person is a cannabis user.
$A^{c}$ (or $A^{'}$ ): The person is a non-user (i.e., not a cannabis user).
B: The test result is positive.
$B^{c}$ (or $B^{'}$ ): The test result is negative.
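
With A and B defined as above, the plug-in calculation can be sketched in a few lines. The variable names are made up for this sketch; the numbers are the ones given in the problem.

```r
p_A      <- 0.05       # P(A): prior, the person is a user
p_B_A    <- 0.95       # P(B | A): sensitivity
spec     <- 0.87       # P(B^c | A^c): specificity
p_B_notA <- 1 - spec   # P(B | A^c): false positive rate

p_B   <- p_B_A * p_A + p_B_notA * (1 - p_A)  # total probability of a positive test
p_A_B <- p_B_A * p_A / p_B                   # Bayes' theorem
p_A_B                                        # about 0.278
```

So even after a positive test, the chance of being a real user is under 30%, because non-users vastly outnumber users.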

Birthday problem

Birthday Problem

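Central formula: n = # of people. Probability that no two of them share a birthday:

$$\frac{365!}{(365 - n)! \, 365^{n}}$$

The probability that at least two share a birthday is 1 minus this.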

birthday <- function(k){
  1 - exp(lchoose(365,k) + lfactorial(k) - k*log(365))
}
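
As a cross-check on the log-based version, the same probability can be sketched as a direct product. `birthday_prod` is a name made up for this sketch.

```r
# P(at least two of k people share a birthday), via a direct product
birthday_prod <- function(k) {
  # 1 - (365/365) * (364/365) * ... * ((365 - k + 1)/365)
  1 - prod((365 - 0:(k - 1)) / 365)
}

birthday_prod(23)  # about 0.507, the classic "23 people" result
birthday_prod(2)   # 1/365
```

The log version in the notes is the safer choice for large k, since it avoids computing huge factorials directly.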

w-F theory: Wright-Fisher model

MGF & MOM

Moment Generating Functions

Method of moments: estimating unknown parameters of a probability distribution using sample data. Given $X \sim \text{Bern}(p)$, we use the general formula $μ_{k}^{'} = \frac{d^{k} M_{X}(t)}{d t^{k}} \Big|_{t = 0}$ to find the $k$-th moment: take the $k$-th derivative of the MGF with respect to $t$, then plug in $t = 0$.

For example, if our MGF is $M_{X}(t) = q + p e^{t}$, we take the first derivative, $\frac{d}{d t}(q + p e^{t}) = 0 + p e^{t}$, and evaluate at $t = 0$: $\frac{d M_{X}(t)}{d t} \Big|_{t = 0} = p e^{0} = p(1) = p$. So the mean $μ_{X}$ is $p$, the parameter (probability of success).
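
The derivative step can also be sketched with R's symbolic `D()`. The value p = 0.3 is an arbitrary example, and the second-moment and variance lines go one step beyond the worked example above.

```r
mgf <- quote(q + p * exp(t))   # Bernoulli MGF, with q = 1 - p

d1 <- D(mgf, "t")              # first derivative with respect to t
d2 <- D(d1, "t")               # second derivative

vals <- list(p = 0.3, q = 0.7, t = 0)  # p = 0.3 is an arbitrary example value
m1 <- eval(d1, vals)           # first moment  E(X)   = p
m2 <- eval(d2, vals)           # second moment E(X^2) = p
v  <- m2 - m1^2                # variance = p - p^2 = p(1 - p)
```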

Z-score + empirical

z-score (standard score) aka z-transformation

T.test Samples

t.test(x,y, 
       var.equal = TRUE, # assume equal variances? (default FALSE)
       conf.level = 0.80 # confidence level (default 0.95)
       )
...

Take the line below "95 percent confidence interval:" in the output (the percentage matches conf.level): L = leftmost value, U = the opposite (rightmost).

t_test_one_and_two_sample_Stats_Quiz

without t.test we need to do:

y <- c(3,4,5) # sample dataset
a = 0.2 # define alpha (80% confidence)
n = length(y) # sample size
t <- qt(1-a/2,n-1) # critical t-value
 
mp = c(-1,1) # just to get both ends..
mean(y)+mp*t*sd(y)/sqrt(n) # final conf interval
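
A sketch confirming that the manual interval matches what `t.test()` reports for the same sample. The names `manual` and `builtin` are made up for this check.

```r
y <- c(3, 4, 5)   # same sample dataset as above
a <- 0.2          # alpha, so the confidence level is 1 - a = 0.80
n <- length(y)

manual  <- mean(y) + c(-1, 1) * qt(1 - a / 2, n - 1) * sd(y) / sqrt(n)
builtin <- as.numeric(t.test(y, conf.level = 1 - a)$conf.int)

manual                       # lower and upper ends of the interval
all.equal(manual, builtin)   # TRUE
```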

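Linear Combinations in Expected Value and Variance

Finding E(#): plug mu in, e.g. $E(aX + b) = aμ + b$
V(#): drop the +b and square the coefficients (we always have to square all coefficients in V(#)): $V(aX + b) = a^{2}σ^{2}$
- if iid, directly plug $σ^{2}$ into each squared term, e.g. $V(aX_{1} + bX_{2}) = (a^{2} + b^{2})σ^{2}$; see stats_linear_combo p2, $Y = aX + b$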
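
A sketch of both rules with the $X \sim N(2, 5)$ example from the start of these notes. The coefficients a = 3 and b = 4, the seed and the sample size are all arbitrary choices for illustration.

```r
mu <- 2; sigma <- 5   # X ~ N(2, 5)
a <- 3;  b <- 4       # hypothetical coefficients for Y = aX + b

e_y <- a * mu + b     # E(Y) = a*mu + b = 10
v_y <- a^2 * sigma^2  # V(Y) = a^2 * sigma^2 = 225 (the + b drops out)

# simulation check
set.seed(42)
y <- a * rnorm(1e5, mean = mu, sd = sigma) + b
mean(y)               # close to 10
var(y)                # close to 225
```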

Arika's Notes

stats_Midterm_review
