Left off on: stats_quiz 4 Tables and Probability & dpqr2
Some basic variable definitions (maths)
In say
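Expected value and variance
- E(X) = expected value, always equal to $\mu$.
- V(X) = variance, always equal to $\sigma^2$; equivalently $V(X) = E(X^2) - [E(X)]^2$.
- Directly plug these values in.
E(X) is always $\mu$ and the variance is always $\sigma^2$? Yes: $\mu$ and $\sigma^2$ are defined as $E(X)$ and $V(X)$.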
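A minimal sketch of E(X) and V(X) for a discrete variable, computed straight from the definitions. The pmf (values 0 to 3 with probabilities 0.1 to 0.4) is made up for illustration; it also checks the $E(X^2) - \mu^2$ shortcut.

```r
# hypothetical pmf, made up for illustration
x <- 0:3
px <- c(0.1, 0.2, 0.3, 0.4)
stopifnot(isTRUE(all.equal(sum(px), 1)))  # a pmf must sum to 1

mu <- sum(x * px)                 # E(X) = sum of x * P(X = x)
sigma2 <- sum((x - mu)^2 * px)    # V(X) = E[(X - mu)^2]
shortcut <- sum(x^2 * px) - mu^2  # V(X) = E(X^2) - mu^2, same value

mu       # 2
sigma2   # 1
```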
z-score (standard score, aka z-transformation): how many standard deviations a data point is from the mean, $z = (x - \mu)/\sigma$.
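A small sketch of the z-score in R, on made-up data. With a sample, the sample mean and sd stand in for $\mu$ and $\sigma$; scale() does the same centering and divides by sd() (the n - 1 version), so both routes agree.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)      # made-up data

z_manual <- (x - mean(x)) / sd(x)   # z = (x - mean) / sd
z_scale <- as.numeric(scale(x))     # scale() returns a matrix, so flatten it

all.equal(z_manual, z_scale)        # TRUE
```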
read/filter/table
Read in a CSV, filter rows by column values, generate a table, find probabilities
# read in csv
ddt <- read.csv("filename")
# short reminder, filtering rows with base R "[]":
ddt[ddt$LENGTH > 50 & ddt$SPECIES == "CCATFISH", ]
# dplyr's filter() does the same thing:
library(dplyr)
ddt %>% filter(LENGTH > 50 & SPECIES == "CCATFISH") # Todo this should have a better example
# if we want to make a table:
tab <- with(ddt, table(SPECIES, RIVER)) # with() means we don't need to write ddt$ each time
addmargins(tab)
Tables w/ OR, AND, GIVEN
stats_quiz 4 Tables and Probability
# given the MTBE dataset, load it into a table:
tb <- table(MTBE$col1, MTBE$col2)
tb <- addmargins(tb) # add the Sum row and column
tb
# outputs below..
| | Below Limit | Detect | Sum |
|---|---|---|---|
| Private | 81 | 22 | 103 |
| Public | 72 | 48 | 120 |
| Sum | 153 | 70 | 223 |
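A sketch that rebuilds the counts printed above by hand (an assumption: the numbers are typed in rather than read from MTBE) and computes one probability of each kind. OR uses the addition rule, P(A or B) = P(A) + P(B) - P(A and B), so the overlap cell is not counted twice.

```r
# counts typed in by hand from the printed table
counts <- matrix(c(81, 72, 22, 48), nrow = 2,
                 dimnames = list(c("Private", "Public"),
                                 c("Below Limit", "Detect")))
tb <- as.table(counts)
addmargins(tb)                                            # reproduces the Sum row/column

total <- sum(tb)                                          # 223
p_and <- tb["Public", "Below Limit"] / total              # P(Public AND Below Limit) = 72/223
p_given <- tb["Private", "Detect"] / sum(tb[, "Detect"])  # P(Private | Detect) = 22/70
p_or <- (sum(tb["Public", ]) + sum(tb[, "Detect"]) -
         tb["Public", "Detect"]) / total                  # P(Public OR Detect) = (120 + 70 - 48)/223

c(AND = p_and, GIVEN = p_given, OR = p_or)
```

As a sanity check, the OR answer also equals 1 - P(Private AND Below Limit) = 1 - 81/223, since that is the only cell outside "Public or Detect".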
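22/70  # P(Private | Detect): "given Detect", so divide by the Detect column sum
72/223 # P(Public AND Below Limit): intersection over the grand total
70/223 # P(Detect): column sum over the grand total
Filtering qns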
Sources:
#How many fish have a WEIGHT strictly between 1000 and 1600 and are of the LMBASS SPECIES? Use dplyr!
ddt %>% filter(WEIGHT > 1000 & WEIGHT < 1600 & SPECIES == "LMBASS") %>% nrow() # nrow() gives the count
# How many fish have a WEIGHT larger than 1600? Use "[]"!
nrow(ddt[ddt$WEIGHT > 1600, ]) # nrow() gives the count
Outliers: 3 types (extreme, mild, all); IMPT: extreme outliers use 3×IQR
#Boxplot method using 3x IQR
b1 = boxplot(ddt$DDT, range = 3) # Extreme outliers
length(b1$out)
[1] 12
b2 = boxplot(ddt$DDT, range = 1.5) #All outliers
setdiff(b2$out, b1$out) # only mild outliers (between 1.5×IQR and 3×IQR from the box; setdiff drops duplicate values)
[1] 28 31 33
Outliers using z-score method
We use the function scale() on a vector to compute the z-scores of its values. We then count how many values have |z| > 3.
# z-score method..
df <- ddt %>%
filter(SPECIES == "CCATFISH") %>% # create subset w/ only catfish
mutate(z = as.numeric(scale(DDT))) # z-scores within the catfish subset
sum(abs(df$z) > 3) # why 3? For normal data about 99.7% of values lie within 3 SDs of the mean (empirical rule), so |z| > 3 is rare
2) Find all possible DDT outliers. Submit the number of fish classified as possible outliers using boxplot()!
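A sketch of both outlier methods on made-up data (1 to 20 plus the values 40 and 100), so it runs without the DDT file. boxplot() computes its outliers with boxplot.stats(), whose coef argument plays the role of range, so calling boxplot.stats() directly gives the same out values without drawing a plot.

```r
x <- c(1:20, 40, 100)   # made-up data

extreme <- boxplot.stats(x, coef = 3)$out    # beyond 3 * IQR from the box
all_out <- boxplot.stats(x, coef = 1.5)$out  # beyond 1.5 * IQR (the boxplot default)
mild <- setdiff(all_out, extreme)            # in between: mild outliers

# z-score method on the same data
z <- as.numeric(scale(x))
z_out <- x[abs(z) > 3]

extreme  # 100
mild     # 40
z_out    # 100
```

The two methods need not agree in general: the boxplot fences use quartiles, which a single huge value barely moves, while that same value inflates the sd used in the z-scores.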
dpqr
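Formal definitions (not too important)
- d (density): for binom/pois, the probability of getting exactly a certain value, P(X = x).
- For norm, d is the height of the density curve, not a probability; the probability of a range (e.g. getting a value between 0.9 and 1.1) is pnorm(1.1) - pnorm(0.9).
- p (probability): the total probability up to a point x, P(X ≤ x) (aka the AUC up to that x-value).
- q (quantile): the inverse of p; give it a proportion, get back the x-value. E.g. which x-value has 97.5% of the data below it?
- r (random): generates a random sample.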
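A quick sketch of the four prefixes on the standard normal, plus one discrete call for contrast. The seed and the binomial parameters are arbitrary example values.

```r
# d: density height at x (for the normal this is not a probability)
dnorm(0)                      # 0.3989423
# p: area under the curve up to q, P(X <= q)
pnorm(1.96)                   # about 0.975
# probability of a range, e.g. between 0.9 and 1.1
pnorm(1.1) - pnorm(0.9)
# q: inverse of p, the x-value with 97.5% of the data below it
qnorm(0.975)                  # 1.959964
# r: random sample of size 5
set.seed(1)
rnorm(5)

# for a discrete distribution, d IS a probability: P(X = 3)
dbinom(3, size = 10, prob = 0.4)
```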
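Condensed table of discrete and continuous probabilities:
Some basic defs: upper tail = $P(X > x)$, lower tail = $P(X \le x)$ (R's default, lower.tail = TRUE).
- Discrete (bar chart, countable)
- Cont. (probability = area under the curve, AUC)
Condensed table:
- $P(X \le k)$: the p function as-is.
- $P(X \ge k)$: 1 - p(k-1) (first arg minus 1).
- $P(X = k)$: the d function.

| Probability you want | R expression |
|---|---|
| $P(X \le k)$ (lower tail) | pbinom(k, n, p) |
| $P(X < k)$ | pbinom(k-1, n, p) |
| $P(X \ge k)$ | 1 - pbinom(k-1, n, p) |
| $P(X > k)$ | 1 - pbinom(k, n, p) |
| $P(X = k)$ | dbinom(k, n, p) |
| $P(k < X \le m)$ | pbinom(m, n, p) - pbinom(k, n, p) |

The k-1 shift only matters for discrete distributions; for continuous ones $P(X < k) = P(X \le k)$.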
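A sketch that checks every row of the table against summing the pmf directly. n, p, k and m are arbitrary example values.

```r
n <- 10; p <- 0.4; k <- 3; m <- 6   # arbitrary example values
x <- 0:n
pmf <- dbinom(x, n, p)

le  <- pbinom(k, n, p);          le_sum <- sum(pmf[x <= k])  # P(X <= k)
lt  <- pbinom(k - 1, n, p);      lt_sum <- sum(pmf[x < k])   # P(X < k)
ge  <- 1 - pbinom(k - 1, n, p);  ge_sum <- sum(pmf[x >= k])  # P(X >= k)
gt  <- 1 - pbinom(k, n, p);      gt_sum <- sum(pmf[x > k])   # P(X > k)
eq  <- dbinom(k, n, p);          eq_sum <- pmf[x == k]       # P(X = k)
btw <- pbinom(m, n, p) - pbinom(k, n, p)
btw_sum <- sum(pmf[x > k & x <= m])                          # P(k < X <= m)

# lower.tail = FALSE gives the upper tail P(X > k) directly
gt_upper <- pbinom(k, n, p, lower.tail = FALSE)
```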
Distribution types
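- Normal distribution (bell curve)
- Binom: # successes in a fixed # of trials
- Geo: # trials till first success (note: R's dgeom/pgeom count the failures before the first success, i.e. trials - 1)
- Poisson: # events happening in a fixed interval
- Hyper: # successes when drawing without replacement (card draws that don't go back into the deck)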
| Distribution | PMF/PDF (d*) | CDF (p*) | Quantile (q*) | Random (r*) |
|---|---|---|---|---|
| Normal | dnorm | pnorm | qnorm | rnorm |
| Binomial | dbinom | pbinom | qbinom | rbinom |
| Geometric | dgeom | pgeom | qgeom | rgeom |
| Hypergeometric | dhyper | phyper | qhyper | rhyper |
| Poisson | dpois | ppois | qpois | rpois |
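A few sketches for the less obvious rows. The example numbers (coin-flip p = 0.5, a 52-card deck with 4 aces, an average of 5 events) are made up; the checks compare R's functions against the counting formulas.

```r
# R's geometric counts FAILURES before the first success.
# P(first success on trial 3) with p = 0.5 means 2 failures first:
dgeom(2, prob = 0.5)                          # 0.125 = (1 - 0.5)^2 * 0.5

# Hypergeometric: draw 5 cards without replacement from a 52-card deck
# with 4 aces (m = 4 "successes", n = 48 others, k = 5 drawn). P(exactly 2 aces):
dhyper(2, m = 4, n = 48, k = 5)
choose(4, 2) * choose(48, 3) / choose(52, 5)  # same value by counting

# Poisson with an average of 5 events per interval: P(exactly 2 events)
dpois(2, lambda = 5)                          # exp(-5) * 5^2 / 2!
```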
Train-type problem: making our own dpqr
The problem:
Generating own dpqr
dtrain <- function(x){
  # support is (-5, 5) (the integral bounds); the density is a triangle with height 0.2 at x = 0
  # and slope ±0.04, so the total area is 0.5 * 10 * 0.2 = 1
  ifelse(x > -5 & x < 5, 0.2 - 0.04 * abs(x), 0)
}
ptrain <- function(q){
  ifelse(q <= -5,
    0,                              # Case 1: left of the triangle
    ifelse(q <= 0,
      0.02 * (q + 5)^2,             # Case 2: rising (left) slope
      ifelse(q < 5,
        0.5 + 0.2 * q - 0.02 * q^2, # Case 3: falling (right) slope
        1)))                        # Case 4: right of the triangle
}
qtrain <- function(p){
  # invert ptrain numerically: for each p, find x in [-5, 5] with ptrain(x) = p
  # (approximate, to uniroot's default tolerance)
  sapply(p, function(pp) {
    stats::uniroot(function(x) ptrain(x) - pp, interval = c(-5, 5))$root
  })
}
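Because each piece of the triangle CDF is a quadratic, the quantile can also be inverted by hand instead of with uniroot(). A sketch under that approach; qtrain_exact is a hypothetical name made up here, and dtrain/ptrain are repeated so the block runs on its own. The last lines check the CDF against numerically integrating the density.

```r
dtrain <- function(x) ifelse(x > -5 & x < 5, 0.2 - 0.04 * abs(x), 0)
ptrain <- function(q) {
  ifelse(q <= -5, 0,
         ifelse(q <= 0, 0.02 * (q + 5)^2,
                ifelse(q < 5, 0.5 + 0.2 * q - 0.02 * q^2, 1)))
}

# hypothetical closed-form quantile (name made up for this sketch):
#   p <= 0.5: solve 0.02 * (x + 5)^2 = p             ->  x = -5 + sqrt(50 * p)
#   p >  0.5: solve 0.5 + 0.2 * x - 0.02 * x^2 = p   ->  x =  5 - sqrt(50 * (1 - p))
qtrain_exact <- function(p) {
  ifelse(p <= 0.5, -5 + sqrt(50 * p), 5 - sqrt(50 * (1 - p)))
}

p <- c(0.1, 0.5, 0.9)
qtrain_exact(p)             # -2.763932  0.000000  2.763932
ptrain(qtrain_exact(p))     # back to 0.1 0.5 0.9

# the CDF should match numerically integrating the density
integrate(dtrain, lower = -5, upper = 2)$value   # about 0.82
ptrain(2)                                        # 0.82
```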
# Inverse-transform sampling: push Uniform(0, 1) draws through the quantile function (doubt this will be tested)
rtrain <- function(n){
  r <- runif(n, min = 0, max = 1)
  qtrain(r)
}
dpqr code examples
1) Y ~ Bin(n = 10, p = 0.4).
1 - pbinom(8-1, 10, .4) # P(Y >= 8): just following the table we saw
2) X ~ Pois(lambda = 5).
ppois(8, 5) - ppois(3, 5) # P(3 < X <= 8) = P(4 <= X <= 8); the subtraction already removes the lower part, so no "1 -" is needed
Bayes Testing problem (drug testing)
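$$P(\text{U} \mid +) = \frac{P(+ \mid \text{U})\,P(\text{U})}{P(+ \mid \text{U})\,P(\text{U}) + P(+ \mid \text{N})\,P(\text{N})}$$
The denominator is the total probability of a positive test, $P(+)$. Just plug in:

| Event | Notation | Value | Description from Problem |
|---|---|---|---|
| Prior Prob. (User) | $P(\text{U})$ | 0.05 | 5% of people actually use cannabis. |
| Prior Prob. (Non-user) | $P(\text{N})$ | 0.95 | aka $1 - P(\text{U})$ |
| Sensitivity (True Positive Rate) | $P(+ \mid \text{U})$ | 0.95 | 95% sensitive |
| Specificity (True Negative Rate) | $P(- \mid \text{N})$ | 0.87 | 87% specific |
| False Positive Rate | $P(+ \mid \text{N})$ | 0.13 | $1 -$ specificity |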
- See here for the actual theorem: 8. Bayes’ Theorem
- See here if we ran again
Drug test problem
A particular test for whether someone has been using cannabis is 95% sensitive and 87% specific, meaning it leads to 95% true “positive” results (meaning, “Yes he used cannabis”) for cannabis users and 87% true negative results for non-users. Assuming 5% of people actually do use cannabis, what is the probability that a random person who tests positive is really a cannabis user?
- A: The person is a cannabis user.
- $A^c$ (or $\bar{A}$): The person is a non-user (i.e., not a cannabis user).
- B: The test result is positive.
- $B^c$ (or $\bar{B}$): The test result is negative.
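A sketch of the drug-test numbers plugged into Bayes' theorem. The second part is one possible reading of "if we ran again" (an assumption): a second, independent positive test, where the first posterior becomes the new prior.

```r
p_user <- 0.05           # P(U), prior
sens <- 0.95             # P(+ | U), sensitivity
spec <- 0.87             # P(- | N), specificity

p_nonuser <- 1 - p_user  # P(N)
false_pos <- 1 - spec    # P(+ | N)

p_pos <- sens * p_user + false_pos * p_nonuser  # P(+), law of total probability
post <- sens * p_user / p_pos                   # P(U | +)
post   # 0.2777778

# assumption: a second independent positive test, old posterior as new prior
post2 <- sens * post / (sens * post + false_pos * (1 - post))
post2  # about 0.7376
```

So a single positive test only gives about a 28% chance the person is a user, because non-users vastly outnumber users.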
Birthday problem
Central formula: probability that at least two of k people share a birthday, $P = 1 - \frac{365!}{(365-k)!\,365^k}$
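A cross-check of the central formula, sketched as a direct product: person i must avoid the i - 1 birthdays already taken. It assumes 365 equally likely days (no leap years); birthday_prod is a hypothetical name for this sketch.

```r
# P(at least two of k people share a birthday), 365 equally likely days assumed
birthday_prod <- function(k) {
  # person i (i = 1..k) avoids the i - 1 birthdays already taken
  1 - prod((365 - 0:(k - 1)) / 365)
}
birthday_prod(23)   # about 0.5073, just over half at 23 people
```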
birthday <- function(k){
  # 1 - choose(365, k) * k! / 365^k, computed on the log scale to avoid overflow
  1 - exp(lchoose(365,k) + lfactorial(k) - k*log(365))
}
w-F theory: Wright-Fisher model
MGF & MOM
Moment Generating Functions
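Method of moments (MOM): estimating unknown parameters of a probability distribution using sample data, by setting the sample moments equal to the theoretical moments and solving. Given the MGF $M_X(t) = E(e^{tX})$, we use the general formula $E(X^k) = M_X^{(k)}(0)$ to find the k'th moment: take the k-th derivative of the MGF with respect to $t$ and then plug in $t = 0$.
For example, if our MGF is $M_X(t) = 1 - p + pe^t$ (Bernoulli), then we take the first derivative, $M_X'(t) = pe^t$, and evaluate at $t = 0$: the mean is $M_X'(0) = p$, which is the parameter $p$ (probability of success).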
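A sketch of the same steps in R using base R's symbolic derivative D(). It assumes the Bernoulli MGF from the example and an arbitrary p = 0.3; the last two lines apply method of moments to a made-up 0/1 sample.

```r
# Bernoulli MGF M(t) = 1 - p + p * e^t, kept as an unevaluated R call
M <- quote(1 - p + p * exp(t))

M1 <- D(M, "t")    # first derivative w.r.t. t:  p * exp(t)
M2 <- D(M1, "t")   # second derivative:           p * exp(t)

p_val <- 0.3                             # arbitrary example value for p
m1 <- eval(M1, list(p = p_val, t = 0))   # E(X)   = M'(0)  = p
m2 <- eval(M2, list(p = p_val, t = 0))   # E(X^2) = M''(0) = p
m2 - m1^2                                # V(X) = p - p^2 = 0.21

# Method of moments: set the sample mean equal to E(X) = p and solve, so p-hat = mean(x)
x <- c(1, 0, 0, 1, 0)   # made-up 0/1 sample
p_hat <- mean(x)        # 0.4
```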

Z-score + empirical rule
z-score (standard score) aka z-transformation
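A sketch of where the empirical rule (68-95-99.7) comes from, using pnorm(); the IQ-style numbers (mean 100, sd 15, x = 130) are made up to show a z-score feeding into pnorm().

```r
# share of a normal distribution within 1, 2, 3 SDs of the mean
within <- pnorm(1:3) - pnorm(-(1:3))
round(within, 4)        # 0.6827 0.9545 0.9973

# a z-score is on the same SD scale, e.g. x = 130 when mu = 100, sigma = 15
z <- (130 - 100) / 15   # 2 SDs above the mean
1 - pnorm(z)            # upper-tail probability beyond z = 2, about 0.0228
```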
T.test Samples
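t.test(x, y,
  var.equal = TRUE, # equal variances? (default FALSE, which gives the Welch test)
  conf.level = 0.80 # confidence level of the interval (default 0.95)
)
- In the output, take the line below "95 percent confidence interval:" (the percentage follows conf.level); L = the left-most value.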
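Instead of reading the printed output, the interval can be pulled out of the object t.test() returns via $conf.int. A sketch using the same toy sample and 80% level as the manual calculation, checking that both agree.

```r
y <- c(3, 4, 5)   # same toy sample as the manual calculation

res <- t.test(y, conf.level = 0.80)   # one-sample t interval for the mean
ci <- res$conf.int                    # c(L, U) with a "conf.level" attribute
ci[1]   # L, the left-most value
ci[2]   # U

# manual version: mean +/- t * sd / sqrt(n)
n <- length(y)
t_crit <- qt(1 - 0.2 / 2, df = n - 1)
manual <- mean(y) + c(-1, 1) * t_crit * sd(y) / sqrt(n)
all.equal(as.numeric(ci), manual)     # TRUE
```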
t_test_one_and_two_sample_Stats_Quiz
Without t.test, we need to do:
y <- c(3,4,5) # sample dataset
a = 0.2 # define alpha (1 - confidence level)
n = length(y) # sample size
t <- qt(1-a/2, n-1) # critical t-value
mp = c(-1,1) # just to get both ends
mean(y)+mp*t*sd(y)/sqrt(n) # final conf interval
Linear Combinations in Expected Value and Variance
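- Finding $E(aX + b)$: plug $\mu$ in, $E(aX + b) = a\mu + b$.
- $V(aX + b)$: drop the $+b$ and square the coefficient, $V(aX + b) = a^2\sigma^2$ (we always have to square all coefficients in $V$)!
- If $X_1, \dots, X_n$ are iid, plug $\sigma^2$ into each term and square each coefficient: $V(a_1X_1 + \dots + a_nX_n) = (a_1^2 + \dots + a_n^2)\,\sigma^2$. See stats_linear_combo p2 for $Y = aX + b$.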
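A sketch that checks the $aX + b$ rules exactly on a discrete example. X ~ Bin(10, 0.4) and the coefficients a = 2, b = 3 are arbitrary choices; Y = aX + b takes the value a*x + b with the same probability P(X = x).

```r
# X ~ Bin(10, 0.4): E(X) = np = 4, V(X) = np(1 - p) = 2.4
x <- 0:10
px <- dbinom(x, 10, 0.4)
a <- 2; b <- 3            # arbitrary example coefficients

EX <- sum(x * px)
VX <- sum((x - EX)^2 * px)

y <- a * x + b            # values of Y = aX + b, same probabilities px
EY <- sum(y * px)
VY <- sum((y - EY)^2 * px)

c(EY, a * EX + b)         # both 11:  E(aX + b) = a * mu + b
c(VY, a^2 * VX)           # both 9.6: V(aX + b) = a^2 * sigma^2, the +b drops out
```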