# Stats Lab Homework Using R



Lab homework using R: instructions below

LAB HOMEWORK INSTRUCTIONS

——————————————————————-

**#save your plots as an image file and upload separately (export > save as image)**

**#save them as png with filename your_name_(question number)_(histogram/plot)**

**#You are told that in Providence, Rhode Island, the height of women averages 64 inches (5’4″) with a standard deviation of 1.5 inches.**

**#Of the 178 thousand people in Providence, Rhode Island, 52% are women.**

**#PART 1**

**#1.1) Simulate the population described above. Call the population variable popri**

**#1.2) Compute the mean and sd of this population.**

**#How do this mean and sd compare to the true mean (64) and sd (1.5)?**

**#1.3) Plot a density plot of this population. What does this distribution look like?**

**#PART 2**

**#2.1) Take a random sample of popri containing 10 people and call the variable sam10**

**#2.2) Compute the mean and sd of the sample sam10**

**#2.3) Take a random sample of popri containing 1000 people and call the variable sam1000.**

**#2.4) Compute the mean and sd of the sample sam1000**

**#2.5) Which is closer to the true population mean and sd: the sample with 10 people or the sample with 1000 people? Why?**

**#PART 3**

**#3.1) Create a matrix called samsri containing 200 random samples of 500 subjects in each sample from the population variable popri.**

**#HINT: Declare an empty matrix and then use a ‘for loop’ to fill it**

**#3.2) Create a vector called samsri200means that contains the means for each column (sample) from samsri.**

**#3.3) Plot a density plot of samsri200means (0.5 pt). What does this distribution look like?**

——————————————————————————————

HERE’S A LECTURE ON WHAT WE LEARNED IN CLASS FOR THE LAB HOMEWORK DOWN BELOW FOR YOUR REFERENCE

#Lab 4-Contents

#0. Review of Normal Probability Distribution Functions

#1. Simulating Populations using Random Variables

#2. Taking Samples from a Population: The Sampling Distribution

#3. Programming in R: Using Loops

#4. Programming in R: The apply function

#5. Sampling Distribution of the Uniform Distribution

#——————————————————–

# 0. Review of Normal Probability Distribution Functions

#——————————————————–

#Last week we learned how to calculate:

#1) Probabilities from a Normal Distribution using pnorm(q, mean, sd), where q is a score

#Ex: What is the probability of a student getting a 75 or less on the exam?

#2) Quantiles from a Normal Distribution using qnorm(p, mean, sd), where p is a probability

#Ex: What score would a student have to achieve to be in the top 10% on the exam?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXERCISE 0-1: What is the probability of a student

#getting a 75 or less on the exam given that

#the scores on the exam follow a normal distribution

#of mean 78, and standard deviation of 10?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

pnorm(75, 78, 10)

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXERCISE 0-2: What score would a student have to achieve

#to be in the top 10% on the exam given

#the scores on the exam follow a normal distribution of mean 78,

#and standard deviation of 10?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

qnorm(.9, 78, 10)
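#A minimal sketch, assuming the same exam distribution (mean 78, sd 10), showing that

#qnorm() undoes pnorm() and that the top-10% cutoff can also be requested from the upper tail.

#The names exam_mean, exam_sd, p_75, score_back, cut_lower and cut_upper are made up for illustration.

```r
# Exam scores assumed Normal(mean = 78, sd = 10), as in Exercises 0-1 and 0-2
exam_mean <- 78
exam_sd   <- 10

# pnorm(): score in, probability out -> P(score <= 75)
p_75 <- pnorm(75, mean = exam_mean, sd = exam_sd)

# qnorm(): probability in, score out -> should give back 75
score_back <- qnorm(p_75, mean = exam_mean, sd = exam_sd)

# Top 10% cutoff written two equivalent ways:
# 90% of scores below it, or 10% of scores above it
cut_lower <- qnorm(0.90, mean = exam_mean, sd = exam_sd)
cut_upper <- qnorm(0.10, mean = exam_mean, sd = exam_sd, lower.tail = FALSE)

print(c(p_75 = p_75, score_back = score_back,
        cut_lower = cut_lower, cut_upper = cut_upper))
```

#Both cutoffs should agree, because "90% below" and "10% above" describe the same score.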

#——————————————————–

#1. Simulating Populations using Random Variables

#——————————————————–

#The difference between a population and a sample:

#Your sample is the group of individuals who participate

#in your study, and your population is the broader group

#of people to whom your results will apply.

#Therefore: “population” in statistics includes all members of a defined group.

#A part of the population is called a sample.

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#Last week (Lab 3), we used the command rnorm()

#to create a variable with a normal distribution

#Random Normal variable: rnorm(n, mean, sd)

#NOTE: We can specify how large our population is, the mean, and the SD.

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#EXERCISE 1-1:

# Create a normally distributed population variable called pop

# that consists of 10,000 subjects with mean of 15 and sd of 2

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#set.seed(1)

pop = rnorm(n=10000, mean=15, sd=2)

#Let’s verify that this did what we wanted.

hist(pop); mean(pop); sd(pop)

#??????????????????????????????????????????????????????????????????????????????????#

#Thought Question 1: Why are your mean and sd for pop slightly different

#from mine? Or rather, why does no one get exactly a mean of 15 and an sd of 2?

#??????????????????????????????????????????????????????????????????????????????????#
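#A minimal sketch related to Thought Question 1, assuming the commented-out set.seed(1) above

#is meant for reproducibility. The names pop_a, pop_b and pop_c are made up for illustration.

```r
# Same seed -> exactly the same "random" draws
set.seed(1)
pop_a <- rnorm(n = 10000, mean = 15, sd = 2)

set.seed(1)
pop_b <- rnorm(n = 10000, mean = 15, sd = 2)

# No seed reset -> a fresh, different set of draws
pop_c <- rnorm(n = 10000, mean = 15, sd = 2)

identical(pop_a, pop_b)        # TRUE: same seed, same values
identical(pop_a, pop_c)        # FALSE: different draws
c(mean(pop_a), mean(pop_c))    # both near 15, neither exactly 15
c(sd(pop_a), sd(pop_c))        # both near 2, neither exactly 2
```

#Each call to rnorm() draws a finite set of random values, so its mean and sd only approximate 15 and 2.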

#??????????????????????????????????????????????????????????????????????????????????#

# Now, let’s pretend this variable pop is a population of

# undergrads + graduate students at USC who have ever used marijuana

#Thought Question 2: Considering that USC is ~20k students,

#In reality, could I actually collect this information from every USC

#student to form this distribution? What should I do instead?

#??????????????????????????????????????????????????????????????????????????????????#

#—————————————————————

#2. Taking Samples from a Population: The Sampling Distribution

#—————————————————————

#Most research studies deal with samples

#We can take a sample from data we consider to be our population

#by using the sample() function in R

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#Random sample: sample(x, size, replace=TRUE)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#
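#A minimal sketch of what the replace argument does, using a tiny made-up population (small_pop)

#so the effect is easy to see. The seed and all variable names here are illustrative assumptions.

```r
set.seed(2)               # arbitrary seed, only for reproducibility
small_pop <- 1:5          # hypothetical tiny population

# With replacement: the same value can be drawn more than once
with_repl <- sample(x = small_pop, size = 5, replace = TRUE)

# Without replacement: each value can be drawn at most once
without_repl <- sample(x = small_pop, size = 5, replace = FALSE)

with_repl       # values may repeat
without_repl    # a reordering of 1:5, no repeats

# Drawing more values than the population holds only works with replacement
big_draw <- sample(x = small_pop, size = 8, replace = TRUE)
length(big_draw)
```

#With replace=FALSE and size larger than the population, sample() stops with an error.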

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#Example 1: Let’s pretend we are researchers.

#While ideally we would like to study the POPULATION of marijuana users

#at USC, we realize that we only have funding to ask 200 students.

#We can see what our data might look like if we take a sample from

#the population variable pop.

sam = sample(x=pop, size=200, replace=TRUE)

#Exercise 2-1: Calculate the mean and SD of the sample sam.

#How do these results differ from the means

# and SDs in the pop variable?

mean(sam); sd(sam); hist(sam)

#Exercise 2-2:

# A) Create a random sample called sam20 from pop containing 20 subjects.

# B) Create a random sample called sam750 from pop containing 750 subjects.

# C) Compute the Means and SDs for sam20 and sam750. Create Histograms for both.

# D) How do the means from each sample compare to the true population mean of 15?

# E) Is there an association between the number of people in the sample and the magnitude of

# the difference from the population mean?

#A)

sam20=sample(pop, 20, replace=TRUE)

#B)

sam750=sample(pop, 750, replace=TRUE)

#C)

mean(sam20); mean(sam750)

sd(sam20); sd(sam750)

plot(density(sam20)); lines(density(sam750), col="red")

#D)

abs(15-mean(sam20)); abs(15-mean(sam750))

#E)

#As the number of people in the sample goes up,

#the sample mean tends to be closer to the true population mean
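#A single pair of samples can mislead, so here is a minimal sketch that repeats the comparison many times

#and averages the distance from 15. The seed, the sizes, n_rep and avg_abs_err are illustrative assumptions.

```r
set.seed(3)                                  # arbitrary seed for reproducibility
pop <- rnorm(n = 10000, mean = 15, sd = 2)   # same recipe as Exercise 1-1

sizes <- c(20, 200, 750)    # sample sizes to compare
n_rep <- 500                # hypothetical number of repeated samples per size

# For each size, average |15 - sample mean| over n_rep repeated samples
avg_abs_err <- sapply(sizes, function(n) {
  errs <- replicate(n_rep, abs(15 - mean(sample(pop, n, replace = TRUE))))
  mean(errs)
})
names(avg_abs_err) <- sizes

avg_abs_err   # the average distance from 15 shrinks as the sample size grows
```

#Any one small sample can land close to 15 by luck; the pattern shows up in the average over many samples.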

#—————————————————————

#3. Programming in R: Using Loops

#—————————————————————

#In practice, as researchers we almost always have samples

#and NEVER really know the true population

#Simulating a population and taking samples from it can tell us something

#about how well a given estimator (mean, trimmed mean, median etc.)

#represents a distribution (e.g., normal vs. skewed)

#To begin to understand how taking samples can give us information about an estimator,

#we need to take MANY samples from our simulated population.

#Let’s say we wanted to have 100 different samples of pop with 200 subjects in each sample.

#We could do this two ways:

#1) Write the function sample() many times

sam1=sample(pop, 200, replace=TRUE)

sam2=sample(pop, 200, replace=TRUE)

# …

sam100=sample(pop, 200, replace=TRUE)

#2) Or use a loop

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#Loop over ii: for (ii in X:Y) { # ii is the counter

#COMMANDS WITH ii # X is the first value of the counter

#} # Y is the last value of the counter

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

# x=23+24;x

#print(x)

#Here is a simple loop going from a value of 1 to 10

for (jj in 1:10) {

print(jj)

} #NOTE: When executing the loop,

#you MUST highlight and run the entire loop, from for to the closing }, including the braces.

# jj will take the values specified with the “in 1:10” argument

#At first jj = 1, but then it will increase by 1 (1,2,3,4,5…) until it reaches 10

#Also, we can name the counter whatever we want.

#Below I’ve used kk to demonstrate this

for (kk in 1:15) {

print(kk)

}
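#A minimal sketch showing two more things a loop counter can do: index into a storage vector,

#and step through values that are not numbers. The names squares and stat_name are made up for illustration.

```r
# Use the counter both as a value and as a position in a result vector
squares <- numeric(5)            # pre-allocate storage for 5 results
for (ii in seq_len(5)) {         # seq_len(5) gives 1, 2, 3, 4, 5
  squares[ii] <- ii^2            # store the square of the counter in slot ii
}
squares                          # 1 4 9 16 25

# The counter can step through any vector, including character values
for (stat_name in c("mean", "median")) {
  print(stat_name)
}
```

#Filling a pre-allocated vector like this is the same pattern the lab uses later to store sample means.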

#Back to our goal:

#We want to be able to take 100 different samples of pop with 200 people in each sample

#A good way to do this is to first create an EMPTY matrix to put our data

#into using the matrix() command

mysams = matrix(NA, nrow=200, ncol=100) #? Why ncol=100 and nrow=200?

#Then we can use a loop to place each of our samples into a column of this empty matrix called “mysams”

for (ii in 1:100) {

mysams[ ,ii] = sample(pop, size=200, replace=TRUE)

}

#In RStudio, look at the Environment pane on your right and double-click ‘mysams’ to view it
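#As an alternative to the empty matrix plus loop, here is a minimal sketch using replicate(),

#which is part of base R. The seed and the name mysams_rep are illustrative assumptions.

```r
set.seed(4)                                  # arbitrary seed for reproducibility
pop <- rnorm(n = 10000, mean = 15, sd = 2)   # same recipe as Exercise 1-1

# replicate() evaluates the expression 100 times;
# each result of length 200 becomes one column of the returned matrix
mysams_rep <- replicate(100, sample(pop, size = 200, replace = TRUE))

dim(mysams_rep)   # 200 rows (subjects) by 100 columns (samples)
```

#The shape matches the loop version: one column per sample and one row per subject within a sample.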

#—————————————————————

#4. Programming in R: The apply function

#—————————————————————

# We just learned how to use a loop to take MANY random samples

# from a population variable. While the purpose of this

# may not be clear just yet, it will be later on in the semester.

#Once we have our dataset containing 100 samples of 200 people, I’d like to find out the mean of each sample

#I could do this two ways:

#1) By manually doing it

mean(mysams[,1])

mean(mysams[,2])

#…

mean(mysams[,100])

#2) By using a loop

for (jj in 1:100) {

print( mean(mysams[,jj]) ) #I have to use print() here because things in loops don’t get output to the screen without it

}

#However, I don’t just want to KNOW the means of each sample, instead I’d like to have a variable

#where each observation is the mean of a given sample so that I can analyse the means of the samples

#We can do this by first creating an empty Vector of length 100

sam100means = numeric(100)

#And then using a loop to populate the vector

for (jj in 1:100) {

sam100means[jj] = mean(mysams[,jj])

}

#With this, I can examine the average (mean) of the means for each sample

mean(sam100means)

#And their distribution

hist(sam100means)

#There is an easier way to get this sam100means variable,

#We can use the apply() function!

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#Collapse Data (apply): apply(X, MARGIN, FUN)

# X=dataset; MARGIN: 1=Rows, 2=Columns; FUN=Function

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#I will create a variable called sam100means2 containing the column means of mysams

sam100means2 = apply(X=mysams, MARGIN=2, FUN=mean)

#In the above: MARGIN=2 tells R to do the operation on the columns

#FUN=mean tells R to take the mean

#A Density plot of this:

plot(density(sam100means2))
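#A minimal sketch on a tiny made-up matrix (m) to show the difference between MARGIN=1 and MARGIN=2,

#and that the base R function colMeans() gives the same answer as apply() with MARGIN=2 and FUN=mean.

```r
# A small hypothetical matrix: 3 rows (subjects) by 2 columns (samples)
# matrix() fills column by column, so column 1 is 1,2,3 and column 2 is 10,20,30
m <- matrix(c(1, 2, 3,
              10, 20, 30), nrow = 3, ncol = 2)

apply(X = m, MARGIN = 2, FUN = mean)   # one mean per column: 2 and 20
apply(X = m, MARGIN = 1, FUN = mean)   # one mean per row: 5.5, 11, 16.5

# colMeans() is a shortcut for the column-mean case
col_means_apply <- apply(X = m, MARGIN = 2, FUN = mean)
col_means_fast  <- colMeans(m)
all.equal(col_means_apply, col_means_fast)   # TRUE
```

#apply() is the more general tool, since FUN can be any function such as sd or median.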

#Exercise 4:

# A) Create a variable called sam100sd that contains the standard deviations of each

# sample from mysams. Use whatever method you prefer to do this.

# B) Show a density plot of the SDs from mysams

#A)

sam100sd = apply(X=mysams, MARGIN=2, FUN=sd)

#Alternatively

sam100sd1=numeric(100)

for (i in 1:100) {

sam100sd1[i]=sd(mysams[, i])

}

#B)

plot(density(sam100sd))

#—————————————————————

#5. Sampling Distribution of the Uniform Distribution

#—————————————————————

#While we’ve seen that the means

#of random samples taken from a NORMAL population

#are normally distributed, what if our population is not normally distributed?

#Let’s look, for example, at the uniform distribution,

# which can be created using the runif() function

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#Random Uniform variable: runif(n) (defaults: min=0, max=1)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#I’ll create a uniform population distribution of 10,000 subjects

popunif = runif(10000) #Uniform distribution

#This distribution looks like:

hist(popunif)

#Exercise 5:

# A) Create a matrix called unifsams that contains 150 random samples of 250 subjects from popunif

# B) Create a variable called unif150means that contains the means for each of the 150 samples

# C) Create a density plot of means in unif150means. What does this plot look like?

#A)

unifsams = matrix(NA, nrow=250, ncol=150)

for (ii in 1:150) {

unifsams[,ii] = sample(popunif, 250, replace=TRUE)

}

#B)

unif150means = apply(unifsams, 2, mean)

#C)

plot(density(unif150means ))

#Approximately normally distributed: remember the Central Limit Theorem.

# Read section 5.3.2 of the book for a theoretical explanation of this last exercise

#Section 5.3.2: Approximating the Sampling Distribution of the Sample Mean: The General Case
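#A minimal numeric check of the Central Limit Theorem for the uniform case, assuming the standard

#facts that Uniform(0, 1) has mean 0.5 and sd sqrt(1/12). The seed and the names n_samples, n_each

#and theory_se are illustrative assumptions; replicate() and colMeans() stand in for the loop and apply().

```r
set.seed(5)                  # arbitrary seed for reproducibility
popunif <- runif(10000)      # uniform population on [0, 1]

n_samples <- 150             # number of samples, as in Exercise 5
n_each    <- 250             # subjects per sample, as in Exercise 5

# One column per sample, then one mean per column
unifsams     <- replicate(n_samples, sample(popunif, n_each, replace = TRUE))
unif150means <- colMeans(unifsams)

# CLT prediction for the spread of the sample means: sd / sqrt(n)
theory_se <- sqrt(1 / 12) / sqrt(n_each)

c(mean_of_means = mean(unif150means),   # should be close to 0.5
  sd_of_means   = sd(unif150means),     # should be close to theory_se
  theory_se     = theory_se)
```

#The population is flat, yet the sample means cluster tightly around 0.5 with roughly the predicted spread.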
