AP Statistics Study Guide

Google doc Cheat sheet with stuff not on Formula Sheet here


From Simple Studies, **https://simplestudies.edublogs.org** & @simplestudies4 on Instagram

Statistics: The science of data

Data always involves individuals and variables

There are two varieties of variables:

Distribution tells us what values a variable takes and how frequently it takes these values

How to go from Data Analysis to Inference:

A Two-way Table describes two categorical variables, organizing counts according to a row variable and a column variable

Source: https://www.statology.org/conditional-relative-frequency-two-way-table/

The Marginal Distribution of one of the categorical variables is the distribution of values of that variable among all individuals described by the table

● Ex: Marginal distribution of gender: Male: 48/100 = 48% Female: 52/100 = 52%

These are the steps to take to examine a marginal distribution:

A Conditional Distribution of a variable describes the values of that variable among individuals who have a particular value of another variable

Here are the steps to take to examine or compare conditional distributions:
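As a rough illustration of marginal and conditional distributions, here is a minimal Python sketch. The row totals (48 males, 52 females) come from the marginal distribution example above; the individual cell counts and the "Yes"/"No" column variable are invented for illustration.

```python
# Hypothetical two-way table: rows are gender, columns are a yes/no survey answer.
# Row totals (48 male, 52 female) match the marginal distribution example;
# the individual cell counts are made up for illustration.
table = {
    "Male": {"Yes": 30, "No": 18},
    "Female": {"Yes": 39, "No": 13},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of gender: each row total divided by the grand total.
marginal_gender = {g: sum(row.values()) / grand_total for g, row in table.items()}

# Conditional distribution of the answer for each gender:
# each cell divided by its own row total.
conditional_answer = {
    g: {ans: count / sum(row.values()) for ans, count in row.items()}
    for g, row in table.items()
}

print(marginal_gender)                # Male 0.48, Female 0.52
print(conditional_answer["Female"])   # Yes 0.75, No 0.25
```

Comparing the conditional distributions row by row (62.5% "Yes" among males versus 75% among females in this made-up table) is how an association between the two variables would show up.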

When describing distribution of quantitative data, we use the acronym SOCCS

Stem-and-Leaf Plots are a simple graphical display for small sets of data

image4.jpeg

Source: https://en.wikipedia.org/wiki/Stem-and-leaf_display

These are the steps on how to make a Stem-and-Leaf Plot:

Histograms are graphs that display the distribution of a quantitative variable by showing each interval of the values as a bar

image6.jpeg

Source: https://online.stat.psu.edu/stat500/book/export/html/539

These are the steps to take on how to construct a histogram:

The median is the midpoint of the distribution

These are the steps to take to find the median:

The mean is the average of all individual data values

These are some observations you should look at to determine if you should use the mean or median to measure the center of a distribution of data:

These are the steps to take to calculate quartiles:

The standard deviation measures the typical distance between each value and the mean
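A short sketch of how the sample standard deviation can be computed by hand. The data values are made up, and the calculation assumes the usual sample formula that divides by n - 1.

```python
import math

data = [2, 4, 4, 5, 7, 8]  # made-up sample

mean = sum(data) / len(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 4))
```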

A five-number summary is a quick summary of the distribution of a data set

image9.jpeg

Source: https://www.simplypsychology.org/boxplots.html
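The five-number summary (minimum, Q1, median, Q3, maximum) can be sketched in Python as follows. This assumes the common classroom method in which Q1 and Q3 are the medians of the lower and upper halves of the sorted data. The 1.5 x IQR outlier rule at the end is a standard convention that is not spelled out in this guide, and the data set is made up.

```python
def median(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves; the overall
    median is left out of both halves when the number of values is odd.
    """
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]
    upper = values[(n + 1) // 2:]
    return values[0], median(lower), median(values), median(upper), values[-1]


data = [1, 3, 4, 6, 7, 8, 10, 12, 30]  # made-up data set
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1

# 1.5 x IQR rule: flag values far below Q1 or far above Q3 as outliers.
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(minimum, q1, med, q3, maximum, outliers)
```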

Percentile: The nth percentile of a distribution is the value with n percent of the observations less than it

Adding or subtracting the same number n to each observation:

The z-score tells us how many standard deviations away from the mean an observation falls, and what direction it falls in
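A minimal sketch of the z-score calculation, z = (value - mean) / standard deviation. The test scores, mean, and standard deviation are made-up numbers.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Made-up example: test scores with mean 70 and standard deviation 10.
above = z_score(85, 70, 10)   # positive: above the mean
below = z_score(55, 70, 10)   # negative: below the mean
print(above, below)
```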

When data has a regular overall pattern, we can use a simplified model called a density curve to describe it

Normal distributions are often shown in Normal curves

  • A normal curve is described by its mean and standard deviation

  • The mean of a normal distribution is at the center of the normal curve

    • It is the same as the median

  • The standard deviation is the distance from the center to the change-of-curvature points on either side

          Source: http://www.stat.yale.edu/Courses/1997-98/101/normal.htm

          The Empirical Rule: In the normal distribution with mean μ and standard deviation σ, approximately 68% of the observations fall within 1σ of μ, 95% within 2σ, and 99.7% within 3σ

          image12.jpeg

          Source: http://stevegallik.org/cellbiologyolm_statistics.html
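The Empirical Rule percentages (roughly 68%, 95%, and 99.7%) can be checked numerically. This sketch uses `NormalDist` from Python's standard `statistics` module; the helper name `proportion_within` is just for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {proportion_within(k):.4f}")
```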

          The Standard Normal Distribution is the normal distribution with mean 0 and standard deviation 1

          Source: https://statistics-made-easy.com/standard-normal-distribution/

          We use Table A to find the proportion of observations in a standard normal distribution that satisfies each z-score:

          We can also use the calculator to find the proportion of observations in a standard normal distribution that satisfies each z-score:
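For checking Table A or calculator answers, the same areas can be computed in Python. This is only a sketch, and the z-scores 1.25 and -0.5 are arbitrary examples. `cdf` gives the area to the left of a z-score, and `inv_cdf` goes the other way, from an area to a z-score.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

below = Z.cdf(1.25)                   # area to the left of z = 1.25
above = 1 - Z.cdf(1.25)               # area to the right of z = 1.25
between = Z.cdf(1.25) - Z.cdf(-0.5)   # area between z = -0.5 and z = 1.25
z_for_90th = Z.inv_cdf(0.90)          # z-score with 90% of the area to its left

print(round(below, 4), round(above, 4), round(between, 4), round(z_for_90th, 4))
```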

          A normal probability plot provides a good assessment of the adequacy of the normal model for a set of data

          Source: https://mathcracker.com/normal-probability-plot-maker

          When analyzing two or more variables, there are two types you should keep in mind:

          When examining the relationship between variables, these steps should be taken:

          image15.jpeg

          Source: https://www.mathsisfun.com/data/scatter-xy-plots.html

          For a linear association between two quantitative variables, the correlation (r) measures both the direction and strength of the association

          A regression line displays the relationship between two variables, but only when one of the variables helps explain or predict the other

          image17.jpeg

          Source: https://learningstatisticswithr.com/book/regression.html

          A regression line relating y to x has the equation ŷ = a + bx

          The Coefficient of Determination (r²) measures the percent of the variability in the response variable that is accounted for by the least-squares regression line

          A residual is the difference between the actual value of y and the predicted value of y by the regression line

          image19.jpeg

          Source: https://www.statisticshowto.com/least-squares-regression-line/

          Residual Plot: A scatter plot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis

          Source: https://opexresources.com/analysis-residuals-explained/
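To tie together the regression line ŷ = a + bx, the correlation r, the coefficient of determination, and residuals, here is a small sketch that computes all of them from scratch. The data points are made up, and `least_squares` is an illustrative helper name, not a library function.

```python
import math


def least_squares(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope
    a = y_bar - b * x_bar            # y-intercept
    r = sxy / math.sqrt(sxx * syy)   # correlation
    return a, b, r


xs = [1, 2, 3, 4, 5]  # explanatory variable (made-up data)
ys = [2, 4, 5, 4, 5]  # response variable (made-up data)

a, b, r = least_squares(xs, ys)
r_squared = r ** 2

# Residual = actual y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"y-hat = {a:.1f} + {b:.1f}x, r = {r:.4f}, r^2 = {r_squared:.2f}")
print([round(e, 1) for e in residuals])
```

Plotting `residuals` against `xs` would give the residual plot. For a least-squares line the residuals always sum to zero, so what matters in the plot is whether any leftover pattern remains.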

          Here are some vocabulary terms regarding sampling and surveys:

          These are the different types of sampling designs:

          These are the different types of bias:

          Observational studies of the effect of one variable on another often fail because of these reasons:

          These are some vocabulary terms that deal with experiments:

          The three principles of experimental design are:

          Probability: The probability of any outcome of a chance process is a number between 0 and 1 that describes the proportion of times the outcome would occur in a very long series of repetitions

          Law of Large Numbers: If we observe more and more repetitions of any chance process, the proportion of times that a specific outcome occurs approaches its probability

          Probability Model: A description of some chance process that consists of two parts: a list of all possible outcomes and the probability for each outcome.

          If all outcomes in the sample space are equally likely, the probability that event A occurs can be found using this formula: P(A) = (number of outcomes in event A) / (total number of outcomes in the sample space)

          Basic Rules of Probability:

          Two events are mutually exclusive if they have no outcomes in common and can never occur together

          If A and B are any two events resulting from some chance process, the general addition rule says that: P(A or B) = P(A) + P(B) - P(A and B)

          Intersection: The event “A and B” is called the intersection of events A and B

          Union: The event “A or B” is called the union of events A and B

          Conditional Probability: The probability that one event happens given that another event is known to have happened is called a conditional probability

          Independent: Two events are independent if the occurrence of one event has no effect on the chance that the other will happen

          General Multiplication Rule: For any chance process, the probability that events A and B both occur can be found using the general multiplication rule: P(A and B) = P(A) · P(B | A)

          Tree Diagram: Shows the sample space of a chance process involving multiple stages

          image26.jpeg

          Source: https://www.onlinemathlearning.com/probability-tree-diagrams.html

          If A and B are independent events, the probability that A and B both occur is: P(A and B) = P(A) · P(B)
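The probability rules above (equally likely outcomes, the general addition rule, conditional probability, the general multiplication rule, and the check for independence) can be verified on a small sample space. This sketch uses two fair dice; the events A and B are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) = outcomes in the event / outcomes in the sample space."""
    favorable = [o for o in sample_space if event(o)]
    return Fraction(len(favorable), len(sample_space))


def A(o):
    return o[0] == 6            # first die shows a 6


def B(o):
    return o[0] + o[1] >= 10    # the sum is at least 10


p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))   # intersection
p_a_or_b = prob(lambda o: A(o) or B(o))     # union
p_b_given_a = p_a_and_b / p_a               # conditional probability P(B | A)

print(p_a, p_b, p_a_and_b, p_a_or_b, p_b_given_a)
# A and B are independent only if P(A and B) equals P(A) * P(B).
print("independent:", p_a_and_b == p_a * p_b)
```

Here P(B | A) = 1/2 is much larger than P(B) = 1/6, so knowing that the first die shows a 6 changes the chance of a large sum, and the two events are not independent.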

          Discrete Random Variable: Takes a fixed set of possible values with gaps between them

          Continuous Random Variable: Can take any value in an interval on the number line

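          For any two random variables X and Y, if S = X + Y, the mean of S is: $\mu_S = \mu_X + \mu_Y$

          For any two random variables X and Y, if D = X - Y, the mean of D is: $\mu_D = \mu_X - \mu_Y$

          For any two independent random variables X and Y, if S = X + Y, the variance of S is: $\sigma_S^2 = \sigma_X^2 + \sigma_Y^2$

          For any two independent random variables X and Y, if D = X - Y, the variance of D is: $\sigma_D^2 = \sigma_X^2 + \sigma_Y^2$ (variances add even when the variables are subtracted)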
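A quick way to convince yourself that means add or subtract while variances always add (for independent variables) is to build the exact distributions of X + Y and X - Y. The two probability models below are hypothetical, and `combine` is an illustrative helper.

```python
from itertools import product

# Hypothetical probability models (value -> probability).
X = {0: 0.2, 1: 0.5, 2: 0.3}
Y = {1: 0.6, 3: 0.4}


def mean(dist):
    return sum(v * p for v, p in dist.items())


def variance(dist):
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())


def combine(dx, dy, op):
    """Exact distribution of op(X, Y), assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        value = op(x, y)
        out[value] = out.get(value, 0) + px * py  # independence: multiply
    return out


S = combine(X, Y, lambda x, y: x + y)
D = combine(X, Y, lambda x, y: x - y)

print(mean(S), mean(X) + mean(Y))              # means add
print(mean(D), mean(X) - mean(Y))              # means subtract
print(variance(S), variance(X) + variance(Y))  # variances add
print(variance(D), variance(X) + variance(Y))  # variances still add
```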

          A binomial setting arises when we perform n independent trials of the same chance process and count the number of times that a particular outcome (a success) occurs. It must pass these conditions:

          The variable X = the number of successes is called a binomial random variable. To find the probability of exactly k successes, use binompdf(n, p, k)
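For reference, this is roughly what the calculator's binompdf computes. The Python functions `binom_pdf` and `binom_cdf` below are illustrative helpers written for this guide, not built-in functions, and the example (10 trials, p = 0.5) is made up.

```python
from math import comb, sqrt


def binom_pdf(n, p, k):
    """P(X = k): probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(n, p, k):
    """P(X <= k): probability of at most k successes in n trials."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))


n, p = 10, 0.5  # made-up example: 10 flips of a fair coin
p_exactly_3 = binom_pdf(n, p, 3)
p_at_most_3 = binom_cdf(n, p, 3)

# Mean and standard deviation of a binomial random variable.
mean_x = n * p
sd_x = sqrt(n * p * (1 - p))

print(p_exactly_3, p_at_most_3, mean_x, round(sd_x, 4))
```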

          If a count of X successes has a binomial distribution with n number of trials and p probability of success: $\mu_X = np$ and $\sigma_X = \sqrt{np(1-p)}$

          When taking an SRS of size n from a population of size N, we can use a binomial distribution to model the count of successes in the sample as long as: $n < 0.10N$ (the 10% condition)

          As the number of trials increases, the binomial distribution gets closer to a normal one

          A geometric setting arises when we perform independent trials of the same chance process and record the number of trials it takes to get one success. It must pass these conditions:

          The variable Y = the number of trials it takes to get a success in a geometric setting is called a geometric random variable

          The shape of a geometric distribution is always skewed right

          If Y is a geometric random variable with probability of success p on each trial: $\mu_Y = \frac{1}{p}$ and $\sigma_Y = \frac{\sqrt{1-p}}{p}$
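A matching sketch for the geometric distribution. `geomet_pdf` and `geomet_cdf` are illustrative helper names chosen to resemble the calculator commands, and p = 0.2 is a made-up success probability.

```python
from math import sqrt


def geomet_pdf(p, k):
    """P(Y = k): the first success happens on trial k."""
    return (1 - p) ** (k - 1) * p


def geomet_cdf(p, k):
    """P(Y <= k): the first success happens on or before trial k."""
    return 1 - (1 - p) ** k


p = 0.2  # made-up probability of success on each trial
first_success_on_3rd = geomet_pdf(p, 3)
success_within_3 = geomet_cdf(p, 3)

mean_y = 1 / p            # expected number of trials to get a success
sd_y = sqrt(1 - p) / p

print(round(first_success_on_3rd, 3), round(success_within_3, 3), mean_y, round(sd_y, 4))
```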

          The sampling distribution of the sample proportion describes the distribution of values taken by the sample proportion in ALL POSSIBLE samples of the same size from the same population.

          The sampling distribution of the sample mean describes the distribution of values taken by the sample mean in ALL POSSIBLE samples of the same size from the same population.

          The Central Limit Theorem states that when n is large (n ≥ 30), the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution
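A small simulation can make the Central Limit Theorem more concrete. This sketch assumes a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1), draws many samples of size 40, and looks at the resulting sample means. The population, sample size, number of samples, and seed are all arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 40                   # sample size (large enough for the CLT)
num_samples = 2000       # number of simulated samples

# Each entry is the mean of one sample from a skewed population
# (exponential with mean 1 and standard deviation 1).
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# If the sampling distribution is roughly normal, about 68% of the
# sample means should land within one standard deviation of the center.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / num_samples

print(round(mean_of_means, 3), round(sd_of_means, 3), round(within_one_sd, 3))
```

The center of the simulated sample means should be close to the population mean (1), and their spread should be close to σ/√n = 1/√40 ≈ 0.158, even though the population itself is far from normal.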

          Shape of the Sampling Distribution of the Sample Mean x̅:

          The Point Estimator is a statistic that provides an estimate of a population parameter

          A Confidence Interval gives an interval of plausible values for a parameter based on sample data

          Interpreting a Confidence Interval:

          A Confidence Level gives the overall success rate of the method used to calculate the confidence interval

          Interpreting a Confidence Level:

          A Critical Value is a multiplier that makes the interval wide enough to have the stated capture rate

          The margin of error gets smaller when:

          When the conditions are met, a C% confidence interval for the unknown proportion p is $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

          These are the conditions we need for estimating p:

          To summarize, these are the conditions for constructing a confidence interval about a proportion:

          When the standard deviation of a statistic is estimated from data, the result is called the standard error of the statistic

          These are the four steps you MUST take when constructing a confidence interval:

          We can also construct a confidence interval for an unknown population proportion on our calculator by using Stat > Tests > 1-PropZInt
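As a cross-check on the calculator, a one-proportion z interval (point estimate ± critical value × standard error) can be sketched in Python. The function name `one_prop_z_interval` and the example counts (120 successes out of 200) are made up for illustration, and the sketch assumes the conditions for inference have already been checked.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(successes, n, confidence):
    """Return (lower, upper) for a one-proportion z interval."""
    p_hat = successes / n
    # Critical value z*: leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


low, high = one_prop_z_interval(120, 200, 0.95)
print(round(low, 4), round(high, 4))
```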

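          To determine the sample size n that will give us a C% confidence interval for a population proportion with a maximum margin of error ME, solve the following inequality for n: $z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le ME$, where $\hat{p}$ is a guessed value for the sample proportion (use 0.5 if no reasonable guess is available)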
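Solving the margin of error condition for n gives n ≥ (z*/ME)² · p̂(1 - p̂), rounded up to the next whole number. A minimal sketch, where `required_sample_size` is an illustrative helper and the guess p̂ = 0.5 is the conservative default when no estimate is available:

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence, p_guess=0.5):
    """Smallest n whose margin of error is at most margin_of_error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star / margin_of_error) ** 2 * p_guess * (1 - p_guess)
    return ceil(n)  # always round up so the margin of error is not exceeded


# Made-up example: 95% confidence, margin of error of at most 3 percentage points.
n_needed = required_sample_size(0.03, 0.95)
print(n_needed)
```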

          When estimating the population mean using a sample standard deviation, we use a t-distribution:

          There is also a different t distribution for each sample size, specified by its degrees of freedom

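          When the conditions are met, a C% confidence interval for the unknown mean μ is $\bar{x} \pm t^* \frac{s_x}{\sqrt{n}}$, where t* is the critical value for the t distribution with n - 1 degrees of freedom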

          These are the conditions we need for estimating μ:

          Null Hypothesis (Ho): The claim we weigh evidence against in a significance test

          Alternative Hypothesis (Ha): The claim that we are trying to find evidence for

          The significance level (α) is the value that we use as a boundary for deciding whether an observed result is unlikely to happen by chance alone when the null hypothesis is true

          The p-value of a test is the probability of getting evidence for the alternative hypothesis as strong or stronger than the observed evidence when the null hypothesis is true.

          This is the formula to use when asked to interpret a p-value for a one-tailed test:

          This is the formula to use when asked to interpret a p-value for a two-tailed test:

          0.7 in either direction from a random sample of 160 students in Ivy’s school

          This must be included in the conclusion for a significance test:

          To summarize, here is everything you should include in a significance test:

          When drawing conclusions from a significance test, there are two types of mistakes we can make:

          These are the four possible outcomes of a significance test:

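          • If Ho is true:

            • Our conclusion is correct if we do not find convincing evidence that Ha is true

            • We make a Type I error if we find convincing evidence that Ha is true

          • If Ha is true:

            • Our conclusion is correct if we find convincing evidence that Ha is true

            • We make a Type II error if we do not find convincing evidence that Ha is true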


                The probability of making a Type I error in a significance test is equal to the significance level

                Standardized Test Statistic: Measures how far a sample statistic is from what we would expect if the null hypothesis were true, in standard deviation units

                These are the conditions for using a standardized test statistic (proportion):

                One Proportion Z-Test: To perform a test of Ho: $p = p_0$, compute the standardized test statistic $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
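A sketch of the one-proportion z-test calculation, using the null value p0 (not p̂) in the standard deviation. The function name, the `alternative` argument, and the example numbers (60 successes in 100 trials, testing p0 = 0.5) are illustrative assumptions; the conditions for inference are assumed to have been checked already.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for a test of Ho: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    lower_tail = NormalDist().cdf(z)
    if alternative == "less":        # Ha: p < p0
        p_value = lower_tail
    elif alternative == "greater":   # Ha: p > p0
        p_value = 1 - lower_tail
    else:                            # Ha: p != p0
        p_value = 2 * min(lower_tail, 1 - lower_tail)
    return z, p_value


z, p_value = one_prop_z_test(60, 100, 0.5, alternative="greater")
print(round(z, 2), round(p_value, 4))
```

With a significance level of α = 0.05, a p-value this small would lead us to reject Ho in this made-up example.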

                Conditions for using the standardized test statistic (mean):

                One Sample t Test for a Mean: To perform a test of Ho: $\mu = \mu_0$, compute the standardized test statistic $t = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$

                There is a link between two-sided tests and confidence intervals for a population mean:

                The power of a test is the probability that the test will find convincing evidence for Ha when a specific alternative value of the parameter is true

                These are some things you can do to increase the power of a significance test:

                Sampling Distribution of p̂1 - p̂2: Choose a simple random sample of size n1 from population 1 with proportion of successes p1 and an independent simple random sample of size n2 from population 2 with proportion of successes p2

                In a significance test when comparing two proportions, the null hypothesis has this form: Ho: $p_1 - p_2 = 0$

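                To run a significance test of p1 - p2 = 0, this is the standardized test statistic: $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\hat{p}_C(1-\hat{p}_C)}{n_1} + \frac{\hat{p}_C(1-\hat{p}_C)}{n_2}}}$, where $\hat{p}_C = \frac{X_1 + X_2}{n_1 + n_2}$ is the combined (pooled) proportion of successes in both samples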
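A sketch of the two-proportion z-test. Because Ho says the two proportions are equal, the standard error uses the combined (pooled) proportion of successes from both samples. The function name and the example counts are made up, and a two-sided alternative is assumed.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p_value) for a test of Ho: p1 - p2 = 0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion: total successes over total sample size.
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up example: 60 of 100 successes in sample 1, 45 of 100 in sample 2.
z, p_value = two_prop_z_test(60, 100, 45, 100)
print(round(z, 3), round(p_value, 4))
```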

                Sampling Distribution of x̅1 - x̅2: Choose a simple random sample of size n1 from population 1 with mean μ1 and standard deviation σ1 and an independent simple random sample of size n2 from population 2 with mean μ2 and standard deviation σ2

                In a significance test when comparing two means, the null hypothesis has this form: Ho: $\mu_1 - \mu_2 = 0$

                To run a significance test of μ1 - μ2 = 0, this is the standardized test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

                Source: https://apcentral.collegeboard.org/pdf/ap-statistics-course-and-exam-description.pdf
