Statistics
the mathematical science that deals with the collection, analysis, and presentation of data, which can then be used as a basis for inference and induction
- infer a conclusion
Data and Information
Data | Information |
---|---|
raw facts or measurements of interest that carry no meaning yet | data that has been processed so it can support a decision or serve a specific purpose |
Data Set and Database
Data Set | Database |
---|---|
collection of data points | data organized in rows and columns |
Sources of Data
Primary Data | Secondary Data |
---|---|
collected by the person who will use the data | collected by someone else |
expensive, time-consuming | quick, readily available |
reliable | not reliable |
collector has sole control over the data | published in magazines, newspapers, or other publications |
Collection Methods (Primary Data)
Experiments | Direct Observation | Surveys /Questionnaires |
---|---|---|
Biased surveys: forcing the respondent to pick from a fixed set of options ("Choose one from below") can lead to ambiguous answers.
Two Types of Data
Qualitative | Quantitative |
---|---|
Classified by descriptive terms, for example: | Described by numerical values |
Marital Status | |
Political Party | |
Within Quantitative Data:
Counted | Measured |
---|---|
Number of Children | Weight |
Defects per hour (Counted items) | Voltage (Measured Characteristics) |
Data by Level of Measurement
Level | Description | Example |
---|---|---|
Nominal | No ranking allowed | Postal Codes |
Ordinal | Ranking allowed, but the differences between values have no measurable meaning | Education Level (PhD, Master's, Bachelor's) |
Interval | Differences are meaningful, but there is no true zero point | Calendar Year (2018, 2019) |
Ratio | Has a true zero point to reference from | Income ($80,000) |
Time Series vs. Cross-Sectional Data
Time Series | Cross-Sectional Data |
---|---|
Data collected over multiple years (2010-2020) | Data for several states (TX, AL, NY, CA, MN) within 2010 |
Lets us observe trends over time | Lets us compare subjects at one particular point in time |
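The distinction can be sketched in Python as two different slices of the same records. This is only an illustration: the `records` list, its field order, and every value in it are hypothetical placeholders, not real data.

```python
# Hypothetical records: (state, year, value). The values are placeholders.
records = [
    ("TX", 2010, 10), ("TX", 2011, 12), ("TX", 2012, 15),
    ("NY", 2010, 20), ("NY", 2011, 21), ("NY", 2012, 19),
    ("CA", 2010, 30), ("CA", 2011, 33), ("CA", 2012, 35),
]

# Time series: one subject followed across several years.
tx_series = [(year, value) for state, year, value in records if state == "TX"]

# Cross-sectional: several subjects observed in the same year.
cross_2010 = {state: value for state, year, value in records if year == 2010}

print("Time series (TX):", tx_series)
print("Cross-sectional (2010):", cross_2010)
```

The first slice holds the subject fixed and varies time, so a trend can be read off. The second holds time fixed and varies the subject, so the subjects can be compared.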
Population vs. Sample
Population | Sample |
---|---|
all possible subjects | a portion of the population (represents the population) |
Parameter vs. Statistic
Parameter | Statistic |
---|---|
Value calculated from the population | Value computed from a sample |
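A minimal Python sketch of the difference, assuming a made-up population (the integers 1 to 1000) and an arbitrary sample size of 50. The mean of the whole population is a parameter, and the mean of a sample drawn from it is a statistic that only estimates that parameter.

```python
import random
import statistics

# Hypothetical population: the integers 1..1000 stand in for every subject.
population = list(range(1, 1001))

# Parameter: computed from the entire population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample drawn from that population.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean}")
print(f"statistic (sample mean):     {sample_mean}")
```

Drawing a different sample would usually give a different statistic, while the parameter stays fixed.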
Inferential Statistics
- Biased Samples
  - a sample that does not represent the population
Ways to Misuse Statistics
- Changing the graph scale
- Choosing biased samples
Branches of Statistics:
- Descriptive
  - collecting, summarizing, and displaying data
- Inferential
  - making conclusions or claims based on sample data
- Predictive
  - using data from the past to predict future values and make decisions
Chebyshev’s Theorem
- not commonly used.
- we work mostly with Normal Distributions
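For reference, Chebyshev's theorem is usually stated as: for any distribution, at least 1 - 1/k² of the values lie within k standard deviations of the mean, for k > 1. A small Python sketch of that bound follows. The function name `chebyshev_lower_bound` is made up for illustration.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(f"k={k}: at least {chebyshev_lower_bound(k):.1%} of values")
```

The bound holds for any shape of distribution, which is why it is looser than the figures normally quoted for a Normal distribution.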
Average is the best way to represent the entire group if there are no outliers.
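A quick Python sketch of why outliers matter, using two small hypothetical lists that differ in a single value. The numbers are invented only to show the effect.

```python
import statistics

# Hypothetical values chosen only to illustrate the effect of an outlier.
no_outlier = [48, 50, 52, 49, 51]
with_outlier = [48, 50, 52, 49, 501]

for label, values in (("no outlier", no_outlier), ("with outlier", with_outlier)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{label}: mean={mean}, median={median}")
```

With the outlier present, the mean jumps from 50 to 140 while the median stays at 50, so the mean no longer describes a typical member of the group.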
Probability
- Numerical value ranging from 0 to 1.
- 0 means the event has no chance of occurring; 1 means the event is certain to occur (100%).
Experiment
Sample Space
- All the possible outcomes.
Event
- One or more outcomes of an experiment.
- An event is a subset of the sample space.
Simple Event
- A single outcome: the most basic form of event, which cannot be simplified further.
Three Methods of Assigning Probability
- Classical
- Empirical
- Subjective
Classical Probability
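P(A) = Number of outcomes that constitute Event A / Total number of possible outcomes in the sample space
Experiment: Roll a die once
Sample space = {1, 2, 3, 4, 5, 6}
For an event A made up of a single outcome (one specific face): P(A) = 1/6 = 0.167, or a 16.7% probability.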
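The die example can be sketched in Python as a count of outcomes, assuming every outcome is equally likely. The helper `classical_probability` and the choice of "rolling a 5" as Event A are illustrative assumptions.

```python
from fractions import Fraction

def classical_probability(event: set, sample_space: set) -> Fraction:
    """Outcomes in the event divided by total outcomes (all equally likely)."""
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

sample_space = {1, 2, 3, 4, 5, 6}  # roll a die once
event_a = {5}                      # hypothetical event: rolling a 5

p_a = classical_probability(event_a, sample_space)
print(p_a, round(float(p_a), 3))
```

Using `Fraction` keeps the result exact (1/6) and leaves rounding to the point where it is displayed.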
Empirical Probability
- Conducting the experiment to observe the frequency with which an event occurs.
P(A) = Frequency with which Event A occurs / Total number of observations
Law of Large Numbers: as the experiment is repeated more and more times, the empirical probability tends to approach the classical probability.
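Empirical probability can be approximated by simulation. The sketch below rolls a simulated fair die and reports the relative frequency of a hypothetical Event A (rolling a 5). The function name, the seed, and the trial counts are arbitrary choices made for illustration.

```python
import random

def empirical_probability(event: set, trials: int, rng: random.Random) -> float:
    """Fraction of simulated die rolls whose outcome belongs to the event."""
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event)
    return hits / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
event_a = {5}           # hypothetical event: rolling a 5

for trials in (10, 1_000, 100_000):
    estimate = empirical_probability(event_a, trials, rng)
    print(f"{trials:>7} rolls: P(A) ~ {estimate:.4f}")
```

With only a few rolls the estimate can be far from 1/6. With many rolls it typically settles close to it.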
Subjective Probability
- Used when classical and empirical probabilities are not available.
- Example: The probability that inflation will be more than 4% next year.
Five Basic Properties of Probability
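- If Event A must occur, then P(A) = 1.
- If Event A will not occur, then P(A) = 0.
- The probability of any event must range from 0 to 1.
- The sum of all the probabilities for the simple events in the sample space must be equal to 1.
- The complement of Event A is defined as all of the outcomes in the sample space that are not part of Event A.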
Formula for the complement rule: P(A) + P(A’) = 1
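A short Python check of the complement rule on a single die roll, again treating "rolling a 5" as a hypothetical Event A. The complement is built as a set difference against the sample space.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
event_a = {5}                          # hypothetical event: rolling a 5
complement_a = sample_space - event_a  # every outcome not in A

p_a = Fraction(len(event_a), len(sample_space))
p_not_a = Fraction(len(complement_a), len(sample_space))

print(p_a, p_not_a, p_a + p_not_a)
```

In practice the rule is often used in the rearranged form P(A') = 1 - P(A), when the complement is easier to count than the event itself.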
Bayes' Theorem
Qualitative and Quantitative
Nominal
- deals with qualitative data
Ordinal