
Course Note

Essay by   •  February 26, 2017  •  Course Note  •  1,346 Words (6 Pages)  •  912 Views


Basic Definitions

  • Experimental unit (sampling unit) – an object (e.g., person, thing, transaction, or event) upon which we collect data
  • Population – the set of units that we are interested in studying
  • Sample – a subset of the units of a population
  • Variable – a characteristic or property of an individual experimental unit
  • Statistical inference – making an estimate, prediction, or some other generalization about a population based on information contained in a sample

Elements of Descriptive Statistics

  • (1) Data set (population or sample), (2) variable of interest, (3) graphs or numerical measures, (4) conclusions about the data pattern

Elements of Statistical Inference

  • (1) Population, (2) variable of interest, (3) sample, (4) inference, (5) measure of reliability

Types of data

  • 1) Quantitative data – can be described numerically
  • Age, height, weight, size of family, income, GDP, CPI, Stock market average, monthly sales revenue
  • Quantitative Data:  Cross-sectional / Time Series
  • 2) Qualitative data – not inherently numerical
  • Also called categorical data or attributes data
  • Color of eyes, accurate / not, left or right handed, yes/no variables, employment status, defect/no defect, occupation code…

Collecting Data & Sampling Techniques

  • Published source
  • Designed experiment
  • Observational study (Survey)
  • Random sample selected from the target population of interest
  • Selection bias – a subset of the experimental units in the population is excluded from the sample
  • Nonresponse bias – researchers are unable to obtain data on all experimental units selected for the sample. (Solution – sample the non-respondents to determine their characteristics.)

Sampling Designs

  • 1)  Stratified random sample
  • Split up the population into strata
  • Randomly sample from each stratum
  • More representative sample?
  • Depends on how the researcher defines the strata
  • 2)  Systematic random sampling
  • Take every kth item
  • Useful for processes
  • Watch out for systematic sampling biases (cycles)
  • Security lines at airports - TSA
  • 3)  Cluster and multi-stage clustering
  • Cluster the population into subpopulations
  • Randomly select clusters to get to the elements of the population
  • 4) Convenience samples – sample elements are selected that are convenient to the researchers
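
The first three designs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 unit IDs, the "low"/"high" strata, the five clusters of 20 units, and all function names are hypothetical and not taken from the course notes.

```python
import random

def stratified_sample(units, stratum_of, per_stratum, rng):
    """Split units into strata, then randomly draw per_stratum units from each."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

def systematic_sample(units, k, rng):
    """Take every k-th unit, starting at a random position among the first k."""
    start = rng.randrange(k)
    return units[start::k]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical unit IDs 1..100

strat = stratified_sample(population, lambda u: "low" if u <= 50 else "high", 5, rng)
syst = systematic_sample(population, 10, rng)
clusters = {f"c{i}": population[i * 20:(i + 1) * 20] for i in range(5)}
clus = cluster_sample(clusters, 2, rng)

print(len(strat), len(syst), len(clus))
```

If the population list has a cycle whose length matches k, the systematic sample will pick up the same phase of the cycle every time, which is the bias the notes warn about.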

Types of Sample Design Errors

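  • Sampling error – difference between the estimator (sample statistic) and the true population parameter.
  • Due to sample vs. population
  • This is the error caused by the sampling method itself. When a sample is drawn at random from the population, which sample is drawn is a matter of chance; the deviation between the sample statistic x̄ obtained from that sample and the population parameter μ is called the actual sampling error. When the population is large, the number of possible samples is so large that the actual sampling errors cannot all be listed, so the average sampling error is used to describe the average level of the actual sampling errors across samples.
  • Nonsampling (measurement) error – all other errors that cause a difference between an estimator and a population parameter.
  • Nonsampling error is the sum of all errors other than sampling error. It can arise at every stage of a market survey, and a mistake at any stage can increase nonsampling error and distort the data. When we talk about controlling error, we mainly mean controlling nonsampling error.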
  • Poor sampling design
  • Interviewer errors / interviewer biases
  • False information provided by respondent
  • Poorly worded or loaded questions
  • Data errors
  • Undercoverage
  • Non-respondents
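
Sampling error can be made concrete with a small simulation. The sketch below assumes a hypothetical population of 10,000 simulated values and summarises the "average sampling error" as the mean absolute difference between the sample mean and the population mean; that summary is one possible choice, not necessarily the definition the course uses.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable
population = [rng.gauss(50, 10) for _ in range(10_000)]  # hypothetical population
mu = statistics.fmean(population)                        # true population mean

def sampling_error(n):
    """Actual sampling error of one random sample of size n: x-bar minus mu."""
    sample = rng.sample(population, n)
    return statistics.fmean(sample) - mu

# Repeat the draw many times and average the absolute errors.
avg_small = statistics.fmean(abs(sampling_error(25)) for _ in range(200))
avg_large = statistics.fmean(abs(sampling_error(400)) for _ in range(200))

print(f"n=25:  average |x-bar - mu| = {avg_small:.3f}")
print(f"n=400: average |x-bar - mu| = {avg_large:.3f}")
```

Larger samples shrink sampling error, but they do nothing about nonsampling errors such as loaded questions or undercoverage.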

Descriptive Statistics

  • Four things to know about any distribution  
  • 1)  Measures of Location (Central Tendency)
  • Midrange

$$\text{Midrange} = \frac{x_{\max} + x_{\min}}{2}$$

  • Mode – the value that occurs most often
  • Median – the middle value: 50% of the data are larger than the median and 50% are smaller
  • Mean

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

  • Trimmed mean: mean of the middle x% of the data
  • 2) Measures of Variability (Dispersion)
  • Range = max - min
  • Interquartile range (IQR): QU - QL
  • Upper (3rd) quartile - lower (1st) quartile
  • 75th percentile - 25th percentile
  • ‘Upper management’ - range of the middle 50% of the distribution
  • Position of QU in the ordered data = 3/4 (n + 1), rounded to the nearest integer
  • Position of QL in the ordered data = 1/4 (n + 1), rounded to the nearest integer
  • Box plot
  • IQR is the box, median is the line in the box
  • Hinge points are at the edges of the box
  • QU and QL
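  • Inner fences: QL - 1.5 (IQR) and QU + 1.5 (IQR)
  • Values between the inner and outer fences are suspect outliers
  • Outer fences: QL - 3 (IQR) and QU + 3 (IQR)
  • Values beyond the outer fences are designated as highly suspect outliers
  • Whiskers – lines from the box to the most extreme values inside the inner fences
  • Variance: population / sample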
  • Population variance    

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

  • Sample variance  

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

  • Computational equation for sample variance

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$

  • Measures how far, on average, the values are from the mean (in squared units)
  • standard deviation
  • Population:  σ       Sample:  s

$$\sigma = \sqrt{\sigma^2} \qquad s = \sqrt{s^2}$$

  • Coefficient of variation – used to compare the relative dispersion of two data sets

$$CV = \frac{s}{\bar{x}} \times 100\%$$
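
The measures of location and variability listed so far can be checked on a small data set. The seven values below are a hypothetical sample chosen so the arithmetic is easy to follow by hand; the sketch computes the sample variance both from its definition and from the computational form, which should agree.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical sample
n = len(data)

# Measures of location
mean = sum(data) / n                      # 42 / 7 = 6.0
median = statistics.median(data)          # middle value of the sorted data: 5
mode = statistics.mode(data)              # most frequent value: 4
midrange = (max(data) + min(data)) / 2    # (11 + 2) / 2 = 6.5

# Measures of variability
data_range = max(data) - min(data)        # 11 - 2 = 9
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)                      # definition
s2_short = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)   # computational form
s = s2 ** 0.5                             # sample standard deviation
cv = s / mean * 100                       # coefficient of variation, in percent

print(mean, median, mode, midrange, data_range)
print(f"s^2 = {s2:.2f}, s = {s:.3f}, CV = {cv:.1f}%")
```

Dividing by n - 1 gives the sample variance; dividing the same sum of squares by the population size gives the population variance instead.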

  • 3) Shape
  • Symmetry / skewness / mathematical form
  • Mound-shaped distributions: use the Empirical Rule
  • z-score (standard score):

$$z = \frac{x - \mu}{\sigma} \ \text{(population)} \qquad z = \frac{x - \bar{x}}{s} \ \text{(sample)}$$


  • Approximately 68% of the data is within 1 standard deviation of the mean
  • Approximately 95% of the data is within 2 standard deviations of the mean
  • Approximately 99.7% (essentially all) of the data is within 3 standard deviations of the mean
  • |z| > 2 = possible outlier, |z| > 3 = outlier.
  • For a distribution of any shape: Chebyshev’s Inequality, where k is the number of standard deviations from the mean

At least $1 - \frac{1}{k^2}$ of the data lies within $k$ standard deviations of the mean, for any $k > 1$.
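
As a rough illustration of z-scores and Chebyshev's bound, the sketch below uses a hypothetical sample of ten values with one unusually large observation. It computes sample z-scores, flags values with |z| > 2 as possible outliers, and compares the observed share of data within k standard deviations against the guaranteed minimum of 1 - 1/k².

```python
import statistics

data = [12, 15, 14, 10, 13, 16, 11, 14, 13, 40]  # hypothetical sample
mean = statistics.fmean(data)
s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

# z-score: number of standard deviations each value lies from the mean
z_scores = [(x - mean) / s for x in data]

def share_within(k):
    """Fraction of the data whose z-score is at most k in absolute value."""
    return sum(abs(z) <= k for z in z_scores) / len(z_scores)

possible_outliers = [x for x, z in zip(data, z_scores) if abs(z) > 2]

for k in (2, 3):
    bound = 1 - 1 / k ** 2  # Chebyshev's guaranteed minimum
    print(f"k={k}: observed {share_within(k):.2f}, Chebyshev bound {bound:.2f}")
print("possible outliers:", possible_outliers)
```

This sample is not mound-shaped, so the Empirical Rule percentages need not apply, but Chebyshev's bound still holds.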

  • 4) Data patterns (for time series data)
  • Time series
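
The quartile-position rule and the box-plot fences described in the variability notes above can be sketched as follows. The eleven data values are hypothetical, chosen so that the quartile positions come out as whole numbers; how to round a position that ends in .5 is an assumption here (halves are rounded up), since the notes only say to round to the nearest integer.

```python
def quartile_position(n, fraction):
    """1-based position in the ordered data: fraction * (n + 1), rounded.
    Rounding halves up is an assumption, not something the notes specify."""
    return int(fraction * (n + 1) + 0.5)

data = sorted([3, 5, 7, 8, 9, 10, 11, 12, 14, 28, 40])  # hypothetical sample
n = len(data)

q_l = data[quartile_position(n, 0.25) - 1]  # lower quartile (3rd ordered value)
q_u = data[quartile_position(n, 0.75) - 1]  # upper quartile (9th ordered value)
iqr = q_u - q_l

inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)  # inner fences
outer = (q_l - 3 * iqr, q_u + 3 * iqr)      # outer fences

# Between inner and outer fences: suspect. Beyond outer fences: highly suspect.
suspect = [x for x in data
           if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
highly_suspect = [x for x in data if x < outer[0] or x > outer[1]]

print("QL, QU, IQR:", q_l, q_u, iqr)
print("inner fences:", inner, "outer fences:", outer)
print("suspect:", suspect, "highly suspect:", highly_suspect)
```

Statistical packages often interpolate between neighbouring values when computing quartiles, so their results may differ slightly from this position rule.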

Graphical Techniques

  • Bar chart
  • Vertical axis: frequency, relative frequency
  • Horizontal axis: variable of interest (Xi)
  • Pareto diagram - Bar chart with bars ordered by frequency (highest to lowest)
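
The ordering behind a Pareto diagram can be sketched with a frequency count. The complaint categories below are hypothetical, and the text bars stand in for a real chart.

```python
from collections import Counter

# Hypothetical qualitative data: one recorded reason per customer complaint
complaints = ["late", "damaged", "late", "wrong item", "late", "damaged"]

counts = Counter(complaints)
pareto = counts.most_common()  # categories ordered by frequency, highest first
total = sum(counts.values())

for category, freq in pareto:
    # frequency, relative frequency, and a simple text bar
    print(f"{category:<12}{freq:>3}  {freq / total:>4.0%}  " + "#" * freq)
```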

Random Variables and Probability Distributions

  • Types of random variables: discrete / continuous
  • Discrete – random variable that can take on only a countable number of values

Number of defects per product, occupation code, type of failure, reason for customer return, type of customer complaint

  • Continuous – random variable that can take on any of the infinitely many values within an interval (measurement)

Wait time at a fast-food window, strength of a laptop case, response time of a computer system

...

...
