Intro to Descriptive Statistics

20th August 2020 at 2:19pm
Course Data Science

Name: Intro to Descriptive Statistics
Instructor: Udacity Team
Platform: Udacity

Intro to Descriptive Statistics is a free Udacity course. The courseware is located in Dropbox\Journey\Resources\Courses\2017-Intro-to-Descriptive-Statistics. Some annotations may be written on the slides.

Intro to Research Methods

Concepts:

  • population parameters, sample statistics
  • X-axis: independent variable / predictor variable, Y-axis: dependent variable / outcome
  • lurking variable, extraneous factor, correlation does not prove causation
  • A construct is a variable that is not directly observable or measurable, and units are at the heart of measurement. What units do you think should be used to measure distance? Some possibilities are miles, kilometers, calories expended, state lines crossed, road signs passed, etc.
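  • An operational definition does not have to be something objectively measurable; it can be subjective. It can be a score the subjects give themselves, for example a rating of sleep quality on a scale of 1 to 10, where 1 is the worst and 10 is the best.
    • There is another, rather peculiar example: two random groups are given either a sleeping pill or a placebo in a double-blind test. Here the operational definition is still the 10-point sleep-quality score, but success is measured by the difference in sleep-quality scores between the two groups, not by the scores themselves. Original text: "Note that while the operational definition of quality of sleep is the 10-point scale, the way we actually measure success comes from the difference in quality between groups."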
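The sleeping-pill example can be made concrete with a small sketch. The ratings below are made up purely for illustration, and `mean_difference` is a hypothetical helper, not something from the course. It only shows the idea of measuring success as the gap between group means.

```python
from statistics import mean

# Hypothetical 1-10 sleep-quality ratings, invented for illustration only.
drug_group = [7, 8, 6, 9, 7]
placebo_group = [5, 6, 6, 4, 7]


def mean_difference(treated, control):
    """Return the mean of `treated` minus the mean of `control`."""
    return mean(treated) - mean(control)


# Each rating is the operational definition of sleep quality;
# the difference between group means is what measures success.
effect = mean_difference(drug_group, placebo_group)
print(f"difference in mean sleep quality: {effect:.1f}")
```

A positive difference here would only suggest that the drug group reported better sleep; deciding whether the gap is larger than chance is a separate question.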

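How do you tell whether the distribution in a histogram has a positive skew or a negative skew? Look at which side the long tail is on: a tail on the right means positive skew, a tail on the left means negative skew.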
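The tail rule can also be checked numerically. A minimal sketch, assuming the common moment-based definition of skewness (the mean of the cubed z-scores); the course itself only uses the visual rule, and the data sets here are invented.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    m = mean(values)
    s = pstdev(values)
    return sum(((x - m) / s) ** 3 for x in values) / len(values)


# Invented data: one large value creates a long right tail.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 15]
# Mirroring the data moves the long tail to the left.
left_tailed = [-x for x in right_tailed]

print(skewness(right_tailed) > 0)  # long tail on the right
print(skewness(left_tailed) < 0)   # long tail on the left
```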

Lesson 17: PS 6: Normal Distribution

If 33% of scores are less than Score X, then Score X is at the 33rd percentile. That's how percentile is defined!

The meaning of percentile is a bit unusual. Here, "score" refers to a data value in the sample (for example, a height in centimeters or an exam score).
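The definition above translates directly into code. This is a sketch of the "percent of scores strictly below X" definition used in these notes; other percentile conventions exist, and `percentile_of` and the exam scores are hypothetical.

```python
def percentile_of(score, scores):
    """Percent of values in `scores` that are strictly less than `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Invented exam scores for illustration.
exam_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Three of the ten scores (55, 60, 65) are below 70.
print(percentile_of(70, exam_scores))  # 30.0
```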

Lesson 18: Sampling Distribution

This lesson is an important one. It analyzes the means of samples drawn from a population. Key concepts: standard error, the Central Limit Theorem, etc.

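Population and sample have always confused me. Sometimes the population is a superset of the sample, for example the students of a school (population) and the students in one grade of that school (sample); at other times (as in this example), the population seems to be the set of possible values that can appear in a sample.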

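The Central Limit Theorem says that the distribution of sample means is approximately a normal curve when the sample size is large enough. This can be used to judge whether a sample is typical (by seeing where its mean falls on the normal curve).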
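A small simulation can illustrate the sampling distribution and the standard error. This is only a sketch under assumptions: the population is a made-up skewed (exponential) data set, the sample size of 40 and the 2,000 repetitions are arbitrary choices, and `z_score_of_sample` is a hypothetical helper.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Invented, clearly non-normal population.
population = [random.expovariate(1.0) for _ in range(20_000)]
mu = mean(population)       # population parameter: mean
sigma = pstdev(population)  # population parameter: standard deviation

n = 40  # size of each sample (arbitrary choice)
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

# Standard error: the expected spread of the sample means.
standard_error = sigma / n ** 0.5
observed_spread = pstdev(sample_means)


def z_score_of_sample(sample):
    """How many standard errors the sample's mean lies from mu."""
    return (mean(sample) - mu) / (sigma / len(sample) ** 0.5)


print(f"mean of sample means: {mean(sample_means):.3f} (mu = {mu:.3f})")
print(f"spread of sample means: {observed_spread:.3f} (SE = {standard_error:.3f})")
```

A sample whose z-score is far from 0 (say, beyond about 2 in either direction) sits in the tails of the curve and is therefore not a typical sample from this population.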