Assignment 2
A. Non-Computer Problems
1. (This problem uses the same data as Problem 4 from Assignment 1.) Show all work to receive full credit.
Suppose that a professor wants to determine how much time her students spend studying. On a Monday morning, she surveys her class, asking students to list the number of hours they spent on their coursework during the past two days. The numbers reported by the students were as follows:
7, 5, 12, 10, 8, 6, 33, 13, 8, 7, 11, 10, 6, 9, 43, 7, 9, 8, 11, 9, 10, 38, 12, 9, 11.
A. Assume that the data collected by the professor constitute a population, and compute the following: mean, median, mode, range, interquartile range, and semi-interquartile range.
B. Consider again what it is the professor ultimately wants to know. Think about how it is being measured. Look at the data that resulted (you may wish to look over your various answers to part A and at your histogram from Assignment 1). Is there any aspect of these data that you find particularly striking? State what you think it is. Now, offer one reasonable explanation for why you think it occurred. You should be able to do this in one paragraph or less.
C. Which measure of central tendency do you think provides the best summary for these data? Explain your answer in a few sentences.
D. Consider whether anything can be done to these data to bring the mean more in line with the median. In other words, what change can be made to the data so that the new distribution has a mean that is closer to its median? (HINT: Think about outliers.) Once you think of something, state what you are going to do, then try it out and re-calculate both the median and the mean.
B. Computer Problems
For the following problems, use the data set “2.Asg.attain.dta”. This data set is an extract from the 1998 General Social Survey (GSS). For more information on the GSS (including definitions of the variables), see the following website:
http://www.norc.org/GSS+Website/Browse+GSS+Variables/
As you are doing the problems below, copy (as picture) and paste relevant Stata output into a Word document and type your answers to the questions in that document.
2. Using the “2.Asg.attain.dta” data set, calculate the following summary statistics for both the educ (respondent’s highest year of schooling) and paeduc (father’s highest year of schooling) variables. Use the commands “summarize educ, detail” and “summarize paeduc, detail”. Provide the Stata output and show how you got each answer.
a. Mean
b. Median
c. Range
d. 25th percentile
e. 75th percentile
f. Interquartile range
3. The variables educ_o and paeduc_o have been recoded from ratio variables in problem 2 to ordinal variables for this problem. These variables have categories “< HS,” “HS,” “Some Coll,” and “Coll
median1
An example with an odd number of observations:

game hours   rank order
5            1            Min
6            2
6            3
7            4
7            5
7            6
8            7            Q1
8            8
8            9
9            10
9            11
9            12
9            13           Median
10           14
10           15
10           16
11           17
11           18
11           19           Q3
12           20
12           21
13           22
33           23
38           24
43           25           Max

Mean         12.48
Median       9
Mode         9
Range        38
IQR          3
Semi-IQR     1.5
median2
An example with an even number of observations:

flfp   rank order
2      1            Min
3      2
31     3
31     4
34     5
34     6
37     7
37     8            Q1 = (37+38)/2 = 37.5
38     9
38     10
38     11
38     12
38     13
38     14
38     15
38     16           Median = (38+39)/2 = 38.5
39     17
39     18
39     19
40     20
41     21
42     22
44     23
44     24           Q3 = (44+44)/2 = 44
44     25
44     26
45     27
49     28
49     29
50     30
51     31
55     32           Max

Mode         38
Median       38.5
Mean         38.375
Range        55 - 2 = 53
IQR          44 - 37.5 = 6.5
SIQR         6.5/2 = 3.25
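The two worked examples above find the median by rank and take Q1 and Q3 as the medians of the lower and upper halves of the sorted data, with the middle observation counted in both halves when the number of observations is odd. The following is a minimal Python sketch of that rule, not a definitive implementation. The function name `quartile_summary` is made up for illustration, and other software may use a different percentile rule and report slightly different quartiles.

```python
from statistics import mean, median, multimode


def quartile_summary(values):
    """Summarize a list of numbers using the median-of-halves quartile rule."""
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2        # size of each half; the middle case is shared when n is odd
    q1 = median(s[:half])      # median of the lower half
    q3 = median(s[n - half:])  # median of the upper half
    return {
        "mean": mean(s),
        "median": median(s),
        "mode": multimode(s),  # list, because a distribution can have several modes
        "range": s[-1] - s[0],
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "semi_iqr": (q3 - q1) / 2,
    }


# Odd number of observations (n = 25): hours reported in Problem 1
hours = [7, 5, 12, 10, 8, 6, 33, 13, 8, 7, 11, 10, 6, 9, 43,
         7, 9, 8, 11, 9, 10, 38, 12, 9, 11]

# Even number of observations (n = 32): the flfp values
flfp = [2, 3, 31, 31, 34, 34, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38,
        39, 39, 39, 40, 41, 42, 44, 44, 44, 44, 45, 49, 49, 50, 51, 55]

odd = quartile_summary(hours)
even = quartile_summary(flfp)
print(odd)
print(even)
```

Both dictionaries should agree with the hand-worked values in the two tables.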
SD calculation

Coffee   Deviation    Squared Deviation
0        0-5 = -5     (-5)^2 = 25
2        2-5 = -3     (-3)^2 = 9
3        3-5 = -2     (-2)^2 = 4
3        3-5 = -2     (-2)^2 = 4
3        3-5 = -2     (-2)^2 = 4
4        4-5 = -1     (-1)^2 = 1
4        4-5 = -1     (-1)^2 = 1
4        4-5 = -1     (-1)^2 = 1
4        4-5 = -1     (-1)^2 = 1
5        5-5 = 0      0^2 = 0
5        5-5 = 0      0^2 = 0
6        6-5 = 1      1^2 = 1
6        6-5 = 1      1^2 = 1
7        7-5 = 2      2^2 = 4
7        7-5 = 2      2^2 = 4
7        7-5 = 2      2^2 = 4
7        7-5 = 2      2^2 = 4
7        7-5 = 2      2^2 = 4
8        8-5 = 3      3^2 = 9
8        8-5 = 3      3^2 = 9

                 If Population               If sample
Mean             5                           5
Deviations       Deviation column            Deviation column
Sq. Deviations   Squared Deviation column    Squared Deviation column
Sum of Squares   90                          90
Variance         4.50                        4.74
St. Deviation    2.12                        2.18

Lecture Two
· When to use what:
· Univariate description: when we describe the distribution of a single variable, we use various numerical measures to characterize the center and dispersion of that distribution.
· Measures of central tendency:
· Mode (frequency-based): value(s) of the most frequently observed category or categories.
· Can be unimodal, bimodal, or even multimodal.
· Median (rank-order-based): the middle value in a sorted distribution (or the average of the middle two values if the number of cases is even).
· Less sensitive to skewness (or outliers).
· Mean (value-based): sum of all values divided by the number of cases.
· Sensitive to skewness (or outliers).
· In a histogram:
· From bins to density curve. [draw on board]
· Height of the curve = relative frequency; Area under the curve=1 (or 100%).
· Where are the mode, median, and mean (all on X-axis).
· Symmetric distribution: mode = median = mean.
· Positive skew: median < mean.
· Negative skew: median > mean.
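As a rough illustration of how a few large values pull the mean but not the median, the sketch below uses the hours reported in Assignment 2, Problem 1. Dropping the three values above 30 hours is only one possible way to handle outliers; the cutoff is an assumption chosen here for illustration.

```python
from statistics import mean, median

hours = [7, 5, 12, 10, 8, 6, 33, 13, 8, 7, 11, 10, 6, 9, 43,
         7, 9, 8, 11, 9, 10, 38, 12, 9, 11]

# Positive skew: the mean sits above the median
print("all data:", mean(hours), median(hours))

# Illustrative choice only: drop the three values above 30 hours
trimmed = [h for h in hours if h <= 30]
print("trimmed: ", mean(trimmed), median(trimmed))
```

With all 25 values the mean is 12.48 against a median of 9. Once the three largest values are dropped, the mean and the median are both 9.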
· Measures of variability/spread/dispersion:
· Applicable to continuous variables only.
· Rank-order-based measures (i.e., you need to sort the distribution first):
· Introducing percentiles:
· If the Pth percentile is the value X, then P% of the observations in the distribution fall below X. For example, if the 95th percentile of exam grades of SYA 4450 is 86, then 95% of the students in this class have a lower score than 86.
· Graphically, division of the area under the curve. [draw on board]
· Min – 25th percentile (first quartile, Q1) – 50th percentile (median, or Q2) – 75th percentile (third quartile, Q3) – Max. These are usually identifiable in a frequency table.
· Percentiles are a type of quantile. In addition to quartiles, you will also see centiles (another name for percentiles).
· Range = max – min
· Very sensitive to extreme values at the ends.
· Interquartile Range (IQR) = Q3 – Q1
· Not sensitive to extreme values beyond the quartiles.
· Semi-interquartile range = IQR/2
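The percentile definition above (P% of the observations fall below X) can be checked by simple counting. This is a minimal sketch using the hours reported in Assignment 2, Problem 1; the helper name `percent_below` is hypothetical.

```python
def percent_below(values, x):
    """Return the percentage of observations strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


hours = [7, 5, 12, 10, 8, 6, 33, 13, 8, 7, 11, 10, 6, 9, 43,
         7, 9, 8, 11, 9, 10, 38, 12, 9, 11]

print(percent_below(hours, 13))
```

Here 21 of the 25 students reported fewer than 13 hours, so under this definition 13 sits at the 84th percentile of the distribution.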
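· Value-based measures:
· Population data:
· Variance (Var): the sum of squared deviations from the mean, divided by N.
· Standard Deviation (SD): the square root of the variance.
· Sample data:
· Variance (Var): the sum of squared deviations from the sample mean, divided by n-1.
· Standard Deviation (SD): the square root of the sample variance.
· Degrees of Freedom (DF, df)
· Given that: sample size = n, and the sample mean is known;
· DF = n-1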
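The population and sample measures differ only in the divisor: the number of observations for a population, and n-1 (the degrees of freedom) for a sample. The sketch below works through the coffee data from the SD calculation example step by step and cross-checks the result against Python's standard `statistics` module. The variable names are illustrative.

```python
from math import sqrt
from statistics import pvariance, variance

coffee = [0, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 7, 7, 7, 7, 8, 8]

n = len(coffee)
m = sum(coffee) / n                        # mean
deviations = [x - m for x in coffee]       # each value minus the mean
sum_of_squares = sum(d ** 2 for d in deviations)

pop_var = sum_of_squares / n               # treat the data as a population
samp_var = sum_of_squares / (n - 1)        # treat the data as a sample (df = n - 1)
pop_sd = sqrt(pop_var)
samp_sd = sqrt(samp_var)

print(sum_of_squares, round(pop_var, 2), round(samp_var, 2))
print(round(pop_sd, 2), round(samp_sd, 2))

# Cross-check against the standard library
assert abs(pvariance(coffee) - pop_var) < 1e-9
assert abs(variance(coffee) - samp_var) < 1e-9
```

The sample variance is always a little larger than the population variance computed from the same numbers, because the same sum of squares is divided by a smaller number.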
· Notations:
                           Population   Sample
Number of observations     N            n
Mean
Deviation (D)