Assignment 2

A. Non-Computer Problems

1. (This problem uses the same data as Problem 4 from Assignment 1.) Show all work to receive full credit.

Suppose that a professor is interested in determining how much time her students spend studying. On a Monday morning, she conducts a survey of her class in which she asks students to list the number of hours they spent on their coursework during the past two days. The numbers reported by the students were as follows:

7, 5, 12, 10, 8, 6, 33, 13, 8, 7, 11, 10, 6, 9, 43, 7, 9, 8, 11, 9, 10, 38, 12, 9, 11.

A. Assume that the data collected by the professor constitute a population, and compute the following: mean, median, mode, range, interquartile range, and semi-interquartile range.

B. Consider again what it is the professor ultimately wants to know. Think about how it is being measured. Look at the data that resulted (you may wish to look over your various answers to part A and at your histogram from Assignment 1). Is there any aspect of these data that you find particularly striking? State what you think it is. Now, offer one reasonable explanation for why you think it occurred. You should be able to do this in one paragraph or less.

C. Which measure of central tendency do you think provides the best summary for these data? Explain your answer in a few sentences.

D. Consider whether there is anything that can be done to these data that would bring the mean more in line with the median. In other words, what change can be made to the data so that the new distribution would have a mean that is closer to its median? (HINT: Think about outliers). Once you think of something, state what you are going to do, then try it out and re-calculate both the median and the mean.

B. Computer Problems

For the following problems, use the data set “2.Asg.attain.dta”. This data set is an extract from the 1998 General Social Survey (GSS). For more information on the GSS (including definitions of the variables), see the following website:
http://www.norc.org/GSS+Website/Browse+GSS+Variables/
As you are doing the problems below, copy (as picture) and paste relevant Stata output into a Word document and type your answers to the questions in that document.

2. Using the “2.Asg.attain.dta” data set, calculate the following summary statistics for both the educ (respondent’s highest year of schooling) and paeduc (father’s highest year of schooling) variables. Use the commands “summarize educ, detail” and “summarize paeduc, detail”. Provide the Stata output and show how you got each answer.

a. Mean
b. Median
c. Range
d. 25th percentile
e. 75th percentile
f. Interquartile range

3. The variables educ_o and paeduc_o have been recoded from ratio variables in problem 2 to ordinal variables for this problem. These variables have categories “< HS,” “HS,” “Some Coll,” and “Coll…

median1

An example with an odd number of observations:

game hours    rank order
5             1     Min
6             2
6             3
7             4
7             5
7             6
8             7     Q1
8             8
8             9
9             10
9             11
9             12
9             13    Median
10            14
10            15
10            16
11            17
11            18
11            19    Q3
12            20
12            21
13            22
33            23
38            24
43            25    Max

Mean        12.48
Median      9
Mode        9
Range       38
IQR         3
Semi-IQR    1.5

median2

An example with an even number of observations:

flfp    rank order
2       1
3       2
31      3
31      4
34      5
34      6
37      7
37      8     Q1 = (37+38)/2 = 37.5
38      9
38      10
38      11
38      12
38      13
38      14
38      15
38      16    Median = (38+39)/2 = 38.5
39      17
39      18
39      19
40      20
41      21
42      22
44      23
44      24    Q3 = (44+44)/2 = 44
44      25
44      26
45      27
49      28
49      29
50      30
51      31
55      32    Max

(Rank 1 is the Min.)

Mode      38
Median    38.5
Mean      38.375
Range     55 - 2 = 53
IQR       44 - 37.5 = 6.5
SIQR      6.5/2 = 3.25

SD calculation

Coffee    Deviation     Squared Deviation
0         0-5 = -5      (-5)^2 = 25
2         2-5 = -3      (-3)^2 = 9
3         3-5 = -2      (-2)^2 = 4
3         3-5 = -2      (-2)^2 = 4
3         3-5 = -2      (-2)^2 = 4
4         4-5 = -1      (-1)^2 = 1
4         4-5 = -1      (-1)^2 = 1
4         4-5 = -1      (-1)^2 = 1
4         4-5 = -1      (-1)^2 = 1
5         5-5 = 0       0^2 = 0
5         5-5 = 0       0^2 = 0
6         6-5 = 1       1^2 = 1
6         6-5 = 1       1^2 = 1
7         7-5 = 2       2^2 = 4
7         7-5 = 2       2^2 = 4
7         7-5 = 2       2^2 = 4
7         7-5 = 2       2^2 = 4
7         7-5 = 2       2^2 = 4
8         8-5 = 3       3^2 = 9
8         8-5 = 3       3^2 = 9

                  If population                If sample
Mean              5                            5
Deviations        (Deviation column)           (Deviation column)
Sq. Deviations    (Squared Deviation column)   (Squared Deviation column)
Sum of Squares    90                           90
Variance          4.50                         4.74
St. Deviation     2.12                         2.18

Lecture Two

· When to use what:
· Univariate description: when we describe the distribution of a single variable, we use various numerical measures to characterize the center and dispersion of that distribution.
· Measures of central tendency:
· Mode (frequency-based): value(s) for the most frequently observed categor(ies).
· Can be unimodal, bimodal, or even multimodal.
· Median (rank-order-based): the value of the middle one (or the average of the middle two if the number of cases is an even number) in a sorted distribution.
· Less sensitive to skewness (or outliers).
· Mean (value-based): sum of all values divided by the number of cases.
· Sensitive to skewness (or outliers).
· In a histogram:
· From bins to density curve. [draw on board]
· Height of the curve = relative frequency; area under the curve = 1 (or 100%).
· Where are the mode, median, and mean (all on X-axis).
· Symmetric distribution: mode = median = mean.
· Positive skew: median < mean.
· Negative skew: median > mean.
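As a quick check on how these three measures behave, the sketch below computes them for the 25 study-hour values from Problem 1 using Python's standard `statistics` module. The variable names are illustrative choices, not part of the course materials, and dropping the three largest values is only one possible way to show how sensitive the mean is to outliers.

```python
import statistics

# Study hours reported by the 25 students (Assignment 2, Problem 1).
hours = [7, 5, 12, 10, 8, 6, 33, 13, 8, 7, 11, 10, 6,
         9, 43, 7, 9, 8, 11, 9, 10, 38, 12, 9, 11]

mean_hours = statistics.mean(hours)      # value-based
median_hours = statistics.median(hours)  # rank-order-based
modes = statistics.multimode(hours)      # frequency-based; a list, since ties are possible

print(mean_hours, median_hours, modes)

# Illustration only: drop the three largest values and recompute,
# to see how far the mean moves compared with the median.
trimmed = sorted(hours)[:-3]
trimmed_mean = statistics.mean(trimmed)
trimmed_median = statistics.median(trimmed)
print(trimmed_mean, trimmed_median)
```

With the full data the mean (12.48) sits well above the median (9), the pattern expected under positive skew. Once the three largest values are dropped, the mean falls to 9 while the median does not move.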

· Measures of variability/spread/dispersion:
· Applicable to continuous variables only.
· Rank-order-based measures (i.e., you need to sort the distribution first):
· Introducing percentiles:
· If the Pth percentile is the value X, then P% of the observations in the distribution fall below X. For example, if the 95th percentile of exam grades of SYA 4450 is 86, then 95% of the students in this class have a lower score than 86.
· Graphically, division of the area under the curve. [draw on board]
· Min – 25th percentile (first quartile, Q1) – 50th percentile (median, or Q2) – 75th percentile (third quartile, Q3) – max. Usually identifiable in a frequency table.
· Percentiles are a type of quantile. In addition to quartiles, you will also see centiles (another name for percentiles).
· Range = max – min
· Very sensitive to extreme values at the ends.
· Interquartile Range (IQR) = Q3 – Q1
· Not sensitive to extreme values beyond the quartiles.
· Semi-interquartile range: IQR/2
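The rank-order measures above can be sketched in Python as follows. The `percentile` helper is a hypothetical function written for this illustration. Its rule is an assumption: average two neighbouring ranks when n × P/100 is a whole number, otherwise move up to the next rank. It was chosen because it reproduces the quartiles in the worked examples (Q1 = 8 and Q3 = 11 for the study-hours data). Other software may interpolate differently and give slightly different quartiles.

```python
def percentile(values, p):
    """Return the p-th percentile (integer p from 1 to 99) of values.

    Assumed rule: let pos = n * p / 100. If pos is a whole number, average
    the values at ranks pos and pos + 1; otherwise take the value at the
    next rank above pos.
    """
    xs = sorted(values)                  # rank-order measures need sorted data
    k, remainder = divmod(len(xs) * p, 100)
    if remainder == 0:
        return (xs[k - 1] + xs[k]) / 2   # ranks k and k + 1 (1-based)
    return xs[k]                         # rank k + 1 (1-based)

# Study hours reported by the 25 students (Assignment 2, Problem 1).
hours = [7, 5, 12, 10, 8, 6, 33, 13, 8, 7, 11, 10, 6,
         9, 43, 7, 9, 8, 11, 9, 10, 38, 12, 9, 11]

q1 = percentile(hours, 25)
q3 = percentile(hours, 75)
data_range = max(hours) - min(hours)     # uses only the two extreme values
iqr = q3 - q1                            # ignores everything beyond the quartiles
semi_iqr = iqr / 2

print(data_range, q1, q3, iqr, semi_iqr)
```

The range is driven entirely by the two extreme values, while the IQR and semi-interquartile range depend only on the middle half of the sorted data.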

· Value-based measures:
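· Population data:
· Variance (Var): $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ is the number of observations in the population.
· Standard Deviation (SD): $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
· Sample data:
· Variance (Var): $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, where $\bar{X}$ is the sample mean and $n$ is the sample size.
· Standard Deviation (SD): $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$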

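· Degrees of Freedom (DF, df)
· Given that: sample size = n, and the sample mean is known;
· DF = n-1, because the n deviations from the sample mean must sum to zero, so only n-1 of them are free to vary.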
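To make the population/sample distinction concrete, here is a minimal sketch that follows the coffee example from the SD calculation worksheet step by step. The variable names are illustrative. The only difference between the two versions is the divisor: n when the data are treated as a population, n - 1 when they are treated as a sample.

```python
import math

# Coffee values from the SD calculation worksheet (20 observations).
coffee = [0, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 7, 7, 7, 7, 8, 8]

n = len(coffee)
mean = sum(coffee) / n

deviations = [x - mean for x in coffee]         # each value minus the mean
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)

pop_variance = sum_of_squares / n               # data treated as a population
sample_variance = sum_of_squares / (n - 1)      # data treated as a sample (df = n - 1)

pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sample_variance)

print(mean, sum_of_squares)
print(round(pop_variance, 2), round(sample_variance, 2))
print(round(pop_sd, 2), round(sample_sd, 2))
```

The deviations sum to zero, which is why only n - 1 of them carry independent information. Python's `statistics.pvariance` and `statistics.variance` use the same two divisors and can serve as a cross-check.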

· Notations:

                          Population    Sample
Number of observations    N             n
Mean
Deviation (D)
