CEE 6601 – Statistics in Transportation

Course: CEE 6601:  Statistics in Transportation.  4 units (3 units lecture, 1 unit unsupervised lab), letter-graded.

Instructor:        Patricia Mokhtarian, Professor

                           School of Civil and Environmental Engineering

                           322 SEB, (404) 385-1443, patmokh@gatech.edu

Course Structure:  The 1-unit (= 3 unscheduled hours per week) “unsupervised lab” associated with this class reflects time allotted to familiarizing yourself with one or more statistical software packages needed to complete the assignments.  Most students use R for this purpose, but you are welcome to learn any package (such as Minitab, SPSS, or Stata) that will do the job.  I am not an R user, but I may be able to help diagnose some issues if you show me the output/error messages you are receiving.  The TA is available to help with software questions or to further explain course concepts as needed.  Both of us will be available by e-mail to the best of our ability.

Course Objective:  This course is intended to (1) equip MS and PhD students with some standard tools of statistical analysis; and (2) provide liberal doses of the practical application advice that is often lacking in methods-oriented courses.  It is expected to be useful not only for those who plan to use such tools in their research (i.e. “producers” of statistics), but also for anyone who wants to be an intelligent “consumer” of statistics encountered in their professional and everyday lives.

Prerequisites:  Differential and integral calculus, introduction to probability & statistics (taught at the calculus level), and linear algebra.  See the “reading materials” folder on the class web site for brief reviews of probability, statistics, and linear algebra.  It is advisable to read the one on probability before the next class.  Since it is assumed that you already know these concepts, they will not (for the most part) be covered in lectures, but to ensure that your memory is refreshed, Homework 1 will have a number of problems relating to the content of the probability review document.

Text:  Lecture notes, and other readings as assigned.  The lecture notes constitute your automatic reading assignment for each class period.  Although I won’t keep saying that, it’s in your best interests to take it seriously and do the reading.

Homework and Exam: Your final grade will be based on 7 homework assignments and a final exam. Each HW assignment will count for 12% of your final grade, and the final will account for the remaining 16%. The exam will be individual and open notes, and will require a calculator.  Its coverage is comprehensive; anything I mention in class is fair game, whether covered by homework or not.  So again, read the lecture notes.  HWs will be distributed (through the Canvas site) approximately every other Thursday (hopefully starting 1/12), but should be turned in on paper.  Where a HW problem involves presenting a lot of equations and/or calculations, responses may be handwritten as long as they are neat and legible.  When a problem involves a fair amount of narrative explanation, typing is obviously preferred.  HWs will be due at the beginning of class on the Tuesday before the next assignment comes out – i.e. 12 days later.  HWs will be partly “in advance” and partly “in arrears”.  That is, some of each assignment may deal with material already covered by the time it is distributed (the “arrears” part), while some may deal with material yet to be covered – covered as late as the Thursday before the HW is due the following Tuesday (the “advance” part). So don’t be surprised if not everything looks familiar or doable at first, but do try to do as much as you can after each lecture, instead of waiting until all topics are covered before starting.
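
The weighting above (7 HWs at 12% each, plus a 16% final) can be checked with a short calculation.  The sketch below is only an illustration: it assumes every score is on a 0-100 scale, the function name course_score is hypothetical, and the example scores are made up.  It says nothing about how a numeric score maps to a letter grade.

```python
# Weights as stated in the syllabus: 7 homeworks at 12% each, final exam at 16%.
HW_WEIGHT = 0.12
FINAL_WEIGHT = 0.16
NUM_HW = 7


def course_score(hw_scores, final_score):
    """Weighted course score on a 0-100 scale (assumes each input is 0-100)."""
    if len(hw_scores) != NUM_HW:
        raise ValueError(f"expected {NUM_HW} homework scores, got {len(hw_scores)}")
    return HW_WEIGHT * sum(hw_scores) + FINAL_WEIGHT * final_score


# Hypothetical scores, for illustration only; the fourth homework was missed.
example = course_score([90, 85, 100, 0, 95, 88, 92], 80)
print(round(example, 1))
```

Because each HW carries 12% of the grade, a single missed assignment lowers the weighted score by up to 12 points.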

Late Homework Policy: NO late HWs are accepted (and I repeat, they are due at the beginning of class). On the semester system, and with only seven assignments, there should be ample time to plan ahead and complete each one.

Make-up Policy: Since I post the solutions to the HW assignments shortly after the deadline, allowing a make-up assignment requires preparation of an entirely new assignment, which is quite a burden on both the TA and me.  Therefore, extremely extenuating (and documented) circumstances are required for me to approve a make-up assignment.  In any case, you must request a make-up before the next assignment is due; otherwise the grade for the missed assignment becomes a permanent zero.

Extra Credit Assignment to Bring Up Your Grade: Sorry, the answer is no.  Offering YOU an extra credit assignment to help bring up your grade is not fair to the others in the class unless I offer the same opportunity to everyone, which could result in a great deal of extra work for the TA and me.  More importantly in one sense, it’s my belief that you’re better off putting the time you would spend on an extra credit assignment into doing better on the regular assignments, practicing additional problems, etc.  So if you need to make a certain grade in the class, plan to give the regular assignments your best effort from the beginning, because they’re all you’re going to get.

Collaboration: I have no objection to collaboration on homework assignments; indeed, studying and working through problems together can offer significant synergy.  However, the solutions you turn in must be your own, in your own words and your own style.  Directly copying someone else’s solution is cheating. And collaboration on the final exam is also cheating.  Any known or suspected violations of the Academic Honor Code will be reported to the Office of Student Integrity. The Honor Code can be found at http://www.policylibrary.gatech.edu/student-affairs/academic-honor-code.

Unauthorized Use of Materials/Agents: The use of any previous homework or exam solutions from this course, or any other course I teach or have taught, is prohibited. Using such materials will be considered a direct violation of the Academic Honor Code, and will be reported to the Office of Student Integrity.  Having other people or artificial intelligence programs produce solutions for you is also a violation.  Similarly, redistributing your graded assignments or posted solutions from this semester to individuals or groups (e.g., contributing to online HW/test banks) is prohibited. For any questions involving these or any other Academic Honor Code issues, please consult me or http://www.policylibrary.gatech.edu/student-life/academic-misconduct

Topics (approx. number of 1½-hour lectures):

  1. Overview of course contents and expectations (1)
  2. In praise of Bayes (2)
    • The terror of the false positive
    • Confusion of the inverse: why is it irrelevant to know that “95% of all heroin addicts started out by smoking pot”?
    • The Prosecutor’s and Defense Attorney’s Fallacies; The Conjunction Paradox
    • How to communicate probabilistic information to a lay audience
  3. The flaw of averages (1)
    • The peril of focusing on averages and ignoring variability
    • Jensen’s inequality
  4. Review of hypothesis testing procedure and common applications (4)
    • t-test for population mean
    • Type I and Type II errors; power calculations
    • t-test for equality of means in two independent populations and paired populations
    • χ² (chi-squared) test for variance of a normal population
    • Levene’s (F) test of equality of variances for two normal populations
    • chi-squared goodness of fit test
    • chi-squared test of independence
  5. The philosophy of hypothesis testing (3)
    • Why doesn’t a 0.05 p-value mean there’s a 95% chance that the null hypothesis is false?
    • An alternative:  the Bayes Factor
    • Insignificant variables in a model:  to prune, or not to prune?
    • Why do some authorities say we should never do one-sided t-tests?
  6. Regression analysis (7)
    • Basic assumptions
    • Least squares estimation
    • Goodness of fit:  R², F-test for goodness of regression
    • F-test for constrained versus unconstrained model; Chow test for different segments
    • Violations of assumptions
    • Specification issues (dummy variables, nonlinear transformations, segmentation)
  7. Maximum likelihood estimation (2)
    • General; linear regression
    • AIC, BIC, likelihood ratio test
  8. Bayesian estimation (2)
  9. Analysis of variance (ANOVA) (1)
  10. Introduction to Poisson processes (2)
    • Basics, relationship to the exponential distribution
    • Memorylessness; relationship to binomial and uniform distributions; merging and splitting Poisson processes
  11. Correlation and causation (2)
    • When can we infer causality?
    • How can A and B be significantly correlated but A not cause B? 
    • How can A and B not be significantly correlated, and yet A does cause B?
    • Why might you get the wrong sign in a regression or discrete choice model, and what should you do about it?
    • Why do X and Y have a positive pairwise correlation, but X has a negative coefficient in an equation predicting Y?  (suppression)
    • If corr(A,B) = corr(B,C) = 0.7071, why might corr(A,C) = 0?  (the cosine law)
    • Decomposition of variance (path analysis)
  12. Size matters (2)
    • Looking at effect magnitudes, not just statistical significance
    • Why doesn’t a coefficient’s magnitude always tell you how important the associated variable is, and how can you tell?
    • Uses and abuses of elasticities
  13. Simpson’s Paradox; the ecological fallacy (1)
    • When a sample is partitioned into two groups, how can a variable of interest, measured at two points in time, increase over time for each of the two groups viewed separately, but decrease for the pooled sample as a whole?
  14. Writing about statistics (1)
    • Text: pet peeves (e.g. “prove”, “confirm”, “significant”; missing comparatives)
    • Tables: need for reader-friendliness
    • Figures: how to mislead without really trying
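
As a small taste of the “cosine law” question listed under topic 11, the sketch below builds three tiny data vectors for which corr(A,B) = corr(B,C) ≈ 0.7071 while corr(A,C) = 0.  It is not taken from the course materials: the vectors and the helper name pearson are made up for illustration.  The idea it relies on is that the Pearson correlation equals the cosine of the angle between the mean-centered vectors, and 0.7071 ≈ cos 45°, so A and C can each sit 45° from B while being 90° from each other.

```python
import math


def pearson(x, y):
    """Pearson correlation, computed as the cosine between centered vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    dot = sum(a * b for a, b in zip(dx, dy))
    return dot / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical data: A and C are orthogonal after centering, and B is their sum.
A = [1, -1, 1, -1]
C = [1, 1, -1, -1]
B = [a + c for a, c in zip(A, C)]

print(pearson(A, B), pearson(B, C), pearson(A, C))
```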

Computer Usage:    Analysis of data sets using multivariate statistical software (such as SPSS, R, SAS)