STA561 COMPSCI571 ECE682: Probabilistic Machine Learning: Spring 2020

Distance Learning Version

Prof: Sayan Mukherjee, sayan@stat.duke.edu. Zoom me: T/W/Th 1-3pm, https://duke.zoom.us/j/6247790803
TAs: Ian Hill, ian_hill160@duke.edu. OH: Thursday 11:00-13:00, Old Chemistry 203B
Rui Wang, rui.wang16@duke.edu. OH: Friday 15:00-17:00, 3431 CIEMAS.
Pengyu Cheng, pengyu.cheng@duke.edu. OH:
Yuren Zhou, yuren.zhou@duke.edu. OH:
Zheng Yuan, zheng.yuan@duke.edu. OH:
Yiwei Gong, yiwei.gong@duke.edu. OH:
Class:

Description

Introduction to machine learning techniques. Graphical models, latent variable models, dimensionality reduction techniques, statistical learning, regression, kernel methods, state space models, HMMs, MCMC. Emphasis is on applying these techniques to real data in a variety of application areas.

A statistics background at the level of STA611 (Introduction to Statistical Methods) is encouraged, along with knowledge of linear algebra and multivariate calculus.


Updates to course due to distance learning

First of all, be safe and take care of yourselves physically and mentally. Second, all classes at Duke are now S/U. I am going to be very lenient in giving an S. Also, there will be no required take-home final; I will post one and students can submit solutions, but it is not required.

For those of you who want a letter grade (A, B, C, ...), the final is required and your grade will be based 30% on the midterm, 60% on the final, and 10% on the final project. If you want a letter grade you need to opt in.

What I will do: I am going to teach the rest of the semester as asynchronously as I can. By that I mean I WILL NOT live stream my lectures during class time (10:05-11:20). I will record lectures over Zoom and upload a link to the class website. The first lecture will be up Tuesday, March 24. My default is to have my Zoom room hangout open on Tuesday, Wednesday, and Thursday from 1-3:30pm. Some days I may not be there for part of the time because of other commitments. Also, please email me if there is some other time you want to meet and I will try to accommodate.

What the TAs will do: They will hold office hours, which I will post at the beginning of next week. Each week a TA will send me a lab handout and a recorded Zoom lab walkthrough, which I will post.

The final project: There will be no poster session. All students are encouraged to do a final project, but you don't have to do one to get an S. I suggest you do a final project and keep it in a git repository, especially if you are an MS student looking for a job. If you are reluctant to make your final project public, that is fine; we don't need to. If you think that doing a final project is too much for you right now, tell me and that is also fine. I will post a Zoom video discussing the projects Monday. Teams can be up to five people. DO NOT WORRY ABOUT GRADES. Try your best and work remotely as a team to help with distancing and some normalcy. The course project writeup should be about four pages, due at the end of the semester. You are absolutely permitted to use your current rotation or research project as a course project. Examples of previous projects can be found at projects. The final projects should be written in LaTeX. If you have never used LaTeX before, there are online tutorials, Mac GUIs, and even online compilers that might help you.

There is a Piazza course discussion page. Please direct questions about homeworks and other matters to that page. Otherwise, you can email the instructors (TAs and professor). Note that we are more likely to respond to the Piazza questions than to the email, and your classmates may respond too, so that is a good place to start.

The programming assignments in this course can be done in any language, but we will be doing simulations in PyTorch.
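To give a feel for the kind of simulation the assignments involve, here is a minimal sketch of fitting a linear model by gradient descent. This is an illustrative example written by the editor, not an actual assignment; it uses only the Python standard library so it runs without PyTorch, though the same computation is a few lines of `torch` code.

```python
# Fit y = w*x + b by gradient descent on mean squared error,
# using only the Python standard library.
import random

random.seed(0)
true_w, true_b = 2.0, -1.0

# Simulate noisy data from the true line.
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [true_w * x + true_b + random.gauss(0, 0.1) for x in xs]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    # Gradients of mean squared error with respect to w and b.
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * gw
    b -= lr * gb

print(round(w, 2), round(b, 2))  # prints values near 2.0 and -1.0
```

In PyTorch the hand-written gradients would be replaced by `loss.backward()` and an optimizer step, which is the main convenience the assignments rely on.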

The course will follow my lecture notes (this will be updated as the course proceeds), Lecture Notes. Some other texts and notes that may be useful include:

  1. Kevin Murphy, Machine Learning: a probabilistic perspective
  2. Michael Lavine, Introduction to Statistical Thought (an introductory statistical textbook with plenty of R examples, and it's online too)
  3. Chris Bishop, Pattern Recognition and Machine Learning
  4. Daphne Koller & Nir Friedman, Probabilistic Graphical Models
  5. Hastie, Tibshirani, Friedman, Elements of Statistical Learning (ESL) (PDF available online)
  6. David J.C. MacKay, Information Theory, Inference, and Learning Algorithms (PDF available online)
  7. Aston Zhang, Zack Lipton, Mu Li, Alex Smola, Dive into Deep Learning

The final project TeX template and final project style file should be used in preparation of your final project report. Please follow the instructions and let me know if you have questions.
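If you cannot access the linked template and style file yet, a minimal `article` skeleton along the following lines will compile with any standard LaTeX distribution. The section names here are only suggestions, not the official template:

```latex
\documentclass{article}
\usepackage{amsmath}
\usepackage{graphicx}

\title{Final Project Title}
\author{Team member names}

\begin{document}
\maketitle

\begin{abstract}
One-paragraph summary of the problem, the approach, and the results.
\end{abstract}

\section{Introduction}
% Problem statement and motivation.

\section{Methods}
% Models and algorithms used.

\section{Results}
% Experiments, figures, and tables.

\section{Discussion}
% What worked, what didn't, and why.

\end{document}
```

Swap in the official template and style file once you have them; the structure above should carry over with little change.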

This syllabus is tentative, and will almost surely be modified. Reload your browser for the current version.


Syllabus

  1. (Jan 15th) Introduction and review: Lecture

  2. Handout for Lab 1

  3. (Jan 17th) Linear regression, the proceduralist approach: Lecture

  4. (Jan 22nd) Bayesian motivation for proceduralist approach: Lecture

  5. (Jan 24th) Bayesian linear regression: Lecture

  6. HW 1 Data for HW 1 HW 1 solutions

  7. (Jan 29th) Regularized logistic regression: Lecture and Support Vector Machines and optimization notes
  8. Handout for Lab 2

  9. (Jan 31st) Gaussian process regression: Lecture

  10. Handout for Lab 3

  11. (Feb 5th) Sparse regression: Lecture

  12. HW 2 Dataset 1 for HW 2 Dataset 2 for HW 2

  13. (Feb 7th) Mixture models and latent space models I: Lecture

  14. (Feb 12th) Mixture models and latent space models II: Lecture


  15. Handout for Lab 4

    Practice midterm

  16. (Feb 14th) Latent Dirichlet Allocation I: Lecture

  17. (Feb 19th) Latent Dirichlet Allocation II: Lecture

  18. (Feb 14-Feb 20) Take home midterm

    Handout for Lab 5

  19. (Feb 21st) Markov chain Monte Carlo I: Lecture

  20. (Feb 26th) Markov chain Monte Carlo II: Lecture

  21. Handout for Lab 6

  22. (Feb 28th) Hidden Markov models Lecture and more lecture notes

  23. (March 4th) Dimension reduction and embeddings I Lecture

  25. (March 6th) Dimension reduction and embeddings II Lecture

  26. The following are Zoom videos and material for the now-remote class; I will probably update this every two days

  27. (March 25th) Statistical learning theory: Lecture Zoom lecture Slides


  28. (April 1st) Neural networks I Lecture Zoom lecture

  29. (April 3rd) Neural networks II Lecture Zoom lecture

  30. (April 8th) Optimization I Zoom video notes

  31. (April 21-30) Final exam Example final

  32. (May 2nd) Final project due

  33. Below is the order of lectures planned before we had to go remote

  34. (March 18th) Neural networks I Lecture

  35. (March 20th) Neural networks II Lecture

  36. (March 25th) Variational methods and Generative Adversarial Networks I Lecture

  37. (March 27th) Variational methods and Generative Adversarial Networks II Lecture

  38. (April 1st) Optimization I Lecture

  39. (April 3rd) Optimization II Lecture

  40. (April 8th) Computational differentiation Lecture

  41. (April 10th) Statistical learning theory I: Lecture
  42. (April 15th) Computational differentiation Lecture