CS 4641-B Machine Learning — Spring 2020
Tuesday & Thursday 1:30pm-2:45pm, Instructional Center room 111
Instructor: Brian Hrolenok
@cc.gatech.edu email: brian.hrolenok
Office Hours: 3:00pm-4:00pm, T/Th, in the classroom or the area immediately outside. In response to COVID-19, office hours have been moved online. See Canvas for more information.
Course description
CS 4641 is a 3-credit introductory course on Machine Learning intended for undergraduates. Machine Learning is the area of the broader field of Artificial Intelligence that focuses on algorithms for making the best decisions given data. What "best", "decisions", and "data" mean, in theory and in practice, across a variety of problem domains forms the core of ML research. This course is an introduction to a very broad and active field, and it presents specific algorithms and approaches in a way that grounds them in the broader classes they belong to. Topics will include supervised and unsupervised learning, optimization methods, Bayesian inference techniques, and reinforcement learning. The course also covers theoretical concepts such as inductive bias, the PAC and Mistake-bound learning frameworks, and computational learning theory. This course will include several individual programming and report-based assignments.
Learning objectives:
- To provide a broad survey of approaches and techniques in ML.
- To develop a deeper understanding of several major topics in ML.
- To develop the design and programming skills that will help you to build computational artifacts that learn from data.
- To develop the basic skills necessary to pursue research in ML.
Prerequisites. The official prerequisite for this course is CS 1331, although familiarity with the following topics will be useful:
- Linear algebra
- Probability
- Calculus
- Statistics
- Data structures
- Computational complexity
Textbook: There is no textbook for this class. Specific readings will be provided via Canvas.
Homeworks
All assignment submissions will be handled through Canvas, and are due by the date and time listed there. Submissions by email will not be accepted.
Late Policy
You have three free late days to use at your discretion throughout the semester. For example, you might turn in one assignment two days late, or two different assignments one day late each. A free late day is "used" one minute after an assignment's due date. A second free late day is "used" 24 hours and one minute after the due date, and a third free late day is used 48 hours and one minute after the due date. After your free late days are exhausted, you will receive a 20% penalty per day.
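The day counting above amounts to rounding an assignment's lateness up to whole 24-hour periods. The sketch below is a hypothetical helper for checking your own arithmetic, not an official grading script. It assumes that free late days are spent automatically in the order assignments are submitted, and that the 20% penalty is taken from the assignment's earned score for each day beyond the free ones (floored at zero); the names `late_days_for` and `apply_late_policy` are illustrative only.

```python
import math

MINUTES_PER_DAY = 24 * 60
FREE_LATE_DAYS = 3
PENALTY_PER_DAY = 0.20  # applied once free late days are exhausted


def late_days_for(minutes_late: int) -> int:
    """Late days consumed by one submission.

    1 minute late -> 1 day, 24h01m late -> 2 days, 48h01m late -> 3 days, ...
    """
    if minutes_late <= 0:
        return 0
    return math.ceil(minutes_late / MINUTES_PER_DAY)


def apply_late_policy(scores, minutes_late, free_days=FREE_LATE_DAYS):
    """Adjust scores under the ASSUMED interpretation described above.

    scores and minutes_late are parallel lists in submission order.
    """
    adjusted = []
    remaining = free_days
    for score, late in zip(scores, minutes_late):
        days = late_days_for(late)
        free_used = min(days, remaining)   # spend free days first
        remaining -= free_used
        penalized_days = days - free_used  # days that cost 20% each
        factor = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
        adjusted.append(score * factor)
    return adjusted


if __name__ == "__main__":
    # Homework 1 is two days late (uses 2 free days), Homework 2 is on time,
    # Homework 3 is two days late (1 free day left, so 1 penalized day).
    print(apply_late_policy([90, 80, 100], [1500, 0, 2000]))
```

In this example the third assignment keeps 80% of its score, because only one of its two late days is covered by the remaining free day.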
Homework 0: This ungraded and optional assignment is intended as a guide for students who are uncertain about the background material (pdf). IMPORTANT NOTE: all students are strongly encouraged to review this homework.
Homework 1: This first assignment asks you to explore methods for solving regression problems using Linear Regression, Gradient Descent, and Random Fourier Features.
Homework 2: This assignment investigates classification problems through Random Forests, k-Nearest Neighbors, and Support Vector Machines.
Homework 3: Clustering via k-Means, and Principal Component Analysis.
Project
The final project in this course will be a synthesis of the wide variety of techniques we discuss throughout the semester, with a focus on comparative analysis.
Quizzes
Throughout the semester, there will be several participation quizzes given via Canvas. These will be short, multiple-choice, and you will receive full credit as long as you complete the quiz by the given due date.
There will be one in-class quiz this semester; the exact date is TBD, but it will be near the 13th week. The quiz will be closed-book, closed-notes, and relatively short. There will be no make-up for this quiz unless it is arranged well in advance or excused by the Dean of Students. There is no final exam for this class.
Grading policies
Your TAs and I will strive to provide you reasonably detailed and timely feedback on every assignment and quiz. If you have any questions about any of your grades please reach out to us, either by coming to scheduled office hours or via your "@gatech.edu" email address. If there is an error with your grade, please contact us within a week of when feedback is returned, otherwise we might not be able to change it.
Point breakdown:
- Homeworks: 20% each (60% total)
- Participation quizzes: 5%
- In-class quiz: 10%
- Project: 25%
Academic Integrity
All of the assignments in this class are individual work only. For some aspects of some assignments you are allowed and even encouraged to use resources publicly available on the Internet, with two caveats:
- When you can use public resources, it will be explicitly stated. If it's not explicitly stated, assume it's not allowed. If you're unsure, ask first.
- Thoroughly document where and when you obtained any code or libraries that you use which you did not write yourself. Otherwise, you run the risk of appearing to misrepresent someone else's work as your own. When in doubt, be explicit about where the code came from.
Topic outline
- Linear regression, basis function expansion
- Gradient Descent
- Maximum Likelihood Estimation
- k-Nearest Neighbors
- Decision Trees, Bagging and Boosting
- Logistic Regression
- Support Vector Machines, the Kernel Trick
- Naive Bayes, Bayes Nets
- Bayesian Learning, Computational Learning Theory
- Feature selection and transformation
- Clustering
- Expectation Maximization
- Markov Decision Processes
- Neural Networks
- Experimental Design