Numbers

The assignment is worth 10% of your final grade.

Why?

Now it's time to explore unsupervised learning algorithms. This part of the assignment asks you to use some of the clustering and dimensionality reduction algorithms we've looked at in class and to revisit earlier assignments. The goal is for you to think about how these algorithms resemble, differ from, and interact with your earlier work.

The same ground rules apply for programming languages.

Read everything below carefully!

The Problems Given to You

You are to implement (or find the code for) six algorithms. The first two are clustering algorithms:

You can choose your own measures of distance/similarity. Naturally, you'll have to justify your choices, but you're practiced at that sort of thing by now.
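As a starting point for that choice, here is a minimal sketch of three common distance measures. The function names and the toy vectors are illustrative assumptions, not part of the assignment; which measure suits your data is still yours to justify.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; compares direction and ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Toy vectors (hypothetical): same direction, different length.
p, q = [1.0, 0.0], [2.0, 0.0]
print(euclidean(p, q), manhattan(p, q), cosine_distance(p, q))
```

Note how the two toy vectors are a full unit apart under the first two measures yet identical under cosine distance; differences like that are the kind of thing worth discussing when you justify your choice.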

The last four algorithms are dimensionality reduction algorithms:

You are to run a number of experiments. Come up with at least two datasets. If you'd like (and it makes a lot of sense in this case) you can use the ones you used in the first assignment.

  1. Run the clustering algorithms on the datasets and describe what you see.
  2. Apply the dimensionality reduction algorithms to the two datasets and describe what you see.
  3. Reproduce your clustering experiments, but on the data after you've run dimensionality reduction on it.
  4. Apply the dimensionality reduction algorithms to one of your datasets from assignment #1 (if you've reused the datasets from assignment #1 to do experiments 1-3 above then you've already done this) and rerun your neural network learner on the newly projected data.
  5. Apply the clustering algorithms to the same dataset to which you just applied the dimensionality reduction algorithms (you've probably already done this), treating the clusters as if they were new features. In other words, treat the clustering algorithms as if they were dimensionality reduction algorithms. Again, rerun your neural network learner on the newly projected data.
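The five experiments above can be sketched as a single pipeline. This is only an outline under stated assumptions: k-means and a Gaussian random projection are stand-ins chosen for illustration (they are not necessarily the algorithms you are required to use), the two-blob data is synthetic, and every function name here is hypothetical. In practice you would likely use a library implementation rather than this standard-library version.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm (stand-in clusterer); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

def random_projection(points, n_components, seed=0):
    """Project points onto n_components random Gaussian directions."""
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / n_components ** 0.5
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)]
              for _ in range(n_components)]
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix] for p in points]

def one_hot(labels, k):
    """Turn cluster labels into k indicator features per point."""
    return [[1.0 if lab == j else 0.0 for j in range(k)] for lab in labels]

# Synthetic data (assumption): two well-separated blobs in 5 dimensions.
rng = random.Random(42)
data = [[rng.gauss(center, 0.5) for _ in range(5)]
        for center in (0.0, 5.0) for _ in range(20)]

_, raw_labels = kmeans(data, k=2)                  # experiment 1
reduced = random_projection(data, n_components=2)  # experiment 2
_, reduced_labels = kmeans(reduced, k=2)           # experiment 3
nn_inputs_reduced = reduced                        # experiment 4: input to your network
nn_inputs_clusters = one_hot(raw_labels, k=2)      # experiment 5: clusters as features
```

The last two variables are what you would hand to your neural network learner from assignment #1 in place of the original features; the training and evaluation code itself is omitted because it depends on your earlier work.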

What to Turn In

You must submit:

  1. A file named README.txt that contains instructions for running your code.
  2. Your code (provide a link only, in README.txt).
  3. A file named yourgtaccount-analysis.pdf that contains your writeup.

The file yourgtaccount-analysis.pdf should contain: 

It might be difficult to generate the same kinds of graphs for this part of the assignment as you did before; however, you should come up with some way to describe the kinds of clusters you get. If you can do that visually, all the better.
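One non-visual way to describe clusters is a small summary table. The sketch below assumes you already have a list of points and one cluster label per point; the function name, the toy data, and the choice of statistics (size, centroid, within-cluster sum of squares) are illustrative assumptions rather than required outputs.

```python
from collections import Counter

def describe_clusters(points, labels):
    """Return per-cluster size, centroid, and within-cluster sum of squares."""
    summary = {}
    for label, size in sorted(Counter(labels).items()):
        members = [p for p, lab in zip(points, labels) if lab == label]
        # Centroid: coordinate-wise mean of the cluster's members.
        centroid = [sum(col) / size for col in zip(*members)]
        # Within-cluster sum of squared distances to the centroid (tightness).
        wcss = sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members)
        summary[label] = {"size": size, "centroid": centroid, "wcss": wcss}
    return summary

# Toy input (hypothetical): three 2-D points with labels from some clusterer.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
labels = [0, 0, 1]
for label, stats in describe_clusters(points, labels).items():
    print(label, stats["size"], stats["centroid"], round(stats["wcss"], 3))
```

If your datasets have class labels, comparing these summaries against the label distribution within each cluster is another option worth considering.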

Note: The analysis writeup is limited to 10 pages total.

Grading Criteria

At this point, you are not surprised to read that you are being graded on your analysis more than anything else. I refer you to the corresponding section of assignment #1 for a more detailed explanation. As always, start now.