Data volumes have grown at an accelerating pace over the last decade, increasing the need for efficient and effective management and analysis of ever-larger data sets. Data sets have become so large that traditional data management tools and processing models impose severe limitations on what can be done with the data.
This course introduces principles of big data analytics that have been developed in response to the challenges of processing and analyzing big data. In essence, it is a course on data mining methods with a focus on data sets that are too large to fit into main memory.
The course is divided into several sections. The first section introduces MapReduce and the new software stack. The following sections introduce a selection of data mining methods. Students will learn the principles of these methods, study selected algorithms and implementations, and familiarize themselves with example applications.
Students will also gain hands-on experience developing, improving, and implementing selected methods in their course projects.
- MapReduce and the new software stack
- Finding similar items
- Mining data streams
- Frequent itemsets
- Recommendation systems
Grading:
- Assignments (30%)
- Term project (45%)
- Paper presentation (20%)
- Class participation (5%)