CMPUT 633 - Software Analytics


Which part of my software system should I focus my testing efforts on? Which library should I use for cryptography tasks in my application? How can I use a specific API to accomplish a given task, and are there hidden pitfalls I should be aware of? How can I detect more bugs in my code at the least cost? How can I reduce merge conflicts in my team? These, and many others, are decisions that software developers and other software stakeholders face daily. Software analytics aims to leverage many sources of software data (e.g., source code, version control systems, continuous integration logs, testing logs, and crowd-sourced documentation websites such as Stack Overflow) to support such decisions.

This course covers various tools and techniques for analyzing software data to provide actionable insights related to bug prediction, API usage, software evolution, software security, and collaborative software development. This is a seminar-based course in which students read, present, and discuss seminal and state-of-the-art papers, and implement a project related to the above topics. During the course, students are exposed to source code analysis, natural language processing (NLP), machine learning (ML), and qualitative analysis techniques as they are used in software engineering research.

Objectives

  • Understand various types of software engineering (SE) data sources and how to mine and analyze them
  • Get exposed to practical challenges faced by various stakeholders in the software development cycle
  • Learn how to analyze source code
  • Learn various NLP and ML techniques that can be used to analyze software data
  • Become familiar with how to design quantitative and qualitative empirical SE studies
  • Learn how to critique and write research papers

Course Work

  • Assignments
  • Projects
  • Presentations
  • Participation
  • Paper Reviews

Related Research Areas

  • Software Engineering
  • Software Systems