Course Description

Data has become one of the most critical resources in today’s world. This course provides a first introduction to the field of data science using applications and case studies from various domains (e.g., social media, marketing, sociology, engineering, digital humanities). The course will introduce data-centric thinking, including a discussion of how data is acquired, managed, manipulated, visualized, and used to support problem-solving. The necessary fundamental practical skills will be taught in class, and each step will be illustrated with small examples. Tools presented in this course include SQL and Excel, along with other state-of-the-art tools.

Prerequisites: None

Outline

Note: Readings must be completed at home before class. The reading material will be covered on the exam.

1. Introduction

2. Tables (Spreadsheets)

3. Visualization and Charts

4. Tables (Relational Databases)

5. Descriptive Analytics: Data, Distributions and Correlation

6. Predictive Analytics: Data Mining/Machine Learning

Textbooks [not required]

  • Data Science and RapidMiner: Vijay Kotu, Data Science: Concepts and Practice, 2nd Edition, December 2018, Morgan Kaufmann.
  • Visualization: Edward R. Tufte, The Visual Display of Quantitative Information, 2nd Edition, May 2001, ISBN: 1930824130, Graphics Press.
  • SQL: Learn SQLite, tutorialspoint.

Software