OREM 3309 - Information Engineering
All course material is provided under the CC Attribution Share Alike license.
Course Description
This course emphasizes working with data and databases, and performing and interpreting descriptive analytics, in contemporary, data-rich decision-making environments, including various engineering and management applications. The course introduces databases (database management systems and SQL) and discusses the functions of practical (e.g., corporate or public) data warehouses, with an emphasis on formal methods of descriptive analytics, including data preparation, visualization, and interpretation.
Prerequisites: CS 1341, OREM 3340/CS/STAT 4340
Details and the syllabus can be found on Canvas.
Course Outline
Introduction
- Read Syllabus
- Slides: Course introduction
- Additional Material: Competing on Analytics
Spreadsheets
- Purpose: Spreadsheets (tables) are the most basic form of organizing data. We will learn about importing data, cleaning, formatting, functions, and pivot tables.
- Tools: Excel (install the latest version from the SMU Office 365 download page, using the "Install Office" link)
- Reading:
- If you need to brush up on basic Excel skills, then go through the reference material below.
- Things to know about Excel Tables (optional: Using structured references with Excel tables)
- Things to know about Pivot tables in Excel
- Case Study: MLB player heights, weights, and ages.
- Reference Material:
- Excel Tutorial (Basics, Editing Worksheets, Formatting Cells, and Working with Formulas. You can also watch the video tutorials)
- Excel for Data Analysis Tutorial (Read: Importing data, cleaning data with text function, tables, conditional formatting, sorting, and filtering)
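To make the idea of a pivot table from the Purpose item above concrete, here is a minimal sketch of what a pivot table computes: values are grouped by a row label and a column label, and each group is summarized (here, by its average). The rows, team names, and column names below are made up for illustration and are not the MLB case study data. Excel does this interactively; the code only mirrors the logic.

```python
from collections import defaultdict

# Hypothetical sample rows (NOT the class MLB data set).
rows = [
    {"team": "A", "position": "Pitcher", "height_in": 74},
    {"team": "A", "position": "Catcher", "height_in": 72},
    {"team": "B", "position": "Pitcher", "height_in": 76},
    {"team": "B", "position": "Pitcher", "height_in": 78},
]

# Step 1: collect the values that fall into each (row label, column label) cell.
cells = defaultdict(list)
for r in rows:
    cells[(r["team"], r["position"])].append(r["height_in"])

# Step 2: summarize each cell, here with the average height.
pivot = {key: sum(values) / len(values) for key, values in cells.items()}

# Step 3: print the result as a small grid; empty cells are shown as "-".
teams = sorted({team for team, _ in pivot})
positions = sorted({position for _, position in pivot})
print("team", *positions, sep="\t")
for team in teams:
    line = [pivot.get((team, position), "-") for position in positions]
    print(team, *line, sep="\t")
```

Team B has no catcher in the sample rows, so that cell stays empty, just as an Excel pivot table leaves a blank where no rows match.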
SQL
- Purpose: Relational databases are the most widely used way to store, manage, and manipulate data. We will talk about database design, importing data, simple SQL queries, joins, and aggregation.
- Tools: DB Browser for SQLite (install the latest version)
- Reading Material: SQLite Tutorial
- Slides:
- Examples from class: Product Database (SQLite; save and open with DB Browser), Example Queries, Cross Products and Joins
- Exercises to check your SQL knowledge: SQL Exercises, Practice, Solution
- Additional Material:
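As a small illustration of the topics named in the SQL purpose statement (simple queries, joins, and aggregation), the following sketch runs three queries against an in-memory SQLite database using Python's built-in sqlite3 module. The tables `product` and `sale` and their contents are hypothetical and are not the Product Database used in class; the same SQL statements can be pasted into DB Browser for SQLite.

```python
import sqlite3

# In-memory database with two hypothetical tables (not the class database).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE sale (
    id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES product(id),
    quantity INTEGER
);
INSERT INTO product VALUES (1, 'pen', 1.5), (2, 'notebook', 4.0), (3, 'stapler', 9.0);
INSERT INTO sale VALUES (1, 1, 10), (2, 1, 5), (3, 2, 2);
""")

# Simple query: filter rows and sort the result.
cheap = con.execute(
    "SELECT name FROM product WHERE price < 5 ORDER BY name"
).fetchall()

# Inner join plus aggregation: revenue per product that has at least one sale.
revenue = con.execute("""
    SELECT p.name, SUM(s.quantity * p.price) AS revenue
    FROM product AS p
    JOIN sale AS s ON s.product_id = p.id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

# Left join: keep products without sales; COUNT(s.id) ignores the NULLs.
sale_counts = con.execute("""
    SELECT p.name, COUNT(s.id) AS n_sales
    FROM product AS p
    LEFT JOIN sale AS s ON s.product_id = p.id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(cheap)
print(revenue)
print(sale_counts)
```

The inner join drops the stapler because it has no matching sale, while the left join keeps it with a count of zero. This difference is a common source of surprises when aggregating.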
The Entity-Relationship Model
- Purpose: Understanding how data is organized in a relational model is important to decide on how to store data and to work with provided data.
- Reading Material: ER Model Tutorial
- Slides:
- Additional Material: Chen’s Notation, Crow’s Foot Relationship Symbols Cheat Sheet
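One way to see how an ER diagram turns into stored data is to map it to tables. The sketch below assumes a hypothetical many-to-many relationship "student enrolls in course" (these entities are illustrative and do not come from the course slides). Each entity becomes a table, and the relationship becomes a third table whose foreign keys point at both entities.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on per connection.
con.execute("PRAGMA foreign_keys = ON")

con.executescript("""
-- Entities become tables; the key attribute becomes the primary key.
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- The many-to-many relationship becomes its own table with two foreign keys.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);

INSERT INTO student VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO course VALUES (10, 'Databases');
INSERT INTO enrollment VALUES (1, 10), (2, 10);
""")

# Follow the relationship: which students are enrolled in course 10?
enrolled = con.execute("""
    SELECT s.name
    FROM student AS s
    JOIN enrollment AS e ON e.student_id = s.student_id
    WHERE e.course_id = 10
    ORDER BY s.name
""").fetchall()

# An enrollment for a student that does not exist violates the foreign key.
try:
    con.execute("INSERT INTO enrollment VALUES (3, 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(enrolled, rejected)
```

The composite primary key on `enrollment` prevents enrolling the same student in the same course twice, which is one reasonable design choice among several.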
Data Handling with R
- Purpose: Learn to import, clean and explore data for visualization and decision making.
- Tools: R and RStudio.
- Slides: Introduction to Analytics
- Reading Material: An Introduction to R and R for Data Science by Hadley Wickham and Garrett Grolemund
Introduction
- Learning Goals
- What is R?
- RStudio and a first R session
- R Basics including vectors and subsetting
- Slides: 1. Introduction to R
- Learning Material
- Reading for this session: An Introduction to R (Chapters 1 and 2)
- R Manuals, Packages and Task Views (to find packages) can be found on The Comprehensive R Archive Network (CRAN).
- Finding solutions: Google or go to stackoverflow.com and use the tag “[R]” in your search.
- Cheat sheets: RStudio IDE (cheat sheet from RStudio Cheat Sheets)
Programming Basics
- Learning Goals
- Objects in R
- Importing and exporting data
- Functions, loops, and apply
- Slides
- Reading for this session: An Introduction to R (Chapters 3-7 and 9-10)
- Cheat sheet: Base R
Tidyverse and ggplot2
- Learning Goals:
- Understand how to handle data with tidyverse.
- Understand how ggplot2 graphs are created.
- Intro:
- Example:
- Free textbooks:
- R for Data Science by Grolemund and Wickham.
- Data Visualization: A practical introduction by Kieran Healy.
- Cheat Sheets:
Exploring Data and Reporting
- Learning Goals
- Creating reports
- Creating dashboards
- Exercises
- Create a report for the MLB data with RMarkdown. Here is an example: MLB.html (Markdown file: MLB2.Rmd)
- Create a dashboard for the MLB data with flexdashboard. Here is an example: MLB_dashboard.html (Markdown file: MLB_dashboard2.Rmd)
- Learning Material
Modeling in R
- Learning Goals:
- What are predictive modeling and machine learning?
- Training and prediction in R
- Using the package caret
- Slides: 5. R for Data Science
- Learning Material
- Free textbook: R for Data Science
- Tutorial: Create predictive models in R with Caret
- Documentation: The caret Package (available models)
- StatQuest Videos: Machine Learning (watch: A Gentle Introduction to Machine Learning)
Textbooks
The textbooks are not required.
- A First Course in Database Systems (3rd Edition), Jeffrey D. Ullman, Jennifer Widom, Pearson (2007). ISBN-10: 013600637X [Companion page]
- R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, by Hadley Wickham and Garrett Grolemund, O’Reilly Media; 1st edition (January 17, 2017), ISBN-10: 1491910399. Free Web book
Code and Data Used in Class
Learning Resources
- SQLite page (with documentation and tutorials)
- SQLite Tutorial (Tutorialspoint)
- Using SQLite in Python
- Using SQLite in R
- SQL Tutorial (Tutorialspoint)
- Database Systems: The Complete Book by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom. Companion web site.
- 7 Steps to Mastering SQL for Data Science (KDnuggets)
- R for Data Science by Hadley Wickham and Garrett Grolemund