This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.

Loading the MLB Dataset

Reading the data from the Web.

mlb <- read.csv("https://michael.hahsler.net/SMU/DS_Workshop_Intro_R/examples/MLB_cleaned.csv")

Inspect the first few data rows.

Raw R output:

head(mlb)
##   First.Name Last.Name Team      Position Height.inches. Weight.pounds.   Age
## 1       Jeff    Mathis  ANA       Catcher             72            180 23.92
## 2       Mike    Napoli  ANA       Catcher             72            205 25.33
## 3       Jose    Molina  ANA       Catcher             74            220 31.74
## 4      Howie  Kendrick  ANA First Baseman             70            180 23.64
## 5     Kendry   Morales  ANA First Baseman             73            220 23.70
## 6      Casey  Kotchman  ANA First Baseman             75            210 24.02

A fancy option:

knitr::kable(head(mlb))
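| First.Name | Last.Name | Team | Position      | Height.inches. | Weight.pounds. |   Age |
|:-----------|:----------|:-----|:--------------|---------------:|---------------:|------:|
| Jeff       | Mathis    | ANA  | Catcher       |             72 |            180 | 23.92 |
| Mike       | Napoli    | ANA  | Catcher       |             72 |            205 | 25.33 |
| Jose       | Molina    | ANA  | Catcher       |             74 |            220 | 31.74 |
| Howie      | Kendrick  | ANA  | First Baseman |             70 |            180 | 23.64 |
| Kendry     | Morales   | ANA  | First Baseman |             73 |            220 | 23.70 |
| Casey      | Kotchman  | ANA  | First Baseman |             75 |            210 | 24.02 |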

An interactive option (HTML output only):

DT::datatable(mlb)

Data summary

The dataset contains 1034 players. Here is a summary:

knitr::kable(summary(mlb))
| First.Name   | Last.Name    | Team         | Position              | Height.inches. | Weight.pounds. | Age            |
|:-------------|:-------------|:-------------|:----------------------|:---------------|:---------------|:---------------|
| Jason: 27    | Johnson: 9   | NYM: 38      | Relief Pitcher: 315   | Min.: 67.0     | Min.: 150.0    | Min.: 20.90    |
| Chris: 26    | Perez: 7     | ATL: 37      | Starting Pitcher: 221 | 1st Qu.: 72.0  | 1st Qu.: 187.0 | 1st Qu.: 25.44 |
| Mike: 26     | Gonzalez: 6  | DET: 37      | Outfielder: 194       | Median: 74.0   | Median: 200.0  | Median: 27.93  |
| Scott: 24    | Hernandez: 6 | OAK: 37      | Catcher: 76           | Mean: 73.7     | Mean: 201.7    | Mean: 28.74    |
| Ryan: 23     | Jones: 6     | BOS: 36      | Second Baseman: 58    | 3rd Qu.: 75.0  | 3rd Qu.: 215.0 | 3rd Qu.: 31.23 |
| Matt: 19     | Ramirez: 6   | CHC: 36      | First Baseman: 55     | Max.: 83.0     | Max.: 290.0    | Max.: 48.52    |
| (Other): 889 | (Other): 994 | (Other): 813 | (Other): 115          | NA             | NA             | NA             |
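The per-value counts for the text columns in this summary appear only when those columns are factors. Since R 4.0.0, read.csv() keeps text columns as character by default, and summary() then reports just their length and class. The sketch below shows one way to get the counts back by converting character columns to factors. To run without a network connection it uses mlb_head, a hypothetical six-row stand-in copied from the head(mlb) output above; with the full dataset loaded, the same conversion could be applied to mlb.

```r
# Six-row stand-in copied from the head(mlb) output shown earlier.
# The name mlb_head is an assumption made for this sketch.
mlb_head <- data.frame(
  First.Name     = c("Jeff", "Mike", "Jose", "Howie", "Kendry", "Casey"),
  Last.Name      = c("Mathis", "Napoli", "Molina", "Kendrick", "Morales", "Kotchman"),
  Team           = rep("ANA", 6),
  Position       = c(rep("Catcher", 3), rep("First Baseman", 3)),
  Height.inches. = c(72, 72, 74, 70, 73, 75),
  Weight.pounds. = c(180, 205, 220, 180, 220, 210),
  Age            = c(23.92, 25.33, 31.74, 23.64, 23.70, 24.02),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each one to a factor,
# so that summary() counts the levels instead of reporting only the class.
is_text <- vapply(mlb_head, is.character, logical(1))
mlb_head[is_text] <- lapply(mlb_head[is_text], factor)

# Number of rows (players) and the column-wise summary.
nrow(mlb_head)
summary(mlb_head)
```

An alternative, assuming the data is read fresh, is to pass stringsAsFactors = TRUE to read.csv() so the conversion happens at load time.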

The height distribution

Base R:

hist(mlb$Height.inches.)

Use plotly (HTML output only):

plotly::plot_ly(mlb, x = ~Height.inches., type = "histogram")

Exercises - Analyze an MLB Team

The goal of this exercise is to create a report that analyzes an MLB team of your choice, including basic information, tables, and graphs.

  1. Change the title and author.
  2. Clean up the document and only keep the tables/figures you like.
  3. State which team you are analyzing.
  4. Do an in-depth analysis of one team of your choice, including tables and charts. Answer questions like:
    • How many players does the team have in total?
    • Who is the youngest player?
    • How many players does the team have for each position?
    • What is the age range of the players?
    • Is there a difference in height and weight between positions?
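As a starting point, the questions above can be approached with a few base R calls. The following is only a sketch under stated assumptions: mlb_head is a hypothetical six-row stand-in copied from the head(mlb) output earlier in this document, so the values it produces describe just those six rows and not the full ANA roster. For the actual report, replace mlb_head with mlb and "ANA" with the team you picked.

```r
# Six-row stand-in copied from the head(mlb) output; not the full team roster.
# The name mlb_head is an assumption made for this sketch.
mlb_head <- data.frame(
  First.Name     = c("Jeff", "Mike", "Jose", "Howie", "Kendry", "Casey"),
  Last.Name      = c("Mathis", "Napoli", "Molina", "Kendrick", "Morales", "Kotchman"),
  Team           = rep("ANA", 6),
  Position       = c(rep("Catcher", 3), rep("First Baseman", 3)),
  Height.inches. = c(72, 72, 74, 70, 73, 75),
  Weight.pounds. = c(180, 205, 220, 180, 220, 210),
  Age            = c(23.92, 25.33, 31.74, 23.64, 23.70, 24.02)
)

# Keep only the rows of the chosen team.
team <- subset(mlb_head, Team == "ANA")

# How many players does the team have in total?
n_players <- nrow(team)

# Who is the youngest player? which.min() returns the row with the lowest age.
youngest <- team[which.min(team$Age), c("First.Name", "Last.Name", "Age")]

# How many players does the team have for each position?
position_counts <- table(team$Position)

# What is the age range of the players? range() returns c(min, max).
age_range <- range(team$Age)

# Is there a difference in height and weight between positions?
# Compare the mean height and weight per position.
by_position <- aggregate(
  cbind(Height.inches., Weight.pounds.) ~ Position,
  data = team,
  FUN = mean
)

# Print the results; in a report, knitr::kable() could format the tables.
n_players
youngest
position_counts
age_range
by_position
```

Comparing means is only a first look at the last question; a boxplot per position, for example boxplot(Height.inches. ~ Position, data = team), would also show the spread within each group.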