The following is the data validation report that the seller chose to provide:
Data Description
Content
This dataset currently contains data from the start of the 2016 season through the end of the 2021 postseason. I plan to update it roughly weekly.
pitches.csv is one of the main files. It contains information about each pitch as found on ESPN. I've noticed that some games are missing some at-bats, the most recent of these being from 2019, I believe. games.csv and events.csv are two other files of note.
Some of the other files contain information that can be gleaned from the main files. Over time I may try to cut down on this to reduce the number of files. I may also try to reduce the overall size of the dataset by changing several fields to use IDs instead of strings (just a heads-up).
Files
games.csv - general game info
hittersByGame.csv - how each hitter did in each game
pitchersByGame.csv - how each pitcher did in each game
plays.csv - batter events - batter singled, batter struck out, etc
events.csv - general events - each row has an event id (per game) for joining with the next file
pitches.csv - one row per pitch per game
inningScore.csv - score per inning
inningHighlights.csv - # of runs, hits, and errors per inning
hittingNotes.csv
pitchingNotes.csv
baserunningNotes.csv
fieldingNotes.csv
letterNotes.csv - for notes attached to batters (and maybe pitchers)
awards directory -- one file per award
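The note on events.csv says its event id is per game, which suggests that any join to the pitch rows needs the game identifier as part of the key. Below is a minimal sketch of that idea using only the Python standard library. The column names (game_id, event_id, description, pitch_number, result) and the inline sample rows are hypothetical, made up for illustration; check the real CSV headers before adapting it.

```python
import csv
import io

# Tiny inline stand-ins for events.csv and pitches.csv.
# All column names and values here are hypothetical, not the real schema.
EVENTS_CSV = """game_id,event_id,description
1001,1,Batter singled
1001,2,Batter struck out
1002,1,Batter walked
"""

PITCHES_CSV = """game_id,event_id,pitch_number,result
1001,1,1,Ball
1001,1,2,In play
1001,2,1,Strike
1002,1,1,Ball
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_pitches_to_events(events, pitches):
    """Attach each pitch to its event using the (game_id, event_id) pair."""
    # Event ids repeat across games, so event_id alone is not a unique key.
    by_key = {(e["game_id"], e["event_id"]): e for e in events}
    joined = []
    for p in pitches:
        event = by_key.get((p["game_id"], p["event_id"]))
        if event is None:
            continue  # pitch with no matching event row; skip it
        joined.append({**p, "description": event["description"]})
    return joined


joined = join_pitches_to_events(read_rows(EVENTS_CSV), read_rows(PITCHES_CSV))
for row in joined:
    print(row["game_id"], row["event_id"], row["pitch_number"], row["description"])
```

With the real files, the two inline strings would be replaced by opening events.csv and pitches.csv; pitches without a matching event are skipped here, which may or may not be the behavior you want given the missing at-bats mentioned above.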
Links
- Scraped from https://www.espn.com/mlb/schedule -- Code can be found here
- Image Link
Inspiration
- Which pitchers are best at getting out of a 3-2 count?
- Which hitters are most likely to score a runner from third when there is one?
- What is the highest number of pitches each team has thrown in one inning over the last few years?
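As a rough starting point for the last question, the sketch below counts pitches per team per inning and keeps each team's maximum. It relies on pitches.csv having one row per pitch; the column names (game_id, inning, pitching_team) and the sample rows are hypothetical, so treat this as an outline under those assumptions rather than code that matches the real schema.

```python
import csv
import io
from collections import Counter

# Inline stand-in for pitches.csv: one row per pitch.
# Column names and values are hypothetical, for illustration only.
PITCHES_CSV = """game_id,inning,pitching_team,pitch_number
1001,1,NYY,1
1001,1,NYY,2
1001,1,NYY,3
1001,2,NYY,1
1001,1,BOS,1
1001,1,BOS,2
1002,1,BOS,1
1002,1,BOS,2
1002,1,BOS,3
1002,1,BOS,4
"""


def max_pitches_per_inning(rows):
    """Return {team: most pitches that team threw in any single inning}."""
    # Count rows (pitches) for every (team, game, inning) combination.
    counts = Counter(
        (r["pitching_team"], r["game_id"], r["inning"]) for r in rows
    )
    best = {}
    for (team, _game, _inning), n in counts.items():
        best[team] = max(best.get(team, 0), n)
    return best


rows = list(csv.DictReader(io.StringIO(PITCHES_CSV)))
result = max_pitches_per_inning(rows)
print(result)
```

If the real file does not record the pitching team on each pitch row, that value would first have to be brought in from another file such as games.csv.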
