The following is the data validation report that the seller chose to provide:
Data description
Airline Flight Delay and Cancellation Data, January 2019 - August 2023
US Department of Transportation, Bureau of Transportation Statistics https://www.transtats.bts.gov
Purpose
This dataset was collected so that it could be retrieved programmatically via an API into an AWS EC2 instance and analyzed as structured data with Spark SQL. It comes in several versions, ranging from the complete set of 29 million rows to a reduced set of 3 million rows. The versions were refined iteratively to make the most of AWS's distributed computing capabilities for efficient querying and data manipulation. Best performance was achieved on i3, m5, and c5 instances with upgraded storage.
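As a rough illustration of the kind of structured query this dataset is meant for, the sketch below aggregates delays and cancellations by origin airport. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL so that it runs anywhere. The table name `flights`, the column names (`FL_DATE`, `ORIGIN`, `DEST`, `DEP_DELAY`, `CANCELLED`), and the sample rows are assumptions made for illustration, not values taken from the dataset.

```python
import sqlite3

# In-memory database standing in for a Spark SQL table (assumption:
# the real analysis ran on Spark, not SQLite).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights ("
    "FL_DATE TEXT, ORIGIN TEXT, DEST TEXT, DEP_DELAY REAL, CANCELLED REAL)"
)

# Made-up rows; a cancelled flight has no departure delay (NULL).
rows = [
    ("2019-01-01", "JFK", "LAX", 10.0, 0.0),
    ("2019-01-02", "JFK", "SFO", 20.0, 0.0),
    ("2019-01-02", "ORD", "DEN", None, 1.0),
    ("2019-01-03", "ORD", "LAX", 5.0, 0.0),
]
conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?, ?)", rows)

# Plain SQL: flights, mean departure delay, and cancellations per origin.
# AVG skips NULL delays, so cancelled flights do not distort the mean.
QUERY = """
SELECT ORIGIN,
       COUNT(*)       AS flights,
       AVG(DEP_DELAY) AS avg_dep_delay,
       SUM(CANCELLED) AS cancellations
FROM flights
GROUP BY ORIGIN
ORDER BY ORIGIN
"""
summary = conn.execute(QUERY).fetchall()
for origin, flights, avg_delay, cancellations in summary:
    print(origin, flights, avg_delay, cancellations)
```

On a Spark cluster, the same query text could presumably be passed to `spark.sql` once the data has been registered as a temporary view; that step is not shown here.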
Context
Source data on flight delays and cancellations for January 2019 – August 2023 was retrieved from DOT's On-Time: Reporting Carrier On-Time Performance (1987-present).
Variables include flight routes (origin, destination), time ranges for events (minutes, local time), and delay and cancellation reasons/attributions (limited).
Acknowledgements
The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights is published in DOT's monthly Air Travel Consumer Report.
Retrieved in November 2023 using the application at On-Time: Reporting Carrier On-Time Performance (1987-present).
The source data was downloaded in monthly subsets and joined by year. The most recent data available for 2023 is from August. Data consolidation, transformation, wrangling, variable selection, and label updates were done in csvkit, Python, and Excel.
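The month-to-year join described above could look something like the following minimal sketch, which uses only Python's standard csv module. The function name `join_monthly_csvs`, the column names, and the sample rows are hypothetical; the actual consolidation used a mix of csvkit, Python, and Excel whose exact steps are not documented here.

```python
import csv
import io


def join_monthly_csvs(monthly_files, out_file):
    """Concatenate monthly CSV extracts, writing the header only once.

    Returns the number of data rows written. Raises ValueError if a
    monthly file's header differs from the first file's header.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    header = None
    rows_written = 0
    for f in monthly_files:
        reader = csv.reader(f)
        file_header = next(reader, None)
        if file_header is None:
            continue  # skip empty files
        if header is None:
            header = file_header
            writer.writerow(header)
        elif file_header != header:
            raise ValueError(f"header mismatch: {file_header} != {header}")
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written


# Tiny in-memory stand-ins for two monthly downloads (made-up rows).
jan = io.StringIO("FL_DATE,ORIGIN,DEST,DEP_DELAY\n2019-01-01,JFK,LAX,5\n")
feb = io.StringIO(
    "FL_DATE,ORIGIN,DEST,DEP_DELAY\n"
    "2019-02-01,ORD,SFO,-3\n"
    "2019-02-02,ATL,DEN,12\n"
)
year_2019 = io.StringIO()
n_rows = join_monthly_csvs([jan, feb], year_2019)
print(n_rows)
```

With real downloads, the in-memory buffers would be replaced by file objects opened with `newline=""`, as the csv module documentation recommends.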
Attribution
This dataset is similar to, adopts header names from, and can be merged with Airline Delay and Cancellation Data, 2009 - 2018.
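Because the header names are shared, a merge with the earlier dataset can be sketched as a row-wise union restricted to the columns both files have in common. The example below is one possible approach under stated assumptions, not the documented procedure: the function name `merge_on_shared_headers`, all column names (including `OLD_ONLY_COL` and `AIRLINE`), and the sample rows are hypothetical.

```python
import csv
import io


def merge_on_shared_headers(older, newer, out_file):
    """Stack two CSV files, keeping only the columns present in both.

    Column order follows the older file. Returns the list of shared
    columns and the number of data rows written.
    """
    older_reader = csv.DictReader(older)
    newer_reader = csv.DictReader(newer)
    newer_cols = set(newer_reader.fieldnames or [])
    shared = [c for c in (older_reader.fieldnames or []) if c in newer_cols]

    # extrasaction="ignore" drops columns that are not in `shared`.
    writer = csv.DictWriter(
        out_file, fieldnames=shared, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    count = 0
    for reader in (older_reader, newer_reader):
        for row in reader:
            writer.writerow(row)
            count += 1
    return shared, count


# Made-up one-row samples standing in for the two datasets.
older = io.StringIO(
    "FL_DATE,ORIGIN,DEST,CANCELLED,OLD_ONLY_COL\n2018-12-31,JFK,LAX,0.0,x\n"
)
newer = io.StringIO(
    "FL_DATE,AIRLINE,ORIGIN,DEST,CANCELLED\n2019-01-01,Example Air,ORD,SFO,1.0\n"
)
merged = io.StringIO()
shared_cols, n_merged = merge_on_shared_headers(older, newer, merged)
print(shared_cols, n_merged)
```

Dropping non-shared columns is a deliberate simplification; an alternative would be to keep every column and leave missing values blank.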
