The following is the data validation report that the seller chose to provide:
Data Description
Context
This database is managed by the US Environmental Protection Agency (EPA) and contains information reported annually by certain industry groups as well as federal facilities. Each year, companies across a wide range of industries (including the chemical, mining, paper, and oil and gas industries) that produce more than 25,000 pounds or handle more than 10,000 pounds of a listed toxic chemical must report it to the Toxics Release Inventory (TRI). The TRI threshold was initially set at 75,000 pounds annually. If a company treats, recycles, disposes of, or releases more than 500 pounds of that chemical into the environment (as opposed to just handling it), it must provide a detailed inventory of that chemical.
Content
- There are roughly 100 columns in this dataset; please see tri_basic_data_file_format_v15.pdf for details. You may also wish to consult factors_to_consider_6.15.15_final.pdf for general background about interpreting the data.
- I've merged all of the TRI basic data files into a single large CSV. You will probably need to process it in batches or use a tool like Dask to stay within kernel memory limits.
- Please note that the 2016 data remains preliminary at the time of this release.
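As a rough illustration of the batch-processing advice above, here is a minimal sketch that streams a CSV in fixed-size batches using only the Python standard library, so the whole file never has to sit in memory at once. The helper names `iter_batches` and `count_rows_by_column` are hypothetical, and the column name you pass in is an assumption; check tri_basic_data_file_format_v15.pdf for the actual headers.

```python
import csv
from collections import Counter
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most `batch_size` rows from any iterable of rows."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def count_rows_by_column(csv_file, column, batch_size=100_000):
    """Count rows per distinct value of `column`, one batch at a time.

    `csv_file` is an open text file (or any iterable of CSV lines) whose
    first line is the header row. `column` is a header name chosen by the
    caller; it is not taken from the dataset documentation.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for batch in iter_batches(reader, batch_size):
        # Only one batch of rows is held in memory at any moment.
        # Rows with a missing or empty value in `column` are skipped.
        counts.update(row[column] for row in batch if row.get(column))
    return dict(counts)
```

To use it, open the merged file with `open(path, newline="")` and pass the file object along with a column name. If pandas is available, `pandas.read_csv` with its `chunksize` parameter gives a similar batch-by-batch loop over DataFrames, and Dask is another option, as noted above.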
Acknowledgements
This dataset was released by the US EPA. You can find the original dataset, more detailed versions of the data, and a great deal of background information here: https://www.epa.gov/toxics-release-inventory-tri-program/tri-data-and-tools
Inspiration
The EPA runs an annual university contest. Their list of previous winners contains a lot of great ideas that people have had for this dataset in the past. The 2017 competition is already over, but you can find the rules here.
