The following is the data validation report provided by the seller:
Data Description
There are two files in this dataset. One file contains line items from 10-K and 10-Q forms filed between 2009-04-15 and 2023-09-06. The other file, "line_item_counts.csv", contains the number of times each line item occurs, along with a description of that line item.
I was originally looking for a dataset with up-to-date company information but couldn't find anything that was both current and beginner-friendly. So I decided to pull data directly from SEC EDGAR and build a tidy table from it. I haven't used it myself yet, but I figured I would share what I have so far in case anyone else is in the same position.
I'll share more about my process in the near future, but for now I hope you find this dataset useful.
I have also released a sample notebook that shows how to load the large dataset in Kaggle without exceeding its memory limits. Hopefully it helps you get started if you want to work in Kaggle. Alternatively, you can download the dataset and work with it locally in your preferred IDE, in which case what you can do is limited by the memory available on your computer, or you can use a cloud computing platform such as AWS EC2 or GCP.
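One common way to keep memory bounded is to read the CSV in chunks with pandas `read_csv(chunksize=...)` and aggregate as you go. The sketch below is not the method used in the sample notebook; it is a minimal illustration that tallies how often each line item appears, similar in spirit to "line_item_counts.csv". The column name `line_item` and the function `count_line_items` are hypothetical, so check the real file's header before using it.

```python
import io
from collections import Counter
from typing import IO, Union

import pandas as pd


def count_line_items(
    source: Union[str, IO[str]],
    column: str = "line_item",  # hypothetical column name; check the real header
    chunksize: int = 100_000,
) -> Counter:
    """Count how often each value in `column` appears, reading the CSV in chunks.

    Only one chunk (plus the running Counter) is held in memory at a time,
    so peak memory grows with `chunksize` rather than with the file size.
    """
    counts: Counter = Counter()
    # usecols restricts parsing to the single column we need, saving memory.
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        # value_counts() drops missing values by default; convert to plain ints.
        per_chunk = chunk[column].value_counts()
        counts.update({item: int(n) for item, n in per_chunk.items()})
    return counts


# Small in-memory demo standing in for the real (large) file.
sample = io.StringIO("line_item,value\nRevenues,1\nAssets,2\nRevenues,3\n")
demo_counts = count_line_items(sample, chunksize=2)
print(demo_counts.most_common())
```

To run it on the real data, pass the path to the large CSV instead of the in-memory sample, and lower `chunksize` if you still run close to the memory limit.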
