The following is the data validation report that the seller chose to provide:
Data description
Context
I created this dataset to make it easier to analyse how respondents' answers have progressed from year to year in the well-known Kaggle Data Science Survey.
The sources of the present data are:
- 2017: https://www.kaggle.com/kaggle/kaggle-survey-2017
- 2018: https://www.kaggle.com/kaggle/kaggle-survey-2018
- 2019: https://www.kaggle.com/c/kaggle-survey-2019/data
- 2020: https://www.kaggle.com/c/kaggle-survey-2020/data
- 2021: https://www.kaggle.com/c/kaggle-survey-2021/data
Methodology
This dataset was created by manually aggregating each of the 5 tables mentioned above. The full methodology was as follows:
- The 2021 table was taken as the reference, as it is the latest and most up to date with regard to the questions and the overall evolution of the Data Science industry.
- Each earlier year, in descending order, was analysed in full, one by one, to find all questions (and answers) that were the same as those found in 2021.
- As we go back in time, fewer and fewer of the 2021 questions have a match, so I would highly suggest analysing percentages within each Year rather than absolute numbers.
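The suggestion to compare percentages within each year can be sketched as follows. This is a minimal illustration, not the author's code: the column names `Year` and `answer` and the toy rows are assumptions invented for the example, and the real column names in the CSV may differ.

```python
from collections import Counter, defaultdict


def share_by_year(rows, year_key="Year", answer_key="answer"):
    """Return {year: {answer: percentage of that year's non-empty answers}}.

    `year_key` and `answer_key` are placeholder column names (assumptions).
    """
    counts = defaultdict(Counter)
    for row in rows:
        answer = row.get(answer_key)
        if answer:  # skip missing or empty answers
            counts[row[year_key]][answer] += 1
    return {
        year: {a: 100.0 * n / sum(c.values()) for a, n in c.items()}
        for year, c in counts.items()
    }


# Toy rows, invented for illustration only (not real survey data).
toy_rows = [
    {"Year": "2020", "answer": "Python"},
    {"Year": "2020", "answer": "R"},
    {"Year": "2021", "answer": "Python"},
    {"Year": "2021", "answer": "Python"},
    {"Year": "2021", "answer": "Python"},
    {"Year": "2021", "answer": "R"},
]
shares = share_by_year(toy_rows)
```

Because each year is normalised by its own number of answers, a year with fewer respondents or fewer matched questions stays comparable with 2021. With the real file, the rows could come from `csv.DictReader` opened on `kaggle_survey_2017_2021.csv`, with the actual column names substituted for the placeholders.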
The aggregation was done manually because the order of the questions, their naming and the types of answers differ from one year to another. Hence the most accurate way, although not the most efficient, was to read, order and pick the questions against the base table (the 2021 survey).
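The matching itself was done by hand, but once such a match exists it could be recorded as a per-year lookup table and applied mechanically. The sketch below is only an illustration under that assumption: the function `align_to_reference`, the mapping `toy_column_map` and every column name in it are hypothetical and do not come from the actual surveys.

```python
def align_to_reference(rows, year, column_map, reference_columns):
    """Rename one year's columns to the reference (2021) names.

    Reference columns with no match in that year are left as None,
    which is why earlier years end up less complete.
    """
    mapping = column_map[year]
    aligned = []
    for row in rows:
        out = {"Year": year}
        for ref_col in reference_columns:
            out[ref_col] = None  # default: question not asked that year
        for old_col, value in row.items():
            ref_col = mapping.get(old_col)
            if ref_col is not None:  # drop columns with no 2021 equivalent
                out[ref_col] = value
        aligned.append(out)
    return aligned


# Hypothetical names, invented for illustration only.
reference_columns = ["age", "language"]
toy_column_map = {2019: {"Age": "age"}}
toy_rows_2019 = [{"Age": "25-29", "Extra": "x"}]

aligned_2019 = align_to_reference(
    toy_rows_2019, 2019, toy_column_map, reference_columns
)
```

Here the 2019 row keeps its age answer under the reference name, loses the column that has no 2021 counterpart, and carries `None` for the question that was not matched.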
Content
This dataset contains the following:
- kaggle_survey_2017_2021.csv: the tabular dataset containing the aggregated data from 2017 to 2021.
- style.css: a file that serves as custom styling for my notebook on this competition.
- images folder: all images I have used for my notebook on this competition.
Note: Notebook can be found here.
Acknowledgements
Thank you so much to the Kaggle Team for hosting these surveys and sharing with us all the data, so we can take the pulse of the community each year.
Inspiration
The Kaggle Survey is rich in information as is, but what can you find by adding another layer of information - the year? Evolutions over time could be fascinating.
