The following is the data validation report the seller chose to provide:
Data description
This dataset is obsolete: it has been superseded by this one and will no longer be updated.
Fitted on data points available as of 3 April 2020.
Context
This dataset was created as part of the COVID-19 global forecasting challenge. It contains SIR model parameters fitted for locations worldwide.
The model is defined by the following system of ODEs:
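For reference, the classic SIR system consistent with the parameter definitions below (beta, gamma, R0 = beta/gamma) is sketched here. Normalising the infection term by the total population N, with R(0) = 0 so that N = S(0) + I(0) stays constant, is an assumption: the exact form used for the fit is not stated.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \qquad N = S + I + R, \qquad R_0 = \frac{\beta}{\gamma}.
\end{aligned}
```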
The models are fitted to Johns Hopkins University time-series data by running the Nelder-Mead simplex optimization method several times from different initial points, with RMSE as the loss, and keeping the best run.
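A minimal pure-Python sketch of this multi-start procedure is shown below for illustration only; it is not the author's code. The function names (simulate_sir, nelder_mead, fit_sir), the fixed-step RK4 integrator, the simplified Nelder-Mead variant (inside contraction only), the starting ranges and the RMSE taken jointly over the infected and removed series are all assumptions. To keep it short, only beta and gamma are fitted, with S0 and I0 held fixed and the observed series assumed to start on the emergence day.

```python
import math
import random


def simulate_sir(beta, gamma, s0, i0, days, steps_per_day=4):
    """Integrate the SIR ODEs with fixed-step RK4; return one (S, I, R) per day."""
    n = s0 + i0  # total population, assumed constant with R(0) = 0
    h = 1.0 / steps_per_day

    def deriv(s, i):
        flow = beta * s * i / n  # new infections per day
        return -flow, flow - gamma * i, gamma * i

    s, i, r = float(s0), float(i0), 0.0
    out = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = deriv(s, i)
            k2 = deriv(s + h / 2 * k1[0], i + h / 2 * k1[1])
            k3 = deriv(s + h / 2 * k2[0], i + h / 2 * k2[1])
            k4 = deriv(s + h * k3[0], i + h * k3[1])
            s += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
            i += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
            r += h / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
        out.append((s, i, r))
    return out


def nelder_mead(f, x0, step=0.05, iters=400, tol=1e-8):
    """Minimal Nelder-Mead: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if d == k else 0.0) for d, x in enumerate(x0)]
                            for k in range(n)]
    vals = [f(p) for p in simplex]
    for _ in range(iters):
        order = sorted(range(n + 1), key=lambda j: vals[j])
        simplex = [simplex[j] for j in order]
        vals = [vals[j] for j in order]
        if vals[-1] - vals[0] < tol:
            break
        centroid = [sum(p[d] for p in simplex[:-1]) / n for d in range(n)]
        worst = simplex[-1]
        xr = [c + (c - w) for c, w in zip(centroid, worst)]
        fr = f(xr)
        if fr < vals[0]:  # reflection beat the best point: try expanding further
            xe = [c + 2 * (c - w) for c, w in zip(centroid, worst)]
            fe = f(xe)
            simplex[-1], vals[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < vals[-2]:  # plain reflection is good enough
            simplex[-1], vals[-1] = xr, fr
        else:  # contract towards the centroid, else shrink towards the best point
            xc = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            fc = f(xc)
            if fc < vals[-1]:
                simplex[-1], vals[-1] = xc, fc
            else:
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                vals = [vals[0]] + [f(p) for p in simplex[1:]]
    k = min(range(n + 1), key=lambda j: vals[j])
    return simplex[k], vals[k]


def fit_sir(obs_i, obs_r, s0, i0, n_starts=5, seed=0):
    """Run Nelder-Mead from several random starts; keep the lowest-RMSE run."""
    days = len(obs_i) - 1

    def rmse(params):
        beta, gamma = params
        if beta <= 0 or gamma <= 0:
            return math.inf  # keep the search in the valid region
        traj = simulate_sir(beta, gamma, s0, i0, days)
        sq = [(i - oi) * (i - oi) + (r - o_r) * (r - o_r)
              for (_, i, r), oi, o_r in zip(traj, obs_i, obs_r)]
        err = math.sqrt(sum(sq) / (2 * len(sq)))
        return err if math.isfinite(err) else math.inf

    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        x0 = [rng.uniform(0.2, 0.8), rng.uniform(0.05, 0.3)]  # assumed start ranges
        x, fx = nelder_mead(rmse, x0)
        if best is None or fx < best[1]:
            best = (x, fx)
    return best
```

Extending the parameter vector with S0, I0 and the emergence day would bring the sketch closer to the description above; a discrete emergence day fits poorly into Nelder-Mead and might instead be scanned over a grid. `scipy.optimize.minimize(..., method="Nelder-Mead")` is a library alternative to the hand-written simplex.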
Parameters fitted (estimated) per country/province:
- the day on which the infection emerged at the location
- the initial infected count on that first day of the infection
- beta - the average number of contacts per day, sufficient to spread the disease, that each infected individual has
- gamma - the fixed fraction of the infected group that recovers on any given day
- R0 - the average number of susceptible people infected by a single infected individual; it is derived from the fitted values as beta/gamma
- initial susceptible population (labelled "init suscept pop" in the figures) - how many people at the modelled location are susceptible, taking the quarantine measures into account
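The per-location parameters listed above could be held in a small record such as the hypothetical `SIRParams` class below; its class and field names are illustrative assumptions, not the dataset's actual column names. It derives R0 as beta/gamma, the mean infectious period 1/gamma implied by a constant daily removal fraction, and the model time t as days elapsed since the fitted emergence day.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SIRParams:
    """Hypothetical record of the parameters fitted for one location."""
    emergence_day: date      # day the infection emerged at the location (t = 0)
    initial_infected: float  # infected count I on the emergence day
    beta: float              # contacts per day sufficient to spread the disease
    gamma: float             # fraction of the infected group recovering per day
    init_suscept_pop: float  # susceptible population S on the emergence day

    @property
    def r0(self) -> float:
        """Basic reproduction number, beta / gamma."""
        return self.beta / self.gamma

    @property
    def mean_infectious_days(self) -> float:
        """Mean time an individual stays infected in the SIR model, 1 / gamma."""
        return 1.0 / self.gamma

    def model_time(self, d: date) -> int:
        """Days elapsed since the emergence day, i.e. the model's t."""
        return (d - self.emergence_day).days
```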
How to read the figures
- Points are the real observed data provided by Johns Hopkins University.
- Curves are the model predictions.
- Blue: susceptible population, i.e. people who are not yet infected but can still get the infection.
- Red: infected population.
- Green: removed population (recovered or dead), i.e. people who are no longer susceptible because they have been through the infection.
Content
The dataset consists of three parts:
- Fitted SIR model parameters for locations worldwide.
- Figures that show visually how well the fitted parameters match the data points.
- CSV files with one-year-ahead predictions for each individual location.
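Each location's prediction file could be scanned with the standard `csv` module, for example to find the predicted peak of the infected curve. The column names used here (`date`, `infected`) and the function name `predicted_peak` are hypothetical; check the header of the actual files and adjust. The sample rows are toy values, not taken from the dataset.

```python
import csv
import io
from typing import Optional, TextIO, Tuple


def predicted_peak(f: TextIO, infected_col: str = "infected") -> Optional[Tuple[str, float]]:
    """Return (date, infected) for the row with the largest predicted infected count.

    Assumes hypothetical columns 'date' and `infected_col`; returns None if there are no rows.
    """
    best = None
    for row in csv.DictReader(f):
        value = float(row[infected_col])
        if best is None or value > best[1]:
            best = (row["date"], value)
    return best


# Toy in-memory example; for a real file use open(path, newline="") instead.
sample = io.StringIO(
    "date,susceptible,infected,removed\n"
    "2020-04-03,900,50,50\n"
    "2020-04-04,850,80,70\n"
    "2020-04-05,820,60,120\n"
)
print(predicted_peak(sample))  # ('2020-04-04', 80.0)
```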
Warning
Always visually check the model fit (see the per_location_figures directory) as a quality control step before using the corresponding parameter values in your analysis.
Acknowledgements
Many thanks to Kaggle for organizing the data sharing and challenges that make the world better.
Many thanks also to Johns Hopkins University for their hard work gathering COVID-19 statistics worldwide.
Inspiration
You could look for correlations between the model parameters (e.g. gamma, the patient recovery rate) and other properties of the modelled locations worldwide (e.g. weather, population density, level of medical care).
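As a starting point, the Pearson correlation coefficient between one fitted parameter (say gamma per location) and one location property can be computed in plain Python; the function name `pearson` is illustrative. With few locations, or with poorly fitted locations left in, the coefficient can be misleading, so a rank-based measure may be more robust. On Python 3.10+, `statistics.correlation` computes the same Pearson coefficient.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```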
