Down Shift

Ski jumping results database

Tags: europe, sports, categorical, exploratory data analysis, japan

10

Sold: 0
10.86MB

Data ID: D17171514971435637

Published: 2024/05/31

The following data verification report was provided by the seller:

Data description

Context

Hello. As a big ski jumping fan, I would like to invite everybody to join a project called "Ski Jumping Data Center". Its primary goal is:

Collect as much data about ski jumping as possible and derive as many useful insights from it as possible.

In mid-September last year (12.09.20) I thought, "Hmm, I don't know of any statistical analyses of ski jumping." In fact, the only easily found public data analysis of SJ that I know of is https://rstudio-pubs-static.s3.amazonaws.com/153728_02db88490f314b8db409a2ce25551b82.html

The question is: why? This discipline is in fact overloaded with data, but almost nobody has taken the topic seriously. Therefore I decided to start collecting and analyzing the data. However, capturing the various data (i.e. jumps and competition results) took so much work, and there are so many ways to use this information, that making it public was the obvious choice. I plan to expand the database as much as possible, but that requires more time and (I hope) more help.

Content

The data below was (in a broad sense) created by merging many (>6000) PDFs with the results of almost 4000 ski jumping competitions held between (roughly) 2009 and 2021. Creating this dataset cost me about 150 hours of coding and data parsing and over 4 months of hard work. My current algorithm can parse the results of subsequent events almost instantly, so the dataset can easily be extended. For details see the GitHub page: https://github.com/wrotki8778/Ski_jumping_data_center

The observations contain standard information about every jump: style points, distance, take-off speed, wind, etc. The main advantage of this dataset is the number of jumps, which is quite high (almost 250 000 rows at the time of uploading), so the data can be analyzed in various ways, even though the number of columns is modest.
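With one row per jump, per-athlete summaries take only a few lines. The sketch below is a minimal illustration, assuming a CSV export with hypothetical column names (`competition`, `name`, `dist`, `speed`, `wind`); the actual schema in the repository may differ, and the three rows are made-up values, not real results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical schema and made-up rows: the real CSV in the
# Ski_jumping_data_center repository may use different column names.
SAMPLE_CSV = """competition,name,dist,speed,wind
comp_a,Athlete A,125.5,91.2,-0.35
comp_a,Athlete B,118.0,90.8,0.12
comp_b,Athlete A,130.0,92.0,0.40
"""


def mean_distance_per_athlete(csv_text: str) -> dict[str, float]:
    """Average jump distance (metres) per athlete, given one row per jump."""
    distances: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        distances[row["name"]].append(float(row["dist"]))
    return {name: mean(values) for name, values in distances.items()}


print(mean_distance_per_athlete(SAMPLE_CSV))
# {'Athlete A': 127.75, 'Athlete B': 118.0}
```

Only the standard library is used here; for the full ~250 000-row file a dataframe library would be the more convenient choice.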

Acknowledgements

A big "thank you" goes to the creators of the tika package; without their contribution I probably would not have created this dataset at all.

Inspiration

I plan to derive at least a few insights from this data:

  1. Are the wind/gate factors well adjusted?
  2. How strong is the correlation between the distance and the style marks? Is the judging always fair?
  3. (advanced) Can we create a model that predicts the performance/distance of an athlete in a given competition? Maybe some deep learning model?
  4. Which characteristics of athletes (height, weight, etc.) matter most for achieving the best jumps?
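A natural first step for question 2 is the Pearson correlation coefficient between distance and style marks. The following is a minimal sketch, assuming the two columns are already extracted as lists of floats; the sample values are made up for illustration, and the style total follows the usual rule of summing the three middle judges' marks out of five (maximum 60).

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centred cross-products and sums of squares; the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up values: distance in metres and total style marks (max 60).
DISTANCES = [120.0, 125.0, 130.0, 135.0]
STYLE_POINTS = [51.0, 53.0, 52.5, 56.0]

r = pearson(DISTANCES, STYLE_POINTS)
print(f"r = {r:.3f}")
```

A high r only shows that longer jumps tend to receive better marks; it does not by itself say whether the judging is fair, which needs further comparisons between individual judges.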