The following is the data validation report that the seller has chosen to provide:
Data description
This dataset, MegaGeoCOV Extended, is an extended version of MegaGeoCOV. It was introduced in the paper "A Twitter narrative of the COVID-19 pandemic in Australia" (to appear in the proceedings of the 20th ISCRAM conference, Omaha, Nebraska, USA, May 2023). Please refer to the paper for more details (e.g., the keywords and hashtags used and descriptive statistics).
MegaGeoCOV Extended contains over 25.2 million geotagged, multilingual tweets specific to the COVID-19 pandemic. We also provide an English-only version containing 17.8 million tweets. The dataset was curated using Twitter's Full-archive search endpoint. A free IEEE account is sufficient to access the data files. In line with Twitter's content redistribution policy, we share tweet identifiers only; the identifiers must be hydrated to recreate the dataset locally. Hydration can be done easily with tools such as Hydrator and twarc. To support filtering the tweet identifiers before hydration, the dataset includes the following tweet fields: created_at, id, author.verified, author_id, geo.country, and source. Note that the number of tweets may vary after hydration, because deleted or private tweets cannot be retrieved.
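As an illustration of filtering the identifiers before hydration, the following is a minimal Python sketch. It assumes the data files are CSV with one column per field listed above and that author.verified is stored as the text "True" or "False"; the actual file layout may differ, so check the downloaded files first. The function name `select_tweet_ids` and the sample rows are hypothetical and are not part of the dataset.

```python
import csv
import io


def select_tweet_ids(rows, country=None, verified_only=False):
    """Return the ids of rows that match the given filters.

    `rows` is an iterable of dicts keyed by the field names listed in the
    dataset description. The CSV layout is an assumption for this sketch.
    """
    ids = []
    for row in rows:
        # Keep only tweets geotagged in the requested country, if one is given.
        if country is not None and row["geo.country"] != country:
            continue
        # Assumed encoding: author.verified holds the text "True" or "False".
        if verified_only and row["author.verified"].strip().lower() != "true":
            continue
        ids.append(row["id"])
    return ids


# Made-up sample rows for illustration only; these are not real tweet ids.
SAMPLE_CSV = """created_at,id,author.verified,author_id,geo.country,source
2020-03-01T10:00:00.000Z,1001,True,11,Australia,Twitter for iPhone
2020-03-02T11:30:00.000Z,1002,False,12,India,Twitter for Android
2020-03-03T09:15:00.000Z,1003,False,13,Australia,Twitter Web App
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
australia_ids = select_tweet_ids(rows, country="Australia")
verified_ids = select_tweet_ids(rows, verified_only=True)

# One id per line is the shape in which ids are usually passed to hydration tools.
print("\n".join(australia_ids))
```

Filtering first keeps the hydration step small: only the selected identifiers need to be written to a text file (one per line) and passed to a tool such as Hydrator or twarc.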
