The following is the data validation report that the seller has chosen to provide:
Data description
Context
In parallel with the CORD-19 dataset of scholarly articles, we provide a literature graph (10.5281/zenodo.3728215). It is composed not only of articles (graph nodes) relevant to the study of coronavirus, but also of their incoming and outgoing citation links (directed graph edges), which support navigation and search among the articles. The article records are therefore related and connected, not isolated. The graph has been updated weekly since March 26, 2020. The current graph includes 42,279 hot-off-the-press (HOTP) articles published since January 2020. In total it contains 485,097 articles and 4,259,944 links, or roughly 8.8 links per article, a link-to-node ratio notably higher than that of some other existing literature graphs. In addition to the dataset, we provide further functionality at lg-covid-19-hotp.cs.duke.edu, such as listings of new articles, weekly metadata analysis of publication growth over time, ranking by citation, and statistical near-neighbor embedding maps based on similarity in co-citation and similarity in co-reference. Since April 11, we have enabled a novel functionality: self-navigated surf-search over the maps. The site also accepts suggestions of COVID-19 articles that are missing from the current collection.
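The link-to-node ratio mentioned above can be checked directly from the two reported counts. The following is a minimal sketch; the function name `link_to_node_ratio` is hypothetical and is not part of the dataset or its tooling.

```python
def link_to_node_ratio(num_articles: int, num_links: int) -> float:
    """Average number of citation links per article in the graph."""
    if num_articles <= 0:
        raise ValueError("num_articles must be positive")
    return num_links / num_articles


# Counts as reported in the description above.
ratio = link_to_node_ratio(num_articles=485_097, num_links=4_259_944)
print(f"links per article: {ratio:.2f}")
```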
Content
Graph data are composed not only of data records (nodes) but also of relations (edges) among those records. The literature graph LG-covid19-HOTP is generated as follows. We start with 50 seed articles. We make a forward span by searching for the articles that cite the seed articles, and name the resulting set, which includes the seed articles, the foreground HOTP-FG. We then make a backward span from HOTP-FG by tracing all of its reference (outCitation) lists. Finally, we complete the graph with the citation links among the collected article records.
We will continue to update the literature graph. We welcome input from the research community on seminal or notable articles that are missing from the current collection.
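The construction steps above can be sketched on a toy example. This is only an illustration under stated assumptions, not the authors' actual pipeline: the function `build_literature_graph`, the in-memory `references` dictionary, and the article IDs are all hypothetical, and the real graph is built from external citation sources rather than a local dictionary.

```python
from collections import defaultdict


def build_literature_graph(seeds, references):
    """Sketch of the seed -> forward span -> backward span construction.

    references maps an article ID to the list of article IDs it cites
    (its outCitation list). This in-memory input is an assumption.
    """
    # Invert the reference lists to find which articles cite a given one.
    cited_by = defaultdict(set)
    for source, refs in references.items():
        for target in refs:
            cited_by[target].add(source)

    # Forward span: the seeds plus every article that cites a seed.
    foreground = set(seeds)
    for seed in seeds:
        foreground |= cited_by.get(seed, set())

    # Backward span: add everything on the foreground's reference lists.
    nodes = set(foreground)
    for article in foreground:
        nodes.update(references.get(article, ()))

    # Complete the graph with citation links among collected articles only.
    edges = {
        (source, target)
        for source in nodes
        for target in references.get(source, ())
        if target in nodes
    }
    return foreground, nodes, edges


# Toy data: "A" is the only seed; "C" cites "B" but not a seed.
toy_references = {
    "A": ["R1"],
    "B": ["A", "R2"],
    "C": ["B"],
    "R1": ["R2"],
}
foreground, nodes, edges = build_literature_graph(["A"], toy_references)
print(sorted(foreground), sorted(nodes), sorted(edges))
```

In this sketch the forward span is a single hop, so an article that cites only a non-seed foreground article (here "C") is not collected. Whether the actual graph extends the forward span further is not stated in the description.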
Sources & tools
