Arxiv CS Papers and Citation Network

Earth and Nature, Education

482.49MB

Dataset ID: D17168852735600337

Published: 2024/05/28

About Dataset

This dataset contains CS papers from arXiv together with their citation network through 2019. A paper counts as a CS paper if it has at least one cs category under the arXiv category taxonomy.
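The inclusion rule above can be sketched as a small predicate. This is an illustrative sketch, not the code used to build the dataset: it assumes the categories of a paper arrive as a single space-separated string (for example "cs.LG stat.ML"), and the function name `is_cs_paper` is hypothetical.

```python
def is_cs_paper(categories: str) -> bool:
    """Return True if at least one listed category belongs to the cs archive.

    Assumption: `categories` is a space-separated string of arXiv category
    codes, such as "cs.LG stat.ML". The name of this helper is illustrative.
    """
    return any(cat.startswith("cs.") for cat in categories.split())


if __name__ == "__main__":
    print(is_cs_paper("cs.LG stat.ML"))    # cross-listed paper with a cs category
    print(is_cs_paper("math.CO math.PR"))  # no cs category
```

Under this rule a cross-listed paper qualifies even when its primary category is outside cs, which matches the "at least one cs category" wording.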

The papers are taken from the Arxiv Dataset. The citation network is taken from the dataset made public by Clement et al.

Embeddings for each paper abstract were extracted using the SciBERT model (Beltagy et al.) and are available in the embeddings.parquet file. The paper indices in the cs_papers_wo_embeddings.parquet file match the embedding indices in the embeddings.parquet file.
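Because the two files share an index, the embeddings can be attached to the paper records with a plain index join. The following is a minimal sketch assuming pandas with a parquet engine (such as pyarrow) is installed; the helper names `attach_embeddings` and `load_papers_with_embeddings` and the `emb_` column prefix are illustrative, and the actual column layout of embeddings.parquet is not documented here.

```python
import pandas as pd


def attach_embeddings(papers: pd.DataFrame, embeddings: pd.DataFrame) -> pd.DataFrame:
    """Join embedding columns onto the paper table by their shared index."""
    # Fail loudly if the two tables are not index-matched as described.
    if not papers.index.equals(embeddings.index):
        raise ValueError("paper and embedding indices do not match")
    # Prefix embedding columns so they cannot collide with paper columns.
    return papers.join(embeddings.add_prefix("emb_"))


def load_papers_with_embeddings(
    papers_path: str = "cs_papers_wo_embeddings.parquet",
    embeddings_path: str = "embeddings.parquet",
) -> pd.DataFrame:
    """Read both parquet files from local paths (assumed) and align them."""
    papers = pd.read_parquet(papers_path)
    embeddings = pd.read_parquet(embeddings_path)
    return attach_embeddings(papers, embeddings)
```

Checking the index before joining keeps a silent misalignment from pairing an abstract with the wrong vector.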

The LDA weights, corresponding to 20 LDA topics, are in lda_weights.parquet. The input features for each paper were the TF-IDF features of its abstract. The rows are also index-matched with the cs_papers_wo_embeddings.parquet file.
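For readers who want to produce comparable topic weights, the pipeline described above (TF-IDF features fed to a 20-topic LDA) might look roughly like the sketch below, using scikit-learn. The preprocessing and hyperparameters actually used for lda_weights.parquet are not documented, so the stop-word setting, the random seed, and the function name `lda_topic_weights` are all assumptions.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer


def lda_topic_weights(abstracts, n_topics=20, seed=0):
    """Return a (n_papers, n_topics) array of per-paper topic weights.

    Assumptions: English stop-word removal and default LDA settings;
    the dataset's real configuration is unknown.
    """
    # TF-IDF features, one row per abstract.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    # Fit LDA and return the normalized document-topic distribution.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return lda.fit_transform(tfidf)


if __name__ == "__main__":
    sample = [
        "Graph neural networks for citation network analysis.",
        "Transformer language models pretrained on scientific text.",
        "Topic models summarize large document collections.",
    ]
    print(lda_topic_weights(sample).shape)
```

Each row of the returned array sums to one, so a row can be read as the mixture of topics for the corresponding paper.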


Validation Report

No data quality validation program is currently available for this file. Validation support will be provided in a later release.
