
Arxiv CS Papers and Citation Network

Earth and Nature, Education

20

Sold: 0
482.49MB

Data ID: D17168852735600337

Published: 2024/05/28

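The seller has not yet authorized the Dianshu (典枢) platform to verify the data in this file. You can request a verification report from the seller.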

Data Description

About Dataset

This dataset contains the computer science (CS) papers on Arxiv, together with their citation network, up to 2019. A paper counts as a CS paper if it has at least one cs category, as defined by the Arxiv category taxonomy.
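The CS-paper rule can be expressed as a one-line filter. This is a minimal sketch, assuming (the description does not say so) that each paper's categories are stored as a space-separated string such as "cs.LG stat.ML"; the function name `is_cs_paper` and the toy records are hypothetical.

```python
def is_cs_paper(categories: str) -> bool:
    """Return True if at least one category belongs to the arXiv "cs" archive.

    Assumes (hypothetically) a space-separated category string, e.g. "math.CO cs.DM".
    The archive is the part before the first dot, so "cond-mat.stat-mech"
    belongs to "cond-mat", not to "cs".
    """
    return any(cat.split(".", 1)[0] == "cs" for cat in categories.split())


# Example: keep only the CS papers from a few toy records.
papers = [
    {"id": "p1", "categories": "math.CO cs.DM"},
    {"id": "p2", "categories": "hep-th"},
    {"id": "p3", "categories": "cs.LG stat.ML"},
]
cs_ids = [p["id"] for p in papers if is_cs_paper(p["categories"])]
print(cs_ids)  # ['p1', 'p3']
```

Splitting on the first dot, rather than testing `startswith("cs")`, avoids accidentally matching an archive whose name merely begins with those letters.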

The papers were taken from the Arxiv Dataset, and the citation network from the dataset made public by Clement et al.
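A common first step with a citation network is counting how often each paper is cited. The sketch below assumes a hypothetical edge list of (citing_id, cited_id) pairs. The description does not specify the file or column layout of the citation data, so the structure and the helper name `citation_counts` are illustrative only. Edges are kept only when both endpoints fall inside a chosen subset, such as the CS papers.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def citation_counts(edges: Iterable[Tuple[str, str]], paper_ids: Set[str]) -> Counter:
    """Count incoming citations per paper within a chosen subset of papers.

    `edges` is a hypothetical list of (citing_id, cited_id) pairs. An edge is
    counted only when both endpoints are in `paper_ids`, so the result
    describes the subgraph induced by that subset.
    """
    counts: Counter = Counter()
    for citing, cited in edges:
        if citing in paper_ids and cited in paper_ids:
            counts[cited] += 1
    return counts


edges = [("a", "b"), ("c", "b"), ("b", "c"), ("a", "x")]  # "x" is outside the subset
counts = citation_counts(edges, {"a", "b", "c"})
print(counts.most_common(1))  # [('b', 2)]
```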

An embedding of each paper abstract was extracted with the SciBERT model (Beltagy et al.) and is available in the embeddings.parquet file. Row indices in the cs_papers_wo_embeddings.parquet file match the indices in embeddings.parquet, so row i of one file and row i of the other describe the same paper.
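Because the two files are index-aligned, the abstract embeddings can be used directly for nearest-neighbour search over papers. This is a minimal pure-Python sketch, assuming the vectors have already been loaded (for example with `pandas.read_parquet`) into a list in the same row order as the paper metadata. The column layout of embeddings.parquet is not documented here, and the helper names are hypothetical. Cosine similarity is one common way to compare BERT-style embeddings, not necessarily what the dataset author intended.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(query: int, embeddings: List[Sequence[float]],
                 titles: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k papers whose embeddings are closest to paper `query`.

    Relies on the index alignment: embeddings[i] and titles[i] describe the same paper.
    """
    scored = [(cosine(embeddings[query], e), i)
              for i, e in enumerate(embeddings) if i != query]
    scored.sort(reverse=True)  # highest similarity first
    return [(titles[i], score) for score, i in scored[:k]]


# Toy 2-d vectors stand in for the real SciBERT embeddings.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
titles = ["Paper A", "Paper B", "Paper C"]
print(most_similar(0, embeddings, titles, k=1))
```

For the full dataset, a vectorized library would be much faster than this loop. The logic stays the same: score every row against the query row and sort.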

The LDA weights for 20 LDA topics are in lda_weights.parquet. The input features for each paper were the TF-IDF features of its abstract. The rows of this file are also index-matched with the cs_papers_wo_embeddings.parquet file.
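A typical use of per-paper topic weights is to assign each paper its highest-weighted ("dominant") topic and count papers per topic. This sketch assumes each row of lda_weights.parquet is a 20-element weight vector for the paper at the same index. That layout is an assumption, and the helper names are hypothetical. Taking the argmax as the dominant topic is a common convention, not something the description prescribes.

```python
from collections import Counter
from typing import Iterable, List, Sequence

NUM_TOPICS = 20  # number of LDA topics stated in the dataset description


def dominant_topic(weights: Sequence[float]) -> int:
    """Index of the highest-weighted topic (the lowest index wins on ties)."""
    if len(weights) != NUM_TOPICS:
        raise ValueError(f"expected {NUM_TOPICS} topic weights, got {len(weights)}")
    return max(range(NUM_TOPICS), key=lambda t: weights[t])


def papers_per_topic(weight_rows: Iterable[Sequence[float]]) -> Counter:
    """Count how many papers have each topic as their dominant topic."""
    return Counter(dominant_topic(row) for row in weight_rows)


def toy_row(topic: int) -> List[float]:
    """Hypothetical weight vector concentrated on one topic (sums to 1.0)."""
    row = [0.01] * NUM_TOPICS
    row[topic] = 0.81
    return row


# Toy rows: paper 0 leans toward topic 3, papers 1 and 2 toward topic 7.
rows = [toy_row(3), toy_row(7), toy_row(7)]
print(papers_per_topic(rows))  # Counter({7: 2, 3: 1})
```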
