
Data Description
About Dataset
This dataset contains computer science (CS) papers from arXiv, together with their citation network up to 2019. A paper counts as a CS paper if at least one of its categories is a cs category in the arXiv category taxonomy.
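The inclusion rule can be sketched as a small predicate. This is an illustration, not the authors' filtering code: the function name `is_cs_paper` is hypothetical, and it assumes the categories arrive as one space-separated string such as `"cs.LG stat.ML"`.

```python
def is_cs_paper(categories: str) -> bool:
    """Return True if at least one listed category is a cs category.

    Assumption: `categories` is a space-separated string of arXiv
    category codes, e.g. "cs.LG stat.ML".
    """
    # A cs category code has the form "cs.<subject>", e.g. "cs.CL".
    return any(code.startswith("cs.") for code in categories.split())


# Cross-listed papers qualify as long as one category is a cs category.
cross_listed = is_cs_paper("math.CO cs.DM")
not_cs = is_cs_paper("stat.ML physics.comp-ph")
```

Under this rule a paper whose primary category lies outside cs is still included when it is cross-listed into cs.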
The papers are taken from the arXiv Dataset, and the citation network is taken from the dataset made public by Clement et al.
An embedding for each paper abstract was extracted using the SciBERT model (Beltagy et al.); the embeddings are available in the embeddings.parquet file. The paper indices in the cs_papers_wo_embeddings.parquet file match the embedding indices in the embeddings.parquet file.
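Because the two files share an index, the embeddings can be attached to the paper table with an index join. The following is a minimal sketch using pandas; the helper name `attach_by_index` is hypothetical, and the column layout of both files is not documented here, so nothing in the sketch relies on specific column names.

```python
import pandas as pd


def attach_by_index(papers: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """Join per-paper features onto the paper table via the shared index.

    `validate="one_to_one"` makes pandas raise if either index contains
    duplicates, which would indicate that the files are not index-matched.
    """
    return pd.merge(
        papers,
        features,
        left_index=True,
        right_index=True,
        how="inner",
        validate="one_to_one",
    )


# Usage with the files named above (reading parquet requires pyarrow or
# fastparquet to be installed):
# papers = pd.read_parquet("cs_papers_wo_embeddings.parquet")
# embeddings = pd.read_parquet("embeddings.parquet")
# papers_with_embeddings = attach_by_index(papers, embeddings)
```

An inner join keeps only indices present in both tables, so comparing the row count before and after the join is a quick check that every paper has an embedding.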
The LDA weights, corresponding to 20 LDA topics, are found in lda_weights.parquet. The features provided for each paper were the TF-IDF features of its abstract. The rows of this file are also index-matched with the cs_papers_wo_embeddings.parquet file.
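One common use of per-topic weights is to pick the highest-weighted topic for each paper. The sketch below assumes that lda_weights.parquet stores one row per paper and one column per topic; that layout is an assumption, as are the column names `topic_0`, `topic_1`, `topic_2` and the helper name `dominant_topic`. The toy table stands in for the real file.

```python
import pandas as pd


def dominant_topic(lda_weights: pd.DataFrame) -> pd.Series:
    """Return, for each row (paper), the label of the column with the largest weight."""
    return lda_weights.idxmax(axis=1)


# Hypothetical toy table: 2 papers and 3 topics (the real file has 20 topics).
toy_weights = pd.DataFrame(
    {
        "topic_0": [0.7, 0.1],
        "topic_1": [0.2, 0.1],
        "topic_2": [0.1, 0.8],
    },
    index=[10, 11],
)
top_topics = dominant_topic(toy_weights)
```

Since the result keeps the paper index, it can be joined back onto the paper table in the same way as any other index-matched column.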
