About Dataset
This dataset contains CS papers on arXiv together with their citation network through 2019. A paper is considered a CS paper if it has at least one cs category, as defined by the arXiv category taxonomy.
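The inclusion rule above can be sketched as a small check. This is an illustration only: it assumes a paper's categories are given as a single space-separated string (for example "cs.LG stat.ML"), and the function name `is_cs_paper` is hypothetical rather than part of the dataset.

```python
def is_cs_paper(categories: str) -> bool:
    """Return True if at least one category belongs to the cs archive.

    Assumption: `categories` is a space-separated string of arXiv
    category codes such as "cs.LG stat.ML".
    """
    # Match on the "cs." prefix so that codes like "physics.comp-ph"
    # are not mistaken for CS categories.
    return any(code.startswith("cs.") for code in categories.split())
```

Under this assumption, a paper cross-listed as "stat.ML cs.LG" counts as a CS paper, while one listed only under "math.CO" does not.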
The papers are taken from the arXiv Dataset. The citation network is taken from the dataset made public by Clement et al.
Embeddings for each paper abstract have been extracted using the SciBERT model (Beltagy et al.) and are available in the embeddings.parquet file. The paper indices in the cs_papers_wo_embeddings.parquet file match the embedding indices in the embeddings.parquet file.
The LDA weights, corresponding to 20 LDA topics, are found in lda_weights.parquet. The features provided for each paper were the TF-IDF features of its abstract. The rows of this file are also index-matched with the cs_papers_wo_embeddings.parquet file.
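Because the three files share the same paper index, they can be combined with an index join. The following is a minimal sketch using pandas, not an official loader: the function name `join_by_index` and the `emb_` / `lda_` column prefixes are assumptions made for illustration, and the column layout inside each parquet file is not specified here.

```python
import pandas as pd


def join_by_index(papers: pd.DataFrame,
                  embeddings: pd.DataFrame,
                  lda_weights: pd.DataFrame) -> pd.DataFrame:
    """Combine paper metadata, embeddings and LDA weights row by row.

    Assumption: all three frames carry the same index in the same order,
    as the dataset description states.
    """
    # Fail loudly if the indices do not line up instead of silently
    # producing misaligned or NaN-filled rows.
    if not papers.index.equals(embeddings.index):
        raise ValueError("embeddings index does not match papers index")
    if not papers.index.equals(lda_weights.index):
        raise ValueError("lda_weights index does not match papers index")

    # Prefix the feature columns (hypothetical names) to avoid clashes
    # with metadata columns.
    return (
        papers
        .join(embeddings.add_prefix("emb_"))
        .join(lda_weights.add_prefix("lda_"))
    )
```

With the files downloaded locally, the frames could be read with `pd.read_parquet("cs_papers_wo_embeddings.parquet")`, `pd.read_parquet("embeddings.parquet")` and `pd.read_parquet("lda_weights.parquet")` and then passed to `join_by_index`. Reading parquet with pandas requires a parquet engine such as pyarrow to be installed.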