维维

Amharic News Corpus

classification, lstm, text, news, amharic

2

Sold: 0
71.48MB

Data ID: D17220739045766134

Release date: 2024/07/27

The following is the data verification report that the seller chose to provide:

Data description

This dataset contains a corpus of Amharic news in the Fidel orthography, with texts and categories in .csv format, and an Amharic stop words list curated from various academic sources. A native Amharic speaker edited and expanded the stop words list.

This dataset includes article text and news category data from Azime, I. A., & Mohammed, N. (2021). An Amharic News Text classification Dataset. arXiv preprint arXiv:2103.05639. Details of their dataset:

Category Number of Texts
ሀገር አቀፍ ዜና 20674
ስፖርት 10411
ፖለቲካ 9325
ዓለም አቀፍ ዜና 6543
ቢዝነስ 3894
መዝናኛ 635
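
The category counts above are uneven, which matters when training a classifier. The following is a minimal sketch that computes each category's share of the corpus and a simple inverse-frequency class weight. The counts are copied from the table; the weighting formula (total / (number of classes * count)) is a common heuristic chosen here for illustration, not something the dataset authors prescribe.

```python
# Counts copied from the category table above (Azime & Mohammed, 2021).
counts = {
    "ሀገር አቀፍ ዜና": 20674,
    "ስፖርት": 10411,
    "ፖለቲካ": 9325,
    "ዓለም አቀፍ ዜና": 6543,
    "ቢዝነስ": 3894,
    "መዝናኛ": 635,
}

total = sum(counts.values())

# Fraction of the corpus that falls in each category.
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency weights (an assumed heuristic): rare categories get
# larger weights, and a perfectly balanced corpus would give 1.0 everywhere.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}\t{n}\t{shares[label]:.1%}\t{weights[label]:.2f}")
```

Weights of this kind can be passed to a loss function or sampler so that the smallest category is not ignored during training.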

Our addition:

Category Count
Business 5276
Politics 5156

We continue to add to our dataset; check back for updates (April 2023).

The stop words list has 714 unique stop words. It is a general-purpose list for Amharic and is not specific to this corpus of news texts or to news classification.
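
As a rough illustration of how the stop words list might be applied to the article texts, the sketch below filters stop words out of a tokenized article. The listing does not give the actual file names or CSV column names, so the column names `article` and `category`, the two sample stop words, and the sample row are all hypothetical stand-ins held in memory. Tokenization here is plain whitespace splitting with Ethiopic punctuation stripped from token edges, which is an assumption and not necessarily how the authors preprocess the text.

```python
import csv
import io

# Ethiopic punctuation marks (U+1361 to U+1368) plus common ASCII punctuation.
PUNCTUATION = "፡።፣፤፥፦፧፨" + ".,!?\"'()"


def load_stop_words(lines):
    """Build a set of stop words from an iterable with one word per line."""
    return {line.strip() for line in lines if line.strip()}


def remove_stop_words(text, stop_words):
    """Split on whitespace, strip edge punctuation, and drop stop words."""
    tokens = (tok.strip(PUNCTUATION) for tok in text.split())
    return [tok for tok in tokens if tok and tok not in stop_words]


# Hypothetical stand-ins for the real files; replace with open(...) calls
# once the actual file names and column names are known.
stop_words = load_stop_words(io.StringIO("እና\nነው\n"))
sample_csv = io.StringIO("article,category\nዜና እና ስፖርት ነው።,ስፖርት\n")

rows = [
    (remove_stop_words(row["article"], stop_words), row["category"])
    for row in csv.DictReader(sample_csv)
]
print(rows)
```

Because the list is general rather than tuned to news text, it may be worth checking whether removing these words helps or hurts a given classifier before applying it to the whole corpus.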
