The following is the data validation report that the seller chose to provide:
Data description
This dataset contains a corpus of Amharic news written in the Fidel orthography, with texts and categories in .csv format, and an Amharic stop words list curated from various academic sources. A native Amharic speaker edited and expanded the stop words list.
This dataset includes article text and news category data from Azime, I. A., & Mohammed, N. (2021). An Amharic News Text classification Dataset. arXiv preprint arXiv:2103.05639. Details of their dataset:
Category | Number of Texts |
---|---|
ሀገር አቀፍ ዜና | 20674 |
ስፖርት | 10411 |
ፖለቲካ | 9325 |
ዓለም አቀፍ ዜና | 6543 |
ቢዝነስ | 3894 |
መዝናኛ | 635 |
Our addition:
Category | Number of Texts |
---|---|
Business | 5276 |
Politics | 5156 |
We continue to add to our dataset; check back for updates (April 2023).
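
The counts in the two tables above can be combined to get a rough view of the class balance. The following is a minimal sketch, not part of the dataset itself. It assumes that the added texts are disjoint from the original corpus and that the English labels "Business" and "Politics" correspond to ቢዝነስ and ፖለቲካ; the `label_map` name is hypothetical.

```python
from collections import Counter

# Counts copied from the first table (Azime & Mohammed, 2021).
original = Counter({
    "ሀገር አቀፍ ዜና": 20674,
    "ስፖርት": 10411,
    "ፖለቲካ": 9325,
    "ዓለም አቀፍ ዜና": 6543,
    "ቢዝነስ": 3894,
    "መዝናኛ": 635,
})

# Counts copied from the second table ("Our addition").
added = Counter({"Business": 5276, "Politics": 5156})

# Assumption: the English labels map onto these Amharic labels.
label_map = {"Business": "ቢዝነስ", "Politics": "ፖለቲካ"}

# Assumption: the added texts do not overlap with the original ones,
# so the counts can simply be summed per category.
combined = Counter(original)
for english_label, count in added.items():
    combined[label_map[english_label]] += count

total = sum(combined.values())
for label, count in combined.most_common():
    print(f"{label}\t{count}\t{count / total:.1%}")
```

Even under these assumptions, the smallest category (መዝናኛ) remains far smaller than the others, which may matter when choosing evaluation metrics for a classifier.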
The stop words list has 714 unique stop words. It is a general-purpose list for Amharic and is not specific to the corpus of news texts or to news classification.
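
As an illustration of how a stop words list like this might be applied, here is a minimal sketch of stop word removal for Fidel text. The four stop words and the sample sentence are placeholders chosen for the example and are not claimed to come from the 714-word list; the function names are hypothetical. In practice the full list would be loaded into a set from whatever file it ships in.

```python
import re

# Ethiopic punctuation, U+1360..U+1368 (wordspace, full stop, comma, etc.).
ETHIOPIC_PUNCT = re.compile(r"[\u1360-\u1368]")


def tokenize(text):
    """Replace Ethiopic punctuation with spaces, then split on whitespace."""
    return ETHIOPIC_PUNCT.sub(" ", text).split()


def remove_stop_words(text, stop_words):
    """Return the tokens of `text` that are not in `stop_words`."""
    return [token for token in tokenize(text) if token not in stop_words]


# Placeholder stop words for illustration only (assumption: not taken
# from the dataset's actual list).
sample_stop_words = {"እና", "ወደ", "ነው", "ላይ"}

sentence = "እሱ ወደ ቤት ሄደ እና ተኛ።"
filtered = remove_stop_words(sentence, sample_stop_words)
print(filtered)
```

Because the list is general rather than tuned to news text, it may be worth checking whether removing these words helps or hurts a given classification task before applying it by default.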
