wei德辉

Telugu NLP

social science, news

￥4

88.72MB

Data ID: D17222541544812890

Published: 2024/07/29

Context

Indic NLP - Natural Language Processing for Indian Languages.

This dataset is a step in that direction for the Telugu language. Thanks to Anusha for collecting the data from websites. The idea is to gather more Telugu NLP datasets in a single place.

Similar datasets for other Indian languages:

Tamil

Content

The dataset contains the following folders:

Telugu Books

This folder contains a file with the text extracted from Telugu books. The data was obtained from this link by Anusha and combined into a single file.

Telugu News

This folder contains Telugu news extracts that can be used for multi-class classification problems. It has two files, train and test. The news categories are:

business
editorial
entertainment
nation
sport

The data was obtained from this link by Anusha. Post-processing was done to extract the above five topics.
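As a rough illustration of the multi-class classification use case, here is a minimal sketch of a bag-of-words Naive Bayes baseline in plain Python. It is not the dataset author's method. The function names (`tokenize`, `train_nb`, `predict_nb`) and the toy samples are hypothetical, and the toy text uses English placeholder tokens; with the real files you would pass (text, category) pairs read from the train file, whose exact format and column names are not described here.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Plain whitespace tokenization (an assumption); real Telugu text
    # may need more careful handling of punctuation and joiners.
    return text.lower().split()

def train_nb(samples):
    """Count classes and per-class tokens from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: placeholders standing in for Telugu news text.
toy_train = [
    ("stocks market profit", "business"),
    ("bank market shares", "business"),
    ("cricket match win", "sport"),
    ("team match score", "sport"),
]
model = train_nb(toy_train)
prediction = predict_nb(model, "market profit rises")
print(prediction)
```

A baseline like this is mainly useful as a sanity check before trying stronger models on the five categories.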

Acknowledgements

Sincere thanks to Anusha for collating the dataset from multiple places.

Photo by Prasanth Dasari on Unsplash

Inspiration

Some ideas:

The books data can be used for NLP tasks such as topic modeling, word embeddings, and transfer learning.
The news dataset can be used for supervised learning problems.
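As a possible first step toward tasks like topic modeling or word embeddings on the books text, the sketch below builds a token frequency table restricted to Telugu script. It relies on the Unicode Telugu block (U+0C00 to U+0C7F). The function name `telugu_vocab` and the sample string are made up for illustration; in practice the input would be the contents of the books file.

```python
import re
from collections import Counter

# Runs of characters from the Unicode Telugu block (U+0C00..U+0C7F).
# Note: zero-width joiners (U+200C/U+200D) fall outside this block,
# so words containing them would be split by this pattern.
TELUGU_TOKEN = re.compile(r"[\u0C00-\u0C7F]+")

def telugu_vocab(text):
    """Count Telugu-script tokens, ignoring Latin text, digits, punctuation."""
    return Counter(TELUGU_TOKEN.findall(text))

# Hypothetical sample string, not taken from the dataset.
sample = "తెలుగు భాష, తెలుగు NLP 2024"
vocab = telugu_vocab(sample)
print(vocab.most_common(5))
```

The resulting counts can feed a vocabulary cutoff or a stop-word list before training embeddings.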
