The following is the data validation report that the seller chose to provide:
Data Description
Please see the Discussion for updates and to voice your concerns and suggestions!
In the spirit of continuing to share good philosophical works, this dataset offers a different corpus, one focused on political thought. It includes works from both pre-modern and modern times. Given the constant evolution of such a burgeoning field, I will try to update the list on a weekly or bi-weekly basis, adding more classic works or recent gems I find. Although the initial intent was a Natural Language Processing task, the data is yours to explore creatively, as the possibilities are nearly endless. After web scraping the original texts, I wrote some functions to clean and tokenize them. As a result, you will find an auto-increment column followed by these columns: book title, publishing date, authors, text, and text clean.
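The author's cleaning functions are not included in this description. As a rough illustration only, the following Python sketch shows one common way a `text` value could be turned into a cleaned, tokenized form similar in spirit to the `text clean` column. The regex, the ASCII-only letter filter, and the small `STOPWORDS` set are assumptions, not the author's actual method.

```python
import re
from typing import List

# Hypothetical stopword list for illustration; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "be", "as", "by", "for", "on", "with", "are", "this"}


def clean_text(text: str) -> str:
    """Lowercase the text, replace non-letters with spaces, collapse whitespace."""
    text = text.lower()
    # Assumption: keep only ASCII letters, so accented characters are dropped too.
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> List[str]:
    """Split cleaned text into words and drop stopwords."""
    return [tok for tok in clean_text(text).split() if tok not in STOPWORDS]


print(tokenize("Man is born free; and everywhere he is in chains."))
# ['man', 'born', 'free', 'everywhere', 'he', 'chains']
```

Whether to remove stopwords, stem, or lemmatize depends on the analysis; term-frequency work usually benefits from stopword removal, while some language models prefer the raw text.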
Above all, it is thanks to Project Gutenberg, a phenomenal platform for book lovers and knowledge seekers in general, that I was able to obtain these texts at no cost. So, please support them in their continuing effort to make knowledge accessible: https://www.gutenberg.org/
The following bullet points propose possible routes for exploration, but do not feel constrained by them; feel free to go above and beyond:
- An exploratory analysis of term frequency
- A word cloud of a specific author's ideas or the general themes among all authors
- A recommendation system for readers who want to follow an evolving line of ideas across these books
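As one possible starting point for the term-frequency and recommendation routes, the sketch below counts terms across a set of texts and recommends similar books by cosine similarity of TF-IDF vectors. It uses only the Python standard library. The simplified tokenizer, the smoothed IDF formula, and the `recommend` helper are hypothetical choices, and the sample titles and texts are toy placeholders rather than rows from the dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase runs of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(texts):
    """Count every term across all texts (corpus-wide term frequency)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (dict: term -> weight) per text."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    # Document frequency: in how many texts each term appears.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        # Smoothed IDF keeps terms shared by every text from getting weight 0.
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in doc.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(titles, texts, title, k=3):
    """Return up to k other titles ranked by similarity to `title`."""
    vectors = tfidf_vectors(texts)
    i = titles.index(title)
    scored = [(cosine(vectors[i], vectors[j]), titles[j])
              for j in range(len(titles)) if j != i]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:k]]


# Toy placeholder data, not taken from the dataset.
titles = ["Book A", "Book B", "Book C"]
texts = ["liberty state liberty law", "liberty law rights", "market trade price"]
print(term_frequencies(texts).most_common(2))  # [('liberty', 3), ('law', 2)]
print(recommend(titles, texts, "Book A", k=1))  # ['Book B']
```

The `Counter` returned by `term_frequencies` could also feed a word cloud, for example through the `generate_from_frequencies` method of the third-party `wordcloud` package. TF-IDF is used here only because it down-weights terms common to every book; embeddings or topic models are equally reasonable alternatives.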
