The following is the data validation report that the seller has chosen to provide:
Data Description
Context
It's a common question in linguistics: what are the most commonly used words in the English language?
This data answers that question for the Google Books corpora (version 3), which cover works dated from the 1800s to 2019.
Content
This data came from the Google Books Ngram Viewer Exports, version 3, exported on Feb 17, 2020:
https://storage.googleapis.com/books/ngrams/books/datasetsv3.html
Specifically, the 1-gram counts were accumulated for each (alphabetical-only) word gathered from the Google Books Corpora, an enormous amount of data scanned in from some of the world's largest collections of literary works.
For example, the 1-gram "the" has been used 125,971,793,511 times in the corpus.
There are over 9 million words listed, including misspellings, scanning errors, etc.
The original 1-gram data has many more entries, as it also includes numbers, special characters, and other features.
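The accumulation described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used to build the dataset. It assumes each line of a version 3 1-gram export has the form `ngram<TAB>year,match_count,volume_count<TAB>...`, that "alphabetical-only" means ASCII letters A-Z, and that upper- and lower-case forms are merged. The function name `accumulate_counts` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: "alphabetical-only" means ASCII letters only.
ALPHA_ONLY = re.compile(r"[A-Za-z]+")


def accumulate_counts(lines):
    """Sum match counts over all years for each alphabetical-only 1-gram."""
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        word = fields[0]
        # Skip entries with digits, punctuation, or tag suffixes such as "_DET".
        if not ALPHA_ONLY.fullmatch(word):
            continue
        for record in fields[1:]:
            if not record:
                continue
            # Assumed record layout: year,match_count,volume_count
            _year, match_count, _volume_count = record.split(",")
            # Assumption: case variants are merged into one lowercase entry.
            totals[word.lower()] += int(match_count)
    return totals


# Made-up sample lines, for illustration only.
sample = [
    "the\t1800,10,2\t1801,5,1",
    "The\t1800,3,1",
    "3rd\t1800,7,1",
    "the_DET\t1800,13,2",
]
totals = accumulate_counts(sample)
for word, count in totals.most_common():
    print(word, count)
```

On the real export files, the same function could be fed an open file handle line by line, so the full data never has to fit in memory at once.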
Acknowledgements
Thanks to the Google Books team for making this incredible data available.
Inspiration
Natural language processing begins with words, so it is natural to ask which ones we use the most.
License
This compilation is licensed under a Creative Commons Attribution 3.0 Unported License.
Data Source: Google Books (https://storage.googleapis.com/books/ngrams/books/datasetsv3.html)
