鱼泪

English Word Frequency List

languages, literature, linguistics, text


Sold: 0
131.3MB

Data ID: D17220407891433316

Published: 2024/07/27

The following is the data verification report that the seller chose to provide:

Data Description

Context

It's a common question in linguistics: what are the most commonly used words in the English language?

This data definitively answers the question in the context of the Google Books corpora for works collected from the 1800s to 2019 (version 3).

Content

This data came from the Google Books Ngram Viewer Exports, version 3, exported on Feb 17, 2020:

https://storage.googleapis.com/books/ngrams/books/datasetsv3.html

Specifically, the 1-gram counts were accumulated for each (alphabetical-only) word gathered from the Google Books Corpora, an enormous amount of data scanned in from some of the world's largest collections of literary works.

For example, the 1-gram "the" has been used 125,971,793,511 times in the corpus.

There are over 9 million words listed, including misspellings, scanning errors, etc.

The original 1-gram data has many more entries because it also includes numbers, special characters, and other features.
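
The accumulation described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used to build the list. It assumes each line of a version 3 1-gram export has the form `ngram<TAB>year,match_count,volume_count<TAB>...`, that "alphabetical-only" means ASCII letters A-Z only, and that upper- and lower-case forms are merged. The function name `accumulate_counts` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: "alphabetical-only" means ASCII letters only, so entries with
# digits, punctuation, or part-of-speech suffixes such as "the_DET" are dropped.
ALPHA_ONLY = re.compile(r"^[A-Za-z]+$")


def accumulate_counts(lines):
    """Sum match counts over all years for each alphabetical-only 1-gram."""
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        word = fields[0]
        if not ALPHA_ONLY.match(word):
            continue
        for entry in fields[1:]:
            if not entry:
                continue
            # Assumed per-year layout: year,match_count,volume_count
            _year, match_count, _volume_count = entry.split(",")
            # Assumption: case variants are folded into one lower-case word.
            totals[word.lower()] += int(match_count)
    return totals


# Made-up sample lines, for illustration only (not real corpus counts).
sample = [
    "the\t1800,120,10\t1801,80,9\n",
    "The\t1800,30,8\n",
    "the_DET\t1800,150,10\n",
    "1984\t1950,5,2\n",
    "teh\t1900,2,1\n",
]

totals = accumulate_counts(sample)
print(totals.most_common(2))
```

Note that a misspelling such as "teh" passes the filter, which is consistent with the list containing misspellings and scanning errors, while the numeric and tagged entries are skipped.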

Acknowledgements

Thanks to the Google Books team for making this incredible data available.

Inspiration

Natural language processing begins with words, so it is natural to ask which ones we use the most.

License

This compilation is licensed under a Creative Commons Attribution 3.0 Unported License.

Data Source: Google Books (https://storage.googleapis.com/books/ngrams/books/datasetsv3.html)
