Culturomics: 5,195,759 digitized books analyzed, see for yourself

2010 December 17
by Michał

E-rumors spread some time ago that Google had launched a project to digitize all the books there are. A recent issue of Science magazine contains an article reporting an analysis of 5 million digitized books, which, according to Google, account for around 4% of all the books ever published. By tracing words or phrases through the years 1800-2000, you can follow the evolution of culture throughout the 19th and 20th centuries. The authors used the data to show, for example:

  • that 500,000 English words are missing from all dictionaries
  • the evolution of language, such as the popularity of the forms “burned” vs. “burnt”
  • the popularity of artists, scientists, and politicians
  • and more…
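
To make the idea of tracing a word through the years concrete, here is a minimal sketch in Python. It assumes that "frequency" means the number of occurrences of a word divided by the total number of words from the same year. The function name `yearly_frequency` and the tiny corpus are made up for illustration; this is not the actual pipeline or data behind the study.

```python
from collections import Counter


def yearly_frequency(corpus, word):
    """Return {year: share of that year's tokens that equal `word`}.

    `corpus` maps a year to a list of text snippets (hypothetical layout).
    """
    freq = {}
    for year, texts in corpus.items():
        # Naive tokenization: split on whitespace, drop edge punctuation, lowercase.
        tokens = [t.strip('.,;:!?"').lower() for text in texts for t in text.split()]
        tokens = [t for t in tokens if t]
        counts = Counter(tokens)
        freq[year] = counts[word.lower()] / len(tokens) if tokens else 0.0
    return freq


# Toy, invented corpus; NOT taken from the Google Books dataset.
toy_corpus = {
    1850: ["The candle burnt low.", "He burnt the letter."],
    1950: ["The toast burned.", "She burned the letter and burnt her hand."],
}

burnt = yearly_frequency(toy_corpus, "burnt")
burned = yearly_frequency(toy_corpus, "burned")

for year in sorted(toy_corpus):
    print(year, "burnt:", round(burnt[year], 3), "burned:", round(burned[year], 3))
```

Dividing by the yearly total matters because the number of books differs a lot from year to year, so raw counts alone would mostly reflect how much was printed.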

The project is called Culturomics. The Books Ngram Viewer, a tool for visualizing word and phrase frequencies in the dataset, much like Google Trends does for search keywords, is publicly available. Check it out! It’s very addictive. Some examples:

Any other examples of nice dynamics?
