Culturomics

 December 18, 2010


I have really fallen in love with the Google Books Ngram Viewer, so I thought I’d do a little “culturomics” myself. Here’s an image I made using Google’s data:

Numbers in Print

The brightness of the pixel at position (x, y) is related to how frequently “x” appears in books published in the year y. Specifically, if p is the number of times “x” appears in print during year y, divided by the number of times any number less than 2100 appears in print during that year, then (1 - p)^{1500} is the brightness of the pixel at (x, y).
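To make the formula concrete, here is a minimal Python sketch of the brightness computation. The function name and its parameters are hypothetical, chosen for illustration; it assumes the two counts have already been extracted from the n-gram data, and it is not necessarily how the image above was actually rendered.

```python
def brightness(count_x, total_count, exponent=1500):
    """Brightness of one pixel, from 0.0 (black) to 1.0 (white).

    count_x     -- times the number x appears in print in year y
    total_count -- times any number below 2100 appears in print in year y
    exponent    -- contrast exponent; 1500 is the value quoted in the text
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    p = count_x / total_count      # relative frequency of x in year y
    return (1 - p) ** exponent     # frequent numbers give darker pixels
```

With this exponent, a number that accounts for 1/1500 of all number appearances in a given year gets a brightness of roughly 1/e, or about 0.37, so even fairly rare numbers show up as visibly grey.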

The dark, diagonal edge along the right-hand side appears because in year y there are many published appearances of numbers near y.

Dark diagonal edge

World events have left their mark on the numbers appearing in books! For example, 1914 is still being talked about long after 1914, as evidenced by the darker line above 1914.

If we look at numbers just above 1000 and turn up the contrast a bit,

Around one thousand

we see an echo of the dark diagonal, from people writing (or more likely, the OCR software reading) zero instead of nine in the year. There’s a dark column for the Norman conquest in 1066; a number like 2^{10} = 1024 was not so important until the 20th century.
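One way to see why a misread digit produces a second diagonal: replacing the hundreds digit shifts every year by the same amount, so the misread points run parallel to the main diagonal. The sketch below is only an illustration with a hypothetical helper name; it assumes the error affects the hundreds digit alone.

```python
def misread_hundreds_digit(year, wrong_digit):
    """Return the number obtained when the hundreds digit of a
    four-digit year is read as wrong_digit instead."""
    if not 1000 <= year <= 9999:
        raise ValueError("year must have four digits")
    if not 0 <= wrong_digit <= 9:
        raise ValueError("wrong_digit must be a single digit")
    thousands = year // 1000       # leading digit, left unchanged
    last_two = year % 100          # tens and units, left unchanged
    return thousands * 1000 + wrong_digit * 100 + last_two


# A nine read as a zero: every year 19xx lands at 10xx.
echo = [(y, misread_hundreds_digit(y, 0)) for y in range(1900, 1905)]
```

Each pair in `echo` differs by exactly 900, which is why the echo sits at a fixed horizontal offset from the main diagonal.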

If we look at numbers just above 1300,

Above 1300

we can see a diagonal line from years in the 1800s being read as 1300s, and a dark vertical line above 1453 (the “end” of the Middle Ages). In the 18th century,

Above 1700

1776 is quite visible. And finally, a puzzle:

Why 2044

Why was “2044” so significant until the 1920s?

2043,2044,2045 in Google ngrams viewer

I’d love to know the answer to this question. The only thing I can guess that might relate the year 1919 to the year 2044 is solar eclipses.