Clustering Shakespeare.

 January 23, 2008 personal


I ran my clustering program (the same one I recently ran on the New Testament) on Shakespeare’s plays, which Open Source Shakespeare conveniently packages into a single text file.

The result was the following graph:

[Figure: Clustering of Shakespeare’s Plays]

I know little about Shakespeare, so I can’t say too much about the above image. I’d love to know what you think: does this arrangement of his plays make any sense?

Given that modern processors are so good at vector and matrix calculations, I’m surprised that this sort of visualization tool doesn’t appear in more places. For instance:
  • Your blogs and email could be organized this way. Imagine lassoing a bunch of similar emails to reply to them all at once!
  • News could be organized into nice piles.
  • Your desktop and personal files could be arranged automatically into relevant piles.

Then again, maybe the idea of piles appeals to me more than most people—just look at how I organize the papers and books on my desk!
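
As a rough illustration of the kind of vector calculation involved, here is a minimal Python sketch that turns each text into a word-count vector and compares every pair of texts by cosine similarity. This is not necessarily what the clustering program above does: the post doesn't describe its method, and the function names (`word_counts`, `cosine_similarity`, `similarity_matrix`) and the toy documents are made up for the example.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(texts):
    """Pairwise cosine similarities between all texts."""
    vectors = [word_counts(t) for t in texts]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for documents (hypothetical, not actual play texts).
docs = [
    "the king and the crown",
    "the crown and the king of the realm",
    "a fool in love with a shepherdess",
]
sim = similarity_matrix(docs)

for row in sim:
    print(["%.2f" % value for value in row])
```

In a matrix like this, texts that share vocabulary score close to 1 and texts with no words in common score 0. A layout or clustering step could then try to place high-scoring pairs near each other, which is one plausible way to end up with "piles".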