Genesis clusters around the Akedah.

February 26, 2007 theology personal

Someone contacted me with some questions about Bayesian document clustering; with that inspiration and a free afternoon a few weeks ago, I took a Hebrew bible and built a matrix $M$ where $M_{ij}$ equals the frequency of the $i$-th (Hebrew!) word in the $j$-th chapter of Genesis. I calculated its singular value decomposition (supposedly this is “latent semantic analysis”), and then took some dot products (calculating the “correlation” of chapters)…
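For anyone who wants to play the same game, here is a minimal sketch of that pipeline in Python with NumPy: count words per chapter, take the SVD, and compare chapters in the reduced space. It is not the code I ran that afternoon. The whitespace tokenization, the truncation `rank`, the use of cosine similarity as the “correlation,” and all the function names are assumptions made for illustration, and the toy "chapters" are made-up tokens, not Genesis.

```python
from collections import Counter

import numpy as np


def term_chapter_matrix(chapters):
    """Return (vocab, M) where M[i, j] counts word i in chapter j.

    Assumption: words are whatever str.split() yields (whitespace tokens).
    """
    counts = [Counter(text.split()) for text in chapters]
    vocab = sorted(set().union(*counts))
    M = np.array([[c[w] for c in counts] for w in vocab], dtype=float)
    return vocab, M


def chapter_similarity(M, rank):
    """Cosine similarity between chapters after a rank-truncated SVD.

    Assumption: the post only says "dot products"; normalizing them
    to cosines is one plausible reading, not a confirmed detail.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Row j of coords holds chapter j's coordinates in the reduced space.
    coords = (s[:rank, None] * Vt[:rank]).T
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty chapters
    unit = coords / norms
    return unit @ unit.T


def most_similar(sim):
    """For each chapter, the index of the *other* chapter it is closest to."""
    s = sim.copy()
    np.fill_diagonal(s, -np.inf)  # a chapter may not pick itself
    return s.argmax(axis=1)


# Toy data: chapters 0 and 1 share a vocabulary, as do chapters 2 and 3.
toy = ["a b a b", "a b b a c", "x y z x", "x y y z"]
vocab, M = term_chapter_matrix(toy)
sim = chapter_similarity(M, rank=2)
print(most_similar(sim).tolist())  # [1, 0, 3, 2]
```

With real text the choice of `rank` matters a great deal: keep every singular value and this is just cosine similarity of raw word counts, keep too few and everything looks alike.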

Anyhow, the result was astounding! The following table lists, next to each chapter, the chapters that have it as their most highly correlated chapter. Ah, that’s confusing; as an example to clarify this, the chapter most similar to chapters six, seven, eight, and nine is chapter one. With that, here’s the data:

Chapter 1:	2, 6-9
Chapter 5:	11
Chapter 7:	1
Chapter 10:	12-15, 34, 36, 46, 49
Chapter 11:	5
Chapter 15:	16
Chapter 21:	3, 22
Chapter 22:	4, 17-33, 35, 38, 44
Chapter 36:	10
Chapter 37:	43
Chapter 40:	41, 45, 47, 50
Chapter 41:	39
Chapter 45:	37, 42, 48
Chapter 50:	40
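
The layout of the table above can be sketched as follows: start from a map sending each chapter to its most similar chapter, invert it, and collapse runs of consecutive chapters into ranges. The function names are hypothetical, and the small `nearest` map is copied from a few rows of the table rather than recomputed.

```python
from collections import defaultdict


def group_by_nearest(nearest):
    """Invert {chapter: most similar chapter} into {chapter: [chapters that picked it]}."""
    groups = defaultdict(list)
    for chapter, best in sorted(nearest.items()):
        groups[best].append(chapter)
    return dict(sorted(groups.items()))


def compress(numbers):
    """Render sorted integers with consecutive runs collapsed into 'a-b' ranges."""
    parts = []
    start = prev = None
    for n in numbers:
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n  # still inside the current run
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = n
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ", ".join(parts)


# A few entries read off the table: chapter -> its most similar chapter.
nearest = {1: 7, 2: 1, 5: 11, 6: 1, 7: 1, 8: 1, 9: 1, 11: 5}
for chapter, members in group_by_nearest(nearest).items():
    print(f"Chapter {chapter}:\t{compress(members)}")
```

Note that the relation is not symmetric: chapter 7 picks chapter 1 and chapter 1 picks chapter 7, but chapters 6, 8, and 9 pick chapter 1 without being picked back.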

The shocking thing is that for 21 chapters of Genesis, nearly half the book, the most highly correlated chapter is chapter 22, the binding of Isaac. In my mind, that story is the most powerful in Genesis, central to the message, and so it is especially remarkable that this crazy game with matrices also “detected” that most of Genesis clusters around that story.