How Many Novels Can There Be?

I like reading. I like writing. When you’ve been writing for a while, you start to get really obsessed with word counts. Anybody you talk to about publishing something you’ve written will want to know your word count. For short fiction, you sometimes get paid by the word. And the number of words in the thing you’ve written determines whether it counts as a short story, a novella, a novel, as War and Peace, or as an encyclopedia.

Every year, I participate in National Novel-Writing Month. Unless, you know, I don’t feel like it. But I’ve participated more years than not, and I’ve produced a surprising number of novels. Every single one of them terrible, but that’s not NaNoWriMo’s fault. The goal in NaNoWriMo is to write a novel of at least 50,000 words in 30 days. And I got to thinking: how many novels that length are there?

Well, in the English language, there are somewhere between 100,000 and 1,000,000 words. But you’ll be able to understand 95% of everything written in English by knowing only the 3,000 most common ones. After all, even though it’s a valid word, people generally don’t go around calling each other antipodean anymore.

The question is: “How many 50,000-word novels are possible, using mostly the 3,000 most common words?” The naive answer is to allow each word to be any of those 3,000, which means the number of possible novels is 3,000^(50,000). That’s 1.155 x 10^173,856. You’ll be happy to know that this number is so large that, when I tried to copy and paste the full thing into this article, it crashed my browser.
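For anyone who wants to check that figure without melting a browser, here is a minimal sketch that gets the scientific notation from logarithms instead of building the whole number. The helper name `sci_notation` is my own invention for illustration, and ordinary floating-point precision is assumed to be good enough for four significant figures.

```python
import math

def sci_notation(base, exponent, digits=3):
    """Return (mantissa, power_of_ten) for base**exponent, computed via log10."""
    log10_value = exponent * math.log10(base)
    power = math.floor(log10_value)
    # The fractional part of the logarithm gives the leading digits.
    mantissa = 10 ** (log10_value - power)
    return round(mantissa, digits), power

mantissa, power = sci_notation(3000, 50000)
print(f"{mantissa} x 10^{power:,}")
```

The trick is that log10(3,000^50,000) is just 50,000 * log10(3,000), which is a perfectly ordinary number even though the thing it describes is not.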

Of course, this will include novels that consist entirely of the sentence “Anus anus anus anus anus!” over and over again, which is so avant-garde it makes me want to go pee on Samuel Beckett. The list will also contain more coherent, although still somewhat dubious works, like Stuart Ashen’s peerless desk reference, Fifty-Thousand Shades of Grey. But Fifty-Thousand Shades of Grey is actually constructed of coherent sentences. (Well, one coherent sentence, at least…) Most of the novels in this ridiculously long list will be more along the lines of “Him could carpet but also because you die but but the but the but the butt.”

We’re working from a flawed assumption: that a text is just a bunch of words stuck together. But unless you’re James Joyce (or, to a lesser extent, Stephenie Meyer), that’s not how it works. A novel is a bunch of words stuck together in a particular way. Although “that that” is grammatically valid (even though it looks weird on the page), “the the” isn’t, and “centipede cheese carpet muffin” is the kind of thing I say when I haven’t been getting enough sleep.

We’ve been working from the assumption that any word is equally likely to follow any other word. That is, that all word-pairs are equally likely. They’re not. “Our way” is a lot more common than “our anus,” for instance. Naively, the probability of any two-word combination is (1 / 3,000)^2, or 1 in 9,000,000. To put it another way, there are 9,000,000 two-word pairs, 25,000 of which would make up our nonsensical novel. It’d be much closer to reality to assume that, on average, there are only 50 words that make sense after a given word. The number will be much higher for words like “the” (in the thousands, I’d imagine) and lower for words like “hoist.” So, in reality, there are only 3,000 * 50 = 150,000 two-word combinations that make sense.

We could extend this to three-word combinations, but there are two problems with that: 50,000 isn’t evenly divisible by three, and that repeating decimal will drive me crazy. More importantly, the longer your word-block, the more words become possible at the end, until you’re getting close to 3,000 possibilities again. For example: “The” could be followed by any noun in our 3,000-word list. “The man” must be followed by a verb, the start of an adjective phrase (example: “The man I met last summer”), or something like that. “The man talked” will likely be followed by a word like “to” or “about.” But there’s an enormous range of things that the man could be talking to or about, so pretty much any noun or participle is fair game, bringing the number of possibilities back up into the thousands again.

So how many novels can there be? Well, as we’ve seen, the upper bound is probably (3,000 * 50)^25,000, which is 1.912 x 10^129,402. That’s still a number so large there’s no name for it, but it’s smaller than our first number by more than forty-four thousand orders of magnitude, which is something.
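The two-word estimate can be sketched in a few lines, again working with logarithms so nothing explodes. The function name `log10_novel_count` and the idea of passing a list of "choices per position in a block" are assumptions of mine for illustration, not anything official.

```python
import math

def log10_novel_count(choices_per_block, novel_length):
    """log10 of the number of novels made of repeated fixed-size word blocks.

    choices_per_block lists how many words are allowed at each position
    within one block, e.g. [3000, 50] for the two-word estimate.
    """
    block_size = len(choices_per_block)
    if novel_length % block_size != 0:
        raise ValueError("novel length must be a multiple of the block size")
    blocks = novel_length // block_size
    log10_per_block = sum(math.log10(c) for c in choices_per_block)
    return blocks * log10_per_block

naive = log10_novel_count([3000], 50000)      # every word chosen freely
pairs = log10_novel_count([3000, 50], 50000)  # 25,000 two-word blocks

print(f"naive: about 10^{math.floor(naive):,}")
print(f"pairs: about 10^{math.floor(pairs):,}")
print(f"orders of magnitude saved: {naive - pairs:,.0f}")
```

Passing a three-entry list trips the `ValueError`, which is the code's way of agreeing that 50,000 doesn't divide evenly by three.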

But let’s take it one step further. To simplify the math, I’m going to skip right to four-word combinations. And let’s say that any two-word combination forms the start of a phrase, and that the third word in the phrase can only be one of 10 words, on average. And, to take into account the fact that the number of choices starts rising again with a long enough phrase, let’s say the fourth word can be any one of 1,000 words. The number of possible 50,000-word novels is now (3,000 * 50 * 10 * 1,000)^(12,500), or 1.382 x 10^114,701. So we’ve chopped off nearly another fifteen thousand orders of magnitude. Still, that’s a big number. And, although I don’t have the math or linguistics background to prove it, I’m guessing that’s pretty close to the number of actual, sensible novels you could construct with 50,000 words: it takes into account the rough structure of the English language. This is related to the idea of a Markov chain, which is a mathematically formal way of saying “where you’re likely to go next depends on where you’re at now.”
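To make the Markov chain idea a bit more concrete, here is a toy sketch: it records which words follow which in a tiny made-up sentence, then generates text by repeatedly picking one of the recorded followers. The corpus, the function names `build_bigram_table` and `generate`, and the fixed random seed are all assumptions for illustration; a real model would be trained on far more text.

```python
import random
from collections import defaultdict

def build_bigram_table(text):
    """Map each word to the list of words seen immediately after it."""
    words = text.lower().split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table, start, length, seed=0):
    """Walk the chain: the next word depends only on the current word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = table.get(word)
        if not followers:  # dead end: nothing ever followed this word
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Hypothetical miniature corpus, invented for this example.
corpus = "the man talked to the dog and the dog talked to the man about the carpet"
table = build_bigram_table(corpus)
print(generate(table, "the", 10))
```

In this table, “the” has only three possible followers instead of 3,000, which is the same pruning the estimates above are trying to approximate with averages.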

For your amusement, I’m going to back up this post, and try to copy and paste (3,000 * 50 * 10 * 1,000)^(12,500) just below. If you see a horrific salad of numbers, you’ll know it worked. If you see an apology, you’ll know it crashed my browser again. Wish me luck!

Sorry. It didn’t work. Browser crashed again. But that’s probably good news for you, the reader, since, when I pasted the number of possible sensible novels into my word processor, it produced a document 32 pages long consisting of nothing but digits in 12-point Helvetica. I think that’d make most people’s eyes bleed. Or explode. Or sprout wings and fly away.
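If you’d like to see the full number without risking your own browser, a sketch like the following should do it, assuming a reasonably recent Python. Python integers have arbitrary precision, so the exact value can be computed directly. The variable names are mine, and the `hasattr` guard is there because newer Python versions cap integer-to-string conversion by default.

```python
import sys

# Newer Python versions limit int-to-str conversion to a few thousand digits
# by default; lift the limit where that setting exists.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

choices_per_block = 3000 * 50 * 10 * 1000  # options for one four-word block
blocks = 50000 // 4                        # 12,500 blocks in a 50,000-word novel

novel_count = choices_per_block ** blocks  # exact integer, no rounding
digits = str(novel_count)

print(f"{len(digits):,} digits")
print(f"starts with {digits[:4]}...")
```

Since 3,000 * 50 * 10 * 1,000 is 15 * 10^8, the last 100,000 digits are all zeros, so the salad is less varied than you might hope.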

The moral of this story is: don’t worry about machines taking over the writing of novels. If a computer could output one word of its current novel every Planck time (which is generally agreed to be close to the shortest time interval that makes sense in our physics), the time it would take to get through all of them would be larger than the current age of the universe. And that’s an understatement. It would actually be so much larger than the current age of the universe that, if I were to express it as a multiple (in the same way I say 10^24 is a trillion trillion times larger than 1), I’d have to write out the word “trillion” more than 9,500 times just to express it. If I allow the convention that 1 googol googol is (10^100) * (10^100), or 10^200 times bigger than 1, then I’d need to write “googol” over 1,100 times. There is simply no good way to express the size of this number. It’s more than 10^110,000 times larger than the age of the universe in Planck times, the diameter of the observable universe in Planck lengths, and the number of particles in the universe.
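Here is a rough sketch of that comparison, done entirely in logarithms. The Planck time (about 5.39 x 10^-44 seconds) and the age of the universe (about 13.8 billion years, or 4.35 x 10^17 seconds) are approximate values I’m plugging in, and I’m assuming the computer writes out every one of the sensible novels, one word per Planck time.

```python
import math

PLANCK_TIME_S = 5.39e-44      # approximate Planck time in seconds
AGE_OF_UNIVERSE_S = 4.35e17   # roughly 13.8 billion years in seconds

# log10 of the number of sensible novels: (3,000 * 50 * 10 * 1,000)^12,500
log10_novels = 12500 * math.log10(3000 * 50 * 10 * 1000)

# Each novel is 50,000 words, one word per Planck time.
log10_planck_times_needed = log10_novels + math.log10(50000)

# Age of the universe measured in Planck times.
log10_age = math.log10(AGE_OF_UNIVERSE_S / PLANCK_TIME_S)

log10_ratio = log10_planck_times_needed - log10_age
print(f"age of universe: about 10^{log10_age:.0f} Planck times")
print(f"writing time / age of universe: about 10^{math.floor(log10_ratio):,}")
print(f"'trillion' repeated about {log10_ratio / 12:,.0f} times")
print(f"'googol' repeated about {log10_ratio / 100:,.0f} times")
```

The age of the universe only knocks about sixty orders of magnitude off the total, which barely registers against an exponent in the hundred thousands.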

Boy oh boy. I started out talking about novels, and now I’m getting into numbers that trip the circuit breakers in my brain. Math can be scary sometimes. And you wanna know the scariest thing? There are numbers, like Graham’s number and the outputs of the Ackermann function for inputs larger than (6,6), that make the number of possible novels look exactly like zero by comparison, for any practical definition.

…I need to go lie down now. Although I’m probably going to come back later and talk about really enormous numbers, because part of my brain seems to want me to have a stroke.

162 thoughts on “How Many Novels Can There Be?”

  1. Hardy A. Jackson III says:

    Love your post I couldn’t stop laughing at how you were doing the math. Finding the number of possibilities is endless. Well done! 1 like equals 1 following.

      • Who knows? If it’s really good, we might only need one. A novel so insightful there’s no reason to keep writing.

        Still, I have a hunch that people are still going to keep publishing those dodgy-looking erotic Star Trek paperbacks one way or another…

      • Wait- how many titles can there be? That can be the determining factor for number of novels. You can’t really have a title-less novel.
        Am I spelling it right? Or is it tittle? They both look wrong right now…

  2. aditi1641 says:

    Hahaha. I can see you’ve really worked on this. The sad thing is that the numbers simply take out the life of your writing. Who knows? A lovely, deep story could be even as small as a thousand words. Why do we make such a big deal about words? There are so many short stories that are classics and are taught in classes without any thought about word count.
    Word count is important, but I think sometimes people make a big deal out of it when it’s really not necessary. But your post has given me an interesting perspective on writing. 🙂

    • A fair point, of course. But the question wasn’t how many *good* novels there can be. After all, among all these possible novels would be one that goes “When I was eighteen my father told me, ‘When I was eighteen my father told me, ‘When I was eighteen, my father told me, ‘…””

      And that’s a point I should’ve made in the post (and will probably edit in later on): the remarkable thing about us humans is that we can look at all the choices for the next word in a sentence and pick one that makes sense to other humans.

      • That’s a great way to raise the minimum allowable circumference as noted by the ethereal ides of March. I’d say the answer is best contemplated within the same paradigm as the question.

      • The only trouble with that question is that I don’t get to talk about planets, plasma, huge numbers, or silly nonsense, and I don’t get to use the word “anus” once.

      • Truer words have never been uttered from under the tutelage of an anus sensei such as yourself. Though one might argue that a door to nonsensical spermology has hinged in your favour forthwith.

    • I get the feeling I’d be laughed out of TED. Or possibly beaten to death with three-ring binders the second I stepped on stage.

      That said, I’m now imagining Patrick Stewart reading it out and getting depressed knowing it will never happen. XD

      • Wow. But you know what, it will mean something to someone, which is part of the point of TED. I have watched some pretty crazy, superfluous, but over the top amazing speeches on there. 😀

      • Yes. If this guy keeps going this way within 40 days he could do a Ted talk, given he keeps going crazy just the same way 😉

      • You haven’t yet? All I know is every November when all you guys are writing novels (or at least the bare bones beginnings of them), I’m sadly looking at work wishing I could participate. I wish there were more time…like 3000^10,000 more seconds of time.

  3. Pingback: Addendum: How Many Novels Can There Be? | Sublime Curiosity

  4. originaltitle says:

    As a writer who went to a science and engineering institute for college, I can definitely appreciate this post! You’ve quelled the fears of all NanoWrimers out there that it’s possible there are enough novels out there to be produced. Great post. Thanks for sharing!

  5. I liked reading this post. Though what I took from it was not the hugeness of the numbers you were discussing.

    No. It was that you used the word ‘carpet’ twice in your examples of nonsense sentences.

    Which led me to this: http://www.etymonline.com/index.php?term=carpet

    And then to that place where you’ve looked at a word so much it looks wrong. So many thanks for showing me the wrongness of carpet! 😀

  6. Reblogged this on The Worlds Inside Our Minds and commented:
    This post is so very interesting. Don’t get put off by the math. Everyone asks about word count. How many words do you write a day? How long will your novel/novella/short story be? It is all about words. It is thought provoking to see it laid out in black and white the number of novels that could possibly be written – of course, some of those wouldn’t make much sense, but that is taken into account. Don’t worry about the math. You aren’t forced to actually solve those nasty equations. I know it gives people heart palpitations when they see powers, and whatnot.

    It was also interesting to know that a prediction was made about computers writing books. Clearly the author has a solid background in technology, because the reasoning was based on solid facts. Next time you see a computer with an imagination, let me know, I will be the first to buy one.

    So enjoy the post,

  7. Here’s another set of numbers for you: there are 26 letters in the alphabet. Brilliant opus or dusty manuscript shelved in the bottom drawer depends on how they are put together.

    • Precisely! A monkey (mathematically-speaking) is just as likely to produce a nice-sounding word like “superfluous” as “xxbgrklnghk.” That’s the weird thing about being human: we’ve figured out how to pick from all those possibilities.

  8. New Zealander. Living in England. Everybody I know here uses the word antipodean pretty much every time I talk to them. (Just sayin’.)
    This was hilarious. I am now in trouble for laughing in the quiet room of the library. Not sure whether this post was about noveling or statistics, but either way – utterly epic. Just worked out the scary thing: the computer-crashing numbers only dealt with books in English.
    I think the nearest explanation of what a library of all the possible books in the world would look like is Jorge Luis Borges’ The Library of Babel.

    • I’ve yet to read The Library of Babel, but I really need to–it sounds like Borges said the same thing I did, but much more elegantly. To nobody’s surprise. XD

      (Funnily enough, I’ve heard Hugh Laurie say “antipodean” in the bloopers for House, so maybe it’s an English thing.)

  9. This post made me think about that passage in “The Neverending Story” where they claim to be able to write every possible story by throwing a many-sided die bearing all the letters and symbols of the alphabet one time, and another, and another, and another…

  10. Apparently, Tolstoy got paid by the word for “War and Peace”, which is one of the reasons it is so long. He got a contract for publishing it in a magazine in pieces, and, having run out of plot before the end of the contract, had to invent side plots which are absolutely unnecessary and make no sense at all, their only purpose is to avoid contract breach. Go figure.
