math, short, thought experiment

Short: Probabilities

For this thought experiment, let’s equate a probability of 1 (100% chance, a certainty) with the diameter of the observable universe. The diameter of the observable universe is about 93 billion light-years (because, during the 13.8 billion years since it started, the universe has been steadily expanding). With this analogy, let’s consider some probabilities!

According to the National Weather Service, your odds of being struck by lightning this year (if you live in the US, that is) are 1 in 1,042,000. Less than one in a million. One part in a million of the diameter of the universe is 93,000 light-years, which is far enough to take you outside the Milky Way, but on a cosmic scale, absolutely tiny.

The odds of winning the jackpot with a single ticket in the U.S. Powerball lottery are around 1 in 292 million. That’s like 318 light-years set against the diameter of the universe. 318 light-years is a long way. Even so, it’s an almost-reasonable distance. Most of the brighter stars you see in the night sky are closer than that. That’s almost the Sun’s neighborhood. Compared to the entire universe. Maybe that’s why they say the lottery is for suckers…
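The scaling used in these comparisons is a single division. Here is a minimal sketch, assuming the 93-billion-light-year diameter quoted above; the function and variable names are made up for illustration.

```python
# Treat a probability of 1 as the diameter of the observable universe
# (about 93 billion light-years) and scale other odds down from there.
UNIVERSE_DIAMETER_LY = 93e9

def odds_to_light_years(one_in: float) -> float:
    """Length in light-years that stands for odds of 1 in `one_in`."""
    return UNIVERSE_DIAMETER_LY / one_in

lightning_ly = odds_to_light_years(1_042_000)    # struck by lightning this year
powerball_ly = odds_to_light_years(292_000_000)  # Powerball jackpot, one ticket

print(f"lightning: {lightning_ly:,.0f} light-years")
print(f"Powerball: {powerball_ly:,.0f} light-years")
```

Using the exact 1-in-1,042,000 figure rather than a round one-in-a-million gives a slightly shorter lightning distance, a bit over 89,000 light-years.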

The odds of being struck by lightning three times in your lifetime are, mathematically, 1 in 1,000,000,000,000,000,000. The actual odds are even lower, since there’s a non-zero chance that you’ll be killed by a lightning strike, making getting another impossible. If your odds of dying in a lightning strike are 10%, then your odds of surviving are 9/10, and your odds of surviving the first two so you can get the third are (1 in a million) * (9/10) * (1 in a million) * (9/10) * (1 in a million), or about 81 in one hundred million trillion. That’s 81 in 100,000,000,000,000,000,000. That’s roughly the diameter of the Earth-moon system compared to the diameter of the universe.
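The triple-strike arithmetic can be checked in a few lines. This is a rough sketch: the 10% fatality rate is the hypothetical figure from the paragraph above, the one-in-a-million strike odds are rounded, and the kilometres-per-light-year constant is an approximation.

```python
# Odds of three lightning strikes, given you must survive the first two.
LY_IN_KM = 9.4607e12                    # approximate kilometres per light-year
UNIVERSE_DIAMETER_KM = 93e9 * LY_IN_KM  # about 8.8e23 km

p_strike = 1e-6    # rounded odds of one strike
p_survive = 0.9    # hypothetical: 10% of strikes are fatal

# strike, survive, strike, survive, strike
p_three = p_strike * p_survive * p_strike * p_survive * p_strike

# The same probability expressed as a slice of the universe's diameter.
scaled_km = p_three * UNIVERSE_DIAMETER_KM

print(f"probability: {p_three:.2e}")
print(f"scaled length: {scaled_km:,.0f} km")
```

The Moon orbits about 384,400 km from Earth, so the full width of its orbit is roughly 770,000 km, which is the same ballpark as the scaled length printed here.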

The odds of putting 100 pennies in a cup, shaking them up, and scattering them so they all land flat, and then having every single coin come up heads, are 1 in 1,267,650,600,228,229,401,496,703,205,376. That’s the diameter of a grain of sand compared to the entire universe. Literally.

Get a standard deck of cards. Take out the jokers and the instructions. Shuffle the deck and pick a card at random. Do this 25 times. The odds of picking the jack of clubs every single time are like a proton compared to the visible universe.

If you pick 43 letters at random, the odds of forming the string

actisceneielsinoreaplatformbeforethecastlef

(that is, the first 43 letters of Hamlet) are as small as one Planck length (which is the smallest unit of distance that ever gets used in actual physics) compared to the visible universe. For reference, a Planck length is ten million trillion times smaller than a proton, which is itself a trillion times smaller than a grain of salt.
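For the coin, card, and Hamlet comparisons, the numbers are easier to judge in metres. The sketch below counts the equally likely outcomes in each case and scales the universe's diameter down by that count. The proton diameter and Planck length are approximate reference values I am supplying, not figures from the text.

```python
# Scale the universe's diameter by the number of equally likely outcomes.
LY_IN_M = 9.4607e15                   # approximate metres per light-year
UNIVERSE_DIAMETER_M = 93e9 * LY_IN_M  # about 8.8e26 m

PROTON_DIAMETER_M = 1.7e-15  # approximate reference value
PLANCK_LENGTH_M = 1.616e-35  # approximate reference value

def scaled_length_m(outcomes: int) -> float:
    """Length that stands for odds of 1 in `outcomes`."""
    return UNIVERSE_DIAMETER_M / outcomes

coins = 2 ** 100    # 100 coins all landing heads
cards = 52 ** 25    # the same card drawn 25 times from a full deck
hamlet = 26 ** 43   # one specific string of 43 random letters

print(f"coins:  1 in {coins:,} -> {scaled_length_m(coins):.2e} m")
print(f"cards:  {scaled_length_m(cards):.2e} m "
      f"({scaled_length_m(cards) / PROTON_DIAMETER_M:.2f} proton diameters)")
print(f"hamlet: {scaled_length_m(hamlet):.2e} m "
      f"({scaled_length_m(hamlet) / PLANCK_LENGTH_M:.1f} Planck lengths)")
```

With these reference values the card and Hamlet lengths land within about an order of magnitude of a proton and a Planck length, which is as close as an analogy like this needs to be.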

Incidentally, if you assembled random 43-letter strings, you would have to do it

32,143,980,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000

times to have a 99% chance of producing the first 43 letters of Hamlet in one of them. But a human bard did it in, at most, a couple hundred tries. Isn’t that weird? More probability stuff (and black hole stuff) to come!
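That count of tries comes from asking how many independent attempts it takes before the chance of never succeeding drops to 1%. A short sketch, assuming each of the 26 letters is equally likely and every 43-letter string is an independent attempt:

```python
import math

# Chance that one random 43-letter string matches the target exactly.
p = 26.0 ** -43

# Solve (1 - p)^n = 1 - 0.99 for n.
# log1p(-p) stays accurate even though p is far too small to survive 1 - p.
target = 0.99
tries = math.log(1 - target) / math.log1p(-p)

print(f"tries for a 99% chance: {tries:.4e}")
```

Because p is so small, this works out to almost exactly ln(100), or about 4.6, times the number of possible strings.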


How Many Novels Can There Be?

I like reading. I like writing. When you’ve been writing for a while, you start to get really obsessed with word counts. Anybody you talk to about publishing something you’ve written will want to know your word count. For short fiction, you sometimes get paid by the word. And the number of words in the thing you’ve written determines whether it counts as a short story, a novella, a novel, as War and Peace, or as an encyclopedia.

Every year, I participate in National Novel-Writing Month. Unless, you know, I don’t feel like it. But I’ve participated more years than not, and I’ve produced a surprising number of novels. Every single one of them terrible, but that’s not NaNoWriMo’s fault. The goal in NaNoWriMo is to write a novel of at least 50,000 words in 30 days. And I got to thinking: how many novels that length are there?

Well, in the English language, there are somewhere between 100,000 and 1,000,000 words. But you’ll be able to understand 95% of everything written in English by knowing only the 3,000 most common ones. After all, even though it’s a valid word, people generally don’t go around calling each other antipodean anymore.

The question is: “How many 50,000-word novels are possible, using mostly the 3,000 most common words?” The naive answer is to allow each word to be any of those 3,000, which means the number of possible novels is 3,000^(50,000). That’s 1.155 x 10^173,856. You’ll be happy to know that this number is so large that, when I tried to copy and paste it into this article, it crashed my browser.
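A number like this is much easier to handle through its logarithm than as a wall of digits. The sketch below recovers the mantissa and the power of ten from a base-10 log. The function name is made up, and this is just one convenient way to do it, not necessarily how the figure above was produced.

```python
import math

def sci_notation_of_power(base: float, exponent: int) -> tuple[float, int]:
    """Return (mantissa, power_of_ten) such that base**exponent is about
    mantissa * 10**power_of_ten, without building the huge number."""
    log10_value = exponent * math.log10(base)
    power_of_ten = math.floor(log10_value)
    mantissa = 10 ** (log10_value - power_of_ten)
    return mantissa, power_of_ten

mantissa, power_of_ten = sci_notation_of_power(3_000, 50_000)
print(f"{mantissa:.3f} x 10^{power_of_ten:,}")
```

Python can compute 3000 ** 50000 exactly as an integer, but recent versions cap integer-to-string conversion by default, so printing all of its roughly 174,000 digits takes extra configuration. The log route avoids the problem entirely.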

Of course, this will include novels that consist entirely of the sentence “Anus anus anus anus anus!” over and over again, which is so avant-garde it makes me want to go pee on Samuel Beckett. The list will also contain more coherent, although still somewhat dubious works, like Stuart Ashen’s peerless desk reference, Fifty-Thousand Shades of Grey. But Fifty-Thousand Shades of Grey is actually constructed of coherent sentences. (Well, one coherent sentence, at least…) Most of the novels in this ridiculously long list will be more along the lines of “Him could carpet but also because you die but but the but the but the butt.”

We’re working from a flawed assumption: that a text is just a bunch of words stuck together. But unless you’re James Joyce (or, to a lesser extent, Stephenie Meyer), that’s not how it works. A novel is a bunch of words stuck together in a particular way. Although “that that” is grammatically valid (even though it looks weird on the page), “the the” isn’t, and “centipede cheese carpet muffin” is the kind of thing I say when I haven’t been getting enough sleep.

We’ve been working from the assumption that any word is equally likely to follow any other word. That is, that all word-pairs are equally likely. They’re not. “Our way” is a lot more common than “our anus,” for instance. Naively, the probability of any two-word combination is (1 / 3,000)^2, or 1 in 9,000,000. To put it another way, there are 9,000,000 two-word pairs, 25,000 of which would make up our nonsensical novel. It’d be much closer to reality to assume that, on average, there are only 50 words that make sense after a given word. (The number will be much higher, in the thousands, I’d imagine, for words like “the,” and lower for words like “hoist.”) So, in reality, there are only 3,000 * 50 = 150,000 two-word combinations that make sense.
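The “50 words that make sense after a given word” figure is a guess, but the same kind of number can be measured from real text by counting which words actually follow which. Here is a toy sketch of that idea; the sample sentence and function names are invented for illustration, and a real estimate would need a large corpus.

```python
from collections import defaultdict

def successor_table(text: str) -> dict[str, set[str]]:
    """Map each word to the set of words seen immediately after it."""
    words = text.lower().split()
    table: defaultdict[str, set[str]] = defaultdict(set)
    for current, following in zip(words, words[1:]):
        table[current].add(following)
    return dict(table)

def average_branching(table: dict[str, set[str]]) -> float:
    """Average number of distinct successors per word."""
    return sum(len(successors) for successors in table.values()) / len(table)

# Invented toy sample, far too small to say anything about English.
sample = "the man talked to the dog and the dog talked to the man"
table = successor_table(sample)

print(sorted(table["the"]))
print(f"average successors per word: {average_branching(table):.2f}")
```

Multiplying the vocabulary size by the average branching factor gives the number of two-word combinations that were actually observed, which is the role the 3,000 * 50 product plays in the estimate.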

We could extend this to three-word combinations, but there are two problems with that: 50,000 isn’t evenly divisible by three, and that repeating decimal will drive me crazy. More importantly, the longer your word-block, the more words become possible at the end, until you’re getting close to 3,000 possibilities again. For example: “The” could be followed by any noun in our 3,000-word list. “The man” must be followed by a verb, the start of an adjective phrase (example: “The man I met last summer“), or something like that. “The man talked” will likely be followed by a word like “to” or “about.” But there’s an enormous range of things that the man could be talking to or about, so pretty much any noun or participle is fair game, bringing the number of possibilities back up into the thousands again.

So how many novels can there be? Well, the upper bound is probably (as we’ve seen) (3,000 * 50)^25,000, which is 1.912 x 10^129,402. That’s still a number so large there’s no name for it, but it’s smaller than our first number by more than forty-four thousand orders of magnitude, which is something.

But let’s take it one step further. To simplify the math, I’m going to skip right to four-word combinations. And let’s say that any two-word combination forms the start of a phrase, and that the third word in the phrase can only be one of 10 words, on average. And, to take into account the fact that the number of choices starts rising again with a long enough phrase, let’s say the fourth word can be any one of 1,000 words. The number of possible 50,000-word novels is now (3,000 * 50 * 10 * 1,000)^(12,500), or 1.382 x 10^114,701. So we’ve chopped off nearly fifteen thousand more orders of magnitude. Still, that’s a big number. And, although I don’t have the math or linguistics background to prove it, I’m guessing that’s pretty close to the number of actual, sensible novels you could construct with 50,000 words: it takes into account the rough structure of the English language. This is related to the idea of a Markov chain, which is a mathematically-formal way of saying “where you’re likely to go next depends on where you’re at now.”
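The three estimates can be put side by side by working in powers of ten. In this sketch the per-position word counts are read straight off the formulas in the text (3,000, then 50, then 10, then 1,000), and the helper name is hypothetical.

```python
import math

def log10_novel_count(choices_per_block: list[int], novel_words: int) -> float:
    """log10 of the number of novels, when the novel is built from
    repeated blocks whose positions allow the given numbers of words."""
    blocks = novel_words / len(choices_per_block)
    return blocks * sum(math.log10(c) for c in choices_per_block)

NOVEL_WORDS = 50_000
naive = log10_novel_count([3_000], NOVEL_WORDS)                 # any word anywhere
pairs = log10_novel_count([3_000, 50], NOVEL_WORDS)             # two-word blocks
quads = log10_novel_count([3_000, 50, 10, 1_000], NOVEL_WORDS)  # four-word blocks

for label, value in [("naive", naive), ("pairs", pairs), ("quads", quads)]:
    print(f"{label}: about 10^{math.floor(value):,}")

print(f"naive -> pairs: {naive - pairs:,.0f} orders of magnitude")
print(f"pairs -> quads: {pairs - quads:,.0f} orders of magnitude")
```

Subtracting the logs gives the gap between estimates directly, which is easier than comparing numbers with over a hundred thousand digits.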

For your amusement, I’m going to back up this post, and try to copy and paste (3,000 * 50 * 10 * 1,000)^(12,500) just below. If you see a horrific salad of numbers, you’ll know it worked. If you see an apology, you’ll know it crashed my browser again. Wish me luck!

Sorry. It didn’t work. Browser crashed again. But that’s probably good news for you, the reader, since, when I pasted the number of possible sensible novels into my word processor, it produced a document 32 pages long consisting of nothing but digits in 12-point Helvetica. I think that’d make most people’s eyes bleed. Or explode. Or sprout wings and fly away.

The moral of this story is: don’t worry about machines taking over the writing of novels. If a computer could output one word of its current novel every Planck time (which is generally agreed to be close to the shortest time interval that makes sense in our physics), the time it would take to get through every possible novel would be larger than the current age of the universe. And that’s an understatement. It would actually be so much larger than the current age of the universe, that if I were to express it as a multiple (in the same way I say 10^24 is a trillion trillion times larger than 1), then I’d have to write out the word “trillion” 9,558 times just to express it. If I allow the convention that 1 googol googol is (10^100) * (10^100), or 10^200 times bigger than 1, then I’d need to write “googol” over 1,100 times. There is simply no good way to express the size of this number. It’s more than 10^110,000 times larger than the age of the universe in Planck times, the diameter of the observable universe in Planck lengths, and the number of particles in the universe.
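Here is a rough check of those comparisons, again working in powers of ten. The Planck time and seconds-per-year constants are approximate values I am supplying, and the novel count is the four-word estimate (3,000 * 50 * 10 * 1,000)^(12,500) from earlier in the post.

```python
import math

PLANCK_TIME_S = 5.391e-44   # approximate
SECONDS_PER_YEAR = 3.156e7  # approximate
AGE_OF_UNIVERSE_YR = 13.8e9

# log10 of the number of sensible novels (four-word estimate).
log10_novels = 12_500 * math.log10(3_000 * 50 * 10 * 1_000)

# Every novel is 50,000 words, written at one word per Planck time.
log10_words = log10_novels + math.log10(50_000)

# Age of the universe, counted in Planck times.
log10_age = math.log10(AGE_OF_UNIVERSE_YR * SECONDS_PER_YEAR / PLANCK_TIME_S)

log10_ratio = log10_words - log10_age
trillions = log10_novels / 12   # each "trillion" is a factor of 10^12
googols = log10_novels / 100    # each "googol" is a factor of 10^100

print(f"age of universe: about 10^{log10_age:.1f} Planck times")
print(f"writing time / age of universe: about 10^{log10_ratio:,.0f}")
print(f"'trillion' written {trillions:,.0f} times, 'googol' {googols:,.0f} times")
```

The age of the universe is only about 10^61 Planck times, so dividing by it barely dents an exponent above a hundred thousand.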

Boy oh boy. I started out talking about novels, and now I’m getting into numbers that trip the circuit breakers in my brain. Math can be scary sometimes. And you wanna know the scariest thing? There are numbers, like Graham’s number and the outputs of the Ackermann function for inputs larger than (6,6), that make the number of possible novels look exactly like zero by comparison, for any practical definition.

…I need to go lie down now. Although I’m probably going to come back later and talk about really enormous numbers, because part of my brain seems to want me to have a stroke.


Shaving too.

The hilarious and bizarre Mitch Hedberg once said “Every time I go and shave I assume there’s someone else on the planet shaving, so I say ‘I’m gonna go shave, too.'” Because I am an obsessive fool who can’t leave anything alone, I started wondering if you could actually reasonably say that. I mean, there are a lot of people in the world, and a lot of people who shave, so it’s entirely possible that there’s someone shaving every second of the day.

This is a perfect place for Fermi estimation, or, if you prefer, back-of-the-envelope calculation. It’s a great method for getting a quick idea of the scope of a problem.

There are about 7 billion people on Earth. In many cultures, only the men shave. Let’s assume that half of the people in the world are men. That gives us 3.5 billion potential shavers. But, except in rare cases, men don’t start shaving until their beards begin growing at puberty. Let’s say beard growth starts at age 15. A randomly-chosen person could be pretty much any age, let’s say from 0 to 70. Only the men between 15 and 70 shave, which comes out to about 79% of them, or roughly 2.756 billion men.

It takes me about 15 minutes to shave. Let’s assume that all the men in the world shave every day at a random time of day (this ain’t a realistic assumption, lemme tell you, but it’ll help compensate for the fact that most of the men in the world are in a different time zone than me, and for other weird factors like that). There are 96 15-minute blocks in a 24-hour day. The probability of a man picking a particular 15-minute block to shave is 1/96, or about 0.01. Therefore, the probability of a man not picking that block is about 0.99. The probability of every shaving-age man not picking the block in which I’m shaving is 0.99^(2,756,000,000), or 2.045e-12029404. When you see a negative exponent that large, your number is, by any sensible definition, zero.
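Evaluating 0.99^(2,756,000,000) directly in ordinary floating point just underflows to zero, so the sketch below goes through logarithms instead. It uses the rounded 0.01 per-block probability and the 2.756 billion shavers from the estimate above; the variable names are made up.

```python
import math

shavers = 2_756_000_000  # shaving-age men, from the estimate above
p_block = 0.01           # rounded chance of picking my 15-minute block (1/96)

# log10 of the chance that nobody at all picks my block.
log10_nobody = shavers * math.log10(1 - p_block)
power_of_ten = math.floor(log10_nobody)
mantissa = 10 ** (log10_nobody - power_of_ten)

# How many men you'd expect to be shaving alongside you in any one block.
expected_company = shavers / 96

print(f"P(nobody else is shaving) is about {mantissa:.3f}e{power_of_ten}")
print(f"expected fellow shavers: {expected_company:,.0f}")
```

The expected-company figure is the friendlier way to see it: under these assumptions, tens of millions of men share any given 15-minute block.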

Mitch had it right. So, from now on, when I shave, I’ll say “I’m gonna shave, too.”
