Uncategorized

The curse of dimensionality.

(The coolest part of this article will be that “The Curse of Dimensionality” is an actual legitimate term from computer science, even though it sounds like something from H.P. Lovecraft’s nightmares. But it’s a real scientific thing. Take a look.)

I like cellular automata. A cellular automaton is a bunch of cells arranged on some kind of grid. Each cell has a “state” variable. Each time-step, it decides which state to go into next according to a rule which looks at its current state and the state of all the cells around it. It’s about as simple as a computer simulation gets, but even in one dimension, it can produce amazing psychedelic patterns like this one:

Image

(The picture is two-dimensional, of course. Each timestep, the one-dimensional line of cells is rendered as a line of colored pixels, with its distance from the top increasing as time goes on. Click on the picture to get a version that isn’t all compressed and horrible.)

It doesn’t take much processing power to run a program like this. For instance, a 1-D cellular automaton with 1,000 cells and 8 possible states would only require 1,000 bytes of memory.

2-D automata take up a little more space, but they can still run smooth as butter on most computers, especially with neat, streamlined code. In two dimensions, an 8-state cellular automaton on a grid measuring 1,000 by 1,000 (1,000,000 cells) would only require a megabyte.

But you know me. I don’t like it when things are easy. I like it when things are weird. What if, for instance, we imagined a 3-dimensional cellular automaton? Here, each cell has 27 neighbors (itself and the cells adjacent to it and touching its corners and edges). If this cube of cells measured 1,000 cells on an edge (and each cell could still be in one of 8 states), then the cellular automaton would need 1 gigabyte of memory. I’m old enough to remember when that was an impressive amount, and indeed, processing a gigabyte of at 30 frames per second takes a bit of power. More power than a Pentium III chip had in 1999, at least.

But couldn’t we go bigger? OF COURSE! In a 4-dimensional cellular automaton, every cell has 81 neighbors, we’ve got a trillion cells, a terabyte of data (you’re probably going to need an extra hard drive), and our throughput would strain even a modern graphics card.

But let’s go BIGGER! I like the number 6, so let’s imagine a cellular automaton in 6 dimensions. Every cell has 729 neighbors adjacent to it. (Remember the good old days, back in 2 dimensions, when every cell had a measly 9 neighbors?). We’re going to need 1 million hard drives to store all those 8-bit numbers (there are 1,000,000,000,000,000,000 of them, after all), and we’re going to need a supercomputer more powerful than any currently in existence to handle the 60 exaFLOPS needed to maintain a smooth 60-FPS framerate. We will also need an air conditioning unit the size of a house, but don’t we always when I’m around?

But this is baby stuff. When scientists talk about high-dimensional mathematics, they’re not talking about hypercubes in 4 or 6 dimensions. They’re talking about crazy curves and datasets in, say, 256 dimensions! 256-D sounds like fun, so let’s go there!

Okay. I think I went too far. Again. For a start, every cell in our 256-D automaton will have 139008452377144732764939786789661303114218850808529137991604824430036072629766435941001769154109609521811665540548899435521 neighbors, which is more than a googol, which is a LOT. Second, the full 1,000 by 1,000 by 1,000 (by 1,000 until you hit 256) automaton would require

1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

bytes to store. That’s 768 zeroes. We’re talking numbers without names here.

What the hell happened? I mean, 256 isn’t exactly a small number. It’s certainly larger than the number of pounds of cilantro you’d want to eat, but it’s hardly huge.

Well, that’s the Curse of Dimensionality. A cube’s N-dimensional “content” (which is an analogue for 3-dimensional volume or 2-dimensional area) is equal to S^N, where S is the length of an edge. And as anybody who’s ever played with numbers knows, throwing exponents into things makes numbers get scary very fast.

And there’s another odd problem that you don’t notice much in two or three dimensions. A circle with a diameter of 2 has an area of (approximately) 3.142, while a square with an edge length of 2 has an area of 4. The ratio of square area to circle area is 1.273. A sphere with a diameter of 2 has a volume of 4.189, while a cube with an edge length of 2 has a volume of 8. The ratio is now 1.910. Well, the ratio keeps going up. And it goes up FAST. A 256-dimensional cube with edge length 2 has a hyper-volume (content) of 115792089237316195423570985008687907853269984665640564039457584007913129639936, which is a lot. Meanwhile, a 256-dimensional sphere with a diameter of 2 has a content of 1.1195×10^-152. That’s an absurdly small number. 1.1195×10^-152 universe diameters (and in case you forgot, the universe is so large that trying to fit its size into your head would probably physically kill you) is ten million trillion trillion trillion trillion trillion trillion trillion times smaller than the smallest distance which makes any physical sense (Side note: isn’t it weird that there’s a smallest physically-sensible distance? Reality has a finite resolution!)

What this all boils down to is that the vast majority of a hyper-cube’s content is in its corners. And, as anybody who’s ever tried to dust a room knows, corners are a pain in the ass. Which is why it’s so hard to work on data-sets in high-dimensional space.

Here’s another interesting fact. I was talking about Lovecraftian horrors at the start. And, although he gets binned as a horror writer, he was actually kind of a horror/sci-fi writer. He liked thinking about non-Euclidean geometry and extra dimensions and other such stuff that was wild and revolutionary at the time. Before I go get some much-needed sleep (I keep having weird dreams about having to take tests so I can work at a pizza parlor. Trust me, they’re exhausting.), let’s work out just how large a 256-dimensional creature would be.

A human being contains approximately 7×10^27 atoms. That’s a lot. If we take (7×10^27)^(1/3) (the cube root), we get how many atoms would lie along the edge of a cube with that many atoms. A cube about the density of a human containing about the same amount of stuff as a human would be 40 centimeters on an edge (Don’t try to imagine ways to make a human into a perfect cube. None of them are nice.) What about a 6-dimensional human? Well, because you can pack so much stuff into a 6-cube, a creature with as many atoms as a human and the same approximate atom-to-atom distance as a human would form a cube 8.7 microns on an edge, about as large as a medium-large bacterium or a human cell. In 256 dimensions, you’d actually run out of atoms before you ran out of places to put them. Imagine you put down one atom at one corner of a big 256-D grid. Then, you start putting atoms in the neighboring cells. Well, in 256-D, there are 115792089237316195423570985008687907853269984665640564039457584007913129639935 neighboring cells, which is more than the number of atoms in a person, so our 256-D human would be no more than two atoms across (no larger than a molecule, in other words) along any dimension.

I don’t know about you, but I’ve succeeded in tripping all the circuit breakers in my head. I’m going to lie down. But I’ll see you again soon, after I find something else headache-inducing to think about.

Standard

5 thoughts on “The curse of dimensionality.

  1. Pingback: The Phantom haunts the CT scanner. | Sublime Curiosity

  2. Pingback: The Curse of Dimensionality, Part 2 | Sublime Curiosity

    • Did I forget to give the unit its proper name? It’s called the Planck Length. As I understand it, there’s still some debate in physics concerning whether or not distance scales smaller than the Planck Length have any real meaning, but basic quantum mechanics (which has turned out to be one of the better models for the universe that humanity has come up with) strongly suggests that the universe is essentially nonsense on scales smaller than the Planck Length.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s