Tuesday, 23 November 2010

The PiSBN Project

[Jan 2015 update - thanks to Cliff Pickover for an unexpected Twitter boost to an old article. I should point out that his books on maths and science are excellent; I used to deal with them a lot as a science buyer at Blackwell's.  See also a followup at the end on ISBN10]

I don't normally apologise for being a bit of a geek.  There's a part of my brain that's into computers and space and science and all that, it's just the way things are.

But to be honest, even I cringe a little at my latest little project.  Sorry about this.

Pi is an interesting number.  It goes on forever, doesn't repeat itself, and appears to be entirely random.  And in an infinite, truly random sequence, you get every possible combination of numbers eventually (nobody has actually proved that Pi behaves this way, but it certainly looks like it does).  Your phone number is in there somewhere.  And every book written, if you convert the numbers into ASCII.

Searching Pi for books like that would be stupid though, you'll burn up the best computers on the planet before you get anything worthwhile.  There is, however, a quicker alternative.

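Almost every single book published since 1966 has an ISBN number.  These days nearly all of them start 978, then there's another ten digits.  The last one's a check digit: multiply the first twelve digits alternately by 1 and 3, add the results up, and the check digit is whatever takes the total up to the next multiple of ten.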
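For the curious, the check digit rule can be sketched in a few lines of Python. This is a minimal illustration of the standard ISBN-13 checksum, not the original program (which isn't reproduced in this post); the function names are made up for the example, and 9780306406157 is just a sample number whose checksum works out.

```python
def isbn13_check_digit(first12: str) -> int:
    """Return the check digit for the first twelve digits of an ISBN-13."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 decimal digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the left.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    # The check digit tops the weighted sum up to a multiple of ten.
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """True if isbn is 13 digits and its last digit matches the checksum."""
    if len(isbn) != 13 or not (isbn.isascii() and isbn.isdigit()):
        return False
    return isbn13_check_digit(isbn[:12]) == int(isbn[12])


print(isbn13_check_digit("978030640615"))  # 7
print(is_valid_isbn13("9780306406157"))    # True
```

Note that only one in ten random strings starting 978 will pass this test, which is what makes the search worth doing at all.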

So I wrote a program that searches Pi for ISBN numbers.  Then it checks them to see if the check digit is a valid one.  Then it looks the ISBN up on Google Books.
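The first two steps might look something like the sketch below. It is a reconstruction under assumptions, not the original code: it assumes the decimal digits of Pi are already loaded into a string (everything after the "3."), the names `find_isbn13_candidates` and `is_valid_isbn13` are hypothetical, and the Google Books lookup is left out entirely. The sample string is a made-up run of digits with an ISBN planted in it, not a real stretch of Pi.

```python
from typing import Iterator, Tuple


def is_valid_isbn13(isbn: str) -> bool:
    """Checksum test; assumes isbn is a string of exactly 13 decimal digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn[:12]))
    return (10 - total % 10) % 10 == int(isbn[12])


def find_isbn13_candidates(
    digits: str, prefix: str = "978"
) -> Iterator[Tuple[int, str]]:
    """Yield (decimal place, candidate) for each checksum-valid window.

    The decimal place is 1-based, counting from the first digit of `digits`.
    """
    start = digits.find(prefix)
    while start != -1 and start + 13 <= len(digits):
        candidate = digits[start:start + 13]
        if is_valid_isbn13(candidate):
            yield start + 1, candidate
        # Resume one character along, so overlapping matches are not missed.
        start = digits.find(prefix, start + 1)


# Made-up digits with a checksum-valid number planted at the sixth place.
sample = "14159" + "9780306406157" + "2653"
print(list(find_isbn13_candidates(sample)))  # [(6, '9780306406157')]
```

Each candidate that survives this filter would then need looking up to see whether it belongs to a real book, which is the slow part.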

I got three hits in the first fifty million digits of Pi.  It took about ten minutes.  Actually, it took about three hours to write the thing properly, another hour debugging it, and a frustrated lie in the bath half way through.  And about six cups of tea.  Once it actually worked it was fairly quick though.

And so, ladies and gentlemen, I present for your edification and entertainment....

The first three books in Pi!

(Cue fireworks.)

At the four hundred and nine thousand, seven hundred and eighty-third decimal place, we have:

Licentiate seminar on environmental engineering and biotechnology
by the Tampereen teknillinen korkeakoulu. Bio- ja ympäristötekniikka.
(Tampere University of Technology. Department of Biological and Environmental Engineering.) 

I have to admit that I haven't read this one, but I'm quite chuffed that the first book (well, journal) is something a bit geeky.


In second place, at the two million, one hundred and twenty thousand, two hundred and fourth place, is:

Sneeuwwitje en Rozerood
by Jacob Grimm


Or as it's known in English, Snow White and Rose Red.  And because the original is not covered by copyright you can get a copy from the excellent Project Gutenberg.  Not hugely geeky, but there is something wonderfully gothic and conspiratorial about one of the Grimms' tales turning up.

And taking the bronze, at the three million, six hundred and thirty thousand and thirty-third decimal place:

The healing knife
by George Sava

A curious sounding book written under a nom de plume by a "noted Harley Street Surgeon".

I'll bet you feel better now that you know what the first three ISBN numbers are in Pi?  I know I certainly do.

Sorry again.


2015 Followup
Bookish types will have recognised that I searched for ISBN13 versions - and that there's also a 10 digit version which was used until 2007 (coincidentally when I first worked as a bookseller, and witnessed much gnashing of teeth from people who had 40 years' experience as booksellers or, worse, software suppliers to the book trade).

The 13 digit version was brought in to align the whole system with an over-arching barcode; the "978" was added to denote "bookworld" - books, magazines, journals, that kind of thing. Unfortunately, it also screwed with the check digit at the end, meaning that also had to be recalculated.  As a bonus, however, the new check digit could only be 0-9, while the ISBN10 version could also have a check digit of "10" which was encoded as "X".  This is the main reason I went with ISBN13 in my search: Pi doesn't contain any X's.
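To make the difference concrete, here is a small sketch of the standard ISBN10 checksum (weights 10 down to 2, working modulo 11) and of the conversion to ISBN13, which shows why the check digit has to be recalculated. The function names are hypothetical, and the two sample numbers are simply ones whose checksums work out.

```python
def isbn10_check_char(first9: str) -> str:
    """Return the ISBN10 check character ('0'-'9' or 'X') for nine digits."""
    if len(first9) != 9 or not (first9.isascii() and first9.isdigit()):
        raise ValueError("expected exactly 9 decimal digits")
    # Weights run 10, 9, ..., 2 across the first nine digits.
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    value = (11 - total % 11) % 11
    # Modulo 11 allows a value of ten, which is written as 'X'.
    return "X" if value == 10 else str(value)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old check character, compute the new check digit."""
    body = "978" + isbn10[:9]
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


print(isbn10_check_char("030640615"))  # 2
print(isbn10_check_char("080442957"))  # X
print(isbn10_to_isbn13("0306406152"))  # 9780306406157
```

The nine digits in the middle stay the same; only the prefix and the final check character change.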

However, a search for ISBN10s would be far more profitable in terms of books discovered, even ignoring the missing ones with X's at the end. A quick and dirty guess would suggest thousands of times more, and now, the internet being what the internet is, somebody has run with it in a glorious way.
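A rough back-of-envelope version of that guess can be written down as follows. It assumes the digits behave like uniformly random ones and that roughly one in eleven ten-digit windows passes the ISBN10 checksum with a numeric check digit; both are simplifying assumptions, and it only counts windows that could be an ISBN, not ones that belong to real books. The second half just compares the hit rates reported in this post (three in fifty million digits for ISBN13, and David Fiander's 1,168 in one million for ISBN10), which came from different searches and lookups, so the two ratios aren't strictly comparable.

```python
from fractions import Fraction

# Chance that a random 13-digit window starts with 978 AND passes the checksum.
p_isbn13 = Fraction(1, 1000) * Fraction(1, 10)

# Approximate chance that a random 10-digit window passes the ISBN10 checksum
# (assumption: the required check value is roughly uniform over 0..10).
p_isbn10 = Fraction(1, 11)

expected_ratio = p_isbn10 / p_isbn13

# Hit rates per digit searched, using the figures quoted in this post.
observed_isbn13 = Fraction(3, 50_000_000)
observed_isbn10 = Fraction(1168, 1_000_000)
observed_ratio = observed_isbn10 / observed_isbn13

print(round(float(expected_ratio)))  # 909
print(round(float(observed_ratio)))  # 19467
```

So the simple model predicts roughly nine hundred times as many candidates, while the reported hit rates differ by a factor of nearly twenty thousand.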

David Fiander has found a total of not three, not thirty, not three hundred, but 1,168 books in the first million digits, and he's posted a great big spreadsheet with all of them on it on his blog.  My thanks to him for turning this from a geeky little project into a geeky bigger project.

13 comments:

  1. This is awesome. What a creative way to find a good book! :)

  2. This is too cool! When are we gonna see more?

  3. Awesomely geeky! Maybe you could turn this into a distributed processing project like SETI@Home. You could call it The Library of Pi. :-)

    Replies
    1. Borges already beat me to that one - Pi and the Library of Babel are essentially the same thing:

      https://archive.org/details/TheLibraryOfBabel

  4. Minor code bug: pi actually encodes ISBNs in reverse order!

    Replies
    1. Ha! Point me to somewhere listing Pi in reverse order and I'll get on it ;)

  5. Since we're already this meta it might also be interesting to know which *possible* ISBNs are in PI.

    These are "only" actual ISBNs but technically there could be more as long as the checksum is right, right?

    Replies
    1. As I understand it, the script does this in step 1.

  6. They're all in there; the ones without valid checksums, the ones with valid checksums but which haven't been assigned to publishers, imprints or books yet, and all currently valid ISBNs alike. As you can see though, they're pretty widely scattered with just the three in the first 50,000,000 digits - the original code is on a hard drive in a box somewhere, but from memory there were hundreds of valid-as-in-checksum ISBNs and only the three that were actually real.

  7. A little update for any readers from spiegel.de - my apologies for using "ISBN number" - this, as pointed out in the comments, is indeed a case of RAS syndrome (Redundant Acronym Syndrome syndrome), similar to, and just as annoying as, PIN number. This error is entirely mine, not that of Spiegel, and I accept full responsibility (while blaming the entire book trade for still talking about ISBN numbers).

    Secondly, thanks to the user who pointed out that the edition of Snow White & Rose Red is actually in Dutch, I hadn't realised.

  9. I wonder had it all gone horribly wrong and had ended up with some very specialist illustrated adult literature would this article still have been published. We hope so…. (Grin)
