Bits of Intellectual Property II

I was reading "What Color are the Bits?", an in-depth discussion of the interplay between information theory and copyright law. I like this discussion, with one exception.

The authors reference an interesting thought experiment. If I take a copyrighted file and x-or it with a public domain file to produce gibberish, is that gibberish still copyrighted? Their argument is that a lawyer would say "yes, of course, those bits still came from someone's intellectual property", and a computer scientist would say "no, now it's just nonsense".

Why the confusion?

These two differing opinions arise because each party is considering different pieces of information. When we scramble the bits of a copyrighted file, we might render the file itself meaningless. However, not all the information is contained within the file. We must also consider the information needed to decode the file. The scrambled file, along with the string "This file can be decoded by x-oring it with the file found at www.foo.bar/baz57", constitutes an encoding of the original, copyrighted work. Without the knowledge of how to decode the x-ored file, it appears to be gibberish. With just those few extra bits, however, we can reconstruct the original copyrighted work.
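
To make this concrete, here is a minimal sketch of the thought experiment in Python. The two byte strings are made-up stand-ins for the copyrighted file and the public domain file, and the helper name xor_bytes is my own invention. The point is only that the scrambled bytes plus the knowledge "x-or it with that file" get you the original back.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical stand-ins for the two files in the thought experiment.
copyrighted = b"An original, copyrighted sentence."
public_domain = b"It was the best of times, it was the worst of times"

# Use as much of the public domain file as we need to cover the work.
pad = public_domain[:len(copyrighted)]

# The "gibberish" that would be distributed.
scrambled = xor_bytes(copyrighted, pad)

# Anyone who knows which file to x-or with recovers the work exactly.
recovered = xor_bytes(scrambled, pad)

print(scrambled)
print(recovered)
```

Applying the same x-or twice cancels out, which is why the one function both scrambles and unscrambles.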

This may sound familiar. The x-or example is really an example of encryption. In this case, the decryption key is a reference to the public-domain file that can be used to recover the original copyrighted work. I believe common sense dictates that encrypting a copyrighted work does not strip it of its copyrighted status, although the bits (the encoding) may change significantly. The lawyers are correct, provided that the decryption scheme is distributed in a way that can be associated with the x-ored file.

This logic also applies to so-called illegal numbers. These are numbers which, if represented in binary, correspond to information that it is illegal to possess. People, myself included, have gotten terribly excited about this. How can a number be illegal? This must mean our entire legal system is bankrupt.

No. These numbers are only illegal if you know how to use them.

For instance, this prime is perfectly innocuous, unless, of course, you mention that "it unzips via the gzip algorithm into the C source for a program that breaks DVD encryption". That last piece of information is crucial. It is, in fact, the (number, how_to_use) pair that is in violation. Neither on its own has any meaning.
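
A rough sketch of the (number, how_to_use) pair, again in Python. The function names and the harmless placeholder source are mine, and the number produced here is just whatever integer the gzip bytes happen to spell out; it is not the prime mentioned above and is not chosen to be prime at all. The integer is only "about" anything once you also know the recipe: write it out as big-endian bytes, then gunzip.

```python
import gzip


def text_to_number(text: str) -> int:
    """Gzip the text and read the compressed bytes as one big integer."""
    # mtime=0 keeps the gzip header, and so the number, reproducible.
    compressed = gzip.compress(text.encode("utf-8"), mtime=0)
    return int.from_bytes(compressed, "big")


def number_to_text(n: int) -> str:
    """The 'how_to_use' half: integer -> bytes -> gunzip -> text."""
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return gzip.decompress(raw).decode("utf-8")


# Harmless placeholder standing in for the program source.
source = "int main(void) { return 0; }\n"

n = text_to_number(source)
print(n)                  # just a large integer
print(number_to_text(n))  # meaningful only with the decoding recipe
```

The round trip works because a gzip stream begins with a non-zero byte, so no leading bytes are lost when the data is treated as an integer.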

We do not consume, directly, series of zeros and ones from our computers. To ascribe meaning to these sequences of bits, we define procedures for turning these bits into something more familiar. For example, the ASCII standard defines how to turn a sequence of 8-bit chunks into the text you're reading now. The MP3 codec defines how to turn a series of bits into audible sound. Information is always paired with a decoding algorithm to convert it into something meaningful to humans. This concept is captured by file extensions: we, or at least our computers, know to interpret 'foo.mp3' differently from 'foo.txt' or 'foo.doc'. All of these tricks to disguise copyrighted or illegal information are just clever re-encoding.
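
As a small illustration of "same bits, different decoder", the sketch below interprets one four-byte sequence three ways. The extension-like keys ("txt", "int", "pix") and the interpret helper are hypothetical names I made up for the example; they are not real file formats.

```python
# One fixed sequence of bits (four bytes).
bits = bytes([72, 105, 33, 33])

# Hypothetical "file extensions", each naming a decoding procedure.
DECODERS = {
    "txt": lambda b: b.decode("ascii"),         # bytes as ASCII text
    "int": lambda b: int.from_bytes(b, "big"),  # bytes as one big-endian integer
    "pix": lambda b: list(b),                   # bytes as grayscale pixel values
}


def interpret(data: bytes, extension: str):
    """Decode the same data according to the chosen extension."""
    return DECODERS[extension](data)


for ext in DECODERS:
    print(ext, interpret(bits, ext))
```

The bytes never change; only the procedure attached to them does, and with it what a human would say the data "is".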

Since it is impossible to outlaw sequences of bits, it follows that to stop distribution of copyrighted or illegal information, you need only dissociate the (information,encoding) pair. Both the information and the encoding scheme have legitimate uses on their own, but together they represent the infringing file. Some legal definitions more true to information theory might look like:

"An (information,encoding) pair A is considered to infringe upon an existing copyrighted (information,encoding) pair B if and only if the decoding of A would be considered infringing on the decoding of B."


"For a given illegal or copyrighted (information,encoding), it is unlawful to distribute (information) and (encoding) in such a way that through expressed or implied means, the (information,encoding) pair can be reconstructed".

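The first definition can be loosely sketched in code. This is a toy model under a strong simplifying assumption of mine: two pairs count as "the same work" only when they decode to identical bytes, whereas what a court would consider infringing is far broader than byte equality. The Encoded type, the function names, and the sample data are all hypothetical.

```python
import gzip
from typing import Callable, NamedTuple


class Encoded(NamedTuple):
    """An (information, encoding) pair: raw bytes plus how to decode them."""
    information: bytes
    decode: Callable[[bytes], bytes]


def decodes_to_same_work(a: Encoded, b: Encoded) -> bool:
    """Toy stand-in for 'infringes': compare decodings, not raw bits."""
    return a.decode(a.information) == b.decode(b.information)


work = b"An original, copyrighted sentence."
pad = bytes(range(1, len(work) + 1))  # arbitrary stand-in key


def xor_with_pad(data: bytes) -> bytes:
    """XOR with the pad; applying it twice restores the input."""
    return bytes(x ^ k for x, k in zip(data, pad))


plain = Encoded(work, lambda b: b)
xored = Encoded(xor_with_pad(work), xor_with_pad)
zipped = Encoded(gzip.compress(work), gzip.decompress)
unrelated = Encoded(b"Some other text entirely.", lambda b: b)

print(decodes_to_same_work(plain, xored))      # raw bits differ, decodings match
print(decodes_to_same_work(xored, zipped))
print(decodes_to_same_work(plain, unrelated))
```

The second definition then amounts to saying that handing out the bytes and the decode step together, or in a way that lets them be matched up, is what gets restricted.
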
I'm not sure what the implications of this are, if any.


  1. the lawyers don't care about this technical gibbering. they want to be able to say, X has ownership and distribution rights of the Content, and if you devise a method of circumventing his rights, then you are an infringer.

    this objection "but you can't make a number illegal!" doesn't actually have anything to do with copyright in particular. if the number encodes the message "i will pay $10,000 for you to kill person X", then that number is also an 'illegal number'...

    nondisclosure contracts are similar.

    the law never cares about method of encoding or communicating, only about meaning and intent. information theory is not actually relevant here. i think it's strange that the "illegal numbers" concept has become such a widespread meme; it's pretty obvious that it is a canard.

  2. actually, i guess it is widespread because the argument was used in court by sony... what a crock.

  3. Anonymous, 31.3.11

    Yes, I was trying to give a definition of "meaning and intent" that would be satisfying enough to a computer scientist, since the "illegal number" case seems to be turning into a common misconception.

  4. Interesting. This suggests that content has some existence outside of any specific encoding. Imagine that you have a movie A, on film medium A*. This is identical to video B, on VHS medium B*. Both of these entities are the same, under copyright law, but neither is specifically the copyrighted entity, X, which is legally protected.

    So, under this framework, are decoding methods that could be used for piracy legal in the absence of a 'magic number'?

    Weird... this is a much more pro-corporate account of copyright than I've come to expect from you.

  5. I'm a little confused. Yes, if (A,A*) and (B,B*) decode to the same viewable movie projection, then they should be considered equivalent under copyright. If there is a reference work (X,X*) that is copyrighted, then both (A,A*) and (B,B*) would be considered legal copies of (X,X*), and protected/restricted under law.

    "Weird... this is a much more pro-corporate account of copyright than I've come to expect from you."

    Mostly, I was trying to point out a growing crack in information theoretic reasoning about copyright. It looks like I'm defending the Corporate viewpoint, but I am merely proposing more logical definitions for the existing law. I had a copy-leftist conclusion prepared, but decided to let the math/logic stand on its own and replace it with "I'm not sure what the implications of this are, if any."