I was reading "What Color are the Bits?", an in-depth discussion of the interplay between information theory and copyright law. I like this discussion, with one exception.
The authors reference an interesting thought experiment. If I take a copyrighted file and x-or it with a public domain file to produce gibberish, is that gibberish still copyrighted? Their argument is that a lawyer would say "yes of course, those bits still came from someone's intellectual property", and a computer scientist would say "no, now it's just nonsense".
Why the confusion?
These two differing opinions arise because the two parties are considering different pieces of information. When we scramble the bits of a copyrighted file, we might render the file itself meaningless. However, not all the information is contained within the file. We must also consider the information needed to decode the file. The scrambled file, along with the string "This file can be decoded by x-oring it with the file found at www.foo.bar/baz57", constitutes an encoding of the original, copyrighted work. Without the knowledge of how to decode the x-ored file, it appears to be gibberish. However, with just a few more bits we can reconstruct the original copyrighted work.
This may sound familiar. The x-or example is really an example of encryption. In this case, the decryption password is a reference to the public-domain file that can be used to recover the original copyrighted work. I believe common sense dictates that encrypting a copyrighted work does not strip it of its copyrighted status, although the bits (encoding) may change significantly. The lawyers are correct, provided that the decryption scheme is distributed in a way that can be associated with the x-ored file.
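The x-or argument can be made concrete with a short Python sketch. Everything in it is an illustrative assumption rather than anything from the original discussion: the helper name xor_bytes, the placeholder "copyrighted" text, and the use of the opening of Moby-Dick as the public-domain file, truncated to the same length.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

# Placeholder standing in for a copyrighted work.
copyrighted = b"Placeholder for a copyrighted work"

# Stand-in for the public-domain file; only its first bytes are used.
public_domain = b"Call me Ishmael. Some years ago--never mind how long precisely"
key = public_domain[:len(copyrighted)]

# The scrambled bytes look like gibberish on their own.
scrambled = xor_bytes(copyrighted, key)

# Anyone who knows which file to x-or with gets the original back.
recovered = xor_bytes(scrambled, key)
```

The scrambled bytes carry no obvious meaning, yet applying the same x-or with the same public-domain bytes returns the original exactly. The "few more bits" are just the knowledge of which key to use.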
This logic also applies to the illegal numbers. These are numbers which, if represented in binary, correspond to information that it is illegal to possess. People, myself included, have gotten terribly excited about this. How can a number be illegal? This must mean our entire legal system is bankrupt.
No. These numbers are only illegal if you know how to use them.
For instance, this prime is perfectly innocuous, unless, of course, you mention that "it unzips via the gzip algorithm into the C source for a program that breaks DVD encryption". That last piece of information is crucial. It is, in fact, the (number, how_to_use) pair that is in violation. Neither on its own has any meaning.
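Here is a minimal sketch of the (number, how_to_use) idea, using a harmless placeholder program rather than anything illegal. The resulting number is not claimed to be prime, and the function name decode is my own invention. The only point is that the integer means nothing until you are told "write it as big-endian bytes, then gunzip".

```python
import gzip

# Harmless placeholder standing in for the C source in question.
source = b"int main(void) { return 0; }\n"

# Compress the source, then read the compressed bytes as one big integer.
compressed = gzip.compress(source)
number = int.from_bytes(compressed, "big")

def decode(n: int) -> bytes:
    """The how_to_use half: integer -> big-endian bytes -> gunzip."""
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return gzip.decompress(raw)

# With the instructions in hand, the number turns back into the source.
restored = decode(number)
```

The round trip relies on a gzip stream starting with a non-zero byte, so no leading bytes are lost when the data is treated as an integer.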
We do not consume, directly, series of zeros and ones from our computers. To ascribe meaning to these sequences of bits, we define procedures for turning these bits into something more familiar. For example, the ASCII standard defines how to turn a sequence of 8-bit chunks into the text you're reading now. The MP3 codec defines how to turn a series of bits into audible sound. Information is always paired with a decoding algorithm to convert it into something meaningful to humans. This concept is captured by file extensions: we, or at least our computers, know to interpret 'foo.mp3' differently from 'foo.txt' or 'foo.doc'. All of these tricks to disguise copyrighted or illegal information are just clever re-encoding.
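To illustrate how the same bits take on different meanings under different decoders, here is a small sketch. The extensions '.int' and '.bin' and the function interpret are made up for this example; only '.txt' corresponds to the ASCII case mentioned above.

```python
import os

# Three bytes; what they "mean" depends entirely on the decoder applied.
bits = bytes([0x48, 0x69, 0x21])

# Hypothetical table mapping a file extension to a decoding procedure.
decoders = {
    ".txt": lambda b: b.decode("ascii"),                      # bytes as ASCII text
    ".int": lambda b: int.from_bytes(b, "big"),               # bytes as one integer
    ".bin": lambda b: " ".join(f"{byte:08b}" for byte in b),  # raw zeros and ones
}

def interpret(filename: str, data: bytes):
    """Pick a decoder based on the file extension and apply it."""
    ext = os.path.splitext(filename)[1]
    return decoders[ext](data)

as_text = interpret("foo.txt", bits)    # 'Hi!'
as_number = interpret("foo.int", bits)  # 4745505
as_bits = interpret("foo.bin", bits)    # '01001000 01101001 00100001'
```

The data never changes between the three calls. Only the paired decoding procedure does.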
Since it is impossible to outlaw sequences of bits, it follows that to stop distribution of copyrighted or illegal information, you must simply dissociate the (information,encoding) pair. Both the information and the encoding scheme have legitimate uses on their own, but together they represent the infringing file. Some legal definitions more true to information theory might look like:
"An (information,encoding) pair A is considered to infringe upon an existing copyrighted (information,encoding) pair B if and only if the decoding of A would be considered infringing on the decoding of B."
"For a given illegal or copyrighted (information,encoding), it is unlawful to distribute (information) and (encoding) in such a way that, through express or implied means, the (information,encoding) pair can be reconstructed."
I'm not sure what the implications of this are, if any.