20100301

Bits of Intellectual Property

At what point do information artifacts cease to be copyrighted works of art and become patentable inventions? The distinction between aesthetics and utility would seem to settle this, but in an information-based economy the two become less and less distinguishable. Code is data, data is code, and certain innocent prime numbers become illegal.



Copyright is the set of exclusive rights granted to the author or creator of an original work, including the right to copy, distribute and adapt the work. Copyright lasts for a certain time period after which the work is said to enter the public domain. Copyright applies to a wide range of works that are substantive and fixed in a medium.
In the United States, a design patent is a patent granted on the ornamental design of a functional item. Design patents are a type of industrial design right. Ornamental designs of jewelry, furniture, beverage containers (see Fig. 1) and computer icons are examples of objects that are covered by design patents.
Note that copyright versus design patent has nothing to do with the medium. A mass-produced copy of a marble sculpture can be copyrighted, while a mass-produced marble coffee cup would be design-patented. The critical feature, it would seem, is whether the art is attached to something useful or is valued purely for aesthetic purposes. However, this distinction is ill-defined. Modern art created objects of minimalist, utilitarian design that were also clearly works of art (see here). Furthermore, information can itself be a technology. The distinction between code (patentable) and data (copyrighted) has been dissolved by modern programming paradigms. Music is arguably a technology for manipulating emotional and cognitive state. Whether a coffee cup is a useful object or an art object changes depending on how many you sell and what your clients decide to do with them. Modern music seems to be made for mass production, to the point where some of it is legally in the same class as disposable coffee cups.

For every partition between the set of things covered by copyright and the set of things covered by design patents, I can construct an object that lies directly on the boundary. This suggests that the two sets are not cleanly separable and must share at least one object. Existing law acknowledges the intersection (e.g., the Statue of Liberty).

The distinction boils down to the number of bits representing the idea and the probability of collision between artifacts. There are still objects that clearly fall under copyright, and objects that clearly fall under design patent, but the digital economy has created ever more exceptions to this classification.

Design patents cover features with low information content and a high probability of collision. Alternatively (since we may be working with physical objects with an infinite amount of state), the similarity threshold for design patents is higher than that for copyrighted works. Generally speaking, design patents will cover the "gist" of the decorative appearance of objects, and will therefore contain far fewer bits than might be required to exactly reproduce the design.

Copyright tends to cover objects with high information content where collisions are improbable. Almost all written and printed works contain sufficient entropy that the probability of collision between two original works is near 0. The lyrics to the song "Happy Birthday" have exceptionally low information content as far as copyrighted works go: arguably somewhere close to 50 bits, using roughly 1.0 bit per letter and assuming that the repetitive structure reduces essentially to

"eval('happybirthdaytoyou'*4;s/toyou/dearname line3)"

This gives a collision probability of 2^-50 for any single utterance. However, if each of the last 7 billion people to live uttered a mere 100,000 sentences of equivalent entropy over the course of their lifespan, the probability that none of them collides is (1-2^-50)^(7,000,000,000*100,000) ≈ 0.537, i.e. roughly even odds of at least one collision. Although this estimate is off, since only a fraction of the people on earth actually speak English, most works are quite a bit longer than Happy Birthday and therefore will not collide in practice.
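
A minimal sketch of that arithmetic in Python (the 50-bit entropy figure and the 100,000-sentences-per-lifetime rate are the assumptions made above, not measured values):

    import math

    bits = 50                               # assumed entropy of the lyrics
    p_match = 2.0 ** -bits                  # chance one random utterance collides
    utterances = 7_000_000_000 * 100_000    # people times sentences per lifetime

    # Probability that no utterance collides; log1p keeps the tiny p accurate.
    p_no_collision = math.exp(utterances * math.log1p(-p_match))
    print(p_no_collision)        # ~0.537
    print(1 - p_no_collision)    # ~0.46 chance of at least one collision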

Incidentally, this gives a heuristic for the expiration of copyright: the length of time a copyright is granted could be proportional to the raw number of novel bits contained within the work. Perhaps an article could remain copyrighted until there is a 50% probability of another human producing the same work by random chance. This creates the unfortunate incentive to simply append a lot of unrelated works together. Perhaps copyrighted material could be sectioned into substrings which provide no mutual information about each other, although this strays into subtleties of information theory that I am not equipped to reason about.
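
A toy version of that heuristic, assuming the same made-up utterance count as above spread over a 70-year lifespan (roughly 10^13 utterances per year; all of these parameters are illustrative guesses):

    import math

    def years_until_even_odds(novel_bits, utterances_per_year=1e13):
        """Years until there is a 50% chance someone reproduces the work at random."""
        p = 2.0 ** -novel_bits
        # Solve (1 - p)^(rate * t) = 0.5 for t.
        return math.log(0.5) / (utterances_per_year * math.log1p(-p))

    print(years_until_even_odds(50))    # ~78 years for the 50-bit "Happy Birthday"
    print(years_until_even_odds(200))   # a modest poem: effectively forever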

The only sense I can make of the distinction is that design patents must be verified before production, while copyright violations are settled in expensive lawsuits after production. It makes economic sense to thoroughly check objects with a high probability of collision before mass production, but to be lazy with objects that have a low probability of collision. Of course, for physical artifacts the number of effective bits is subjective and depends on how discerning the consumer is. Have you ever heard two people arguing for hours over the relative merits of two cars/shoes/operating-systems that appear functionally indistinguishable to you? In light of this, it almost makes more sense to let the author (yes, 3D objects with utility are works of authorship) decide which risk to take: an expensive, fixed-cost patent search, or a prohibitively expensive potential lawsuit? I bet we could write a whole set of optimization equations to determine what is economically most favorable based on the probability of collision and the cost of resolving those collisions, but I need to go to sleep.
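
A back-of-the-envelope version of that optimization might simply compare the fixed cost of a prior-art search with the expected cost of a lawsuit; the dollar figures and probabilities below are invented placeholders:

    def search_first(p_collision, search_cost, lawsuit_cost):
        """True if an up-front design/prior-art search beats risking a lawsuit."""
        return search_cost < p_collision * lawsuit_cost

    print(search_first(0.05, 20_000, 2_000_000))      # mass-produced cup: search first
    print(search_first(2 ** -50, 20_000, 2_000_000))  # a novel: don't bother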

How does one tell if two files are "equal" up to the equivalence class of "copyright"? Computer files can be encoded in a multitude of ways, producing file strings of seemingly different information content. The key is that a file string, in union with a decoding method, makes up the information artifact. The file string itself is useless without a consensus decoding scheme. Large prime numbers are not illegal, but a specific large prime together with the decoding scheme "this is a zipped C source for a DVD decryption algorithm" becomes a copyright violation on the DVD decoding algorithm. This is much the same way that possessing a sequence of numbers, e.g. 4-8-15-16-23-42, is not in itself illegal, unless you also have the information "this is the combination to the safe at the bank downtown". For a working definition, let us say that two tuples (file, decoder) are equivalent if they must, at some point, be translated into the same string of bits to become usable, or ultimately compute the same function. Note that the general problem of determining whether two algorithms compute the same function is undecidable, but many specific instances can be decided in finite time.
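
A minimal sketch of the first half of that working definition, using stand-in decoders (zlib and base64) in place of real-world codecs, and a hypothetical byte string as the "work":

    import base64
    import zlib

    def equivalent(file_a, decode_a, file_b, decode_b):
        """Two (file, decoder) tuples are 'equal' if they decode to the same bits."""
        return decode_a(file_a) == decode_b(file_b)

    work = b"int decrypt_dvd(...) { /* the protected work */ }"   # hypothetical content
    artifact_1 = (zlib.compress(work), zlib.decompress)
    artifact_2 = (base64.b64encode(work), base64.b64decode)

    print(equivalent(*artifact_1, *artifact_2))   # True: same work, different bit strings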

This problem is complicated by the advent of lossy compression algorithms. If I compress an audio book into a low-bit-rate Ogg Vorbis format, I will lose most of the original file's information and cannot reconstruct it. We might be able to get away with "equivalent information content as appropriate for the medium", which for books might reasonably be defined as the string of text representing the work. This would also cover transcribing the audio book to text using speech recognition. However, this definition is still vague. For instance, the quality of the narration in an audio book is arguably part of the product and would be lost under heavy compression.

We might be able to get away with another definition: a lossy compression of a copyrighted work is a violation if the tuple (file, decoder) could not, with high probability, have been arrived at without access to the original file in some form. This can be formalized by saying that the compressed file contains a very specific subset of the information content of the original, and is effectively a copyright violation on this subset of information, but not the whole file.
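
One crude way to estimate whether a file contains a "very specific subset" of another's information is the normalized compression distance, which approximates shared information content with an ordinary compressor. This is an illustrative proxy, not anything proposed above, and the file names are hypothetical:

    import zlib

    def ncd(x: bytes, y: bytes) -> float:
        """Normalized compression distance: near 0 for near-copies, near 1 for unrelated data."""
        cx = len(zlib.compress(x))
        cy = len(zlib.compress(y))
        cxy = len(zlib.compress(x + y))
        return (cxy - min(cx, cy)) / max(cx, cy)

    original = open("audiobook.txt", "rb").read()      # hypothetical files
    suspected_copy = open("suspect.txt", "rb").read()
    print(ncd(original, suspected_copy))   # closer to 0 means more shared information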

So, how many bits do you need for a copyright violation? How many bits do you need for a patent violation on an algorithm?

Another scenario, dealing with algorithm patents. An algorithm may be patented because it is exceptionally fast, or has some other desirable property. So, say Alice has patented a super good face recognition algorithm with excellent accuracy and running time, and Bob then develops a wholly different algorithm with the same speed and accuracy. Alice's algorithm was phrased in terms of digital signal processing, and Bob's algorithm used a lot of group theory and graph data structures. The patent office decides that these are clearly pretty different algorithms, and grants both patents. Later on, a mathematician comes along and says "Aha! By the following isomorphism, Alice's and Bob's face recognition algorithms are mathematically equivalent", and furthermore "All possible algorithms with this running time are isomorphic to Alice's and Bob's algorithms", and even worse "No one can do better, theoretically, than Alice or Bob". So not only are Alice's and Bob's inventions provably identical, but all future inventions attempting to solve the problem will also be provably identical. How do you resolve this? Do you revoke the patent of whoever filed second? That seems pretty clearly unfair. It could get even worse: Alice's and Bob's algorithms could be, theoretically, identical in speed and accuracy, but the equivalence between them could be uncomputable. Is this really, legally, a different situation from the case where the equivalence is computable (provable)?


4 comments:

  1. So, if you knew a way to rigorously define when two programs are the same "up to copyright", that would actually be of extreme interest to complexity theory.

    I've tried to think about this a bit, and I've talked to a few other grad students/ researchers about it.

    How do we define when two programs are implementing the same "algorithm"? Everybody kind of knows what this means. Like, two programs are basically the same when they differ only up to trivial implementation details.

    The idea, in my mind, is closely related to, how do we define when two formal proofs are "basically the same idea"?

    Explicitly, I would consider that two programs are "basically the same" when the Hoare Logic proofs of correctness corresponding to them are "basically the same". Or perhaps you would view the computation history of the machine as the proof that the input should correspond to this output, and ask that each of these proofs is the same, for each input.

    One might imagine defining a finite collection of program modifications (changes of implementation detail), and then saying that sequences of these yield equivalent programs... but I've actually had a hard time figuring out what these should be.

    A second thought is that there isn't going to be a finite sequence of changes; it's going to be some kind of asymptotic consideration. Consider a sequence of graphs, and then a corresponding sequence of proofs that they are connected, generated by one program, and then a second sequence of proofs generated by a different program.

    If a proof from one program can be transformed into a proof in the other program by logspace computable transformations, then we might say they are equivalent.

    Or, another idea I had: we might look at these proofs in some formal logic, such as Frege or Extended Frege, and say that the proofs are equivalent if, starting from all of the lines of proof A, we can get the rest of the lines of proof B with say log n overhead, and also start from the lines of proof B and get to the lines of proof A with log n overhead. Or any other asymptotic quantity.

    It would be very interesting if we could actually focus meaningfully on algorithms rather than just Turing machines / programs...

    For instance, if we could show that, all algorithms for S-T connectivity in a graph boil down to either depth first search or breadth first search (or some kind of linear combination, either in a deterministic taking turns sense or a probabilistic sense?), it would really represent a dramatic step forward in our understanding of computation.

    I really can't see how this is any different from the issue raised in the article. All I've really done is say, there probably isn't a strict "equivalence class" of programs based on the "key ideas", although there certainly is one based on program behavior. There is probably an equivalence up to low-level details, up to higher-level details, etc.

    I had at one point tried to figure out how to define a notion of homotopy (in topology, this is the notion of two continuous maps being the same except up to a sequence of minute changes, or "wigglings", usually of loops on some kind of surface), except for circuits instead... I haven't had any luck. The obvious definition coming from Category Theory is actually pretty vacuous. I don't know how promising it really is to just try to generalize other definitions like that, for instance, if we could throw some kind of meaningful topology on the set of Turing machines... I'm not really actively thinking on this anymore, but I hope to again in the future, and I am very interested in anyone's thoughts on this... It really is a pretty intuitive concept I think, that everyone seems to grasp, but yet we don't quite see how to formalize it yet, at least as far as I know.

  2. " If a proof from one program can be transformed into a proof in the other program by logspace computable transformations, then we might say they are equivalent. "

    That's actually fairly good. I'm not completely sure I understand everything you said.

    Is it also possible that two algorithms ( or their corresponding correctness proofs ) compute the same function, but _can_not_ be transformed into one another, even permitting all possible transformations ?

    What would be the purpose of restricting it to log-space transformations? I mean, I understand that this means you can take "whatever algorithm A is doing" and, in polynomial time transform this into "whatever algorithm B is doing", which means the blow-by-blow equivalence between the two computation streams is efficiently computable, but is this really necessary ?

    Is it possible that I could have two algorithms which are isomorphic but only after an infinite number of transformations ? If they are in the same complexity class and can be mapped to one another via any sort of abstract proof (however hard to compute), does that mean that they are "basically the same idea" ?

  3. > Is it also possible that two algorithms ( or their corresponding correctness proofs ) compute the same function, but _can_not_ be transformed into one another, even permitting all possible transformations ?

    So, the idea that their corresponding correctness proofs are basically the same is sort of like saying these results are "corollaries" of each other. I don't really have any idea how to formalize that.

    The other notion, that, instead of looking at the (single) proof of correctness for an algorithm, we view it as generating a proof in some proof system for each string it sees...

    Note: It occurs to me that it is possible that an algorithm has no proof of correctness (and yet is correct!). So that may be a strike against this definition.

    The other notion I was thinking of, where we view the algorithm as creating a proof for each input: well, then it's definitely possible that their proofs can't be transformed into each other in logspace... anything where you have two arguments for something that make fundamentally different points will work like this. So, say you have some kind of exhaustive proof that two graphs are not isomorphic, by checking all the places each vertex could go, etc., and proving that nothing works. You could have a different kind of proof based on a graph invariant, like, say, this graph has a different eigenvalue from that graph. Then these proofs should not be interreducible in logspace, unless they were both originally derivable in logspace (they almost certainly aren't).

    If you allow arbitrary power in the transformation, then certainly anything is equivalent, I mean, assuming the proof systems are both sound then there is some proof of x in L for each, and so we can set up some arbitrary correspondence.

    > What would be the purpose of restricting it to log-space transformations? I mean, I understand that this means you can take "whatever algorithm A is doing" and, in polynomial time transform this into "whatever algorithm B is doing", which means the blow-by-blow equivalence between the two computation streams is efficiently computable, but is this really necessary ?

    So, logspace transformations are usually used for reductions in theoretical CS, because logspace isn't powerful enough to do anything really interesting, but it is good enough to do things like rephrasing something we already have. Think of the Cook-Levin reduction, which turns Turing machines into corresponding circuits. Or most NP-completeness reductions. A major result is that figuring out whether two vertices of a graph are in the same component is complete for logspace -- you can't do any real serious computation, but you can rearrange things syntactically and make really obvious observations. So if two proofs are logspace equivalent, they are almost like different representations of the same information, or so the thought goes.

    We've already seen that, we need to make some kind of constraint to get a meaningful notion of program equivalence -- so that we can identify the things that really require more than logspace computation to be seen as the same. In particular we must have less power than the actual algorithms under consideration require, or else these reductions could actually just carry out the computations in question. But if we are too restrictive, then we will end up distinguishing things we don't want to distinguish... for trying to distinguish logspace algorithms, we might consider using AC0 reductions. But for polynomial time algorithms, logspace might make more sense. There's not a real "right" or "wrong" answer... my claim is, anyway, that adding different kinds of power in the reductions will be like, ignoring various levels of detail in the corresponding proofs, and there probably isn't any one "natural" level of detail which makes the most sense to use.

  4. > Is it possible that I could have two algorithms which are isomorphic but only after an infinite number of transformations ? If they are in the same complexity class and can be mapped to one another via any sort of abstract proof (however hard to compute), does that mean that they are "basically the same idea" ?

    So it really depends on the precise definition we use. Probably, we will only allow a finite number of transformation operations to get from any one object to any other... this is how things are usually done in discrete settings. In a continuous setting, there isn't really a notion of infinitely many changes, there is just this one notion of homotopy, which is that we can move from one to the other in a continuous fashion with respect to time.

    Theorems saying that finitely many operations of a certain sort suffice to capture a given notion of equivalence are often quite deep theorems that pave the way for much more insight and work. A classic example is Hilbert's Basis Theorem:
    http://en.wikipedia.org/wiki/Hilbert%27s_basis_theorem

    A similar example is a theorem that, if two different Group Presentations yield isomorphic groups, then you can move from one to the other using only finitely many Tietze Transformations:
    http://en.wikipedia.org/wiki/Tietze_transformation

    Again, I don't really know what the right answer is to these questions. I don't know what the corresponding "basic moves" would be, although if a notion like this were defined in a discrete setting I would assume it would look something like Tietze Transformations, except somehow defined for Turing Machines... unless it really was defined in terms of proofs and logspace reductions. I'm not sure this logspace idea really is the right one, because I'm not sure it's quite the same as the idea in mathematics of "corollaries", which I think is closely related to what we are capturing -- I'm quite attached to the idea that two programs are equivalent if their proofs of correctness are in some sense corollaries of each other... But I don't know for sure. These are all pretty good questions.
