I got some interesting comments on one of my latest posts about "robots.txt". I am not linking to the comments here, since I am trying to hide my shadow site from the comment and trackback spammers. If you are reading this on the main site, the shadow site is at k.lenz.name/discuss. Scroll a couple of days back to a post named "The Robots.txt Exception".
One of the points of that discussion is whether copyright requires permission to build a search engine index.
Under American law, the question would be whether the index is a "derivative work" under 17 USC 106 (2). Under German law, I would have to ask whether it is an "adaptation or other transformation" under Article 23 of the Copyright Act (UrhG).
One clear case of such a work is a translation.
It took some time before translation rights were recognized, as they now are under Article 8 of the Berne Convention.
The question of whether building an index creates a derivative work is now worth billions of dollars. With the Internet, building large-scale indexes of all pages on the web is a business worth a lot of money. Without the Internet, there was no business model for building an index of all published books.
So this question is new.
Comparing indexing to translation, we find a couple of parallels.
Just as a translation helps the work to find more readers (those who don't know the original language), a search engine helps a page to find more readers (those who would not have found the page without the search).
Just as a translation uses the whole work, and not a small part of it (like a citation), the indexing process uses the whole work. Actually, if I understand the process correctly, all modern search engines first use their robots to retrieve copies of all pages and then run their queries on the local cache, so arguably they make even more far-reaching use of the whole work than a translation does.
This can also be used as a fall-back argument for the copyright holders. Even if the act of indexing does not create a derivative work, the original act of harvesting pages into the cache clearly is a reproduction under 17 USC 106 (1).
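To make the two steps described above concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how any real search engine is built: the page addresses and texts are made up, and the function names harvest, build_index and search are hypothetical. It shows that the harvesting step keeps a complete copy of every page, and that the indexing step reads every word of that copy.

```python
from collections import defaultdict

# Hypothetical sample pages; addresses and texts are invented for illustration.
pages = {
    "example.org/a": "Translation rights under the Berne Convention.",
    "example.org/b": "Search engines index the whole page, not a citation.",
}

def harvest(pages):
    """Simulate the robot: keep a full local copy of every page."""
    return dict(pages)  # the cache holds the complete text of each page

def build_index(cache):
    """Map each word to the set of page addresses whose text contains it."""
    index = defaultdict(set)
    for url, text in cache.items():
        for word in text.lower().split():  # every word of the page is read
            index[word.strip(".,;:!?\"'()")].add(url)
    index.pop("", None)  # drop the empty key left by pure punctuation
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

cache = harvest(pages)      # step 1: reproduction of the whole work
index = build_index(cache)  # step 2: the index is derived from the copies
print(search(index, "whole page"))  # {'example.org/b'}
```

Note that the query never touches the original pages. It runs only against the index, which in turn was built from the cached copies.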
The difference from the translation business is that translation rights are usually awarded exclusively. For search engines, there is no need to exclude Yahoo because you let in Ask Jeeves.
An example of a use that is clearly not a derivative work is a book catalog or an Amazon listing.
However, these do not use the whole work. They contain only a short description of the book.
Therefore I think that a search engine index is a derivative work. That means that the indexing needs either an exception or a license. The default changes: indexing is not allowed unless something permits it.
It took some time before translation rights were recognized. I expect that it will also take some time until indexing rights are undisputed. The search engines make a lot of money without giving the authors one cent of compensation. They will probably be interested in disputing this kind of right.
Posted by Karl-Friedrich Lenz at August 13, 2005 01:07 PM