July 09, 2006

Google Banned

from eBay as a payment method.

See also this long article from a former PayPal employee, who argues that Google will be unable to solve the fraud problems that come with a payment service.

Link found at BoingBoing.

Posted by Karl-Friedrich Lenz at 04:25 PM

July 01, 2006

Check Out from Google

Finally Google has released a sensible service.

It is called "Google Checkout". This is urgently needed. Of course people need to check out from using Google. I hope the new service will help users to find other search engines (I recommend Yahoo) that are not completely unworthy of their trust.

As I am not really using any Google services, I might be wrong in assuming that the new service helps users to check out from Google. "Leave Google" at leave.google.com might be a better name for the service I am thinking of.

But anyway, for those who start to check out: Contrary to that old song, just remember: You can leave as well.

Posted by Karl-Friedrich Lenz at 11:41 PM

May 16, 2006

Google Death Ray

In Dilbert today.

Posted by Karl-Friedrich Lenz at 09:34 PM

May 10, 2006

Google Book Search Blog

Google has launched a new official blog focussed on their "Book Search" massive copyright violation program.

Link found at Google Blogoscoped.

Posted by Karl-Friedrich Lenz at 05:11 PM

April 10, 2006

WiFi Privacy

Google's proposal for free WiFi access in San Francisco of course comes with another spam and privacy violation string attached. Verne Kopytoff writes in the San Francisco Chronicle that some people are starting to raise concerns about Google's policy of requiring log-in and building yet another giant database of online usage and location data. (Link found at Scripting News.)

They want to store for 180 days where the users of their WiFi service have been and use that data for the purpose of serving location-related spam.

This kind of location data processing would be illegal under Article 9 of the 2002 data protection Directive if someone tried to pull it off in Europe.

Anonymous WiFi has the potential to help work around attempts to turn the Internet into one big surveillance machine. It also has the potential to help the enemies of freedom with that project.

It seems quite clear again which side Google is on in this fight.

Posted by Karl-Friedrich Lenz at 11:22 PM

April 02, 2006

Google Deleting Whole Website

That is, if you have "delete this page" buttons on your wiki pages. Of course the Google robot will click them all. See this report at the Daily WTF (found at Google Blogoscoped).
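
For wiki owners, the usual way to avoid this is to keep destructive actions off plain links. A minimal sketch in Python, assuming the robot only follows ordinary links (GET requests); the handler name, the paths and the toy page store are all made up for illustration and are not taken from the Daily WTF report:

```python
# Hypothetical toy wiki: "pages" maps page names to their text.
SAFE_METHODS = {"GET", "HEAD"}  # methods that must not change anything


def handle_request(method: str, path: str, pages: dict) -> int:
    """Return an HTTP-style status code for a request against the toy wiki."""
    if path.startswith("/delete/"):
        name = path[len("/delete/"):]
        if method in SAFE_METHODS:
            # A robot following a "delete this page" link ends up here.
            # Nothing is deleted; the request is simply refused.
            return 405
        if method == "POST" and name in pages:
            del pages[name]
            return 200
        return 404
    if method in SAFE_METHODS:
        return 200 if path.lstrip("/") in pages else 404
    return 405
```

With this kind of rule a robot can click every link it finds without deleting anything, since deletion only happens on a POST request, which a link-following crawler would not normally send.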

Does the Google robot also click on all the Google ads? Or ads from other sources that pay by the click? Do other search engine robots click on ads? If so, that would need to be considered when settling payments from the advertisers.

Posted by Karl-Friedrich Lenz at 12:39 PM

Library Online Use

The new German draft copyright law introduces a new exception in Article 52a (pages 4 and 53 of the draft) that implements Article 5 Nr. 3 n) of the 2001 information society copyright Directive. That reads:

"3. Member States may provide for exceptions or limitations to the rights provided for in Articles 2 and 3 in the following cases: (...)

n) use by communication or making available, for the purpose of research or private study, to individual members of the public by dedicated terminals on the premises of establishments referred to in paragraph 2(c) of works and other subject-matter not subject to purchase or licensing terms which are contained in their collections;"

The German draft gives non-commercial libraries, museums, and public archives a right to display books they have bought electronically on terminals in the library. They have to pay for that right to a collecting society. Any contract with a publisher about this matter overrides the exception.

There was not much controversy about this proposal in the debate leading to this draft, so it seems to be a balanced approach.

The interesting question is what this means for the Google library project.

As far as German copyright law is concerned, the Google project is obviously illegal. This proposal changes nothing in that respect.

Does this have any bearing on the fair use analysis under Section 107?

Here the last factor seems to be relevant. "Effect of the use upon the potential market".

Clearly there is a value for libraries to be able to display the content of books in digital form to their users. The new German proposal recognizes this value by ordering libraries to pay for this right. Clearly there are also publishers out there who approach libraries and get some kind of contract on that right. The draft recognizes this market reality by giving these contracts priority over the legal exception.

Tony Sanfilippo, marketing and sales director for Penn State University Press, makes this point in an October 2005 letter to the Wall Street Journal:

"My primary objection is that we will lose the opportunity to sell those digital files of our content ourselves. These libraries are among our best customers. Each of the libraries in question probably has 70% to 90% of what we've published over the past 50 years. The files of just our content that Google is giving each library are conservatively worth tens of thousands of dollars, if we had been allowed to sell them those files. The libraries involved have all bought or subscribed to our digital content in the past. Now they won't need to anymore."

So the uncontroversial decision of the new German law backs up the complaint of the publishers that Google's copyright violation is ruining their market chance to sell libraries digital copies of their content, which is one factor in the fair use analysis under Section 107.

Posted by Karl-Friedrich Lenz at 11:31 AM

April 01, 2006

Privacy Threat Award

The privacy creeps at Google have bagged a prestigious award for "biggest threat to genetic privacy" by the "Coalition Against Biopiracy".

Their nomination was for "Worst Corporate Offender". The Coalition Against Biopiracy doesn't like the idea of having individual genome information in the Google databases.

Neither do I. I want Google and other spam artists to keep their dirty fingers the hell away from my genetic information.

On the other hand, Glyn Moody says "it is much more likely that Google wants to create the ultimate gene reference, but on a purely general, not personal basis." (I found that award in his blog post in the first place).

I don't know why Moody thinks there is any reason to trust Google's dedication to privacy values. Is there any basis in their track record to assume anything but the worst?

The future world Moody describes in his 2004 Guardian article "Googling the genome" seems to give individuals as well as the police the power to google genetic information about everyone. Denying that possibility does not strike me as convincing now that Google is actually starting to build this particular privacy outrage.

Posted by Karl-Friedrich Lenz at 04:47 PM

March 19, 2006

Google Never Forgets

Google never forgets, as Michael Froomkin points out here.

Posted by Karl-Friedrich Lenz at 08:16 PM

March 08, 2006

Google Book Search Sucks

As Siva Vaidhyanathan points out here, users can't even find Cory Doctorow's book "Down and Out in the Magic Kingdom".

He also reminds us that no one knows what kind of search algorithm Google uses to calculate bookpageranks. Those are proprietary and secret.

Posted by Karl-Friedrich Lenz at 11:01 AM

February 14, 2006


Another "global campaign against Google" sparked by anger over Google's China policy. They have slightly changed the "Google" trademark to "Goolag" and want people to print that on t-shirts. Went right up at BoingBoing.

Posted by Karl-Friedrich Lenz at 01:17 PM

February 11, 2006

Which Four Lines?

Many believe that Google is only using four lines of any book in snippet mode.

For these people, there is a simple question.

Can you tell me _which_ four lines Google is using?

Posted by Karl-Friedrich Lenz at 10:30 AM

February 10, 2006

Google Boycott Site

"Students for a Free Tibet" angry about Google's China policy have set up a boycott site.

I am sorry to say that I can't join that movement, since I have already been boycotting Google since last year for other reasons.

Posted by Karl-Friedrich Lenz at 03:14 PM

Google Ignoring Opt-Out?

Larry Lessig pointed to a "Google Group" that distributes statements from Google on the book project.

He cites one post from that group that repeats the common error of assuming that Google only displays a couple of lines. The last paragraph from that:

"Second, Google does not show more than two or three sentences without the author’s permission. And that’s not all. If a copyright holder chooses not to participate in Google Book Search, not a single word from the book will appear in any searches."

Here is a mirror of a comment I just posted on Lessig's post.

The error is of course on your and Google’s side.

Google is displaying all pages of all books to the totality of searchers. That they only display a couple of lines (much more meaningful than “sentences”, since sentences can be two words or two hundred words long) to individual searchers is irrelevant, since we are talking about Google’s use, not that of individual searchers.

While this error is not new, the last sentence is interesting. I thought that opting out meant that Google kept their fingers off the work completely. In contrast, this might be read that Google still includes works opted out in the database and in search results, but just displays no snippets for them. That would explain what happens when you search for “Supreme Court Sony”.

As I noted on my blog yesterday, one of the results on the first page (owned by Lexis) displays no pages and no snippets. I was somewhat puzzled by that, but this might be the logical explanation.

If this theory is correct, Google is reproducing content into their search database against the expressly declared will of publishers. That seems to be somewhat more serious than just violating copyright because it takes too much time to ask for permission.

Posted by Karl-Friedrich Lenz at 02:58 PM

"Privacy is Dead"

Title of this post by Michael Arrington on the latest outrage by the privacy creeps at Google.

Apparently, Google is now starting to store the whole content of their users' hard drives on their servers.

Good to know that they have a "no comment" policy firmly in place on whether they are selling out that data to interested third parties like the NSA or anyone with a subpoena.

See also "Google Copies Your Hard Drive - Government Smiles in Anticipation" from the EFF.

Posted by Karl-Friedrich Lenz at 02:26 PM

February 09, 2006

Google Book under UK Law

Paul Ganley has a forty page article up at SSRN supporting Google. Contrary to my views, he seems to think that Google are "good guys" (page 2).

While he gives some "positive spin" (his words) that Google might have a case under American law, he seems to agree with my assessment that under current European law Google does not have much to work with, and then calls for changing that.

Found at Open Access News.

Posted by Karl-Friedrich Lenz at 08:04 PM

"Online Privacy Nightmare"

EFF's Fred von Lohmann writes about the recent subpoena against Google and notes that the possibility of future subpoenas is a "recipe for an online privacy nightmare".

He asks why Google needs to know so much about the users and wants legislation enacted to limit their endless appetite for consumer data, such as that already in place for video rental services.

Posted by Karl-Friedrich Lenz at 02:28 PM

No Comment

Danny Sullivan has an essential post on search engine privacy up at Searchenginewatch. He discusses and explains a recent survey.

News.com put some questions to search engines. One of the questions was whether the search engine in question gives the government the data it has on its users.

Google and others replied with a no comment.

Now anybody paying half attention knows that the Bush administration does not believe in obeying any privacy laws. They operate under the weird theory that they can do any spying they please. See for example Glenn Greenwald's blog for some pointers.

Under these circumstances, I for one read "no comment" as "yes, of course we are selling out our users to the NSA for any data mining they might be interested in".

Posted by Karl-Friedrich Lenz at 09:49 AM

Searching for Love

in Google book gets this result.

Looking at that page, I find no snippets on the first ten results. All of those are under what used to be called the "publisher" program (it now seems to be the "partner" program).

I also learn that there are not three ways books are displayed, as Google misleadingly claims on the main "partner program" page, but four. That is, sample pages are displayed either with or without login to Google, depending on the agreement with the particular publisher. The latter case means that Google knows exactly what people have been reading, with names attached.

But I can't find one case where Google has no agreement with the publisher and therefore displays only snippets in those first ten results. The same is true for the second set of ten.

This would seem to indicate that the snippet view is not really important to the top search results.

Maybe the reason was the search term. Who cares about love. Let's search for "Supreme Court Sony" instead.

Still the majority of results on the first page is under the "partner program". However, we find two cases of snippet view, both of them Senate Judiciary Committee hearings.

These show how exactly those snippets look. Google is not displaying an electronic snippet here, but a part of the scan, a picture. They do display two lines of electronic data on the results overview page. So to be exact, Google is displaying snippets in two different ways.

We also find a strange case here. The result for "United States Supreme Court Reports" displays absolutely nothing. No snippet of a scan (maybe because Google has the data only in electronic format in their database), and no two lines context in the search overview. Note that the publisher in question is Lexis. I don't know why Lexis gets this preferential treatment compared to all other publishers.

I have been digressing somewhat. My main point here is that most search results on the top pages are from the "partner program". I have confirmed that theory also with a couple of other searches.

If true, that might be important for assessing the copyright situation. For example, the assertion of Google supporters that you just can't build a meaningful book search engine while respecting copyright would be contradicted by the fact that, yes, most or all of the top search results are licensed.

Posted by Karl-Friedrich Lenz at 09:11 AM

February 08, 2006

Coleman Speech

University of Michigan president Mary Sue Coleman addressed the Association of American Publishers, explaining why the Google book project is a good thing.

Link found at John Battelle's Searchblog.

I completely agree that digitizing books and building a searchable database are useful.

As are translations, derivative works like movies or offering the full text in databases. The fact that something is useful doesn't mean you can do it without asking the copyright holders.

Update: One interesting point for the legal analysis is the question of what the University of Michigan intends to do with the electronic data that Google hands them under the contract as consideration for giving Google access to the books. President Coleman noted that they don't expect to have students read electronic copies of the latest Harry Potter book in their dormitory rooms. Instead, the data will be locked away and treated with utmost care, like "highly infectious disease agents used in research".

If the university is planning on just locking the data away, what exactly is the point of having it in the first place?

Posted by Karl-Friedrich Lenz at 12:09 PM

January 29, 2006

Nature of the Work

The Field case opinion's discussion of factor three (the amount and substantiality of the use) relies on the Sony case to give Google a right to copy and display the whole work in the cache. It cites the following passage from Sony:

"When one considers the nature of a televised copyrighted audiovisual work ... and that timeshifting merely enables a viewer to see such a work which he had been invited to witness in its entirety free of charge, the fact that the entire work is reproduced ... does not have its ordinary effect of militating against a finding of fair use."

The court then goes on to say that just like the broadcasters in Sony and the photographer in Kelly, Field made his content available to anyone, free of charge.

I have two comments on this analysis.

One. Many websites run ads. In that case, redirecting traffic to the Google cache hurts the advertising income of the website. This might not have been a factor in this case, but should be noted if one talks about extending its finding to different settings.

Two. The Supreme Court was talking about "time-shifting for private home use", about the time-shifter's fair use right. In contrast, this opinion is talking about Google's fair use, not that of individual searchers.

Making the broadcast available to individual viewers free of charge does not mean making it available to other broadcasters for replaying it. In the Sony picture, Google is not the equivalent of a time-shifting viewer, but of a different network. Obviously Sony did not give other broadcasters the right to replay just because the original broadcast was free for the viewer.

Posted by Karl-Friedrich Lenz at 09:44 AM

January 28, 2006

Google's Great Transformation

The Field case opinion gives "transformation" by Google as a reason for finding fair use (pages 14 and 15).

They cite the Supreme Court Campbell case. According to the Supreme Court in that case, as cited by this court, fair use analysis largely turns on the question:

"whether the new [use] merely "supersedes the objects" of the original creation... or instead adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message; it asks, in other words, whether and to what extent the new work is transformative" ... Although such transformative use is not absolutely necessary for a finding of fair use, ... the goal of copyright, to promote science and the arts, is generally furthered by the creation of transformative works."

"Transformative works thus lie at the heart of the fair use doctrine's guarantee of breathing space within the confines of copyright, ... and the more transformative the new work, the less will be the significance of other factors, like commercialism, that may weigh against a finding of fair use."

Since the Nevada court is clearly an extremist Google cheerleader with no interest in neutrally looking at the facts, we can expect it to find lots of fabulous "transformations" in Google's activities, even though Google obviously just engages in wholesale copying without adding any creative aspect whatsoever.

We are not disappointed in our expectation. The court comes up with the following points to recognize Google's Great Transformation.

One. The cache adds the possibility to access the content if the website is down.

That, obviously, does not change anything. The content in the cache is exactly the same as that in the website (Google is not, at this point, asserting the right to edit the content owned by other people). So how exactly does adding to the distribution channel further the promotion of the arts? You need to add to the content, not to the distribution.

Two. Having a cached copy gives users the power to check for changes. If the website owner edited his work, the user can still access an older version.

Any transformation found in this case (where the content of the original website changed after Google crawled it) is clearly done by the website owner. Asserting this as proof of Google's Great Transformative power seems rather far-fetched. If this were true, the website owner could avoid that problem by refraining from editing his own website.

I don't think that ordering the website owner to do that "furthers promotion of the arts" in any way.

Three. Now comes the first real transformation offered by Google. They highlight the search terms, making it easier to find those terms on the cached page.

This is not much of a creative effort. The content is still unchanged, it is only slightly easier to navigate. And that added functionality is not added by a human author, like in the Campbell case, but automatically.

The Field court seems to want to protect the creativity of the computer serving the search result page in question. I am not convinced that any such protection is called for.

Four. Google is adding a backlink and a disclaimer to the cached pages.

Again, that is not any creative effort worth protection, and this adds absolutely nothing to the work as such.

Five. The court then comes up with another fact irrelevant for this analysis: Website owners can disable Google caching, if they are ready to jump through the hoops Google has set up for this.

That might be. However, that fact does nothing to transform the work in any way.

To sum up, we have a big pile of nothing to base the weird "transformation" theory on.

The Google cache is a one-on-one, absolutely unchanged copy of the Web. Not a parody, like in the Campbell case. Highlighting a couple of search terms an independent creative effort does not make.

Posted by Karl-Friedrich Lenz at 04:34 PM

Google as ISP

The Field case opinion says on pages 22 to 24 that Google's cache is legal under the safe harbor provisions of the Digital Millennium Copyright Act, 17 USC 512 (a) to (d).

That decision rests on the weird idea that the Google cache is "transient", which it is quite obviously not.

The court cites a "Brougher Deposition" we can't see for the claim that the Google cache stores information for approximately 14 to 20 days.

The only way this could possibly make sense is to say that the cache information is refreshed on average about every two weeks. That might be, since Google refreshes some pages (like blogs) daily and most others about once a month in a so-called "deep crawl"; see this article on the "Dummies" website for more detail.

Now, refreshing something once a month is obviously quite different from having it transiently for only two weeks, as in the case regarding usenet newsgroups the court discussed.

What Google wants to deliver is a permanent copy of the whole Web.

The decision is also wrong to say that the Google cache is compatible with the condition in Section 512 (b) (1) (C), which reads:

"(C) the storage is carried out through an automatic technical process for the purpose of making the material available to users of the system or network who, after the material is transmitted as described in subparagraph (B), request access to the material from the person described in subparagraph (A), if the conditions set forth in paragraph (2) are met."

The user of the Google cache is getting the content from Google, which is not the "person described in subparagraph (A)". Field is that person.

The lame excuse the court finds for ignoring that clear requirement:

The purpose of Google's cache is enabling access to a page if the user is unsuccessful in requesting the material from the originating site for whatever reason (page 23 of the opinion).

Section 512 requires that the user "requests access to the material" from the originating site. It is not enough that the user "was unsuccessful in requesting".

European law has a similar provision, and since Google can't rely on "fair use" in Europe, it becomes even more important. But this post is too long already.

Posted by Karl-Friedrich Lenz at 11:13 AM

January 27, 2006

Robot Exception

It's not copyright infringement if a robot does it.

I can't find any basis for this "robot exception" either in American or European copyright law.

But the Field case opinion seems to assume exactly that in its misguided observations about an "implied license" (pages 10 and 11).

Under this opinion, Google can say that I give them an implied license, even though I clearly state on this page that I don't give Google anything. Just like Field I have decided not to use "robots.txt" since I don't want to do business on Google's terms.

Any human looking for the license for this blog will find and understand this text:

Please feel free to forward copies of this work to others, mirror it on your homepage or blog, or post it on bulletin boards or P2P networks. However, please leave my name attached and don't edit the work. Commercial use requires separate permission.

Permission for such commercial use is expressly denied to Google. I object specifically to inclusion in their cache and access to my pages with the "Google Web Accelerator".

Even extremist Google cheerleaders like that Nevada court would be hard pressed to find any "implied license" in this. I don't see much of any way to make it more clear to a human reader that Google is not welcome here.

That means a human reader can't assume an implied license. If the court's conclusion is true, Google's robot can do something which a human user could not. If someone builds an Epsilon search engine with exactly three blogs covered, of course that person would need to look at the licenses, and obey any restrictions they might set.

So why would robots be held to a lesser standard? Because, as the court observes on page 5, it would be impossible for Google to look at individual licenses?

That logic would come in handy for the Bush government in their NSA spying scandal. "Since we can't do it under FISA, we need to break the law."

Needing to break the law to do something is no excuse.

If anything, using robots means that you make more reproductions than when doing the same thing by hand. Holding billions of reproductions to a lesser standard than three does not convince me.

And as a matter of policy, of course it is possible to build search engines under opt-in. It even makes sense from a business and efficiency point of view, as the Technorati example shows.
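
The difference between the opt-out default of "robots.txt" and an opt-in rule can be shown in a few lines. This is only a sketch: it uses the robots.txt parser from Python's standard library, the address and the "SomeOtherBot" name are placeholders, and the opt-in function is a hypothetical illustration, not a description of how Technorati or anyone else actually works.

```python
from urllib.robotparser import RobotFileParser

URL = "http://example.org/blog/post.html"  # placeholder address


def may_crawl(robots_lines, agent, url):
    """Ask the standard robots.txt parser whether 'agent' may fetch 'url'."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Opt-out: a site with no rules at all counts as open to every robot.
no_rules = may_crawl([], "Googlebot", URL)

# The site owner has to act to keep one particular robot out.
opt_out = ["User-agent: Googlebot", "Disallow: /"]
blocked = may_crawl(opt_out, "Googlebot", URL)
others = may_crawl(opt_out, "SomeOtherBot", URL)


def may_crawl_opt_in(permitted_hosts, host):
    """Hypothetical opt-in rule: crawl only hosts that asked to be included."""
    return host in permitted_hosts
```

Under the first rule silence means yes. Under the second, silence means no, which is how any human reader would have to treat the license text quoted above.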

Posted by Karl-Friedrich Lenz at 03:56 PM

Iura Novit Curia

The Field case opinion says that there is no copyright infringement in the first place. The reason stated on page 9 of the opinion is that

"Field does not allege that Google committed infringement when its "Googlebot," like an ordinary Internet user, made the initial copies of the Web pages containing his copyrighted works and stores (sic) those copies in the Google cache."

Therefore the court does not look at that phase for copyright infringement. The whole process of entering a work in Google's database is not relevant to the court.

That is somewhat surprising by German standards. Under German civil procedure, the parties can assume that the courts know the law (iura novit curia). They don't have the burden to "allege" anything or explain the law to the courts. Failing to "allege copyright infringement" does not exclude that particular point from the analysis as long as the facts are undisputed.

Maybe the courts in Nevada can't be relied on to know the law without the plaintiff explaining it to them.

Anyway, that reasoning holds only if the plaintiff fails to raise this particular point, which seems to be a rather strange strategy.

Of course, the act of copying the Web page into the search database is either a reproduction under Section 106 (1) or, under the opinion's weird theory that Google is "transforming" the Web page by highlighting some search terms, the preparation of a derivative work under Section 106 (2).

Trying to deny even that seems to be a rather extreme idea.

Posted by Karl-Friedrich Lenz at 02:50 PM

January 26, 2006

77 Percent

of users don't even know that Google records personal data.

That number might change. And once consumers demand it, privacy protection might become an important factor in the competition between search engines.

Posted by Karl-Friedrich Lenz at 03:03 PM

Fair Use Search Right

In an opinion about Internet search not directly transferable to the book search cases, a district court has given Google a fair use right to do whatever they want, and then some. See the copies at Lessig's or the EFF blog.

I don't have time right now to discuss this opinion in detail, but hope to be able to come back to it.

Just one first reaction: The court's analysis in this case is based on the fact that the content in question was without any economic value whatsoever and was served free of charge to anyone interested. The Google book search cases are different in both respects.

Posted by Karl-Friedrich Lenz at 02:52 PM

January 21, 2006

Dvorak's Anti-Choice Column

John Dvorak cheers for Google and calls opponents "idiotic and naive" in a recent PC Magazine column.

He misses finer points like the fact (supporting his position) that Google does not display whole pages, but snippets for all works it has not licensed.

But his column is another good example of "anti-choice" rhetoric on the question. That is, some commentator telling copyright holders that they are really stupid not to thank Google for kindly spending the $750 million it takes to promote their obscure works.

The problem with that is that it is really none of Dvorak's or other commentators' business what a copyright holder might decide or not with regard to his licenses.

The copyright holder has the choice, not some third party, even if they actually know better.

Posted by Karl-Friedrich Lenz at 11:06 PM

Markey Proposal

Edward Markey is the ranking Democrat on the House Telecommunications subcommittee. He intends to propose legislation that would extend to search engines the existing privacy standards that apply to cable television companies with regard to individual viewing habits.

Link found at the EFF blog, applauding the move.

My opposition to Google (already over 50 posts in the category "Google-free zone") is based on their completely insufficient regard for privacy interests, triggered by the "Web Accelerator".

If this kind of legislation is enacted, Google might be forced to change their evil ways.

The same could happen tomorrow if they listen to Ed Felten, who points out that Google is only one or two privacy disasters away from becoming just another Internet company, and should become a privacy leader now. And yes, the latest service (Google video) is again implemented with complete contempt for privacy requirements, well in line with the "Web Accelerator" outrage last year.

Posted by Karl-Friedrich Lenz at 10:46 PM

January 20, 2006

Siva Vaidhyanathan Question

"What standards and principles will Google be using for this project?"

Here is one comment to this question I wrote at a post by Brett Frischmann:

It seems quite obvious that the basic Google magic potion won't work with books. Most books are not built around links (maybe with the exception of law review articles).

But whatever Google uses to calculate BookPageRank, there will be many search results in the back pages no one ever looks at. And those will probably come from the majority of books out of print.

The books that are out of print are there for a reason. They don't generate enough interest anymore.

That in turn means that the benefit for the searching public from including millions of obscure books no one knows or is interested in reading might be quite limited. Google might not get much of a return for most of the $750 million scanning operation.

Posted by Karl-Friedrich Lenz at 11:59 PM

Fishy Google Defense

Yahoo reports that "Defenders of Property Rights" president Nancie G. Marzulla has come out swinging for Google in the library case.

The fishy thing about that is that the "press release" that supposedly came from the "Defenders" is nowhere on their website. Their last press release published there is from November last year.

The "Defenders", who are celebrating the Grokster case as a "victory for IP rights", also give the impression of being rather unlikely to call for weaker copyright protection.

Therefore, I have some doubts about the attribution.

In substance, this particular defense brings no remarkable new point, while repeating the error of many other commenters regarding the "portion used".

"Google is only displaying snippets, not the whole work." That, of course, is wrong. Google is displaying the whole work to the totality of users, not only a couple of lines. Many snippets, many snippets.

Posted by Karl-Friedrich Lenz at 11:21 PM

Searching for Sarin

Google is reported to do something right on the privacy front. They seem to resist wishes of the American government to hand over data about their usage, while other search engines have complied with that request. The original article is at Mercury News, by Howard Mintz.

Much more detailed coverage by Danny Sullivan at searchenginewatch. Kevin Heller at TechLawAdviser has the government motion against Google.

Sullivan makes the important point that no personal data is involved. That in turn makes Google's resistance to this particular request irrelevant from a data protection point of view.

However, this question sure gets a lot of attention. And it shows that there is a reason to oppose storing large amounts of search user data in the first place.

One could, of course, assume that the American government or other governments are not interested in who has been searching with the term "sarin".

One could, of course, assume that it would become public if Google or some other search engine was asked to quietly hand over the search records in question.

I for one am not convinced. Even if all search records at all search engines are not yet easily accessible to various governments in yet another antiterror surveillance program not disclosed to anybody, that can change very fast anytime.

Therefore I think that this case clearly shows that user data should not be stored in the first place, except when completely made anonymous.

See also the "Patriot Search" parody site.

Posted by Karl-Friedrich Lenz at 10:56 PM

January 18, 2006

Three Copies

Brian Dear says that "Lessig is wrong" in a long post about the Google library case.

I don't care for much of the personal attacks against Lessig in that post, though I happen to disagree with Lessig as well on this particular question.

But Dear makes one interesting point I did not see as clearly before.

When scanning something, you first make a digital photograph. That is the first copy. Then you use an optical character recognition program to automatically extract text from the photograph. That is a second copy.

Until now, I had just seen this process as one big "scanning" step, involving only one copy of the whole book.

Google then goes on and gives the library a third copy of the whole e-text.

So, while in the Arriba case we have only a graphically enhanced link to something that is on the Internet, Google makes THREE COPIES of the whole book.

They then of course serve the whole book (in many snippets) to their searchers, as I repeatedly have pointed out earlier.

Posted by Karl-Friedrich Lenz at 11:08 AM

January 14, 2006

Google Innovation?

Lessig concludes his presentation with the point that Google is trying to introduce some fabulous innovation here and should not be at the mercy of greedy copyright holders. While Google might be able to fight for their right to innovate, many smaller projects would not.

The problem with this point is that there is really no innovation involved here. Scanning books and making them searchable in a database is not a new technology. Databases have been around for decades. The only new thing is the unprecedented scale, the amount of content.

The other problem is that furthering innovation is the goal of the patent system, not of copyright. Copyright rewards people who create content, not those who just display it in some new way.

Posted by Karl-Friedrich Lenz at 07:55 PM

Google Fair Use Right to Violate the Orphans

Lessig's presentation is right in one point: Clearing the copyrights to millions of orphaned works would be prohibitively expensive and therefore impossible to do.

As I learned from the recent paper by Jonathan Band, Google's cost of scanning 30 million books is estimated at $750 million (page 10). I had not realized that Google is investing this kind of money.

But if you add a modest sum of $1000 per orphan to clear the copyright, the cost soars to $25 billion for clearing the copyrights for 25 million books. Can't be done, even with Google's deep pockets.
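The arithmetic can be checked with a few lines of Python. This is only a sketch using the figures quoted above; the $1000 per orphan is my own assumed number, and the variable names are made up for illustration.

```python
# Figures quoted in the post; the per-orphan clearing cost is an assumption.
scanning_cost = 750_000_000        # estimated cost of scanning, in dollars
orphans = 25_000_000               # number of orphaned books to clear
clearing_cost_per_orphan = 1_000   # assumed cost of clearing one orphan, in dollars

# Total cost of clearing rights, and how it compares to the scanning budget.
clearing_cost = orphans * clearing_cost_per_orphan
ratio = clearing_cost / scanning_cost

print(f"Clearing rights: ${clearing_cost:,}")
print(f"That is about {ratio:.0f} times the scanning budget.")
```

Under these assumptions, clearing the rights would cost roughly 33 times as much as the scanning itself.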

That means that Google's plans become one reason more to discuss the problem of how to sensibly deal with the orphan issue. Like I mentioned before, I think that the Japanese solution in Article 67 of the Copyright Law is one good idea.

However, any such solution would need to be enacted by Congress. You can't just go ahead and violate the orphans just because you don't believe that the current law makes sense.

The correct way to proceed is to address the problem with the necessary legislation, not just pretend that it does not exist.

To make that clear with Lessig's own proposal of dealing with the orphans: Lessig wants to go back to a mandatory registration system. That proposal is not necessary if anyone can go ahead and violate the orphans anyway if they find that clearing the rights is too burdensome.

Even if you follow Lessig and say yes, Google should have a fair use right to violate orphans, that does not help them in their lawsuits. All the works the plaintiffs are listing are clearly _not_ orphans. Google knows exactly who owns those works. The complaint tells them.

Posted by Karl-Friedrich Lenz at 07:48 PM

"We Are Not Crooks"

A new Google blog comes with the gimmick of letting readers vote at the end of each post whether the post in question makes them think Google is evil.

That site is utterly unworthy of any attention, since it only rips content off from other Google blogs and presents no original own blog posts. If I want to read Nathan Weinberg, I go directly to InsideGoogle, thank you.

They also get Google's informal corporate motto wrong as "Do no evil", while it actually reads "We are not crooks", see the Google "Code of Conduct" page.

Posted by Karl-Friedrich Lenz at 11:21 AM

January 10, 2006

CRS Report on Google Book Search

BeSpacific points to a "Congressional Research Service" (CRS) report on the Google case, "The Google Book Search Project - Is Online Indexing a Fair Use Under Copyright Law" by "legislative attorney" Robin Jeweler.

While the report does not provide any answer to the question raised in the title, there are a few interesting points.

One is the fact that Google's wholesale copyright violation is restricted to books. The report points out on page 2 that Google video restricts the search service to works whose copyright holders opt in. One more example that it is definitely not impossible to build a search database while respecting copyright, joining the other examples of Technorati and Google's own "Partner Program" for books.

The other is a comparison to the Sony case. The Supreme Court created the fair use right to time shifting in that case and might well create a new fair use right to online search in the Google case.

I completely agree with the author that it is rather difficult to know now if that will happen in a couple of years.

Posted by Karl-Friedrich Lenz at 10:13 PM

AdWords Trademark Decision

A German court has ruled that buying a trademarked phrase for AdWords is illegal for everyone except the trademark holder (Heise article, in German).

While I rejoice in anything kicking Google's interests by making it harder to sell AdWords, the obvious problem with this theory is that several trademark holders can register the same phrase in different product classes or different countries. Therefore that theory would mean that no one would be left to legally book any phrase that is registered by multiple trademark holders. That does not seem to make much sense.

Posted by Karl-Friedrich Lenz at 09:01 PM

Google Boycott?

Cory Doctorow, who does not seem to be a big fan of DRM, has joined a pledge to boycott all CDs with DRM.

Since BoingBoing has quite a lot of readers, many of whom share Doctorow's rejection of all DRM, the pledge quickly gained more than 500 signatures.

I wonder if they will start to boycott Google next for their DRM in Google video.

One poster in the Slashdot thread on this announcement noted that it will be interesting to see whether the hate for all DRM or the "bowing before Google and their products" will win out on Slashdot. Exactly my feeling here.

Posted by Karl-Friedrich Lenz at 08:31 PM

January 08, 2006

Google DRM

Google has announced plans to enter the video distribution business and use their own DRM.

That, of course, will not help their good will with the many radical opponents of DRM. The Register already has a critical editorial, asking about "Google crippling culture".

Good. Anything that helps strengthen the opposition to Google makes my day.

Posted by Karl-Friedrich Lenz at 04:48 PM

January 06, 2006

Software Protected by Copyright

William Patry reports that Gregory Aharonian failed in his attempt to have American courts declare software not protected under copyright.

Japanese and European copyright law expressly acknowledge copyright protection for software. American law is the same, though it relies on precedents cited in the opinion.

The plaintiff wanted this declaration since he intends to build a prior art database for software patents.

While Mr. Aharonian lost this round, he might still be able to go ahead with his project.

All he needs is to hope that Google wins their lawsuit and building a "search database" is recognized as fair use. If Google can copy millions of books into their database, surely Aharonian can copy a couple of source code listings into his search database.

So this case is exhibit A for Google's opponents when pointing out that giving Google the right to build search databases would obviously mean that everyone else would have the same right.

Posted by Karl-Friedrich Lenz at 04:09 PM

December 31, 2005

Google Patent Lawsuit

Google has been sued for patent violation, coverage here and here.

Nathan Weinberg hopes Google stands up and fights this patent.

I agree that this patent seems to be without much merit. However, just as software patent lawsuits against spammers are a good thing, any lawsuit against Google is great. This might turn out to be another occasion where I support the software patent holders (like this one before).

Posted by Karl-Friedrich Lenz at 04:27 PM

December 09, 2005

Google Sucks

Dave Winer: "Google sucks".

Posted by Karl-Friedrich Lenz at 07:16 PM

November 19, 2005

Potential Market

Larry Lessig develops some thoughts in response to a recent debate on Google Book Search (recently renamed from Google Print).

In that post, Lessig seems to say that it is not appropriate to talk about "potential" markets when discussing fair use.

I disagree with that. Section 107 uses exactly that language in factor four.

The question under that factor is: "What market?"

If you are talking only about the market for selling printed copies of the books, Google might point out that authors will sell more books if readers can find them.

On the other hand, if you are talking about the market for search, the picture is somewhat different.

As is clear from this Google page about their Publisher program (the legal part of their project), they are paying copyright holders a portion of their advertising revenue. It is remarkable that they don't disclose that portion, but this is clearly proof that even in Google's view there is some value in having books in the database, and copyright holders deserve a "portion" of that value.

So why should Google have the right to get that value for free in their "Library" project?

Posted by Karl-Friedrich Lenz at 01:39 PM

The Problem With Google

Dan Gillmor's latest Financial Times column "Google's hubris risks nemesis" offers some criticism.

In Gillmor's view, the privacy issue is Google's biggest problem.

I agree completely with that view. I don't buy the idea that search engines can go ahead on an opt-out model under current copyright law and would oppose calls for introducing some new "search exception" into Article 5 of the 2001 Copyright Directive.

That is only a minor point, however. Just like Gillmor, I think that privacy issues are the biggest problem.

And I agree also with his point that there is no need yet to abandon all hope. Google might change.

They have changed their privacy policy recently, and are asking people to let them know how they are doing.

Posted by Karl-Friedrich Lenz at 12:51 PM

November 06, 2005

More Many Snippets

I am discussing yesterday's post "Many Snippets" at the Copyfight site. I got this differing viewpoint from Joseph Pietro Riolo:

Nowhere in fair use section says that the sum of previous
snippets must be considered for the purpose of fair use. If
I quote 5 snippets from one book during first year, 10
snippets from the same book during second year, 12 snippets
during third year, and so on, that is still permissible.

Each case of fair use stands on its own and is independent
of past and future fair use cases.

My answer at the Copyfight site:

You are right that section 107 says nothing about the point. It however also does not confirm in any way your view that each use is to be reviewed independently.

We can easily agree that this question is crucial for determining the portion used by Google.

Now, in your analogy, you are quoting a book five times this year. I completely agree with you that these instances should be judged each on their own, as long as this is not part of a plan by you to get around number 3 in Section 107.

If I post the first three lines of the latest Harry Potter book on my blog, and then the second three lines five minutes later, and so on, you bet I am going to hear from Ms. Rowling's lawyers. They will tell me to stop it, and my protest that I am posting three lines at a time won't help me much.

It won't help me because there is an obvious connection between the single posts.

Now, back to Google.

There is an obvious connection in their case as well.

They include each book only once in their search database. Then they lean back and look at the thousands of snippets flowing to the public from that one act.

That seems to be one reason to view the totality of the use flowing from the act of including the book as the standard when talking about "portion used" under Section 107.

I assume that this question will be discussed in the lawsuit, and when the Supreme Court decides on the issue in 2009, we might get the answer.

However, that decision is not yet in any database. Future caselaw is one area that even Google can't reach.

Posted by Karl-Friedrich Lenz at 12:07 PM

November 05, 2005

Many Snippets

Google apologists still don't get that Google is serving many snippets, not only one.

Among them is a post by Donna Wentworth here, attacking Pat Schroeder for mistaking the portion used by Google.

Here is a comment I entered to that post:

Well, Google is not displaying only one snippet per book, but thousands of them, covering every book almost completely.

And yes, Google has _not_ disclosed what "set of pages" they are talking about in number 6 of their FAQ:

"6. I'm already logged in. Why are you telling me the page is unavailable?

As part of our efforts to protect a book's copyright, a set of pages in every in-copyright book will be unavailable to all users."

This "set of pages" is the only one they are not using to display to searchers, and I don't know how large it is in relation to the whole book.

Quite possibly this restriction applies only to the "Publisher" part, since it talks about "pages". In that case, the "Library" project would indeed use the whole book, and display the whole book, even if to different searchers (which is irrelevant, since Google is the defendant here, not the searchers).

While your position is correct if you are only looking at individual searches, Google is actually serving multiple searches per book.

That does seem to make somewhat of a difference when talking about the "portion used" under fair use.

Posted by Karl-Friedrich Lenz at 11:26 AM

November 04, 2005

Washington Times Blasting Google

Pat Schroeder and Bob Barr on "Reining in Google" in the Washington Times.

Nick Schulz in a reply at Forbes (Don't Fear Google) makes the common mistake of thinking that Google is only using a couple of lines of each book. He overlooks the fact that Google is serving lots of different snippets to lots of searchers, effectively using the whole book when serving searchers. See details in this earlier post.

I agree with his headline, though. The various plaintiffs should indeed refrain from fearing Google, and proceed vigorously with their lawsuit.

Posted by Karl-Friedrich Lenz at 09:12 PM

November 01, 2005

Lessig on Google

in Wired here.

Again he says that Google Print (that should be "Library") should be legal because Google has been doing the same in their core search business.

I agree that both cases are rather similar and search engines need to move to an opt-in model if Google loses their lawsuits, as well they should.

I don't agree that there is any problem with that. I think that I as an author should have the right to stop Google from indexing my content on the Internet, while allowing it to Yahoo, or to stop anybody from indexing as long as they don't pay me for it.

In other words, Lessig's argument makes sense only if you assume that opt-out is the correct legal model for Internet search. That is far from obvious.

Lessig then goes on to say that there is a fair use right to indexing. His reason seems to be related to something Congress meant or not in 1909.

I would rather assume that fair use depends on the purpose and character of the use, the amount used in relation to the work, and the effect on the market, among other things.

As to the first factor, clearly Google's use is commercial. Therefore, many Creative Commons licenses won't give Google any rights, if there is a need for a license in the first place.

The amount used is misunderstood by many commentators as just a couple of lines.

That is wrong. Google is searching the whole text and can serve any three lines to any user, leading to using all lines in the aggregate. That counts, since Google is the defendant here, not individual users.
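The point about the aggregate can be illustrated with a toy Python sketch. It is in no way a description of how Google actually selects snippets; the five-line "book", the three queries and the `snippet_lines` helper are all invented for illustration. Each search shows only a three-line window, but the union over several searches covers every line.

```python
def snippet_lines(book, query, width=3):
    """Return the indices of the lines shown for one query:
    a window of `width` lines around every line containing the query word."""
    shown = set()
    for i, line in enumerate(book):
        if query in line.split():
            start = max(0, i - width // 2)
            shown.update(range(start, min(len(book), start + width)))
    return shown

# A made-up five-line "book".
book = [
    "the quick brown fox",
    "jumps over the lazy dog",
    "while the cat sleeps",
    "under a warm sun",
    "and nobody pays the author",
]

# What one searcher sees for one query.
single = snippet_lines(book, "fox")

# What the search engine serves to all searchers taken together.
aggregate = set()
for query in ["fox", "cat", "author"]:
    aggregate |= snippet_lines(book, query)

coverage_single = len(single) / len(book)
coverage_total = len(aggregate) / len(book)
print(f"one search: {coverage_single:.0%}, all searches: {coverage_total:.0%}")
```

In this toy setup no single searcher sees more than three lines, while the three searches together display the whole text.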

The effect on the market for the books might be positive. If more people can find a book, more might buy it.

On the other hand, the effect on the market for search licenses is devastating. No one is going to pay for the right to offer search if everyone is entitled to do it for free.

See also the detailed analysis at Scrivener's Error, who distinguishes between the three phases of entering the work in the database, building a searchable index from that and retrieving individual search results.

Posted by Karl-Friedrich Lenz at 09:41 PM

October 27, 2005

Dave Winer's Elgoog Service

Idea described here.

The problem with that idea is that Google doesn't seem to have much of any original content in the first place. All they display is works written by third parties.

So even if they send over their "naked lawyers" to start opting out, those lawyers would lack any rights needing to be cleared as well as clothes.

Posted by Karl-Friedrich Lenz at 01:54 PM

October 23, 2005

Publisher Law Suit

I agree mostly with the new complaint of several large publishers against Google for their "Google Library" wholesale copyright violation program.

However I have one problem with this. At number 29, the complaint says:

" Google purports to justify its systematic copying of entire books on the ground that it is a necessary step to making them available for searching through www.google.com, where excerpts from the books retrieved through the search will be presented to the user. Exhibit B. Google analogizes the Google Library Project's scanning of entire books to its reproduction of the content of websites for search purposes. This comparison fails. On the Internet, website owners have allowed their sites to be searchable via a (sic) Google (or other) search engine by not adopting one or more technological measures."

That is not true. "Allowing" something is not done by simply not adopting an opt-out procedure. If it was, the publishers' case would fail, since Google gives publishers an opt-out procedure for the "Library" project as well.

Accepting that copyright holders have the burden to opt out of Internet search does not seem to me to be a smart move for the plaintiffs' lawyers. This whole case is exactly about how this burden is to be distributed. Who needs to spend the time and money to build the lists of works that are not cleared to show up in the search engine?

Copyright has never placed a burden to opt out on copyright holders anywhere. If it did, who would be the beneficiary of that? Would copyright holders need to contact only Google, or anybody else who wants to use their works?

If you want to use a work, ask for a license. Don't expect the other guy to do your homework and contact you with a "no, you don't get permission" form.

Posted by Karl-Friedrich Lenz at 05:15 PM

October 14, 2005

I Wonder What Happens

if you click on the "I'm Feeling Paranoid" button on that "Google 2084" page by Randy Siegel at the New York Times.

Found at Inside Google.

Posted by Karl-Friedrich Lenz at 08:28 PM

October 09, 2005

Dave Winer:

"Google's reader is a huge step backward from what was available in 1999."

Nathan Weinberg agrees.

Posted by Karl-Friedrich Lenz at 12:16 AM

October 08, 2005

Anti-Choice Rhetoric

Some of the people defending Google's right to extend their spam and spyware business to twenty million books without bothering to ask the authors base that defense on what the author should reasonably do if asked.

This line of reasoning can be found for example in a post by Jack Balkin about a week ago:

Every author wishes that more people read his or her books. Most of us would happily stand on street corners with sandwich boards if we thought it would help. Anything that brings our work in front of a larger public should be welcomed as a good thing, not something to be feared. The Authors Guild, and indeed all authors, should be working with search engines like Google to come up with new and creative ways to get people to know about and sample what we have often spent many months-- and sometimes many years-- working on. Authors spend their lives putting the best part of themselves into their books. The cruelest fate they can suffer is not criticism and rejection-- it is being forgotten. The digitally networked environment gives them a chance to avoid that fate. All authors who care about their work should embrace it.

Similar rhetoric comes from Cory Doctorow and from Xeni Jardin, who adds some language insulting authors as "saps".

One could answer this by pointing out good reasons why an author would not want to be included in the index without compensation. See for example this statement from the publisher Penn State Press, where they point out that Google is using one unauthorized copy of their whole books in the index and ANOTHER unauthorized copy as payment for the university library that is supplying the books to them. Clearly the dollar value of that payment should flow to the copyright holders. It does not.

I could add as an author that I don't happen to want some Viagra ad I have no control over to pop up next to search results including my books. I also object specifically to contributing anything to Google's spam and spyware empire, while I don't have a problem with giving Yahoo the rights to a similar reasonable project.

That however is not really the point I want to make here.

The point is that in a discussion about the limits of fair use no one has any business second-guessing the author's choice.

Those who don't want to object to being included in Google's project certainly can easily have that happen. They can either use a license that permits indexing for profit (a non-commercial Creative Commons license would probably not be clear enough on the point) or just sit back and quietly congratulate Google on their good sense in not bothering to ask.

But that choice is up to every author only for his own works. If Doctorow and Balkin want their books indexed by Google, of course they will get them indexed.

They have, however, no business trying to impose their choice on other authors. Current copyright gives the right to choose how, if at all, a work is published to the copyright holder and not to any third parties. The copyright holder does not need any reasons, much less convincing reasons, for deciding to withhold a license.

And while I am at it, the new project by Yahoo and the Internet Archive shows that, of course, you can start online indexing of books without violating the orphans.

And another quick link: Publishers are starting to pull out of the "Google Publisher" program because they don't like the "Google Library" part.

Update October 11: More of the same from Cory Doctorow, printing some paragraphs of a Wall Street Journal editorial at BoingBoing.

Posted by Karl-Friedrich Lenz at 12:13 PM

September 25, 2005

Portion Used by Google

I have commented again elsewhere on the latest Google lawsuit. This time at Ed Felten's Freedom to Tinker.

In the discussion there someone made the same mistake I made in yesterday's first post on this lawsuit, confusing "Google Print Publisher Program" and "Google Print Library Project".

However, now I think it is really not relevant how many lines are displayed to an individual user. We are talking about Google's use of the work here, not about that of individual Google customers.

If Google serves all pages to the totality of customers, they use all pages, even if the individual act of displaying a search result to an individual customer only shows a couple of lines.

As Andreas Bovens noted in the discussion at Freedom to Tinker, Google says in the FAQ that they won't display some pages to any user:

"6. I'm already logged in. Why are you telling me the page is unavailable?

As part of our efforts to protect a book's copyright, a set of pages in every in-copyright book will be unavailable to all users."

There is no explanation on what percentage the pages not displayed make up. But at best for Google, these are the only pages that Google is not using. So when calculating the "portion used" under § 107, everything except those pages is the relevant portion.

And that only if you insist that the original reproduction into the database and the fact that every individual user gets to search the whole text with his query are irrelevant.

I disagree with that idea, as discussed earlier here and here. The act of including a work in a search database is already reproduction of the whole work.

Posted by Karl-Friedrich Lenz at 12:34 AM

September 24, 2005

Lessig Defending Google


I placed this comment on Lessig's Blog (here with some hyperlinks):

The lawsuit is asking to shut down the “Google Print Library Project”, not “Google Print”.

A large part of “Google Print” is the “Google Print Publisher Program”. There Google does ask for permission first. That shows that it can be done for many books.

That leaves the orphans. I recall that your proposal for dealing with them was to introduce a $1 registration fee, contrary to American obligations under the Berne Convention.

That proposal would seem to be unnecessary if anyone is free to violate orphans anyway, as you advocate in this post.

As to Google itself being illegal, well, of course they are. What’s wrong with that? It’s about time their free pass on copyright violations expired.

Posted by Karl-Friedrich Lenz at 12:27 PM

Some Pages Might Fail

to be displayed in the first search on a book on Google Print.

Everyone interested in the class action lawsuit against Google for their massive wholesale copyright violations in the "Google Print" project should try this search.

That particular search is for the term "image" in the "Image Processing Handbook" by John C. Russ, selling for $121.39 at Amazon.

The Google search returns 738 pages. That is no big surprise. A book about "Image processing" is likely to contain the word "image" on most pages.

It should be a rather easy exercise to come up with some words to recover the pages missing from this first search. This kind of thing can also be easily done by some automatic retrieving robot.

Building that would give a completely new meaning to the word "Googlebot".

I came up with this particular search because of this Slashdot post that described a similar strategy.

One might expect the retail value of this $121 book to fall sharply if everybody can retrieve most pages easily online through Google Print. Google's defense that they only display small parts of the works they illegally reproduced into their database is clearly wrong. They display every single word. All you need is to do multiple searches.

And once it becomes clear that search engines don't get a free pass on copyright issues, the real fun can begin with the equivalent class action lawsuit on their core web search business.

See my earlier post (Breaking the Calculator) for some preliminary thoughts on that aspect.

Update: The above post seems to be largely mistaken. It fails to adequately distinguish between the "Google Print Publisher Program" (where the book discussed above is included) and the "Google Print Library Project" (where only a couple of lines are displayed).

Posted by Karl-Friedrich Lenz at 01:21 AM

August 28, 2005

Perfect 10 Case

After the AFP lawsuit against Google, there is the next case. A publisher called Perfect 10 is the next one to sue them. See "A Perfect Storm of Infringement" by Susan Kuchinskas at internetnews.com (found at Bag and Baggage).

Google is copying the files from websites that are infringing Perfect 10's copyright.

That does help Perfect 10 to identify those infringers. However, most of them are somewhere abroad, where copyright enforcement is difficult.

Denise Howell is quoted as saying that the plaintiffs know what they are doing and are "playing for high stakes". I agree. As I noted before, $150,000 in statutory damages per violation paints a large target on Google's back. The plaintiffs won't make that kind of money from a nude photo, even with the most perfect models.

I also note that Google's "robots.txt defense" (we have an implied license to use your web pages as long as you don't tell us differently in "robots.txt") can't work in this case, since the offending pages are on third parties' web sites.
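For readers who have not seen the opt-out mechanism behind this defense, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules, the crawler names and the example.com URL are only illustrative assumptions, not taken from any site involved in the case.

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt: shut out one crawler, allow everyone else.
rules = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical page on a hypothetical site.
url = "http://example.com/page.html"
google_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("Slurp", url)

print("Googlebot may fetch:", google_allowed)
print("Another crawler may fetch:", other_allowed)
```

Only whoever controls the site can publish such a file. A copyright holder whose photos were copied to somebody else's server has no "robots.txt" there to edit, which is why the defense does not reach this case.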

This will be another interesting case to follow. If Perfect 10 wins, search engines will need to start clearing rights before indexing. That would be a major shift in their business patterns.

Posted by Karl-Friedrich Lenz at 12:16 AM

August 27, 2005

Adding Content to Databases

is one form of reproduction. Therefore, the rights for the works included in the database need to be cleared.

One could object that in the case of search engines like Google, the content is not necessarily extractable from the database (that is, if there is no cache accessible to the user).

However, including some work in a database is a reproduction even if the reproduction in question is not accessible to third parties for even further reproductions.

Another reason to support this view can be found in recital 18 of the 1996 database Directive. That says:

18) Whereas this Directive is without prejudice to the freedom of authors to decide whether, or in what manner, they will allow their works to be included in a database, in particular whether or not the authorization given is exclusive;

Authors have the freedom to decide if they want their works included in a database, and if so, under what conditions. In the case of search engines, the author of a web page gets to decide if she wants her page indexed by all engines, by only one engine, or by only those search engines that satisfy certain conditions the author might want to impose. Respecting European data protection standards, not displaying ads, not displaying ads of direct competitors, or paying a flat fee for inclusion might be some examples.

Posted by Karl-Friedrich Lenz at 10:08 AM

August 25, 2005

"Evil Empire Google"

A quote in this New York Times article describing the quick evaporation of Google good will, while not even mentioning the data protection and copyright problems.

Update: John Gilmore comments under the title "Google's unnecessary arrogance":

"But Google's willingness to flout other norms -- in particular, its grossly insufficient privacy stance, which amounts to "trust us" -- will eventually rebound in ways the company may not appreciate today."

This is my biggest problem with Google. If you are building one of the world's largest databases, you need to pay a corresponding amount of attention to data protection issues.

Posted by Karl-Friedrich Lenz at 11:24 AM

August 20, 2005

Index and Database

I have been discussing Google's and other search engines' right to index the Internet here and here.

I got some interesting objections to my point of view at the shadow blog (k.lenz.name/discuss). Also I noted that a recent Duke Law & Technology Review article by Elisabeth Hanratty (found at Sivacracy) seems to agree with my and Vaidhyanathan's view that Google can't just go ahead without clearing copyrights, but doesn't even mention the word "index".

The main problem with my theory is that what Google is doing is different from building an alphabetic index of a book with Microsoft Word.

They don't build one big index beforehand and then let users search in that index. Instead, they transform the whole Internet into one big searchable database.

For the copyright analysis that means that they are necessarily reproducing all the works in their database.

In contrast, if some Harry Potter fan starts building an index on some wiki page, that wiki page would not need to reproduce the books.

In other words, a fixed index is something very different to a dynamic index, where the results are displayed as an answer to a database query.

While I still think it might be possible to see an index to the Harry Potter books as a derivative work, this question is not so important for the search engine question.

Actually, it only gives them some ammunition to confuse the issue.

The question should be framed much more simply. What exactly gives Google the right to reproduce home pages in their database?

Posted by Karl-Friedrich Lenz at 11:22 AM

August 13, 2005

Time for Opt-in

at search engines, said Daniel Brandt at Google-Watch, one year ago.

While he doesn't mention copyright as one reason, I agree completely. In my view, copyright requires opt-in already.

It took Europe a couple of years to reach opt-in as a countermeasure to the spam nuisance (Article 13 of the telecommunications data protection Directive).

It might take a couple of years as well to reach opt-in to deal with the search engines.

Posted by Karl-Friedrich Lenz at 03:25 PM

Index as Derivative Work II

Google has decided to generously give book authors the chance to opt out of their "Google Print" program.

This has sparked some criticism from Aaron Swartz, who doesn't seem to believe in the right of copyright holders to control the indexing of their works.

In contrast, as I have said in a post a couple of hours ago, I think that the indexing right should be recognized just as the translation right. Building an index is a derivative work.

It would be nice if I could understand Google's "selling out the users to the publishers" (Swartz) as endorsing my theory.

However, that would probably be premature.

All they are doing is giving copyright holders the chance to opt out, while still asserting the right to go ahead indexing without clearing any rights if they don't hear any objections. That is exactly what they are doing with their main index of the Internet. Daniel Brandt at Google-Watch has more on this point.

Of course this is not compatible with my view that the index is a derivative work. In that case, they need to clear all rights before indexing.

Comparing the case of "Google Print" with the main Internet index, I think that the level of infringement is more serious with the main index. That displays the cache of all pages to users, while the "Google Print" project only displays a couple of lines.

Posted by Karl-Friedrich Lenz at 02:45 PM

Index as Derivative Work?

I got some interesting comments to one of my latest posts about "robots.txt". I am not linking to the comments here, since I am trying to hide my shadow site from the comment and trackback spammers. If you are reading this on the main site, the shadow site is at k.lenz.name/discuss. Scroll a couple of days back to a post named "The Robots.txt Exception".

One of the points of that discussion is whether copyright requires permission to build a search engine index.

Under American law, the question would be whether the index is a "derivative work" under 17 USC 106 (2). Under German law, I would have to ask if it is an "adaptation or other transformation" under Article 23.

One clear case is a translation.

It has taken some time until translation rights were recognized, as they are now under Article 8 of the Berne Convention.

The question whether building an index is a derivative work or not is now worth billions of dollars. With the Internet, building large-scale indexes of all pages on the web is a business worth a lot of money. Without the Internet, there was no business model for building an index of all published books.

So this question is new.

Comparing to translation, we find a couple of parallels.

Just as a translation helps the work to find more readers (those that don't know the original language), a search engine helps a page to find more readers (those that would not have found the page without the search).

Just as a translation uses the whole work, and not a small part of it (like a citation), the indexing process uses the whole work. Actually, if I understand the process correctly, all modern search engines first use their robots to recover copies of all pages and then run their queries on the local cache, so arguably they make even more far-reaching use of the whole work than a translation.

This can also be used as a fall-back argument for the copyright holders. Even if the act of indexing is not creation of a derivative work, the original act of harvesting the cache clearly is a reproduction under 17 USC 106 (1).

The difference to the translation business is that translation rights are usually awarded exclusively. For search engines, there is no need to exclude Yahoo because you let in Ask Jeeves.

An example of a use that is clearly not a derivative work is a book catalog or an Amazon listing.

However, these do not use the whole work in question. They only contain some short description of the book in question.

Therefore I think that a search engine index is a derivative work. That means that the indexing needs either an exception or a license. The default changes.

It has taken some time until translation rights were recognized. I expect that it will also take some time until indexing rights are undisputed. The search engines make a lot of money without giving the authors one cent of compensation. They will probably be interested in disputing this kind of right.

Posted by Karl-Friedrich Lenz at 01:07 PM

August 09, 2005

The Robots.txt Exception

Cedric Manara kindly gave me a copy of the complaint and Google's answer in the lawsuit AFP (Agence France-Presse) has started. These documents are available at Eric Goldman's blog here and here.

In Google's answer they say in paragraphs 187 to 192 that they have a license to use AFP's stories because AFP has not required their licensees to exclude robots.

This means they want to rely on the possibility of excluding the Google robot with robots.txt to fabricate a license where there is none.

This in turn is exactly what I needed to make up my mind. As discussed earlier, I was thinking about no longer blocking Google's access to my content via robots.txt. While I want to keep Google from copying my works or making a derivative work (the index) from my content, doing so with "robots.txt" seems to actually help their point of view.

I have stopped blocking Google's robot in my robots.txt file as of today.

Google wants a "robots.txt exception". They want to point to the fact that anybody can easily shut them out so as to be able to violate copyright as the default.

I don't approve of that.

The default is that you have to ask first if you want to make copies, or derivative works like an index.

Let those who actually want their content to be copied and indexed by Google specify so in their "robots.txt" files. There is no "robots.txt" exception in copyright law now, and there is no need for it.
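To make the opt-in idea concrete, here is a minimal sketch in Python of how a robot could apply it. It is only an illustration under stated assumptions: the function name `opted_in` is hypothetical, it relies on the `Allow` field (an extension to the original robots.txt convention), and it counts only records that name the robot explicitly, not the `*` wildcard. No search engine is known to work this way.

```python
def opted_in(robots_txt, agent):
    """Return True only if a record naming `agent` contains an Allow line.

    Hypothetical opt-in reading of robots.txt: silence means "keep out".
    """
    agent = agent.lower()
    current_agents = []   # user-agents named by the record being read
    in_rules = False      # True once the record's rule lines have started
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            # A blank line ends the current record.
            current_agents, in_rules = [], False
            continue
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A new record starts without a separating blank line.
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "allow" and value and agent in current_agents:
                return True
    return False
```

Under this reading, a site without any robots.txt file, or with a file that does not mention the robot by name, stays out of the index.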

Posted by Karl-Friedrich Lenz at 09:24 AM

July 27, 2005

Google-free Zone Logo

There is a nice little site freeriding on Google's fame that lets people make Google style logos out of random text. I tried it with "Google-free Zone", which gives an interesting logo for one of my blog categories.

Posted by Karl-Friedrich Lenz at 02:37 PM

July 13, 2005

Reason No. 132547 not to Use Google

They are working on balkanizing the Internet to accommodate local censorship needs.

Posted by Karl-Friedrich Lenz at 10:24 PM

July 03, 2005

More on robots.txt

I got some interesting comments on my last post in the discussion area I am hiding from the comment and trackback spammers at k.lenz.name/discuss.

One point was a reference to the case of Chip Salzenberg. That was discussed at Slashdot last Thursday here and is documented at a site named GeeksUnite.net.

In that case, Mr. Salzenberg complained to his employer about their practice of ignoring "robots.txt" instructions, among other things. As I learned from his letter complaining about the practice, doing that might be illegal irrespective of the copyright angle under American law.

Then there is of course the issue of whether it makes sense to call "robots.txt" a DRM measure.

One example in the comments was that some file is not retrieved from server A because of a robots.txt block, but from server B that has no such block. I might add that since I allow people to mirror all my files as long as it is not commercial use, the Google robot might find something I wrote on some other website, so this is a realistic scenario. The comment then says that using robots.txt as a DRM solution may be confusing.

For the time being, I would still like to think of robots.txt as a technological measure that is designed to prevent acts in respect to works which are not authorized by the copyright holder (me). See the definition in Article 6 Paragraph 3 of the 2001 Copyright Directive:

For the purposes of this Directive, the expression "technological measures" means any technology, device or component that, in the normal course of its operation, is designed to prevent or restrict acts, in respect of works or other subject-matter, which are not authorised by the rightholder of any copyright or any right related to copyright as provided for by law or the sui generis right provided for in Chapter III of Directive 96/9/EC. Technological measures shall be deemed "effective" where the use of a protected work or other subject-matter is controlled by the rightholders through application of an access control or protection process, such as encryption, scrambling or other transformation of the work or other subject-matter or a copy control mechanism, which achieves the protection objective.

If so, this case clearly shows one aspect of DRM that I think is relevant for other cases as well.

That is, the effectiveness of DRM depends strongly on the willingness of the people addressed by it to respect the restrictions.

If the company in the Salzenberg case above has a policy of ignoring "robots.txt" instructions, then these instructions are not effective against them.

If on the other hand Google keeps their robot out of sites that have blocked them, that makes the instruction "effective" in the sense of the Copyright Directive quoted above.

Let's make that point clear with another example. The authority controlling a city park might want to keep people from entering some areas to let grass grow freely. So they might enclose the area in question with a knee-high small fence and put up some signs asking people to keep out.

That measure can easily be defeated. All it takes is a small jump over the fence.

If the authority builds the equivalent of the Berlin wall instead (maybe minus the machine guns and landmines), that would be much more effective in physically forcing people out of the area in question.

But even the small fence is enough if people are generally inclined to follow the instructions.

I am still unable to make up my mind on the question of stopping the blocking of Google in my robots.txt file. I will probably come back to that question later.

Posted by Karl-Friedrich Lenz at 11:51 AM

June 30, 2005

Yet Another Case of DRM Working?

Contrary to the belief of many, I think there is no reason to assume that DRM can never work. I have collected several examples in previous posts: Nintendo Game Boy cartridges, online games (for example Half-Life 2) and mobile phones.

Now I have another possible candidate.

Several weeks ago, I decided to declare my website a Google-free zone and edited my "robots.txt" file accordingly.

Today I discovered some older posts about Google's massive copyright violations at the Unofficial Google Blog here ("Looming Copyright Catastrophe for Google", a post title with a definitely pleasant ring to my ears) and at Search Engine Watch here ("Search Engines Already Infringe").

Thinking a bit about these posts, I came up with robots.txt as a candidate for working DRM.

If I tell Google with robots.txt that they are supposed to desist from copying my content, I am using a technological measure under the definition of DRM in Article 6 Paragraph 3 of the 2001 Copyright Directive.

It is clearly designed to prevent acts of copying by Google not authorised by me. It is also "effective", since it achieves its object of keeping Google from copying my pages.

The other point is that I probably need to stop shutting out Google with this DRM measure.

In my opinion, Google has no right in the first place to copy the whole web.

That point of view however is weakened by the fact that Google can point to the effective DRM measure of editing "robots.txt". So by accepting their way of doing business (steal everything freely and point to the opt-out DRM solution as a substitute for non-existing licenses if someone complains), I am actually helping their cause. Which is of course the last thing I want to do.

I am not sure how I should decide that question.

Posted by Karl-Friedrich Lenz at 09:39 PM

May 29, 2005

Three Cheers for Lycos

Lycos Germany has announced that it will follow a request by one of its customers to stop storing dynamic IP number usage.

That puts them squarely on the right side of this vital fight about the future of Internet freedom. It is also a marked contrast to Google, which I have banned from my site for exactly that reason.

And it is also a clear success for the customer who complained based on still existing European data protection standards that actually prohibit watching your customers around the clock for no particular reason.

Posted by Karl-Friedrich Lenz at 08:40 PM

Yahoo Creative Commons Search

Yahoo has promoted the Creative Commons search option to the "advanced search" page.

Found at the Creative Commons blog.

Posted by Karl-Friedrich Lenz at 07:53 PM

May 17, 2005

Google Traffic

I have discontinued blocking or redirecting traffic from Google links to this site.

When I started this two days ago, I thought that by now all links to my site in Google would be gone, so any redirection or blocking would be only symbolic.

However, in the meantime I learned from Nathan Weinberg at InsideGoogle that Google keeps links to pages in their index and search results even if their robot does not crawl those pages.

I don't want to confuse and annoy users permanently. Therefore I have pulled the blocking script from my .htaccess file.

Posted by Karl-Friedrich Lenz at 11:10 AM

May 15, 2005

Rewriting Requests from Google

Setting up my website to refuse any links from Google and serve "www.yahoo.com" in answer to anyone who clicks a link to my website on Google was rather difficult for me. Took me about one day to figure it out, since I needed to study how the Apache web server's "rewrite engine" does this kind of thing and I had no idea of what "http_referer" exactly means and does.

As a result I included this code in the ".htaccess" file in my web home directory:

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} ^http://www.google.co.jp [OR]
RewriteCond %{HTTP_REFERER} ^http://www.google.com [OR]
RewriteCond %{HTTP_REFERER} ^http://www.google.de
RewriteRule /* http://www.yahoo.com [R,L]

A few comments to explain this. The whole point of this code is to take a request from someone clicking on a Google link to (for example) k.lenz.name/LB and "rewrite" this request as "www.yahoo.com".

I note that this is not very friendly to a person clicking on the link and apologize for that. The alternatives would have been to direct them to a "403" error page by shutting out all traffic from Google as "forbidden" or to direct them to some page on my server explaining what I am doing here.

I decided to go with "www.yahoo.com" since I think that is what Google would find the least pleasing alternative. While this does leave some confused users for a short time, of course I expect any Google links to my site to disappear quickly, so this should not be a lasting problem for users, leaving the symbolic value of pointing to Google's strongest competitor as a decisive advantage over the other alternatives.

The first line "RewriteEngine On" gives the Apache web server the instruction to turn rewriting on, which is turned off by default.

The next line "RewriteCond %{HTTP_REFERER} !^$" starts setting conditions. This particular line says that no rewriting should be done if there is no "http_referer" data in the request. That happens with some firewall or browser settings. For example, users can easily turn off broadcasting the "http_referer" data in Firefox by typing "about:config" into the address bar and setting "network.http.sendRefererHeader" from the "2" default to "0".

The next three lines tell the rewrite engine to look if the request comes from a click on a link from any Google site in Germany, Japan or the U.S.

If that condition is true, the last line gives the instruction to rewrite the request, returning "www.yahoo.com" instead.

I have tested the setup by clicking on links to my site on all of the Google sites above (just entering a direct request into the address bar would not work as a test, since in that case no "http_referer" data is sent). In all cases I got "www.yahoo.com", so it seems to be working fine at the moment.
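For readers who find the rewrite syntax as opaque as I did, the decision these rules make can be sketched in a few lines of Python. This is only an illustration of the logic, not what Apache runs internally. The function name `rewrite_target` is made up for this sketch, and the dots in the patterns are escaped here, which the rules above do not do.

```python
import re

# Escaped forms of the three referer prefixes from the .htaccess rules above.
GOOGLE_REFERER_PATTERNS = [
    r"^http://www\.google\.co\.jp",
    r"^http://www\.google\.com",
    r"^http://www\.google\.de",
]

REDIRECT_TARGET = "http://www.yahoo.com"


def rewrite_target(referer):
    """Return the redirect URL for a request, or None to serve it normally."""
    # Mirrors 'RewriteCond %{HTTP_REFERER} !^$': no referer, no rewriting.
    if not referer:
        return None
    # Mirrors the three conditions joined by [OR].
    if any(re.search(pattern, referer) for pattern in GOOGLE_REFERER_PATTERNS):
        return REDIRECT_TARGET
    return None
```

A request carrying a referer such as "http://www.google.de/search?q=lenz" gets the Yahoo address back, while a request typed directly into the address bar (no referer) is served as usual.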

Update: In comments at k.lenz.name/discuss someone asked me to go with the "403" alternative instead. I followed that request for the time being and will go back to the symbolic forwarding stuff only after some time has passed and Google has removed their links to me so that there are no users confused by this measure.

Update 2: I have discontinued blocking or redirecting traffic from Google links to this site.

When I started this two days ago, I thought that by now all links to my site in Google would be gone, so any redirection or blocking would be only symbolic.

However, in the meantime I learned from Nathan Weinberg at InsideGoogle that Google keeps links to pages in their index and search results even if their robot does not crawl those pages.

I don't want to confuse and annoy users permanently. Therefore I have pulled the blocking script from my .htaccess file.

Posted by Karl-Friedrich Lenz at 11:32 AM

Shutting Out Google

with "robots.txt" was easy. All I needed to do was add

"User-agent: Googlebot
Disallow: / # I want to keep Google completely out of my site.

User-agent: Googlebot-Image
Disallow: / # I want to keep Google completely out of my site."

to a file named "robots.txt" in my web home directory. This was easily done in a couple of seconds.
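The effect of these two records can be checked with the robots.txt parser in Python's standard library. This is a small sketch under my own assumptions: the helper name `may_fetch` and the test URL are only for illustration, the comments are left out of the rules, and whether a given robot actually honors the file is a separate question.

```python
from urllib.robotparser import RobotFileParser

# The two records from above, without the comments.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, url="http://k.lenz.name/LB/"):
    """Ask the parsed rules whether `agent` may retrieve `url`."""
    return parser.can_fetch(agent, url)
```

With these rules, `may_fetch("Googlebot")` and `may_fetch("Googlebot-Image")` are both false, while a robot that is not named in the file, for example Yahoo's "Slurp", is still let in.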

Update August 8, 2005: For reasons described in this post, as of today, I have stopped blocking Google's robots over robots.txt. The reason for that is not that I want them to index my files, but that I don't want to contribute to their point of view that they are entitled to violate copyright as a default as long as people can easily shut them out with robots.txt. In my view respecting copyright should be the default, and the Google robot should be shut out of all sites that don't specify in robots.txt that they want to be indexed.

Let those who want Google in say so in their robots.txt files.

Posted by Karl-Friedrich Lenz at 11:31 AM

New License for my Blog

The license for my blog until now simply read:

"All rights reserved. Comments are owned by their authors.

Please feel free to forward copies of this work to others, mirror it on your homepage or blog, or post it on bulletin boards or P2P networks. However, please leave my name attached and don't edit the work. Commercial use requires separate permission."

As of today, I add:

"Permission for such commercial use is expressly denied to Google. I object specifically to inclusion in their cache and access to my pages with the "Google Web Accelerator"."

Posted by Karl-Friedrich Lenz at 11:30 AM

New Category: Google-free Zone

Two months ago, I had Google as the homepage of my web browser. It would have been fair to describe me as a fan of their service.

Now I have decided to shut them out completely from my website, which I declare a "Google-free zone". I have also started a new category on the blog with that title.

Dave Winer and Dan Gillmor also have expressed some reservations about Google lately. The reason why my reaction is much stronger lies in the fact that I have been opposed to any large-scale collection of Internet traffic data for years.

There is a heated battle going on about exactly this question right now in Europe. Enemies of freedom are gaining influence and want to turn the Internet into one big surveillance instrument. Under these circumstances, it is absolutely unacceptable to try building the world's largest Internet traffic data collection under the misleading excuse of speeding up web surfing. This calls for active resistance to Google, which deserves to be put completely out of business for this move.

To implement my new policy of completely shutting out Google from my pages, I will change the license for my online content, make necessary changes in "robots.txt" and set up the Apache rewrite engine in my ".htaccess" file so as to serve "www.yahoo.com" to everyone who follows a link from Google to one of my pages.

As a consequence, I will lose all readers who came in from Google until now. Also the pleasure of looking at a completely inflated Google page rank for my blog will be gone.

That can't be helped. In a conflict between losing readers and losing integrity, I have the luxury to choose integrity over reader numbers.

Posted by Karl-Friedrich Lenz at 11:27 AM

May 08, 2005

Living Under the Searchlights

The recent Google move to introduce the ultimate SPYWARE while fooling users that they are only "accelerating" their downloads has not done much to change my opinion about them. That's because my opinion about Google was already at a low point, making it difficult to fall any lower.

However, this "web accelerator" is clearly another new level of privacy violation, even if it only affects those who choose to live under the Google searchlights just to get a few downloads done faster.

Therefore, I will take a few minutes to look at whether it might be illegal under current European law.

There are three potential problems.

One is copyright. The service seems to be working, among other things, by using a "prefetch" command. That is, Google is downloading content the user might possibly require next in advance.

This downloading is a reproduction, just as the illegal cache of the whole Web Google is doing is a reproduction.

That means it needs an exception or limitation, since obviously Google has no licenses.

The only exception possible is Article 5 Number 1 a) of the 2001 Information Society Copyright Directive.

That exception requires that the "prefetch" is an "integral and essential part of a technological process" whose sole purpose is to enable "a transmission in a network between third parties by an intermediary".

Whenever the user does not actually use the prefetched file, the "prefetch" does not enable any transmission. In all those cases, it only adds an unnecessary burden to the overall Internet traffic load. Therefore, it is open to doubt whether the exception extends this far.

The next potential problem is data protection. Article 6 paragraph 1 of the 2002 Electronic Communications Data Protection Directive says:

"1. Traffic data relating to subscribers and users processed and stored by the provider of a public communications network or publicly available electronic communications service must be erased or made anonymous when it is no longer needed for the purpose of the transmission of a communication without prejudice to paragraphs 2, 3 and 5 of this Article and Article 15(1)."

Since Google "logs page requests" and does not seem to delete them when the communication is finished, all that keeps them from violating Article 6 is the anonymity of the user. However, since the pages logged may contain personally identifiable information, that defense is rather weak in most cases.

The third potential problem is that of liability for illegal content.

Under Article 13 of the 2000 Electronic Commerce Directive, an exception for liability is granted for "Caching".

However, in this case the exception is clearly restricted to cases where the cache is for the purpose of making more efficient the information's onward transmission to other recipients of the service upon their request. With "prefetched" pages there is no user request.

So if any of the billions of prefetched pages on some user's computer turns out to be illegal in that particular country, there is nothing to stop Google's liability for delivering that particular content.

Summing up, there seem to be some potential legal problems with the "web accelerator" service under European law, especially regarding the "prefetched" pages.

However, the moral repulsiveness of turning the searchlights on your users, as opposed to having them turned on the web content, depends in no way on the finer legal points mentioned above.

If you have a comment or trackback, please go to k.lenz.name/discuss.

Posted by Karl-Friedrich Lenz at 01:34 PM

April 27, 2005

Save the Web?

Yahoo has followed Google's lead and rolled out their own snooping operation. They call it "My Web".

This goes in the same direction as Google's "My Search History" privacy violation service, but goes a few steps farther.

As in Google's case, Yahoo is showing zero sensitivity for the data protection problems involved. And they seem to be inducing their millions of registered users to break copyright on a massive scale.

Their neat new service has a "save" button. Users are encouraged to save any web page to a "my web" page.

What exactly gives Yahoo and their users the right to make those copies? I would be surprised if they had licenses from every single owner of the copyright in web pages.

As a publisher of web content, I can't recall having given Yahoo permission to use my content commercially. Many Creative Commons licenses also don't allow commercial use. And there are even web publishers who reserve all their rights (actually that's the default if you don't declare anything).

Search engines have been given a free pass on copyright violations for too long. Just running a search business along with your advertising and media empire doesn't give you the right to steal everyone's content.

The question of how Google's cache of the Web is compatible with copyright standards is related to this. Cory Doctorow noted that *of course* Google is infringing copyright with that, and implying "as well they should" here.

In contrast, I don't approve of copyright violation on a massive scale.

European law on the exceptions and limitations of copyright is laid down in Article 5 of the 2001 copyright Directive. Number 1 a) gives an exception for caching that does not help Google and Yahoo in making wholesale copies of the Web, since it requires that the cache in question has the sole purpose to enable transmission in a network between third parties by an intermediary.

Searching for an exception or limitation for the purpose of building a search engine comes up with zero records in Article 5. It may be noted, however, that the Wayback Machine of the Internet Archive might come close to qualifying for an exception under Number 2 c) (however, the requirement of "specific acts of reproduction" might not be fulfilled).

Google and Yahoo don't even come close anywhere. They are ripping off other people's content for their own commercial gain, without a shadow of an excuse to do so.

As a matter of public policy, I would oppose changing the law to allow search engines to copy other people's works freely. If they want to build a cache, they can do so with the content of authors who want their content mirrored even for commercial gain and declare that license in a machine-readable way. There is no need to assume implied licenses where there are none, and absolutely no excuse for grabbing content from web publishers who are opposed to having their works mirrored to help Google make still more billions of profit.

Have a comment or trackback? You are welcome at k.lenz.name/discuss.

Posted by Karl-Friedrich Lenz at 03:40 PM

April 21, 2005

Really Bad Idea

says World Privacy Forum executive director Pam Dixon about Google's latest feature making search histories of individual users available, according to this Wired article.

I agree completely. If Google made this snooping operation part of the standard package, as it has done with the downloading of files without the user's consent or knowledge earlier, I would stop using their service completely.

At least they had the good sense to require users to request this massive violation of privacy. Therefore, while I will take most of my searching business elsewhere as a consequence of this, I might still consider falling back on Google occasionally if I can't avoid it.

Posted by Karl-Friedrich Lenz at 11:34 PM

April 02, 2005

Firefox Security Issue

Google has added a "cool feature" to their service recently.

That feature hijacks my PC to download random pages from the Internet without my knowledge or consent.

This is completely unacceptable. With my computer, I beg to be able to decide myself what I want downloaded and what not.

Apparently, while Google is making use of this big time, this "feature" is enabled in Firefox by default. So the responsibility for this rests originally not with Google, but with whoever had this bright idea at the Firefox development team. However, Google is contributing massively to the problem. Downloading potentially illegal files to their users' computers without the users' knowledge or consent is not a "cool feature", but a recipe for disaster, even without considering the waste of Internet bandwidth associated with this.

I am not yet ready to stop using Google completely.

If you happen to share my feeling that this is a really dumb idea, Firefox makes it easy to disable hijacking of your browser by any page you visit.

Essential instructions for this (takes only a couple of seconds) here.

Comments and trackbacks are welcome at the Google pagerank zero shadow site.

Posted by Karl-Friedrich Lenz at 09:37 PM