I’d like to expand a little on the point I raised in today’s class. It’s about the Google Books Project.
As we know, Google divided the books in the project into several groups: books already in the public domain can be fully viewed online or downloaded; for books still under copyright whose authors have opted in as Google partners, the number of viewable pages is determined by the contract; and for books that may be covered by copyright but whose owners have not been identified, the full text is searchable, but only "snippets" (two to three lines of text) are shown in response to a customer’s search request.
There is no problem with the first two groups; it is the third group that causes trouble. The authors (or the Guild) allege copyright infringement because Google makes copies of copyrighted works by scanning them and stores the digitized copies in its database.
This made me think of regular search engines, which engage in almost the same copying behavior as Google Books. It may be hard to find a sound legal basis for invoking industry practice in this dispute, but it puzzles me that the same behavior leads to two different results.
As I understand it, most search engines work by having a robot constantly crawl through hundreds of thousands of websites every minute, copying the web pages into a big database. When a customer submits a search request, the engine searches for the terms in its own database and responds with the results.
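For the non-lawyers, here is a minimal sketch of that crawl-and-index idea in Python. This is my own illustration, not how Google or any real engine actually works; the URLs and page texts are made up, and the "pages" stand in for HTML a crawler would have fetched and copied.

```python
# Minimal sketch: copy pages into a local database (an inverted index),
# then answer queries from that copy rather than from the live web.
from collections import defaultdict

# Stand-ins for pages a crawler has already fetched and copied (hypothetical).
pages = {
    "http://example.com/a": "Copyright law protects original works of authorship",
    "http://example.com/b": "Search engines copy pages into an index to answer queries",
}

# Build the index: each word maps to the set of URLs that contain it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

def search(term):
    """Answer a query from the stored copy, returning matching URLs."""
    return sorted(index.get(term.lower(), set()))

print(search("copyright"))  # ['http://example.com/a']
```

The key point for the copyright question is visible even in this toy version: the engine keeps a copy of the text in order to answer queries, but what it returns to the user is a pointer to the source, not the work itself.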
Many of those web pages actually carry copyrighted works, yet nobody objects to being crawled and copied by search engines. The reason is that, by default, web pages want to be searched and viewed. Being crawled and indexed greatly increases a page’s chance of being found and viewed by a customer. If a page does not want to be indexed, its owner can easily adjust its meta tags or other technical features to "opt out" (a sketch of that mechanism follows below).
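On the technical side, the opt-out usually takes the form of a robots.txt file (or a "noindex" robots meta tag in the page itself), which polite crawlers check before copying anything. Here is a small sketch using Python’s standard urllib.robotparser; the rules and URLs are an invented example, not any real site’s policy.

```python
# Sketch of the "opt-out" convention: a well-behaved crawler consults
# robots.txt before fetching and copying a page.
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: everything is crawlable except /private/.
robots_txt = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("MyCrawler", "http://example.com/public/page.html"))   # True
print(parser.can_fetch("MyCrawler", "http://example.com/private/page.html"))  # False
```

So the default is "copy me", and the site owner carries the burden of saying no, which is exactly the opt-out structure Google proposed for books.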
So it seems to be a widely accepted custom, or rule, in the search engine industry that copying copyrighted works for the sole purpose of building a searchable database, rather than displaying their contents, is allowed.
The Google Books Project is in a similar situation. Digitizing the books is just a way of building a search engine database, much like the robot crawling through websites. Google does not substantially display the contents to the public. On the contrary, people learn that certain books contain the information they need by searching the full text, and they can then be directed to buy hard or digital copies. So Google Books may actually increase book sales.
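To make the "not substantially displaying" point concrete, here is a sketch of the snippet idea: the full text sits in the database so it can be searched, but a query only returns a short excerpt around the match, never the whole work. The book text below is invented for illustration.

```python
# Sketch: full text is indexed and searchable, but only a short snippet
# (a couple of lines' worth of characters) is ever shown to the user.
book_text = (
    "Chapter one discusses the history of printing. Chapter two turns to "
    "fair use, weighing the purpose of the copying, the nature of the work, "
    "the amount taken, and the effect on the market for the original."
)

def snippet(text, term, width=60):
    """Return a short excerpt around the first match, or None if absent."""
    pos = text.lower().find(term.lower())
    if pos == -1:
        return None
    start = max(0, pos - width // 2)
    return "..." + text[start:start + width] + "..."

print(snippet(book_text, "fair use"))
```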
Intuitively, the mass scanning of books seems more objectionable and is more easily regarded as infringement. However, when we think it over, there is no essential difference between a website search engine and a book search engine. If Google Books is copyright infringement, most regular search engines should be as well.
Could anybody explain why the same behavior leads to two totally different results, one being fine while the other is copyright infringement? The only reasonable answer I can think of, though I still doubt it, is that regular search engines are actually infringing too. People could sue them if they wanted, but nobody does, because crawling is an industry practice that has existed peacefully for more than ten years, and the copying process is not as visible as scanning books, so it draws less objection from ordinary people.
Both legal and technical comments are welcome!
Cheers,
Lawrence