Friday, October 9, 2009

This week in search 10/9/09

This is part of a regular series of posts on search experience updates that runs on Fridays. Look for the label "This week in search" and subscribe to the series. - Ed.

This week we made many small improvements to the functionality and usability of our search results. Here's an overview:

Quick, in-browser viewing of Google Docs
We've incorporated the "Gview" tool from Google Docs into search results. Instead of the old "View as HTML" view, PDFs in search results now have a "Quick View" link that shows you Gview's image-rendered version, which preserves tables and graphics from the document. This means you can view PDF documents quickly and easily right in your browser.
Example search: [1099] (note the "Quick View" links on the first two results)

An improved interface for local information in web search
We changed our interface for local business information when it occurs in search results. It's now much more readable (larger fonts) and friendlier to use (easier to click on just what you want).

Example search: [pizza palo alto]

Here are before and after shots for the search [bakeries san francisco]:


Access to multiple providers in weather results
We also changed the interface for our weather results when they occur in web search. Now you'll see an array of different weather providers to choose from, including The Weather Channel, Weatherunderground and AccuWeather, if you want more detailed weather information. This way, you get the weather information you need, in the layout you prefer, from the service you choose.
Example search: [washington dc weather]

And here are before and after shots for the search [weather] (when done from the Googleplex, since the location is auto-detected):


Public service information for searches related to poison control
While it's relatively infrequent, people do occasionally turn to Google during medical emergencies. Our goal in these cases is to get our users the help they need as quickly as possible. As of this week, searches related to [poison control] trigger a special result with the phone number for the poison control hotline.
Example searches: [poison control], [first aid bleach poisoning]

Search options panel for mobile
This week on mobile search, we added a Search Options panel, so now you can get all of the same slice-and-dice functionality you have on your desktop when you search the web on your phone. Try doing a search from your phone and you will see an "Options" link on the right-hand side above the results. Click on it and you'll see the same panel that you are accustomed to when searching from your desktop.

Crawling AJAX
We also made an exciting announcement this week about making AJAX crawlable. Web applications are becoming increasingly popular, but much of what is contained within a web application is usually inaccessible to our crawlers and thus can't be found in our search. Our team has been busy working on techniques for crawling AJAX. This announcement represents just the start, as the work is currently in the prototype phase, but it does demonstrate that we are constantly working to improve search: our features, our ranking and, in this case, our comprehensiveness. We're always very excited to include new content in our search to make our results even better.

Hope you enjoyed this week's features. Stay tuned for what's next!

New in Google Squared: quality improvements, sorting and exporting

Today we're launching a number of improvements to the amount and quality of information you can find with Google Squared, as well as new tools to sort and export the data.

As we explained when we first launched Squared in Labs this summer, the product takes on a difficult technical challenge. It's a first step towards automatically extracting useful facts from all over the web and presenting them in a meaningful way. It has the potential to be particularly useful for research questions where the answers may not live on a single website, but instead must be combined from many different pages.

Rather than return a list of the most relevant websites, Squared returns a "square" (or table) of facts, sourced from across the Internet. For example, if you search Squared for [us presidents], each row on the resulting table represents a particular United States President, and the columns include relevant facts about him, such as date of birth, a picture and a short description.

At launch, your first square could include at most 30 facts. With today's update, squares display four times as much data — up to 120 facts. For example, instead of seeing only five presidents and three categories, now you'll see a table with 20 presidents and up to six attributes.



The quality of the information is also better, because we're ranking based on both relevance to your query and whether we can find high-quality facts. For example, in the past we would show you a column for "First Lady" even if the column only included a couple of accurate names. Now we're actively filtering out items (rows) and attributes (columns) from the initial square if we haven't found enough accurate data. Perhaps more interesting, we built Squared to learn from edits and corrections, so as people have been improving their squares, Google Squared has gotten better for everyone.

In addition to improving the information in Squared results, we've also added the ability to sort columns, so you can rank, group and compare items. Squared will even convert units in the background to make sure the data is sorted properly.
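To see why unit conversion matters for sorting, here is a minimal Python sketch. It is not how Squared is implemented; the conversion table, the `to_meters` and `sort_rows` helpers, and the sample rows are all hypothetical and exist only for illustration. The idea is to normalize each value to a common unit before comparing, so that a value of 1 mile does not sort below 1,800 meters just because the raw number is smaller.

```python
# Hypothetical conversion factors to a common unit (meters).
TO_METERS = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}


def to_meters(value, unit):
    """Convert a length in a supported unit to meters."""
    return value * TO_METERS[unit]


def sort_rows(rows, descending=False):
    """Sort rows by their length, compared in meters rather than by raw number."""
    return sorted(
        rows,
        key=lambda row: to_meters(row["value"], row["unit"]),
        reverse=descending,
    )


# Placeholder rows, not real data: the same attribute expressed in mixed units.
rows = [
    {"item": "A", "value": 2.0, "unit": "km"},
    {"item": "B", "value": 5000.0, "unit": "ft"},
    {"item": "C", "value": 1.0, "unit": "mi"},
    {"item": "D", "value": 1800.0, "unit": "m"},
]

for row in sort_rows(rows):
    print(row["item"], row["value"], row["unit"])
```

Sorting on the raw numbers alone would put item C first and item B last; comparing in meters puts B first and A last.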

We've also added the ability to export data from Squared to a Google Spreadsheet or a CSV file, which should make it easier to do interesting things with the data. For example, you can build a square for [african countries], add more items and columns, and examine the relationship between the literacy rate and GDP per capita. Once you've built your square to contain all the information you need, you can export the square to Google Spreadsheets and create a rough scatter plot.
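If you export to CSV instead, a few lines of Python are enough for a first look at the same relationship. The sketch below is only an illustration: the column names `literacy_rate` and `gdp_per_capita`, the `load_points` and `pearson` helpers, and the sample rows are assumptions, and the numbers are made-up placeholders rather than real statistics. An actual export may use different headers.

```python
import csv
import io
import math

# Placeholder CSV text standing in for an exported square (invented values).
SAMPLE_CSV = """item,literacy_rate,gdp_per_capita
Country A,50,1000
Country B,60,2000
Country C,70,3000
Country D,80,5000
"""


def load_points(csv_text):
    """Parse CSV text into (literacy_rate, gdp_per_capita) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (float(row["literacy_rate"]), float(row["gdp_per_capita"]))
        for row in reader
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


points = load_points(SAMPLE_CSV)
literacy = [p[0] for p in points]
gdp = [p[1] for p in points]
print(f"rows: {len(points)}, correlation: {pearson(literacy, gdp):.3f}")
```

The same pairs could be handed to any plotting tool to draw the scatter plot; the correlation is just a quick numeric summary of what the plot would show.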


There's a lot left to do before Squared is ready to leave Labs — we're still working on improving quality as well as the user interface — but we hope that our recent improvements make it more useful. In its experimental stage, Squared demonstrates an important future direction in search: understanding structured data from across the web to build new tools for organizing and presenting information. Try it out, and let us know what you think.

A tale of 10,000,000 books

The fundamental reasons why the electric car has not attained the popularity it deserves are (1) The failure of the manufacturers to properly educate the general public regarding the wonderful utility of the electric; (2) The failure of [power companies] to make it easy to own and operate the electric by an adequate distribution of charging and boosting stations. The early electrics of limited speed, range and utility produced popular impressions which still exist.
This quotation would hardly surprise anyone who follows electric vehicles. But it may be surprising to hear that in the year when it was written thousands of electric cars were produced, and that year was nearly a century ago. This appeared in a 1916 issue of the journal Electrical World, which I found in Google Books, our searchable repository of millions of books. It may seem strange to look back a hundred years on a topic that is so contemporary, yet I often find that the past has valuable lessons for the future. In this case, I was lucky — electric vehicles were studied and written about extensively early in the 20th century, and there are many books on the subject from which to choose. Because books published before 1923 are in the public domain, I am able to view them easily.

But the vast majority of books ever written are not accessible to anyone except the most tenacious researchers at premier academic libraries. Books written after 1923 quickly disappear into a literary black hole. With rare exceptions, one can buy them only for the small number of years they are in print. After that, they are found only in a vanishing number of libraries and used book stores. As the years pass, contracts get lost and forgotten, authors and publishers disappear, the rights holders become impossible to track down.

Inevitably, the few remaining copies of the books are left to deteriorate slowly or are lost to fires, floods and other disasters. While I was at Stanford in 1998, floods damaged or destroyed tens of thousands of books. Unfortunately, such events are not uncommon — a similar flood happened at Stanford just 20 years prior. You could read about it in The Stanford-Lockheed Meyer Library Flood Report, published in 1980, but this book itself is no longer available.

Because books are such an important part of the world’s collective knowledge and cultural heritage, Larry Page, the co-founder of Google, first proposed that we digitize all books a decade ago, when we were a fledgling startup. At the time, it was viewed as so ambitious and challenging a project that we were unable to attract anyone to work on it. But five years later, in 2004, Google Books (then called Google Print) was born, allowing users to search hundreds of thousands of books. Today, they number over 10 million and counting.

The next year we were sued by the Authors Guild and the Association of American Publishers over the project. While we have had disagreements, we have a common goal — to unlock the wisdom held in the enormous number of out-of-print books, while fairly compensating the rights holders. As a result, we were able to work together to devise a settlement that accomplishes our shared vision. While this settlement is a win-win for authors, publishers and Google, the real winners are the readers who will now have access to a greatly expanded world of books.

There has been some debate about the settlement, and many groups have offered their opinions, both for and against. I would like to take this opportunity to dispel some myths about the agreement and to share why I am proud of this undertaking. This agreement aims to make millions of out-of-print but in-copyright books available either for a fee or for free with ad support, with the majority of the revenue flowing back to the rights holders, be they authors or publishers.

Some have claimed that this agreement is a form of compulsory license because, as in most class action settlements, it applies to all members of the class who do not opt out by a certain date. The reality is that rights holders can at any time set pricing and access rights for their works or withdraw them from Google Books altogether. For those books whose rights holders have not yet come forward, reasonable default pricing and access policies are assumed. This allows access to the many orphan works whose owners have not yet been found and accumulates revenue for the rights holders, giving them an incentive to step forward.

Others have questioned the impact of the agreement on competition, or asserted that it would limit consumer choice with respect to out-of-print books. In reality, nothing in this agreement precludes any other company or organization from pursuing their own similar effort. The agreement limits consumer choice in out-of-print books about as much as it limits consumer choice in unicorns. Today, if you want to access a typical out-of-print book, you have only one choice — fly to one of a handful of leading libraries in the country and hope to find it in the stacks.

I wish there were a hundred services with which I could easily look at such a book; it would have saved me a lot of time, and it would have spared Google a tremendous amount of effort. But despite a number of important digitization efforts to date (Google has even helped fund others, including some by the Library of Congress), none have been at a comparable scale, simply because no one else has chosen to invest the requisite resources. At least one such service will have to exist if there are ever to be one hundred.

If Google Books is successful, others will follow. And they will have an easier path: this agreement creates a books rights registry that will encourage rights holders to come forward and will provide a convenient way for other projects to obtain permissions. While new projects will not immediately have the same rights to orphan works, the agreement will be a beacon of compromise in case of a similar lawsuit, and it will serve as a precedent for orphan works legislation, which Google has always supported and will continue to support.

Last, there have been objections to specific aspects of the Google Books product and the future service as planned under the settlement, including questions about the quality of bibliographic information, our choice of classification system and the details of our privacy policy. These are all valid questions, and being a company that obsesses over the quality of our products, we are working hard to address them — improving bibliographic information and categorization, and further detailing our privacy policy. And if we don’t get our product right, then others will. But one thing that is sure to halt any such progress is to have no settlement at all.

In the Insurance Year Book 1880-1881, which I found on Google Books, Cornelius Walford chronicles the destruction of dozens of libraries and millions of books, in the hope that such a record will “impress the necessity of something being done” to preserve them. The famous library at Alexandria burned three times, in 48 B.C., A.D. 273 and A.D. 640, as did the Library of Congress, where a fire in 1851 destroyed two-thirds of the collection.

I hope such destruction never happens again, but history would suggest otherwise. More important, even if our cultural heritage stays intact in the world’s foremost libraries, it is effectively lost if no one can access it easily. Many companies, libraries and organizations will play a role in saving and making available the works of the 20th century. Together, authors, publishers and Google are taking just one step toward this goal, but it’s an important step. Let’s not miss this opportunity.


(This first appeared in the New York Times.)
