A search engine newsletter I subscribe to passed this link to me, and man, is this amazing! It seems that Google is setting up a Catalog search (currently in Beta). Now, this didn't seem like it would be too hard for them, considering their great indexing of webpages, PDFs, newsgroups, etc. But then I went to http://catalogs.google.com/ and did a search. It looks like the catalog pages are images, yet they're highlighting the search terms wherever they come up! How do they do this?!
Impressive (!!), but not very practical for someone on dial-up (still waiting for the page to finish loading).
Interesting how they're doing it, though: using OCR to build a database, then overlaying the highlighting on the image before serving it back to you. That's quite a piece of work.
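For the curious, the "OCR, then overlay the highlighting" idea can be sketched in a few lines. This is only an illustration, not how Google actually does it: it assumes the OCR step has already produced each word with a pixel bounding box (the hypothetical `OcrWord` type), and it treats the page as a plain grid of RGB tuples instead of a real JPEG so it runs with nothing but the standard library.

```python
from dataclasses import dataclass

YELLOW = (255, 255, 0)  # highlight colour (arbitrary choice)


@dataclass
class OcrWord:
    """Hypothetical OCR output: one recognized word and its pixel bounding box."""
    text: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def highlight_boxes(words, query):
    """Return the OCR words that match any query term, ignoring case and trailing punctuation."""
    terms = {t.lower() for t in query.split()}
    return [w for w in words if w.text.lower().strip(".,;:!?") in terms]


def overlay(image, boxes, alpha=0.5):
    """Return a copy of `image` with each box blended toward YELLOW; the original is untouched."""
    out = [list(row) for row in image]
    height = len(out)
    width = len(out[0]) if height else 0
    for b in boxes:
        # Clip each box to the image so bad OCR coordinates can't raise IndexError.
        for y in range(max(0, b.top), min(height, b.bottom)):
            for x in range(max(0, b.left), min(width, b.right)):
                out[y][x] = tuple(
                    round(c * (1 - alpha) + h * alpha) for c, h in zip(out[y][x], YELLOW)
                )
    return out


# Usage: a tiny 10x4 white "scan" with two recognized words.
page = [[(255, 255, 255)] * 10 for _ in range(4)]
words = [OcrWord("Computer", 0, 0, 5, 2), OcrWord("desk", 6, 0, 10, 2)]
hits = highlight_boxes(words, "computer")
result = overlay(page, hits)
```

The point is that the text never has to be read back off the image at search time; the word positions come from the OCR database, so highlighting is just painting rectangles.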
"3. How does catalog search work?
Google scans printed copies of the catalogs and automatically converts the text portion to a format that can be searched. Google then uses the same sophisticated algorithms that power our web search to enable search over the catalog content and ensure that the most relevant catalog pages are presented first in your results. The scanning and character recognition processes are not perfect, so you may notice occasional errors in page numbers or encounter pages that are hard to read. We apologize for these problems and are working to fix them in later versions of this service."
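The "convert the text to a format that can be searched" step boils down to indexing the OCR output. As a rough sketch only: the FAQ doesn't say how Google ranks pages, so this swaps in a plain term-count score, and the page IDs and text are made up.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: word -> Counter of {page_id: occurrences} built from OCR'd page text."""
    index = defaultdict(Counter)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word][page_id] += 1
    return index


def search(index, query):
    """Rank pages by total occurrences of the query terms (a stand-in for real relevance ranking)."""
    scores = Counter()
    for term in tokenize(query):
        scores.update(index.get(term, {}))
    return [page_id for page_id, _ in scores.most_common()]


# Usage with hypothetical OCR text for three catalog pages.
pages = {
    "p12": "Desktop computer, computer desk, and monitor stand",
    "p40": "Garden hose and computer bag",
    "p77": "Oak dining table",
}
index = build_index(pages)
results = search(index, "computer")
```

This also shows why the FAQ's warning about OCR errors matters: if the scanner reads "computer" as "c0mputer", that page simply never shows up for the query.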
#3. "RE: Looks like Google's doing it again" In response to Grogan (Reply # 2)
Thanks, Grogan, I completely missed that Catalog help section. I figured it was some sort of OCR. I agree that this is mainly a feature for broadband users. Still, it's amazing what those guys can come up with.
They may refine it (for example, by compressing the images more). When you click to see the full-size image, the result is over 200K in most cases; I think they could get away with a bit more JPEG compression. This is just in the testing phase, after all.
#5. "ancedRE: Looks like Google's doing it again" In response to Grogan (Reply # 4)
Fantastic idea.
I have the advanced settings set to 100 results. I typed in "computer", and Google loaded 81 of the results in 5 minutes 28 seconds on my dial-up. The last 19 results showed little squares with a red X in them. Looks like they need to do a little more work on it.
#6. "RE: ancedRE: Looks like Google's doing it again" In response to dubber (Reply # 5) Fri Dec-21-01 04:08 AM
Those last results with the red X in them were the result of the page timing out on you while downloading. That is a really long download time for a web page!
You can try to force the images to show by right-clicking the red X and then clicking Show image. That way you don't have to wait for the whole page to reload, as you would after hitting Refresh in the browser.