Das Janota-Officium

We were testing some search-term tracking functionality that we are integrating into the Anole CMS and entered an anolecms search in Google. About the 8th result down caught my eye, as I don’t read German. It starts with Das Janota-Officium, and Philip clicked it.

We were taken to this page.

The “Anolecms” hit came from Google’s OCR misreading the scan of the book page. The real text was “Analects”. What surprised us is that Google really does seem to be gathering all known data within Earth’s atmosphere and archiving it so we can search through it. If you’ve looked further into this than we did, I have a couple of questions to toss out to you.

  • What limits are imposed by copyright laws?
  • How much progress have they made toward “all books from the history of man” status?
  • Who sits around scanning these tomes into the server?
  • Do they have their own OCR running through there?
  • Is it only old books or are there books from this decade in there?

I’d love to hear from you if you want to take a couple minutes to share your findings.

Comments

Comment from Darin Codon
Time: May 3, 2007, 8:08 pm

Classics, those books no longer regulated by copyright, have been online for some time. Google’s book program has been a real issue for them. I understand that publishers have applied limitations and as long as there is a revenue model…
