May 10th, 2007

Web 2.0 ... II: Metadata

The fundamental problem with the World Wide Web -- to use the arcane term that distinguishes the documents from the physical "Internet" and dates me as a Generation X techie rather than Generation Y -- is not the sheer volume of information but the impotence of search engines.  The Google, Yahoo and MSN robots still crawl websites as if they were the flat files of the Web 1.0 era (1993-1998).  Google has tried to mitigate the problem server-side by encouraging webmasters to submit sitemaps, but the real solution is searchable metadata.  All the juicy reading material on the Web is stored in databases, each with its own proprietary interface and query language.

The Open Access peeps claim that OAI will solve the problem by extracting such metadata from texts housed in public databases and collecting this data-about-Web-data in registry databases that can be searched from a single interface, such as OAIster and The Registry.  I'm a skeptic; as any programmer will tell you, getting two databases to "talk" to each other is hard, and getting databases that already speak different languages to "talk" in a common one is ... a headache.  Basically, OAI is like asking a native English speaker and a native Chinese speaker to learn Esperanto and then debate fluidly, the English speaker enjoying a slight advantage because the middle language is a European derivative.

The indices, or so-called subject headings, of the Library of Congress catalog are a classic example of wedging an unfamiliar language between the language of the books and the language of the patrons, where the patrons typically cannot find what they are looking for because the librarian cataloged the book under "learning theory" instead of "pedagogy".  The cataloging data on the inside cover of most modern books demonstrates this problem.  The librarian's advantage in knowing the cataloging language is useless if the purpose of the library is patron-based.  Something must be done.
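To make the registry idea concrete, here is a minimal sketch of the harvesting half of OAI: a repository answers a `ListRecords` request with an XML envelope wrapping Dublin Core metadata, and the registry parses out the bits it will index. The sample response below is made up for illustration (a real harvester would fetch it over HTTP from a repository's OAI-PMH base URL), but the namespaces are the actual OAI-PMH and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses carrying "oai_dc" (Dublin Core) metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A trimmed-down ListRecords response; the identifier and record
# contents are hypothetical, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>On Pedagogy</dc:title>
          <dc:subject>learning theory</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def extract_records(xml_text):
    """Pull (identifier, title, subjects) out of a ListRecords reply."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        subjects = [s.text for s in rec.findall(".//dc:subject", NS)]
        records.append((ident, title, subjects))
    return records

print(extract_records(SAMPLE))
```

Note that the protocol only moves the metadata around; the record above still files the book under "learning theory", so the patron searching for "pedagogy" is no better off, which is exactly the headache.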