ABSTRACT: New technologies are sometimes greeted with skepticism. XML is a key technology of strategic value to libraries. It can serve to help libraries have a more expansive role in the increasingly digital environment in which we must operate. After touching on open access and open source software, an exploration of issues surrounding improved information access will focus on the core issue of a schema for bibliographic and authority data that is oriented to the Web environment. Mapping of MARC to XOBIS, Lane Medical Library's experimental schema, will illustrate problems and prospects of such an endeavor.
|
Thank you. Hello, my name is Dick, and I'm a cataloger. [Kevin got a bigger laugh] The first step in self-help programs is to explore whether there is a problem. Cataloging is my avocation, and I even enjoy doing database cleanup late at night to escape from life's daily stresses. However, I view cataloging as a window thru which to observe the structural relationships of content, metadata, and systems interfaces. The ongoing transition from traditional to digital resources, and how these two coexist and are managed by libraries, is in flux. Today, I would like to share some of my thoughts on trying to understand a very complex and changing environment, particularly as they relate to XML and our experimental schema XOBIS. Before going any further, how many are familiar with XML? How many MARC experts do we have in the audience? Web gurus? |
|
|
When Steve called me, I was surprised that the invitation wasn't XML-specific. He mentioned your theme of "Racing to a Bright Future". Themes for meetings are often just marketing, but the one for this meeting struck a chord with me. At the 2000 Meeting at ALA, RUSA, a public services group, sponsored a forum on "Is MARC Dead?" The panel ended on a rather somber tone. Most memorable to me were two discouraged librarians walking away from the meeting discussing what career options they had, since there was no future in cataloging. This perplexed me since it seemed to me we were just getting technologies like XML that could underpin improved functionalities not previously possible and make all our hard work pay off. |
|
|
As I thought more, it seemed to me that I didn't want to describe many competing schemas that might "win" a race. I was more concerned that the limitations of MARC and AACR might be better understood, increasing the possibility of achieving an XML schema, perhaps different than any of the current ones. It is really more of a question of what we're doing as a profession. A number of library schools have closed. Cataloging in particular faces challenges. Why does it seem we value metadata specialists, who may apply a simple set of Dublin Core elements to Web resources, over catalogers, who must produce records to exacting standards, interpreting elaborate cataloging rules and encoding their efforts in what might arguably be considered the most complex scheme in general use in the world today, MARC? |
|
|
Catalogs have to compete with other resources. Users are reluctant to search multiple resources, often preferring databases and fulltext journals to the catalog. There are too many of these silos to choose from. Converting to XML increases the flexibility in integrating cataloged resources with other Web resources. Often, the "competing" resources are largely bibliographic or authoritative in content. Are Web resources different enough to merit separate mechanisms for bibliographic control? Should "the catalog" be a more flexible animal that can respond to changing needs, emphases, trends, etc.? I call this tendency toward separate treatment of traditional and digital resources bibliographic apartheid. However, the existing MARC-based bibliographic apparatus hinders our likelihood of success in an increasingly competitive environment. Integration within the library and migrating between systems is more difficult; having to change everything simultaneously isn't always wise. Perhaps more importantly, integrating what we do with other agencies and initiatives within our environment is difficult. Here you can see various traditional functional areas and some of their interrelationships. External agencies and resources are a larger concern nowadays. Publishers use XML in the form of ONIX, and universities are beginning to communicate with XML internally. Library information, meanwhile, is segregated in MARC or proprietary formats. |
|
|
It helps me to cut thru the confusion to consider these layers. The top layer provides interfaces to and presentation of various resources (altho Z39.50 has not been a panacea in this regard.) Metadata is a confusing term. It is helpful to think of database or cataloging records as Managed Metadata in that it should be changeable independently of the resource referenced. The bottom layer represents digital content, fulltext, images, etc. that may contain embedded metadata. Changing embedded metadata would constitute changing the information object. Physical materials fit here nicely also, with managed metadata to identify their location, just as a URL identifies the location of digital content. Ideally, all of these layers could benefit from XML. The Extensible Markup Language has become a de facto standard for data and document exchange and continues to grow in popularity. Even Microsoft has incorporated support into several of the components of its new Office suite, although recent news indicates that some features will only be available in premium versions. |
|
|
To keep my coverage of XML succinct, I'll quote a couple of paragraphs from our book "Putting XML to Work in the Library": "Among Web technologies, XML stands out as strategically significant to libraries. Foremost, its single semantic markup syntax serves as a foundation for the development of Web-based information systems. This applies at multiple levels from raw documents to the machinery of sophisticated, interactive Web interfaces. Fulltext content of digital documents can use the same syntax as used in separate metadata records. Furthermore, associated technologies, designed specifically for XML, facilitate flexible document and data management, processing, merger, presentation, etc. Each of these is optimized for dealing with specific aspects of efficiently managing information on the Web. Harmony results from XML's shared syntax and the strength of its arsenal of tools." "As a markup language, XML goes far beyond the display markup that has been vital to the Web's success. Emphasis on display has limited the effectiveness of string and keyword searching. Librarians appreciate the power that fulltext indexing and ubiquitous access bring to otherwise dispersed and inaccessible information, but also immediately recognize limitations less obvious to the untrained eye. Savvy librarians can help users find the proverbial needle in the haystack. However, as the haystack continues to grow (and haystacks multiply), librarians also need to focus on strategies for improving access to that portion of the digital resources that really matter--the educational, scientific, cultural, etc. materials that have traditionally been the concern of archives, libraries, and museums." … "As libraries use XML more extensively, opportunities for collaboration and synthesis are bound to emerge. XML holds the promise that with broader use, libraries will one day be poised to offer much more than the sum of our individual efforts. Such success may well lie in the degree to which we can establish XML-based information standards." |
|
|
In a nutshell, 1) it underpins interoperability; 2) it also portends data longevity or future proofing because it is a neutral format; 3) it separates content from display and functionality, facilitating reuse of information; 4) it is extensible and different schemas can be referenced using namespaces; 5) it has advanced linking techniques; 6) it uses Unicode; 7) and most of all XML is ready to communicate in the web mainstream. With this thorough explanation of XML, I think it's time for a quiz. |
|
|
What do these words have in common? |
|
|
Would a hint help? These are all personal surnames. XML markup can help deal with such ambiguity, which is bound to be a growing problem on the Web. |
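To make the point about ambiguity concrete, here is a minimal sketch in Python. The sample sentences, the element names (surname, plant, color), and the words themselves are invented for illustration; they are not the words on the quiz slide and they are not XOBIS markup. The sketch only shows that a plain keyword search cannot tell a surname from an ordinary word, while a search that consults the markup can.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text: the element names and the words are made up.
SAMPLE = """
<doc>
  <p><surname>Rose</surname> planted a <plant>rose</plant> for <surname>Brown</surname>.</p>
  <p>The <color>brown</color> fence needs paint.</p>
</doc>
"""

root = ET.fromstring(SAMPLE)


def keyword_hits(word):
    """Count every occurrence of the word, ignoring the markup entirely."""
    text = "".join(root.itertext()).lower()
    return text.count(word.lower())


def surname_hits(word):
    """Return only the occurrences that the markup identifies as surnames."""
    return [el.text for el in root.iter("surname")
            if el.text.lower() == word.lower()]


print(keyword_hits("brown"))   # counts the surname and the color alike
print(surname_hits("brown"))   # only the occurrence tagged as a surname
```

The keyword count lumps the person and the color together; the markup-aware search returns just the person.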
|
|
This is my colleague Kevin Clarke who is an up and coming librarian-programmer at our library. It is unusual to find someone with experience in both rare books cataloging and computer programming. He has been instrumental in the success of our work. Next, I want to quickly show some Web resources to illustrate that others are providing information that could be better coordinated and become more useful if integrated with library resources. Interdisciplinary study is difficult when every resource treats similar information differently. This is an area where librarians could exercise more leadership, if our alternative were not so complicated. |
|
|
The UK database of historic parks and gardens includes links between persons and places, as well as including gender and dates. |
|
|
Jablonski's database of syndromes includes a text string, its equivalent in Medical Subject Headings, and often "Personalia" linking those who were involved in the discovery and description of the often eponymous syndromes. |
|
|
The Mathematics Genealogy Project goes even further, providing relationships between mathematicians, their advisors, their students, their institutions, etc. |
|
|
Namebase provides relationships between times people were in different places and works referring to them, and many others using graphical displays. |
|
|
This German resource provides cross-referencing of forenames by language, including links between the masculine and feminine forms and the origin, in this case the relationship to the word faba, Bohne, or bean in English. |
|
|
[Slide 16 Planetary Nomenclature] This resource links named features to their astronomical body, in this case a crater on Venus, named after Florence Sabin, and includes their namesake and date of naming. |
|
|
Individual efforts make huge amounts of information available. This site includes definitions and name associations for all kinds of concepts, laws, etc. as well as associating chemists with their institutions. One might not expect to find this in this Canadian directory. |
|
|
This slide illustrates how easily such relationships can fit into authority files, in this case mapped to our community information format for public access. |
|
|
This search subsets Italian anatomists with links to portraits in the record. |
|
|
In the browse search for dermatology, you can spot dermatologists, organizations relating to dermatology, etc. Unfortunately, it's not integrated with the catalog and the limitations on indexing and display make it difficult to present the information effectively in MARC-based ILS systems. These may seem an odd assortment of resources, but they illustrate that the same kinds of information are being treated incompatibly in various places, that relationships between the data elements are important, and that plenty of folks are willing to compile such information--often as labors of love. |
|
|
Now, let's take a look at XOBIS. XOBIS, the XML Organic Bibliographic Information Schema, is an experimental schema designed to model combined bibliographic and authority information and to stimulate interest in finding an XML replacement for MARC. "Organic" refers to common repeating patterns, especially sequential and hierarchical relationships. The following slides provide a taste of the tightly integrated whole, developed using the RELAX NG schema language. |
|
|
|
|
Point out root element RecordList and potential for RecordCollection to deal with merging records from different sources. |
|
|
In developing XOBIS, we asked lots of hard questions to try to arrive at fundamentals, instead of focusing on quick solutions to our immediate problems. Author, title, subject, & form/genre sound rather fundamental, don't they? |
|
|
However, questions like this altered our perceptions. As we delineated the Principal Elements (PEs), such as Being, Concept, and Work, we found that author, subject, and form/genre were relationships between them. Often OPAC labels reflect relationships. This is based on what we found recorded in libraries and museums, pragmatic rather than theoretical. Titles are simply names for works. Schema development can be frustrating, but also very rewarding. |
|
|
XOBIS' 10 PEs are on the left with an "isa" relationship to the example on the right. World War III is a fictional Event. Fictional is a good example of universality. Using relationships, real and fictional mice can be distinguished; Minnie Mouse is a fictional mouse. |
|
|
All of the PEs derive from Concept, although String (a word or phrase) makes this somewhat of a chicken/egg issue. Six of them are "notional", representing the idea of something, the Concept heart being all hearts everywhere, distinguished from a single heart specimen being an Object. "Substantive" PEs can be owned, licensed, circulated, etc. with Work based on physical carrier. Intangible examples of these, such as our fictional examples, are lumped with the tangible ones because they behave homogeneously. Holdings and Items can attach to these, although XOBIS considers them separate schemas of a suite. |
|
|
This slide indicates how the PEs inter-relate … The Concept Volcanoes shows the Name of its Entry, and may have Varia. It can be instantiated by specific, named volcanoes, which in this case are Places. Keeping the specifics separate could shrink the size of controlled vocabularies. |
|
|
We ended up with a great deal of recursion: self-reference, or the "tangled hierarchies" of Hofstadter's Gödel, Escher, Bach. Here, a Place is qualified by another Place, with an Abbrev as a substitute used in lieu of the full Name. This permits authority control of qualifiers, although they are not required. |
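The recursive pattern of a Place qualified by another Place can be sketched as a small self-referencing data structure. This is an illustration in Python, not XOBIS syntax: the class, the field names (name, abbrev, qualifier), the sample places, and the parenthesized display form are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Place:
    """A place whose optional qualifier is itself a Place (the recursion)."""
    name: str
    abbrev: Optional[str] = None          # substitute used in lieu of the full name
    qualifier: Optional["Place"] = None   # another Place, which may have its own qualifier

    def heading(self) -> str:
        """Build a display heading, preferring the qualifier's abbreviation."""
        if self.qualifier is None:
            return self.name
        q = self.qualifier
        label = q.abbrev if q.abbrev else q.heading()
        return f"{self.name} ({label})"


# Hypothetical records: the qualifier is a full record in its own right,
# so it can be under authority control like any other Place.
texas = Place("Texas", abbrev="Tex.")
paris = Place("Paris", qualifier=texas)
print(paris.heading())
```

Because the qualifier is an ordinary Place record, correcting its name or abbreviation once would carry through to every heading that refers to it.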
|
|
To reduce the need for external documentation, allowable data values can be defined within the dataset they are intended to control (another example of recursion). The Concept "Cover Title" can be assigned to the set of "Title Variant", another Concept. This technique is used wherever such control is needed. Since XML doesn't do anything by itself, software would be required to enforce this restriction. We envision central distribution of universal new records, for example a new language authority record when a new language is defined. |
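Since enforcement has to come from software, here is a minimal sketch of what such a check might look like. The element and attribute names (records, concept, memberOf, work, variant, type) are invented for the example and are not XOBIS markup; "Spine Title" and "Nickname" are likewise made-up values. The allowable values are read from Concept records in the same dataset, and each variant title's type is then checked against them.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset: the controlled values live alongside the data they control.
DATA = """
<records>
  <concept id="c1"><name>Title Variant</name></concept>
  <concept id="c2" memberOf="c1"><name>Cover Title</name></concept>
  <concept id="c3" memberOf="c1"><name>Spine Title</name></concept>
  <work id="w1"><variant type="Cover Title">Shore</variant></work>
  <work id="w2"><variant type="Nickname">Shore</variant></work>
</records>
"""

root = ET.fromstring(DATA)


def allowed_values(set_name):
    """Collect the names of all Concepts assigned to the named set."""
    set_id = next(c.get("id") for c in root.iter("concept")
                  if c.findtext("name") == set_name)
    return {c.findtext("name") for c in root.iter("concept")
            if c.get("memberOf") == set_id}


def invalid_variants(set_name):
    """List (work id, type) pairs whose type is not defined in the dataset."""
    allowed = allowed_values(set_name)
    return [(w.get("id"), v.get("type"))
            for w in root.iter("work")
            for v in w.iter("variant")
            if v.get("type") not in allowed]


print(sorted(allowed_values("Title Variant")))
print(invalid_variants("Title Variant"))
```

Adding a new allowable value then means adding a record, not revising external documentation or the schema.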
|
|
[Slide 31 Source-Target Relationships] Each of the 10 PEs may be related to any of the others as well as to another one of the same PE, e.g. a serial continuing another serial. The relationships match the 10 PEs but use adjectives as attribute values. For example the Work "Fatal Shore" is published by Knopf, an Organization like any other; is about the Place Australia; is by the Being Robert Hughes; and is in the English Language. There are 100 possibilities (10 sources times 10 targets). XOBIS doesn't provide cataloging rules; for that an international cataloging code is needed. But it provides a consistent structural model to record much more than is currently possible in MARC. |
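The count of source-target pairings is simple arithmetic, 10 × 10, and can be enumerated in a few lines. In this sketch the ten PE names are the ones mentioned at various points in this talk, listed in an arbitrary order; the tuple layout used for the "Fatal Shore" relationships is an assumption for illustration and not how XOBIS encodes them.

```python
from itertools import product

# The ten Principal Elements as named in this talk (order is arbitrary).
PRINCIPAL_ELEMENTS = [
    "Concept", "String", "Language", "Time", "Being",
    "Place", "Event", "Organization", "Object", "Work",
]

# Every PE may relate to every PE, including itself (a serial continuing a serial).
PAIRINGS = list(product(PRINCIPAL_ELEMENTS, repeat=2))

# The example from the talk, as (relationship, target PE, target name) tuples.
FATAL_SHORE = [
    ("published by", "Organization", "Knopf"),
    ("about", "Place", "Australia"),
    ("by", "Being", "Robert Hughes"),
    ("in", "Language", "English"),
]


def targets_of(kind):
    """Return the names of related records of one PE kind."""
    return [name for _, pe, name in FATAL_SHORE if pe == kind]


print(len(PAIRINGS))
print(targets_of("Place"))
```

Each relationship of the Work "Fatal Shore" falls into one of the ten pairings whose source is Work.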
|
|
The String element allows recording Varia and Relationships between words, making authority control of words and phrases possible. This could enhance keyword retrieval by automatically including close variants and by offering other variants and related words as optional inclusions. It could also permit browsable keyword indexes with cross references. It amounts to keyword authorities. |
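A minimal sketch of how keyword authorities might feed retrieval follows. The authority entry, its variants, and its related words are invented sample data, and the dictionary layout and function name are assumptions for illustration; XOBIS itself only records the Strings and their relationships.

```python
# Hypothetical keyword authority: one entry word with variants and related words.
KEYWORD_AUTHORITY = {
    "color": {
        "variants": {"colour", "colors", "colours"},
        "related": {"hue", "pigment"},
    },
}

# Index every form (entry word or variant) back to its entry word.
FORM_TO_ENTRY = {}
for entry, record in KEYWORD_AUTHORITY.items():
    FORM_TO_ENTRY[entry] = entry
    for variant in record["variants"]:
        FORM_TO_ENTRY[variant] = entry


def expand(term, include_related=False):
    """Return the set of search terms for a keyword, variants included."""
    entry = FORM_TO_ENTRY.get(term.lower())
    if entry is None:
        return {term.lower()}          # uncontrolled word: search it as typed
    record = KEYWORD_AUTHORITY[entry]
    terms = {entry} | record["variants"]
    if include_related:                # related words only when the user opts in
        terms |= record["related"]
    return terms


print(sorted(expand("colour")))
print(sorted(expand("colour", include_related=True)))
```

Close variants are included automatically, while related words are offered as an option, mirroring the distinction drawn above.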
|
|
Time is the most complex PE. XOBIS has a single definition of time so that all values share the same syntax, except in Description. It permits ranges, has a calendar attribute and utilizes type and certainty to modify the meaning independently of the value. Time can be a Name or utilize the Year/Month/Day/etc. |
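As a rough illustration of a single time definition that carries ranges, a calendar, a type, and a certainty, consider the sketch below. The class and field names, the default values, and the slash-separated range notation are assumptions made for the example and do not reproduce the XOBIS Time element; the dates are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimePoint:
    """A year with optional month and day, so precision can vary."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def text(self) -> str:
        parts = [f"{self.year:04d}"]
        if self.month is not None:
            parts.append(f"{self.month:02d}")
            if self.day is not None:
                parts.append(f"{self.day:02d}")
        return "-".join(parts)


@dataclass(frozen=True)
class TimeValue:
    """One shape for all times: a point or a range plus modifiers."""
    start: TimePoint
    end: Optional[TimePoint] = None   # present only for ranges
    calendar: str = "gregorian"       # hypothetical attribute values
    kind: str = "unspecified"
    certainty: str = "exact"

    def value_text(self) -> str:
        if self.end is None:
            return self.start.text()
        return f"{self.start.text()}/{self.end.text()}"


exact = TimeValue(TimePoint(1900), kind="publication")
approx = TimeValue(TimePoint(1900), kind="publication", certainty="approximate")
span = TimeValue(TimePoint(1914, 7), TimePoint(1918, 11))
print(exact.value_text(), approx.value_text(), span.value_text())
```

The exact and approximate examples share the same value text; only the certainty modifier differs, which is the separation of meaning from value described above.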
|
|
These examples indicate the range of values. |
|
|
This illustrates how time functions hierarchically and sequentially. |
|
|
This shows one Work related to another Work and some of the finer detail. This shows an individual "intellectual" work (as opposed to an "artistic" one). It illustrates TitleSegments and the nonfiling attribute. IDs on the Principal Elements indicate authority control. |
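The purpose of marking part of a title as nonfiling can be shown with a short sketch: the full title is displayed, but sorting skips the marked segment. The element and attribute spellings here (title, segment, nonfiling="true") are assumptions for illustration and are not copied from the XOBIS schema; the third sample title is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: titles split into segments, some flagged as nonfiling.
TITLES = [
    '<title><segment nonfiling="true">The </segment><segment>Fatal Shore</segment></title>',
    '<title><segment>Gödel, Escher, Bach</segment></title>',
    '<title><segment nonfiling="true">A </segment><segment>Sample Title</segment></title>',
]


def display_form(title_xml):
    """Join all segments, nonfiling ones included, for display."""
    return "".join(ET.fromstring(title_xml).itertext())


def filing_key(title_xml):
    """Join only the filing segments and lowercase them for sorting."""
    title = ET.fromstring(title_xml)
    parts = [seg.text or "" for seg in title.iter("segment")
             if seg.get("nonfiling") != "true"]
    return "".join(parts).strip().lower()


for t in sorted(TITLES, key=filing_key):
    print(display_form(t))
```

Marking the nonfiling text explicitly avoids counting characters, as MARC nonfiling indicators require, and keeps the trailing space inside the marked segment.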
|
|
Point out difference in space and no space |
|
|
Organization is interesting to contrast with Event, Place, and Being. Changes over time are also significant, just as serials change titles. Note: Referential, e.g. Association see also Society |
|
|
[Slide 40 Organizational Relationships] Virtual relationships: a single relationship to retrieve a related set of records. Drilling down to avoid long alphabetic retrievals. When the relationship existed versus dates of earlier name. |
|
|
[Slide 43 Relationships to Functionality] Sequence and hierarchy |
|
|
Defining Relationships will be a growth area. Relationship authority records are Concepts, with xrefs, e.g. continues/preceded by. |
|
|
Likewise, France can be subdivided by a hierarchical sequence of governments that need not overlap perfectly. Standards might be developed so that groups of libraries sharing data use the same level of depth. Having bibliographic data in a versatile schema makes all sorts of things possible. I hope this glimpse of XOBIS interests you in taking a closer look, as we believe it has lots of potential. |
|
|
Next, three slides show efforts that do not use XOBIS, but hint at the possibilities. This is Flamenco from UC Berkeley, which uses crisp metadata to organize search results categorically. Note the breakdown into various groupings and the use of named locations and times. |
|
|
This shows further hierarchical subdivisions or instantiation, such as the individual artists being listed. |
|
|
A retrieval with linked images. |
|
|
The individual record. |
|
|
Using XML, Lam, at the Hong Kong University of Science and Technology has designed an authority repository for CJK names. This takes advantage of Unicode and displays Chinese characters like ordinary ones. Note there are two established forms, the Chinese and the transliteration, in addition to the many variants. |
|
|
This is based on LC's MARCXML. Note the attributes clustering variants and naming languages and transliteration schemes. |
|
|
The search for Yi. |
|
|
Dan Chudnov at Yale has done a wonderful teaching tool for polyhierarchical Medical Subject Headings. This uses XML and SVG, scalable vector graphics, which allows mark up to describe images with XML or to incorporate regular images into XML. Text appearing in SVG is searchable. The diamond represents referencing another hierarchy. |
|
|
This shows how you can locate your context and pop up authority info like definitions and recenter the retrieval. I like that it shows the complex relationships so clearly. You've probably seen other graphical representations. XML's flexibility shines here. |
|
|
[Slide 55 Cultural Heritage/Memory Institutions] Libraries, Archives, and Museums have a lot in common. Interestingly, there is convergence in some of their more notable endeavors to create ontologies and schemas. IFLA has developed FRBR and more recently FRANAR for authorities. The archival community has the Encoded Archival Description and lately the Encoded Archival Context, which is akin to authorities for people and groups. CIDOC, an international documentation committee in the museum community has spent 10 years developing the CRM, Conceptual Reference Model, an ontology that aspires to cover library and archival information as well as museum objects. The goal is for schemas to adhere to the CIDOC ontology, at least in areas in which data are shared. The CRM, like XOBIS, incorporates authorities seamlessly. XOBIS is experimental; when we discovered the CRM, we were surprised at the high degree of overlap. We hope our work contributes to these ongoing efforts to increase interoperability internationally and across boundaries of disciplines. |
|
|
EAD |
|
|
EAC |
|
|
CRM |
|
|
It's difficult to know what represents vision in advance. An entrepreneur recently emphasized that there are lots of good ideas, but it takes more than a good idea to create a successful product. His example was that of wireless clocks that would be reset automatically, just as computers on a network update their times from a central server: a good idea, but not one he was interested in developing. The debate over good and ideal has been going on for some time. I was influenced early by the 4-H motto "To make the best better." |
|
|
[Slide 60 Simplicity/Pragmatism] As a corollary, I would offer that complexity, such as found in MARC, is counterproductive. It is possible that the needed level of detail can be achieved by a divide-and-conquer approach, without eliminating anything essential. Focusing on interoperability first and then finding a place for details should make achieving a robust schema for bibliographic and authority information possible. MARCXML, being literal, translates problems with MARC into XML. MODS, covering only a subset of MARC, is delving into the real issues but has not gone far enough, in my opinion. |
|
|
Often economic factors are cited as hindrances. As illustrated earlier, others are doing the work, albeit without standards. Who can blame yet others for developing ad hoc cataloging for digital collections when libraries have not provided a simple alternative? Perhaps domain-specific shared databases would work better, leaving the ILS to handle circulation, etc., but not sophisticated retrieval. This could be better achieved cooperatively, rather than as so many redundant catalogs. Lastly, can we afford not to do a better job in the face of such stiff competition? |
|
|
I couldn't resist including this example of homonymic disambiguation. |
|
|
[Slide 63 Ad augusta per angusta] Thank you for sharing these explorations. |