MARCXML is a lovely thing: it takes dear old MARC and puts it in XML format. Great stuff, but what can you do with that? Not a lot, it seems. Let me explain. Open metadata providers (cf. LoC and BIBSYS) do a great job of getting the data out there, and library solutions like Ex Libris’ X-Server provide the MARCXML you want for your content, but what do you do with it once you’ve got it?
On the various mailing lists and blogs that cover the topic, you’ll notice that it’s fairly common practice to take the MARCXML and parse it into some new format, typically using XPath/XQuery and/or regular expressions. It’s also possible to use XSLT to present the pure MARCXML without conversion (BIBSYS has such an interface), but this approach is not without its limitations.
One of the drawbacks of the XSLT approach is that you often want to present data in a way that doesn’t match the way it is laid out in MARCXML; you might want to gather all the authors from the various fields they appear in and present them together. XSLT can do this, but it’s often easier and possibly quicker to use XPath/XQuery and/or regular expressions. A related issue is that the MARCXML you’re parsing may not put the values you want to present in a (sub)field of their own, which leaves you with little choice but to use regular expressions on textual data.
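To make the two cases above concrete, here is a minimal sketch in Python using only the standard library. It assumes personal names live in subfield a of fields 100 and 700, and that the page count is buried in the free text of 300 subfield a; the sample record, the names in it and the function names `authors` and `page_count` are all invented for illustration, and real records will be messier than this.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by MARCXML (MARC 21 "slim" schema).
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane,</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">xii, 345 p. ;</subfield>
    <subfield code="c">20 cm.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Roe, Richard</subfield>
  </datafield>
</record>"""


def authors(record):
    """Gather personal names from 100 (main entry) and 700 (added entry)."""
    names = []
    for tag in ("100", "700"):
        path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='a']"
        for sub in record.findall(path, NS):
            # Trim whitespace and the trailing comma MARC punctuation adds.
            names.append((sub.text or "").strip().rstrip(","))
    return names


def page_count(record):
    """Pull a page count out of the free text in 300 $a, if there is one."""
    path = "marc:datafield[@tag='300']/marc:subfield[@code='a']"
    sub = record.find(path, NS)
    if sub is None or sub.text is None:
        return None
    # No subfield holds the number on its own, so fall back to a regex.
    match = re.search(r"(\d+)\s*p\b", sub.text)
    return int(match.group(1)) if match else None


record = ET.fromstring(SAMPLE)
print(authors(record))     # ['Doe, Jane', 'Roe, Richard']
print(page_count(record))  # 345
```

The path expressions stay within the limited XPath subset that `xml.etree.ElementTree` supports, which is enough for picking out fields by tag and subfields by code. The regular expression is exactly the kind of trick you end up needing when the value has no (sub)field of its own.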
I’ve been looking at this problem for a while now. In my experience, the available XML tools leave something to be desired in terms of ease of use, functionality and speed; the structure in MARCXML is, as mentioned, often inadequate; and the framework you’re working within may be best suited to a different parsing method. Compounding these problems are issues like the library not having (the right) XML tools available, and the programmers you’re working with having their own preferences.
Mostly, you’ll see MARCXML data being parsed into individual values and then converted into something else, be that XHTML or, more perversely, XML, and it is this converted form that is used as the basis for presenting content to end users. In many ways this is fair enough, but to my mind XSLT (as used by BIBSYS) is an elegant solution to this problem, though not viable in cases where the data structure is lacking.
These problems are widespread, and they lead many solution providers to take shortcuts: rather than retrieving the data from MARCXML as you’d expect, they massage it out of sources like OpenURL ContextObjects or export formats meant for other applications. The reasons for this are, of course, speed, reliability and ease of use. Whether or not you feel comfortable extracting data from non-definitive sources (which is what I consider these tertiary sources to be; they’re as likely as not constructed from the very MARCXML you’re avoiding) is a matter of conscience, but, for me, it’s odd that the data is there in MARCXML and goes unused.
Taking things at face value, we’re in a situation where we’ve got good metadata and a definitive source for it, but we can’t use it for various reasons. How can this situation be remedied? First, we need to raise our concerns about the unstructured parts of the XML we’re receiving with the metadata providers. Second, we need to build a proper toolset for this particular library application. It’s all well and good building your own one-off system, but we really need to pool our efforts and provide packages, in multiple programming languages, that help us gather, for example, all the authors or owner data without in-depth knowledge of a) MARCXML or b) the tool you’re using to parse it. That way, you’re providing a way of using MARCXML with a minimum of tricks, whether those be regular expressions, XSLT or XPath/XQuery.
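As a rough idea of what such a package might look like from the caller’s side, here is a small Python sketch. Everything in it is hypothetical: the `CONCEPTS` table, the `extract` function and the sample record are made up for this post, and the choice of fields (100/110/700/710 for authors, 245 for title, 852 subfield a for owner) is my own assumption about a sensible default rather than anything agreed upon.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# Hypothetical mapping from everyday concepts to (tag, subfield codes).
# The field choices are assumptions; a shared package would need to agree on them.
CONCEPTS = {
    "authors": [("100", {"a"}), ("110", {"a"}), ("700", {"a"}), ("710", {"a"})],
    "title": [("245", {"a", "b"})],
    "owner": [("852", {"a"})],
}

# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">A made-up title</subfield>
    <subfield code="b">and its subtitle</subfield>
  </datafield>
  <datafield tag="710" ind1="2" ind2=" ">
    <subfield code="a">Example Society</subfield>
  </datafield>
  <datafield tag="852" ind1=" " ind2=" ">
    <subfield code="a">Example Library</subfield>
  </datafield>
</record>"""


def extract(record, concept):
    """Return the values for a concept from a single MARCXML record element.

    The caller asks for "authors" or "owner" and never sees a tag number.
    """
    values = []
    for tag, codes in CONCEPTS[concept]:
        for field in record.iter(f"{{{MARC_NS}}}datafield"):
            if field.get("tag") != tag:
                continue
            # Join the wanted subfields of this one field into one value.
            parts = [
                (sub.text or "").strip()
                for sub in field.iter(f"{{{MARC_NS}}}subfield")
                if sub.get("code") in codes
            ]
            if parts:
                values.append(" ".join(parts))
    return values


record = ET.fromstring(SAMPLE)
print(extract(record, "authors"))
print(extract(record, "title"))
print(extract(record, "owner"))
```

The point of the design is that the MARC knowledge sits in one table that could be shared and ported between languages, while the parsing mechanism underneath (here plain element iteration, but it could as easily be XSLT or XPath) stays out of the caller’s way.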