Archive for the ‘library’ Category

Library cataloguing breaks our data


Use case: a user wants to view a collection of master’s theses by subject.

Solution 1: register departmental names as corporate bodies in MARC 710

Catch 1: the departments have merged into larger departments (for example, the old departments of English, Romance languages and Germanic languages became the department of modern foreign languages). The data that would have identified theses from the English department is lost when the authority record in 710 is updated.

Catch 2: the metadata does not uniformly differentiate between theses on different aspects of study; for example, theses in music performance are not uniformly distinguished from theses in music theory, and English linguistics is lumped together with American cultural studies and literary studies. Thus, unrelated theses are grouped together because they come from the same department, not the same study track.

Catch 3: it is difficult to identify theses from the institution because the institution has changed its name, and because it was formed by a merger in which the merged institutions were themselves subject to various name changes.

Solution 2: Restructure the metadata so that the theses belong to a series with a standard title; create a controlled vocabulary to differentiate the various theses on the basis of topic and study track; apply solution 1 retroactively.

Catch 1: Reality*.

Solution 3: Use RDF.

Catches: we’ll work them out.

*Actually, the biggest problem is that the users want to present the data in their own system, which would involve either caching the data gleaned via SRU (not possible for the departmental staff) or screenscraping the OPAC (not a nice solution).
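For what it’s worth, the SRU route is not technically hard. The sketch below builds a standard SRU searchRetrieve request; the endpoint URL and the helper name `sru_search_url` are made up for illustration, but `version`, `operation`, `query` and `maximumRecords` are standard SRU parameters (which CQL indexes you can query depends on the server’s explain record):

```python
from urllib.parse import urlencode

def sru_search_url(base_url, cql_query, max_records=50):
    """Build an SRU searchRetrieve URL for a given CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint; a real one would come from your LMS vendor.
url = sru_search_url("http://sru.example.org/search", 'dc.title = "thesis"')
print(url)
```

The departmental staff would still need somewhere to run this and cache the results, which is exactly the sticking point.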

Simple additives: adding social functionality to your OPAC


A quick recipe for adding social functionality to your OPAC*.


  1. Argue with your IT people/system supplier about adding a Javascript snippet to your page templates
  2. While you’re doing (1), head on over to IntenseDebate and create an account
  3. Once you’ve managed (if you manage) to convince IT people that this functionality is worth the effort, (get your IT people/system supplier to) follow the instructions on the IntenseDebate support pages based on your account details
  4. Sit back and congratulate yourself

Now, there are a few things to note:

  1. The comments are hosted by IntenseDebate
  2. The users will need to have an OpenID in order to use the system (does your system support OpenID? Why not?)
  3. There may be a few legal issues related to “who owns comments” based on where you live

Potential issues:

  1. Your IT people/system supplier is reluctant
  2. Your system does not create links between things that should be related (e.g. manifestations/editions of books)
  3. No-one comments on anything

As you can see, implementing this kind of functionality in your system is so simple that it is worth trying out. However, don’t be disheartened if it doesn’t work out for you: the engagement users have with the “web content” in your system is probably limited.

*OPAC or any webpage, this isn’t “library IT”, it’s IT.

Information literacy: it’s over and out


“Information literacy” was a phenomenon of the late 1990s and early 2000s, and it is officially dead. Looking at the numbers, you can see that global interest in information literacy is rapidly approaching zero. Take a look at the Google trending data for this:

Google trending data for searches for the term "information literacy"

What is “information literacy”? In libraryland, it’s a specific thing (I’ll translate the Norwegian Archive, Library and Museum Authority’s definition):

Information literacy is a collection of skills that make a person able to identify when information is necessary, and which make them able to locate, evaluate and use – in an effective way – this information.[1]

This sounds reasonable, but it isn’t; it’s silly: are there any plausible instances where people who are trying to achieve something don’t know when they need information? I hope not. Note that Plinius [Norwegian] has commented (so well, in fact, that I translated it) that “information literacy” is not really a valid thing in the traditional library sense; an interpretation of information literacy that is viable, however, is one where it is a facet of subject-related competence.

The idea that it is possible to teach the localization, evaluation and use of information without reference to a subject-specific set of skills is ridiculous. Let me explain: within certain formal disciplines, intuition is a valid way of gathering data, while within others it really is not. Knowing your subject-specific ethics will help you evaluate the content you are looking at. Knowing which sources to consult will also depend heavily on the subject-specific approach you’re taking: if you’re researching language, you might be interested in grammars, but you might equally be interested in literature from medicine and neuroscience. Using information effectively is where the ABM definition really hits home: how can you use information effectively without understanding it?

The library really doesn’t have very much to offer in terms of subject-specific skills. Yes, an academic library may have subject librarians, but “subject-specific” really equates to “individual”, and the extent to which a librarian knows an individual researcher’s needs is based on a dialogue with that individual, not on an understanding of the concept of “information literacy”; whatever useful information they impart is likely to be based on the local systems in use at that particular library.

It isn’t the case, however, that the library doesn’t have anything to offer; we have a lot of resources that are likely yet to be discovered by researchers, and a number of tips and tricks that will make the researchers’ lives a lot easier. But creating heavyweight courses in CQL and search strategy isn’t going to cut it; it’s about marketing and one-on-one contact.

The death of monolithic library teaching should be nigh, and I hope that it is.

[1] Informasjonskompetanse — ABM-utvikling – Statens senter for arkiv, bibliotek og museum. (n.d.). Retrieved January 18, 2010.

[edited for grammar and imprecise formulation 2010-01-26]

Asking the right questions


“Ask the users what they want!” seems to be the default response to the question of how we can improve library services, but it isn’t a good response; it’s a trite, unthinking non-response that absolves the library practitioner of the responsibility of knowing their trade.

There is a lot of talk about what library users want; much of it assumes that there is a clearly defined scale with “asking library users what they want” and “librarians knowing best” at the two extremes. The result of this kind of thinking is that the library asks what users want and then acts on the results. However, this produces something even worse than not asking at all: “now we know the answers, let’s get on with it”. Acting on a questionnaire where users are explicitly asked “what do you want?”, or on the comments from a user survey such as LibQual+®, yields only one answer: “I want the Moon on a stick”.

The stupidity of this approach is seen in the emphasis: it isn’t about what is needed by everyone, just by one particular user, and of course these opinions will be as varied as the number of respondents. Even if the user responds in a seemingly tangible way (“the library staff are not helpful and the website is difficult to navigate”), it must be remembered that this is seen through the eyes of an individual who asks questions such as “do you have a photocopying service?” and thinks that a negative response equates with unhelpfulness, or who finds the website difficult to navigate because their computer display is broken and doesn’t show the colours of the links properly. (An example of this is the student who complained about the library OPAC being bad, but didn’t come to the courses offered; when I finally met with the student, it turned out that they had not been using the OPAC at all, but a third-party interface based on the LMS’s Z39.50 interface.) Taking this kind of information at face value when making service decisions is less than worthless; it is damaging.

Knee-jerk reactions to user dissatisfaction expressed in generalized questionnaires will always backfire, because user feedback needs to be feedback on a specific question, not on questions of the kind “what do you think of the website?” The questions need to focus on specific aspects of the services the library provides. One of the major findings from LibQual+® surveys is that there is a discrepancy between the expected level of service regarding holdings and the actual service. To my mind, this can be seen as a result of the “me” aspect of respondents. The interpretation of survey results needs to be tempered by the understanding that they come from the perspectives of a multitude of different users with differing needs, expectations and contexts. No user actually wants “the Moon on a stick”, but this is the most obvious interpretation when reading the generalized feedback.

When a commercial enterprise asks what its users want, it asks about a specific product and elicits responses about specific aspects of that product’s functionality. A major rethink about the product and its viability may be the result. No-one approaches a potential market without some idea of what their service entails; except libraries.

The next time you hear someone say that we should pay more attention to what users want, ask yourself the following questions:

  • why do we want feedback?
  • what do we want feedback on?
  • will the feedback be usable?

If the answers to these questions are along the lines of “we want to know if users like eBooks”, “eBooks” and “yes, of course”, then it’s the same old story. The answers should rather resemble “we want to know what we can do to make service X work better”, “the eBook service we provide” and “hopefully, but we need continuous feedback to make sure that we’re doing the right thing”.

The final point here is that gathering feedback should be a strategic commitment for libraries, not just a one-off or occasional hit-and-miss affair. Nor should the strategic planning of this kind of thing be left to individuals; it is the responsibility of management to ensure that projects, strategic areas and goals are followed up systematically by getting targeted user feedback. Another point is that this feedback should take different forms, and should preferably be interpreted and re-interpreted in light of new data.

A good example here is the analysis of website traffic: in order to get anything out of the statistics, you need to know what you want to measure. Take the question “do people know how to find the OPAC?”; to answer it, a particular kind of report can be generated. But the various goals need to be identified before the reports are generated; knowing what goals and success indicators you have will ensure that you know what to measure and how to change your service in order to achieve them. Typically, statistics are “gathered” and then dropped as raw data, often as graphs, into the laps of the various parties at the library; the problem with this approach is that, while it’s nice to know which pages are most visited, it is difficult to read any patterns or generate meaningful goals from data presented in this way.
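As a toy illustration of measuring against a goal rather than collecting raw statistics, here is a sketch (with made-up log entries) that answers “how many sessions reach the OPAC?”; a real analytics package would produce this as a report, but the principle is the same:

```python
from collections import Counter

# Hypothetical, simplified access-log entries: (session_id, path).
log = [
    ("s1", "/"), ("s1", "/opac"),
    ("s2", "/"), ("s2", "/opening-hours"),
    ("s3", "/"), ("s3", "/opac"), ("s3", "/opac/record/42"),
]

# Goal: of all sessions, how many reach an OPAC page at some point?
sessions = Counter()
reached_opac = set()
for session_id, path in log:
    sessions[session_id] += 1
    if path.startswith("/opac"):
        reached_opac.add(session_id)

rate = len(reached_opac) / len(sessions)
print(f"{len(reached_opac)} of {len(sessions)} sessions reached the OPAC ({rate:.0%})")
```

The point is not the code but the shape of it: the goal (“find the OPAC”) dictates the measurement, instead of the measurement arriving first and the goal being improvised afterwards.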

In the end, what the library should strive towards is “the Moon on a stick, with reservations”: providing everything that the user wants, just within a framework that is feasible. An example of this is ensuring that expectations of the level of service do not outstrip the perceived level of service; clear terms of service are a good start. If a library cannot support large volumes of acquisitions, it should not attempt to, but rather focus on providing a better ILL service and making this service more available to the users.

When we’ve achieved these things, we can start asking the real question: what do users need?

Please note: registered trademarks presented in this text are used for informational purposes only and represent neither endorsement nor recommendation of these products.

Linked data


What is linked data? (Note that I’m ignoring any of the specifics of RDF, on which Linked Data depends.)

The “data” of linked data is metadata on the web that describes documents and resources; the linked part refers to the links that exist between metadata items. If this seems a little abstract, consider the following:

I own ten books related to my four interests:

  • Anglo-Saxon language (properly, Old English)
  • The history of Winchester, England
  • Computer programming
  • Cookery

The titles I own are:

Arnow, D., & Weiss, G. (1998). Introduction to Programming Using Java: An Object-Oriented Approach. Addison-Wesley.

Arnow, D., Dexter, S., & Weiss, G. (2003). Introduction to Programming Using Java: An Object-Oriented Approach (2nd ed.). Addison Wesley.

Fearnley-Whittingstall, H., & Carr, F. (2008). The River Cottage Family Cookbook. Ten Speed Press.

Gamma, E., Helm, R., Johnson, R., & Vlissides, J. M. (1994). Design Patterns: Elements of Reusable Object-Oriented Software (illustrated edition.). Addison-Wesley Professional.

Hagen, A. (2006). Anglo-Saxon Food & Drink. Anglo-Saxon Books.

Meaney, A. L., & Hawkes, S. C. (1970). Two Anglo-Saxon Cemeteries at Winnall, Winchester, Hampshire. Maney Publishing.

Hervey, T. (2007). The Bishops Of Winchester In The Anglo-Saxon And Anglo-Norman Periods. Kessinger Publishing, LLC.

Mitchell, B., & Robinson, F. C. (2007). A Guide to Old English (7th ed.). Wiley-Blackwell.

Sweet, H. (1982). Sweet’s Anglo-Saxon Primer (9th ed.). Oxford University Press, USA.

Sweet, H. (2008). An Anglo-Saxon Primer (3rd ed.). Tiger Xenophon.

I want some way of keeping track of my book collection, so I create a catalogue of RDF files where I tag the various books with their topics:

  • Books by Arnow, Arnow et al. and Gamma et al. are tagged as Computer Science
  • Books by Fearnley-Whittingstall and Hagen are tagged as Cookery
  • Books by Hawkes and Hervey are tagged as History — Winchester
  • Books by Mitchell & Robinson and Sweet are tagged as Language — Old English

Immediately, I see that I have several editions of the same book, so I add a simple SameAs relation between these books by adding a URL to the RDF metadata of the other book, so Arnow (1998) and Arnow et al. (2003) link to one another in this way, as do Sweet (1982) and (2008). In this way I can easily see which books are related by following a link (technically “dereferencing” a URL).
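A minimal sketch of this structure, with plain Python dicts standing in for the RDF files and hypothetical example.org URLs as identifiers:

```python
# A toy catalogue: each key is a (hypothetical) URL identifying one book's
# metadata record; "sameAs" holds the URLs of other editions of the same work.
catalogue = {
    "http://example.org/arnow-1998": {
        "tags": ["Computer Science"],
        "sameAs": ["http://example.org/arnow-2003"],
    },
    "http://example.org/arnow-2003": {
        "tags": ["Computer Science"],
        "sameAs": ["http://example.org/arnow-1998"],
    },
    "http://example.org/sweet-1982": {
        "tags": ["Language -- Old English"],
        "sameAs": ["http://example.org/sweet-2008"],
    },
    "http://example.org/sweet-2008": {
        "tags": ["Language -- Old English"],
        "sameAs": ["http://example.org/sweet-1982"],
    },
}

def other_editions(url):
    """Follow the SameAs links of one record (a stand-in for dereferencing)."""
    return catalogue[url]["sameAs"]

print(other_editions("http://example.org/arnow-1998"))
```

In real RDF the records would be triples and the links typed properties, but the mechanics of “follow a URL to find the related record” are exactly this.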

The book on design patterns is so fundamentally important within computer science that I add a SeeAlso link to this book from the other computer science titles; in the same way, I can choose to add a SeeAlso relation between the other books tagged with the same tags, allowing me to easily access each title from a related title.

Because my interest in Winchester primarily relates to the Anglo-Saxon period, and especially to the linguistic/onomastic aspects of its history, I find it useful to link (SeeAlso) the titles on Winchester to the books on Old English. At the same time, I also add a SeeAlso to the book on Anglo-Saxon cookery for each of the titles on Winchester and Old English.

Based on this, I can at any time explore my book collection in a novel way; from any given starting point, I have numerous avenues to explore. I have a simple way to see that there are several editions of a title, and that the titles in my collection relate to a number of topics, which typically interlink. It is difficult to find a link between cookery or Anglo-Saxon history and language and computer science, but I am sure that more formal analyses within computational linguistics would fit into the model I have described in an understandable fashion.

It is worth noting that it is debatable whether my use of SeeAlso and SameAs is, strictly speaking, correct, but it illustrates the point about enriching a collection of metadata with links. More information about metadata schemas for linked data can be found in the links section below.

It is also worth noting that this interlinking is two-way, and that this leads to redundancy (in order to get from A to B you need an explicit link; in order to get from B to A, you need another explicit link). This isn’t really a problem, because the data-storage overhead is minimal, and the dereferencing of URLs can be done in such a way that redundancy does not create unnecessary work (for example, by not dereferencing URLs that have already been visited).
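The “don’t dereference URLs you’ve already visited” point can be sketched with a toy graph of two-way SeeAlso links (single letters standing in for the real URLs):

```python
from collections import deque

# Toy link graph: two-way SeeAlso links stored redundantly on both records,
# as described above.
see_also = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
}

def explore(start):
    """Breadth-first walk of SeeAlso links, never dereferencing a URL twice."""
    visited = set()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in visited:
            continue  # the redundant back-link costs nothing: it is skipped
        visited.add(url)
        queue.extend(see_also.get(url, []))
    return visited

print(sorted(explore("A")))
```

Every record is reachable from every starting point, and each is fetched exactly once, so the redundant links add navigability without adding work.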


RDF homepage (for RDF basics, schema and ontologies)

BIBSYS modernization


[This is a draft, I’ll be revising it]

The stream of mail on the Biblioteknorge mailing list about BIBSYS’ modernization has been almost unstoppable — at least five mails 🙂 — and names like Knut Hegna, Hans Martin Fagerli, Kim Tallerås and Dagmar Langeggen have been connected to the topic.

What’s the fuss? Well, it seems that BIBSYS will continue developing its own library management system rather than buying off-the-shelf software. This contradicts the findings of “that there report” — you know, the one that recommended getting a new, off-the-shelf system. Full of wackiness — especially at the points where it contravened the basic rule that a report should describe its subject as found in commonly accepted reality rather than in the creative imagination — that report was easy to ignore, but maybe it shouldn’t have been simply “ignored”.

My take on the whole thing: really very dull, I’m sorry.

Starting at the outside edge: the online public access catalogue is a relatively uninteresting concept today, and it is becoming less interesting with each day that passes. Here’s why:

  • Users aren’t finding their information there; they’re finding it on the web (report)
  • University libraries aren’t registering their information in an OPAC (at best they import a subset of the e-journal and e-book data they are spending the majority of their finances on)
  • The metadata a researcher needs is not registered in an OPAC, it is available only in research databases
  • Attempts at integrating/federating search in metadata for academic content have failed
  • Portals cost money, and this is money wasted when users don’t want or need to use them

The internals of a monolithic library management system are also past their sell-by-date for the majority of academic institutions:

  • Packages and subscriptions increasingly account for the majority of spending
  • Metadata import related to packages is typically limited
  • Acquisitions are increasingly expected to conform to norms applied in the rest of the institution
  • User systems are in place that register users’ role and access privileges

Given that the majority of spending goes on resources that are already findable using other methods (the metadata we’re importing must come from somewhere, and Google is the preferred tool of discovery), there is really very little need to register the majority of objects we’re currently registering.

Academic institutions — especially publicly funded ones — have resource management systems that ensure that every economic transaction is done by the book, i.e. ensuring that things are put out to tender and that an economic overview is available to the various controllers around and about. This means that an economy module in a library management system is not a good thing: it encourages practices that fall outside normal routines, and it hinders the financial controllers from doing their job (you do not want or need more than one system for this kind of thing).

Another aspect of the library management system is loan data, where a user-profile system provides the data needed about a borrower and then attaches various objects to it. At a modern academic institution, integrating the institutional user-profile system with the library management system’s user-profile system is one big headache… so why do things that way around? Why not implement the necessary slots for library data in the existing institutional system? The framework for this kind of query across several thousand records is already in place, so why not use it? There is time and money to be saved here too.

We’ve got rid of a few subsystems here, but we’re back to the sticky issues:

  • registration of items in a library’s collections
  • sharing of data

Simplicity itself would be to require all third parties to supply Linked Data for their products (and yes, this is a realistic thing to ask), and then to register the remainder by creating a local linked data store containing either totally unique metadata for items that are not registered anywhere else, or links to existing items registered in other Linked Data stores (including non-Norwegian sources). In this way, a massive web of data is available to the library’s users, containing references not only to things in the local library’s holdings, but potentially to all existing items in any library that provides Linked Data.

The OPAC can now be replaced either by a generic or domain-specific semantic browser of the user’s choice, or by a user interface that wraps SPARQL queries and provides a presentation format for the data returned from the dereferenced URIs contained in the Linked Data. The latter could be created locally by the library IT staff, or by an enterprising librarian who knows a bit of Javascript and can follow the instructions given in a typical Javascript library.
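To make the “wrapper for SPARQL queries” idea concrete, here is a minimal sketch (in Python rather than Javascript, for brevity) of the request such a wrapper would issue. The endpoint URL is made up and the choice of Dublin Core property is an assumption about the store’s vocabulary, but `query` is the standard request parameter from the SPARQL protocol:

```python
from urllib.parse import urlencode

def sparql_request_url(endpoint, query):
    """Build a GET request URL for a SPARQL endpoint."""
    return endpoint + "?" + urlencode({"query": query})

# List some titles via Dublin Core terms; which vocabulary the store
# actually uses is up to the data provider.
query = (
    "SELECT ?title WHERE { "
    "?item <http://purl.org/dc/terms/title> ?title . "
    "} LIMIT 10"
)
url = sparql_request_url("http://data.example.org/sparql", query)
print(url)
```

Everything beyond this — sending the request and rendering the results — is presentation work, which is exactly why it can live in a thin local interface rather than in the library management system.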

BIBSYS can potentially be a provider of tools for a) creating Linked Data, and b) storage and retrieval of this. Modules for solving issues related to lending and finances could be off-the-shelf software, supplied where these were deemed necessary.


Linked data [wikipedia]

Semantic Marc, MARC21 and the Semantic Web

BIBLIOTEKNORGE on the resolution to modernize the BIBSYS library system

Publication channels: errors in DBH


It turns out that paper sources are more highly valued in the education sector!

The table gives an overview of the errors I found as part of the statistics work I do for the NTNU Library, based on cross-referencing all the ISSNs registered in DBH’s overview of publication channels against NTNU’s SFX implementation. The title link goes to a title search; the ISSN links go to ISSN searches.

| Title | ISSN 1 | ISSN 2 | Discipline 1 | Discipline 2 | Level v1 | Level v2 |
|---|---|---|---|---|---|---|
| Journal of Design History | 0952-4649 | 1741-7279 | Architecture and design | Publication channels | 2 | 1 |
| ELH | 0013-8304 | 1080-6547 | English | English | 2 | 1 |
| Molecular & cellular proteomics | 1535-9476 | 1535-9484 | Biomedicine | Publication channels | 2 | 1 |
| Requirements engineering | 0947-3602 | 1432-010X | Tech – computer engineering and computer science | Tech – systems and technology development | 2 | 1 |
| Communication Theory | 1050-3293 | 1468-2885 | Media and communication | Publication channels | 2 | 1 |

Pedantic? That depends on whether registration in FRIDA is affected (in these cases it makes no difference).



FRIDA, short for “Forskningsresultater, informasjon og dokumentasjon av vitenskapelige aktiviteter” (research results, information and documentation of scientific activities): is it decent, or poor? In other words, is it king, or does it BITE? You decide!

That was then, this is now

  1. Read an article about the library of the future by S. Bell
  2. Read my old article about the same
  3. Take a look at Science@Cambridge
  4. Take a look at iSoton
  5. And an example of what not to do

Glad we got that cleared up then 😉

Note: (5) was an attempt at making the library webpage like the one described in (2). Did I succeed? No. Did I learn anything? A lot — we’re totally dependent on having a good profile service provided by the university IT department; without this, we’re scuppered.

Trends 2009


Some trends and predictions for academic libraries in 2009 — it’s the last post of the year (and the first one for a looong time — hey, I’ve been busy). Not a definitive list: things are duplicated and horribly skewed towards my interests… so E&OE!

Things we’ll be seeing more of

  • Mobile web
  • Monolithic (and especially “Integrated”) search systems
  • Java
  • 2.0-ization where “shutting down” would be a better idea
  • Reduced budgets

You can add a whole slew of less positive things to this list, including “nuisance lawsuits” and “futile attempts by individuals, corporations and governments to manipulate the web”, but I rather think that these aren’t predictions… It’s the end of the road for a few technologies; one of these is “the OPAC” (at least as we know it), which I believe will rapidly be replaced by the monoliths mentioned in the list above.

Some of the oddities here include Java and XML/XSLT — these are old, old technologies, but they aren’t seriously used in libraries. Now is the time for libraries to explore the possibilities of serious software development on a small scale. Robust software simply cannot be developed without suitable development tools, and the frameworks provided by, among other things, Java application servers are top notch.

2.0 will continue to wash over our community, carrying driftwood with it — the OPAC? — in the same way as budgets can be relied upon to disappear, slowly but surely.

And this year’s biggie? The mobile web: “m.” is the future (at least for the present) 🙂

Godt nytt år! Happy New Year!