Archive for the ‘bibliotek’ Category

Library cataloguing breaks our data

2010-12-17

Use case: a user wants to view a collection of master’s theses by subject.

Solution 1: register departmental names as corporations in MARC 710

Catch 1: the departments have merged to create large departments (for example, the old departments of English, Romance languages and Germanic languages became the department of modern foreign languages). The data that would have identified theses in English is lost when the authority in 710 is updated.

Catch 2: the metadata does not uniformly differentiate between theses on different aspects of study; for example, performative music is not distinguished uniformly from theoretical music, and linguistics of English is lumped together with American cultural studies and literary studies. Thus, unrelated theses are placed together because they come from the same department, even though they do not belong to the same study track.

Catch 3: it is difficult to identify theses from the institution because the institution has changed name, and because the institution was formed by a merger and the merged institutions were also subject to various name changes.

Solution 2: restructure the metadata so that the theses belong to a series with a standard title; create a controlled vocabulary that is used to differentiate the various theses on the basis of topic and study track; use solution 1 retroactively.

Catch 1: Reality*.

Solution 3: Use RDF.

Catches: we’ll work them out.

*Actually, the biggest problem is that the users want to present the data in their own system, which would involve either caching the data gleaned via SRU (not possible for the departmental staff) or screen-scraping the OPAC (not a nice solution).
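
For anyone who does have the means to cache, the SRU side of this is only a URL. Here is a minimal sketch in Python of building a searchRetrieve request; the endpoint address and the `dc.subject` index are assumptions for illustration, so check what your own catalogue actually exposes.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint; substitute the base URL of your own catalogue.
SRU_BASE = "https://sru.example.org/search"


def sru_search_url(cql_query, start=1, maximum=10):
    """Build an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return SRU_BASE + "?" + urlencode(params)


# The index name dc.subject is an assumption; servers differ in what they support.
url = sru_search_url('dc.subject = "English linguistics"')
print(url)
```

The response would then have to be fetched, parsed and stored locally, which is exactly the part that departmental staff cannot be expected to do.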

Simple additives: adding social functionality to your OPAC

2010-12-13

A quick recipe for adding social functionality to your OPAC*.

Step-by-step:

  1. Argue with your IT people/system supplier about adding a Javascript snippet to your page templates
  2. While you’re doing (1), head on over to IntenseDebate and create an account
  3. Once you’ve managed (if you manage) to convince IT people that this functionality is worth the effort, (get your IT people/system supplier to) follow the instructions on the IntenseDebate support pages based on your account details
  4. Sit back and congratulate yourself

Now, there are a few things to note:

  1. The comments are hosted by IntenseDebate
  2. The users will need to have an OpenID in order to use the system (does your system support OpenID? Why not?)
  3. There may be a few legal issues related to “who owns comments” based on where you live

Potential issues:

  1. Your IT people/system supplier is reluctant
  2. Your system does not create links between things that should be related (e.g. manifestations/editions of books)
  3. No-one comments on anything

As you can see, implementing this kind of functionality in your system is so simple that it is worth trying out. However, don’t be disheartened if it doesn’t work out for you: the engagement users have with the “web content” in your system is probably limited.

*OPAC or any webpage, this isn’t “library IT”, it’s IT.

Information literacy: it’s over and out

2010-01-18

“Information literacy” was a phenomenon of the late 1990s and early 2000s and it is officially dead. Looking at the numbers, you can see that the level of interest globally in information literacy is rapidly approaching zero. Take a look at the Google-trending data for this:

Google trending data for searches for term "information literacy"

What is “information literacy”? In libraryland, it’s a specific thing (I’ll translate the Norwegian Archive, Library and Museum Authority’s definition):

Information literacy is a collection of skills that make a person able to identify when information is necessary, and which make them able to locate, evaluate and use – in an effective way – this information.[1]

This sounds reasonable, but it isn’t. It’s silly: are there any plausible instances where people who are trying to achieve something don’t know when they need information? I hope not. Note that Plinius [Norwegian] has commented (so well in fact that I translated it) that “information literacy” is not really a valid thing in the traditional library sense; an interpretation of information literacy that is viable, however, is one where it is a facet of subject-related competence.

The idea that it is possible to teach location, evaluation and use of information without reference to a subject-specific set of skills is ridiculous. Let me explain: within certain formal disciplines, intuition is a valid way of gathering data, while within others it is really not. Knowing your subject-specific ethics will help you evaluate the content you are looking at. Knowing which sources to look at will also depend heavily on the subject-specific approach you’re taking: if you’re researching language, you might be interested in grammars, but you might equally be interested in literature from medicine and neuroscience. Using information effectively is where the ABM definition really runs aground: how can you use information effectively without understanding it?

The library really doesn’t have very much to offer in terms of subject-specific skills: yes, an academic library may have subject librarians, but “subject specific” really equates to “individual”, and the extent to which a librarian will know the individual researcher’s needs is based on a dialogue with that individual, not on an understanding of the concept “information literacy”, and whatever they impart of useful information is likely to be based on the local systems in use at that particular library.

It isn’t the case, however, that the library doesn’t have anything to offer; we have a lot of resources that are likely yet to be discovered by researchers, and a number of tips and tricks that will make the researchers’ lives a lot easier. But creating heavyweight courses in CQL and search strategy isn’t going to cut it; it’s about marketing and one-on-one contact.

The death of monolithic library teaching should be nigh, and I hope that it is.

[1] Informasjonskompetanse – ABM-utvikling – Statens senter for arkiv, bibliotek og museum. (n.d.). Retrieved January 18, 2010, from http://www.abm-utvikling.no/bibliotek/bibliotekutvikling/kompetanseutvikling/informasjonskompetanse.html

[edited for grammar and imprecise formulation 2010-01-26]

Excluding self-citation in Google Scholar

2009-12-05

It seems that it is possible (to some extent at least) to exclude self-citation in Google Scholar. This is how:

  1. Search for author name in the usual way
  2. Click “cited by number”
  3. Identify how Google Scholar represents the name you want to exclude in the hits (typically “A Name”)
  4. Add a standard Google query string which excludes the name you identified in point 3 to your current citation url in the following format &q=-“A Name”

A practical example of removing self-reference: Aspects of the theory of syntax by N Chomsky without self citation:

http://scholar.google.com/scholar?cites=7563750853896762876&hl=en&as_sdt=2000&q=-“N Chomsky”

This reduces the original number of hits from “around 12,350” to “around 11,600”.
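
If you want to do this for more than one work, the URL can be put together with a few lines of Python. This is only a sketch: the function name is my own, it keeps just the `cites`, `hl` and `q` parameters from the example above, and it uses straight double quotes around the name (URL-encoded) rather than typographic ones.

```python
from urllib.parse import urlencode


def exclude_author_url(cites_id, author,
                       base="http://scholar.google.com/scholar"):
    """Build a 'cited by' URL that excludes hits containing the author's name."""
    params = {
        "cites": cites_id,      # the number found in the 'cited by' link
        "hl": "en",
        "q": f'-"{author}"',    # minus sign plus quoted name, as in step 4
    }
    return base + "?" + urlencode(params)


url = exclude_author_url("7563750853896762876", "N Chomsky")
print(url)
```

The name has to be written the way Google Scholar shows it in the hits (step 3), otherwise nothing is excluded.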

Perhaps this is useful? Feedback?

BIBSYS: Acquisition of a new library system

2009-11-03

I am only guessing, but when someone writes that they are going to purchase a library system, they have already decided that they want a library system. But when did they work out that they need such a system? As I understand it, that analysis has not been done in the present case.

The term “library system” covers several parts: finance, acquisitions, cataloguing, presentation to the public, handling of personal data, and so on. The problem for the vast majority of BIBSYS institutions is that they already have other systems for most of their purchasing, finance and personal data. In very many cases it is neither necessary nor desirable for the library system to add new functions for these. Do not make the mistake of thinking that I want to get rid of systems for libraries; we need this data, but it does not all have to live in one and the same system, particularly when that causes headaches (cf. import/export of user data at university libraries).

The library systems on the market today have catalogues based on storing metadata about things in MARC format, which I would argue is not the smartest thing one can do, for three central reasons:

  • lack of extensibility to cover needs related to new types of things to be registered
  • lack of semantics in MARC
  • a format that hampers reuse because it is domain-specific

In a system for organizing knowledge, one usually wants the data presented in a way that suits the users, and this should be independent of the structure of the data layer. MARC and AACR (the standard that governs how things are catalogued) show through as a fundamental element in library catalogues, even though they should have no bearing on how we present things to users.

Having said this, one might think that I am condemning MARC and AACR outright, but that is not the case; I am talking exclusively about MARC as a storage format. It is entirely possible to catalogue in MARC according to AACR and still store the data in a sound way. In this way, MARC is a view of the data (if this is hard to grasp, consider that the numbers {1,2,3} can be shown as raw data, or as a bar chart, pie chart and so on without coming to harm; in the same way, a book can be catalogued in a MARC form and then stored in another way).
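
The idea of MARC as one view among several can be sketched in a few lines of Python. The stored record below uses field names I have made up for the illustration, and the MARC rendering is heavily simplified (no indicators, only a handful of subfields), so treat it as a sketch of the principle rather than a cataloguing tool.

```python
# A format-neutral stored record; the key names are illustrative only.
book = {
    "author": "Sweet, H.",
    "title": "An Anglo-Saxon Primer",
    "publisher": "Tiger Xenophon",
    "year": "2008",
}


def as_marc_view(record):
    """Render the record as simplified MARC-style lines (tag plus subfields)."""
    return [
        f"100 $a {record['author']}",
        f"245 $a {record['title']}",
        f"260 $b {record['publisher']} $c {record['year']}",
    ]


def as_citation(record):
    """Render the same record as a plain reference line."""
    return (f"{record['author']} ({record['year']}). "
            f"{record['title']}. {record['publisher']}.")


print("\n".join(as_marc_view(book)))
print(as_citation(book))
```

Both outputs come from the same stored data; neither of them dictates how that data is kept.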

Today’s IT world offers a wide range of technologies that solve these problems, but the most promising for libraries, in my view, are those that can be described as “semantic”, and that is where I would look for inspiration for a new “library system”, not to a traditional, commercial supplier of MARC data systems.

Linked data

2009-07-14

What is linked data? (Note that I’m ignoring any of the specifics of RDF, on which Linked Data depends.)

The “data” of linked data is metadata on the web that describes documents and resources; the linked part refers to the links that exist between metadata items. If this seems a little abstract, consider the following:

I own ten books related to my four interests:

  • Anglo-Saxon language (properly, Old English)
  • The history of Winchester, England
  • Computer programming
  • Cookery

The titles I own are:

Arnow, G. W. D. M. (1998). Introduction to Programming Using Java: An Object-Oriented Approach. Addison-Wesley.

Arnow, D., Dexter, S., & Weiss, G. (2003). Introduction to Programming Using Java: An Object-Oriented Approach (2nd ed.). Addison Wesley.

Fearnley-Whittingstall, H., & Carr, F. (2008). The River Cottage Family Cookbook. Ten Speed Press.

Gamma, E., Helm, R., Johnson, R., & Vlissides, J. M. (1994). Design Patterns: Elements of Reusable Object-Oriented Software (illustrated edition.). Addison-Wesley Professional.

Hagen, A. (2006). Anglo-saxon Food & Drink. Anglo-Saxon Books.

Hawkes, B. A. L. M. A. S. C. (1970). Two Anglo-Saxon Cemeteries at Winnall, Winchester, Hampshire. Maney Publishing.

Hervey, T. (2007). The Bishops Of Winchester In The Anglo-Saxon And Anglo-Norman Periods. Kessinger Publishing, LLC.

Mitchell, B., & Robinson, F. C. (2007). A Guide to Old English (7th ed.). Wiley-Blackwell.

Sweet, H. (1982). Sweet’s Anglo-Saxon Primer (9th ed.). Oxford University Press, USA.

Sweet, H. (2008). An Anglo-Saxon Primer (3rd ed.). Tiger Xenophon.

I want some way of keeping track of my book collection, so I create a catalogue of RDF files where I tag the various books with their topics:

  • Books by Arnow, Arnow et al. and Gamma et al. are tagged as Computer Science
  • Books by Fearnley-Whittingstall and Hagen are tagged as Cookery
  • Books by Hawkes and Hervey are tagged as History — Winchester
  • Books by Mitchell & Robinson and Sweet are tagged as Language — Old English

Immediately, I see that I have several editions of the same book, so I add a simple SameAs relation between these books by adding a URL to the RDF metadata of the other book, so Arnow (1998) and Arnow et al. (2003) link to one another in this way, as do Sweet (1982) and (2008). In this way I can easily see which books are related by following a link (technically “dereferencing” a URL).

The book on design patterns is so fundamentally important within computer science that I add a SeeAlso link to this book from the other computer science titles; in the same way, I can choose to add a SeeAlso relation between the other books tagged with the same tags, allowing me to easily access each title from a related title.

Because my interest in Winchester primarily relates to the Anglo-Saxon period, and especially the linguistic/onomastic aspects of its history, I find it useful to link (SeeAlso) the titles on Winchester to the books on Old English. At the same time, I also add a SeeAlso to the book on Anglo-Saxon cookery for each of the titles on Winchester and Old English.
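
The tags and some of the links described so far can be written down as a small set of subject/predicate/object statements. The Python sketch below is not real RDF: the short book identifiers and the predicate names `topic`, `sameAs` and `seeAlso` are stand-ins I have chosen for the illustration, and only the edition links and the Winchester, Old English and cookery links are included.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
triples = set()


def link_both_ways(a, predicate, b):
    """Add an explicit link in each direction."""
    triples.add((a, predicate, b))
    triples.add((b, predicate, a))


topics = {
    "Computer Science": ["arnow1998", "arnow2003", "gamma1994"],
    "Cookery": ["fearnley2008", "hagen2006"],
    "History - Winchester": ["hawkes1970", "hervey2007"],
    "Language - Old English": ["mitchell2007", "sweet1982", "sweet2008"],
}
for topic, ids in topics.items():
    for book_id in ids:
        triples.add((book_id, "topic", topic))

# Different editions of the same book.
link_both_ways("arnow1998", "sameAs", "arnow2003")
link_both_ways("sweet1982", "sameAs", "sweet2008")

# Winchester titles link to Old English titles; both groups link to Hagen.
for w in topics["History - Winchester"]:
    for oe in topics["Language - Old English"]:
        link_both_ways(w, "seeAlso", oe)
for book_id in topics["History - Winchester"] + topics["Language - Old English"]:
    link_both_ways(book_id, "seeAlso", "hagen2006")


def objects(subject, predicate):
    """Everything the subject points to with the given predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects("sweet1982", "sameAs"))
print(objects("hervey2007", "seeAlso"))
```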

Based on this, I can at any time explore my book collection in a novel way; from any given starting point, I have numerous avenues to explore. I have a simple way to see that there are several editions of a title, and that the titles in my collection relate to a number of topics, which typically interlink. It is difficult to find a link between cookery or Anglo-Saxon history and language and computer science, but I am sure that more formal analyses within computational linguistics would fit into the model I have described in an understandable fashion.

It is worth noting that it is debatable whether my use of SeeAlso and SameAs is strictly speaking correct, but it illustrates the point about enriching a collection of metadata with links. More information about metadata schemas for linked data can be found in the links section below.

It is also worth noting that this interlinking is two way, and that this leads to redundancy (in order to get from A to B you need an explicit link, in order to get from B to A, you need another explicit link). This isn’t really a problem because the data-storage overhead is minimal, and the dereferencing of URLs can be done in such a way that redundancy does not create unnecessary work (by, for example, not dereferencing URLs that have already been visited).
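
Skipping URLs that have already been visited is a standard graph traversal. Here is a minimal Python sketch, where a dictionary lookup stands in for actually dereferencing a URL and the identifiers are made up for the illustration:

```python
from collections import deque

# Two-way links: every connection is stored once in each direction.
links = {
    "sweet1982": ["sweet2008", "hagen2006"],
    "sweet2008": ["sweet1982", "hagen2006"],
    "hagen2006": ["sweet1982", "sweet2008", "hervey2007"],
    "hervey2007": ["hagen2006"],
}


def explore(start, links):
    """Follow links breadth-first, visiting each item only once."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for target in links.get(current, []):
            if target not in visited:   # already seen: do not fetch it again
                visited.add(target)
                queue.append(target)
    return order


print(explore("sweet1982", links))
```

Each item is fetched once however many links point to it, so the redundant links cost a little storage and nothing more.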

Links

linkeddata.org

RDF homepage (for RDF basics, schema and ontologies)

BIBSYS modernization

2009-06-27

[This is a draft, I’ll be revising it]

The stream of mail on the Biblioteknorge mailing list about BIBSYS’ modernization has been almost unstoppable — at least five mails 🙂 — and names like Knut Hegna, Hans Martin Fagerli, Kim Tallerås and Dagmar Langeggen have been connected to the topic.

What’s the fuss? Well, it seems like BIBSYS will continue developing its own library management system, rather than buying off the shelf software. This contradicts the findings of “that there report” — you know, the one that recommended getting a new, off-the-shelf system. Full of wackiness — especially points at which it contravened the standard rules that a report should contain information about its subject from commonly accepted reality rather than the creative imagination — it is easy to ignore that report, but maybe it shouldn’t have been simply “ignored”.

My take on the whole thing: really very dull, I’m sorry.

Starting at the outside edge: the online public access catalogue is a relatively uninteresting concept today, and is becoming less interesting with each day that passes. Here’s why:

  • Users aren’t finding their information there, they’re finding it on the web (report)
  • University libraries aren’t registering their information in an OPAC (at best they import a subset of the e-journal and e-book data they are spending the majority of their finances on)
  • The metadata a researcher needs is not registered in an OPAC, it is available only in research databases
  • Attempts at integrating/federating search in metadata for academic content have failed
  • Portals cost money, and this is money wasted when the user doesn’t want or need to use them

The internals of a monolithic library management system are also past their sell-by-date for the majority of academic institutions:

  • Packages and subscriptions increasingly account for the majority of spending
  • Metadata import related to packages is typically limited
  • Acquisitions are increasingly expected to conform to norms applied in the rest of the institution
  • User systems are in place that register users’ role and access privileges

Given that the majority of spending goes on resources that are already findable using other methods (the metadata we’re importing must come from somewhere, and Google is the preferred tool of discovery), there is really very little need to register the majority of objects we’re currently registering.

Academic institutions, especially publicly funded ones, have resource management systems that ensure that every economic transaction is done by the book, i.e. that things are put out to tender and that an economic overview is available to the various controllers around and about. This means that an economy module in a library management system is not a good thing: it encourages practices that go outside normal routines, and hinders the financial controllers from doing their job (you do not want or need more than one system for this kind of thing).

Another aspect of the library management system is that of loan data, where a user profile system provides the data needed about a borrower, and then attaches various objects to this. At a modern academic institution, integration of the institutional user-profile system with the library management system’s user-profile system is one big headache…so why do things that way around? Why not implement the necessary slots for library data in the existing institutional system? The framework for this kind of query across several thousand records is already in place, so why not use it? There is time and money to be saved here too.

We’ve got rid of a few subsystems here, but we’re back to the sticky issues:

  • registration of items in a library’s collections
  • sharing of data

Simplicity itself would be requiring all third parties to supply Linked Data for their products (and yes, this is a realistic thing to ask), and then registering the remainder in a local Linked Data store containing either totally unique metadata for items that are not registered anywhere else, or links to existing items registered in other Linked Data stores (and this can include non-Norwegian sources). In this way, a massive web of data is available to the library’s users, containing references not only to things in the local library’s holdings, but potentially to all existing items in any library that provides Linked Data.

The OPAC can now either be replaced by the generic, or domain specific semantic browser that a given user prefers, or a user interface that provides a wrapper for SPARQL queries and a presentation format for the data returned from the dereferenced URIs contained in the Linked Data. The latter here could be something created locally by the library IT staff, or an enterprising librarian who knows a bit of Javascript, and can follow the instructions given in a typical Javascript library.
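
To give an idea of how thin such a wrapper can be, here is a sketch that turns a subject string into a SPARQL query. It is written in Python rather than Javascript, it only builds the query text (sending it to an endpoint is left out), and the choice of the Dublin Core `dcterms:title` and `dcterms:subject` properties with a plain literal as subject is an assumption about how the data is modelled.

```python
def titles_by_subject_query(subject, limit=20):
    """Build a SPARQL query listing items and titles for a subject literal."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = subject.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?item ?title WHERE {\n"
        "  ?item dcterms:title ?title ;\n"
        f'        dcterms:subject "{escaped}" .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


query = titles_by_subject_query("Old English", limit=5)
print(query)
```

The user interface then only has to pass the text on to whichever SPARQL endpoint the library uses and format the rows that come back.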

BIBSYS can potentially be a provider of tools for a) creating Linked Data, and b) storage and retrieval of this. Modules for solving issues related to lending and finances could be off-the-shelf software, supplied where these were deemed necessary.

Links:

Linked data [wikipedia]

Semantic Marc, MARC21 and the Semantic Web

BIBLIOTEKNORGE on the decision to modernize the BIBSYS library system

Publication channels: Errors in DBH

2009-06-04

It turns out that paper sources are valued more highly in the education sector!

In the table you will find an overview of the errors I found as part of the statistics work I am doing for the NTNU Library, based on a cross-reference check of all ISSNs registered in DBH’s list of publication channels against NTNU’s SFX implementation. The title link goes to a title search; the ISSN links go to ISSN searches.

Title | ISSN 1 | ISSN 2 | Discipline 1 | Discipline 2 | Level v1 | Level v2
Journal of Design History | 0952-4649 | 1741-7279 | Architecture and design | Publication channels | 2 | 1
ELH | 0013-8304 | 1080-6547 | English | English | 2 | 1
Molecular & cellular proteomics | 1535-9476 | 1535-9484 | Biomedicine | Publication channels | 2 | 1
Requirements engineering | 0947-3602 | 1432-010X | Tech – Computer engineering and computer science | Tech – Systems and technology development | 2 | 1
Communication Theory | 1050-3293 | 1468-2885 | Media and communication | Publication channels | 2 | 1
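
The check itself is simple enough to sketch in Python. The data below is taken from the table; the layout is my own assumption, namely that DBH gives a level per ISSN (so that the first level column belongs to ISSN 1 and the second to ISSN 2), and that the link resolver supplies the pairs of ISSNs that belong to the same journal.

```python
# Level per ISSN, as registered in the publication channel list.
dbh_levels = {
    "0952-4649": 2, "1741-7279": 1,
    "0013-8304": 2, "1080-6547": 1,
    "1535-9476": 2, "1535-9484": 1,
    "0947-3602": 2, "1432-010X": 1,
    "1050-3293": 2, "1468-2885": 1,
}

# ISSN pairs for the same journal, assumed to come from the link resolver.
issn_pairs = [
    ("Journal of Design History", "0952-4649", "1741-7279"),
    ("ELH", "0013-8304", "1080-6547"),
    ("Molecular & cellular proteomics", "1535-9476", "1535-9484"),
    ("Requirements engineering", "0947-3602", "1432-010X"),
    ("Communication Theory", "1050-3293", "1468-2885"),
]


def level_mismatches(pairs, levels):
    """Return journals whose two ISSNs are registered at different levels."""
    found = []
    for title, issn1, issn2 in pairs:
        if issn1 in levels and issn2 in levels and levels[issn1] != levels[issn2]:
            found.append((title, levels[issn1], levels[issn2]))
    return found


for title, level1, level2 in level_mismatches(issn_pairs, dbh_levels):
    print(f"{title}: level {level1} vs level {level2}")
```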

Pedantic? That depends on whether registration in FRIDA is affected (in these cases it makes no difference).

FRIDA

2009-03-18

FRIDA, or Forskningsresultater, informasjon og dokumentasjon av vitenskapelige aktiviteter (research results, information and documentation of scientific activities): is it OK, or is it bad … that is, does it rule, or does it BITE? You decide!

That was then, this is now

2009-02-20

  1. Read an article about the library of the future by S. Bell
  2. Read my old article about the same
  3. Take a look at Science@Cambridge
  4. Take a look at iSoton
  5. And an example of what not to do

Glad we got that cleared up then 😉

Note: (5) was an attempt at making the library webpage like the one described in (2). Did I succeed? No. Did I learn anything? A lot: we’re totally dependent on having a good profile service provided by the university IT department; without this, we’re scuppered.