Author Archive

Things the “Google generation” say

2009-11-07
  1. I don’t use Firefox, it keeps updating itself [they used IE6 in 2009]
  2. [On the ‘phone] I’ve got a problem with Endnote on my laptop, I think that you should come to my house to fix it
  3. I’ll never remember that, it’s too difficult [the url to Google Scholar]
  4. Librarians don’t need to try and be so hip, no-one else uses social media
  5. I think it is a kind of CD-player. [Answer to me pointing to a picture of a cassette tape and asking “what is that?”]
  6. I’m only interested in articles [I was trying to explain that they needed to use the OPAC to get literature, but they associated that with “printed literature”, i.e. not online articles]
  7. According to the term paper instructions, I need to cite a book and two articles [I asked the question “is this for real?”, unfortunately, yes.]
  8. No, I only want printed articles because the lecturer said that the internet is not a good source of information [I was showing them the fulltext articles they were looking for]
  9. Err… What on Earth are you doing? [I was using the command line on their PC to move files]
  10. I want the library to keep printed journals so I can read the tables of contents [I gave up at this point]

BIBSYS: Procurement of a new library system

2009-11-03

I am only guessing, but when somebody writes that they are going to purchase a library system, they have already decided that they want a library system. But when did they establish that they actually need such a system? As far as I can tell, that analysis has been missing in the present case.

The term “library system” covers several parts: finance, acquisitions, cataloguing, presentation to the public, handling of personal data, and so on. The problem for the vast majority of BIBSYS institutions is that they already have other systems for most of their purchasing, finance and personal data. Having the library system add new functionality for these areas is neither necessary nor desirable in very many cases. Do not make the mistake of thinking that I want to get rid of systems for libraries; we need this data, but it does not need to live in one and the same system, particularly when doing so creates headaches (cf. the import/export of user data at university libraries).

The library systems on the market today have catalogues based on storing metadata about things in MARC format, which I would argue is not the wisest thing one can do, for three central reasons:

  • a lack of extensibility to cover needs relating to new types of things that have to be registered
  • a lack of semantics in MARC
  • a format that hinders reuse because it is domain-specific

In a system for organizing knowledge, you generally want the data presented in a way that suits the users, and this should be independent of the structure of the data layer. MARC and AACR (the standard that governs how things are catalogued) clearly stand out as fundamental elements of library catalogues, even though they should have no bearing on how we present things to users.

That said, one might think I am condemning MARC and AACR outright, but that is not the case; I am talking exclusively about MARC as a storage format. It is entirely possible to catalogue in MARC according to AACR and still store the data in a sound way. Seen this way, MARC is a view of the data (if this seems difficult, consider that the numbers {1,2,3} can be shown as raw data, as a bar chart, as a pie chart and so on without coming to any harm; in the same way, a book can be catalogued in a MARC form and then stored in another way).
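
A minimal sketch of that separation, assuming RDF as the storage layer and treating MARC purely as a display; the example.org URIs, the Dublin Core terms and the MARC tags shown are my own illustrative choices (Python with rdflib):

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import DCTERMS

    EX = Namespace("http://example.org/catalogue/")  # hypothetical identifiers

    # Storage layer: plain triples, no MARC anywhere.
    g = Graph()
    book = EX["guide-to-old-english"]
    g.add((book, DCTERMS.title, Literal("A Guide to Old English")))
    g.add((book, DCTERMS.creator, Literal("Mitchell, Bruce")))
    g.add((book, DCTERMS.creator, Literal("Robinson, Fred C.")))
    g.add((book, DCTERMS.issued, Literal("2007")))

    # Display layer: a MARC-ish rendering is just one projection of the triples.
    def marc_view(graph, subject):
        creators = sorted(graph.objects(subject, DCTERMS.creator))
        print(f"100 1_ $a {creators[0]}")
        for creator in creators[1:]:
            print(f"700 1_ $a {creator}")
        print(f"245 10 $a {graph.value(subject, DCTERMS.title)}")
        print(f"260 __ $c {graph.value(subject, DCTERMS.issued)}")

    marc_view(g, book)

The cataloguer could still work in a MARC/AACR-shaped form; the point is only that what gets stored does not itself need to be MARC.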

In today’s IT world there is a wide range of technologies that solve these problems, but the most promising for libraries are, in my view, those that can be described as “semantic”, and it is there I would look for inspiration for a new “library system”, not to a traditional, commercial vendor of MARC-based systems.

Internet Librarian International 2009

2009-10-25

First things first: this year’s Internet Librarian International was a much better affair than last year’s effort…in fact, it was actually worthwhile being there (sorry if this is negative, but I feel it needed saying).

OK, report time:

Cory Doctorow spoke about how copyright is not working on the Internet, how large corporations push for consistently stronger legislation in this area, and how this leads to nothing but the criminalization of behaviour that the general public, in ever greater numbers, considers perfectly acceptable. He used the example of how people originally had a closed network of telephones of a single make that produced good sound quality and a reliable service, but opted for cheaper, lower quality because it made it possible to ring more people; similarly, the popular services on the Internet are not the high-quality channels pushed by the corporate media companies, but lower-resolution, high-quantity channels such as YouTube. There is no way to compete in this market with pricing schemes adopted from the non-online world.

Doctorow’s commentary on the facts of international copyright and how they complicate delivery via channels such as the Internet was rather saddening; the fact that search engines showing previews of content in search results are actually illegal in many countries demonstrates how out of touch some copyright law really is. At the same time, it has never been easier to break copyright law: sharing technologies make it possible to knock off a few thousand copies of a film in an afternoon. Is there any reason to continue using legislation designed to prohibit copying at a time when copying was difficult?

Punishment for “suspected” infringement of copyright law includes cutting off internet connections, which according to Doctorow is an abuse of human rights in a digital age; he’s probably right here too, given that I would struggle without the Net.

Doctorow argued that since copyright law is generally created not in an open process but in a closed one controlled largely by industrial interests, it is important to lobby for change in the way such laws are created. How engaged people are in the way copyright law affects them was demonstrated by two politicians in Canada who lost their seats after misunderstanding the electorate’s disaffection with the way copyright law was being treated.

It was pointed out that the industry’s attempts to combat copying culture have largely been based on encrypting content and selling it with a built-in decryption key, an approach that is largely going to fail because the key and the encrypted content are provided together – making it not exactly difficult to decrypt said content. An interesting thought on the value of copying culture: the industrial revolution was itself driven by copying – producing many cheap copies on looms, and copying other people’s equipment.

Regarding ebooks, Doctorow contends that part of what makes books so dear to people is the experience of owning the book; ebooks have largely removed this feeling, leaving people with little affection for the medium. He points out that users do not own the content – in essence they lease it (note the Amazon book recall – this would never have happened with printed books).

A final, really relevant example of copyright silliness that was outlined is the obsession of UK universities with patenting research – a Thatcher invention. This intellectual property protection (IPP) means that for every £1 IPP brings in, universities pay £19 for the use of other universities’ intellectual property. This kind of rot has to be stopped – for many years, publicly funded research was not subject to patent protection because it was deemed to be in the public domain, for the benefit of society. Because the political attitude prevailing at the time assumed that educational institutions should be self-funding, it was deemed necessary to bring in money this way – without recognizing that with income come outgoings to other institutions doing the same. I wonder whether this has any current parallels in Norway?

Doctorow concluded by encouraging librarians to work for better copyright law that has been produced in an open setting, and is adjusted for current contexts.

Tony Hirst gave a short talk on providing invisible services that give access to library information; these are the kind of thing that we used to develop at NTNU Library about two–three years ago. He is of the opinion that the library should stop buying books, and that members of faculty staff should use LibraryThing to circulate the books in their offices instead.

Peter Bryant talked about next-generation library catalogues; he is of the opinion that library catalogues give users no clue about the quality of the information contained therein because the cataloguers lack the skills to provide this information – it is subject-related competence. He talked about creating authentic knowledge about content in a library catalogue using Linked Data, and pointed out that citation counts in ISI and Google Scholar are supposed to provide information about academic quality, and yet they do not – in fact they provide little help. The library catalogue needs to provide ways of delivering authentic knowledge about subjects, typically by providing links to other sources of content and metadata.

AK Sandberg told us about Pode, a Norwegian project. They want to create a library catalogue with a better user experience by using mashups based on open standards such as Z39.50, SRU, MARC, etc. They document everything and provide source code under an open license. (I have been involved in this project, so I can say that they are doing a lot of interesting work that runs in a different but related vein to that taking place on UBiT2010.)

Behrens/Larsen provided an introduction to the work that has been taking place in Denmark on Summa, which is a system for “integrated search”. The system aggregates metadata from different sources and creates a list of hits. The hits are ranked and presented in a special interface. User testing revealed that the ranking was not good enough, and that users simply did not use the facets that had been provided as a way of narrowing the search. They had a number of ideas about how to work with the interface and improve the ranking, but found that changing the placement of facets had not improved their usage statistics.

Alan Oliver from Ex Libris told us about bX – a recommendation service based on clickthroughs in SFX – and was hacked to pieces by Peter Murray-Rust because of Ex Libris’ use of the word “open” in its marketing. Murray-Rust contended – correctly to my mind – that Ex Libris’ conception of open does not match up to the rest of the world’s conception.

Brian Kelly: “standards are like sausages”. This presentation was an interesting look at standards, open standards and good things that are neither open nor standards. Standards were originally seen as a way of ensuring interoperability and accessibility and avoiding vendor lock-in. Open standards such as OOXML are bad because they are in essence bound to one supplier, conflict with other standards, and are in truth ODF in an uglier wrapper (my prejudices might be coming through here too :D). Skype is a good non-open non-standard, as is Google. Brian said that the 00s are characterized by an understanding that standards need to be applied sensibly, with that all-important contact with reality – cf. the fact that the W3C’s main page CSS does not validate properly because they want it to look right in all browsers. Standards should be written so human beings can understand them! Peter Murray-Rust: standards should be about rough consensus and running code.

The second-day keynote saw Peter Murray-Rust present a set of challenges for libraries in the 21st century. I can really see a few ways of working the library into a key role at the university if we take up some of these challenges – especially as regards knocking the wind out of academic publishers: they are powerful, and power corrupts (Powerpoint corrupts absolutely).

I presented at a session on mobile libraries, and then had a long discussion with Patrick Danowski, who presented at the same session. Following this, I attended an unconference session on the Semantic Web. This session was extremely interesting, but it is somewhat difficult to relate, other than by stating that Linked Data is something that we should definitely be doing (but we know that already, yeah?)

All in all, this year’s ILI provided some food for thought, but I fear that the sessions that were most interesting for me are those that I have not really reported, such as the Semantic Web unconference and Peter Murray-Rust’s keynote. The reasons why these were so interesting were slightly different: Murray-Rust was inspirational, something that is nice to have now and again — it reminds you why you accept lower pay than you get offered by the commercial sector (oh, yes, that day is coming). The session on the Semantic Web reminded me personally why I work at NTNU Library: to be at the absolute cutting edge of academic library technology. I get the impression that we don’t generally understand this, and I get the impression that our web pages will never reflect this…as I said, that day is coming.

@ILI2009

2009-08-29

I’m off to Internet Librarian International 2009 in London in October; I’ll be giving a session on some mobile stuff we’ve been doing at NTNU. I’m looking forward to this because I am currently passionate about two work-related things, mobile devices and linked data, and I’m getting to talk about both (the data model for our mobile platform is linked data).

There’s quite a lot of new stuff I’ll be talking about, aimed at librarians with some technical expertise (i.e. attend this if you work in a library and know what a web-browser looks like). I promise an interesting session for people who are jaded with institutional web-infrastructure management.

Hope to see you there 🙂

NTNU/SiT for students? REMA 1000 & Bunnpris for staff?

2009-08-24

Worried about stretching your meagre state wage (we’re not in it for the money)? Drop SiT’s offerings and take a walk from Gløshaugen to Bunnpris, or from Dragvoll to Rema 1000.

…joyous (you also get the exercise you’ll miss out on when you return your card for the Sports centres 😉 )

Linked data

2009-07-14

What is linked data? (Note that I’m ignoring any of the specifics of RDF, on which Linked Data depends.)

The “data” of linked data is metadata on the web that describes documents and resources; the linked part refers to the links that exist between metadata items. If this seems a little abstract, consider the following:

I own ten books related to my four interests:

  • Anglo-Saxon language (properly, Old English)
  • The history of Winchester, England
  • Computer programming
  • Cookery

The titles I own are:

Arnow, D. M., & Weiss, G. (1998). Introduction to Programming Using Java: An Object-Oriented Approach. Addison-Wesley.

Arnow, D., Dexter, S., & Weiss, G. (2003). Introduction to Programming Using Java: An Object-Oriented Approach (2nd ed.). Addison Wesley.

Fearnley-Whittingstall, H., & Carr, F. (2008). The River Cottage Family Cookbook. Ten Speed Press.

Gamma, E., Helm, R., Johnson, R., & Vlissides, J. M. (1994). Design Patterns: Elements of Reusable Object-Oriented Software (illustrated edition.). Addison-Wesley Professional.

Hagen, A. (2006). Anglo-Saxon Food & Drink. Anglo-Saxon Books.

Meaney, A. L., & Hawkes, S. C. (1970). Two Anglo-Saxon Cemeteries at Winnall, Winchester, Hampshire. Maney Publishing.

Hervey, T. (2007). The Bishops Of Winchester In The Anglo-Saxon And Anglo-Norman Periods. Kessinger Publishing, LLC.

Mitchell, B., & Robinson, F. C. (2007). A Guide to Old English (7th ed.). Wiley-Blackwell.

Sweet, H. (1982). Sweet’s Anglo-Saxon Primer (9th ed.). Oxford University Press, USA.

Sweet, H. (2008). An Anglo-Saxon Primer (3rd ed.). Tiger Xenophon.

I want some way of keeping track of my book collection, so I create a catalogue of RDF files where I tag the various books with their topics (sketched in code after the list):

  • Books by Arnow, Arnow et al. and Gamma et al. are tagged as Computer Science
  • Books by Fearnley-Whittingstall and Hagen are tagged as Cookery
  • Books by Hawkes and Hervey are tagged as History — Winchester
  • Books By Mitchell & Robinson and Sweet are tagges as Language — Old English

Immediately, I see that I have several editions of the same book, so I add a simple SameAs relation between these books by adding a URL to the RDF metadata of the other book; Arnow (1998) and Arnow et al. (2003) link to one another in this way, as do Sweet (1982) and Sweet (2008). In this way I can easily see which books are related by following a link (technically, “dereferencing” a URL).

The book on design patterns is so fundamentally important within computer science that I add a SeeAlso link to this book from the other computer science titles; in the same way, I can choose to add a SeeAlso relation between the other books tagged with the same tags, allowing me to easily access each title from a related title.

Because my interest in Winchester primarily relates to the Anglo-Saxon period, and especially to the linguistic/onomastic aspects of its history, I find it useful to link (SeeAlso) the titles on Winchester to the books on Old English. At the same time, I also add a SeeAlso to the book on Anglo-Saxon cookery from each of the titles on Winchester and Old English.
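
Continuing the sketch above, the SameAs and SeeAlso relations could be expressed with owl:sameAs and rdfs:seeAlso; again, the vocabulary is my own choice, since the schema is deliberately left loose here:

    from rdflib.namespace import OWL, RDFS

    # The two Arnow editions point at each other (SameAs in both directions).
    g.add((BOOKS["arnow1998"], OWL.sameAs, BOOKS["arnow2003"]))
    g.add((BOOKS["arnow2003"], OWL.sameAs, BOOKS["arnow1998"]))

    # The other computer-science titles point at the design patterns book.
    g.add((BOOKS["arnow1998"], RDFS.seeAlso, BOOKS["gamma1994"]))
    g.add((BOOKS["arnow2003"], RDFS.seeAlso, BOOKS["gamma1994"]))

    # The Winchester titles point at the Old English and cookery titles.
    g.add((BOOKS["hervey2007"], RDFS.seeAlso, BOOKS["mitchell2007"]))
    g.add((BOOKS["hervey2007"], RDFS.seeAlso, BOOKS["hagen2006"]))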

Based on this, I can at any time explore my book collection in a novel way; from any given starting point, I have numerous avenues to explore. I have a simple way to see that there are several editions of a title, and that the titles in my collection relate to a number of topics, which typically interlink. It is difficult to find a link between cookery or Anglo-Saxon history and language and computer science, but I am sure that more formal analyses within computational linguistics would fit into the model I have described in an understandable fashion.

It is worth noting that it is debatable whether my use of SeeAlso and SameAs is strictly speaking correct, but it illustrates the point about enriching a collection of metadata with links. More information about metadata schemas for linked data can be found in the links section below.

It is also worth noting that this interlinking is two-way, and that this leads to redundancy (to get from A to B you need an explicit link; to get from B to A, you need another explicit link). This isn’t really a problem because the data-storage overhead is minimal, and the dereferencing of URLs can be done in such a way that redundancy does not create unnecessary work (by, for example, not dereferencing URLs that have already been visited).
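
The point about redundancy not creating unnecessary work can be sketched with a plain visited set; the walk below runs over the in-memory graph built above, whereas a real Linked Data client would dereference each URI over HTTP instead:

    def explore(graph, start, predicates=(OWL.sameAs, RDFS.seeAlso)):
        """Walk the link graph breadth-first, touching each resource only once."""
        visited, queue = set(), [start]
        while queue:
            node = queue.pop(0)
            if node in visited:
                continue  # already dereferenced, so the two-way links cost nothing extra
            visited.add(node)
            for predicate in predicates:
                queue.extend(graph.objects(node, predicate))
        return visited

    related = explore(g, BOOKS["arnow1998"])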

Links

linkeddata.org

RDF homepage (for RDF basics, schema and ontologies)

Zotero…

2009-06-28

I have just started using Zotero’s groups feature that allows you to share a set of references with other users. This feature is really great, and it works seamlessly with the Zotero plugin. It occurs to me that this must be a great tool for teachers wishing to share references with students, as well as project groups like the one I work in.

My current bugbear is that it isn’t possible to copy a folder from my library over into the group library; just as well that I’m a belt-and-braces type who tags everything in addition to creating indecipherable folder structures.

On my travels on the Zotero webpage, I came across a list of registered users broken down by research interests. The interesting thing to note is that Zotero is now used in many fields. These numbers represent the optional choice of registering a field of interest when a Zotero account is created. Note that this means that a single person may be registered multiple times or not at all (Zotero does not require users to have an account to use the software).

Again, Zotero really is the best reference tool — if you’re not locked in, you’re not locked out. (And thank goodness for BibTeX support!)

BIBSYS modernization

2009-06-27

[This is a draft, I’ll be revising it]

The stream of mail on the Biblioteknorge mailing list about BIBSYS’ modernization has been almost unstoppable — at least five mails 🙂 — and names like Knut Hegna, Hans Martin Fagerli, Kim Tallerås and Dagmar Langeggen have been connected to the topic.

What’s the fuss? Well, it seems that BIBSYS will continue developing its own library management system, rather than buying off-the-shelf software. This contradicts the findings of “that there report” — you know, the one that recommended getting a new, off-the-shelf system. Full of wackiness — especially at the points where it contravened the standard rule that a report should draw information about its subject from commonly accepted reality rather than from the creative imagination — that report is easy to ignore, but maybe it shouldn’t simply have been “ignored”.

My take on the whole thing: really very dull, I’m sorry.

Starting at the outside edge: the online public access catalogue is a relatively uninteresting concept today, and it becomes less interesting with each day that passes. Here’s why:

  • Users aren’t finding their information there, they’re finding it on the web (report)
  • University libraries aren’t registering their information in an OPAC (at best they import a subset of the e-journal and e-book data they are spending the majority of their finances on)
  • The metadata a researcher needs is not registered in an OPAC, it is available only in research databases
  • Attempts at integrating/federating search in metadata for academic content have failed
  • Portals cost money, and this is money wasted when users don’t want or need to use them

The internals of a monolithic library management system are also past their sell-by-date for the majority of academic institutions:

  • Packages and subscriptions increasingly account for the majority of spending
  • Metadata import related to packages is typically limited
  • Acquisitions are increasingly expected to conform to norms applied in the rest of the institution
  • User systems are in place that register users’ role and access privileges

Given that the majority of spending goes on resources that are already findable using other methods (the metadata we’re importing must come from somewhere, and Google is the preferred tool of discovery), there is really very little need to register the majority of objects we’re currently registering.

Academic institutions — especially publicly funded ones — have resource management systems that ensure that every economic transaction is done by the book, i.e. ensuring that things are put out to tender, and that an economic overview is available to the various controllers around and about. This means that an economy module in a library management system is not a good thing: it encourages practices that go outside normal routines, and it hinders the financial controllers from doing their job (you do not want or need more than one system for this kind of thing).

Another aspect of the library management system is loan data, where a user-profile system provides the data needed about a borrower and then attaches various objects to it. At a modern academic institution, integrating the institutional user-profile system with the library management system’s user-profile system is one big headache…so why do things that way around? Why not implement the necessary slots for library data in the existing institutional system? The framework for this kind of query across several thousand records is already in place, so why not use it? There is time and money to be saved here too.

We’ve got rid of a few subsystems here, but we’re back to the sticky issues:

  • registration of items in a library’s collections
  • sharing of data

Simplicity itself would be to require all third parties to supply Linked Data for their products (and yes, this is a realistic thing to ask), and then to register the remainder by creating a local linked data store containing either totally unique metadata for items that are otherwise not registered anywhere, or links to existing items registered in other Linked Data stores (and this can include non-Norwegian sources). In this way, a massive web of data is available to the library’s users, containing references not only to things in the local library’s holdings, but potentially to all existing items in any library that provides Linked Data.
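
As a rough sketch of that pattern, with hypothetical local URIs and an external URI chosen only as an example of an item already described elsewhere:

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, OWL

    LOCAL = Namespace("http://example.org/library/item/")  # hypothetical local store

    g = Graph()

    # Totally unique item: the full metadata lives only in our local store.
    g.add((LOCAL["ms-1823-1"], DCTERMS.title,
           Literal("Unpublished lecture notes, 1823")))

    # Ordinary published title: we only state that our copy is the thing already
    # described in an external Linked Data store, which carries the metadata.
    g.add((LOCAL["copy-42"], OWL.sameAs,
           URIRef("http://dbpedia.org/resource/Design_Patterns")))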

The OPAC can then either be replaced by the generic or domain-specific semantic browser that a given user prefers, or by a user interface that provides a wrapper for SPARQL queries and a presentation format for the data returned from the dereferenced URIs contained in the Linked Data. The latter could be something created locally by the library IT staff, or by an enterprising librarian who knows a bit of Javascript and can follow the instructions given in a typical Javascript library.
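
A minimal sketch of such a wrapper, here in Python rather than Javascript, against a hypothetical SPARQL endpoint and using the standard SPARQL protocol (query parameter plus JSON results via content negotiation):

    import json
    import urllib.parse
    import urllib.request

    ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

    QUERY = """
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?item ?title WHERE { ?item dcterms:title ?title } LIMIT 10
    """

    def search(endpoint, query):
        request = urllib.request.Request(
            endpoint + "?" + urllib.parse.urlencode({"query": query}),
            headers={"Accept": "application/sparql-results+json"},
        )
        with urllib.request.urlopen(request) as response:
            results = json.load(response)
        # Flatten the SPARQL JSON result format into (uri, title) pairs for display.
        return [(row["item"]["value"], row["title"]["value"])
                for row in results["results"]["bindings"]]

    for uri, title in search(ENDPOINT, QUERY):
        print(title, "->", uri)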

BIBSYS can potentially be a provider of tools for a) creating Linked Data, and b) storage and retrieval of this. Modules for solving issues related to lending and finances could be off-the-shelf software, supplied where these were deemed necessary.

Links:

Linked data [wikipedia]

Semantic Marc, MARC21 and the Semantic Web

BIBLIOTEKNORGE om vedtak om modernisering av BIBSYS Biblioteksystem

emtacl10: a website for the academic conference

2009-06-14

I have updated the emtacl10: emerging technologies in academic libraries website; the changes add a lot of content (and some new dates!) to the information that was published previously.

For me, the interesting thing was combining a set of technologies:

  • blueprint css
  • jquery
  • eXtensible Metadata Platform (XMP)
  • RSS
  • AJAX

The really cool thing about these technologies is that they made everything really quite easy; easy to create valid, accessible code, and easy to do all of this quickly.

The “assets” list is created from metadata embedded into the files that are listed there; this, and the rest of the content is updated using AJAX provided by the jquery framework. Blueprint css is used for the layout.

Two days’ work inclusive of everything! (And I really dislike creating webpages, but this verged on fun.)

Publication channels: errors in DBH

2009-06-04

It turns out that paper sources are more highly valued in the education sector!

The table below gives an overview of the errors I found as part of the statistics work I do for the NTNU Library — based on a cross-reference check of every ISSN registered in DBH’s list of publication channels against NTNU’s SFX implementation. The title link goes to a title search; the ISSN links go to ISSN searches.

Title | ISSN 1 | ISSN 2 | Discipline 1 | Discipline 2 | Level v1 | Level v2
Journal of Design History | 0952-4649 | 1741-7279 | Arkitektur og design | Publiseringskanaler | 2 | 1
ELH | 0013-8304 | 1080-6547 | Engelsk | Engelsk | 2 | 1
Molecular & cellular proteomics | 1535-9476 | 1535-9484 | Biomedisin | Publiseringskanaler | 2 | 1
Requirements engineering | 0947-3602 | 1432-010X | Tekn – Datateknikk og datavitenskap | Tekn – System- og teknologiutvikling | 2 | 1
Communication Theory | 1050-3293 | 1468-2885 | Medier og kommunikasjon | Publiseringskanaler | 2 | 1

Pedantic? The question is whether registration in FRIDA is affected (in these cases it makes no difference).
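
The cross-reference itself is easy to sketch. Assuming two hypothetical CSV exports (sfx.csv mapping each title to its print and electronic ISSNs, dbh.csv mapping each ISSN to its level in the channel register; the file names and columns are illustrative, not the real formats), the check amounts to:

    import csv

    # dbh.csv: issn, level                          (hypothetical DBH channel export)
    # sfx.csv: title, issn_print, issn_electronic   (hypothetical SFX export)

    with open("dbh.csv", newline="", encoding="utf-8") as f:
        level = {row["issn"]: row["level"] for row in csv.DictReader(f)}

    with open("sfx.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            lvl_print = level.get(row["issn_print"])
            lvl_electronic = level.get(row["issn_electronic"])
            if lvl_print and lvl_electronic and lvl_print != lvl_electronic:
                # Same journal, different level depending on which ISSN you look up.
                print(row["title"], row["issn_print"], lvl_print,
                      row["issn_electronic"], lvl_electronic)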