Establishing the Connection: Creating a Linked Data Version of the BNB Neil Wilson Head of Metadata Services
Changing Expectations
Public Sector Metadata
  The Web has accelerated the development of a collaboration culture & fostered expectations that information & content should be as freely available as the Internet itself
  Many wider-benefit arguments have been advanced for public bodies to make their data freely available
  2009 saw an increasing Government commitment to the principle of opening up public data for wider re-use
  The "Putting the Frontline First: Smarter Government" report required "the majority of government-published information to be reusable, linked data by June 2011"
Developing an Open Metadata Strategy
Choices and Challenges
When developing an open metadata strategy we wanted to:
  Break away from library-specific formats (e.g. MARC) and use more cross-domain, XML-based standards (e.g. DC, RDF)
  Develop the new formats with the communities using the metadata
  Get some form of attribution while also adopting a licensing model appropriate to the widest re-use of the metadata
  Adopt a multi-track approach addressing the needs of:
    Traditional libraries
    Researchers wanting to ‘data mine’ catalogues
    New linked data developers & users
... And deliver the above with decreasing resources
First Steps Toward An Open Metadata Strategy
During 2010 we:
  Developed a capability to supply metadata using RDF/XML standards used in the wider web community
  Conducted trials with a range of new users, including the UK Intellectual Property Office & UNESCO
  Developed a free Z39.50 MARC record download service for libraries to assist with derived cataloguing etc.
  Hosted a linked data workshop with 40 representatives from key international organisations
Current Status
Since August 2010 we have:
  Created a new email enquiry point for BL metadata issues: [email_address]
  Signed up nearly 400 organisations worldwide to the free MARC21 Z39.50 service
  Worked with JISC, Talis & other linked data implementers on technical challenges, standards & licensing issues
  Begun to offer sets of RDF/XML metadata under a Creative Commons 0 (CC0) license
  Supplied multi-million record sets to organisations including the Open Bibliography Project, the Open Library & Wikimedia Commons
Library Metadata & The Promise of Linked Data
Traditional library metadata uses a self-contained, proprietary, document-based model
The Semantic Web uses a more dynamic, data-based model to establish relationships between data elements via links
By migrating from traditional models libraries could begin to:
  Integrate their resources in the web, increasing visibility & reaching new users
  Offer users a richer resource discovery experience
  Transition from costly specialist technologies & suppliers & widen their choice of options
Traditional Library Metadata Properties:
  Proprietary, library-specific standards
  Passive
  Self-contained
  Linear text: ‘read’ by users as result of database query
  Offers end result
‘Semantic’ Metadata Properties:
  Open standards
  Dynamic/reactive
  Links to external resources
  Micro portal: interacts with users & systems in response to queries
  Offers options for further inquiry
Our Linked Data Journey... What to Offer?
Wanted to offer data allowing useful experimentation & advancing discussions from theory to practice
Why BNB?
  General database of published output and not an institutional catalogue of unique items
  Mass-produced works on all subjects, many with internationally recognised identifiers e.g. ISBN
  Reasonably uniform format across 60 years of publication
  Significant amount of data: 3 million records in various languages
Our Linked Data Journey... What do we need to get there?
Wanted to undertake the work as an extension of existing activities and as an opportunity to develop expertise using:
  Existing staff: librarians rather than IT experts
  As many pre-existing tools or technologies as possible
  Standard PC hardware for conversion
  Library MARC21 data as a starting point
  Established linked data resources to connect to
  A proven platform that would enable us to concentrate on the data issues
Our Linked Data Journey... First Stage: How To Migrate the Metadata?
From a flat catalogue card model to something more appropriate...
Preliminaries:
  Staff training in linked data modelling concepts & increased familiarisation with RDF & XML concepts
  Experience of working with: JISC Open Bibliography Project & others
  Feedback on initial MARC to XML conversion work
Incremental approach adopted:
  Open Data License
  RDF/XML Format
  Add External Links
  Re-model
  Create Linked Data
Our Linked Data Journey... Second Stage: Selecting trusted resources to link to
To begin placing library data in a wider context & supplement or replace literal values in records
Looked for library sites:
  Dewey Info
  LCSH SKOS
  VIAF
Plus more general sites:
  GeoNames
  Lexvo
  RDF Book Mashup
Our Linked Data Journey... Third Stage: Matching and Generating Links
Three main approaches used:
  Automatic generation of URIs from elements in records e.g. DDC
  Matching of text in records with linked data dumps to identify URIs e.g. personal names to VIAF & subjects to LCSH
  Two-stage crosswalk/matching process for some coded information e.g. MARC country & language codes for GeoNames
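The three approaches above can be sketched roughly as follows. This is a minimal illustration, not the BL's actual tooling: the function names, the example.org base URIs, the tiny lookup tables and the choice of ISO 3166 codes as the intermediate crosswalk stage are all assumptions made for the example.

```python
import unicodedata

# ASSUMPTION: illustrative base URI, not the real pattern of any service.
DDC_BASE = "http://example.org/ddc/"


def ddc_uri(class_number: str, base: str = DDC_BASE) -> str:
    """Approach 1: generate a URI directly from an element in the record."""
    return base + class_number.strip()


def normalise_name(name: str) -> str:
    """Fold case, Unicode form, edge punctuation and spacing before comparing."""
    folded = unicodedata.normalize("NFC", name).casefold()
    return " ".join(folded.strip(" .,;:").split())


def build_index(dump):
    """Index (label, uri) pairs taken from a linked data dump by normalised label."""
    return {normalise_name(label): uri for label, uri in dump}


def match_name(name: str, index: dict):
    """Approach 2: match text from a record against the dump; None if no match."""
    return index.get(normalise_name(name))


# Approach 3: two-stage crosswalk for coded values.
# Stage 1 maps a MARC country code to an intermediate code (ISO 3166-1 assumed);
# stage 2 maps that code to a target URI (hypothetical URIs shown here).
MARC_TO_ISO = {"xxk": "GB", "fr": "FR"}
ISO_TO_URI = {
    "GB": "http://example.org/place/GB",
    "FR": "http://example.org/place/FR",
}


def country_uri(marc_code: str):
    """Return the target URI for a MARC country code, or None if unmapped."""
    iso = MARC_TO_ISO.get(marc_code.strip())
    return ISO_TO_URI.get(iso) if iso else None
```

Exact matching on a normalised string is the simplest possible strategy; real name matching would probably also need dates, variant forms and manual review of ambiguous hits.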
Our Linked Data Journey... MARC to RDF Conversion Workflow
MARC to RDF conversion consists of multiple automated steps using a range of tools:
1) Selection (in-house utilities / MARC Report)
     Exclusions (CIP; multiparts; serials)
2) Pre-processing (MARC Global)
     Normalise data values, remove trailing punctuation
     Move/copy data values to improve machine matching/transformation
3) Character set conversion (in-house utilities)
     Decomposed UTF-8 converted to precomposed for conformance with W3C recommendations
4) URI creation (in-house utilities)
     Create BL URIs in MARC fields
     Harvest URIs from external sources
5) Data transformation (MARC Report & MARC 21/RDF XSLT)
     Convert to RDF & insert URI prefixes
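Steps 2 and 3 of the workflow can be illustrated with a short sketch. The workflow itself used MARC Global and in-house utilities, so the Python below is only an assumed equivalent: the helper names and the exact set of punctuation characters stripped are hypothetical. The decomposed-to-precomposed conversion corresponds to Unicode normalisation form NFC.

```python
import re
import unicodedata

# ASSUMPTION: the set of trailing characters treated as cataloguing punctuation.
_TRAILING_PUNCT = re.compile(r"[\s.,:;/]+$")


def strip_trailing_punctuation(value: str) -> str:
    """Remove trailing punctuation and whitespace from a field value.

    Caveat: this also removes a final full stop that belongs to an
    abbreviation or initial, so real rules would need exceptions.
    """
    return _TRAILING_PUNCT.sub("", value)


def to_precomposed(value: str) -> str:
    """Convert decomposed characters (base letter + combining mark) to NFC."""
    return unicodedata.normalize("NFC", value)


def preprocess(value: str) -> str:
    """Apply punctuation clean-up, then character normalisation."""
    return to_precomposed(strip_trailing_punctuation(value))
```

For example, a value containing "e" followed by a combining diaeresis (U+0308) comes out with the single precomposed character "ë" (U+00EB), which makes string matching against external datasets more reliable.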
Our Linked Data Journey... MARC to RDF Conversion Workflow
Our Linked Data Journey... Which took us from here...
Our Linked Data Journey... Via here...
Our Linked Data Journey... To here...
bnb.data.bl.uk Preview
Options:
  bnb.data.bl.uk/sparql
  bnb.data.bl.uk/describe
  bnb.data.bl.uk/search
Includes:
  BNB Books 2005-11
  485,000 records
  18,000,000 RDF triples
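As a rough illustration of how the SPARQL option might be used, the sketch below builds a request URL for a simple query. The `query` parameter follows the standard SPARQL protocol; the `http://` scheme, the use of `dct:title` in the data and the helper name `build_query_url` are assumptions, and the preview service may no longer be available at this address.

```python
from urllib.parse import urlencode

# Endpoint host and path as named on the slide; the scheme is assumed.
ENDPOINT = "http://bnb.data.bl.uk/sparql"

# ASSUMPTION: titles are expressed with Dublin Core Terms (dct:title).
QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?book ?title
WHERE { ?book dct:title ?title }
LIMIT 5
"""


def build_query_url(sparql: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as a GET request URL for the endpoint."""
    return endpoint + "?" + urlencode({"query": sparql})


url = build_query_url(QUERY)
```

The resulting URL could then be fetched with any HTTP client; the response format would depend on what the endpoint supports.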
bnb.data.bl.uk Sample ‘Labelled Concise Bound Description’
Our Linked Data Journey... Journey’s End...Point?
Preview details at: http://www.bl.uk/bibliographic/datafree.html
Roadmap for next steps includes:
  Staged release over coming months for: books, serials, multi-parts etc.
  Aiming to update on a monthly basis once complete
  Documentation & further refinement of data model
  Looking at RDF triple dump option
  What else might be offered?
Lessons Learned on the Journey
General
  It is a new way of thinking
  Legacy data wasn’t designed for this purpose so starting can be problematic
  There are many opinions...but few real certainties
  Everyone is learning & multiple solutions exist so you may be the best judge
  Don’t reinvent the wheel...there are often tools or experience you can use
  Start simple & develop in line with evolving staff expertise
  Give careful thought to data modelling & sustainability issues e.g.
    Where possible use cross-domain standards e.g. ISO codes in data
    Select relevant & stable targets when providing links if you are doing so
Lessons Learned on the Journey
Data Issues
  Reality check by offering samples for feedback to wider groups
  Be prepared for some technical criticism in addition to positive feedback & try to continually improve in response
  Conversion inevitably identifies hidden data issues...& creates new ones!
... But it’s often better to release an imperfect something than a perfect nothing!
Lessons Learned Along The Way
Staff and Resource Issues
It can be a steep learning curve so:
  Look for training opportunities to develop staff skills to support new open metadata standards
  Cultivate a culture of enquiry & innovation among staff to widen perspectives on new possibilities
  Look into collaborative pilot projects with peer organisations to share resources & expertise
  See what tools are already out there that can save you development time or assist in checking data
Final Thoughts... For Others Contemplating a Similar Journey
  It’s never going to be perfect first time
  We expect to make mistakes
  We aim to learn from them
  We hope others will learn something too
  ... and that everyone benefits from the experience
So if anyone is thinking of undertaking a similar journey... Just do it!
Any Questions...?
  bnb.data.bl.uk/sparql
  bnb.data.bl.uk/describe
  bnb.data.bl.uk/search
Images from