Entries in the Category "metadata"

Dublin Core Metadata Initiative (DCMI) Conference

20 October 2010
Dublin Core Metadata Initiative (DCMI) conference
Notes

Tutorial 1: history, key concepts, evolving context
Jane Greenberg, UNC

dcterms (slides 30-33).
IMT: Internet Media Type
Classes – nouns
Properties – verbs
Property expresses a relationship

Tutorial 2: DCAM, syntax, and semantics
Jon Phipps

XML and RDF are different data models, not different encodings
RDF schema and non-literal
RDF is not a record-oriented format
RDF is a framework
XML as document vs. data model. Roots are in document model.
RDF is strictly a data model

Ontology: collection of meanings
RDF has URIs. RDF collects information about you; it exists in terms of the context in which it is used, not the context in which it was created.
In XML, the creator predefines the context.
RDF statements stand alone; one has to assume certain things,
e.g. a driver's license number implies a human being (RDF), vs. you are a driver, therefore you must have a driver's license number (XML).
URI – name of a property or thing. Human-readable URIs vs. abstract, opaque URIs.
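The driver's license example can be sketched as a small inference step: seeing the property is enough to conclude something about the subject. This is only an illustration in the spirit of an RDFS domain rule; the "ex:" names and the license number are made up, not from any real vocabulary.

```python
# Hypothetical vocabulary: the "ex:" names are illustrative only.
# Maps a property to the class its subjects are assumed to belong to.
DOMAIN = {"ex:driversLicenseNumber": "ex:Person"}


def infer_types(triples, domain=DOMAIN):
    """Return the rdf:type triples implied by the properties that were used."""
    inferred = set()
    for subject, predicate, _obj in triples:
        if predicate in domain:
            inferred.add((subject, "rdf:type", domain[predicate]))
    return inferred


# A single stand-alone statement; nothing says up front that ex:someone is a person.
statements = [("ex:someone", "ex:driversLicenseNumber", "D1234567")]
print(infer_types(statements))
```

The XML-style approach would run the other way: declare a driver record first, then require that it contain a license number.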

Ontology: provides a representation of knowledge ("these are the things I know"). It conveys knowledge, not syntax (which RDF does).
URI has to be an identifier in RDF, not necessarily a location.
RDF has no notion of a container, just an aggregation. No limit; it can go to infinity.

Tutorial 3
Karen Coyle

http://www.kcoyle.net/
Web of data
Library has dark web
Semantic web. Formal rules for machines.
Triples:
subject – predicate – object.
E.g. Melville – wrote – Moby Dick
http URI (for the book) – creator – Melville
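A minimal sketch of the two triples above as plain data. The book URI is a placeholder on example.org, and the `Triple` type is just one convenient way to hold three parts, not a standard API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Placeholder URI for illustration only.
BOOK = "http://example.org/book/moby-dick"

triples = [
    Triple("Melville", "wrote", "Moby Dick"),
    Triple(BOOK, "creator", "Melville"),  # same fact, with the book named by a URI
]

for t in triples:
    print(f"{t.subject} - {t.predicate} - {t.object}")
```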

The semantic web allows you to make queries.
You can retrieve data rather than documents.
You get back answers, not a bib record.
The query language is SPARQL.
http://dbpedia.org/About
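To make "answers, not bib records" concrete, here is a rough stand-in for a query, written in plain Python rather than SPARQL. The `match` helper and the "ex:" names are assumptions for illustration; `None` plays the role of a query variable.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Illustrative data with made-up "ex:" identifiers.
DATA = [
    ("ex:MobyDick", "ex:creator", "ex:Melville"),
    ("ex:Typee", "ex:creator", "ex:Melville"),
    ("ex:MobyDick", "ex:published", "1851"),
]

# "What did Melville create?" returns just the matching subjects.
works = [s for (s, _p, _o) in match(DATA, predicate="ex:creator", obj="ex:Melville")]
print(works)
```

A real SPARQL endpoint such as DBpedia's does the same kind of pattern matching, but over a far larger graph and with a proper query syntax.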

The problem currently is that data is in records/silos.
Rather than creating a database to link records together, there is a way to avoid this:
to hold records together, give the thing they refer to an identifier,
i.e. create a box that contains a center. Coherence of info is not lost. Data is represented in different ways.
Open records slide: data is opened up (cf. Tinkertoys) with edges that are able to connect to other things. You can take any connecting point and make it the focus.
Web of data.
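One way to picture "opening up" a record: every field becomes a statement hanging off a shared identifier, so the record's coherence is kept while each value becomes a connecting point. A sketch, with a placeholder identifier and a made-up record:

```python
def record_to_triples(identifier, record):
    """Turn a flat record (dict) into triples that all share one subject identifier."""
    triples = []
    for field, value in record.items():
        # Repeatable fields arrive as lists; treat single values the same way.
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((identifier, field, v))
    return triples


record = {
    "title": "Moby Dick",
    "creator": "Melville",
    "subject": ["Whaling", "Sea stories"],
}
triples = record_to_triples("http://example.org/book/1", record)
for t in triples:
    print(t)
```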

Freebase
http://www.freebase.com/
Metaweb. See the YouTube video on this: http://www.youtube.com/watch?v=tBSdYi4EY3s
Freebase uses triples.
Google just purchased Metaweb, the company behind Freebase.
If data is in this format, libraries can do this, too.

Internet Archive
Does not entirely use triples

Open Library: http://openlibrary.org/
Semantic web: http://semanticweb.org/wiki/Main_Page

We have to learn to create data, not text. Emphasize the data in metadata,
e.g. anything date-related can be put onto a timeline
e.g. geographical data can be put onto a map

Data has to be web-friendly. Therefore…
Define data elements. Use RDF or OWL.
Controlled vocabularies. SKOS (Simple Knowledge Organization System).
*** http://id.loc.gov – definitely explore this
An identifier can have display labels in any language (an advantage over using language-dependent text as the identifier)
W3C validator for triples
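The point about identifiers and display languages can be sketched like this: the concept is named by a URI, and labels per language hang off it, loosely modeled on SKOS preferred labels. The URI below is a placeholder, not a real id.loc.gov identifier, and `display_label` is a hypothetical helper.

```python
CONCEPT = {
    "uri": "http://example.org/concept/whales",  # placeholder identifier
    "prefLabel": {"en": "Whales", "fr": "Baleines", "de": "Wale"},
}


def display_label(concept, lang, fallback="en"):
    """Pick the label for the requested language, falling back to a default."""
    labels = concept["prefLabel"]
    return labels.get(lang, labels[fallback])


print(display_label(CONCEPT, "fr"))
print(display_label(CONCEPT, "ja"))  # no Japanese label here, so the English one is used
```

The stored data only ever references the URI; which label is shown is decided at display time.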

URIs are good. You can see who the responsible party is. You should be able to click on one to get more info.

Tutorial 4
Ron Daniel

http://www.linkedin.com/pub/ron-daniel/7/1a2/47b
Many use the 15 DC elements, with some local additions.
Why create metadata? First, search results are still poor: 40 percent report not being satisfied with search results. (Jane Greenberg, eContent Magazine, 2010)
Don't try to sell the semantic web. Have a plan that is incremental and gradually builds to the semantic web.

Problems:
Content that is not deduplicated
Content that is poorly prioritized
Unavailable content
GIGO (garbage in, garbage out)
Poor curation

Search engine optimization: getting content to appear first in search results. Most companies don't do this.

Add to the core local elements:
Group that published the item, date to review an item, etc. Use DC refinements, such as access rights, audience, etc.

Cross-collection functionality.

Facets. Shift from text values in the various fields to a faceted taxonomy with a managed list of values. This is where the current state mostly is.
Faceted taxonomies are basis for classes and instances in an ontology. Can make facets pay for themselves.
Facets are separate authority files with no relationship between them.
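A small sketch of what "managed lists of values" buys you: each facet is its own independent list, items can be checked against it, and filtering becomes simple. The facet names, values, and items below are invented for illustration.

```python
# Each facet is a separate managed list; there is no relationship between facets.
FACETS = {
    "audience": {"staff", "public"},
    "content_type": {"policy", "report"},
}


def invalid_values(item):
    """Return (facet, value) pairs that are not in the managed list for that facet."""
    return [(f, v) for f, v in item["facets"].items() if v not in FACETS.get(f, set())]


def filter_items(items, **selected):
    """Keep items whose facet values match every selected facet."""
    return [i for i in items if all(i["facets"].get(f) == v for f, v in selected.items())]


ITEMS = [
    {"title": "Travel policy", "facets": {"audience": "staff", "content_type": "policy"}},
    {"title": "Annual report", "facets": {"audience": "public", "content_type": "report"}},
    {"title": "Draft memo", "facets": {"audience": "staff", "content_type": "memo"}},
]

print([i["title"] for i in filter_items(ITEMS, audience="staff")])
print(invalid_values(ITEMS[2]))  # "memo" is free text, not a managed value
```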

Faceted taxonomies. (Bush's law of facets)
Take existing list, use it.
Look at existing vocabularies, see repeated structures. No particular requirement for the order of the hierarchy.

Vocabularies are vital for the linked data world. Facets help organizations deal with vocabulary selection and maintenance
Vocabulary tools embody terms vs. concepts. SKOS is based on concepts. SKOS-XL added to deal with 'terms'. SKOS-XL follows the RDF model.
Unfortunately, term-based approach appears necessary when dealing with large problems.

RDF and Web Identifiers
RDF gives framework.
People don't want to see URLs or RDF. They want to see: author, title, artist, photographer.
(URI vs URL. Difference isn't that big a deal.)
To fit with RDF, terms in standard vocabularies need URIs
If you are creating vocabularies, define a URI scheme for them. W3C 2-level URI scheme is a best practice for documents. Allow for version identification.
If you are using other people's schemes, you can define interim URLs.

Linked data principles
Use URIs to identify things (including field names and taxonomy term record IDs)
Use HTTP URIs so that these things can be looked up.

Results in more accurate mappings between different internal data items due to additional context available. Also, add value to proprietary data by linking to common concepts (people, places, …), and better access to information.
Having defined URIs in the RDF and web identifiers phase, start mapping to equivalent concepts in Linked Data hubs
Mapping to a public hub does not expose your data
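The two phases (define local URIs, then map them to a hub) might look like the sketch below. The local URI scheme on example.org and the helper names are assumptions; the DBpedia URI is only an example of a hub target. Note that the mapping table says nothing about the local record itself, which is why mapping does not expose your data.

```python
LOCAL_BASE = "http://example.org/id/"  # hypothetical local URI scheme


def mint_uri(kind, local_id):
    """Build a local HTTP URI for a field name, term, or entity."""
    return f"{LOCAL_BASE}{kind}/{local_id}"


def mapping_triples(mappings):
    """Express local-to-hub equivalences as triples (sorted for stable output)."""
    return [(local, "owl:sameAs", hub) for local, hub in sorted(mappings.items())]


mappings = {
    mint_uri("person", "42"): "http://dbpedia.org/resource/Herman_Melville",
}
for triple in mapping_triples(mappings):
    print(triple)
```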

Entity recognition.
Fact extraction. Builds on results of entity extraction, and uses patterns of connections between entities.

Hard to get good content in metadata.
*** taxonomy warehouse (definitely check this out) http://taxonomywarehouse.com/

Closed world assumption. "If I can't say it's true, then it's false"
Looks dumb, but you can compute something.
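A tiny sketch of the difference, using made-up facts. Under the closed world assumption every question gets a yes or no, which is what makes computation possible; under an open world the honest answer to a missing fact is "unknown" (shown here as `None`).

```python
FACTS = {("Melville", "wrote", "Moby Dick")}


def closed_world(statement, facts=FACTS):
    """If it isn't recorded as true, treat it as false."""
    return statement in facts


def open_world(statement, facts=FACTS):
    """If it isn't recorded, we simply don't know (None)."""
    return True if statement in facts else None


question = ("Melville", "wrote", "Typee")  # true in reality, but absent from FACTS
print(closed_world(question))
print(open_world(question))
```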

DAY 2

Stuart Weibel
DC developed in conjunction with linked data and semantic web (since 2000)
Core vocabulary for linked data is DCMI vocabulary
Singapore framework. http://dublincore.org/documents/singapore-framework/
DC failed at data models.
RDF is the metadata architecture for the web, but it remains an aspirational technology.
Linked data: the web, recast. Idea of persistent identifiers for each piece of metadata/concept
But… data quality needs to be maintained, and boundaries between semantic vocabularies need to be bridged
We need metadata more than ever, in spite of Google.
See Malcolm Gladwell New Yorker article, Oct. 4, 2010

M. Zeng, A. Salaba
FRBR. Overview.
FRSAD. Functional requirements for subject authority data (thema)
FRAD. Functional requirements for authority data (person, family, corp body)
*** Coyle and Baker, 2009. Guidelines for DC Application Profiles
Singapore Framework.
Allison, Johnson, Powell (2007). DC Application Profile for scholarly works
Proposed extension of SWAP domain model. http://www.springerlink.com/content/mtm7gu9t8bnuuu16/
Scholarly work supervised by/is created by Agent is translated by/is edited by Expression
Scholarly work has subjects/represented by subjects
Modifications: subject relationship and agent relationships
Need for a general AP model. More agent – group 1 relationships.
Scholarly work and expression and manifestation is funded by/funds Agent

Jenn Riley, Indiana University
V/FRBR project at IU
Test for FRBR, using music (scores, recordings). Investigating FRBRized cataloging interfaces
Develop a model that embodies FRBR principles. Data must be interoperable, with data modeling that works outside of libraries
Interest is in data itself. Trying to make FRBRized data go beyond local library system.
Using selected FRBR/FRAD entities
Examining FRBR attributes.
Music Ontology and DCMI/RDA Task Group. Has formal specifications made public.
Data in XML format released. RDF/Linked Data begun
Schemas used: interoperable. Top level collection schema, imports item schemas, etc.
Working with linked data.

Jennifer Bowen, Univ. of Rochester
Moving library metadata to linked data
Is it possible to turn legacy library MARC data into linked data in an automated environment?
How can extensible catalog software be used for this purpose?
Use URIs for names of things
RDF triples: subject – predicate – object
To create linked data, software needs to transform legacy data, then analyze the mapping of legacy data to linked data.
XC (eXtensible Catalog) software is open source, developed at U Rochester. http://www.extensiblecatalog.org/
This is a grant-funded project.
User interface (CMS) – metadata processing – connectivity tools
XC and linked data. Converts MARC codes to vocabulary values, removes extraneous data, normalizes inconsistencies.
Problems: some MARC fields/subfields difficult to map.
MARCXML parsed to XC work, expression, manifestation.
880 fields do not map well to FRBR group 1 entities. 880 should be mapped to contributor or subject. XC cannot do this mapping.
Language attribute (041 subfield h) should be based on FRBR Expression. XC software cannot do this mapping.
If MARC is mapped to Group 1 entity, possible to create linked data. But is this necessary? RDF triples might be more advantageous.
Goal is to output XC metadata in RDF. Unique IDs for metadata records. Have unique URIs for subject – predicate – object.
Unique IDs for each XC schema.
Data elements from DCMI, and "open metadata registry": http://dcmi.kc.tsukuba.ac.jp/dcregistry/navigateServlet
Embed URIs into the XML records.
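The cleanup step noted above (codes to vocabulary values, normalizing inconsistencies) can be illustrated roughly as follows. This is not XC's actual code; the function names and record URI are hypothetical, and the table is only a three-entry subset of MARC language codes.

```python
# Small illustrative subset of MARC language codes.
LANGUAGE_CODES = {"eng": "English", "fre": "French", "ger": "German"}


def normalize_language(raw):
    """Trim and lowercase a language code, then map it; None if the code is unknown."""
    return LANGUAGE_CODES.get(raw.strip().lower())


def language_triple(record_uri, raw_code):
    """Emit a triple for the record's language, or None when the code cannot be mapped."""
    value = normalize_language(raw_code)
    if value is None:
        return None
    return (record_uri, "language", value)


print(language_triple("http://example.org/record/1", " ENG "))
print(language_triple("http://example.org/record/2", "???"))
```

Codes that cannot be mapped are returned as `None` rather than guessed, so they can be reviewed by hand.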

Linked data 1: domain models
DBPedia.
FRBR based on user needs: find, identify, select, obtain. According to Karen Coyle, this is too linear.

CONTENTdm Metadata best practices
Han, Bair, and Lee

Western Michigan University
Map CONTENTdm metadata to MARC. Using qualified DC is recommended.
*** Ingest metadata into WorldCat
Increase access to collections to a broader user base.
Need for guidelines. Loss of richness and meaning when sharing DC metadata in aggregated environments.
Difficulty of mapping to MARC
12 core elements and 4 recommended "as appropriate"
4 recommended are source, relation-is part of, coverage-spatial, and coverage-temporal
Title – leave off brackets
Creator – do NOT use unknown, but leave blank instead
Publisher – use "digitized by"
Templates for photographic collections, archival collections.
All this based on "digital information seeker" research, surveying users.
Users prefer natural language and rely heavily on Google.
Users want to search names by keyword, search for subjects by browsing
Metadata as a marketing tool. Catalogers should develop their own local controlled vocabularies
Create subject headings for topics that comprise at least 20 percent of the work (LC policy)
CONTENTdm Metadata User Group Wiki. http://contentdmmwg.wikispaces.com/
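Two of the field rules noted above (title without brackets, creator left blank instead of "unknown") could be applied as a cleanup pass before sharing records. This is a sketch only: the field names and `clean_record` are assumptions, not part of CONTENTdm or the published guidelines.

```python
def clean_record(record):
    """Apply two best-practice rules to a copy of a simple DC-style record."""
    cleaned = dict(record)

    # Title: leave off cataloger-supplied square brackets.
    title = cleaned.get("title", "").strip()
    if title.startswith("[") and title.endswith("]"):
        title = title[1:-1].strip()
    cleaned["title"] = title

    # Creator: do not use "unknown"; leave the field blank instead.
    if cleaned.get("creator", "").strip().lower() == "unknown":
        cleaned["creator"] = ""

    return cleaned


print(clean_record({"title": "[Main Street, looking north]", "creator": "Unknown"}))
```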

OLAC-MOUG Conference (Sept 26-28)

Of the sessions I attended at the OLAC-MOUG 2008 conference, held Sept. 26-28 in Cleveland, one of particular interest to our department's current work was "Metadata for Audiovisual Materials and its Role in Digital Projects," given by Jenn Riley of Indiana University. Her blog, Inquiring Librarian, is itself worth viewing.

Among the issues and standards discussed:

Types of Metadata
1. Descriptive: e.g. title, author. Used for searching and display
2. Administrative: (a) technical. e.g. software, pixels, compression, etc. (b) preservation. e.g. how digital content was created, what migration was, how hardware was used, how things were done, and (c) rights. e.g. who can do what with it
3. Structural: navigation within an object
4. Markup Languages: puts information within the content itself. e.g. TEI

3 Types of Standards
1. Data structure standards: e.g. MARC, MODS, DC
2. Data content standards: e.g. AACR2, DACS, CCO
3. Controlled vocabularies: e.g. LCSH, MeSH

One must mix and match to meet one's needs.

General Descriptive Metadata
1. MARC
2. MARCXML
3. MODS. Generally inspired by MARC, but not its equivalent. Intended to be useful to a wider audience than MARC. Used when you want a library-type approach but more interoperability than MARC
4. Dublin Core. Perhaps the most misunderstood metadata standard. Is not meant to replicate MARC. Abstract model is the current focus. Follows the 1:1 principle (e.g. a DC record would describe a slide of the Mona Lisa, not the Mona Lisa itself). Unqualified DC is generally used as a format for sharing metadata with others.

Still Image Descriptive Metadata
1. VRA 4.0. Work and image in separate records. Focus is on creation, style, culture. Best used on collections of reproductions of works of art and architecture.
2. CDWA Lite. To help museums share metadata about their collections.

Music Descriptive Metadata
1. Variations2 at Indiana University
2. Music Ontology Specification.
3. ID3 tags for MP3 files.
N.B. No discipline-generated format has emerged.

Other Media Metadata Standards
1. MPEG-7. From the Moving Picture Experts Group. Focus is on low-level features, not library bibliographic info. Intended to cover descriptive, technical, and rights metadata
2. Public Broadcasting Core (PB Core). To support the creation, management, and discovery of media items. Good for broadcasting archives. 4 classes: (1) IntellectualContent, (2) IntellectualProperty, (3) Instantiation, and (4) Extensions.

Technical and Administrative Metadata for A/V Materials
1. Metadata for Images in XML (MIX) Derivable from image itself. Has compression level, pixel dimensions, format-specific data, and bit rate. Maintained by LC Network Development and MARC Standards Office
2. Audio Engineering Society (AES) Core Audio Metadata. Under development. Can be used for analog and digital audio. Audio editing software should generate this format.
3. LC Audio-Visual Prototyping Project VIDEOMD Data Dictionary. Just video information; assumes separate format for audio track. Note duration, sample rate, physical tape characteristics, frame size/rate
4. AES Process History Metadata. Records "processing events." Used to support digital preservation process. Used for any audio file.

Structural Metadata
1. METS. Wrapper to package many types of metadata together for a resource. Expectation is that METS documents would be generated automatically.
2. SMPTE Material eXchange Format (MXF). Wrapper for metadata and media files.
3. Synchronized Multimedia Integration Language (SMIL). For multimedia presentations. Embedded media, transitions, timing.

Music Markup Languages
1. MusicXML. For content. Not "metadata." Encodes musical notation itself. Tends to include header with some descriptive metadata.
2. Music Encoding Initiative (MEI). Akin to TEI. UVa maintains this.

Conclusion
Support for these standards is "ridiculously low"
Use MODS for discovery (item level description of a variety of formats), and METS (structural metadata for multi-page objects) for delivery.

Postscript

Much talk throughout the OLAC-MOUG conference concerned RDA. The RDA initial release is slated for 3rd quarter 2009. In early 2010, JSC national libraries will evaluate RDA prior to implementation. See the JSC RDA website.

August 2008 presentation

Here is the PowerPoint presentation I gave on August 6, 2008. Note: some links to external files will not work:
Powerpoint: Cataloging Trends and Challenges

shareable metadata

Before knowing how to construct metadata, it might be useful to know what metadata is and what its use can be. Three articles I recommend are:

1. Calhoun, K. "Being a librarian: metadata and metadata specialists in the twenty-first century."

2. Shreeves et al. "Is Quality Metadata Shareable"

3. Shreeves, Riley, & Milewicz. "Moving towards shareable metadata"