
August 26, 2008

Some comments on image tagging and organization

An upcoming exhibit at the Museum of Contemporary Art in Antwerp entitled "The Order of Things" confronts some of the questions surrounding image archives and online image banks. Natural language searching is often faulted for its non-specificity and its inability to produce fully relevant search results, which makes for an interesting juxtaposition when it is combined with the visual image. Order and classification often lose out to rather arbitrary category assignments by the creator. These assignments can make for interesting cultural and social commentary on the varying associations and significance of images, though they also create many problems in searching.

In a book written in 1966 by Michel Foucault with the same title as the exhibition, the author writes about the individual's will to order. "Order, on the other hand, is established without reference to an exterior unit..... one cannot know the order of things in their isolated nature, but by discovering that which is the simplest, then that which is next simplest, one can progress inevitably to the most complex things of all" (p. 59). Foucault identifies a problem that is inherent in the classification of an individual image. The image is considered and referred to in a relatively simple and flat dimension, as the thing itself, and often not as part of a series or in any relation to other items. These acts of classification also usually occur at different times and within different environments, so is it any surprise that there are so many errors in searching with natural language, particularly with images?

In indexing images, there is also a difference between thinking about the content and thinking about the concept. If you were to add key terms to a work of art, for example, a difference between the 'ofness' and 'aboutness' of the work will quickly become apparent. Is the content of the work important here, or are other iconological identifiers (cultural, religious, symbolic)? Sometimes it depends on the purpose of the database, or even on the user group who will be using the database. Some art databases are structured to search by color, shape, texture, material or other aspects of the work that would not be useful outside of that context.
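To make the 'ofness' and 'aboutness' distinction concrete, here is a minimal sketch of an index record that keeps the two kinds of terms apart. The record layout, the field names, the search function and the sample works are all hypothetical, invented for illustration; they are not taken from any particular art database.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical index record separating what an image is OF from what it is ABOUT."""
    title: str
    of_terms: set = field(default_factory=set)      # depicted content
    about_terms: set = field(default_factory=set)   # concepts, symbolism


def search(records, term, facet="any"):
    """Return titles whose terms include `term` in the chosen facet: 'of', 'about' or 'any'."""
    term = term.lower()
    hits = []
    for record in records:
        in_of = term in record.of_terms
        in_about = term in record.about_terms
        if (
            (facet == "of" and in_of)
            or (facet == "about" and in_about)
            or (facet == "any" and (in_of or in_about))
        ):
            hits.append(record.title)
    return hits


# Invented sample records, for illustration only.
records = [
    ImageRecord("Still life with skull", {"skull", "candle", "book"}, {"mortality", "vanitas"}),
    ImageRecord("Anatomy study", {"skull", "hand"}, {"anatomy"}),
]

print(search(records, "skull", facet="of"))
print(search(records, "mortality", facet="about"))
```

Searching for 'skull' as content returns both works, while searching for 'mortality' as a concept returns only the still life. Which facet deserves more attention depends, as noted above, on the purpose of the database and its users.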

A related work by Roy Arden, 'The World as Will and Representation' (2007), is a collection of 28,144 images, found online and in other sources, combined in a whirlwind slide show format. The collection illustrates the chaotic nature of the multitude of images available online. These images articulate the disorder and lack of context that result when images are separated from their original environment.

Google has asked for help with their Image Labeler, where people provide labels for a set of images, and those labels are compared with the labels another person gives to the same set. People are asked to focus on descriptive labeling in this exercise. The labels tend to become generic, and less specific to place, time and content. Instead, the basic elements are chosen ('guitar', 'singer', 'bird', 'sky', etc.).
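The comparison of two people's labels can be sketched roughly as follows. This is only an illustration of the general idea: the post does not describe how Image Labeler actually matches labels, so the normalization step, the function names and the sample labels here are all assumptions.

```python
def normalize(label):
    """Lower-case a label and collapse extra whitespace (an assumed normalization)."""
    return " ".join(label.lower().split())


def agreed_labels(labels_a, labels_b):
    """Return the labels both people supplied for the same image, in the first person's order."""
    seen_b = {normalize(label) for label in labels_b}
    agreed = []
    for label in labels_a:
        key = normalize(label)
        if key in seen_b and key not in agreed:
            agreed.append(key)
    return agreed


# Invented labels for one image, for illustration only.
first_person = ["Guitar", "singer", "Glastonbury 2007", "stage lights"]
second_person = ["singer", "guitar", "concert"]

print(agreed_labels(first_person, second_person))
```

Only the terms both people thought of survive the comparison. The specific label (the festival and year) is unlikely to be matched by a stranger, which is one way to see why agreement between labelers favors generic terms.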

ALIPR is an automatic photo tagging and visual image search engine. The automatic image annotation is limited to only 322 English words at the moment, which also points toward more generic tagging. Interestingly, the engine includes emotions in this limited set (some examples are 'surprising', 'amusing', 'pleasing', etc.). As more images are added, the results become broader and less specific, and therefore more time is spent sifting through unrelated hits.

Posted by vad17 at 03:35 PM

August 19, 2008

Web 2.0 meets the National Library of Scotland's digital library

An interesting blog covers the newer applications recently taken on by the National Library of Scotland. The National Library currently has a presence on Facebook, and collections on YouTube and Flickr. The library is using these newer modes of social media to promote itself and to reach new users, while also striving to create human interaction within these media.

An article discussing some of the metadata issues from the YouTube project at the National Library by Eilidh MacGlone from the Scottish Screen Archive can be found here. MacGlone addresses some of the shortcomings of the descriptive metadata incorporated into the videos hosted on YouTube, while also highlighting the potential of effective tagging.

While some of the control of the regular library catalog may be lost through these avenues, other methods of access and inter-relation emerge. One of the newer features on YouTube is 'warp speed', which relates videos in a more visual manner; another example is Flickr's geotagging, which can correlate a certain geographical location to an image. While these are also prone to error and inaccuracy, I think these methods are evidence of other ways of searching, particularly among strictly Web-reliant users. Many web users search the web using natural language, which gives more credibility to tagging in Web contexts.

In any case, there is a way of keeping track of usage on both Flickr and YouTube, so at the least an institution can get a sense of how many hits a single image or video receives. Whether these users will venture on to the institution's website or even use the collection in another way remains to be seen, but it would make for an interesting study.

Posted by vad17 at 01:57 PM

August 05, 2008

Digital preservation

Digital preservation: a paradox in and of itself?

The preservation of digital files is a complex subject, particularly in large-scale digital projects, where project management and the sheer logistics of storing massive amounts of digital information are a daunting reality for most content and IT managers. When the longevity of digital formats is not known with any certainty, the reality of preserving digital files is difficult to assess. Project managers for digital projects can put enormous amounts of time into planning for the storage of digital files to ensure the longevity of digital information, and can incorporate methods of migration, duplication and even emulation of obsolete applications, operating systems, or hardware platforms in the effort to preserve digital information.

Avoiding Technological Quicksand, a 1999 report from the Council on Library and Information Resources, hits on many of the key issues of digital preservation, and these are still applicable almost ten years later. The author, Rothenberg, is the first to point out the paradox in the term 'digital preservation': "...given the fact that digital documents can be copied perfectly, which is often naively taken to mean that they are eternal." Rothenberg adds that the problem lies not solely in the actual digital file, but also in the "administrative, procedural, organizational, and policy issues".

Media decay and hardware obsolescence (and, in some cases, software obsolescence) are facts of life for almost every digital format, though careful planning of how material is incorporated into a database or digital library is key to successful 'survival rates' of digital information. An interesting aspect of Rothenberg's report is the idea of three distinct time frames that are apparent in digital projects. Many projects are approached as relatively short term efforts, in that digitization is used as a method of reformatting material or transferring it to a digital format in an effort to salvage information from the original format or container. The medium term effort considers issues of digital longevity, but retains formats that are contemporary with the time the digital file was created (oftentimes software based, such as Microsoft Office formats). These formats are usually commonplace at the point of digitization, yet they are still defined by proprietary software or assume a level of compatibility with systems down the road, and thus are not as sustainable in the larger picture. Rothenberg defines the long term as digital formats which are not software dependent, and which "must handle current and future records of unknown type in a uniform way, while being capable of evolving as necessary." Interestingly, the author calls for minimal human interaction in the long term effort, with methods of data refreshment and migration taking place in a uniform, automatic and synchronous manner.

Rothenberg also considers access, fidelity, and the ease of use of the entire document management system to be tradeoffs against one another when setting priorities for the retention of digital files. The theory is that increased accessibility will lead to lower-quality digital formats, and that the ease of the system involves a similar compromise. With all that said, the evolution of archival digital files (or, dare I say, a singular file format as suggested by Rothenberg?) will be an interesting development to watch.

A more recent report on digital preservation: Mind the Gap: Assessing Digital Preservation Needs in the UK, 2008

Posted by vad17 at 01:40 PM