
January 14, 2009

Overview of JPEG 2000

I’ll begin by reviewing a Technology Watch Report from the Digital Preservation Coalition entitled JPEG 2000 – a Practical Digital Preservation Standard? by Robert Buckley, Ph.D., of the Xerox Research Center in Webster, New York. Buckley has been a researcher with Xerox since 1981, is the Xerox representative on the US JPEG 2000 committee, and “was the Project Editor for Part 6 of the JPEG 2000 standard.” 28

I found this article fascinating because it provided detailed insight into JPEG 2000 and offered answers not only to questions I had, but to questions I hadn’t even considered asking. It also provided concrete support for the decision here at Case to go ahead and pursue JPEG 2000 as our delivery format, and possibly even as our preservation format.


Definition

“JPEG 2000 is a wavelet-based standard for the compression of still digital images.  It was developed by the ISO JPEG committee to improve on the performance of JPEG while adding significant new features and capabilities to enable new imaging applications.” 2

In examining this article, I want to do some deconstructing. Everywhere I go, for instance, the definition of JPEG 2000 kicks off by stating that it “is a wavelet-based standard,” but almost nowhere does anyone stop to describe what exactly a wavelet-based standard is… It is also important, I think, to note the stated reasoning for JPEG 2000: “to improve on the performance of JPEG while adding significant new features…” This matters because a lot of people have been claiming that JPEG 2000 is an archival or preservation-quality format for images. I have read, and am reading, articles that research the truth or falsity of this claim (one of which, a partnership involving Harvard, Google, and the Open Content Alliance, I’ll discuss later). There is evidence that JPEG 2000 can be used, perhaps successfully, as a preservation standard, but it was NOT DESIGNED for that.
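Since the “wavelet-based” part so rarely gets unpacked, here is a minimal sketch of the underlying idea using the simplest possible wavelet, the Haar wavelet. JPEG 2000 itself uses the more sophisticated 5/3 and 9/7 wavelets, so treat this only as an illustration of how a wavelet transform splits an image into a coarse approximation plus detail coefficients:

```python
import numpy as np

def haar_2d(image):
    """One level of a 2-D Haar wavelet transform.

    Returns a half-size coarse approximation plus three detail subbands.
    JPEG 2000 applies a similar (but more sophisticated) wavelet repeatedly
    to the approximation band, then quantizes and entropy-codes the result.
    """
    img = image.astype(float)
    # Transform rows: pairwise averages (low-pass) and differences (high-pass).
    lo = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # Transform the columns of each result the same way.
    approx = (lo[0::2, :] + lo[1::2, :]) / 2.0    # coarse, quarter-size approximation
    detail_1 = (lo[0::2, :] - lo[1::2, :]) / 2.0  # three detail subbands,
    detail_2 = (hi[0::2, :] + hi[1::2, :]) / 2.0  # mostly near zero for
    detail_3 = (hi[0::2, :] - hi[1::2, :]) / 2.0  # natural images
    return approx, detail_1, detail_2, detail_3

image = np.random.rand(8, 8)  # stand-in for real image data
approx, d1, d2, d3 = haar_2d(image)
print(approx.shape)  # (4, 4)
```

After the transform, most of the visually important information sits in the small approximation band while the detail bands are mostly near zero; that concentration is what makes the coefficients compressible and what gives JPEG 2000 its built-in resolution scalability.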

Application and Features

The report highlights JPEG 2000’s use for geospatial and medical imaging, digital cinema, and image repositories and networked image access, and describes its feature set.

As the article states, “an application can access and decode only as much of the compressed image as needed to perform the task at hand…The zoom, pan, and rotate operations that users increasingly expect in networked image systems are performed dynamically by accessing and decompressing just those parts of the JPEG 2000 codestream containing the compressed image data for the region of interest.” 2

Per the above, one interesting feature of JPEG 2000, which will be discussed in more detail later on, is ROI, or Region of Interest, functionality. By using ROI features (which take advantage of tiling), an end user can look at one area of an image at extraordinarily high resolution while loading only that particular region/tile rather than the entire image (which would be cumbersome depending on the image size, user bandwidth, etc.).
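As a rough illustration of what this looks like in practice, here is a minimal sketch using the Python glymur library (an OpenJPEG wrapper). The file name is hypothetical, and the exact slicing behaviour may vary by library version:

```python
import glymur

# Open an existing JPEG 2000 master (hypothetical file name).
jp2 = glymur.Jp2k("wpa_print_0001.jp2")

# Decode a reduced resolution: slicing with a power-of-two step asks the
# decoder for a lower resolution level instead of the full-size image.
thumbnail = jp2[::8, ::8]

# Decode a region of interest: a rectangular slice decompresses only the
# parts of the codestream that cover those pixels.
detail = jp2[2000:3000, 1500:2500]

print(thumbnail.shape, detail.shape)
```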

From the standpoint of an archives or repository, “Using a single JPEG 2000 master to satisfy user requests for dynamic viewing reduces storage costs and management overhead by eliminating the need to maintain multiple derivatives in a repository.” 2

For those of you not familiar with how image archives have worked in the past, you would present the user with several options. Option one might be the full uncompressed TIFF image, which for the WPA Print images in Digital Case can average around 350MB; a second option would be a JPEG version of the original with no reduction in pixel dimensions but with compression; then a 1024 x 768 version; then an 800 x 600 version; and usually a thumbnail to represent the collection. So in this example alone there is one original image with four derivative images of varying sizes (based on assumptions about end-user needs). At worst, this process involves a student manually manipulating each image; at best, a batch process is created in an application like Adobe Photoshop to produce the derivatives automatically. From the “storage costs and management overhead” perspective, each of these images then has to be loaded onto the server for display: one cumbersome master image drags along several derivatives, each adding its own storage and management burden.
JPEG 2000, by contrast, allows many sizes of an image to be generated dynamically from a single file, so small, medium, and large derivatives never need to be produced manually. This is one of JPEG 2000’s most attractive features.
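To make the contrast concrete, here is a hedged sketch (glymur again, with hypothetical file names) of serving several derivative sizes on demand from a single JPEG 2000 master instead of pre-generating and storing them. It assumes the master was written with enough resolution levels to support each reduction step:

```python
import glymur

MASTER = "wpa_print_0001.jp2"  # hypothetical single archival master

# Map each delivery "size" to a power-of-two reduction of the master.
# Nothing is pre-generated or stored; each request decodes only as much
# of the codestream as the requested resolution needs.
REDUCTIONS = {"thumbnail": 16, "small": 8, "medium": 4, "large": 2, "full": 1}

def derivative(size):
    step = REDUCTIONS[size]
    jp2 = glymur.Jp2k(MASTER)
    return jp2[:] if step == 1 else jp2[::step, ::step]

medium = derivative("medium")
print(medium.shape)
```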

“JPEG 2000 is being used increasingly as a repository and archival image format…many repositories are storing visually lossless JPEG 2000 files…compared to uncompressed TIFF, visually lossless JPEG 2000 compression can reduce the amount of storage by an order of magnitude or more.” 2
This is a use of JPEG 2000 I’ll discuss in upcoming articles. Surprisingly, in the HUL/Google/OCA article I mentioned above, they were using lossy JPEG 2000 compression.

Article Overview

Buckley begins his discussion of JPEG 2000 by describing it as an International Standard, specifically Part 1 of the standard, ISO/IEC 15444-1 and ITU-T T.800; I’ll come back to this in a later article. Of particular interest is that Buckley mentions a problem with “modern signal-based approaches to image compression”: it is difficult to improve the “compression performance and efficiency,” so new standards instead add “significant new features and capabilities,” and these enhancements are a major reason for the attraction to JPEG 2000. Beyond this, JPEG 2000 is not just “a standard” but is actually a “suite of standards” that goes beyond the scope for which it was originally intended.

Buckley then takes a historical look at the origins of the JPEG standard in 1992. That standard is a “lossy compression method,” meaning that the “original, uncompressed image that existed before JPEG compression was applied cannot be exactly recovered from the JPEG-compressed data or codestream.” 3 Buckley further remarks that the loss this compression introduces is “small and either not noticeable or unobtrusive,” which leads to this type of compression being referred to as “visually lossless: the before and after images are mathematically but not visually distinguishable.” Even so, Buckley notes, the compression is irreversible. He then describes circumstances where “mathematically lossless behavior is desired,” circumstances in which the original image needs to be extracted from the compressed data “bit for bit.” “This type of compression is lossless or reversible compression.” 3

Visually Lossless Compression: a form of compression in which the image that existed before compression cannot be exactly recovered from the compressed data or codestream; it is called visually lossless because the before and after images are mathematically, but not visually, distinguishable.

Lossless Compression (Reversible): a form of compression in which no data is lost; the image recovered from the compressed data is identical, bit for bit, to the original, uncompressed image as it existed before compression was applied.
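A hedged sketch of the distinction, again using glymur with hypothetical file names. The irreversible and cratios keyword names are taken from my reading of the glymur/OpenJPEG documentation and may differ by version:

```python
import glymur
import numpy as np

original = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)  # stand-in image data

# Reversible (lossless) compression: the integer 5/3 wavelet, no quantization loss.
glymur.Jp2k("lossless.jp2", data=original, irreversible=False)

# Irreversible compression: the 9/7 wavelet plus quantization.  At modest
# compression ratios this is usually "visually lossless," but not bit-exact.
glymur.Jp2k("lossy.jp2", data=original, irreversible=True, cratios=[10])

lossless_back = glymur.Jp2k("lossless.jp2")[:]
lossy_back = glymur.Jp2k("lossy.jp2")[:]

print(np.array_equal(original, lossless_back))  # True: recovered bit for bit
print(np.array_equal(original, lossy_back))     # False: mathematically different
```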

When it came time to create a new standard (JPEG 2000), “seven areas were identified that would be addressed” by the new work.

A call was put out for contributions to the standard and in December of 1997 a decision was made to use a “wavelet-based approach as the basis for the new standard.” 4

“The result was Part 1 of the JPEG 2000 standard, which defines the core decoder.  This part of the standard focuses on the codestream, which is the collection of bits that contain the compressed image data and the parameters needed for interpreting it…and an optional file format for encapsulating the codestream with associated metadata including colour encoding.” 5  JP2 is the optional file format.

Buckley then lists the 14 Parts of the JPEG 2000 standard. He notes that Part 1 “was designed so that implementations of it would not require licensing fees or royalties,” and further that “JPEG 2000 is an open standard.” 6 As well, “JPEG 2000 is not natively supported by web browsers. In this respect it is like TIFF and PDF, neither of which is natively supported by browsers. And like them, plug-ins are available.” 6

Features

“The ability to create JPEG 2000 compressed images that contain different quality levels has a consequence worth noting here. It makes it possible to obtain lower quality image versions from a higher quality compressed image without having to recompress or even decompress the image…it makes it possible to distinguish the needs of image archive from image delivery but have them both satisfied by a single file.” 10
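A hedged sketch of what that looks like in code: the file is written once with several embedded quality layers, and a lower-quality version is obtained simply by decoding fewer of them, with no recompression. The cratios and layer names follow my reading of the glymur documentation and may vary by version:

```python
import glymur
import numpy as np

image = np.random.randint(0, 256, (1024, 1024, 3), dtype=np.uint8)  # stand-in image data

# Encode once with three embedded quality layers, at roughly
# 40:1, 20:1, and 10:1 compression ratios.
glymur.Jp2k("master.jp2", data=image, cratios=[40, 20, 10])

# Later, a lower-quality rendition is produced without recompressing
# or even fully decompressing the master.
jp2 = glymur.Jp2k("master.jp2")
jp2.layer = 0          # decode only the first (lowest-quality) layer
preview = jp2[:]

jp2.layer = 2          # decode all three layers for the highest quality
best = jp2[:]
```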

File Formats

There are four file formats for JPEG 2000: JP2, JPX, MJ2, and JPM. These formats are associated with different parts of the standard (remember, there are 14 parts).

At a later date I will address the features of the file formats in a separate article.

The JP2 file is made up of boxes, much like a TCP/IP packet, and these boxes have the “same structure as atoms in QuickTime and MPEG-4 files.” 11 This explains why QuickTime can serve as one of the viewers for JP2 files in a web browser. The header of the JP2 file carries color information, as discussed next:

“The JP2 file format was designed with digital photography in mind. This is reflected in the colour encodings defined in the standard. JP2 supports two colour encoding methods: the sRGB family of colour spaces and restricted ICC profiles. The sRGB encodings are sRGB, sYCC and a greyscale encoding defined in Part 1 that applies the sRGB non-linearity to luminance values. The use of ICC profiles is restricted to the Monochrome Input or Three-Component Matrix-Based Input profile. These profiles are adequate to represent density values and RGB encodings such as ProPhotoRGB and Adobe RGB, which have a wider gamut than sRGB and are more suited to digital preservation.” 12

The JP2 file format also carries both XML and UUID boxes (XML is Extensible Markup Language; UUID stands for Universally Unique Identifier). These boxes allow the JP2 file to carry metadata: the XML box carries XML-formatted metadata (which can include Dublin Core elements, as well as schema and namespace references), while the UUID box carries vendor-defined metadata. Again, I’ll get into the specifics of this in later articles.
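As a rough, hedged sketch of what those boxes look like from code (glymur again; the file name and the Dublin Core snippet are invented, and the append call follows my reading of glymur’s documentation):

```python
import glymur
import xml.etree.ElementTree as ET

jp2 = glymur.Jp2k("wpa_print_0001.jp2")  # hypothetical file

# A JP2 file is a sequence of boxes: signature, file type, a header box
# carrying the colour specification, optional XML/UUID metadata boxes,
# and the contiguous codestream box holding the compressed image data.
for box in jp2.box:
    print(box.box_id, type(box).__name__)

# Attach a small piece of Dublin Core-style metadata in an XML box
# (illustrative only; a real record would use a full schema).
title = ET.Element("{http://purl.org/dc/elements/1.1/}title")
title.text = "WPA Print No. 1"
xmlbox = glymur.jp2box.XMLBox(xml=ET.ElementTree(title))
jp2.append(xmlbox)  # writes the new metadata box onto the end of the file
```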

The JPX file format has greater possibilities for digital preservation: it can contain multiple images within one file (along with information on how they should be ordered or reconstructed); it has expanded colour support; and it allows metadata to be linked with regions of interest in an image (which raises interesting possibilities for TEI or other such metadata schemes). This data travels with the file so that other applications can take advantage of it. 13-14

Performance

The process of compression works like this: the image is divided into tiles; an optional component transform is applied; each tile is decomposed with a discrete wavelet transform; the wavelet coefficients are quantized; and the quantized coefficients are entropy coded into the packets that make up the codestream.

Other features are built into this process, including the creation of precincts, which “allow for regions of the compressed image to be accessed and decoded independently of other regions,” and quality layers, which add (or, when dropped, take away) data to improve (or reduce) the quality of the decoded image.

JPEG 2000 also “defines five progression orders or packet orderings. In resolution-major progression orders, the packets for all layers, positions and components of the lowest resolution come in the codestream before all those for the next higher resolution level…in layer-progression order… all the packets for one quality layer occur in the codestream before all the packets for the next quality layer. Progression order is another design choice in the use of JPEG 2000.” 19 On this point, when one adds the JPEG 2000 plug-in to Adobe CS3 or CS4 to export or save files in the format, the progression order is one of the options presented during the save process: that is, one can choose whether to save the image with resolution-major progression or with layer progression.
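To make those encoder choices concrete, here is a hedged sketch of writing a JPEG 2000 file with explicit tiling, resolution levels, quality layers, and a progression order. The tilesize, numres, cratios, and prog keyword names follow my reading of the glymur documentation and may differ by version:

```python
import glymur
import numpy as np

image = np.random.randint(0, 256, (4096, 4096, 3), dtype=np.uint8)  # stand-in image data

glymur.Jp2k(
    "master.jp2",
    data=image,
    tilesize=(1024, 1024),  # independent tiles enable region-of-interest access
    numres=6,               # six resolution levels for thumbnails and zooming
    cratios=[40, 20, 10],   # three embedded quality layers
    prog="RLCP",            # resolution-major progression (resolution, layer, component, position)
)
```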

JPEG 2000 also has greater error-checking and error-handling capabilities than does the JPEG standard.  I will address these in a future article, too.

Access and Preservation

“JPEG 2000 is being used for geospatial imaging, medical imaging and by the cultural heritage and digital preservation communities. Many digital collection and library systems support JPEG 2000, and several institutions use it in their collections.” 20

The article here goes on to cite the many institutions and projects that have used JPEG 2000 for various purposes.

The growing use and adoption of JPEG 2000 in digital preservation and other domains enhances its standing as a sustainable format. The increased application means increased commitment on the part of both users and suppliers to the persistence of the format, the image data committed to it and the resources and tools for supporting it.

Conclusions

JPEG 2000 is an “open standard for the compression of still digital images” (26) that is not only meant to replace the original JPEG standard, but greatly enhances it with a diverse set of features, including region of interest functionality and onboard metadata packaging. It is clear from the many projects using JPEG 2000 that this standard is finding increased application in libraries and in digital preservation programs. Over the next several weeks I will delve into the standard, exploring each portion of what Buckley’s article outlines, shining some light on its denser sections, such as wavelet compression, file formats and features, component transforms, and quantization, and highlighting other projects that are using JPEG 2000.

Posted by twh7 at 10:58 AM

January 06, 2009

Knowledge Management and the Academy

Recently I was given an article on a novel system named OSU:pro, developed by the Center for Knowledge Management (CKM) at Ohio State University. The article, Knowledge Management and the Academy, describes the process Ohio State went through to create a system that integrates already existing information systems (such as Enterprise Resource Planning, or ERP) to populate an online database presenting faculty, staff, and student portfolio information. Originally conceived for faculty, OSU:pro provides a centralized place for maintaining curricula vitae and biographical sketches, but, more than that, it acts as a mechanism for locating expertise on campus.

Article discussed:

Timothy J. Cain, Joseph J. Branin, and W. Michael Sherman, “Knowledge Management and the Academy,” Educause Quarterly, vol. 31, no. 4 (2008), pp 26-33, http://connect.educause.edu/eq

The introduction describes the current environment on campuses across the country (though one that, in fact, has probably always existed): faculty, students, and staff generating “extraordinary quantities” of information, and the main problem of how to find it, access it, and use it in a timely fashion. The article puts it more succinctly:

“The need to manage and assimilate a constantly growing pool of information, technology, and human expertise creates unique challenges for faculty, staff, administrators, and students in the modern university. To meet the needs of these diverse user communities on The Ohio State University (OSU) campus, the Center for Knowledge Management (CKM) was created in 2003…” 26

The goal of the center is to “leverage the strengths of people, processes, data, and technology to foster the creation, analysis, and dissemination of new knowledge.” 26 To do so, the CKM assembled a team of “technology professionals (programmers, media designers, and so forth)” along with “information stewards (librarians)” to “transform information services, streamline academic computing support, augment research stewardship, and accelerate the creation of knowledge-based solutions and innovations.” The ultimate goal, beyond the above, being to “transform the ways the expertise and knowledge of faculty and staff are documented and shared at OSU.”

To drive the point home, the article articulates the fact that “expertise of people is one of the greatest assets of a university” and describes the inherent problems of managing information across diverse departments at the university.

“OSU has over 150 academic departments organized into 19 colleges, requiring methodologies to document, organize, track, and access the efforts of more than 18,000 professional staff and faculty. Access to and retrieval of this information is typically tedious, inconsistent, and cumbersome, often relying on traditional paper methods.” 27

To describe the problem clearly, the article considers the information-gathering processes involved in “faculty appointment, promotion, and tenure (AP&T),” a process that usually consists of faculty members rifling through the papers representing a year’s work (or whatever the time frame) to figure out how many talks they’ve given, what committees they’ve served on, what courses they’ve taught, and many other activities. This information then has to be compiled, organized, and added to the faculty member’s CV to keep it current for whatever purpose: “timely responses to requests for information, annual performance discussions, and professional advancement.”

The article cites two surveys conducted by OSU to examine the data management activities of other universities (20 and 18 peer institutions, respectively), which demonstrated the same problems at each of them. The surveys “invited member schools to share how they track and manage faculty scholarly activity data (publications, sponsored projects, courses taught, outreach efforts, and so forth), with a particular emphasis on AP&T workflow.” 27 The results demonstrated the need for “better ways to streamline the collection, reporting, and sharing of expertise data within and between universities.” 27

The challenge, however, as the article notes, is getting such a system to work at a university. The article notes that these systems are commonplace “in the corporate sector—where business viability, management practices, and competitiveness can drive adoption of new systems.” But the article argues that such systems should be adopted at universities too: they would benefit the “knowledge management strategies” of the academy and make “tracking knowledge and expertise” easier.

The article then goes into a modestly detailed description of the hardware and software used to develop the system, the most important feature, I think, being the “federated data model that leveraged preexisting institutional data sets to provide more complete views of faculty activities.” To achieve this, CKM tapped “the offices of academic affairs, health sciences, information technology, research, and libraries,” created OSU:pro, and positioned it as a “single-point information resource and institutional strategy for supporting the data needs of AP&T workflow.” 27 Data was gathered from “enterprise sources…human resources, the registrar, libraries, research foundation databases” and presented as a federated data source “to provide users with prepopulated and contextualized views of their professional activities.” OSU:pro was then enhanced to allow end users to access their profiles and augment them with data not captured by “authoritative systems.” That is, faculty could log in and add “language expertise, service to professional societies, honors and recognitions,” educational degrees, schools and colleges attended, and so on. Thus a “straightforward way to add and edit profile information” was included. 28

Three views were provided: user, public, and administrative. Everything was made securely accessible, with authentication controlling access to each view level.

Some of the more impressive additions, from my perspective, involved the uses to which the information could be put. For instance, the CVs of faculty, being stored in a database or XML file, now became discretely available for connections with external sources, so citations in a CV could be connected through OhioLINK to bibliographic databases and to the full-text articles in the Electronic Journal Center. Capabilities were built in to allow faculty to search 20 databases and pull citations directly into their profiles. Reporting tools were created which allowed data to be exported in formats that other profile software applications support (.doc, .xls, .txt). A variety of views were created so that faculty (or deans, or staff) could print out CVs, bios, or even abstracted bio sketches. The accumulation of data made it possible not only to see the number of degrees, honors, and so on held by all faculty at OSU, but also, using the Google Maps API, to visualize the colleges and universities from which faculty had received their degrees. It was likewise possible to see comprehensively where faculty published articles and to perform citation analysis functions, including assessing journal impact factors.

By this point, the CKM team had grown to include “metadata librarians and project managers… programmers, interface designers, and customer liaisons.” 28 The article also discusses the process of rolling OSU:pro out for the campus to use (http://pro.osu.edu), along with the training and support that required. Use has grown quickly: in the year since its launch (as of December 2007), 25% of faculty had active profiles (about 1,250, up from essentially none in December 2006, allowing for some beta testers).

As well, the CKM actively encouraged the reuse of data in OSU:pro for other activities, so that faculty only had to enter information in one place, one time, thus "maximizing system output" and reducing the number of "data management tasks" in which faculty had to engage. OSU:pro generates XML files so that information can be recaptured, reused, or repurposed. For instance, "The OSU College of Optometry...was interested in highlighting faculty scholarly publications on their website...web developers simply consumed and redisplayed the XML data available from OSU:pro. Updates made to optometry faculty profiles in OSU:pro trigger a refresh sequence updating the college's website." 31
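As a purely hypothetical sketch of that kind of reuse (the URL, element names, and feed structure below are invented for illustration; the article does not describe OSU:pro's actual XML schema), a departmental site could pull a faculty member's publication list from an XML feed and redisplay it:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint and schema, for illustration only.
FEED_URL = "https://pro.osu.example.edu/profiles/doe.1/publications.xml"

with urllib.request.urlopen(FEED_URL) as response:
    tree = ET.parse(response)

# Assume each publication is a <publication> element with <title> and <year>.
for pub in tree.getroot().findall("publication"):
    title = pub.findtext("title", default="(untitled)")
    year = pub.findtext("year", default="n.d.")
    print(f"{year}: {title}")
```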

Of greater implication to me, philosophically, is the planned integration of OSU:pro with Knowledge Bank, Ohio State’s digital repository. As the article notes:

"OSU:pro follows in the footsteps of other major digital initiatives at OSU, including the introduction of a digital institutional repository, or Knowledge Bank, for the intellectual works of faculty, staff, and students. Future enhancements to OSU:pro include creating points of synergy with the Knowledge Bank, our enterprise learning management system, and our new student-information system" 33

Thus a connection is established directly between the institutionally archived resources created by faculty and their public profiles. "Piccoli and her colleagues stress the benefits that inventorying the university's knowledge assets can bring to revving up the knowledge creation delivery cycle" 32, a point the article treats as central to knowledge management in both theory and practice; it underscores that a university's central asset is its people and the expertise they generate. The OSU:pro system brings both aspects of this together, providing a key mechanism for locating expertise and centralizing access to it. As I mentioned above, such a system at Case could be further enhanced through a connection with Digital Case, which would provide access to the knowledge assets themselves: white papers, grey literature, data sets, survey results, maps, and much more, all by-products of the research and knowledge-generation process. Further, as the article notes, the "trend toward interdisciplinary centers and programs in higher education speaks to the observation that new ideas and innovations often arise when investigators interact with colleagues at the nexus between traditional disciplines. Providing individuals with the tools and resources that allow them to readily identify and locate potential collaborators has become a mainstay in business…" 32

The article wraps up by presenting the benefits to various audiences, including administrators, faculty and staff, students, and the greater community. It acknowledges that corporations such as Hewlett-Packard and Microsoft have used systems like this for over a decade under the name Expertise Location and Management (ELM) systems.

"While all too often the tendency is to focus on the tools and technologies...ELM systems...have recognized the importance of fostering a...culture of knowledge sharing. Many have convincingly argued that starting with the technology rather than the goals and outcomes dooms most initiatives. Success with knowledge management efforts requires a holistic approach to understanding the needs and culture of users--what motivates them, how they work, how they communicate, where they learn, how they interact with the technology, and what processes can be enhanced." 32

"The middleware strategy embodied in OSU:pro builds upon rather than replaces institutional investments in systems of record, leveraging preexisting information assets...to augment [ERP's] institutional value through the contextualization of faculty information and enhancement of AP&T" 33

Posted by twh7 at 12:16 PM