November 14, 2009
TEI conference, day 2
(notes from Day 2)
Computational Work with Very Large Text Collections: Google Books, HathiTrust, the Open Content Alliance, and the Future of TEI [slides] (Gallery, Hatcher Graduate Library North) Speaker: John M. Unsworth
Unsworth spoke on integrating research tools and other databases into a single interface that offers faceted browsing. He also differentiated between high-level research and "non-consumptive research" (e.g. image analysis, textual analysis, citation extraction, indexing).
Unsworth posed the question: is there a marriage of convenience between computer science and the humanities? (Illustrated with a doctored image of Christopher Columbus and Pocahontas: have the two worlds collided?)
Micropapers - (5 min mini presentations)
DeReKo goes P5: Customizing TEI P5 for the Mannheim German Reference Corpus - Andreas Witt
Database of written contemporary language, ca. 3 3/4 billion words (+300 million words added every year)
XCES was used initially; internal usage only in the beginning
using P5 now
The Chicago Foreign Language Press Survey in TEI- Douglas Knox
Transcribed foreign language press; a number of languages surveyed, with basic encoding. Using XSLT to expand on the basic encoding. The project also points to a taxonomy created specifically from terms in the survey (this serves as a way to correct some of the issues, such as misspellings, authority name records, etc.)
Evolving TEI standards and the burdens of digital project maintenance
-Andrew Jewell
Beginning to think about the transition from P4 to P5 with the Willa Cather archive (which is almost completely in TEI). When and how to migrate? How to make the decision to convert, particularly with other migrations likely to occur in the future? Jewell states that in the digital realm, 'stability is an illusion': something will always be changing down the road. How to make these long-term decisions about content?
The role of TEI in large text-analysis projects
-Brian Pytlik Zillig
Uses Abbot software for the project. Refers to the 'gated communities' of larger digital libraries (HathiTrust, etc.)
TEI documentation and the need to be responsive and accessible to a varied user community -Brett Barney
The difficulty of figuring out some of the more complex tags (e.g. restore)
How can a researcher who turns to digital projects decipher the P5 guidelines? Where does the computer science take over? Is this tangible for researchers/humanists to use as well?
How to make TEI more tangible and legible for consumption by a larger audience?
TEI in the classroom, with emphasis on the need for mark up that engages student interpretive interests
-Amanda Gailey
From the perspective of an English professor making applications in a classroom setting:
How to merge 2 worlds- mass digitization w/ literature
How to create meaningful projects
How to make this more approachable to non-techies?
Are there more learning environments and workshops to address this?
The discussion also raised some larger issues about TEI:
Can we study how TEI projects are used/researched (to what level of encoding, for example: basic?)
How to logistically keep up with levels of encoding: how much time to spend on conversion, at every level, every time you upgrade? Or not?
Where is your text going? Do you want it to be conformant with other projects/digital repositories?
How to sustain small TEI projects: where should they go? Who will store these? Curate? Track?
Posted by vad17 at November 14, 2009 06:40 PM