April 03, 2008
Live Blogging from KSL GIS Symposium--Closing panel discussion
Joe Koonce moderated a closing panel discussion with several of the earlier speakers: Andrew Curtis, Nina Lam, Daniel Janies, Uriel Kitron, Dave Wagner
Preservation of data sets
Open access issues
Tools are required to reconstruct the data
There are no standards, some emerging standards/best practices
Archiving becomes more complicated
Curtis: After Katrina, people were keeping control of data sets by means of manila folders "up" or "down" in a file drawer. His group created the "Katrina [data] warehouse" using a revision of LSU climate gathering software. As soon as FEMA stopped paying for it, it became a dead entity. The data sets were a complete mess. You need money to support preserved data: answering queries, organizing, etc. Much of the data has now been lost, because there was no system/structure in place to protect and preserve it.
Lam: One of her funders now requires the deposit of data and of metadata about the data. Even data that cannot be deposited (e.g. because of privacy issues) must have its metadata deposited.
Janies: A lot of projects turn into software development projects. He uses journals/supplemental data.
Kitron: With human data, for all practical purposes the data is unavailable because of privacy issues. Librarians are now actively recruiting data sets.
Audience member 1: Even if you have large data sets created with proprietary software, do you really have access to your data if you stop paying the license fee, or if the software vendor stops supporting features? He encourages the use of open-source GIS software.
Curtis: It is an issue of both data collection and dissemination: how do we redistribute the data, especially to developing countries that may not be able to afford the licensing fees?
Audience member 2: Question about availability of air quality data.
Kitron: Gave several possibilities
Wagner: Difficulty of getting data in a standard format
Koonce: Are there emerging standards for data?
Lam: Yes, there are standards for some types of data, but there is a difference between the standard and the quality of the data. Some cross-mapping is also going on.
Audience member 3: We do not have the "ecosystem" for data that we do for physical artifacts (e.g. a truck, with a garage, with a mechanism for fueling it, and a mechanism to prove ownership)
Koonce: When will this problem change?
Panel members: When it is funded top down.
[This concludes the live blogging from the Case Western Reserve University Kelvin Smith Library 2008 GIS Symposium.]
Live Blogging from KSL GIS Symposium--Breakout Session: Andrew Curtis on Yellow Fever
Andrew Curtis, GIS Research Laboratory, Dept. of Geography, University of Southern California
"Using GIS to Reveal Spatial Patterns in the 1878 Yellow Fever Epidemic of New Orleans."
Challenges for health analyses:
Lack of data
Lack of dynamic data (most events vary in both space and time)
He described the GPS/video techniques used for capturing data post-Katrina in the Holy Cross neighborhood of New Orleans.
Showed examples of medical cartography.
Showed contemporary maps from the 1897 yellow fever epidemic showing mosquito distribution and cases of yellow fever in New Orleans.
Why study epidemics from the past using GIS?
1. Develop our understanding of the event itself
2. Use the data to improve our methods of analysis/visualization
3. Look for insights into the spread of the disease.
The disease entered New Orleans almost annually; N.O. was a major trading hub, and survival of infection gave immunity. The 1878 epidemic was the most geographically devastating, and it led to a better quarantine system.
Discussion of the development of GIS maps useful for the study of the epidemic.
[There was a simultaneous breakout session by Daniel Janies, PhD, Dept. of Biomedical Informatics, Ohio State University, "Genomic and Geographic Analysis of the Evolution and Spread of Infectious Disease"]
Live Blogging from KSL GIS Symposium--Breakout session: Dave Wagner on Anthrax, Plague and Tularemia
Dave Wagner, Center for Microbial Genetics and Genomics, Northern Arizona University
"Using GIS to Understand the Ecology,Dispersal and Evolutionary History of Diseases: Examples using Anthrax, Plague and Tularemia"
Research focus: potential bioterrorism agents (anthrax, plague, and tularemia, i.e., Category A Select Agents)
Dogma is that anthrax was introduced to North and South America by colonists from Europe. Wagner's group mapped the genomics, indicating that anthrax more likely arrived with humans crossing the land bridge from Asia to Alaska.
Further discussion of the spread of plague through prairie-dog colonies in the American West.
[There was a simultaneous breakout session by Nina Lam, Department of Environmental Studies, Louisiana State University, "Reducing Uncertainties in Health Risk Assessment through GIS and Spatial Analysis."]
Live Blogging from KSL GIS Symposium--Breakout session--Peter Tuckel on the Influenza Pandemic of 1918
Peter Tuckel, from the Dept. of Sociology at Hunter College, CUNY.
"The Diffusion of the Influenza Pandemic of 1918 in Hartford, Connecticut"
Hartford was hard hit by the pandemic and had a varied population. (And it was Prof. Tuckel's birthplace.)
Prof. Tuckel explained the methodology and outcomes of the study.
Analysis of death certificates (place of death--home or hospital; age, sex, race, marital status; national origin; undertaker; embalmed or not) of everyone who perished from the disease between 9/1/1918 and 12/31/1918.
A digital street-level map of Hartford during the period of the pandemic: he used a 1990 digital street map altered to reflect the situation of 1918, not only putting in streets but also address ranges (taken from a 1918 city directory). The addresses were needed in order to geocode the places where people died. There was also an analysis of the housing stock (single-family dwellings as opposed to multi-unit buildings). He created "sub-ward" profiles by ethnic group. Southern/Eastern European ethnic groups had higher death rates; the native-born had a higher death rate if they lived in an immigrant neighborhood.
For each victim they assigned a numeric code based on the date of death (Sept. 1 = 1; Dec. 31 = 122). By mapping these values he was able to trace the progress of the pandemic. The death rate was highest where the disease struck earliest; there it also ran for the shortest time, and it manifested itself in immigrant neighborhoods. "It decimated everyone in a short amount of time in those neighborhoods."
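The date coding above (Sept. 1 = 1 through Dec. 31 = 122) is straightforward to reproduce; a minimal Python sketch, assuming the codes are simple 1-based day offsets from September 1, 1918:

```python
from datetime import date

def day_code(d, start=date(1918, 9, 1)):
    """Code a death date as days since the start of the study period,
    1-based: Sept. 1, 1918 -> 1 ... Dec. 31, 1918 -> 122."""
    return (d - start).days + 1

print(day_code(date(1918, 10, 15)))  # 45
```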
Instead of viewing the epidemic as a solitary event, one can better understand it as a set of somewhat discrete events or "mini-epidemics." Native-born or "older immigrants" were less susceptible. It ran its course much more rapidly in congested poor areas than in more sparsely populated, affluent areas.
[There was a simultaneous session presented by Shubhayu Saha, Dept. of Forestry and Environmental Resources, North Carolina State University: "Minerals, Forest and Health: Does Resource Extraction undermine Human Development?"]
Live Blogging from KSL GIS Symposium--Lunch break and poster sessions
We're now on lunch break until 1:15, with poster sessions spread around the 2nd floor of KSL. Although lunch is limited to symposium registrants, the posters are available for all to view.
Back for the afternoon, which consists of several break-out sessions.
Technorati Tags: GIS
Live Blogging from KSL GIS Symposium--Plenary Session #2 Uriel Kitron
Uriel Kitron, Dept. of Environmental Studies, Emory University
"West Nile Virus in Chicago: Considering the past, Understanding the present, Predicting the Future"
Prof. Kitron pointed out the importance of libraries as centers for providing geospatial data.
He spoke of the role of GIS, remote sensing, and spatial analysis in vector-borne disease (VBD) research. Scale is of considerable importance: spatial scales (village/town to continental) and temporal scales (seasons, years, decades), and the resolution of each. They can be considered simultaneously as well as consecutively.
West Nile Virus appeared in NYC in 1999 (from the Old World); in 2002 it appeared in Chicago and its surroundings. The virus has moved very quickly across the United States.
Prof. Kitron explained in detail factors related to the spread of West Nile Virus in the Chicago area and relationships to earlier infections of other diseases in some of the same geographical areas. One factor seems to be the many undocumented storm drains filled with water and organic waste, making an excellent breeding place for the mosquito larvae that spread WNV. If the drains are flushed regularly by frequent rains, the larvae are likely to be washed away and there is less of a problem.
Prof. Kitron's future research will investigate the fundamental ecological processes that drive the fine-scale variations in WNV transmission, focusing on fine-scale spatial relationships for transmission.
Live Blogging from KSL GIS Symposium--Plenary Session #1 Charles H. King
Joseph Koonce, CWRU biology professor, introduced Charles H. King, MD, from the Center for Global Health and Diseases and Dept. of Epidemiology & Biostatistics, at Case Western Reserve University.
"Microsope to Macroscope - Using GIS to Understand Environmental Complexity in Dease Causation."
* Background on the philosophy of medical science
* Resistance to GIS
* Complexity and theory of environmental analysis
* GIS and the new practice of "eco-social epidemiology."
Since the 1700s the microscope has been the rationalist/positivist tool of choice; reductionistic; germ theory was a major breakthrough; now molecular medicine.
What is wrong with environmental studies in medicine: it's messy; it's too complicated; it's difficult to isolate cause and effect; it tells us things we don't want to hear. Complexity reigns in our political world. Genetics, exposure, and environment all relate to disease infection.
Randomized controlled trials do not reflect the real world: "Why don't patients get better on 'proven' regimens?" There is "hidden stratification" in samples that ends up producing unpredictable results.
Complex systems organize themselves into predictable but chaotic-appearing patterns.
What are our health research goals? Explanatory; Predictive (past performance does not necessarily predict future performance)
He discussed his own research about transmission of a parasite spread through water and snails in Africa. He described his use of GIS for analysis/data mining. Use of remote sensing and satellite imaging for mapping, creation of spatial data. Data must be confirmed on the ground with GPS data. Data is correlative, not causative.
Other dimensions such as poverty and socioeconomic factors play a role. We cannot ignore the context.
Why do we do this research? We want to be able to use all of these factors to analyze how they combine to foster disease.
Opportunities for GIS:
* New impetus for ecological research
* Comprehensive multi-scale picture of local/regional/global epidemiology
* Consideration of temporal changes
* Integration of molecular data with environmental data.
Spatial data, concerns and limitations:
* massive amounts of data,
* but a paucity of accurate epidemiological data
* lack of data is readily masked in maps
* the meaning of area boundaries is important--show what you don't know
* there are limits to the use of spatial autocorrelation and interpolation
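Spatial autocorrelation, mentioned in the last point above, is typically measured with Moran's I. A minimal sketch of the statistic follows; the values and hand-built adjacency matrix are purely illustrative, not data from the talk.

```python
# Moran's I measures spatial autocorrelation: values near +1 mean similar
# values cluster in space, near 0 means no spatial pattern.
def morans_i(values, weights):
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # cross-products of deviations for neighboring locations
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    w_sum = sum(sum(row) for row in weights)
    return (n / w_sum) * (num / den)

# toy example: four locations in a chain 0-1-2-3, binary adjacency weights
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1.0, 2.0, 3.0, 4.0], w))  # positive: similar values cluster
```

In practice a library such as PySAL is used, which also supplies significance tests; the limitation flagged above is that sparse or inaccurate epidemiological data can make such statistics misleading.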
Q. How do you obtain data, and how much data is available?
A. "It's all over the place." Rainfall data for the city is discarded every day; some weather data is retained, but it may not be what is needed. It may be necessary to set up your own sensors, as Prof. King did in Africa.
Q. Comment: a focus on complex ecology is important. Multidisciplinary research (ecology + disease) is a problem for NSF and NIH--there is one joint panel to handle such applications.
Q. Can mapping move to prediction?
A. Some generalities can be made for some cases, but in most instances there is not enough data to make predictions (e.g. West Nile virus).
Live Blogging from KSL GIS Symposium--Clifford Lynch keynote address
Cliff Lynch, Executive Director of the Coalition for Networked Information, spoke about the notion of cyberinfrastructure as related to scientific inquiry:
high-performance computing; sensors connected to the network; very large data sets; virtual organizations. These concepts are known collectively as e-science, especially in Europe. In the U.S. the same concepts are known (in Lynch's view, somewhat perversely) as cyberinfrastructure. The NSF is the guiding body for these concepts through its Office of Cyberinfrastructure, headed by Dan Atkins.
He discussed simulation as a fundamental tool for science, which some argue is a topic that should be taught broadly to undergraduate students. Examples: disaster planning based on certain characteristics (time of day, spring break, on a bridge); simulating early agrarian societies.
Sensors: use of very tiny sensors--"smart dust"--that can be "spread around" to gather data. Ecologists and environmentalists use "dumb sensors" that can sense only a few kinds of phenomena, overlaid with a system of "mobile sensors" that can move to a place where a more sophisticated gathering of data is needed. Social elements: closed-circuit TVs; highway monitoring; cell phones that know where you are and are now starting to have other kinds of sensors built in--with these, it would be easy to build a ubiquitous national sensing network. Much social, commercial, and societal activity has moved to the Internet, where it can now be monitored and tracked. This creates an enormous social sensor network that we have not had before. He described his impression that the major software firms (Google, Yahoo, Microsoft) are very protective of the privacy of their users for good business reasons: if they let the data be "non-anonymized," they would not be able to gather it.
Data: It is a fundamental cornerstone of e-science. Not just preservation, but data curation. The notion is that you are not keeping data just for altruistic purposes, but to be able to do new scholarship and to repurpose it--not just for the sake of creating archives, but for the possibility of creating new knowledge downstream. Examples: meta-analysis across separate data sources, especially diverse data sources removed from their original purposes. Preservation of data costs money, and we don't necessarily want to preserve everything. There is still a strong bias in the sciences to preserve as little data as possible, creating a "nightmare scenario" that requires future funding to preserve old data. We are just beginning to have a language to discuss preservation of data: "data curation." "Data scientists" are a new breed of person starting to come out of schools of information science. Most projects will not be able to support large-scale data curation staff, and there are some scientific areas in which it seems an unsolvable long-term problem (e.g. high-energy physics). Data also needs to be collected in some sort of context. Once it is packaged, some entity needs to take responsibility for managing the data in the long term. It is part of fundamental scientific results:
*disciplinary repositories (e.g. molecular biology) with norms for collection and sharing; some agencies are now demanding pre-publication data sharing; who pays for the repository?
*journal publishers: "give us the whole package": article, data, computer programs. Sometimes this is in reaction to academic fraud cases. The journals are quite vague about who has the long term responsibility for these "supplemental materials." How much supplemental data can you give them?
*the universities that host the research, especially through the university library. There are serious financial issues for a university/library that undertakes these efforts; it is a big expansion of the library's role. There is also a need to deal with duplication of effort by spreading areas of expertise for academic disciplines among fewer institutions.
Lynch pointed out that what he calls "e-science" can more correctly be termed "e-research," since the same research techniques can be used in the humanities and social sciences. The humanities are beginning to generate large data sets that will also need to be preserved and curated.
QUESTIONS FROM THE AUDIENCE:
Q. Is there product liability for information? E.g. faulty sensors and data that cause a catastrophe, or pharmaceutical trials in which data may have been suppressed.
A. Lynch sees this as a serious problem that will get worse, especially from corporate lawyers who want data destroyed as soon as possible, because old data is "pure liability." What constitutes the material that is used for peer review? (The article? The data? The computer programs?)
Q. Comment on data storage.
A. For most data, the raw costs of storage are not very significant. Human-produced data (writing, speaking, video) is now "not that big a deal." Getting rid of data will be done for other social purposes, not for cost. Historians no longer have enough hours to review the entire human record, so data mining becomes essential.
Q. Will there be improvement of metadata?
A. Deposited data should be streamlined as much as possible, with the metadata managed in other ways.
Live Blogging from KSL GIS Symposium
I am reporting live today from the Kelvin Smith Library biennial symposium, GIS Technology: Sustaining the Future, Understanding the Past. The symposium, funded by an anonymous donor, is taking place today on the second floor of KSL. This year's topic is the use of GIS in studying the spread of disease, pandemics, and the effects of environmental change on disease.
Lynn Singer, Deputy Provost of Case Western Reserve University, welcomed the 100 attendees, pointing out the ongoing nature of the university's pandemic planning efforts.
Joanne Eustis, University Librarian, thanked the planning committee for the symposium and introduced the keynote speaker, Clifford Lynch, Executive Director of the Coalition for Networked Information. Summary of his talk to come.