Welcome to the fourth iteration of the semi-annual HathiTrust Research Center (HTRC) UnCamp. This is where members of the HTRC community gather to explore the latest developments in using HTRC tools and services to analyze the HathiTrust Digital Library corpus. Visit https://www.hathitrust.org/htrc_uncamp2018 for more information, or see our online proceedings at https://osf.io/view/htrc_uncamp2018, hosted by OSF Meetings.

Friday, January 26 • 9:30am - 10:45am
Use Cases

Session Moderator: Kathryn Stine

Subject headings and beyond: Mapping the HathiTrust Digital Library content for wider use (Trevor Edelblute, Angela Zoss, Inna Kouper)
With over 15 million volumes of digitized materials from many academic libraries, the HathiTrust Digital Library (HTDL) provides opportunities for various types of research. For example, projects that use HTDL examine works of fiction, build predictive algorithms and genre classifications, and create cultural models of text. Despite a number of creative uses of HTDL already in place, enabling wider computational use of this digital library remains a challenge, one that can be addressed through training, outreach, and tool creation. We propose an approach, complementary to these outreach efforts, that relies on a scientometric perspective, and we argue that more sophisticated representations of HTDL contents, going beyond simple summaries of its metadata, can invite more, and more varied, research. If more researchers know what is available from, for example, the social science domains, they may be more willing to incorporate HTDL into their teaching and research. To test this, a range of summaries and visualizations of HTDL contents is needed. In this panel presentation, we will describe our approach to using subject headings and other metadata to map the topical content and evolution of books in three disciplines: sociology, anthropology, and psychology. We will describe our workflow for improving metadata quality and supplementing missing metadata, discuss our approach to record deduplication, and, finally, present visualizations based on metadata and title analysis. In conclusion, we will invite the audience to discuss the challenges of mapping the boundaries and comprehensiveness of HTDL in particular domains.

Digital Text Analysis for Quantifying Descriptivity of Writing (Sayan Bhattacharyya, Alex Anderson and José Eduardo González)
Our project aims to quantify the notion of descriptiveness, or descriptivity, in writing. Digital text analysis using the resources of the HathiTrust Research Center offers an opportunity to operationalize this anecdotal notion by developing quantified metrics for descriptivity. Parameters such as the relative number and other features of adjective-noun co-occurrences, computed from part-of-speech-tagged tokens, can serve as a proxy for the “descriptivity” of a text. Applying these parameters to the existing, publicly distributed HathiTrust Research Center part-of-speech-tagged tokens, we will create an initial estimate of which volumes, and which pages within those volumes, are the most and least “descriptive” in relation to those parameters.
Although these metrics will have many and varied uses, we, as literary scholars, are interested in them because of a conjecture made by several critics: that writing has, since World War II, taken an overall turn away from "tell" towards "show", which arguably means that writers are becoming increasingly interested in description. While this is considered primarily an Anglo-American phenomenon, there is reason to believe that the trend has extended into non-Anglophone writing, too: several critics have argued that, globally, writing is becoming increasingly homogeneous and self-similar, a hypothesis our metrics can assess.
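
As a rough illustration only (this is not the presenters' metric), one simple page-level proxy for adjective-noun co-occurrence is the ratio of adjective tokens to noun tokens on each page. The sketch below assumes a hypothetical page representation mapping each token to counts per part-of-speech tag, modeled loosely on per-page token/POS counts, and assumes Penn Treebank tags; the function names adj_noun_ratio and rank_pages are invented for this example. Because page-level counts carry no word order, it measures co-occurrence on the same page rather than adjacent adjective-noun pairs.

```python
from typing import Dict, List, Tuple

# Hypothetical page representation (an assumption for illustration):
# token -> {POS tag -> count}, with no word-order information.
PageCounts = Dict[str, Dict[str, int]]

# Assumed Penn Treebank adjective and noun tags.
ADJ_TAGS = {"JJ", "JJR", "JJS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def adj_noun_ratio(page: PageCounts) -> float:
    """Return adjectives per noun on one page (0.0 if the page has no nouns).

    This counts adjectives and nouns that co-occur on the same page;
    it cannot detect adjacent adjective-noun pairs.
    """
    adjectives = nouns = 0
    for tag_counts in page.values():
        for tag, count in tag_counts.items():
            if tag in ADJ_TAGS:
                adjectives += count
            elif tag in NOUN_TAGS:
                nouns += count
    return adjectives / nouns if nouns else 0.0


def rank_pages(pages: List[PageCounts]) -> List[Tuple[int, float]]:
    """Rank page indices from most to least 'descriptive' by the ratio."""
    scores = [(i, adj_noun_ratio(page)) for i, page in enumerate(pages)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Two toy pages: page 0 has 2 adjectives and 2 nouns, page 1 has 1 and 3.
    pages = [
        {"dark": {"JJ": 2}, "house": {"NN": 2}, "ran": {"VBD": 1}},
        {"house": {"NN": 3}, "old": {"JJ": 1}},
    ]
    for index, score in rank_pages(pages):
        print(index, round(score, 2))  # prints "0 1.0" then "1 0.33"
```

A ratio normalized by noun count keeps long and short pages comparable; a real study would likely combine several such features and compare them across volumes and decades.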

Multi-level computational methods for interdisciplinary research in the HathiTrust Digital Library (Jaimie Murdock and Colin Allen)
We present the results of an NEH Digging into Data Challenge Grant, which partnered with the HTRC from 2012 to 2014. We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. We provide a case study applying these methods to the problem of identifying and extracting arguments about anthropomorphism during a critical period in the development of comparative psychology. Through a novel approach, “drill-down” topic modeling, which simultaneously reduces both the size of the corpus and the unit of analysis, we are able to reduce a large collection of full-text volumes to a much smaller set of pages within six focal volumes containing arguments of interest to historians and philosophers of comparative psychology. The volumes identified in this way did not appear among the first ten results of the keyword search in the HathiTrust Digital Library, and the pages bear the kind of “close reading” that is at the heart of scholarly work in the humanities and that is needed to generate original interpretations. The multilevel approach advances understanding of the intellectual and societal contexts in which writings are interpreted. This work was recently published in PLOS ONE: https://doi.org/10.1371/journal.pone.0184188

The University of California ClioMetric History Project (Zach Bleemer)
Studies of higher education in the United States have long been limited by the availability of historical data and by minimal centralized data collection. The University of California ClioMetric History Project leverages HathiTrust collections of university registers, course catalogs, and professional directories, along with student transcripts and other university-held records, to visualize and analyze universities' contributions to California's 20th-century growth, health, economic mobility, and gender/ethnic equality.


Colin Allen

University of Pittsburgh

Alex Anderson

University of Pennsylvania

Sayan Bhattacharyya

Price Lab for Digital Humanities, University of Pennsylvania

Zach Bleemer

UC Berkeley

Trevor Edelblute

Indiana University Bloomington

José Eduardo González

University of Nebraska-Lincoln

Inna Kouper

Indiana University

Jaimie Murdock

Indiana University Bloomington
Jaimie Murdock is a joint PhD student in Cognitive Science and Informatics. He studies the construction of knowledge representations and the dynamics of expertise. Although he is majoring in two scientific disciplines, most of Jaimie's research occurs in the digital humanities, where he u...

Angela Zoss

Duke University

Friday, January 26, 2018 • 9:30am - 10:45am
Moffitt Library, 5th floor

