HTRC UnCamp 2018 has ended

Welcome to the fourth iteration of the semi-annual HathiTrust Research Center (HTRC) UnCamp. This is where members of the HTRC community gather to explore the latest developments in using HTRC tools and services to analyze the HathiTrust Digital Library corpus. Visit https://www.hathitrust.org/htrc_uncamp2018 for more information or see our online proceedings at https://osf.io/view/htrc_uncamp2018 hosted by OSF Meetings.

Thursday, January 25 • 4:00pm - 4:30pm
Lightning Talks

Session Moderator: Eleanor Dickson

Topic Modeling of anti-Imperial and Emancipatory Pamphlets from the Baltic States, 1917-1922 (Stanislav Pejša)
I plan to present the preliminary findings of my research on the impact of the Wilsonian "New Diplomacy" on emancipatory and anti-imperial propaganda between 1917 and 1923. I investigated pamphlets and propaganda publications that were mainly presented to the participants of the Paris Peace Conference in 1919, but also shared with the general public. In my research I intend to use pamphlets and other literature available in the HathiTrust Digital Library. The proposed lightning talk will summarize my pilot study, which explored the feasibility of text mining and topic modeling for this type of transnational historical research.
I applied LDA topic modeling via MALLET to the pamphlets available in the HT Digital Library from the Baltic states, i.e. Estonia, Latvia, and Lithuania, which were part of the Russian Empire until 1915. The goal of the Estonians, Latvians, and Lithuanians in Paris was to achieve international recognition of their independence in order to counter both the German advances and the Soviet intrusions. Even though they coordinated their efforts, each nation had different priorities, and their historical, cultural, and linguistic contexts differed too.
In further study, I plan to investigate other regions that asserted independence or home rule. It is therefore important to be able to investigate the narratives in aggregate and see the common topics, but it is also necessary to be able to distinguish the topics that are culturally or ethnically specific within the collection.

Text Analysis in the Intro to DH Classroom (Jason Cohen)
In this 5-minute lightning talk, I aim to show two related elements of a text analysis project under development for classroom use. As a teacher-scholar at a liberal arts college, I have been fortunate to win a grant to generate a DH curriculum that will include textual analysis at several levels, including an Intro to DH as well as a course involving higher level scripting and NLP tools. This talk will lay out some parameters for student introductory work with HTRC materials and their processing, particularly as the HTRC materials relate to a parallel archive, and it will solicit future possible applications or pedagogical approaches using these starting points. 

One Hundred Years of American Science: Topic Modeling of Scientific Journals in HathiTrust (Shawn Martin)
What if we could model the majority of American scientific articles for an entire century? What might this data tell researchers about the development of science? Could it help understand professionalization and scholarly communication patterns in the future? This paper uses topic modeling and statistical analysis of keywords within early American scientific journals in order to better understand the professionalization of American science in the late nineteenth century. The American Journal of Science was the first regularly published scientific journal in the United States, starting in 1819, and the Journal of the American Chemical Society was a specialized scientific journal starting in 1879. Using the full text of these journals from HathiTrust and topic modeling their content for the first one hundred years (1819-1922), it becomes clear that the professionalization of science had much to do with external factors affecting science in the U.S. Topics shift within the American Journal of Science between 1871 and 1897, at exactly the same period when specialized scientific professional societies such as the American Chemical Society form. Additionally, within the American Chemical Society, it was not until the 1890s that issues of professional identity became prominent. Both of these trends reflect wider trends of professionalization within universities, other professions such as medicine, and government in the late nineteenth century. Understanding how science developed may help to understand how scientific dissemination patterns have responded to outside pressures in the past, and may continue to do so as digital technologies influence scholarly communication.

Collections as Data on labs.loc.gov (Abigail Potter, Jaime Mears, Meghan Ferriter and Katherine Zwaard)
The Library of Congress launched labs.loc.gov as a place for innovation and a pathway to enable more computational use of Library collections. The Labs staff will update the HTRC community on initiatives of interest, including an upcoming OCR challenge, our Innovator-in-Residence program, and digital scholarship workshops.

The Representation of National Canons of Prestige in the HathiTrust Collection (Lisa Teichmann) 
**CANCELLED due to illness** 
To what extent does the HathiTrust collection represent national canons of fiction and prestigious canons of world literature? How can it be expanded to be a valuable source in education beyond academia? Based on these questions, this presentation aims at giving insights into the national canons of fiction in German and Turkish within the HathiTrust collection as well as reflections on the representation of other canons of literary prestige, such as bestseller lists and translations in the mentioned languages.
Two annotated datasets for each cultural context will be used:
1. national canon: high school reading lists
2. canon of prestigious works/world literature: lists of bestsellers, most translated authors and works
Statistical measures presented in this project include the percentage of works on these lists in the HT collection, author gender, period and genre. I hope to illustrate how the collection incorporates national canons of prestige and present a project under development to curate a core corpus of national canons of fiction for German and Turkish that could be a useful resource in literary education.
Further, this project addresses the broader question of HathiTrust in literary education and of establishing the HT collection as a valuable resource for digital pedagogy in high school classrooms.


Jason Cohen

Berea College

Meghan Ferriter

Sr. Innovation Specialist, Library of Congress
Senior Innovation Specialist with Library of Congress Labs. Anthropologist and historian by training; thoughty by nature. Come find me to talk about supporting digital scholarship, crowdsourcing, access & use of digital collections, piloting & evaluating, collaboration and partnerships...

Shawn Martin

Indiana University Bloomington

Stanislav Pejša

data curator, Purdue University

Thursday January 25, 2018 4:00pm - 4:30pm PST
Moffitt Library, 5th floor