HTRC UnCamp 2018 has ended

Welcome to the fourth iteration of the semi-annual HathiTrust Research Center (HTRC) UnCamp. This is where members of the HTRC community gather to explore the latest developments in using HTRC tools and services to analyze the HathiTrust Digital Library corpus. Visit https://www.hathitrust.org/htrc_uncamp2018 for more information or see our online proceedings at https://osf.io/view/htrc_uncamp2018 hosted by OSF Meetings.

Thursday, January 25
 

8:00am

Registration Table Open
Thursday January 25, 2018 8:00am - 1:30pm
Moffitt Library, 4th floor

9:00am

Mastering Metadata
At its best, bibliographic metadata can help researchers drill down into detail to find subsets of larger collections based on dates, topical coverage, authorship, and other facets, though there can be interpretive and analytic pitfalls in some of the assumptions that we bring to this metadata. This primer on the metadata used to describe HathiTrust resources will provide insights, caveats, and, ultimately, useful approaches to navigating and making sense of bibliographic metadata for researchers who build and/or compare collections for analysis. We’ll provide hands-on opportunities to slice and dice collections along bibliographic parameters and invite you to bring your use cases to workshop!

Speakers
Tim Cole

University of Illinois at Urbana-Champaign
Barbara Cormack

Metadata Analyst, California Digital Library
Valerie Glenn

Federal Documents Analyst, HathiTrust
Lisa Rowlison de Ortiz

Head, Catalog & Metadata Services, UC Berkeley Library
Josh Steverman

HathiTrust - Michigan Library
Kathryn Stine

California Digital Library


Thursday January 25, 2018 9:00am - 11:30am
D-Lab (356 Barrows)

9:00am

Text Analysis FUN!damentals: Methods, Approaches, Tools, and Techniques
This workshop provides an overview of computational text analysis methods and tools. No experience in this area is expected or required. The goal is to provide an orientation for those wishing to go further with text analysis and interpret results of these methods.

Speakers
Chris Hench

University of California, Berkeley


Thursday January 25, 2018 9:00am - 11:30am
AIS (117 Dwinelle)

9:30am

HTRC Crash Course: What is it and what can I do with it?
A quick introduction to the HathiTrust Research Center, its tools and services, and how it can help you support text analysis research on your campus or at your institution. This hands-on workshop will introduce you to HTRC tools and services and to how the HTRC enables scholars to perform text analysis on the massive HathiTrust Digital Library. It will provide a foundation for UnCamp and offer insight from the Center's IMLS-funded "Digging Deeper, Reaching Further" curriculum on how to initiate scholarly support for text analysis via HTRC.

Speakers
Eleanor Dickson

University of Illinois; HTRC
HathiTrust Research Center, text analysis, #dlfteach, digital library pedagogy
Harriett Green

Head of Scholarly Communication and Publishing, University of Illinois at Urbana-Champaign
I am head of Scholarly Communication and Publishing, Scholarly Communication and Publishing Librarian, and associate professor, University Library at the University of Illinois at Urbana-Champaign. My current research projects include working as projects and publications manager...


Thursday January 25, 2018 9:30am - 11:30am
Moffitt Library, 5th floor

10:00am

Working with Restricted Collections: Technologies and User and Library Needs
** With apologies, this pre-conference had to be cancelled due to illness **
** Please feel free to attend another pre-conference on the schedule. **


The goal of this session is to gather information about uses of restricted and sensitive collections in academic libraries and current technologies and discuss the possible use of HTRC Data Capsule as a technology that can help provide secure computational access to restricted collections. Participants are invited to share their experiences working with such collections either as users or as service providers and learn about and discuss current technologies for restricted access and Data Capsule. All participants interested in this topic and possible uses of Data Capsule are invited.

Speakers
Inna Kouper

Indiana University


Thursday January 25, 2018 10:00am - 11:30am
BIDS (190 Doe Library)

11:30am

Lunch
Lunches not provided. Please see the HTRC UnCamp site for a list of recommended restaurants within walking distance of campus.

Thursday January 25, 2018 11:30am - 1:00pm
Local restaurants. See options at: https://www.hathitrust.org/htrc_uncamp2018_travel#dining

12:00pm

Research IT Brownbag: Restricted data types used in secure computing environments
This session is open to all; no UnCamp registration is required. (Bring your own lunch.)


The UCB library is participating in a research project that investigates how to deploy a secure environment for working with library collections. The secure environment is an extension of the HathiTrust Research Center (HTRC) service called Data Capsule. HTRC facilitates computational analysis of content in the HathiTrust Digital Library (a consortium that preserves millions of digitized books collected from libraries around the world) with emphasis on developing methods of non-consumptive research. Such research enables "distant reading" and computational analysis of text corpora while respecting boundaries of intellectual property protections.


In this session, led by Erik Mitchell (UC Berkeley) and Inna Kouper (Indiana University, project lead), we invite attendees to explore together the various types of restricted data in UCB library collections and how they could be used in a secure computing environment. Broadly, we'd like to discuss the following questions:


  • What library collections would benefit from using the Data Capsule?
  • What kind of restrictions do such collections have? What computational uses do we envision for those collections?
  • What are the technical, policy, and logistical challenges of providing access to restricted collections?


Attendees are encouraged to review the following web pages for additional context and definition:


About the HathiTrust Digital Library https://www.hathitrust.org/digital_library
About the HathiTrust Research Center (HTRC) https://www.hathitrust.org/htrc
About the HTRC Data Capsule https://wiki.htrc.illinois.edu/display/COM/HTRC+Data+Capsule
The HTRC's Non-Consumptive Use Research Policy https://www.hathitrust.org/htrc_ncup

Speakers
Inna Kouper

Indiana University
Erik Mitchell

UC Berkeley


Thursday January 25, 2018 12:00pm - 1:00pm
BIDS (190 Doe Library)

1:00pm

Keynote: Elizabeth Lorang and Leen-Kiat Soh: Increasing Our Vision for 21st-Century Digital Libraries (preceded by conference opening remarks)
Title: Increasing Our Vision for 21st-Century Digital Libraries


Through the frames of digital library and collections processing histories, Lorang and Soh will consider how digital libraries enable researchers to find materials within their collections, look at the intersection of research and development with application and practice in digital libraries, and discuss the roles of digital libraries in opening up or closing off the types of questions people can ask—as well as those they might imagine. Within this context, they will introduce the work of Image Analysis for Archival Discovery (Aida), its research questions, methods, and current work, and they will look to the future to propose some expanded visions for digital libraries development.





Speakers
Elizabeth Lorang

Humanities Librarian, University of Nebraska-Lincoln
Leen-Kiat Soh

Professor, University of Nebraska


Thursday January 25, 2018 1:00pm - 2:15pm
Moffitt Library, 5th floor

2:15pm

Coffee Break
Thursday January 25, 2018 2:15pm - 2:30pm
Moffitt Library, 5th floor

2:30pm

HathiTrust Research Center Updates Plenary
This update on the HathiTrust Research Center will be presented by HTRC Co-Directors:
John A. Walsh, Indiana University
J. Stephen Downie, University of Illinois

Speakers
J. Stephen Downie

Co-Director HTRC, University of Illinois at Urbana-Champaign
John Walsh

Co-Director HTRC, Indiana University


Thursday January 25, 2018 2:30pm - 3:00pm
Moffitt Library, 5th floor

3:00pm

HTRC Advanced Collaborative Support (ACS) Awardee Project Panel
Session Moderator: Eleanor Dickson




The impact of OCR quality on natural language processing (David Bamman)
The rise of large-scale digitized book collections such as the HathiTrust is enabling a fundamentally new kind of text analysis that exploits the scale of collections to ask questions not possible with smaller corpora. One prerequisite for this work is high-quality optical character recognition (OCR), in which image scans of individual pages in a book are converted to text. While OCR errors can complicate even simple analyses of word frequency (as seen in Google Ngram data), they pose an even greater challenge for structured representations of language in NLP, such as part-of-speech tagging or syntactic parsing. In this short talk, I'll describe research into quantifying the impact of OCR quality in the HathiTrust on the quality of downstream NLP, and sketch out approaches for automatically assessing OCR accuracy without recourse to gold-standard reference transcriptions.




HathiTrust ACS Report: A Writer’s Workshop Workset with the Program Era Project (Loren Glass, Nick Kelly, Nicole White)
How can computer-assisted text analysis help us document and explore the history of Creative Writing programs in the University? This presentation looks at how the Program Era Project, a DH initiative at the University of Iowa, is working to build an online, public-facing database of information on the Iowa Writers’ Workshop, its writers, and their work. We will focus on the Project’s recent ACS collaboration with HathiTrust and how we plan to incorporate text analysis data from an HT-provided corpus of Workshop-affiliated writings into our larger database of institutional and biographical information on Workshop writers. In addition to providing background on the Project, the text analysis tools it has developed, and the necessities that led to the HathiTrust collaboration, we will discuss our progress (and impediments) with the collaboration and the next steps we plan to take in working with our HathiTrust data capsule.




Evaluating the History of the Chicago School: Why Supervised Algorithms? (Dan Baciu)
The history of the Chicago School has become digital, but what does this fact mean for data accessibility, research, and future dissemination? At the HathiTrust, the term is found in over 100,000 books and periodicals covering the last two centuries. Can digital tools analyze this massive history of publication? The values of the Chicago Schools have been disseminated, translated and transformed. The present work attempts a first computer-aided, systematic and critical evaluation. The research has been supported by several institutions including the HathiTrust Research Center (HTRC), the Fulbright Program and the Swiss National Science Foundation (SNF). Digital technology is implemented on three levels: analysis of the historic text data, interpretation of the results, and scholarly exchange.
Our collaboration with the HathiTrust Research Center and the Cognitive Computation Group at University of Illinois provided sufficient data to evaluate the complete history of publication of the Chicago Schools. We succeeded in implementing a knowledge-based approach on a massive, previously unattempted scale and found that it offered significant advantages over using the unstructured data alone. From our large Chicago School corpus, we also built three additional datasets together with a framework for non-consumptive research which allows us to filter, classify, and cross-validate the results. Among the 2016 ACS projects, we were the only one to rely primarily on a supervised, knowledge-based approach. Most other projects worked with unsupervised algorithms. This UnCamp presentation will focus on our choice of supervised algorithms and their methodological role within the framework for non-consumptive research.




Using contemporary technology to analyze historical social movements (Laura Nelson)
New methods in computational text and network analysis have opened up exciting possibilities to better understand the complex historical dynamics within large, diverse, and recurrent social movements such as women's movements, labor movements, and civil rights movements. The methods are readily available, but they require rich, digitized data that can capture multifaceted and temporal intra-movement dynamics. While libraries have made great progress providing digitized "collections as data" to researchers, documents produced by social movement actors are not systematically included in standard categorical collections. In this talk I discuss my experience working with HathiTrust, using metadata as well as vector space models, to identify and collect digitized texts produced by a diverse array of individuals and organizations involved in the women's movement between 1860 and 1975. I discuss the challenges involved in collecting such a corpus, as well as the new types of historical and cultural analyses these data enable.




Scalable Detection of Text Reuse (Doug Duhaime)
In 2016 Yale University's Digital Humanities Lab began work on a full-stack web application that allows users to detect and visualize text reuse in large collections. A prototype of the app is available here: http://52.89.1.166/. (Try searching for Thomas Gray).
This project builds on research begun during an Advanced Collaborative Research Grant with the HTRC, and implements a number of features that distinguish it from recently released packages for text reuse. During this talk, I will give a brief overview of the data processing pipeline, discuss the front-end UI options we've prioritized and sketched to date, then open the floor for suggestions of other features that could help users study text reuse in large text collections.

Speakers
David Bamman

UC Berkeley
Doug Duhaime

Yale University
Data analysis and visualization!
Loren Glass

University of Iowa
Loren Glass is Associate Professor of English at the University of Iowa. He writes on celebrity, obscenity, modernism, and the avant-garde. He is currently completing a history of Grove Press which will appear in the Post*45 Series with Stanford University Press. Abstract:"Killer...
Nick Kelly

University of Iowa
Laura Nelson

Northeastern University
Nicole White

University of Iowa


Thursday January 25, 2018 3:00pm - 4:00pm
Moffitt Library, 5th floor

4:00pm

Lightning Talks
Session Moderator: Eleanor Dickson


Topic Modeling of anti-Imperial and Emancipatory Pamphlets from the Baltic States, 1917-1922 (Stanislav Pejša)
I plan to present the preliminary findings of my research on the impact of the Wilsonian "New Diplomacy" on emancipatory and anti-imperial propaganda between 1917 and 1923. I investigated pamphlets and propaganda publications mainly presented to the participants of the Paris Peace Conference in 1919, but also shared with the general public. In my research I intend to use pamphlets and other literature available in the HathiTrust Digital Library. The proposed lightning talk will summarize my pilot study, which explored the feasibility of text mining and topic modeling for this type of transnational historical research.
I applied LDA topic modeling via MALLET to the pamphlets available in the HT Digital Library from the Baltic states, i.e. Estonia, Latvia, and Lithuania, which were part of the Russian Empire until 1915. The goal of Estonians, Latvians, and Lithuanians in Paris was to achieve international recognition of their independence to counter both the German advances and the Soviet intrusions. Even though they coordinated their efforts, each nation had different priorities, and their historical, cultural, and linguistic contexts differed too.
In further study, I plan to investigate other regions that asserted independence or home rule. It is therefore important to be able to investigate the narratives in aggregate and see the common topics, but it is also necessary to be able to distinguish the topics that are culturally or ethnically specific within the collection.


Text Analysis in the Intro to DH Classroom (Jason Cohen)
In this 5-minute lightning talk, I aim to show two related elements of a text analysis project under development for classroom use. As a teacher-scholar at a liberal arts college, I have been fortunate to win a grant to generate a DH curriculum that will include textual analysis at several levels, including an Intro to DH as well as a course involving higher level scripting and NLP tools. This talk will lay out some parameters for student introductory work with HTRC materials and their processing, particularly as the HTRC materials relate to a parallel archive, and it will solicit future possible applications or pedagogical approaches using these starting points. 


One Hundred Years of American Science: Topic Modeling of Scientific Journals in HathiTrust (Shawn Martin)
What if we could model the majority of American scientific articles for an entire century? What might this data tell researchers about the development of science? Could it help understand professionalization and scholarly communication patterns in the future? This paper uses topic modeling and statistical analysis of keywords within early American scientific journals in order to better understand the professionalization of American science in the late nineteenth century. The American Journal of Science was the first regularly published scientific journal in the United States, starting in 1819, and the Journal of the American Chemical Society was a specialized scientific journal starting in 1879. Using the full text of these journals from HathiTrust and topic modeling their content for the first one hundred years (1819-1922), it becomes clear that the professionalization of science had much to do with external factors affecting science in the U.S. Topics shift within the American Journal of Science between 1871 and 1897, at exactly the same period when specialized scientific professional societies such as the American Chemical Society form. Additionally, within the American Chemical Society, it was not until the 1890s that issues of professional identity became prominent. Both of these trends reflect wider trends of professionalization within universities, other professions such as medicine, and government in the late nineteenth century. Understanding how science developed may help to understand how scientific dissemination patterns have responded to outside pressures in the past, and may continue to do so as digital technologies influence scholarly communication.


Collections as Data on labs.loc.gov (Abigail Potter, Jaime Mears, Meghan Ferriter and Katherine Zwaard)
The Library of Congress launched labs.loc.gov as a place for innovation and a pathway to enable more computational use of Library collections. The Labs staff will update the HTRC community on initiatives of interest, including an upcoming OCR challenge, our Innovator-in-Residence program, and digital scholarship workshops.


The Representation of National Canons of Prestige in the HathiTrust Collection (Lisa Teichmann) 
**CANCELLED due to illness** 
To what extent does the HathiTrust collection represent national canons of fiction and prestigious canons of world literature? How can it be expanded to be a valuable source in education beyond academia? Based on this question, this presentation aims at giving insights into the national canons of fiction in German and Turkish within the HathiTrust collection as well as reflections on the representation of other canons of literary prestige, such as bestseller lists and translations in the mentioned languages.
Two annotated datasets for each cultural context will be used:
1. national canon: high school reading lists
2. canon of prestigious works/world literature: lists of bestsellers, most translated authors and works
Statistical measures presented in this project include the percentage of works on these lists in the HT collection, author gender, period and genre. I hope to illustrate how the collection incorporates national canons of prestige and present a project under development to curate a core corpus of national canons of fiction for German and Turkish that could be a useful resource in literary education.
Further, this project addresses the broader question of HathiTrust in literary education and establishing the HT collection as a valuable resource of digital pedagogy in high school classrooms.

Speakers
Jason Cohen

Berea College
Meghan Ferriter

Library of Congress
Supporting digital scholarship, Crowdsourcing, Access & Use of Digital Collections, Piloting & Evaluating, Partnerships, plus "What are you excited about in/for 2018?"
Shawn Martin

Indiana University Bloomington
Stanislav Pejša

data curator, Purdue University


Thursday January 25, 2018 4:00pm - 4:30pm
Moffitt Library, 5th floor

4:30pm

5:00pm

Poster Sessions
Names of Trees and Dendrograms
Niek Veldhuis and Erin Becker


Bringing NL Tools to Digital Media: Using the LAPPS Grid for HTRC Data Capsules 
James Pustejovsky, Marc Verhagen, Kyeongmin Rim, Yu Ma, Samitha Liyanage, Jaimie Murdock, Robert McDonald and Beth Plale


Literary Genre in Digital Humanities Research 
Brian Matzke


Reconstructing historical libraries: Jefferson, Darwin, and the HathiTrust 
Jaimie Murdock and Colin Allen


WEB OF SCIENCE DATABASE: KNOWLEDGE OF THE HOPI; A Prototype for the Future Research: Sampling and Patterns 
Arina Melkozernova


Including Phrases in Bags of Words 
Peter Organisciak


Reconciling Accessibility and Open Access Platforms 
Anne Ferguson and Anushah Hossain


Defining the Cooperesque Novel in Nineteenth-Century France 
Mark Wolff


Are We Speaking the Same Language? Analysis of Librarian and Library User Conversation
Alexander Justice


The Difference Between flood*, #flood, and Other Things You Learn From 250 million Tweets: Supporting Textual Analysis of Twitter Data at a Small Liberal Arts College
Elizabeth Rodrigues

Speakers
Colin Allen

University of Pittsburgh
Erin Becker

Data Carpentry
Anne Ferguson

UC Berkeley Law
Anushah Hossain

UC Berkeley
Alexander Justice

Reference Librarian, Loyola Marymount University
I work with my university community, especially students, to find helpful information and to promote information literacy. As a library liaison, I work with the faculty of History, Modern Languages, and Art Therapy. Extracurricular interests include Scottish Gaelic and Hungarian...
Samitha Liyanage

Indiana University
Yu (Marie) Ma

HathiTrust Research Center
Brian Matzke

University of Michigan
Robert McDonald

Associate Dean for Research and Technology Strategies, Indiana University
As the Associate Dean for Research and Technology Strategies, Robert H. McDonald works to provide library information system services and discovery services to the entire IU system and manages projects related to scholarly communications, new model publishing, and technologies th...
Arina Melkozernova

Arizona State University
Jaimie Murdock

Indiana University Bloomington
Jaimie Murdock is a joint PhD student in Cognitive Science and Informatics. He studies the construction of knowledge representations and the dynamics of expertise. While majoring in two scientific disciplines, most of Jaimie's research occurs in the digital humanities, where he u...
Peter Organisciak

University of Denver
Beth Plale

Indiana University Bloomington
Science Director, Pervasive Technology Institute | Director, Data To Insight Center | Professor, Informatics and Computing | Indiana University
James Pustejovsky

Brandeis University
Kyeongmin Rim

Brandeis University
Elizabeth Rodrigues

Grinnell College
@letsshall
Niek Veldhuis

UC Berkeley
Marc Verhagen

Brandeis University
Mark Wolff

Hartwick College


Thursday January 25, 2018 5:00pm - 7:00pm
BIDS (190 Doe Library)

5:00pm

Reception
Chat with colleagues over appetizers, beer, wine and soft drinks.

Thursday January 25, 2018 5:00pm - 7:00pm
Morrison (101 Doe Library)
 
Friday, January 26
 

8:00am

Breakfast
Friday January 26, 2018 8:00am - 8:30am
Moffitt Library, 5th floor

8:00am

Registration Table Open
Friday January 26, 2018 8:00am - 10:00am
Moffitt Library, 4th floor

8:30am

Keynote: David Mimno: Consistency and Confidence in the Million-book library
Title: Consistency and Confidence in the Million-book library
The promise of digitized million-book libraries is that we can get reliable measurements of complicated historical and cultural processes. In this talk I'll present a general framework for many of the most popular analytics of large-scale text, including topic models and word embeddings. Based on this intuition I will show both the promise and potential pitfalls of such analyses. Through several case studies I will present recommendations on how researchers should get the most consistent, confident results, and how we might collectively make HathiTrust more reliable.

Speakers
David Mimno

Cornell University


Friday January 26, 2018 8:30am - 9:15am
Moffitt Library, 5th floor

9:15am

Break (travel time in between session venues)
Friday January 26, 2018 9:15am - 9:30am
TBA

9:30am

Curriculum and Instruction
Session Moderator: Robert H. McDonald


Disrupting the Silence of HathiTrust Text: Using Computational Text Analysis to Enhance Music History in the Classroom (Olivia Wikle)
Undergraduate music history instructors often supplement textbook material on eighteenth- and nineteenth-century music by providing students with access to scores and modern recordings of compositions. However, undergraduates are rarely exposed to reviews, periodicals, or aesthetic and theoretical literature written in reaction to historical performances. These writings range from describing the music itself to analyzing the styles of composers and performers, thereby situating the music within its historical and ideological context and providing crucial insight into music’s broader cultural significance. The HathiTrust Digital Library facilitates access to a wealth of eighteenth- and nineteenth-century texts describing musical performances and composers, many of which have not been used to their full potential by music history educators. I will demonstrate how methods of computational text analysis facilitated by the HathiTrust Research Center can be incorporated into the undergraduate classroom to enhance students’ understanding of historical perceptions of music. The instructional activity I propose complements traditional pedagogical methods of listening to music, studying scores, and reading secondary literature by using the HathiTrust Research Center to analyze and explore worksets of literature contemporary to eighteenth- and nineteenth-century music and composers on a larger scale. The text analysis process will serve to introduce students to digital scholarship methods, and the results will instill in students a deeper appreciation of how music was perceived in the cultural context of its conception. This pedagogical model also has the potential to be adapted for use in graduate courses by incorporating data capsules for more advanced analysis.


Digital Pedagogy and Contemplation in Higher Education: The Contemplative Technopedagogy Framework (Justin Shanks)
This talk will present ongoing research about digital pedagogy and introduce a new framework for designing, utilizing, and assessing the possible role(s) of digital technology in higher education. The Contemplative Technopedagogy Framework (CTF) requires an educator to simultaneously consider both the positive and negative aspects of a digital technology. CTF creates a teaching-learning environment that necessitates purposeful and engaged approaches to pedagogical practices involving digital technology. Non-contemplative technopedagogy leads to uncritical adoption or knee-jerk dismissal of digital technology. Whether adoptive or dismissive, non-contemplative pedagogical decisions have substantial consequences for both educators and learners. Therefore, higher education must concern itself with the ways in which contemplation can inform instructional decisions involving digital technology. Contemplation involves thinking carefully, deeply, and attentively about a topic. Integrating contemplation into pedagogy takes many forms and has diverse meanings. Contemplative pedagogy can emphasize the value of incorporating mindfulness exercises into coursework or can also be integrated into curriculum through activities that provoke reflection, compassion, commitment, non-judgement, and creativity among students. CTF focuses on the digital technology aspects of contemplative pedagogy and asks the educator to make purposeful decisions about when, which, to what extent, how, with whom, and for what purpose to use digital technology. Developed through a review of literature, the CTF includes Pedagogy Focused, Learner Focused, Technology Focused, Attention Focused, and Context Focused attributes. While it is important for contemporary educators to pay close attention to digital technologies, they must incorporate CTF attributes into pedagogical decision-making to enhance the teaching-learning environment.


Using Voyant with HTRC Volumes (Tassie Gniady & Robert McDonald)
By working in conjunction with HTRC staff and scholars at Indiana University, we have brought about a marriage of non-consumptive analysis that can be carried out by novice text miners. An Open Humanities project at IU brings together nine scholars in a “friendly cloud space for thinking about Kurt Vonnegut and why his writing matters today.” Together these scholars have been reading and posting their thoughts on Vonnegut’s novels at http://salo.iu.edu/. In keeping with Vonnegut’s far-reaching scientific imagination, his texts are prime fodder for computational analysis. However, we wanted these scholars to be able to perform analysis easily. To that end, we loaded Voyant into a data capsule along with all of his works that are contained in the HathiTrust. Then we performed some sample analyses on individual texts as well as the corpus. Finally, we will be teaching the members of Salo University this workflow for future inclusion in their blog posts. We are excited that the HTRC staff is working to include Voyant in every new data capsule that is spun up to lower the barrier to entry for new users, and we are equally excited that the director of Salo University wants to encourage scholars new to DH workflows to become familiar with the opportunities afforded by non-consumptive research.

Speakers
Robert McDonald

Associate Dean for Research and Technology Strategies, Indiana University
As the Associate Dean for Research and Technology Strategies, Robert H. McDonald works to provide library information system services and discovery services to the entire IU system and manages projects related to scholarly communications, new model publishing, and technologies th...
Justin Shanks

Montana State University
Digital Scholarship Librarian and Interim Department Head of Digital Library Initiatives at Montana State University Library. Director of MSU's Data Infrastructure and Scholarly Communication (DISC) group (montana.edu/disc). Ready to defend a dissertation examining the historical...
Olivia Wikle

Indiana University
Master of Library Science student at Indiana University interested in digital scholarship and pedagogy.


Friday January 26, 2018 9:30am - 10:45am
AIS (117 Dwinelle)

9:30am

Non-Consumptive Research and Copyright Protected Text
Session Moderator: Cody Hennesy
U.S. text and data mining researchers creating or working with digitized text collections operate within a complex terrain of federal copyright law and fair use rights, and often confront contractual terms through license agreements or website terms of use that overlay these rights. With reference to key court opinions, this panel will address the legal landscape of non-consumptive text analysis, and explore how scholars' research and publishing needs will continue to shape the contours of fair use.

Speakers
Ben Depoorter

Professor of Law, University of California, Hastings
Molly Van Houweling

Professor of Law, University of California, Berkeley
Rachael Samberg

Scholarly Communications Officer, University of California, Berkeley


Friday January 26, 2018 9:30am - 10:45am
BIDS (190 Doe Library)

9:30am

Use Cases
Session Moderator: Kathryn Stine


Subject headings and beyond: Mapping the HathiTrust Digital Library content for wider use (Trevor Edelblute, Angela Zoss, Inna Kouper)
With over 15 million volumes of digitized materials from many academic libraries, HathiTrust Digital Library (HTDL) provides opportunities for various types of research. Projects that use HTDL examine works of fiction, build predictive algorithms and genre classifications, and create cultural models of text. Despite a number of creative uses of HTDL already in place, enabling wider computational use of this digital library is still a challenge. Such a challenge can be addressed through training, outreach, and tool creation. We propose a complementary approach to the outreach efforts that relies on a scientometric perspective and argue that more sophisticated representations of HTDL contents that go beyond simple summaries of its metadata can invite more and varying types of research. If more researchers know about what is available, for example, from the social sciences domains, they may be more willing to incorporate HTDL in their teaching and research. To test this, a range of HTDL content summaries and visualizations is needed. In this panel presentation we will describe our approach to using subject headings and other metadata in mapping topical content and evolution of books in three disciplines: sociology, anthropology, and psychology. We will describe our workflow for improving metadata quality and supplementing missing metadata, discuss our approach to record deduplication, and, finally, provide visualizations using metadata and title analysis. In conclusion, we will invite the audience to discuss the challenges of mapping the boundaries and comprehensiveness of HTDL in particular domains.


Digital Text Analysis for Quantifying Descriptivity of Writing (Sayan Bhattacharyya, Alex Anderson and José Eduardo González)
Our project is to quantify the notion of descriptiveness, or descriptivity, in writing. Digital text analysis using the resources of the HathiTrust Research Center offers an opportunity to operationalize the anecdotal notion of descriptivity by developing quantified metrics for it. Parameters such as the relative number and other features of adjective-noun co-occurrence, computed using part-of-speech-tagged tokens made available by the HathiTrust Research Center, can serve as a proxy for the “descriptivity” of a text. Working from the existing, publicly distributed HathiTrust Research Center part-of-speech-tagged tokens, we will create an initial estimation as to which volumes constitute the most and least “descriptive” volumes (as well as the most and least descriptive pages within those volumes) in relation to those parameters.
Although these metrics will have many and varied uses, we, as literary scholars, are interested in them because of our interest in a conjecture made by several critics: that writing has, since World War II, taken an overall turn away from "tell" towards "show", which arguably means that writers are becoming increasingly interested in description. While this is considered primarily an Anglo-American phenomenon, there is reason to believe that the trend has extended into non-Anglophone writing, too: several critics have argued that, globally, writing is becoming increasingly homogeneous and self-similar, a hypothesis our metrics can assess.
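
As a rough illustration of the kind of metric described in this abstract, the sketch below scores a page by its ratio of adjectives to nouns and ranks pages by that score. It is not the presenters' method. It assumes that each page is available as a mapping from token to part-of-speech tag counts, and that tags follow the Penn Treebank convention; the names adjective_noun_ratio and rank_pages are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: Penn Treebank tags for adjectives and nouns.
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Assumed page shape: token -> {POS tag -> count on that page}.
PagePosCounts = Dict[str, Dict[str, int]]


def adjective_noun_ratio(page: PagePosCounts) -> float:
    """Return adjectives per noun on a page (0.0 if the page has no nouns)."""
    adjectives = 0
    nouns = 0
    for tag_counts in page.values():
        for tag, count in tag_counts.items():
            if tag in ADJECTIVE_TAGS:
                adjectives += count
            elif tag in NOUN_TAGS:
                nouns += count
    return adjectives / nouns if nouns else 0.0


def rank_pages(pages: List[PagePosCounts]) -> List[Tuple[int, float]]:
    """Return (page index, score) pairs, most 'descriptive' page first."""
    scored = [(i, adjective_noun_ratio(p)) for i, p in enumerate(pages)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy pages, not real HathiTrust data.
    sparse = {"dog": {"NN": 4}, "old": {"JJ": 1}}
    lush = {"green": {"JJ": 2}, "hill": {"NN": 2}, "runs": {"VBZ": 1}}
    for index, score in rank_pages([sparse, lush]):
        print(index, round(score, 2))
```

Because bag-of-words page counts carry no word order, this ratio only approximates adjective-noun co-occurrence; a metric based on adjacent adjective-noun pairs would need sequential text.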


Multi-level computational methods for interdisciplinary research in the HathiTrust Digital Library (Jaimie Murdock and Colin Allen)
We present the results of an NEH Digging into Data Challenge Grant, which partnered with the HTRC from 2012 to 2014. We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. We provide a case study for the application of the methods to the problem of identifying and extracting arguments about anthropomorphism during a critical period in the development of comparative psychology. Through a novel approach of “drill-down” topic modeling (simultaneously reducing both the size of the corpus and the unit of analysis), we are able to reduce a large collection of full-text volumes to a much smaller set of pages within six focal volumes containing arguments of interest to historians and philosophers of comparative psychology. The volumes identified in this way did not appear among the first ten results of the keyword search in the HathiTrust Digital Library, and the pages bear the kind of “close reading” needed to generate original interpretations that is at the heart of scholarly work in the humanities. The multilevel approach advances understanding of the intellectual and societal contexts in which writings are interpreted. This work was recently published in PLOS ONE: https://doi.org/10.1371/journal.pone.0184188


The University of California ClioMetric History Project (Zach Bleemer)
Studies of higher education in the United States have long been limited by historical data availability and minimal centralized data collection. The University of California ClioMetric History Project leverages HathiTrust collections of university registers, course catalogs, and professional directories, along with student transcripts and other university-held records, to visualize and analyze universities' contributions to California's 20th-century growth, health, economic mobility, and gender/ethnic equality.

Speakers
Colin Allen

University of Pittsburgh
Alex Anderson

University of Pennsylvania
Sayan Bhattacharyya

Price Lab for Digital Humanities, University of Pennsylvania
Zach Bleemer

UC Berkeley
Trevor Edelblute

Indiana University Bloomington
José Eduardo González

University of Nebraska-Lincoln
Inna Kouper

Indiana University
Jaimie Murdock

Indiana University Bloomington
Jaimie Murdock is a joint PhD student in Cognitive Science and Informatics. He studies the construction of knowledge representations and the dynamics of expertise. While majoring in two scientific disciplines, most of Jaimie's research occurs in the digital humanities, where he u...
Angela Zoss

Duke University


Friday January 26, 2018 9:30am - 10:45am
Moffitt Library, 5th floor

10:45am

Coffee Break
Friday January 26, 2018 10:45am - 11:00am
Moffitt Library, 5th floor

11:00am

Hands-On Sessions
Session Moderator: Cody Hennesy
Data Science Modules for Teaching HTRC at Berkeley (Chris Hench, Alex Chan)
As part of the modules development effort within the Division of Data Sciences at Berkeley, we’ve worked with the UC Berkeley Library to create a module that highlights the available digital resources in the realm of literature. The HTRC provides several ways to access and download data, which can be harnessed to answer research questions central to the humanities. This module utilizes Python in Jupyter notebooks to demonstrate the ease and tremendous research potential of constructing a large corpus and analyzing texts through visualization, mapping, and machine learning.


TextThresher: Qualitative Text Analysis at a Quantitative Scale (Nick Adams, Norman Gilmore)
Text Thresher improves the social science practice of content analysis, making it vastly more transparent and scalable to hundreds of thousands of documents. Text Thresher is a web interface operating in citizen science and crowd working environments like CrowdCrafting. The interface allows researchers to clearly specify hand-labeling and text classification tasks in a user-friendly workflow that maximizes crowd worker accuracy and efficiency. As citizen scientists or crowd workers label and extract data from thousands of documents using Text Thresher, they simultaneously generate training sets enabling machine learning algorithms to augment or replace researchers' and crowd workers' efforts. Output is ready for a range of computational text analysis techniques and viewable as labels layered over original document text.


Collections as Data on labs.loc.gov (Abigail Potter, Jaime Mears, Meghan Ferriter and Katherine Zwaard)
The Library of Congress launched labs.loc.gov as a place for innovation and a pathway to enable more computational use of Library collections. The Labs staff will update the HTRC community on initiatives of interest, including an upcoming OCR challenge, our Innovator-in-Residence program, and digital scholarship workshops.

Speakers
Nick Adams

University of California, Berkeley
I'm a sociologist and data scientist at the Berkeley Institute for Data Science and the founder of GoodlyLabs. I create research tools to get the crowd in on solving big problems.
Alex Chan

University of California, Berkeley
Meghan Ferriter

Library of Congress
Supporting digital scholarship, Crowdsourcing, Access & Use of Digital Collections, Piloting & Evaluating, Partnerships, plus "What are you excited about in/for 2018?"
Norman Gilmore

Text Thresher
Chris Hench

University of California, Berkeley


Friday January 26, 2018 11:00am - 12:00pm
Moffitt Library, 5th floor

12:00pm

Lunch (box lunches provided)
Friday January 26, 2018 12:00pm - 1:30pm
Moffitt Library, 5th floor

1:30pm

Curriculum Development
Speakers
Eleanor Dickson

University of Illinois; HTRC
HathiTrust Research Center, text analysis, #dlfteach, digital library pedagogy
Cody Hennesy

Librarian, UC Berkeley


Friday January 26, 2018 1:30pm - 2:30pm
Moffitt Library, 5th floor

1:30pm

HathiTrust Quality Metadata - Use Cases and Implementation Approaches
The HathiTrust Quality Assurance and Standards Working Group is finalizing a metadata schema for the aggregation of metadata relating to an item's quality across all HathiTrust systems. Once the schema is formally approved, the group will begin drafting a document to recommend an implementation strategy and details. Please bring your ideas on how metadata that attempts to characterize the quality of HathiTrust resources could be useful to you in your work.

Speakers
Paul Fogel

University of California


Friday January 26, 2018 1:30pm - 2:30pm
D-Lab (356 Barrows)

1:30pm

Secure Research Computing and Analytics Environments on Demand
Members of UC Berkeley Libraries, Research IT, Berkeley Research Computing, and the D-Lab will discuss challenges and opportunities for secure research computing, including the Analytics Environments on Demand (AEoD) service. AEoD is designed for researchers who need to run analytic software packages (such as ArcGIS, Stata, SPSS, and RStudio) on a platform that is scaled up from a standard laptop or workstation, in a Windows-based environment.

Speakers
Chris Hoffman

UC Berkeley, Research IT
Erik Mitchell

UC Berkeley
Amy Neeser

Research Data Management Program Manager, University of California Berkeley
As the University of California Berkeley's Research Data Management (RDM) Program Manager, I ensure the effective design and coordination of RDM issues and services across campus. I oversee the program and its services to academic departments and service units and am responsible...
Patrick Schmitz

UC Berkeley, Research IT
Patrick Schmitz is Associate Director of Research IT Architecture and Strategy, providing IT strategy and solutions in support of campus research; and Program Director of Berkeley Research Computing. He has provided technical leadership to build IT solutions supporting museums an...
Jon Stiles

UC Berkeley, D-Lab


Friday January 26, 2018 1:30pm - 2:30pm
BIDS (190 Doe Library)

2:30pm

Break (travel time in between session venues)
Friday January 26, 2018 2:30pm - 2:45pm
TBA

2:45pm

UnCamp Sessions (New Information)
All Sessions will be in the group spaces in Moffitt Library, 5th Floor.

1.) Interactive data visualizations - Facilitator (Doug) - Main presentation space

2.) GPU acceleration - Facilitator (Markus)

3.) How to deal with missing data when constructing a corpus - Facilitator (Amanda)

4.) Theory-building with CTA (Computational Text Analysis)? - Facilitator (Rebecca)

5.) Liberating Text Project - Facilitator (Johannes)

6.) Librarians Promoting HTRC and working w/ CTA and TDM Researchers - Facilitator (Nikki)

7.) OCR Discussion - Facilitator (Tom)

Friday January 26, 2018 2:45pm - 3:45pm
Moffitt Library, 5th floor

3:45pm

Coffee Break
Friday January 26, 2018 3:45pm - 4:00pm
Moffitt Library, 5th floor

4:00pm

Plenary and Wrap-Up
During this session, each UnCamp Session breakout group will have up to five minutes to discuss its session and to offer insight to others on these topics.

Friday January 26, 2018 4:00pm - 5:00pm
Moffitt Library, 5th floor