HTRC UnCamp 2018 has ended

Welcome to the fourth iteration of the semi-annual HathiTrust Research Center (HTRC) UnCamp. This is where members of the HTRC community gather to explore the latest developments in using HTRC tools and services to analyze the HathiTrust Digital Library corpus. Visit https://www.hathitrust.org/htrc_uncamp2018 for more information or see our online proceedings at https://osf.io/view/htrc_uncamp2018 hosted by OSF Meetings.

Friday, January 26 • 11:00am - 12:00pm
Hands-On Sessions

Session Moderator: Cody Hennesy
Data Science Modules for Teaching HTRC at Berkeley (Chris Hench, Alex Chan)
As part of the modules development effort within the Division of Data Sciences at Berkeley, we’ve worked with the UC Berkeley Library to create a module that highlights the available digital resources in the realm of literature. The HTRC provides several ways to access and download data, which can be harnessed to answer research questions central to the humanities. This module utilizes Python in Jupyter notebooks to demonstrate the ease and tremendous research potential of constructing a large corpus and analyzing texts through visualization, mapping, and machine learning.

TextThresher: Qualitative Text Analysis at a Quantitative Scale (Nick Adams, Norman Gilmore)
Text Thresher improves the social science practice of content analysis, making it vastly more transparent and scalable to hundreds of thousands of documents. Text Thresher is a web interface that operates in citizen science and crowd working environments like CrowdCrafting. The interface allows researchers to clearly specify hand-labeling and text classification tasks in a user-friendly workflow that maximizes crowd worker accuracy and efficiency. As citizen scientists or crowd workers label and extract data from thousands of documents using Text Thresher, they simultaneously generate training sets that enable machine learning algorithms to augment or replace researchers' and crowd workers' efforts. Output is ready for a range of computational text analysis techniques and is viewable as labels layered over the original document text.

Collections as Data on labs.loc.gov (Abigail Potter, Jaime Mears, Meghan Ferriter and Katherine Zwaard)
The Library of Congress launched labs.loc.gov as a place for innovation and a pathway to enable more computational use of Library collections. The Labs staff will update the HTRC community on initiatives of interest, including an upcoming OCR challenge, our Innovator-in-Residence program, and digital scholarship workshops.


Nick Adams

University of California, Berkeley
I'm a sociologist and data scientist at the Berkeley Institute for Data Science and the founder of GoodlyLabs. I create research tools to get the crowd in on solving big problems.

Alex Chan

University of California, Berkeley

Meghan Ferriter

Senior Innovation Specialist, Library of Congress
Senior Innovation Specialist with Library of Congress Labs. Anthropologist and historian by training; thoughty by nature. Come find me to talk about supporting digital scholarship, crowdsourcing, access & use of digital collections, piloting & evaluating, collaboration and partnerships...

Norman Gilmore

Text Thresher

Chris Hench

University of California, Berkeley

Friday January 26, 2018 11:00am - 12:00pm PST
Moffitt Library, 5th floor