Name: Hands-On Sessions
Start: 2018-01-26T11:00:00-0800
End: 2018-01-26T12:00:00-0800

Welcome to the fourth iteration of the semi-annual HathiTrust Research Center (HTRC) UnCamp. This is where members of the HTRC community gather to explore the latest developments in using HTRC tools and services to analyze the HathiTrust Digital Library corpus. Visit https://www.hathitrust.org/htrc_uncamp2018 for more information or see our online proceedings at https://osf.io/view/htrc_uncamp2018 hosted by OSF Meetings.

Hands-On Sessions

Session Moderator: Cody Hennesy
Data Science Modules for Teaching HTRC at Berkeley (Chris Hench, Alex Chan)
As part of the modules development effort within the Division of Data Sciences at Berkeley, we’ve worked with the UC Berkeley Library to create a module that highlights the available digital resources in the realm of literature. The HTRC provides several ways to access and download data, which can be harnessed to answer research questions central to the humanities. This module utilizes Python in Jupyter notebooks to demonstrate the ease and tremendous research potential of constructing a large corpus and analyzing texts through visualization, mapping, and machine learning.

TextThresher: Qualitative Text Analysis at a Quantitative Scale (Nick Adams, Norman Gilmore)
Text Thresher improves the social science practice of content analysis, making it vastly more transparent and scalable to hundreds of thousands of documents. Text Thresher is a web interface that operates in citizen science and crowd work environments such as CrowdCrafting. The interface allows researchers to clearly specify hand-labeling and text classification tasks in a user-friendly workflow that maximizes crowd worker accuracy and efficiency. As citizen scientists or crowd workers label and extract data from thousands of documents using Text Thresher, they simultaneously generate training sets that enable machine learning algorithms to augment or replace researchers' and crowd workers' efforts. Output is ready for a range of computational text analysis techniques and is viewable as labels layered over the original document text.

Collections as Data on labs.loc.gov (Abigail Potter, Jaime Mears, Meghan Ferriter and Katherine Zwaard)
The Library of Congress launched labs.loc.gov as a place for innovation and a pathway to enable more computational use of Library collections. The Labs staff will update the HTRC community on initiatives of interest, including an upcoming OCR challenge, our Innovator-in-Residence program, and digital scholarship workshops.

Speakers

Nick Adams

University of California, Berkeley

I'm a sociologist and data scientist at the Berkeley Institute for Data Science and the founder of GoodlyLabs. I create research tools to get the crowd in on solving big problems.

Alex Chan

University of California, Berkeley

Meghan Ferriter

Senior Innovation Specialist, Library of Congress

Senior Innovation Specialist with Library of Congress Labs. Anthropologist and historian by training; thoughty by nature. Come find me to talk about supporting digital scholarship, crowdsourcing, access & use of digital collections, piloting & evaluating, collaboration and partnerships...

Norman Gilmore

Text Thresher

Chris Hench

University of California, Berkeley

Friday January 26, 2018 11:00am - 12:00pm PST
Moffitt Library, 5th floor

General Session

HTRC UnCamp 2018

Attendees (34)