
TEXT MINING & STATISTICS FOR HISTORIANS

By Dr. Luke Blaxill (University of Oxford) and Dr. Kaspar Beelen (Turing Institute) provided to the German Historical Institute

​

A pair of courses designed to introduce historians first to text mining (theoretical approaches, practical skills, and applied methods) and secondly to statistics (analysing quantitative data with statistical techniques, interpreting results, and using them to build academically robust arguments).
ABOUT
These courses have been commissioned by the German Historical Institute London (GHI) and the Max Weber Foundation, and were funded by the German Federal Ministry of Education and Research. They are designed to introduce historians to text mining and statistics, to build skills in these areas, and to help course-takers develop the confidence and ability to use text mining, and the quantitative techniques often required to interpret its results, in their own historical research. The debate on computing in History, and indeed on the role of quantitative analyses in the subject, has been a long and involved one stretching back to the 1960s. While the popularity of both has declined as History's status as a humanities subject has become entrenched, the data age and the increasing influence of the social sciences have placed both issues prominently back on the agenda. These courses do not attempt to create acolytes, but to reflect critically on the scope, power, and limitations of these methodologies.

The course leaders are Dr. Luke Blaxill of the University of Oxford and Dr. Kaspar Beelen of the Turing Institute. The development of these modules was made possible by funding from the German Federal Ministry of Education and Research under grant number U1UG1903. Responsibility for the content lies with the developers.
COURSE 1: TEXT MINING FOR HISTORIANS
UNIT 1: INTRODUCTION TO TEXT MINING FOR HISTORIANS
An introduction to the potential uses, limitations, and implications of text mining in History

FULL COURSE
 

UNIT 2: DATA: SELECTION, COLLECTION, & STRUCTURE
An introduction to the potential uses, limitations, and implications of text mining in History

FULL COURSE
 

UNIT 3: BASIC ANALYSIS WITH ANTCONC pt1
An introduction to the potential uses, limitations, and implications of text mining in History

FULL COURSE
 

UNIT 4: BASIC ANALYSIS WITH ANTCONC pt2
An introduction to the potential uses, limitations, and implications of text mining in History

FULL COURSE
 

UNIT 5: (MORE) ADVANCED ANALYSIS: PYTHON pt1
An introduction to the potential uses, limitations, and implications of text mining in History

FULL COURSE
 

UNIT 6: (MORE) ADVANCED ANALYSIS: PYTHON pt2
An introduction to the potential uses, limitations, and implications of text mining in History

FULL COURSE

COURSE 2: STATISTICS FOR HISTORIANS
UNIT 1: INTRODUCTION TO STATISTICS FOR HISTORIANS
An introduction to using statistics in History

FULL COURSE
 

UNIT 2: BASIC DESCRIPTIVE STATISTICS
Mode, Mean, Median, Standard Deviation, Correlation, Scatterplots

FULL COURSE
 

UNIT 3: STATISTICAL SIGNIFICANCE & REGRESSION 
An introduction to statistical significance and regression

FULL COURSE
 

UNIT 4: INTRODUCTION TO SPSS STATISTICS
A very quick introduction to SPSS, and a guide to performing the analyses in Units 2 and 3

FULL COURSE
 

UNIT 5: EXPLORATIVE DATA ANALYSIS
Exploring DataFrames with Pandas

FULL COURSE
 

UNIT 6: HYPOTHESIS TESTING
More advanced inferential statistics

FULL COURSE
 

UNIT 7: (MORE) ADVANCED REGRESSION
Building on the basic introduction to regression in Lecture 3

FULL COURSE
 

​
DOWNLOADS
The five 'Exercise Corpora' below have been compiled and segmented by us from publicly available textual historical archives. They support the exercises for both courses and can be downloaded in full from GitHub.

Medical Officers of Health in London, 1848-1972
Medical Officers of Health (MOH) were appointed in each London borough to investigate, and report on, the health of the population, sanitary conditions, disease, housing, and clinical services. Our corpus enables comparisons between interwar and Victorian MOH, as well as between a wealthy borough (Westminster) and a poor one (Poplar).

British House of Commons Debates, 1945-2014
Taken from the official record of Parliament (Hansard). Our corpus enables comparisons between parties, ministers, and male vs female MPs. The debates we have selected concern the issue of abortion law.

Heritage made Digital Newspapers, 1800-1880
Nineteenth-century British articles from numerous newspapers. We have created two sub-corpora: all of the articles containing the word 'slavery' and all of the articles containing 'workhouse'. Both are subdivided by the decades which saw key legislation and campaigns on these issues.


British Election Manifestos, 1966-2019
The printed national manifestos of the Liberal, Labour, and Conservative parties in every general election from 1966 to the present. We have set this corpus up to enable comparisons between parties, and between two key decades: the 1960s (1964, 1966, 1970 elections) and the 1980s (1979, 1983, 1987 elections).


The Times Headlines from the 1960s onwards
The Times is often credited as being Britain's national 'newspaper of record'. This corpus is set up as a CSV file and contains every headline from every day.
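As a rough illustration of how a headline corpus stored as a CSV file might be explored, the sketch below counts headlines containing a keyword, grouped by year. It is only a sketch under stated assumptions: the column names `date` and `headline`, the ISO date format, and the three sample rows are all invented for this example and are not taken from the actual corpus, whose layout may differ.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for the real file. The column names
# "date" and "headline" are assumptions, not the corpus's documented schema.
SAMPLE_CSV = """date,headline
1965-01-25,Sir Winston Churchill dies
1966-04-01,Labour wins general election
1966-07-31,England win the World Cup
"""


def headlines_per_year(csv_file, keyword):
    """Count headlines containing `keyword` (case-insensitive), grouped by year."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if keyword.lower() in row["headline"].lower():
            # Assumes dates begin with a four-digit year (e.g. 1966-04-01).
            counts[row["date"][:4]] += 1
    return counts


election_counts = headlines_per_year(io.StringIO(SAMPLE_CSV), "election")
win_counts = headlines_per_year(io.StringIO(SAMPLE_CSV), "win")
```

Note that plain substring matching also catches "win" inside "Winston", which is the kind of limitation worth keeping in mind when interpreting keyword counts. To run the function on a downloaded file, one would pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is a placeholder for wherever the file has been saved.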
