TEXT MINING FOR HISTORIANS COURSE
UNIT 2: DATA - SELECTING, COLLECTING & STRUCTURING A CORPUS
Lecture (24 Mins)
KEY READINGS
* Martin Wynne, 'Creating a corpus' and 'Developing Linguistic Corpora: A Guide to Good Practice' (2005).
There are also dedicated chapters on corpus creation in:
* Tony McEnery and Andrew Hardie, Corpus Linguistics: Method, Theory and Practice (Cambridge, 2011)
* Magali Paquot and Stefan Th. Gries, A Practical Handbook of Corpus Linguistics (Springer, 2020)
* Svenja Adolphs, Introducing Electronic Text Analysis: A Practical Guide for Language and Literary Studies (Trowbridge, 2006)
EXERCISES
Alone or with a friend, consider what sort of corpus might support your research. What primary source texts are there, and what sort of research questions would be interesting to ask of them? In particular, consider:
1.) How would you structure such a corpus with metadata? What comparative groups are there for you to work with? Would you need to add metadata with markup of some kind?
2.) Where are the sources located? Are they available digitally? If not, could you digitise them yourself?
3.) Is your data structured, semi-structured, or unstructured?
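As a starting point for question 1, here is a minimal Python sketch of one way a corpus could be structured with metadata so that comparative groups fall out of it. Everything in it is an assumption made for illustration: the three documents, the field names (`id`, `year`, `genre`, `text`) and the helper functions `group_by` and `token_count` are hypothetical, and whitespace splitting is only a crude stand-in for proper tokenisation.

```python
from collections import defaultdict

# Hypothetical mini-corpus: each record pairs unstructured text
# with structured metadata fields (all values invented for illustration).
corpus = [
    {"id": "doc1", "year": 1850, "genre": "letter",
     "text": "Dear sir, the harvest was poor this year."},
    {"id": "doc2", "year": 1850, "genre": "newspaper",
     "text": "The harvest failed across the county."},
    {"id": "doc3", "year": 1900, "genre": "letter",
     "text": "Dear madam, the railway has arrived."},
]


def group_by(records, field):
    """Split records into comparative groups keyed by one metadata field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)


def token_count(records):
    """Count whitespace-separated tokens across a group of records."""
    return sum(len(record["text"].split()) for record in records)


# Compare the size of each genre sub-corpus.
by_genre = group_by(corpus, "genre")
counts = {genre: token_count(docs) for genre, docs in by_genre.items()}
print(counts)
```

The same `group_by` call works with `"year"` or any other field you record, which is one reason to decide on your metadata before you start collecting: each field you add is a comparison you can later make.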