H9808A 2009 10 Machines


Return to UWO History 9808A Digital History Fall 2009

Machine Learning and Data Mining (25 Nov 2009)

In Data Mining, Witten and Frank define the subject as “the extraction of implicit, previously unknown, and potentially useful information from data,” that is, the process of “finding and describing patterns in data.” Machine learning, a sub-discipline of computer science, goes one step further by using these patterns to classify previously unseen data. Historians are now beginning to use both kinds of techniques in their research.

Readings for Discussion

Background Readings

The following set of posts describes how to implement one complete machine learning / data mining project, using trial records from the Old Bailey Online. The links to the source code on my blog are broken, but copies of all of the Python programs can be found here. If you just want to get an idea of what I did, read posts 1 and 14. The naive Bayesian learner is described in post 7.
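To give a rough idea of what a naive Bayesian learner does, here is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing. It is not the code from the posts. The class name NaiveBayes, the tokenize helper and the four short training snippets (invented for illustration, loosely imitating trial descriptions) are all hypothetical. The classifier counts how often each word appears under each label, then picks the label whose word frequencies best account for a new, unseen text.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # number of training texts per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        words = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            # Sum log probabilities to avoid underflow from multiplying many small numbers.
            score = math.log(n_docs / total_docs)  # prior: share of training texts with this label
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            for w in words:
                # Add-one smoothing keeps unseen words from zeroing out a label.
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training snippets, for illustration only.
nb = NaiveBayes()
nb.train("stole a silver watch from the prosecutor", "theft")
nb.train("took two handkerchiefs and a coat", "theft")
nb.train("struck him on the head with a stick", "violence")
nb.train("stabbed the deceased with a knife", "violence")

print(nb.classify("stole a coat and a watch"))  # theft
```

With real trial records the training set would be thousands of texts rather than four, but the counting and scoring steps are the same.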

Assignment

Do some simple text mining. This week you learned about some sophisticated tools that humanists can use to process large amounts of text and facilitate exploration. Some of these techniques require programming skills, but many do not. The Canadian TAPoR project is a wonderful collection of resources that bring text processing and analysis within the reach of any scholar. Starting at the TAPoR recipes page, choose a historical text from Gutenberg and generate a concordance. What kinds of things can you learn about a work this way? Feel free to blog about the assignment if you find something interesting.
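TAPoR produces concordances without any programming. For anyone curious about what such a tool does behind the scenes, here is a minimal keyword-in-context (KWIC) sketch in Python. The function name concordance and its width parameter are hypothetical and are not part of TAPoR; the sketch assumes the Gutenberg text has already been loaded into a string. The sample sentence is the opening of Dickens's A Tale of Two Cities.

```python
import re


def concordance(text, keyword, width=30):
    """Hypothetical helper: return one keyword-in-context line per occurrence of keyword."""
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keyword always starts at column `width`.
        lines.append(f"{left:>{width}}{match.group()}{right}")
    return lines


sample = "It was the best of times, it was the worst of times, it was the age of wisdom"
for line in concordance(sample, "times", width=20):
    print(line)
```

Lining the keyword up in a single column is what makes a concordance easy to scan: repeated neighbouring words stand out when you read down the page.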
