Performing Gender: Automatic Stylistic Analysis of Shakespeare’s Characters
Sobhan HOTA, Shlomo ARGAMON, Moshe KOPPEL, Iris ZIGDON
Department of Computer Science, Illinois Institute of Technology
Department of Computer Science, Bar-Ilan University
1. Introduction
A recent development in the study of language and gender is the use of automated text classification methods to examine how men and women might use language differently. Such work on classifying texts by gender has achieved accuracy rates of 70-80% for texts of different types (e-mail, novels, non-fiction articles), indicating that noticeable differences exist (de Vel et al. 2002; Argamon et al. 2003). More to the point, though, the distinguishing language features that emerge from these studies are consistent, both with each other and with other studies on language and gender. De Vel et al. (2002) point out that men prefer ‘report talk’, which signifies more independence and proactivity, while women tend to prefer ‘rapport talk’, which signifies agreeing, understanding, and supportive attitudes. Work on more formal texts from the British National Corpus (Argamon et al. 2003) similarly shows that the male indicators are mainly noun specifiers (determiners, numbers, adjectives, prepositions, and post-modifiers), indicating an ‘informational’ style, while the female indicators are a variety of features indicating an ‘involved’ style (explicit negation, first- and second-person pronouns, present tense verbs, and the prepositions “for” and “with”). Our goal is to extend this research to analyze the relation between language use and gender for literary characters. To the best of our knowledge, there has been little work on understanding how novelists and playwrights
portray (if they do) differential language use by literary characters of different genders. To apply automated analysis techniques, we need a clean separation of the speech of different characters in a literary work. In novels, such speech is integrated