Research and Me

By education, I am a computational linguist. The approaches I've used have led me to the conclusion that I'm also a numbers nerd. Call it what you will - data scientist, statistical analyst, text miner - I love telling a story from data.

Areas of Interest

My main research interest is what I call Personal Language Analytics (PLA). PLA is a branch of text mining in which the object of analysis is the author of a document rather than the document itself. We talk a lot across a variety of social media, and there is much that language use can tell us about an individual. I'm particularly interested in the effects that individual differences have on more conventional content analytics such as sentiment analysis.

Other topics and techniques that I am interested in include:

  • User behaviour modelling
  • Text classification and machine learning
  • Statistical analysis of sports fantasy/tipping data
  • Application of language technology as a productivity tool

Doctoral Topic

My PhD thesis was titled The Language of Weblogs - A study of genre and individual differences. It was a linguistic study of online diaries - weblogs - in which I explored two hypotheses: that blogs are a linguistically distinctive genre, and that personality and gender are projected linguistically in blogs.

Some of my main findings included:

  • As a genre, blogs share elements with other types of computer-mediated communication and, compared with email, are more like traditional written genres; this held across a number of measures used to compare various corpora
  • Linguistic projection of the five factors of personality and of gender is identifiable using both top-down, theory-based dictionary approaches and data-driven statistical text mining
  • The most useful approach was that of context-based word n-grams, which accounted for the majority of the statistical variance within the dimensions of study

Collaborators

Some of the excellent people with whom I have had the pleasure and honour of working:

  • Alastair Gill: Research Fellow at the Department of Digital Humanities, King's College London. His work focuses on language processes in the expression and perception of characteristics such as personality, emotion, trust and group formation.
  • Jon Oberlander: Professor of Epistemics, University of Edinburgh. His main focus is the development of cognitively-motivated computational and formal models of the ways in which differing people produce fluent discourse.
  • Francisco Iacobelli: Assistant Professor in the Computer Science Department at Northeastern Illinois University. He is interested in intelligent strategies for information retrieval and in determining author personality from text.
  • Juanita Whalen: Doctoral candidate in psychology, University of Calgary. She is exploring the function of non-literal language in emails and weblogs.