What I do

My Resume

download here

Professional Interests
  • Computational linguistics, statistical natural language processing, text data mining, knowledge discovery from unstructured data, information retrieval, and machine learning.
Current Work

At Uptake, my work involves automatically identifying and classifying travel-related Web sites and extracting relevant information from them. I also perform sentiment analysis on sites to infer people's attitudes toward them.

Previous Work

Previously at H5, I worked on text categorization and knowledge discovery projects within the legal domain. Most of my work involved automatically processing large document corpora to identify textual and linguistic patterns that help to classify a document into one of a number of desired categories or to assess the relevance of a document to a topic or question.

At Iowa State University, I worked on two projects that applied machine learning and text categorization techniques to linguistic problems. The first project was the automatic classification of language learner writing into language proficiency levels. This project involved statistical analyses of a number of textual features, the identification of reliable and automatically measurable linguistic features indicative of proficiency levels, and manual annotation and analysis of data. The goals of this project were twofold: (i) to develop reliable automatic evaluation software, and (ii) to provide comparative data on second language development based on writers' first languages.

The second project, which was part of the Study for the Termination of Online Predators (STOP), involved automatic detection of child/pedophile communication in online text chats. The goals of this project were also twofold: (i) to develop a software application that can flag a text chat as suspicious (for law enforcement officials or for parents/guardians), and (ii) to provide a better understanding of child/predator communication. The other co-investigators on this project were Chad Harms (Greenlee School of Journalism and Communication/HCI) and Brian Monahan (Sociology).

Thesis Abstract

My thesis advocates a modular and parallel grammar architecture with declarative constraints on the syntactic, semantic, prosodic, and pragmatic structures, which are derived in parallel while mutually constraining one another, as proposed by Jackendoff (1997, 2002). The main claim of this thesis is that because of the many conflicting requirements among modules, the interfaces cannot employ crisp constraints. Instead, a soft-constraint satisfaction approach is required. We also argue that simply violable constraints are insufficient to account for certain linguistic phenomena; there is a need for graded constraints that allow for degrees of violation.

The dissertation first provides a review of different conceptions of gradience in linguistics, followed by a review of the concept of modularity in cognitive science and linguistics. Because the problem of conflicting requirements in the field of Constraint Logic Programming (CLP) has led to various soft-constraint satisfaction approaches, the dissertation then presents a generalized theory of soft-constraint satisfaction (Bistarelli, 2001) from the CLP literature. It next presents a case study of graded constraints showing that such constraints exist at interfaces and that they can exhibit degrees of violation. Another case study shows that the modular parallel architecture allows for simpler modules and is able to capture generalizations better. We then conclude by showing how the generalized theory of soft-constraint satisfaction can be incorporated within grammar without disrupting the existing explanatory power of constraint-based theories such as LOT (Keller, 2000) and HPSG (Pollard and Sag, 1994).

This thesis was co-supervised by Elizabeth Cowper and Gerald Penn. Other committee members were Elan Dresher, Jean-Pierre Koenig, and Frank Keller.
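
The distinction the abstract draws between crisp, simply violable, and graded constraints can be illustrated with a small sketch. This is not code from the thesis: the candidate structures, the two constraints, and their weights are all hypothetical, and the scoring uses only the weighted instance of semiring-based soft constraints (costs combined by addition, candidates compared by minimum), chosen here for simplicity.

```python
# Illustrative sketch only. The candidates, constraint names, and weights are
# hypothetical and are not taken from the thesis.

# Each constraint is (weight, function returning a non-negative degree of violation).
CONSTRAINTS = [
    (2, lambda s: s["misaligned_boundaries"]),  # hypothetical prosody/syntax interface constraint
    (3, lambda s: s["unaccented_foci"]),        # hypothetical prosody/information-structure constraint
]

CANDIDATES = [
    {"name": "A", "misaligned_boundaries": 1, "unaccented_foci": 0},
    {"name": "B", "misaligned_boundaries": 3, "unaccented_foci": 0},
    {"name": "C", "misaligned_boundaries": 0, "unaccented_foci": 1},
]


def crisp_ok(candidate, constraints):
    """Crisp satisfaction: acceptable only if no constraint is violated at all."""
    return all(degree(candidate) == 0 for _, degree in constraints)


def violable_cost(candidate, constraints):
    """Simply violable: a violated constraint costs its weight once, however severe."""
    return sum(weight for weight, degree in constraints if degree(candidate) > 0)


def graded_cost(candidate, constraints):
    """Graded: the cost grows with the degree of violation."""
    return sum(weight * degree(candidate) for weight, degree in constraints)


def best(candidates, constraints, cost):
    """Pick the lowest-cost candidate (min over additively combined costs)."""
    return min(candidates, key=lambda s: cost(s, constraints))


if __name__ == "__main__":
    for s in CANDIDATES:
        print(s["name"],
              crisp_ok(s, CONSTRAINTS),
              violable_cost(s, CONSTRAINTS),
              graded_cost(s, CONSTRAINTS))
    print("graded winner:", best(CANDIDATES, CONSTRAINTS, graded_cost)["name"])
```

In this toy setting every candidate violates something, so crisp satisfaction rejects all three. Simply violable scoring cannot separate A from B, because both violate the same constraint; only the graded score reflects that B violates it more severely.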

References

  • Bistarelli, S. (2001). Soft Constraint Solving and Programming: A General Framework. Ph.D. thesis, Università di Pisa.
  • Jackendoff, R. (1997). The Architecture of the Language Faculty. Linguistic Inquiry: Monograph Twenty-Eight. Cambridge, Mass.: The MIT Press.
  • Jackendoff, R. (2002). Foundations of Language: Brain, Meaning, Grammar, Evolution. New York, NY: Oxford University Press.
  • Keller, F. (2000). Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality. Ph.D. thesis, University of Edinburgh.
  • Pollard, C. and I. Sag (1994). Head-Driven Phrase Structure Grammar. Studies in Contemporary Linguistics. Chicago: CSLI.

In graduate school, I wrote two generals papers as part of my Ph.D. requirements. The title of my first generals paper is "Plural and Sequential Events: Some Theoretical and Computational Implications." In this paper, I take a computational-semantic approach to plural and sequential events, with special emphasis on Czech and Russian because of their rich aspectual systems. This research was supervised by Graeme Hirst and Elizabeth Cowper. The title of my second generals paper is "A Constraint-Based Approach to Information Structure and Prosody Correspondence." This paper proposes a parallel architecture for constraint-based theories of grammar, HPSG in particular, in favour of modular, readable, and maintainable theories/grammars. In this architecture, syntax/semantics and information structure constrain prosodic structure, which is generated in parallel with the other two structures. This paper was supervised by Elizabeth Cowper and Gerald Penn.

I also worked as a research assistant to Gerald Penn. We were part of Module A4 of the MiLCA consortium. I was involved in developing a grammar formalism and computational system for parsing freer word order languages.