Previously, at H5, I worked on text categorization and knowledge discovery projects in the legal domain. Most of my work involved automatically processing large document corpora to identify textual and linguistic patterns that help classify a document into desired categories or assess its relevance to a topic or question.
At Iowa State University, I worked on two projects that applied machine learning and text categorization techniques to linguistic problems. The first
project was the automatic classification of language learner writing
into language proficiency levels. This project involved statistical
analyses of a number of textual features, identifying reliable and
automatically measurable linguistic features indicative of
proficiency levels, and manual annotation and analysis of data. The
goals of this project were twofold: (i) to develop
reliable automatic evaluation software, and (ii)
to provide comparative data on second language development based on
writers' first languages.
The second project, which was part of the Study for the
Termination of Online Predators (STOP),
involved automatic detection of child/pedophile communication in
online text chats. The goals of this project were also twofold: (i)
to develop a software application that can flag a text
chat as suspicious (for law enforcement officials or for
parents/guardians), and (ii) to provide a better
understanding of child/predator communication. The other
co-investigators on this project were Chad Harms (Greenlee School
of Journalism and Communication/HCI) and Brian Monahan (Sociology).
Thesis Abstract
My thesis advocates a modular and parallel grammar
architecture with declarative constraints on the syntactic,
semantic, prosodic, and pragmatic structures which are derived in
parallel while mutually constraining one another as proposed by
Jackendoff (1997, 2002). The main claim of this thesis is that
because of the many conflicting requirements among modules, the
interfaces cannot employ crisp constraints. Instead, a
soft-constraint satisfaction approach is required. We also argue
that simply violable constraints are insufficient to account for
certain linguistic phenomena; there is a need for graded constraints
that allow for degrees of violation.
The dissertation first provides a review of different
conceptions of gradience in linguistics followed by a review of the
concept of modularity in cognitive science and linguistics. The
problem of conflicting requirements in the field of Constraint
Logic Programming (CLP) has led to various soft-constraint
satisfaction approaches. The dissertation then presents a
generalized theory of soft-constraint satisfaction (Bistarelli,
2001) from the CLP literature, followed by a
case study of graded constraints showing that such constraints
exist at interfaces and that they can exhibit degrees of violation.
Another case study shows that the modular parallel architecture
allows for simpler modules and is able to capture generalizations
better. We then conclude by showing how the generalized theory of
soft-constraint satisfaction can be incorporated within grammar
without disrupting the existing explanatory power of
constraint-based theories such as LOT (Keller, 2000) and HPSG
(Pollard and Sag, 1994).
This thesis was co-supervised by Elizabeth Cowper
and Gerald Penn. Other
committee members were Elan
Dresher, Jean-Pierre Koenig, and Frank
Keller.
References
- Bistarelli, S. (2001). Soft Constraint Solving and
Programming: A General Framework. Ph.D. thesis, Università
di Pisa.
- Jackendoff, R. (1997). The Architecture of the Language
Faculty. Linguistic Inquiry: Monograph Twenty-Eight.
Cambridge, Mass.: The MIT Press.
- Jackendoff, R. (2002). Foundations of Language: Brain,
Meaning, Grammar, Evolution. New York, NY: Oxford University Press.
- Keller, F. (2000). Gradience in Grammar: Experimental and
Computational Aspects of Degrees of Grammaticality. Ph.D.
thesis, University of Edinburgh.
- Pollard, C. and Sag, I. (1994). Head-Driven Phrase
Structure Grammar. Studies in Contemporary Linguistics.
Chicago: University of Chicago Press and CSLI Publications.
In graduate school, I wrote two generals papers as part of my Ph.D. requirements. The first is titled "Plural and Sequential Events:
Some Theoretical and Computational Implications." In this
paper, I take a computational-semantic approach to plural and
sequential events, with special emphasis on Czech and Russian
because of their rich aspectual systems. This research was
supervised by Graeme Hirst and Elizabeth Cowper. The second is titled "A Constraint-Based Approach to
Information Structure and Prosody Correspondence." This paper
proposes a parallel architecture for constraint-based theories of
grammar, HPSG in particular, with the aim of making theories and grammars
more modular, readable, and maintainable. In this architecture,
syntax/semantics and information structure constrain prosodic
structure, which is generated in parallel with the other two structures.
This paper was supervised by Elizabeth Cowper
and Gerald Penn.
I also worked as a research assistant to Gerald
Penn. We were part of Module A4 of the MiLCA consortium. I was involved in developing a
grammar formalism and computational system for parsing freer word
order languages.