NLP Tasks

Domain Adaptation for Syntactic Analysis

Introduction

Current NLP systems tend to perform well only on their training domain and nearby genres; performance often degrades on data drawn from other domains. For instance, the performance of a statistical parser trained on the Penn Treebank Wall Street Journal corpus (WSJ; newspaper text) drops significantly when it is evaluated on text from other domains, as shown in the following table (results are from McClosky, 2010; for details of those domains, see the next section):

Constituent-to-dependency Conversion

Introduction

Since dependency structure is not constrained by word order, it is considered to be more domain- and language-independent than phrase structure. Most current state-of-the-art dependency parsers use supervised learning approaches, which usually require a large amount of annotated data. For English, some manually annotated dependency treebanks are available [1, 2]. Nonetheless, constituent-based treebanks such as the Penn Treebank are more dominant. It is therefore natural to build tools that convert phrase structures to dependency structures.

  1. Owen Rambow, Cassandre Creswell, Rachel Szekely, Harriet Taber, and Marilyn Walker. A dependency treebank for English. In Proceedings of LREC'02, 2002.

  2. M. Cmejrek, J. Curín, and J. Havelka. Prague Czech-English dependency treebank: Any hopes for a common annotation scheme? In HLT-NAACL'04 Workshop on Frontiers in Corpus Annotation, pages 47–54, 2004.
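To make the idea of conversion concrete, here is a minimal sketch of the common head-rule approach: pick a head child for every phrase, then attach the lexical heads of the other children to it. The `HEAD_RULES` table, the tuple-based tree encoding, and the `convert` function are all assumptions made for illustration; real converters use far larger rule tables and also assign dependency labels.

```python
# Toy head-rule table (hypothetical and far from complete): for each
# phrase label, the child labels to try as head, in priority order.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["VBD", "VBZ", "VB", "VP"],
    "NP": ["NN", "NNS", "NNP", "NP"],
    "PP": ["IN"],
}


def convert(tree):
    """Return (words, heads); heads[i] is the index of word i's head, -1 for the root."""
    words, heads = [], []

    def visit(node):
        label, children = node[0], node[1:]
        if isinstance(children[0], str):  # preterminal: (tag, word)
            words.append(children[0])
            heads.append(-1)
            return len(words) - 1
        child_heads = [visit(child) for child in children]
        child_labels = [child[0] for child in children]
        # Pick the head child by the first matching rule; default to the last child.
        head_pos = len(children) - 1
        for candidate in HEAD_RULES.get(label, []):
            if candidate in child_labels:
                head_pos = child_labels.index(candidate)
                break
        head = child_heads[head_pos]
        # The lexical head of every other child depends on the head child's lexical head.
        for i, h in enumerate(child_heads):
            if i != head_pos:
                heads[h] = head
        return head

    visit(tree)
    return words, heads


# Phrase structure encoded as nested tuples: (label, child, child, ...)
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "parser")),
        ("VP", ("VBD", "read"),
               ("NP", ("DT", "a"), ("NN", "sentence"))))

words, heads = convert(tree)
for i, word in enumerate(words):
    print(word, "<-", "ROOT" if heads[i] < 0 else words[heads[i]])
```

In this toy example the verb "read" ends up as the root, with the two nouns attached to it and each determiner attached to its noun.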

Semantic Parsing

Introduction

In fact, “semantic parsing” is, ironically, a semantically ambiguous term, which could refer to:

Semantic Role Labeling

Introduction

Semantic Role Labeling (SRL), sometimes called shallow semantic parsing, is a natural language processing task that assigns semantic roles to constituents (or their head words) in a sentence according to their relationship to the predicates expressed in the sentence. Typical semantic roles can be divided into core arguments and adjuncts. The core arguments include Agent, Patient, Source, Goal, etc., while the adjuncts include Location, Time, Manner, Cause, etc.
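As a small illustration of what an SRL output might look like, the sketch below stores role-labeled spans for one predicate and marks each as a core argument or an adjunct. The sentence, the `Argument` class, the `describe` helper, and the role assignments are assumptions chosen for this example, not the output of any particular SRL system.

```python
from dataclasses import dataclass

# Role inventories taken from the description above (not exhaustive).
CORE_ROLES = {"Agent", "Patient", "Source", "Goal"}
ADJUNCT_ROLES = {"Location", "Time", "Manner", "Cause"}


@dataclass
class Argument:
    role: str
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span


tokens = "Mary sold the book to John yesterday".split()
predicate_index = 1  # "sold"

# Hand-written annotation for the predicate "sold" (illustrative only).
arguments = [
    Argument("Agent", 0, 1),
    Argument("Patient", 2, 4),
    Argument("Goal", 4, 6),
    Argument("Time", 6, 7),
]


def describe(tokens, arguments):
    """Return (role, span text, 'core' or 'adjunct') for each argument."""
    rows = []
    for arg in arguments:
        kind = "core" if arg.role in CORE_ROLES else "adjunct"
        rows.append((arg.role, " ".join(tokens[arg.start:arg.end]), kind))
    return rows


rows = describe(tokens, arguments)
print("predicate:", tokens[predicate_index])
for role, text, kind in rows:
    print(f"{role:8} {kind:8} {text}")
```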

Meaning Representation

Brief Introduction to Abstract Meaning Representation

Abstract Meaning Representation (AMR) is a semantic representation language; the AMR Bank is a set of English sentences paired with simple, readable semantic representations. Remember that we have talked about FrameNet, a well-known resource for frame-semantic representations. If you want to know more about AMR, click here.
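To give a feel for what such a representation looks like, here is a minimal sketch that stores the AMR graph for the sentence "The boy wants to go" as plain Python data and prints it in the usual bracketed (PENMAN-style) notation. The `instances`/`edges` encoding and the `to_penman` helper are assumptions made for this example; they are not part of any AMR toolkit.

```python
# Each variable names one concept; edges are (source, role, target) triples.
instances = {"w": "want-01", "b": "boy", "g": "go-01"}
edges = [
    ("w", ":ARG0", "b"),  # the boy is the one who wants
    ("w", ":ARG1", "g"),  # what is wanted: the going
    ("g", ":ARG0", "b"),  # the boy is also the one who goes
]


def to_penman(root, instances, edges):
    """Render the graph as a bracketed string, starting from the root variable."""
    seen = set()

    def render(var):
        if var in seen:
            return var  # re-entrancy: refer back to the variable only
        seen.add(var)
        parts = [f"({var} / {instances[var]}"]
        for src, role, tgt in edges:
            if src == var:
                parts.append(f"{role} {render(tgt)}")
        return " ".join(parts) + ")"

    return render(root)


amr = to_penman("w", instances, edges)
print(amr)
```

The variable `b` appears twice in the output, which is how the notation expresses that the same boy fills a role in both the wanting and the going.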

Linguistics

Brief Introduction to Chinese Morphology

This is the draft I wrote for my presentation in the English Lexicology course, where I was asked to introduce Chinese morphology to students from the College of Foreign Languages.

Coursework

Build a Simple C Compiler with a Parse Tree!

Hi there! I have finally completed my assignment for the Principles of Compilers course: building a simple C compiler. I also wrote a Python script to visualize the parse tree :) For more information about the process, please click here.
