Zi Lin
by Zi Lin




In fact, “semantic parsing” is, ironically, a semantically ambiguous term, which could refer to:

  • Semantic role labeling (also known as shallow semantic parsing, see previous post for more information).
  • Finding generic relations in text.
  • Transforming a natural language sentence into its meaning representation.

In this post, semantic parsing is the task of mapping natural language (NL) sentences into complete formal meaning representations (MRs) which a computer can execute for some domain-specific application [1]. Representing the meaning of natural language is ultimately a difficult philosophical question, and many attempts have been made to define generic formal semantics of natural language, including:

  • Abstract Meaning Representation (AMR)

  • Semantic Dependency Parsing (SDP)

  • Meaning Representation Language (MRL)

  • SQL Query

Abstract Meaning Representation

Abstract Meaning Representation (AMR) [2] is a semantic formalism in which the meaning of a sentence is encoded as a rooted, directed, acyclic graph. Nodes represent concepts, and labeled directed edges represent the relationships between them. For example, the sentence “the boy wants the girl to believe him” has the following AMR graph (in PENMAN format):

(w / want-01
	:ARG0 (b / boy)
	:ARG1 (b2 / believe-01
		:ARG0 (g / girl)
		:ARG1 b))
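To make the graph structure concrete, here is a minimal sketch of a reader for the small PENMAN subset used in the example above (nested nodes, roles, and variable re-use only). The function names `parse_penman` and `reentrant_variables` are made up for illustration; this is not the reader of any particular AMR toolkit, and it does not handle quoted strings, attributes, or comments.

```python
import re
from collections import Counter

# Tokens: parentheses, the slash, roles such as :ARG0, and bare symbols.
TOKEN = re.compile(r"\(|\)|/|:[^\s()]+|[^\s()/:]+")

def parse_penman(text):
    """Parse a small PENMAN subset into (root, instances, relations)."""
    tokens = TOKEN.findall(text)
    pos = 0
    instances = []  # (variable, concept)
    relations = []  # (source variable, role, target variable)

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "(" and tokens[pos + 2] == "/"
        var, concept = tokens[pos + 1], tokens[pos + 3]
        pos += 4
        instances.append((var, concept))
        while tokens[pos] != ")":
            role = tokens[pos][1:]  # drop the leading colon
            pos += 1
            if tokens[pos] == "(":
                target = parse_node()  # nested node
            else:
                target = tokens[pos]   # re-used variable
                pos += 1
            relations.append((var, role, target))
        pos += 1  # consume ")"
        return var

    root = parse_node()
    return root, instances, relations

def reentrant_variables(relations):
    """Variables with more than one incoming edge."""
    incoming = Counter(target for _, _, target in relations)
    return sorted(v for v, n in incoming.items() if n > 1)

AMR = """
(w / want-01
    :ARG0 (b / boy)
    :ARG1 (b2 / believe-01
        :ARG0 (g / girl)
        :ARG1 b))
"""

root, instances, relations = parse_penman(AMR)
print(root, instances, relations, reentrant_variables(relations))
```

The variable `b` has two incoming edges (from `want-01` and from `believe-01`), which is why the structure is a graph rather than a tree.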

AMR parsing, the task of transforming a sentence into its AMR graph, is challenging because the parser must learn to predict not only concepts, which consist of predicates, lemmas, named entities, wiki-links and co-references, but also a large number of relation types based on the semantic roles in PropBank (see the previous post, Semantic Role Labeling, if you are interested).


There are multiple versions of the AMR data, including:

  • LDC2013E117: 10,853 sentences with 13,050 AMRs (corpus includes multiple annotations for some sentences)

  • LDC2014T12 (AMR 1.0): 13,051 sentences.

  • LDC2015E86: 19,572 sentences. This version includes more AMRs, wikification, AMR adoption of new unified PropBank frames, AMR deepening, automatic AMR-English alignments, and correction of AMR annotation errors.

  • LDC2016E25: 39,260 sentences.

  • LDC2017T10 (AMR 2.0): 39,260 sentences.

Experiment & Results

These are the AMR parsing results on the LDC2014T12, LDC2015E86 and LDC2017T10 test sets, where LDC2014N refers to the newswire section of LDC2014T12. Please refer to the original papers if you are interested in the models, and contact me by email if any of the numbers are incorrect.

| Paper | Model | Affiliation | LDC2014N | LDC2014 | LDC2015 | LDC2017 |
|---|---|---|---|---|---|---|
| ACL-18 | Lyu & Titov | University of Edinburgh & University of Amsterdam | - | - | 73.7 | 74.4 |
| ACL-18 | Groschwitz et al. | Saarland University & Macquarie University | - | - | 70.2 | 71.0 |
| ACL-17 | Buys & Blunsom | University of Oxford & DeepMind | - | - | - | 61.9 |
| ACL-17 | Foland & Martin | University of Colorado | - | - | 70.7 | - |
| EMNLP-18 | Guo & Lu | Singapore University of Technology and Design | 74.0 | 68.3 | 68.7 | 69.8 |
| arXiv-17 | van Noord & Bos | University of Groningen | - | - | 68.5 | 71.0 |
| EMNLP-17 | Wang & Xue | Brandeis University | - | 68.1 | 68.1 | - |
| SemEval-16 | Wang et al. (CAMR) | Brandeis University & Boulder Language Technologies | - | 66.5 | 67.3 | - |
| SemEval-16 | Barzdins & Gosko | University of Latvia | - | - | 67.2 | - |
| SemEval-16 | Flanigan et al. (JAMR) | Carnegie Mellon University & University of Washington | - | 66 | 67 | - |
| EACL-17 | Damonte et al. | University of Edinburgh & University of Padua | - | 64 | 64 | - |
| NAACL-18 | Vilares & Gómez-Rodríguez | Universidade da Coruña | - | - | 64 | - |
| AAAI-18 | Peng et al. | University of Rochester & University of Padua | - | - | 64 | - |
| ACL-17 | Konstas et al. | University of Washington & Allen Institute for Artificial Intelligence | - | - | 62.1 | - |
| EACL-17 | Peng et al. | University of Rochester & Brandeis University | - | - | 52 | - |
| EMNLP-15 | Pust et al. | University of Southern California | - | 67.1 | - | - |
| EMNLP-16 | Zhou et al. | Nanjing Normal University & DFKI, Germany | 71 | 66 | - | - |
| SemEval-16 | Goodman et al. | University College London & University of Sheffield | 70 | - | 64 | - |
| ACL-IJCNLP-15 | Wang et al. (CAMR) | Brandeis University & Boulder Language Technologies | 70 | 66 | - | - |
| EMNLP-17 | Ballesteros & Al-Onaizan | IBM T.J. Watson Research Center | 69 | 64 | - | - |
| EMNLP-15 | Artzi et al. | Cornell University & University of Washington | 67 | - | - | - |
| ACL-IJCNLP-15 | Werling et al. | Stanford University | 62 | - | - | - |
| NAACL-15 | Wang et al. (CAMR) | Brandeis University & Harvard Medical School | - | 59 | - | - |
| ACL-14 | Flanigan et al. (JAMR) | Carnegie Mellon University | 59 | 58 | - | - |
| CoNLL-15 | Peng et al. | University of Rochester | 58 | - | - | - |

Semantic Dependency Parsing

Task 8 at SemEval 2014 defines Broad-Coverage Semantic Dependency Parsing (SDP) as the problem of recovering sentence-internal predicate-argument relationships for all content words, i.e., the semantic structure constituting the relational core of sentence meaning [3].

SDP seeks to stimulate the dependency parsing community to move towards more general graph processing, and thus to enable a more direct analysis of “who did what to whom?” In addition to its relation to syntactic dependency parsing, the task also has some overlap with Semantic Role Labeling (SRL) [4].

SDP parsers are required to identify all semantic dependencies, i.e., to compute a representation that integrates all content words in one structure. Unlike SRL, the SDP task does not encompass predicate disambiguation, a design decision owed in part to SDP’s goal of focusing on parsing-oriented (i.e., structural) analysis, and in part to the lack of consensus on sense inventories for all content words.

SDP uses three distinct target representations for semantic dependencies:

DM: DELPH-IN MRS-Derived Bi-Lexical Dependencies

These semantic dependency graphs originate in a manual re-annotation of Sections 00–21 of the WSJ Corpus with syntactico-semantic analyses derived from the LinGO English Resource Grammar (ERG) [5]. The DM target representations are derived through a two-step ‘lossy’ conversion of MRSs, first to variable-free Elementary Dependency Structures (EDS) [6], then to ‘pure’ bi-lexical form, projecting some construction semantics onto word-to-word dependencies [7]. For example, the sentence “Ms. Haag plays Elianti.” has the following DM representation:

id form lemma pos top pred arg1 arg2
1 Ms. Ms. NNP - + _ _
2 Haag Haag NNP - - compound ARG1
3 plays play VBZ + + _ _
4 Elianti Elianti NNP - - _ ARG2
5 . . . - - _ _
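The column layout above can be turned into explicit dependency edges. The sketch below assumes that the i-th argument column belongs to the i-th token marked `+` in the `pred` column, which is how the example reads (`arg1` holds the arguments of “Ms.”, `arg2` those of “plays”). The function name `read_sdp_graph` is hypothetical, and whitespace splitting is a simplification that only works when no field contains a space.

```python
SENTENCE = """\
1 Ms. Ms. NNP - + _ _
2 Haag Haag NNP - - compound ARG1
3 plays play VBZ + + _ _
4 Elianti Elianti NNP - - _ ARG2
5 . . . - - _ _
"""

def read_sdp_graph(block):
    """Return (top node ids, labeled edges) for one sentence block.

    Each edge is a (predicate id, label, argument id) triple.
    """
    rows = [line.split() for line in block.strip().splitlines()]
    # Columns: id, form, lemma, pos, top, pred, then one column per predicate.
    tops = [int(r[0]) for r in rows if r[4] == "+"]
    predicates = [int(r[0]) for r in rows if r[5] == "+"]
    edges = []
    for r in rows:
        for pred_id, label in zip(predicates, r[6:]):
            if label != "_":  # "_" means no dependency in this column
                edges.append((pred_id, label, int(r[0])))
    return tops, edges

tops, edges = read_sdp_graph(SENTENCE)
print(tops)   # ids of top nodes
print(edges)  # (predicate id, label, argument id) triples
```

Note that token 2 (“Haag”) receives two incoming edges, one from each predicate, so the result is a graph rather than a tree.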

PAS: Enju Predicate-Argument Structures

The Enju parsing system is an HPSG-based parser for English. The grammar and the disambiguation model of this parser are derived from the Enju HPSG treebank, which is automatically converted from the phrase structure and predicate-argument structure annotation of the PTB. The PAS data set is extracted from the WSJ portion of the Enju HPSG treebank. While the Enju treebank is annotated with full HPSG-style structures, only its predicate-argument structures are converted into the SDP data format for use in the SDP task.

PCEDT: Prague Tectogrammatical Bi-Lexical Dependencies

The Prague Czech-English Dependency Treebank (PCEDT) [8] is a set of parallel dependency trees over the WSJ texts from the PTB and their Czech translations. The specifics of the PCEDT representations are best observed in the procedure that converts the original PCEDT data to the SDP data format; see Miyao et al. 2014 [9].


  • SemEval-2014 Task 8: The task provides three data sets: training, development, and test. On November 4, 2013, a sample of some 190 dependency graphs from the training data was published as trial data, demonstrating key characteristics of the task. Since December 13, some 750,000 tokens of annotated text have been available as training data; access information is provided through the task mailing list.

  • SemEval-2015 Task 18: This task is a re-run, with some extensions, of Task 8 at SemEval-2014.

Experiment & Results

The results are labeled parsing performance (F1 score) on both in-domain and out-of-domain test data, based on the English data set from the SemEval 2015 Task 18 closed track. The data split is 33,964 training sentences from Sections 00-19 of the WSJ corpus, 1,692 development sentences from Section 20, 1,410 sentences from Section 21 as in-domain test data, and 1,849 sentences sampled from the Brown Corpus as out-of-domain test data.
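As a rough illustration of what a labeled F1 score measures, the sketch below compares gold and predicted edges as sets of (head, label, dependent) triples. It is a simplification under stated assumptions: the official task scorer may differ in details (for example, how top nodes are counted), and the function name `labeled_f1` and the example edges are made up for illustration.

```python
def labeled_f1(gold_edges, predicted_edges):
    """Labeled F1 over (head, label, dependent) triples."""
    gold, predicted = set(gold_edges), set(predicted_edges)
    correct = len(gold & predicted)  # head, label and dependent must all match
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: the parser mislabels one of three edges.
gold = [(1, "compound", 2), (3, "ARG1", 2), (3, "ARG2", 4)]
predicted = [(1, "compound", 2), (3, "ARG1", 2), (3, "ARG1", 4)]
print(labeled_f1(gold, predicted))
```

Here two of three predicted edges are correct and two of three gold edges are recovered, so precision, recall, and F1 are all 2/3.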


In-domain test data:

| Paper | Model | Affiliation | DM | PAS | PSD |
|---|---|---|---|---|---|
| ACL-2018 | Dozat & Manning | Stanford University | 93.7 | 93.9 | 81.0 |
| ACL-2017 | Peng et al. | University of Washington & Carnegie Mellon University | 90.4 | 92.7 | 78.5 |
| AAAI-2018 | Wang et al. | Harbin Institute of Technology | 90.3 | 91.7 | 78.6 |
| SemEval-2015 | Du et al. | Peking University | 89.1 | 91.3 | 75.7 |
| SemEval-2015 | Almeida & Martins | Priberam Labs & Instituto de Telecomunicações | 88.2 | 90.9 | 76.4 |


Out-of-domain test data:

| Paper | Model | Affiliation | DM | PAS | PSD |
|---|---|---|---|---|---|
| ACL-2018 | Dozat & Manning | Stanford University | 88.9 | 90.6 | 79.4 |
| ACL-2017 | Peng et al. | University of Washington & Carnegie Mellon University | 85.3 | 89.0 | 76.4 |
| AAAI-2018 | Wang et al. | Harbin Institute of Technology | 84.9 | 87.6 | 75.9 |
| SemEval-2015 | Du et al. | Peking University | 81.8 | 87.2 | 73.3 |
| SemEval-2015 | Almeida & Martins | Priberam Labs & Instituto de Telecomunicações | 81.8 | 86.9 | 74.8 |

Logical Form Representation

First-order logic is used as a meaning representation language (MRL), which is designed by the creator of the application to suit the application’s needs, independent of natural language. Some MRL data sets are based on Prolog databases; two well-known semantic parsing benchmark data sets of this kind are Jobs and Geo [10].

Here we explain the features of the Geoquery representation language through a sample query:

Input: "What is the largest city in Texas?"
Query: answer(C, largest(C, (city(C), loc(C, S), const(S, stateid(texas)))))

Objects are represented as logical terms and are typed with a semantic category using logical functions applied to possibly ambiguous English constants (e.g. stateid(Mississippi), riverid(Mississippi)). Relationships between objects are expressed using predicates; for instance, loc(X, Y) states that X is located in Y.

We also need to handle quantifiers such as ‘largest’. We represent these using meta-predicates for which at least one argument is a conjunction of literals. For example, largest(X, Goal) states that the object X satisfies Goal and is the largest object that does so, using the appropriate measure of size for objects of its type (e.g. area for states, population for cities). Finally, an unspecified object required as an argument to a predicate can appear elsewhere in the sentence, requiring the use of the predicate const(X, C) to bind the variable X to the constant C.
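The semantics of the sample query can be sketched in a few lines of Python. This is only an illustration of how `loc`, `const`, and the meta-predicate `largest` interact, not the Prolog implementation behind Geoquery. The `CITIES` table and its population figures are illustrative placeholders, not real data.

```python
# Toy facts; the population figures are placeholders, not real data.
CITIES = {
    "houston": {"state": "texas", "population": 300},
    "dallas": {"state": "texas", "population": 200},
    "austin": {"state": "texas", "population": 100},
    "chicago": {"state": "illinois", "population": 400},
}

def city(c):
    """city(C): C is a city known to the database."""
    return c in CITIES

def loc(c, s):
    """loc(C, S): city C is located in state S."""
    return CITIES[c]["state"] == s

def largest(candidates, goal, size):
    """largest(X, Goal): the X satisfying Goal that maximises the size measure."""
    satisfying = [x for x in candidates if goal(x)]
    return max(satisfying, key=size) if satisfying else None

def answer():
    """answer(C, largest(C, (city(C), loc(C, S), const(S, stateid(texas)))))"""
    s = "texas"  # const(S, stateid(texas)) binds S to the constant
    goal = lambda c: city(c) and loc(c, s)          # the conjunction of literals
    population = lambda c: CITIES[c]["population"]  # size measure for cities
    return largest(CITIES, goal, population)

print(answer())
```

With these toy facts the query selects "houston": "chicago" has a larger placeholder population but fails the `loc` literal, so it never reaches the comparison.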


  • Geo 880: A set of 880 queries to a database of U.S. geography containing basic information about the U.S. states, such as population, area, capital city, and neighboring states. It was originally annotated with Prolog-style semantics. The 880 instances are split into 680 training examples and 200 test examples [11].

  • Jobs 640: A set of 640 queries to a database of computer-related job postings from the USENET newsgroup austin.jobs. Information extracted from these postings forms a database containing the following types of information: 1) the job title, 2) the company, 3) the recruiter, 4) the location, 5) the salary, 6) the languages and platforms used, and 7) required or desired years of experience and degrees. The training-test split contains 500 training and 140 test instances.

  • ATIS: A set of 5,410 queries to a flight booking system, with lambda-calculus logical forms as the meaning representations. The ATIS database consists of data obtained from the Official Airline Guide (OAG, 1990), organized under a relational schema. The database remained fixed throughout the pilot phase. It contains information about flights, fares, airlines, cities, airports, and ground services, and includes twenty-five supporting tables. The large majority of the questions posed by subjects can be answered from the database with a single relational query.

Experiment & Results

The results are reported as accuracy, defined as the proportion of input sentences that are correctly parsed to their gold-standard logical forms.
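A minimal sketch of this metric, assuming that “correctly parsed” means an exact string match after whitespace is removed. Individual papers may judge correctness differently (for example, by tolerating variable renaming or by comparing the answers returned from the database), so treat this as an illustration only; the helper names are made up.

```python
def normalize(logical_form):
    """Remove all whitespace so that spacing differences do not count as errors."""
    return "".join(logical_form.split())

def exact_match_accuracy(predicted, gold):
    """Fraction of sentences whose predicted logical form equals the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return correct / len(gold)

gold = [
    "answer(C, largest(C, (city(C), loc(C, S), const(S, stateid(texas)))))",
    "answer(C, (city(C), loc(C, S), const(S, stateid(texas))))",
]
predicted = [
    "answer(C,largest(C,(city(C),loc(C,S),const(S,stateid(texas)))))",  # same form, different spacing
    "answer(C, (city(C), loc(C, S), const(S, stateid(ohio))))",         # wrong constant
]
print(exact_match_accuracy(predicted, gold))
```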

| Paper | Model | Affiliation | Geo880 | Jobs640 | ATIS |
|---|---|---|---|---|---|
| EMNLP-18 | Xu et al. | Tencent AI Lab & IBM Research & Peking University | 88.1 | 91.2 | 85.9 |
| ACL-17 | Rabinovich et al. | UC Berkeley | 85.7 | 92.9 | 85.3 |
| ACL-11 | Liang et al. | UC Berkeley | 87.9 | 90.7 | - |
| ACL-14 | Wang et al. | University of Washington & Allen Institute for AI | 90.4 | - | 91.3 |
| EMNLP-13 | Kwiatkowski et al. | University of Washington | 89.0 | - | - |
| NAACL-15 | Zhao and Huang | City University of New York | 88.9 | 85.0 | 84.2 |
| ACL-13 | Poon | Microsoft Research | - | - | 83.5 |
| EMNLP-11 | Kwiatkowski et al. | University of Edinburgh & University of Washington | 88.6 | - | 82.8 |
| EMNLP-10 | Kwiatkowski et al. | University of Edinburgh & University of Washington | 87.9 | - | 71.4 |
| ACL-16 | Dong & Lapata (Seq2Tree) | University of Edinburgh | 87.1 | 90.0 | 85.3 |
| ACL-07 | Wong and Mooney | University of Texas | 86.6 | - | - |
| EMNLP-CoNLL-07 | Zettlemoyer and Collins | | 86.1 | - | 84.6 |
| ACL-16 | Dong & Lapata (Seq2Seq) | University of Edinburgh | 85.0 | 87.1 | 84.2 |
| ACL-16 | Jia & Liang | Stanford University | 85.0 | - | 76.3 |
| EMNLP-08 | Lu et al. | Singapore-MIT Alliance & National University of Singapore & MIT | 81.8 | - | - |
| ECML-01 | Tang & Mooney | University of Texas | 79.8 | 79.4 | - |
| UAI-05 | Zettlemoyer and Collins | | 79.3 | 79.3 | - |
| IUI-03 | Popescu et al. | University of Washington | 77.5 | 88.0 | - |
| NAACL-06 | Wong and Mooney | University of Texas | 74.8 | - | - |
| CoNLL-05 | Ge and Mooney | University of Texas | 72.3 | - | - |
| COLING-ACL-06 | Kate and Mooney | University of Texas | 71.7 | - | - |

SQL Query

In recent years, there has been considerable interest in automated synthesis of programs from informal specifications. A particularly promising application domain for such computer-aided programming techniques is the automated synthesis of database queries. Although many end-users need to query data stored in some relational database, they typically lack the expertise to write complex queries in declarative query languages such as SQL. As a result, there has been a flurry of interest in automatically synthesizing SQL queries from informal specifications. Here we will talk about the techniques that can generate SQL queries from natural language (NL) descriptions.

For example, for the question “how many CFL teams are from York College”, the corresponding SQL query looks like:

SELECT COUNT CFL Team FROM CFLDraft WHERE College = "York"
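To show what “executable” means for this kind of output, the sketch below renders a small structured query (one aggregation, one selected column, a list of conditions) as SQL and runs it with Python's built-in `sqlite3` module. The dictionary layout, the `to_sql` helper, and the table rows are assumptions made for illustration. They are not the WikiSQL file format or its data.

```python
import sqlite3

ALLOWED_OPS = {"=", ">", "<"}

def to_sql(query, table):
    """Render a structured query as a parameterised SQL string."""
    column = f'"{query["sel"]}"'
    select = f'{query["agg"]}({column})' if query["agg"] else column
    clauses, params = [], []
    for col, op, value in query["conds"]:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f'"{col}" {op} ?')  # values are passed as parameters
        params.append(value)
    sql = f'SELECT {select} FROM "{table}"'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

def build_toy_db():
    """In-memory table with made-up rows (placeholders, not real draft data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE CFLDraft ("CFL Team" TEXT, "College" TEXT)')
    conn.executemany(
        "INSERT INTO CFLDraft VALUES (?, ?)",
        [("Team A", "York"), ("Team B", "York"), ("Team C", "Elsewhere")],
    )
    return conn

# "how many CFL teams are from York College"
query = {"agg": "COUNT", "sel": "CFL Team", "conds": [("College", "=", "York")]}
sql, params = to_sql(query, "CFLDraft")
conn = build_toy_db()
count = conn.execute(sql, params).fetchone()[0]
print(sql, params, count)
```

Column names are double-quoted because they may contain spaces, and condition values are bound as parameters rather than spliced into the string.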


  • WikiSQL: WikiSQL is a collection of questions, corresponding SQL queries, and SQL tables. It is the largest hand-annotated semantic parsing dataset to date: it is an order of magnitude larger than other datasets that have logical forms, in terms of either the number of examples or the number of tables. The corpus consists of 80,654 hand-annotated instances of natural language questions, SQL queries, and SQL tables extracted from 24,241 HTML tables from Wikipedia [12].

Experiment & Results

For the experiments and results on WikiSQL, see https://github.com/salesforce/WikiSQL.

  1. Rohit J. Kate and Yuk Wah Wong. 2010. Semantic parsing: The task, the state of the art and the future. Tutorial Abstracts of ACL 2010, page 6. 

  2. Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning representation for SemBanking. In Proc. of the 7th Linguistic Annotation Workshop and Interoperability with Discourse. 

  3. Oepen, S., Kuhlmann, M., Miyao, Y., Zeman, D., Flickinger, D., Hajič, J., Ivanova, A., & Zhang, Y. 2014. SemEval 2014 Task 8: Broad-coverage semantic dependency parsing. In Proceedings of the 8th International Workshop on Semantic Evaluation. Dublin, Ireland. 

  4. Gildea, D., & Jurafsky, D. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288. 

  5. Flickinger, D., Zhang, Y., & Kordoni, V. 2012. DeepBank. A dynamically annotated treebank of the Wall Street Journal. In Proceedings of the 11th International Workshop on Treebanks and Linguistic Theories (p. 85–96). Lisbon, Portugal: Edições Colibri. 

  6. Oepen, S., & Lønning, J. T. 2006. Discriminant-based MRS banking. In Proceedings of the 5th International Conference on Language Resources and Evaluation (p. 1250–1255). Genoa, Italy. 

  7. Ivanova, A., Oepen, S., Øvrelid, L., & Flickinger, D. 2012. Who did what to whom? A contrastive study of syntacto-semantic dependencies. In Proceedings of the Sixth Linguistic Annotation Workshop (p. 2–11). Jeju, Republic of Korea. 

  8. Hajič, J., Hajičová, E., Panevová, J., Sgall, P., Bojar, O., Cinková, S., … & Semecký, J. 2012. Announcing Prague Czech-English Dependency Treebank 2.0. In LREC (pp. 3153–3160). 

  9. Miyao, Y., Oepen, S., & Zeman, D. 2014. In-house: An ensemble of pre-existing off-the-shelf parsers. In Proceedings of the 8th International Workshop on Semantic Evaluation. Dublin, Ireland. 

  10. Lappoon R. Tang and Raymond J. Mooney. 2001. Using multiple clause constructors in inductive logic programming for semantic parsing. In Proceedings of the 12th ECML, pages 466–477, Freiburg, Germany. 

  11. Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the 21st UAI, pages 658–666, Toronto, ON. 

  12. Zhong, V., Xiong, C. and Socher, R., 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.